Text this: Self-Supervised Spatiotemporal Representation Learning for Skeleton-Based Human Action Recognition