Exploring Attention Sparsity to Accelerate Transformer Training on GPUs