Multi-Step Span Loss Prediction in Optical Networks Using Multi-Head Attention Transformers