A Large-Scale Spatio-Temporal Multimodal Fusion Framework for Traffic Prediction