Multimodal Raga Classification from Vocal Performances with Disentanglement and Contrastive Loss