Augmenting Multimodal Content Representation with Transformers for Misinformation Detection