Bridging the gap: multi-granularity representation learning for text-based vehicle retrieval