Tencent Text-Video Retrieval: Hierarchical Cross-Modal Interactions With Multi-Level Representations