Dual-Layer Fusion Knowledge Reasoning with Enhanced Multi-modal Features