An adaptive feature fusion strategy using dual-layer attention and multi-modal deep reinforcement learning for all-media similarity search