Brain-inspired multimodal motion and fine-grained action recognition