Text this: Two-Stream Modality-Based Deep Learning Approach for Enhanced Two-Person Human Interaction Recognition in Videos