Video-Based Facial Emotion Recognition using YOLO and Vision Transformer