Video-Based Facial Emotion Recognition using YOLO and Vision Transformer

Bibliographic Details
Main Authors:	Sareen Vidhi, Seeja K.R.
Format:	Article
Language:	English
Published:	EDP Sciences 2025-01-01
Series:	EPJ Web of Conferences
Subjects:	facial emotion recognition (fer); yolo; vision transformer (vit)
Online Access:	https://www.epj-conferences.org/articles/epjconf/pdf/2025/13/epjconf_icetsf2025_01040.pdf

collection	DOAJ
description	This paper presents a video-based facial emotion recognition (FER) approach that combines the YOLOv8 model for face detection with a pre-trained Vision Transformer (ViT) for emotion classification. The method extracts the middle frame of each video in the RAVDESS dataset and applies YOLOv8 to detect the face in that frame. The detected facial region is then passed to the ViT model, which classifies it into one of seven emotion categories: Neutral, Happy, Sad, Angry, Fearful, Disgust, and Surprised. To improve robustness and generalization, data augmentation (horizontal flipping, brightness adjustment, and Gaussian noise injection) was applied during preprocessing. The precise face localization of YOLOv8, combined with the strong feature extraction of ViT, contributes to the system's performance. The proposed FER framework achieved an accuracy of 90.81%, surpassing several existing state-of-the-art FER systems. These results show the strength of combining advanced face detection with a transformer-based architecture for accurate emotion recognition from facial expressions in videos.
id	doaj-art-d5b347bf6b254e26af077eedc91f3b5a
institution	OA Journals
issn	2100-014X
doi	10.1051/epjconf/202532801040
volume	328
article	01040
affiliation	Indira Gandhi Delhi Technical University for Women
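The preprocessing steps named in the abstract (taking the middle frame of a clip, cropping the detected face, then augmenting with horizontal flipping, brightness adjustment, and Gaussian noise) can be sketched as follows. This is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation: frames are assumed to be grayscale images stored as 2D lists of values in [0, 255]; the face box is assumed to arrive as (x1, y1, x2, y2) pixel coordinates from a detector such as YOLOv8 (the detector is not shown); and the helper names, flip probability, brightness range, and noise sigma are hypothetical, since the abstract does not state them. Resizing the crop to the ViT input size and the ViT classifier itself are omitted.

```python
import random
from typing import List, Tuple

# Assumption: a grayscale frame as rows of pixel values in [0, 255].
Image = List[List[float]]


def middle_frame_index(n_frames: int) -> int:
    """0-based index of the middle frame of a clip (hypothetical helper)."""
    if n_frames <= 0:
        raise ValueError("clip has no frames")
    return n_frames // 2


def crop_box(img: Image, box: Tuple[int, int, int, int]) -> Image:
    """Crop an (x1, y1, x2, y2) box, clamped to the image bounds."""
    h, w = len(img), len(img[0])
    x1, y1, x2, y2 = box
    x1, x2 = max(0, x1), min(w, x2)
    y1, y2 = max(0, y1), min(h, y2)
    return [row[x1:x2] for row in img[y1:y2]]


def _clip(v: float) -> float:
    """Keep a pixel value inside the valid [0, 255] range."""
    return min(255.0, max(0.0, v))


def horizontal_flip(img: Image) -> Image:
    """Mirror each row left to right."""
    return [list(reversed(row)) for row in img]


def adjust_brightness(img: Image, factor: float) -> Image:
    """Scale every pixel by `factor`, clipping to [0, 255]."""
    return [[_clip(p * factor) for p in row] for row in img]


def add_gaussian_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise with std `sigma` per pixel, clipping to [0, 255]."""
    return [[_clip(p + rng.gauss(0.0, sigma)) for p in row] for row in img]


def augment(img: Image, rng: random.Random) -> Image:
    """Apply the three augmentations; probability and ranges are hypothetical."""
    if rng.random() < 0.5:
        img = horizontal_flip(img)
    img = adjust_brightness(img, rng.uniform(0.8, 1.2))
    img = add_gaussian_noise(img, sigma=5.0, rng=rng)
    return img
```

For example, `augment(crop_box(frame, box), random.Random(0))` yields one randomized training view of a detected face; passing a seeded `random.Random` keeps the augmentation reproducible across runs.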
