A Study on Deep Learning Performances of Identifying Images’ Emotion: Comparing Performances of Three Algorithms to Analyze Fashion Items

Emotion recognition using AI has garnered significant attention in recent years, particularly in areas such as fashion, where understanding consumer sentiment can drive more personalized and effective marketing strategies. This study aims to propose an AI model that automatically analyzes the emotio...

Full description

Saved in:
Bibliographic Details
Main Authors: Gaeun Lee, Seoyun Yi, Jongtae Lee
Format: Article
Language:English
Published: MDPI AG 2025-03-01
Series:Applied Sciences
Subjects:
Online Access:https://www.mdpi.com/2076-3417/15/6/3318
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Emotion recognition using AI has garnered significant attention in recent years, particularly in areas such as fashion, where understanding consumer sentiment can drive more personalized and effective marketing strategies. This study aims to propose an AI model that automatically analyzes the emotional emotions of fashion images and compares the performance of CNN, ViT, and ResNet to determine the most suitable model. The experimental results showed that the vision transformer (ViT) model outperformed both ResNet50 and CNN models. This is due to the fact that transformer-based models, like ViT, offer greater scalability compared to CNN-based models. Specifically, ViT utilizes the transformer structure directly, which requires fewer computational resources during transfer learning compared to CNNs. This study illustrates that vision transformer (ViT) demonstrates higher performances with fewer computational resources than CNN during transfer learning. For academic and practical implications, the strong performance of ViT demonstrates the scalability and efficiency of transformer structures, indicating the need for further research applying transformer-based models to diverse datasets and environments.
ISSN:2076-3417