A New Lightweight Hybrid Model for Pistachio Classification Using Transformers and EfficientNet

Bibliographic Details
Main Author: Muhammet Cakmak
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10990223/
Description
Summary: In recent years, Vision Transformers (ViTs) have gained prominence as a highly effective method for image classification, often outperforming traditional Convolutional Neural Networks (CNNs). However, their relatively slow processing speed limits their practical use, particularly in real-time applications. Conversely, CNN-based transfer learning models provide faster inference but may struggle with classification accuracy on complex datasets. To address these challenges, Temporal Coordinate Attention (TCA) modules have been introduced to improve both efficiency and performance. This study proposes a hybrid architecture combining EfficientNet, a Vision Transformer, and TCA modules, integrating the accuracy of ViTs, the computational efficiency of CNNs, and the feature-enhancement capabilities of TCA. The model is designed to classify Siirt and Kirmizi pistachio varieties with high precision. It achieves 99.07% accuracy, 99.12% recall, and a Cohen’s Kappa score of 98.10%. These results highlight the model’s robustness, demonstrating its ability to perform reliable classification with minimal bias, making it well suited for real-world applications.
ISSN: 2169-3536
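
The record above only names the building blocks (an EfficientNet backbone, a Vision Transformer stage, and TCA modules), so the following is a minimal, hypothetical sketch of how such a hybrid classifier could be wired up in PyTorch. The specific backbone (efficientnet_b0), the embedding size, the Transformer depth, and the SE-style channel attention standing in for the paper's TCA module are all assumptions for illustration, not the author's actual implementation.

```python
# Hedged sketch of a hybrid CNN + Transformer classifier for two pistachio
# classes (Siirt vs. Kirmizi). The channel-attention block below is a simple
# squeeze-and-excitation-style placeholder for the paper's TCA module.
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0


class ChannelAttention(nn.Module):
    """SE-style channel attention; a stand-in for the paper's TCA module."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))         # global average pool -> (B, C)
        return x * w[:, :, None, None]          # re-weight feature channels


class HybridPistachioClassifier(nn.Module):
    """EfficientNet features -> channel attention -> Transformer encoder -> classifier."""
    def __init__(self, num_classes: int = 2, embed_dim: int = 256, depth: int = 2):
        super().__init__()
        self.backbone = efficientnet_b0(weights=None).features  # (B, 1280, H/32, W/32)
        self.attn = ChannelAttention(1280)
        self.proj = nn.Conv2d(1280, embed_dim, kernel_size=1)   # channels -> token dim
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):                       # x: (B, 3, 224, 224)
        f = self.attn(self.backbone(x))         # CNN features with channel attention
        tokens = self.proj(f).flatten(2).transpose(1, 2)  # (B, N, embed_dim)
        tokens = self.encoder(tokens)           # global attention over patch tokens
        return self.head(tokens.mean(dim=1))    # pooled tokens -> class logits


if __name__ == "__main__":
    model = HybridPistachioClassifier()
    logits = model(torch.randn(2, 3, 224, 224))
    print(logits.shape)                         # torch.Size([2, 2])
```

The pattern mirrored here, letting the CNN compress the image into a small feature map that the Transformer then treats as a short token sequence, is one common way to keep attention cost far below that of a full ViT on raw patches while still gaining global context; whether the published model follows exactly this layout is not stated in the record.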