Convolutional neural networks and vision transformers for Plankton Classification

Bibliographic Details
Main Authors: Loris Nanni, Alessandra Lumini, Leonardo Barcellona, Stefano Ghidoni
Format: Article
Language: English
Published: Elsevier 2025-12-01
Series: Ecological Informatics
Online Access: http://www.sciencedirect.com/science/article/pii/S157495412500281X
Description
Summary: In this paper, we present a study on plankton classification for automated underwater ecosystem monitoring. The study builds ensembles that combine different Convolutional Neural Network (CNN) models and transformer architectures, to understand whether different optimization algorithms can yield more robust and efficient classification across various plankton datasets. Tests involved different variants of the Adam optimizer and multiple learning rate variation strategies applied to several CNN architectures, building an ensemble of classifiers. These ensembles were tested together with transformer-based models in a detailed comparative analysis considering feature extraction efficiency, computational cost, and robustness to species imbalances. The study reports the performance of individual nets and ensembles on multiple plankton datasets, and discusses the potential for generalizing this approach to broader aquatic ecosystems. Experiments demonstrate that combining diverse neural network models in a heterogeneous ensemble significantly improves performance with respect to other state-of-the-art approaches across all the problems considered. Final results show that the ensemble-based approach achieves a remarkable accuracy improvement over individual CNN models and over standalone Vision Transformers.
ISSN: 1574-9541
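
The summary describes combining several CNN and transformer classifiers into a heterogeneous ensemble but does not state how their outputs are fused. As a minimal sketch only, the following assumes one common choice, averaging each model's softmax probabilities (the "sum rule"); this may differ from the authors' method. The function names and the example logits are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw class scores into probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_by_average(per_model_logits: Sequence[Sequence[float]]) -> List[float]:
    """Average the softmax outputs of several models for a single image.

    ASSUMPTION: equal-weight averaging; the record does not specify the fusion rule.
    """
    probs = [softmax(logits) for logits in per_model_logits]
    n_models = len(probs)
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / n_models for c in range(n_classes)]


def predict(per_model_logits: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest fused probability."""
    fused = fuse_by_average(per_model_logits)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical logits for one image from three models
# (for instance two CNNs and one transformer) on a 3-class problem.
example_logits = [
    [2.0, 1.0, 0.1],  # model 1 favours class 0
    [0.5, 2.5, 0.3],  # model 2 favours class 1
    [0.2, 2.2, 0.1],  # model 3 favours class 1
]
predicted_class = predict(example_logits)
```

In this toy input, two of the three models favour class 1, so the averaged probabilities select class 1 even though the first model disagrees. A weighted average or a different fusion rule would slot into `fuse_by_average` without changing the rest of the sketch.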