Identifying Cocoa Flower Visitors: A Deep Learning Dataset

Bibliographic Details
Main Authors: Wenxiu Xu, Saba Ghorbani Barzegar, Dong Sheng, Manuel Toledo-Hernández, ZhenZhong Lan, Thomas Cherico Wanger
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: Scientific Data
Online Access: https://doi.org/10.1038/s41597-025-05631-3
Description: Cocoa is a multi-billion-dollar industry, but research on improving yields through pollination remains limited. New embedded hardware and AI-based data analysis are advancing knowledge of cocoa flower visitors, their identity, and their implications for yields. We present the first cocoa flower visitor dataset, containing 5,792 images of Ceratopogonidae, Formicidae, Aphididae, Araneae, and Encyrtidae, and 1,082 background cocoa flower images. The dataset was curated from 23 million images collected over two years by embedded cameras in cocoa plantations in Hainan province, China. We demonstrate the use of the dataset by training YOLOv8 models of different sizes and by progressively increasing the proportion of background images in the training set to identify the best-performing model. The medium-sized YOLOv8 model achieved the best results with 8% background images (F1 score of 0.71, mAP50 of 0.70). Overall, this dataset is useful for comparing the performance of deep learning model architectures on low-contrast images with difficult detection targets. The data can support future efforts to advance sustainable cocoa production through pollination monitoring projects.
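The background-ratio experiment and the F1 score mentioned in the description can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes that "8% background images" means background images as a fraction of the whole training set, which the record does not state, and all function names and file names are hypothetical.

```python
import random


def background_count(n_labeled: int, fraction: float) -> int:
    """Number of background images needed so that they make up `fraction`
    of the final training set (assumed definition: bg / (labeled + bg))."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return round(fraction * n_labeled / (1.0 - fraction))


def build_training_list(labeled, backgrounds, fraction, seed=0):
    """Return the labeled images plus a random sample of background images.

    The sample is capped at the number of background images available.
    """
    n_bg = min(background_count(len(labeled), fraction), len(backgrounds))
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return list(labeled) + rng.sample(list(backgrounds), n_bg)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical file names, used only to show the arithmetic.
labeled = [f"visitor_{i}.jpg" for i in range(920)]
backgrounds = [f"background_{i}.jpg" for i in range(1082)]
train = build_training_list(labeled, backgrounds, fraction=0.08)
print(len(train))  # 920 labeled + 80 background images
```

Sweeping `fraction` over several values and retraining at each step would mirror the progressive increase in background ratio that the description mentions. The actual training call and the values tested are not given in this record.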
ISSN: 2052-4463
Collection: DOAJ
Record ID: doaj-art-e6b2d218429540a6afe056ba843ff7ee
Volume: 12, Issue: 1, Pages: 1-10
Author Affiliations:
Wenxiu Xu: College of Environmental and Resource Sciences, Zhejiang University
Saba Ghorbani Barzegar: Sustainable Agricultural Systems & Engineering Laboratory, School of Engineering, Westlake University
Dong Sheng: College of Environmental and Resource Sciences, Zhejiang University
Manuel Toledo-Hernández: Sustainable Agricultural Systems & Engineering Laboratory, School of Engineering, Westlake University
ZhenZhong Lan: School of Engineering, Westlake University
Thomas Cherico Wanger: Sustainable Agricultural Systems & Engineering Laboratory, School of Engineering, Westlake University