Semi-Automated Training of AI Vision Models

The adoption of AI vision models in specialized industries is often hindered by the substantial requirement for extensive, manually annotated image datasets. Even when employing transfer learning, robust model development typically necessitates tens of thousands of such images, a process that is time-consuming, costly, and demands consistent expert annotation. This technical note introduces a semi-automated method to significantly reduce this annotation burden. The proposed approach utilizes two general-purpose vision-transformer-to-caption (GP-ViTC) models to generate descriptive text from images. These captions are then processed by a custom-developed semantic classifier (SC), which requires only minimal training to predict the correct image class. This GP-ViTC + SC system demonstrated exemplary classification rates in test cases and can subsequently be used to automatically annotate large image datasets. While the inference speed of the GP-ViTC models is not suited for real-time applications (approximately 10 s per image), this method substantially lessens the labor and expertise required for dataset creation, thereby facilitating the development of new, high-speed, custom AI vision models for niche applications. This work details the approach and its successful application, offering a cost-effective pathway for generating tailored image training sets.


Bibliographic Details
Main Authors: Mathew G. Pelletier, John D. Wanjura, Greg A. Holt
Format: Article
Language: English
Published: MDPI AG 2025-07-01
Series:AgriEngineering
Subjects: machine vision; plastic contamination; cotton; automated inspection
Online Access: https://www.mdpi.com/2624-7402/7/7/225
collection DOAJ
description The adoption of AI vision models in specialized industries is often hindered by the substantial requirement for extensive, manually annotated image datasets. Even when employing transfer learning, robust model development typically necessitates tens of thousands of such images, a process that is time-consuming, costly, and demands consistent expert annotation. This technical note introduces a semi-automated method to significantly reduce this annotation burden. The proposed approach utilizes two general-purpose vision-transformer-to-caption (GP-ViTC) models to generate descriptive text from images. These captions are then processed by a custom-developed semantic classifier (SC), which requires only minimal training to predict the correct image class. This GP-ViTC + SC system demonstrated exemplary classification rates in test cases and can subsequently be used to automatically annotate large image datasets. While the inference speed of the GP-ViTC models is not suited for real-time applications (approximately 10 s per image), this method substantially lessens the labor and expertise required for dataset creation, thereby facilitating the development of new, high-speed, custom AI vision models for niche applications. This work details the approach and its successful application, offering a cost-effective pathway for generating tailored image training sets.
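The caption-then-classify pipeline the abstract describes can be sketched in miniature. The snippet below is an illustrative stand-in, not the authors' implementation: the captions, class labels, and the bag-of-words scoring are all hypothetical, standing in for real GP-ViTC caption output and the paper's custom semantic classifier. It shows the key economic point, that only a small set of captions needs human labeling before the classifier can label captions for new images automatically.

```python
from collections import Counter

STOPWORDS = {"a", "an", "the", "of", "on", "in", "with", "and", "through"}

def tokenize(text):
    # lowercase, strip trailing punctuation, drop common stopwords
    words = (w.lower().strip(".,") for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]

class SemanticClassifier:
    """Tiny bag-of-words classifier: scores a caption against per-class
    word counts accumulated from a handful of labeled captions."""
    def __init__(self):
        self.class_counts = {}

    def fit(self, captions, labels):
        for caption, label in zip(captions, labels):
            self.class_counts.setdefault(label, Counter()).update(tokenize(caption))

    def predict(self, caption):
        words = set(tokenize(caption))
        # pick the class whose accumulated vocabulary best overlaps the caption
        return max(self.class_counts,
                   key=lambda label: sum(self.class_counts[label][w] for w in words))

# Hypothetical GP-ViTC captions with manually assigned class labels;
# only this small set requires human annotation.
train = [
    ("a pile of white cotton fibers on a conveyor belt", "clean"),
    ("fluffy raw cotton moving through a gin stand", "clean"),
    ("a piece of yellow plastic wrap tangled in cotton", "contaminated"),
    ("shredded plastic bag fragments mixed with cotton lint", "contaminated"),
]
sc = SemanticClassifier()
sc.fit([c for c, _ in train], [l for _, l in train])
print(sc.predict("a strip of plastic film lying on cotton"))  # contaminated
```

In the paper's setting the predicted class would then be attached to the source image as its annotation, so the slow captioning models run once offline while the resulting labeled dataset trains a fast custom vision model.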
format Article
id doaj-art-bfa4e81413804730b2bf03cdb0ee5297
issn 2624-7402
language English
publishDate 2025-07-01
publisher MDPI AG
series AgriEngineering
doi 10.3390/agriengineering7070225
affiliation Lubbock Gin-Laboratory, Cotton Production and Processing Research Unit, United States Department of Agriculture, Agricultural Research Services, Lubbock, TX 79403, USA (all three authors)
topic machine vision
plastic contamination
cotton
automated inspection
url https://www.mdpi.com/2624-7402/7/7/225