Toward Integrating ChatGPT Into Satellite Image Annotation Workflows: A Comparison of Label Quality and Costs of Human and Automated Annotators

High-quality annotations are a critical success factor for machine learning (ML) applications. To achieve this, we have traditionally relied on human annotators, navigating the challenges of limited budgets and the varying task-specific expertise, costs, and availability. Since the emergence of larg...

Bibliographic Details
Main Authors:	Jacob Beck, Lukas Malte Kemeter, Konrad Durrbeck, Mohamed Hesham Ibrahim Abdalla, Frauke Kreuter
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects:	Automated annotations; ChatGPT; label quality; large language models (LLMs); satellite image annotation
Online Access:	https://ieeexplore.ieee.org/document/10841407/

collection	DOAJ
description	High-quality annotations are a critical success factor for machine learning (ML) applications. To achieve this, we have traditionally relied on human annotators, navigating the challenges of limited budgets and varying task-specific expertise, costs, and availability. Since the emergence of large language models (LLMs), their popularity for generating automated annotations has grown, expanding both the possibilities and the complexity of designing an efficient annotation strategy. Increasingly, computer vision capabilities have been integrated into general-purpose LLMs like ChatGPT. This raises the question of how effectively LLMs can be used in satellite image annotation tasks and how they compare to traditional annotator types. This study presents a comprehensive investigation and comparison of various human and automated annotators for image classification. We evaluate the feasibility and economic competitiveness of using the ChatGPT4-V model for a complex land usage annotation task and compare it with alternative human annotators. A set of satellite images is annotated by a domain expert and 15 additional human and automated annotators, differing in expertise and costs. Our analyses examine the annotation quality loss between the expert and other annotators. This comparison is conducted by, first, descriptive analyses; second, fitting linear probability models; and third, comparing F1-scores. Ultimately, we simulate annotation strategies where samples are split according to an automatically assigned certainty score. Routing low-certainty images to human annotators can cut total annotation costs by over 50% with minimal impact on label quality. We discuss implications regarding the economic competitiveness of annotation strategies, prompt engineering, and the task-specificity of expertise.
id	doaj-art-983087046e3d4db6ad9860b37b0c775f
institution	Kabale University
issn	1939-1404 2151-1535
spelling	doaj-art-983087046e3d4db6ad9860b37b0c775f; 2025-02-04T00:00:14Z; eng; IEEE; IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing; ISSN 1939-1404, 2151-1535; 2025-01-01; vol. 18, pp. 4366-4381; DOI 10.1109/JSTARS.2025.3528192; IEEE document 10841407
	Jacob Beck (https://orcid.org/0000-0002-7587-7064), Munich Center for Machine Learning (MCML), Ludwig-Maximilians-Universität München Institut für Informatik, München, Germany
	Lukas Malte Kemeter (https://orcid.org/0000-0001-9109-3625), Center for Applied Research on Supply Chain Services, Fraunhofer Institute for Integrated Circuits IIS, Nuremberg, Germany
	Konrad Durrbeck (https://orcid.org/0009-0003-8661-6227), Center for Applied Research on Supply Chain Services, Fraunhofer Institute for Integrated Circuits IIS, Nuremberg, Germany
	Mohamed Hesham Ibrahim Abdalla (https://orcid.org/0009-0000-4744-9030), Center for Applied Research on Supply Chain Services, Fraunhofer Institute for Integrated Circuits IIS, Nuremberg, Germany
	Frauke Kreuter (https://orcid.org/0000-0002-7339-2645), Munich Center for Machine Learning (MCML), Ludwig-Maximilians-Universität München Institut für Informatik, München, Germany
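The abstract describes simulating annotation strategies in which samples are split by an automatically assigned certainty score, with low-certainty images routed to human annotators. The following is a minimal sketch of that idea, not the authors' implementation: the `Sample` fields, the `simulate_routing` function, the threshold, and the per-label costs are all hypothetical, and label quality is approximated here as simple agreement with the expert label.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    """One image in the simulation (all fields are illustrative assumptions)."""
    certainty: float      # automatically assigned certainty score in [0, 1]
    auto_correct: bool    # does the automated label agree with the expert?
    human_correct: bool   # does the human label agree with the expert?


def simulate_routing(samples: List[Sample], threshold: float,
                     auto_cost: float, human_cost: float) -> Dict[str, float]:
    """Label every sample automatically, then send samples whose certainty
    falls below `threshold` to a human annotator as well."""
    if not samples:
        raise ValueError("samples must not be empty")
    total_cost = 0.0
    agreements = 0
    routed = 0
    for s in samples:
        total_cost += auto_cost          # every image gets an automated pass
        if s.certainty < threshold:      # low certainty: use the human label
            total_cost += human_cost
            routed += 1
            agreements += int(s.human_correct)
        else:                            # high certainty: keep automated label
            agreements += int(s.auto_correct)
    n = len(samples)
    return {
        "cost": total_cost,
        "agreement": agreements / n,
        "human_share": routed / n,
    }


if __name__ == "__main__":
    # Made-up toy data, only to show how the function is called.
    toy = [
        Sample(0.9, True, True),
        Sample(0.8, True, True),
        Sample(0.3, False, True),
        Sample(0.6, False, True),
    ]
    print(simulate_routing(toy, threshold=0.5, auto_cost=0.25, human_cost=1.0))
```

Sweeping `threshold` from 0 to just above 1 traces the trade-off between total cost and agreement with the expert; a threshold above 1 reproduces an all-human baseline for comparison.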


Similar Items