Toward Integrating ChatGPT Into Satellite Image Annotation Workflows: A Comparison of Label Quality and Costs of Human and Automated Annotators

High-quality annotations are a critical success factor for machine learning (ML) applications. To achieve this, we have traditionally relied on human annotators, navigating the challenges of limited budgets and the varying task-specific expertise, costs, and availability. Since the emergence of larg...

Bibliographic Details
Main Authors: Jacob Beck, Lukas Malte Kemeter, Konrad Durrbeck, Mohamed Hesham Ibrahim Abdalla, Frauke Kreuter
Format: Article
Language: English
Published: IEEE 2025-01-01
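Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects: Automated annotations; ChatGPT; label quality; large language models (LLMs); satellite image annotation
Online Access: https://ieeexplore.ieee.org/document/10841407/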
collection DOAJ
description High-quality annotations are a critical success factor for machine learning (ML) applications. To obtain them, we have traditionally relied on human annotators while navigating limited budgets and annotators' varying task-specific expertise, costs, and availability. Since the emergence of large language models (LLMs), their popularity for generating automated annotations has grown, expanding both the possibilities and the complexity of designing an efficient annotation strategy. Increasingly, computer vision capabilities have been integrated into general-purpose LLMs like ChatGPT. This raises the question of how effectively LLMs can be used in satellite image annotation tasks and how they compare to traditional annotator types. This study presents a comprehensive investigation and comparison of various human and automated annotators for image classification. We evaluate the feasibility and economic competitiveness of using the ChatGPT4-V model for a complex land usage annotation task and compare it with alternative human annotators. A set of satellite images is annotated by a domain expert and 15 additional human and automated annotators, differing in expertise and costs. Our analyses examine the annotation quality loss between the expert and the other annotators. The comparison proceeds in three steps: descriptive analyses, linear probability models, and a comparison of F1-scores. Finally, we simulate annotation strategies where samples are split according to an automatically assigned certainty score. Routing low-certainty images to human annotators can cut total annotation costs by over 50% with minimal impact on label quality. We discuss implications regarding the economic competitiveness of annotation strategies, prompt engineering, and the task-specificity of expertise.
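The certainty-based routing mentioned in the description can be illustrated with a minimal sketch. It is not the authors' simulation: the `Sample` fields, the threshold, the per-image costs, and the toy labels are all hypothetical, and it assumes that a routed image incurs only the human cost.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    certainty: float    # automatically assigned certainty score (hypothetical scale 0..1)
    auto_label: str     # label from the automated annotator
    human_label: str    # label a human annotator would assign
    expert_label: str   # reference label from the domain expert


def simulate_routing(samples, threshold, auto_cost, human_cost):
    """Keep automated labels at or above the threshold; route the rest to a human.

    Returns (total cost, share of final labels that match the expert).
    """
    total_cost = 0.0
    matches = 0
    for s in samples:
        if s.certainty >= threshold:
            label, cost = s.auto_label, auto_cost
        else:
            label, cost = s.human_label, human_cost
        total_cost += cost
        matches += int(label == s.expert_label)
    return total_cost, matches / len(samples)


# Toy data, invented purely for illustration.
samples = [
    Sample(0.9, "forest", "forest", "forest"),
    Sample(0.8, "urban", "urban", "urban"),
    Sample(0.3, "water", "cropland", "cropland"),
    Sample(0.6, "urban", "urban", "urban"),
]

AUTO_COST, HUMAN_COST = 0.01, 0.10  # hypothetical cost per image

mixed_cost, mixed_quality = simulate_routing(samples, 0.5, AUTO_COST, HUMAN_COST)
human_cost, human_quality = simulate_routing(samples, 1.1, AUTO_COST, HUMAN_COST)
auto_cost, auto_quality = simulate_routing(samples, 0.0, AUTO_COST, HUMAN_COST)

print(f"mixed:      cost={mixed_cost:.2f} agreement={mixed_quality:.2f}")
print(f"human only: cost={human_cost:.2f} agreement={human_quality:.2f}")
print(f"auto only:  cost={auto_cost:.2f} agreement={auto_quality:.2f}")
```

Sweeping the threshold over a grid and recording cost and agreement at each value would trace the cost-quality trade-off that such a strategy is meant to explore.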
id doaj-art-983087046e3d4db6ad9860b37b0c775f
institution Kabale University
issn 1939-1404
2151-1535
spelling doaj-art-983087046e3d4db6ad9860b37b0c775f 2025-02-04T00:00:14Z eng
IEEE, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, ISSN 1939-1404, 2151-1535
2025-01-01, vol. 18, pp. 4366-4381, DOI 10.1109/JSTARS.2025.3528192, document 10841407
Jacob Beck (https://orcid.org/0000-0002-7587-7064), Munich Center for Machine Learning (MCML), Ludwig-Maximilians-Universität München Institut für Informatik, München, Germany
Lukas Malte Kemeter (https://orcid.org/0000-0001-9109-3625), Center for Applied Research on Supply Chain Services, Fraunhofer Institute for Integrated Circuits IIS, Nuremberg, Germany
Konrad Durrbeck (https://orcid.org/0009-0003-8661-6227), Center for Applied Research on Supply Chain Services, Fraunhofer Institute for Integrated Circuits IIS, Nuremberg, Germany
Mohamed Hesham Ibrahim Abdalla (https://orcid.org/0009-0000-4744-9030), Center for Applied Research on Supply Chain Services, Fraunhofer Institute for Integrated Circuits IIS, Nuremberg, Germany
Frauke Kreuter (https://orcid.org/0000-0002-7339-2645), Munich Center for Machine Learning (MCML), Ludwig-Maximilians-Universität München Institut für Informatik, München, Germany
topic Automated annotations
ChatGPT
label quality
large language models (LLMs)
satellite image annotation