Toward Integrating ChatGPT Into Satellite Image Annotation Workflows: A Comparison of Label Quality and Costs of Human and Automated Annotators
High-quality annotations are a critical success factor for machine learning (ML) applications. To achieve this, we have traditionally relied on human annotators, navigating the challenges of limited budgets and of annotators' varying task-specific expertise, costs, and availability. Since the emergence of large language models (LLMs), their popularity for generating automated annotations has grown, extending the possibilities and complexity of designing an efficient annotation strategy. Increasingly, computer vision capabilities have been integrated into general-purpose LLMs like ChatGPT. This raises the question of how effectively LLMs can be used in satellite image annotation tasks and how they compare to traditional annotator types. This study presents a comprehensive investigation and comparison of various human and automated annotators for image classification. We evaluate the feasibility and economic competitiveness of using the ChatGPT4-V model for a complex land usage annotation task and compare it with alternative human annotators. A set of satellite images is annotated by a domain expert and 15 additional human and automated annotators, differing in expertise and costs. Our analyses examine the annotation quality loss between the expert and the other annotators. This comparison is conducted through, first, descriptive analyses; second, fitting linear probability models; and third, comparing F1-scores. Ultimately, we simulate annotation strategies in which samples are split according to an automatically assigned certainty score. Routing low-certainty images to human annotators can cut total annotation costs by over 50% with minimal impact on label quality. We discuss implications regarding the economic competitiveness of annotation strategies, prompt engineering, and the task-specificity of expertise.
Main Authors: Jacob Beck, Lukas Malte Kemeter, Konrad Durrbeck, Mohamed Hesham Ibrahim Abdalla, Frauke Kreuter
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects: Automated annotations; ChatGPT; label quality; large language models (LLMs); satellite image annotation
Online Access: https://ieeexplore.ieee.org/document/10841407/
_version_ | 1832542595082878976 |
author | Jacob Beck Lukas Malte Kemeter Konrad Durrbeck Mohamed Hesham Ibrahim Abdalla Frauke Kreuter |
author_facet | Jacob Beck Lukas Malte Kemeter Konrad Durrbeck Mohamed Hesham Ibrahim Abdalla Frauke Kreuter |
author_sort | Jacob Beck |
collection | DOAJ |
description | High-quality annotations are a critical success factor for machine learning (ML) applications. To achieve this, we have traditionally relied on human annotators, navigating the challenges of limited budgets and of annotators' varying task-specific expertise, costs, and availability. Since the emergence of large language models (LLMs), their popularity for generating automated annotations has grown, extending the possibilities and complexity of designing an efficient annotation strategy. Increasingly, computer vision capabilities have been integrated into general-purpose LLMs like ChatGPT. This raises the question of how effectively LLMs can be used in satellite image annotation tasks and how they compare to traditional annotator types. This study presents a comprehensive investigation and comparison of various human and automated annotators for image classification. We evaluate the feasibility and economic competitiveness of using the ChatGPT4-V model for a complex land usage annotation task and compare it with alternative human annotators. A set of satellite images is annotated by a domain expert and 15 additional human and automated annotators, differing in expertise and costs. Our analyses examine the annotation quality loss between the expert and the other annotators. This comparison is conducted through, first, descriptive analyses; second, fitting linear probability models; and third, comparing F1-scores. Ultimately, we simulate annotation strategies in which samples are split according to an automatically assigned certainty score. Routing low-certainty images to human annotators can cut total annotation costs by over 50% with minimal impact on label quality. We discuss implications regarding the economic competitiveness of annotation strategies, prompt engineering, and the task-specificity of expertise. |
format | Article |
id | doaj-art-983087046e3d4db6ad9860b37b0c775f |
institution | Kabale University |
issn | 1939-1404 2151-1535 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
spelling | Record ID: doaj-art-983087046e3d4db6ad9860b37b0c775f (indexed 2025-02-04T00:00:14Z). Language: English. Publisher: IEEE. Journal: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, ISSN 1939-1404 / 2151-1535. Published: 2025-01-01, vol. 18, pp. 4366-4381. DOI: 10.1109/JSTARS.2025.3528192, IEEE document 10841407. Title: Toward Integrating ChatGPT Into Satellite Image Annotation Workflows: A Comparison of Label Quality and Costs of Human and Automated Annotators. Authors: Jacob Beck (https://orcid.org/0000-0002-7587-7064), Munich Center for Machine Learning (MCML), Ludwig-Maximilians-Universität München Institut für Informatik, München, Germany; Lukas Malte Kemeter (https://orcid.org/0000-0001-9109-3625), Center for Applied Research on Supply Chain Services, Fraunhofer Institute for Integrated Circuits IIS, Nuremberg, Germany; Konrad Durrbeck (https://orcid.org/0009-0003-8661-6227), Center for Applied Research on Supply Chain Services, Fraunhofer Institute for Integrated Circuits IIS, Nuremberg, Germany; Mohamed Hesham Ibrahim Abdalla (https://orcid.org/0009-0000-4744-9030), Center for Applied Research on Supply Chain Services, Fraunhofer Institute for Integrated Circuits IIS, Nuremberg, Germany; Frauke Kreuter (https://orcid.org/0000-0002-7339-2645), Munich Center for Machine Learning (MCML), Ludwig-Maximilians-Universität München Institut für Informatik, München, Germany. Abstract: see the description field above. Online access: https://ieeexplore.ieee.org/document/10841407/. Keywords: Automated annotations; ChatGPT; label quality; large language models (LLMs); satellite image annotation |
spellingShingle | Jacob Beck Lukas Malte Kemeter Konrad Durrbeck Mohamed Hesham Ibrahim Abdalla Frauke Kreuter Toward Integrating ChatGPT Into Satellite Image Annotation Workflows: A Comparison of Label Quality and Costs of Human and Automated Annotators IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing Automated annotations ChatGPT label quality large language models (LLMs) satellite image annotation |
title | Toward Integrating ChatGPT Into Satellite Image Annotation Workflows: A Comparison of Label Quality and Costs of Human and Automated Annotators |
title_full | Toward Integrating ChatGPT Into Satellite Image Annotation Workflows: A Comparison of Label Quality and Costs of Human and Automated Annotators |
title_fullStr | Toward Integrating ChatGPT Into Satellite Image Annotation Workflows: A Comparison of Label Quality and Costs of Human and Automated Annotators |
title_full_unstemmed | Toward Integrating ChatGPT Into Satellite Image Annotation Workflows: A Comparison of Label Quality and Costs of Human and Automated Annotators |
title_short | Toward Integrating ChatGPT Into Satellite Image Annotation Workflows: A Comparison of Label Quality and Costs of Human and Automated Annotators |
title_sort | toward integrating chatgpt into satellite image annotation workflows a comparison of label quality and costs of human and automated annotators |
topic | Automated annotations ChatGPT label quality large language models (LLMs) satellite image annotation |
url | https://ieeexplore.ieee.org/document/10841407/ |
work_keys_str_mv | AT jacobbeck towardintegratingchatgptintosatelliteimageannotationworkflowsacomparisonoflabelqualityandcostsofhumanandautomatedannotators AT lukasmaltekemeter towardintegratingchatgptintosatelliteimageannotationworkflowsacomparisonoflabelqualityandcostsofhumanandautomatedannotators AT konraddurrbeck towardintegratingchatgptintosatelliteimageannotationworkflowsacomparisonoflabelqualityandcostsofhumanandautomatedannotators AT mohamedheshamibrahimabdalla towardintegratingchatgptintosatelliteimageannotationworkflowsacomparisonoflabelqualityandcostsofhumanandautomatedannotators AT fraukekreuter towardintegratingchatgptintosatelliteimageannotationworkflowsacomparisonoflabelqualityandcostsofhumanandautomatedannotators |
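Below is a minimal sketch of the certainty-based routing strategy summarized in the abstract above. The threshold, the per-image costs, and all names (Item, route_annotations) are illustrative assumptions for this record, not taken from the article or its code.

```python
# Illustrative sketch (assumptions, not the authors' implementation): keep the
# automated (LLM) label when its certainty score clears a threshold, route the
# rest to human annotators, and compare the cost against all-human labeling.
from dataclasses import dataclass


@dataclass
class Item:
    llm_label: str
    llm_certainty: float  # assumed certainty score in [0, 1]


def route_annotations(items, certainty_threshold=0.8, llm_cost=0.01, human_cost=0.50):
    """Split items by certainty; return routing decisions, total cost, and savings.

    All cost figures and the threshold are hypothetical placeholders.
    """
    decisions, total_cost = [], 0.0
    for item in items:
        if item.llm_certainty >= certainty_threshold:
            decisions.append(("llm", item.llm_label))
            total_cost += llm_cost
        else:
            decisions.append(("human", None))          # label supplied later by a human annotator
            total_cost += llm_cost + human_cost        # LLM pass already paid, plus human re-annotation
    all_human_cost = human_cost * len(items)
    savings = 1.0 - total_cost / all_human_cost if items else 0.0
    return decisions, total_cost, savings


if __name__ == "__main__":
    batch = [Item("agriculture", 0.95), Item("urban", 0.55), Item("forest", 0.88)]
    decisions, cost, savings = route_annotations(batch)
    print(decisions, round(cost, 2), f"{savings:.0%} cheaper than all-human labeling")
```

Under such a split, the overall saving depends on how many images fall below the threshold and on the gap between automated and human per-image costs, which is the trade-off the study's simulation explores.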