Benchmarking with a Language Model Initial Selection for Text Classification Tasks

The now-globally recognized concerns of AI’s environmental implications resulted in a growing awareness of the need to reduce AI carbon footprints, as well as to carry out AI processes responsibly and in an environmentally friendly manner. Benchmarking, a critical step when evaluating AI solutions with machine learning models, particularly with language models, has recently become a focal point of research aimed at reducing AI carbon emissions. Contemporary approaches to AI model benchmarking, however, do not enforce (nor do they assume) a model initial selection process. Consequently, modern model benchmarking is no different from a “brute force” testing of all candidate models before the best-performing one could be deployed. Obviously, the latter approach is inefficient and environmentally harmful. To address the carbon footprint challenges associated with language model selection, this study presents an original benchmarking approach with a model initial selection on a proxy evaluative task. The proposed approach, referred to as Language Model-Dataset Fit (LMDFit) benchmarking, is devised to complement the standard model benchmarking process with a procedure that would eliminate underperforming models from computationally extensive and, therefore, environmentally unfriendly tests. The LMDFit approach draws parallels from the organizational personnel selection process, where job candidates are first evaluated by conducting a number of basic skill assessments before they would be hired, thus mitigating the consequences of hiring unfit candidates for the organization. LMDFit benchmarking compares candidate model performances on a target-task small dataset to disqualify less-relevant models from further testing. A semantic similarity assessment of random texts is used as the proxy task for the initial selection, and the approach is explicated in the context of various text classification assignments. Extensive experiments across eight text classification tasks (both single- and multi-class) from diverse domains are conducted with seven popular pre-trained language models (both general-purpose and domain-specific). The results obtained demonstrate the efficiency of the proposed LMDFit approach in terms of the overall benchmarking time as well as estimated emissions (a 37% reduction, on average) in comparison to the conventional benchmarking process.

Bibliographic Details
Main Authors: Agus Riyadi, Mate Kovacs, Uwe Serdült, Victor Kryssanov
Format: Article
Language: English
Published: MDPI AG 2025-01-01
Series: Machine Learning and Knowledge Extraction
Subjects:
Online Access: https://www.mdpi.com/2504-4990/7/1/3
_version_ 1850279721930063872
author Agus Riyadi
Mate Kovacs
Uwe Serdült
Victor Kryssanov
author_facet Agus Riyadi
Mate Kovacs
Uwe Serdült
Victor Kryssanov
author_sort Agus Riyadi
collection DOAJ
description The now-globally recognized concerns of AI’s environmental implications resulted in a growing awareness of the need to reduce AI carbon footprints, as well as to carry out AI processes responsibly and in an environmentally friendly manner. Benchmarking, a critical step when evaluating AI solutions with machine learning models, particularly with language models, has recently become a focal point of research aimed at reducing AI carbon emissions. Contemporary approaches to AI model benchmarking, however, do not enforce (nor do they assume) a model initial selection process. Consequently, modern model benchmarking is no different from a “brute force” testing of all candidate models before the best-performing one could be deployed. Obviously, the latter approach is inefficient and environmentally harmful. To address the carbon footprint challenges associated with language model selection, this study presents an original benchmarking approach with a model initial selection on a proxy evaluative task. The proposed approach, referred to as Language Model-Dataset Fit (LMDFit) benchmarking, is devised to complement the standard model benchmarking process with a procedure that would eliminate underperforming models from computationally extensive and, therefore, environmentally unfriendly tests. The LMDFit approach draws parallels from the organizational personnel selection process, where job candidates are first evaluated by conducting a number of basic skill assessments before they would be hired, thus mitigating the consequences of hiring unfit candidates for the organization. LMDFit benchmarking compares candidate model performances on a target-task small dataset to disqualify less-relevant models from further testing. A semantic similarity assessment of random texts is used as the proxy task for the initial selection, and the approach is explicated in the context of various text classification assignments. 
Extensive experiments across eight text classification tasks (both single- and multi-class) from diverse domains are conducted with seven popular pre-trained language models (both general-purpose and domain-specific). The results obtained demonstrate the efficiency of the proposed LMDFit approach in terms of the overall benchmarking time as well as estimated emissions (a 37% reduction, on average) in comparison to the conventional benchmarking process.
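The initial-selection procedure summarized above (score each candidate model on a cheap proxy task over a small target-task sample, disqualify models below a cutoff, and run the full benchmark only on the survivors) can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the model names, scores, and cutoff value are hypothetical placeholders, and a real version would replace `proxy_score` with an actual semantic-similarity evaluation using each candidate model's embeddings.

```python
# Illustrative sketch of an LMDFit-style model initial selection.
# All model names, scores, and the cutoff are hypothetical placeholders.

def proxy_score(model_name: str, sample: list[str]) -> float:
    """Placeholder for the proxy evaluation (e.g., semantic similarity
    of text pairs from a small target-task sample). A real implementation
    would embed `sample` with the named model and compare the pairs."""
    hypothetical_scores = {
        "general-a": 0.82,  # general-purpose model, fits the task well
        "general-b": 0.74,  # general-purpose model, acceptable fit
        "domain-x": 0.41,   # domain-specific model, poor fit here
    }
    return hypothetical_scores.get(model_name, 0.0)

def initial_selection(candidates: list[str], sample: list[str],
                      cutoff: float = 0.5) -> list[str]:
    """Disqualify candidate models whose proxy score falls below `cutoff`;
    only the survivors proceed to the full (expensive) benchmark."""
    scores = {m: proxy_score(m, sample) for m in candidates}
    return [m for m, s in scores.items() if s >= cutoff]

if __name__ == "__main__":
    sample = ["short target-task text 1", "short target-task text 2"]
    survivors = initial_selection(["general-a", "general-b", "domain-x"], sample)
    print(survivors)  # only these models go through full benchmarking
```

The cost saving comes from the asymmetry the abstract describes: the proxy task runs on a small sample, while the full benchmark (fine-tuning and evaluation on the complete dataset) is what dominates time and emissions, so every disqualified model avoids that expensive stage entirely.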
format Article
id doaj-art-bb8a35377ce7424c9c2ea93b6a4eb06b
institution OA Journals
issn 2504-4990
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Machine Learning and Knowledge Extraction
spelling doaj-art-bb8a35377ce7424c9c2ea93b6a4eb06b
2025-08-20T01:49:00Z
eng
MDPI AG
Machine Learning and Knowledge Extraction
2504-4990
2025-01-01
Vol. 7, Iss. 1, Art. 3
doi:10.3390/make7010003
Benchmarking with a Language Model Initial Selection for Text Classification Tasks
Agus Riyadi (Graduate School of Information Science and Engineering, Ritsumeikan University, Ibaraki 5678570, Osaka, Japan)
Mate Kovacs (College of Information Science and Engineering, Ritsumeikan University, Ibaraki 5678570, Osaka, Japan)
Uwe Serdült (College of Information Science and Engineering, Ritsumeikan University, Ibaraki 5678570, Osaka, Japan)
Victor Kryssanov (College of Information Science and Engineering, Ritsumeikan University, Ibaraki 5678570, Osaka, Japan)
https://www.mdpi.com/2504-4990/7/1/3
language model benchmarking
machine learning model selection
carbon emission reduction
spellingShingle Agus Riyadi
Mate Kovacs
Uwe Serdült
Victor Kryssanov
Benchmarking with a Language Model Initial Selection for Text Classification Tasks
Machine Learning and Knowledge Extraction
language model benchmarking
machine learning model selection
carbon emission reduction
title Benchmarking with a Language Model Initial Selection for Text Classification Tasks
title_full Benchmarking with a Language Model Initial Selection for Text Classification Tasks
title_fullStr Benchmarking with a Language Model Initial Selection for Text Classification Tasks
title_full_unstemmed Benchmarking with a Language Model Initial Selection for Text Classification Tasks
title_short Benchmarking with a Language Model Initial Selection for Text Classification Tasks
title_sort benchmarking with a language model initial selection for text classification tasks
topic language model benchmarking
machine learning model selection
carbon emission reduction
url https://www.mdpi.com/2504-4990/7/1/3
work_keys_str_mv AT agusriyadi benchmarkingwithalanguagemodelinitialselectionfortextclassificationtasks
AT matekovacs benchmarkingwithalanguagemodelinitialselectionfortextclassificationtasks
AT uweserdult benchmarkingwithalanguagemodelinitialselectionfortextclassificationtasks
AT victorkryssanov benchmarkingwithalanguagemodelinitialselectionfortextclassificationtasks