Benchmarking with a Language Model Initial Selection for Text Classification Tasks

The now-globally recognized concerns of AI’s environmental implications resulted in a growing awareness of the need to reduce AI carbon footprints, as well as to carry out AI processes responsibly and in an environmentally friendly manner. Benchmarking, a critical step when evaluating AI solutions with machine learning models, particularly with language models, has recently become a focal point of research aimed at reducing AI carbon emissions. Contemporary approaches to AI model benchmarking, however, do not enforce (nor do they assume) a model initial selection process. Consequently, modern model benchmarking is no different from a “brute force” testing of all candidate models before the best-performing one could be deployed. Obviously, the latter approach is inefficient and environmentally harmful. To address the carbon footprint challenges associated with language model selection, this study presents an original benchmarking approach with a model initial selection on a proxy evaluative task. The proposed approach, referred to as Language Model-Dataset Fit (LMDFit) benchmarking, is devised to complement the standard model benchmarking process with a procedure that would eliminate underperforming models from computationally extensive and, therefore, environmentally unfriendly tests. The LMDFit approach draws parallels from the organizational personnel selection process, where job candidates are first evaluated by conducting a number of basic skill assessments before they would be hired, thus mitigating the consequences of hiring unfit candidates for the organization. LMDFit benchmarking compares candidate model performances on a target-task small dataset to disqualify less-relevant models from further testing. A semantic similarity assessment of random texts is used as the proxy task for the initial selection, and the approach is explicated in the context of various text classification assignments. Extensive experiments across eight text classification tasks (both single- and multi-class) from diverse domains are conducted with seven popular pre-trained language models (both general-purpose and domain-specific). The results obtained demonstrate the efficiency of the proposed LMDFit approach in terms of the overall benchmarking time as well as estimated emissions (a 37% reduction, on average) in comparison to the conventional benchmarking process.

Bibliographic Details
Main Authors: Agus Riyadi, Mate Kovacs, Uwe Serdült, Victor Kryssanov
Format: Article
Language: English
Published: MDPI AG 2025-01-01
Series: Machine Learning and Knowledge Extraction
Subjects:
Online Access: https://www.mdpi.com/2504-4990/7/1/3
_version_ 1850279721930063872
author Agus Riyadi
Mate Kovacs
Uwe Serdült
Victor Kryssanov
author_facet Agus Riyadi
Mate Kovacs
Uwe Serdült
Victor Kryssanov
author_sort Agus Riyadi
collection DOAJ
description The now-globally recognized concerns of AI’s environmental implications resulted in a growing awareness of the need to reduce AI carbon footprints, as well as to carry out AI processes responsibly and in an environmentally friendly manner. Benchmarking, a critical step when evaluating AI solutions with machine learning models, particularly with language models, has recently become a focal point of research aimed at reducing AI carbon emissions. Contemporary approaches to AI model benchmarking, however, do not enforce (nor do they assume) a model initial selection process. Consequently, modern model benchmarking is no different from a “brute force” testing of all candidate models before the best-performing one could be deployed. Obviously, the latter approach is inefficient and environmentally harmful. To address the carbon footprint challenges associated with language model selection, this study presents an original benchmarking approach with a model initial selection on a proxy evaluative task. The proposed approach, referred to as Language Model-Dataset Fit (LMDFit) benchmarking, is devised to complement the standard model benchmarking process with a procedure that would eliminate underperforming models from computationally extensive and, therefore, environmentally unfriendly tests. The LMDFit approach draws parallels from the organizational personnel selection process, where job candidates are first evaluated by conducting a number of basic skill assessments before they would be hired, thus mitigating the consequences of hiring unfit candidates for the organization. LMDFit benchmarking compares candidate model performances on a target-task small dataset to disqualify less-relevant models from further testing. A semantic similarity assessment of random texts is used as the proxy task for the initial selection, and the approach is explicated in the context of various text classification assignments. 
Extensive experiments across eight text classification tasks (both single- and multi-class) from diverse domains are conducted with seven popular pre-trained language models (both general-purpose and domain-specific). The results obtained demonstrate the efficiency of the proposed LMDFit approach in terms of the overall benchmarking time as well as estimated emissions (a 37% reduction, on average) in comparison to the conventional benchmarking process.
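The initial-selection procedure summarized above (score each candidate model on a cheap proxy task over a small target-task sample, disqualify models below a cutoff, and run the full benchmark only on the survivors) can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the model names, scores, and cutoff value are hypothetical placeholders, and a real version would replace `proxy_score` with an actual semantic-similarity evaluation using each candidate model's embeddings.

```python
# Illustrative sketch of an LMDFit-style model initial selection.
# All model names, scores, and the cutoff are hypothetical placeholders.

def proxy_score(model_name: str, sample: list[str]) -> float:
    """Placeholder for the proxy evaluation (e.g., semantic similarity
    of text pairs from a small target-task sample). A real implementation
    would embed `sample` with the named model and compare the pairs."""
    hypothetical_scores = {
        "general-a": 0.82,  # general-purpose model, fits the task well
        "general-b": 0.74,  # general-purpose model, acceptable fit
        "domain-x": 0.41,   # domain-specific model, poor fit here
    }
    return hypothetical_scores.get(model_name, 0.0)

def initial_selection(candidates: list[str], sample: list[str],
                      cutoff: float = 0.5) -> list[str]:
    """Disqualify candidate models whose proxy score falls below `cutoff`;
    only the survivors proceed to the full (expensive) benchmark."""
    scores = {m: proxy_score(m, sample) for m in candidates}
    return [m for m, s in scores.items() if s >= cutoff]

if __name__ == "__main__":
    sample = ["short target-task text 1", "short target-task text 2"]
    survivors = initial_selection(["general-a", "general-b", "domain-x"], sample)
    print(survivors)  # only these models go through full benchmarking
```

The cost saving comes from the asymmetry the abstract describes: the proxy task runs on a small sample, while the full benchmark (fine-tuning and evaluation on the complete dataset) is what dominates time and emissions, so every disqualified model avoids that expensive stage entirely.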
format Article
id doaj-art-bb8a35377ce7424c9c2ea93b6a4eb06b
institution OA Journals
issn 2504-4990
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Machine Learning and Knowledge Extraction
spelling doaj-art-bb8a35377ce7424c9c2ea93b6a4eb06b
2025-08-20T01:49:00Z
eng
MDPI AG
Machine Learning and Knowledge Extraction
2504-4990
2025-01-01
Vol. 7, Iss. 1, Art. 3
doi:10.3390/make7010003
Benchmarking with a Language Model Initial Selection for Text Classification Tasks
Agus Riyadi (Graduate School of Information Science and Engineering, Ritsumeikan University, Ibaraki 5678570, Osaka, Japan)
Mate Kovacs (College of Information Science and Engineering, Ritsumeikan University, Ibaraki 5678570, Osaka, Japan)
Uwe Serdült (College of Information Science and Engineering, Ritsumeikan University, Ibaraki 5678570, Osaka, Japan)
Victor Kryssanov (College of Information Science and Engineering, Ritsumeikan University, Ibaraki 5678570, Osaka, Japan)
https://www.mdpi.com/2504-4990/7/1/3
language model benchmarking
machine learning model selection
carbon emission reduction
spellingShingle Agus Riyadi
Mate Kovacs
Uwe Serdült
Victor Kryssanov
Benchmarking with a Language Model Initial Selection for Text Classification Tasks
Machine Learning and Knowledge Extraction
language model benchmarking
machine learning model selection
carbon emission reduction
title Benchmarking with a Language Model Initial Selection for Text Classification Tasks
title_full Benchmarking with a Language Model Initial Selection for Text Classification Tasks
title_fullStr Benchmarking with a Language Model Initial Selection for Text Classification Tasks
title_full_unstemmed Benchmarking with a Language Model Initial Selection for Text Classification Tasks
title_short Benchmarking with a Language Model Initial Selection for Text Classification Tasks
title_sort benchmarking with a language model initial selection for text classification tasks
topic language model benchmarking
machine learning model selection
carbon emission reduction
url https://www.mdpi.com/2504-4990/7/1/3
work_keys_str_mv AT agusriyadi benchmarkingwithalanguagemodelinitialselectionfortextclassificationtasks
AT matekovacs benchmarkingwithalanguagemodelinitialselectionfortextclassificationtasks
AT uweserdult benchmarkingwithalanguagemodelinitialselectionfortextclassificationtasks
AT victorkryssanov benchmarkingwithalanguagemodelinitialselectionfortextclassificationtasks