Benchmarking with a Language Model Initial Selection for Text Classification Tasks
| Main Authors: | Agus Riyadi, Mate Kovacs, Uwe Serdült, Victor Kryssanov |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-01-01 |
| Series: | Machine Learning and Knowledge Extraction |
| Subjects: | language model benchmarking; machine learning model selection; carbon emission reduction |
| Online Access: | https://www.mdpi.com/2504-4990/7/1/3 |
| _version_ | 1850279721930063872 |
|---|---|
| author | Agus Riyadi; Mate Kovacs; Uwe Serdült; Victor Kryssanov |
| author_facet | Agus Riyadi; Mate Kovacs; Uwe Serdült; Victor Kryssanov |
| author_sort | Agus Riyadi |
| collection | DOAJ |
| description | The now globally recognized concerns about AI’s environmental impact have resulted in a growing awareness of the need to reduce AI carbon footprints and to carry out AI processes responsibly and in an environmentally friendly manner. Benchmarking, a critical step in evaluating AI solutions built with machine learning models, particularly with language models, has recently become a focal point of research aimed at reducing AI carbon emissions. Contemporary approaches to AI model benchmarking, however, do not enforce (nor do they assume) an initial model selection process. Consequently, modern model benchmarking amounts to a “brute-force” testing of all candidate models before the best-performing one can be deployed. Obviously, the latter approach is inefficient and environmentally harmful. To address the carbon footprint challenges associated with language model selection, this study presents an original benchmarking approach with an initial model selection on a proxy evaluative task. The proposed approach, referred to as Language Model-Dataset Fit (LMDFit) benchmarking, complements the standard model benchmarking process with a procedure that eliminates underperforming models from computationally extensive and, therefore, environmentally unfriendly tests. The LMDFit approach draws a parallel with organizational personnel selection, where job candidates are first evaluated with a number of basic skill assessments before they are hired, thus mitigating the consequences of hiring candidates unfit for the organization. LMDFit benchmarking compares candidate model performances on a small target-task dataset to disqualify less relevant models from further testing. A semantic similarity assessment of random texts is used as the proxy task for the initial selection, and the approach is explicated in the context of various text classification assignments. Extensive experiments across eight text classification tasks (both single- and multi-class) from diverse domains are conducted with seven popular pre-trained language models (both general-purpose and domain-specific). The results demonstrate the efficiency of the proposed LMDFit approach in terms of overall benchmarking time as well as estimated emissions (a 37% reduction, on average) in comparison to the conventional benchmarking process. |
| format | Article |
| id | doaj-art-bb8a35377ce7424c9c2ea93b6a4eb06b |
| institution | OA Journals |
| issn | 2504-4990 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Machine Learning and Knowledge Extraction |
| spelling | doaj-art-bb8a35377ce7424c9c2ea93b6a4eb06b; 2025-08-20T01:49:00Z; eng; MDPI AG; Machine Learning and Knowledge Extraction; ISSN 2504-4990; 2025-01-01; vol. 7, iss. 1, art. 3; DOI 10.3390/make7010003; Benchmarking with a Language Model Initial Selection for Text Classification Tasks; Agus Riyadi (Graduate School of Information Science and Engineering, Ritsumeikan University, Ibaraki 5678570, Osaka, Japan); Mate Kovacs (College of Information Science and Engineering, Ritsumeikan University, Ibaraki 5678570, Osaka, Japan); Uwe Serdült (College of Information Science and Engineering, Ritsumeikan University, Ibaraki 5678570, Osaka, Japan); Victor Kryssanov (College of Information Science and Engineering, Ritsumeikan University, Ibaraki 5678570, Osaka, Japan); [abstract as in the description field above]; https://www.mdpi.com/2504-4990/7/1/3; language model benchmarking; machine learning model selection; carbon emission reduction |
| spellingShingle | Agus Riyadi; Mate Kovacs; Uwe Serdült; Victor Kryssanov; Benchmarking with a Language Model Initial Selection for Text Classification Tasks; Machine Learning and Knowledge Extraction; language model benchmarking; machine learning model selection; carbon emission reduction |
| title | Benchmarking with a Language Model Initial Selection for Text Classification Tasks |
| title_full | Benchmarking with a Language Model Initial Selection for Text Classification Tasks |
| title_fullStr | Benchmarking with a Language Model Initial Selection for Text Classification Tasks |
| title_full_unstemmed | Benchmarking with a Language Model Initial Selection for Text Classification Tasks |
| title_short | Benchmarking with a Language Model Initial Selection for Text Classification Tasks |
| title_sort | benchmarking with a language model initial selection for text classification tasks |
| topic | language model benchmarking; machine learning model selection; carbon emission reduction |
| url | https://www.mdpi.com/2504-4990/7/1/3 |
| work_keys_str_mv | AT agusriyadi benchmarkingwithalanguagemodelinitialselectionfortextclassificationtasks AT matekovacs benchmarkingwithalanguagemodelinitialselectionfortextclassificationtasks AT uweserdult benchmarkingwithalanguagemodelinitialselectionfortextclassificationtasks AT victorkryssanov benchmarkingwithalanguagemodelinitialselectionfortextclassificationtasks |
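The two-stage procedure summarized in the abstract (score all candidate models on a cheap proxy task, disqualify the weakest, and run the expensive full benchmark only on the survivors) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the cutoff rule (`keep_fraction`) and the example proxy scores are assumptions made here for clarity; the paper's actual disqualification criterion and proxy-task scoring may differ.

```python
# Illustrative sketch of the LMDFit initial-selection stage. In the paper,
# the proxy score would come from a semantic-similarity assessment of random
# texts drawn from a small sample of the target dataset; here the scores are
# hypothetical numbers.

def lmdfit_select(proxy_scores: dict[str, float], keep_fraction: float = 0.5) -> list[str]:
    """Rank candidate models by proxy-task score and keep only the top
    fraction for the computationally expensive full benchmark."""
    ranked = sorted(proxy_scores, key=proxy_scores.get, reverse=True)
    k = max(1, int(len(ranked) * keep_fraction))  # always keep at least one model
    return ranked[:k]

# Hypothetical proxy scores for four candidate models.
scores = {
    "model-a": 0.81,
    "model-b": 0.44,
    "model-c": 0.73,
    "model-d": 0.39,
}
survivors = lmdfit_select(scores, keep_fraction=0.5)
print(survivors)  # only the survivors advance to the full benchmark
```

Only the surviving models are then fine-tuned and evaluated in full, which is where the reported savings in benchmarking time and estimated emissions would come from.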