Predicting classification errors using NLP-based machine learning algorithms and expert opinions

Various intentional and unintentional biases of humans manifest in classification tasks, such as those related to risk management. In this paper we demonstrate the role of ML algorithms when accomplishing these tasks and highlight the role of expert know-how when training the staff as well as, and v...

Bibliographic Details
Main Authors: Peiheng Gao, Chen Yang, Ning Sun, Ričardas Zitikis
Format: Article
Language: English
Published: Elsevier 2025-03-01
Series: Machine Learning with Applications
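Subjects: Consumer complaints; Natural language processing; Imbalanced classification; Human-experience-trained algorithms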
Online Access: http://www.sciencedirect.com/science/article/pii/S2666827025000131
author Peiheng Gao
Chen Yang
Ning Sun
Ričardas Zitikis
collection DOAJ
description Various intentional and unintentional biases of humans manifest in classification tasks, such as those related to risk management. In this paper we demonstrate the role of ML algorithms when accomplishing these tasks and highlight the role of expert know-how when training the staff as well as, and very importantly, when training and fine-tuning ML algorithms. In the process of doing so and when facing well-known inefficiencies of the traditional F1 score, especially when working with unbalanced datasets, we suggest a modification of the score by incorporating human-experience-trained algorithms, which include both expert-trained algorithms (i.e., with the involvement of expert experiences in classification tasks) and staff-trained algorithms (i.e., with the involvement of experiences of those staff who have been trained by experts). Our findings reveal that the modified F1 score diverges from the traditional staff F1 score when the staff labels exhibit weak correlation with expert labels, which indicates insufficient staff training. Furthermore, the Long Short-Term Memory (LSTM) model outperforms other classifiers in terms of the modified F1 score when applied to the classification of textual narratives in consumer complaints.
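As background to the description above, the following is a minimal sketch of the traditional F1 score on an unbalanced dataset, with staff labels scored against expert labels. It is not the modified score proposed in the article, whose definition is not given in this record. The function name `f1_score`, the variable names, and the toy labels are hypothetical and chosen only for illustration.

```python
def f1_score(reference, predicted, positive=1):
    """Traditional F1 of `predicted` against `reference` for one positive class."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == positive and p == positive)
    fp = sum(1 for r, p in pairs if r != positive and p == positive)
    fn = sum(1 for r, p in pairs if r == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so the score is reported as 0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(reference, predicted):
    """Share of items on which the two label lists agree."""
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


# Hypothetical toy data: 100 items, only 5 of them positive per the experts.
expert_labels = [1] * 5 + [0] * 95

# Hypothetical staff who mark every item as negative.
staff_all_negative = [0] * 100

# Hypothetical staff who find 3 of the 5 positives and raise 2 false alarms.
staff_partial = [1, 1, 1, 0, 0] + [1, 1] + [0] * 93

acc_all_negative = accuracy(expert_labels, staff_all_negative)
f1_all_negative = f1_score(expert_labels, staff_all_negative)
acc_partial = accuracy(expert_labels, staff_partial)
f1_partial = f1_score(expert_labels, staff_partial)

print(f"all-negative staff: accuracy={acc_all_negative:.2f}, F1={f1_all_negative:.2f}")
print(f"partial staff:      accuracy={acc_partial:.2f}, F1={f1_partial:.2f}")
```

In this toy case the all-negative staff reach high accuracy but an F1 of zero, while the partial staff have slightly higher accuracy and an F1 of about 0.6. The score depends entirely on how the rare positive class is handled, which is one reason scores on unbalanced data need careful reading.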
format Article
id doaj-art-e636e020e3f14ca399b7d1f6b3ae430f
institution DOAJ
issn 2666-8270
language English
publishDate 2025-03-01
publisher Elsevier
record_format Article
series Machine Learning with Applications
spelling doaj-art-e636e020e3f14ca399b7d1f6b3ae430f
Timestamp: 2025-08-20T02:45:50Z
Language: eng
Publisher: Elsevier
Series: Machine Learning with Applications
ISSN: 2666-8270
Published: 2025-03-01
Volume: 19
Article number: 100630
DOI: 10.1016/j.mlwa.2025.100630
Peiheng Gao: School of Mathematical and Statistical Sciences, Western University, London, N6A 3K7, Ontario, Canada
Chen Yang (corresponding author): Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA; Institute for Health Care Delivery Science, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA
Ning Sun: Agri-Food Analytics Lab, Dalhousie University, Halifax, B3H 4R2, Nova Scotia, Canada
Ričardas Zitikis: School of Mathematical and Statistical Sciences, Western University, London, N6A 3K7, Ontario, Canada
title Predicting classification errors using NLP-based machine learning algorithms and expert opinions
topic Consumer complaints
Natural language processing
Imbalanced classification
Human-experience-trained algorithms
url http://www.sciencedirect.com/science/article/pii/S2666827025000131