Predicting classification errors using NLP-based machine learning algorithms and expert opinions
| Main Authors: | Peiheng Gao, Chen Yang, Ning Sun, Ričardas Zitikis |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-03-01 |
| Series: | Machine Learning with Applications |
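| Subjects: | Consumer complaints; Natural language processing; Imbalanced classification; Human-experience-trained algorithms |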
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2666827025000131 |
| Field | Value |
|---|---|
| author | Peiheng Gao; Chen Yang; Ning Sun; Ričardas Zitikis |
| collection | DOAJ |
| description | Various intentional and unintentional biases of humans manifest in classification tasks, such as those related to risk management. In this paper we demonstrate the role of ML algorithms when accomplishing these tasks and highlight the role of expert know-how when training the staff as well as, and very importantly, when training and fine-tuning ML algorithms. In the process of doing so and when facing well-known inefficiencies of the traditional F1 score, especially when working with unbalanced datasets, we suggest a modification of the score by incorporating human-experience-trained algorithms, which include both expert-trained algorithms (i.e., with the involvement of expert experiences in classification tasks) and staff-trained algorithms (i.e., with the involvement of experiences of those staff who have been trained by experts). Our findings reveal that the modified F1 score diverges from the traditional staff F1 score when the staff labels exhibit weak correlation with expert labels, which indicates insufficient staff training. Furthermore, the Long Short-Term Memory (LSTM) model outperforms other classifiers in terms of the modified F1 score when applied to the classification of textual narratives in consumer complaints. |
| format | Article |
| id | doaj-art-e636e020e3f14ca399b7d1f6b3ae430f |
| institution | DOAJ |
| issn | 2666-8270 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | Elsevier |
| series | Machine Learning with Applications |
| doi | 10.1016/j.mlwa.2025.100630 |
| volume | 19 |
| article | 100630 |
| affiliation | Peiheng Gao: School of Mathematical and Statistical Sciences, Western University, London, N6A 3K7, Ontario, Canada. Chen Yang (corresponding author): Department of Population Health Science and Policy, and Institute for Health Care Delivery Science, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA. Ning Sun: Agri-Food Analytics Lab, Dalhousie University, Halifax, B3H 4R2, Nova Scotia, Canada. Ričardas Zitikis: School of Mathematical and Statistical Sciences, Western University, London, N6A 3K7, Ontario, Canada. |
| title | Predicting classification errors using NLP-based machine learning algorithms and expert opinions |
| topic | Consumer complaints; Natural language processing; Imbalanced classification; Human-experience-trained algorithms |
| url | http://www.sciencedirect.com/science/article/pii/S2666827025000131 |
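
The abstract contrasts the authors' modified F1 score with the traditional F1 score of staff labels. The record does not define the modified score, so the Python sketch below only computes the traditional F1 score, under the assumption that expert labels act as the reference and staff labels as the predictions. The labels are hypothetical (100 narratives, 10 in the minority class), and `f1_score` and `accuracy` are illustrative helpers, not code from the paper.

```python
from typing import Sequence


def f1_score(reference: Sequence[int], predicted: Sequence[int], positive: int = 1) -> float:
    """Traditional F1 score of `predicted` against `reference` for one positive class."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == positive and p == positive)
    fp = sum(1 for r, p in pairs if r != positive and p == positive)
    fn = sum(1 for r, p in pairs if r == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is taken as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(reference: Sequence[int], predicted: Sequence[int]) -> float:
    """Share of items on which the two label sequences agree."""
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


# Hypothetical labels for 100 complaint narratives (1 = minority class); not data from the paper.
# Expert labels are treated as the reference (an assumption).
expert = [1] * 10 + [0] * 90
# Hypothetical staff member A never assigns the minority class.
staff_a = [0] * 100
# Hypothetical staff member B finds 5 of the 10 minority cases and mislabels 5 majority cases.
staff_b = [1] * 5 + [0] * 5 + [1] * 5 + [0] * 85

if __name__ == "__main__":
    for name, staff in [("A", staff_a), ("B", staff_b)]:
        print(
            name,
            accuracy(expert, staff),
            round(f1_score(expert, staff, positive=1), 3),
            round(f1_score(expert, staff, positive=0), 3),
        )
    # prints:
    # A 0.9 0.0 0.947
    # B 0.9 0.5 0.944
```

Both hypothetical staff members reach 90% accuracy, yet their minority-class F1 scores are 0.0 and 0.5. The F1 score also ignores true negatives and changes with the choice of positive class (about 0.95 for the majority class in both cases), which is one commonly cited limitation of the score on imbalanced data.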