Automatic categorization of medical documents in Afaan Oromo using ensemble machine learning techniques

Bibliographic Details
Main Authors: Etana Fikadu Dinsa, Mrinal Das, Teklu Urgessa Abebe, Krishnaraj Ramaswamy
Format: Article
Language: English
Published: Springer 2024-10-01
Series: Discover Applied Sciences
Online Access: https://doi.org/10.1007/s42452-024-06307-0
Collection: DOAJ
Description: Automatic classification of medical documents using machine learning can raise the productivity of healthcare services by reducing processing time and cost. This work proposes an ensemble learning approach for classifying electronic medical documents written in Afaan Oromo. The main tasks are corpus preparation, pre-processing, model training, and classification. We used two feature extraction methods: term frequency-inverse document frequency (TF-IDF) and bag of words (BOW). The ensemble takes the predictions of several individual classifiers (naïve Bayes, random forest, SVM, and logistic regression) and combines them to obtain a more reliable and more accurate classifier. Performance was evaluated using accuracy, precision, recall, and F1-score. The proposed method is compared with two existing boosting approaches, gradient boosting and AdaBoost. The experimental results show that BOW feature extraction performed better than TF-IDF on our dataset, and that the proposed model achieved 94.81% accuracy and a 94.84% F1-score. This work contributes to the technological enhancement of service delivery, to document management through classification, and to data processing systems in the healthcare sector.
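The pipeline summarized in the description (feature extraction, several base classifiers, combined prediction) can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the base predictions are combined, so majority (hard) voting is assumed here; the smoothed IDF formula is one common variant; and the tokens and category labels in the demo are hypothetical placeholders rather than data from the paper.

```python
import math
from collections import Counter


def bow_vector(tokens, vocabulary):
    """Bag of words: raw count of each vocabulary term in one document."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def tfidf_vectors(docs, vocabulary):
    """TF-IDF: raw term count scaled by a smoothed inverse document frequency.

    The smoothing ln((1 + N) / (1 + df)) + 1 is an assumption; the abstract
    does not state which TF-IDF variant was used.
    """
    n_docs = len(docs)
    doc_sets = [set(doc) for doc in docs]
    idf = {}
    for term in vocabulary:
        df = sum(1 for terms in doc_sets if term in terms)
        idf[term] = math.log((1 + n_docs) / (1 + df)) + 1
    return [
        [count * idf[term]
         for count, term in zip(bow_vector(doc, vocabulary), vocabulary)]
        for doc in docs
    ]


def majority_vote(predictions):
    """Hard voting (assumed): the label chosen by the most base classifiers."""
    return Counter(predictions).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical tokenized documents (placeholders, not Afaan Oromo data).
    docs = [["fever", "cough", "fever"], ["fracture", "cast"], ["cough", "rest"]]
    vocabulary = sorted({token for doc in docs for token in doc})
    print(vocabulary)
    print(bow_vector(docs[0], vocabulary))
    print(tfidf_vectors(docs, vocabulary)[0])

    # Hypothetical labels from four base classifiers for a single document.
    print(majority_vote(["internal", "internal", "surgery", "internal"]))
```

In a real experiment the four votes would come from trained naïve Bayes, random forest, SVM, and logistic regression models, and precision, recall, and F1-score would be reported alongside accuracy.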
Record ID: doaj-art-8c82d1ffb61a42778f005bcb675db911
Institution: OA Journals
ISSN: 3004-9261
Citation: Discover Applied Sciences (Springer), ISSN 3004-9261, vol. 6, no. 11, pp. 1-17, 2024-10-01. https://doi.org/10.1007/s42452-024-06307-0
Record timestamp: 2025-08-20T02:18:25Z
Author affiliations:
Etana Fikadu Dinsa: Department of Computer Science and Engineering, Engineering and Technology, Wollega University
Mrinal Das: Department of Data Science, Indian Institute of Technology Palakkad (IIT Palakkad)
Teklu Urgessa Abebe: Department of Computer Science and Engineering, Adama Science and Technology University
Krishnaraj Ramaswamy: Director Center for Excellence in Incubation Indigenous Knowledge Innovative Technology Transfer and Entrepreneurship, Dambi Dollo University
Subjects: Classification; Ensemble learning; Natural languages processing; Machine learning; Model