Automatic categorization of medical documents in Afaan Oromo using ensemble machine learning techniques

Bibliographic Details
Main Authors: Etana Fikadu Dinsa, Mrinal Das, Teklu Urgessa Abebe, Krishnaraj Ramaswamy
Format: Article
Language: English
Published: Springer 2024-10-01
Series: Discover Applied Sciences
Online Access: https://doi.org/10.1007/s42452-024-06307-0
Description
Summary: Automatic classification of medical documents using machine learning can improve the productivity of healthcare services by reducing processing time and cost. This work proposes an ensemble learning approach for classifying electronic medical documents written in Afaan Oromo. The main tasks are corpus preparation, pre-processing, model training, and classification. We used the term frequency-inverse document frequency (TF-IDF) and bag-of-words (BOW) feature extraction methods. The ensemble collects the predictions of four individual classifiers (naïve Bayes, random forest, support vector machine (SVM), and logistic regression) and combines them to obtain a more reliable and accurate classifier. Performance was evaluated using accuracy, precision, recall, and F1-score, and the proposed method was compared with two existing boosting approaches, gradient boosting and AdaBoost. The experimental results show that BOW feature extraction outperformed TF-IDF on our dataset, and that the proposed model achieved 94.81% accuracy and a 94.84% F1-score. This work contributes to improving service delivery, classification-based document management, and data processing systems in the healthcare sector.
ISSN: 3004-9261
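Illustrative sketch: the summary mentions BOW and TF-IDF feature extraction and the combination of several classifiers' predictions, so the following Python fragment shows one way those steps could look. It is not the authors' implementation. The record does not state how the predictions are combined or which TF-IDF variant is used; majority (hard) voting and the plain tf × log(N/df) weighting are assumptions made here, and the function names, whitespace tokenizer, sample documents, and labels are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for real pre-processing.
    return text.lower().split()


def bag_of_words(docs):
    """Return (vocabulary, count vectors): one raw term-count vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(count_vectors):
    """Reweight BOW counts as tf * log(N / df) (one common variant, assumed here)."""
    n_docs = len(count_vectors)
    n_terms = len(count_vectors[0])
    # Document frequency: number of documents containing each term.
    df = [sum(1 for vec in count_vectors if vec[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / df[j]) for j in range(n_terms)]
    weighted = []
    for vec in count_vectors:
        total = sum(vec) or 1  # avoid division by zero for an empty document
        weighted.append([(vec[j] / total) * idf[j] for j in range(n_terms)])
    return weighted


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per document.

    predictions[i][k] is the label classifier i assigns to document k.
    Ties go to the label that appears first among the classifiers.
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


# Made-up English placeholder documents, not data from the paper.
docs = ["fever cough fever", "cough headache"]
vocab, counts = bag_of_words(docs)
weights = tf_idf(counts)

# Hypothetical outputs of three base classifiers for two documents.
votes = majority_vote([["a", "b"], ["a", "c"], ["b", "c"]])
print(vocab, counts, weights, votes)
```

In a real pipeline the per-classifier label lists would come from trained naïve Bayes, random forest, SVM, and logistic regression models rather than hard-coded values.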