Automatic categorization of medical documents in Afaan Oromo using ensemble machine learning techniques
Abstract Automatic medical document classification using machine learning techniques can enhance the productivity of healthcare services by reducing processing time and cost. This work proposes an ensemble learning approach to develop a model that classifies electronic medical documents in Afaan Oro...
| Main Authors: | Etana Fikadu Dinsa, Mrinal Das, Teklu Urgessa Abebe, Krishnaraj Ramaswamy |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2024-10-01 |
| Series: | Discover Applied Sciences |
| Subjects: | Classification; Ensemble learning; Natural languages processing; Machine learning; Model |
| Online Access: | https://doi.org/10.1007/s42452-024-06307-0 |
| _version_ | 1850179710598774784 |
|---|---|
| author | Etana Fikadu Dinsa; Mrinal Das; Teklu Urgessa Abebe; Krishnaraj Ramaswamy |
| author_facet | Etana Fikadu Dinsa; Mrinal Das; Teklu Urgessa Abebe; Krishnaraj Ramaswamy |
| author_sort | Etana Fikadu Dinsa |
| collection | DOAJ |
| description | Automatic medical document classification using machine learning techniques can enhance the productivity of healthcare services by reducing processing time and cost. This work proposes an ensemble learning approach to develop a model that classifies electronic medical documents in Afaan Oromo. The main tasks are corpus preparation, pre-processing, model training, and classification. We used the term frequency-inverse document frequency (TF-IDF) and bag-of-words (BOW) feature extraction methods. The ensemble technique combines the individual predictions of naïve Bayes, random forest, SVM, and logistic regression classifiers to produce a more reliable and accurate classifier. Performance was compared using accuracy, F1-score, recall, and precision. The proposed method is compared with two existing boosting approaches, gradient boosting and AdaBoost. The experimental results show that BOW feature extraction is more effective than TF-IDF on our dataset. The proposed model achieved 94.81% accuracy and a 94.84% F1-score. This work contributes to improving service delivery, managing documents through classification, and advancing data processing systems in the healthcare sector. |
| format | Article |
| id | doaj-art-8c82d1ffb61a42778f005bcb675db911 |
| institution | OA Journals |
| issn | 3004-9261 |
| language | English |
| publishDate | 2024-10-01 |
| publisher | Springer |
| record_format | Article |
| series | Discover Applied Sciences |
| spelling | doaj-art-8c82d1ffb61a42778f005bcb675db911; 2025-08-20T02:18:25Z; eng; Springer; Discover Applied Sciences; 3004-9261; 2024-10-01; vol. 6, no. 11, pp. 1-17; 10.1007/s42452-024-06307-0; Automatic categorization of medical documents in Afaan Oromo using ensemble machine learning techniques; Etana Fikadu Dinsa (Department of Computer Science and Engineering, Engineering and Technology, Wollega University); Mrinal Das (Department of Data Science, Indian Institute of Technology Palakkad (IIT Palakkad)); Teklu Urgessa Abebe (Department of Computer Science and Engineering, Adama Science and Technology University); Krishnaraj Ramaswamy (Director Center for Excellence in Incubation Indigenous Knowledge Innovative Technology Transfer and Entrepreneurship, Dambi Dollo University); Automatic medical document classification using machine learning techniques can enhance the productivity of healthcare services by reducing processing time and cost. This work proposes an ensemble learning approach to develop a model that classifies electronic medical documents in Afaan Oromo. The main tasks are corpus preparation, pre-processing, model training, and classification. We used the term frequency-inverse document frequency (TF-IDF) and bag-of-words (BOW) feature extraction methods. The ensemble technique combines the individual predictions of naïve Bayes, random forest, SVM, and logistic regression classifiers to produce a more reliable and accurate classifier. Performance was compared using accuracy, F1-score, recall, and precision. The proposed method is compared with two existing boosting approaches, gradient boosting and AdaBoost. The experimental results show that BOW feature extraction is more effective than TF-IDF on our dataset. The proposed model achieved 94.81% accuracy and a 94.84% F1-score. This work contributes to improving service delivery, managing documents through classification, and advancing data processing systems in the healthcare sector.; https://doi.org/10.1007/s42452-024-06307-0; Classification; Ensemble learning; Natural languages processing; Machine learning; Model |
| spellingShingle | Etana Fikadu Dinsa; Mrinal Das; Teklu Urgessa Abebe; Krishnaraj Ramaswamy; Automatic categorization of medical documents in Afaan Oromo using ensemble machine learning techniques; Discover Applied Sciences; Classification; Ensemble learning; Natural languages processing; Machine learning; Model |
| title | Automatic categorization of medical documents in Afaan Oromo using ensemble machine learning techniques |
| title_sort | automatic categorization of medical documents in afaan oromo using ensemble machine learning techniques |
| topic | Classification; Ensemble learning; Natural languages processing; Machine learning; Model |
| url | https://doi.org/10.1007/s42452-024-06307-0 |
| work_keys_str_mv | AT etanafikadudinsa automaticcategorizationofmedicaldocumentsinafaanoromousingensemblemachinelearningtechniques AT mrinaldas automaticcategorizationofmedicaldocumentsinafaanoromousingensemblemachinelearningtechniques AT tekluurgessaabebe automaticcategorizationofmedicaldocumentsinafaanoromousingensemblemachinelearningtechniques AT krishnarajramaswamy automaticcategorizationofmedicaldocumentsinafaanoromousingensemblemachinelearningtechniques |
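
The pipeline summarized in the description field (BOW and TF-IDF feature extraction, then combining the predictions of several classifiers) can be illustrated with a minimal standard-library sketch. Everything here is an assumption for illustration only: the toy documents, the labels "A" and "B", the helper names `bag_of_words`, `tf_idf`, and `majority_vote`, the `tf * log(N / df)` weighting, and the use of hard majority voting. The record does not say how the authors combine the four classifiers, and the per-classifier predictions below are hard-coded placeholders rather than outputs of trained models.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(vectors):
    """Reweight raw counts as tf * log(N / df), one common TF-IDF variant."""
    n_docs = len(vectors)
    n_terms = len(vectors[0])
    # Document frequency: number of documents containing each term.
    df = [sum(1 for vec in vectors if vec[j] > 0) for j in range(n_terms)]
    return [
        [vec[j] * math.log(n_docs / df[j]) for j in range(n_terms)]
        for vec in vectors
    ]


def majority_vote(predictions):
    """Hard voting: pick the most frequent label per document.

    On a tie, the label that appears first among the classifiers wins.
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


# Hypothetical toy corpus (not from the paper's dataset).
docs = ["fever cough fever", "fracture bone", "cough cold"]
vocab, counts = bag_of_words(docs)
weights = tf_idf(counts)

# Placeholder predictions standing in for four trained classifiers.
nb_pred = ["A", "B", "A"]
rf_pred = ["A", "B", "B"]
svm_pred = ["A", "A", "A"]
lr_pred = ["B", "B", "A"]
ensemble_pred = majority_vote([nb_pred, rf_pred, svm_pred, lr_pred])

print(vocab)
print(counts)
print(ensemble_pred)
```

In practice each classifier would be trained on the BOW or TF-IDF vectors and the voting step applied to their predictions on held-out documents; the sketch only shows the shape of the data at each stage.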