Machine Learning-Based Approach for HIV/AIDS Prediction: Feature Selection and Data Balancing Strategy

Bibliographic Details
Main Authors: Abdul Mizwar A Rahim, Ahmad Ridwan, Bambang Pilu Hartato, Firman Asharudin
Format: Article
Language: English
Published: Politeknik Negeri Batam 2025-03-01
Series: Journal of Applied Informatics and Computing
Online Access:https://jurnal.polibatam.ac.id/index.php/JAIC/article/view/9125
ISSN: 2548-6861
Volume: 9
Issue: 2
Pages: 338-347
DOI: 10.30871/jaic.v9i2.9125
Subjects: machine learning classification; feature selection; random oversampling; HIV/AIDS prediction; ensemble learning

Full description
HIV/AIDS remains a significant global health challenge, requiring accurate predictive models for early detection and improved clinical decision-making. However, developing an effective predictive model faces challenges such as data imbalance and the presence of irrelevant features, which can compromise model accuracy. This study aims to enhance the performance of AIDS infection prediction models by integrating feature selection, data balancing, and machine learning classification techniques. Feature selection is conducted using Pearson Correlation, Mutual Information, and Chi-Square tests to retain only the most relevant features. Random Oversampling, SMOTE, and ADASYN are employed to address data imbalance and improve model robustness. Nine machine learning algorithms, namely Decision Tree, Random Forest, Extra Trees, XGBoost, LightGBM, Gradient Boosting, Support Vector Machine, AdaBoost, and Logistic Regression, are tested for classification. Performance evaluation using the confusion matrix, precision, recall, F1-score, and AUC-ROC shows that tree-based models (Random Forest, Extra Trees, and XGBoost) achieve the best results, particularly in predicting the minority class. The study concludes that combining feature selection, data balancing, and machine learning techniques significantly improves predictive performance, making it a valuable approach for early detection and clinical decision support in HIV/AIDS diagnosis. Future research may explore hyperparameter tuning and the integration of real-world clinical data to enhance practical applicability.
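SMOTE is one of the three balancing methods named in the description. The record does not say which implementation the authors used, so the following is only a hedged, stdlib-only Python sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. The helper names `smote` and `oversample_to_balance`, the binary-label assumption, and the toy data are hypothetical illustrations, not the study's code or dataset.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: List[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Minimal SMOTE sketch: create n_new synthetic minority-class vectors.

    Each synthetic vector lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = max(1, min(k, len(minority) - 1))  # at most (n - 1) neighbours exist
    synthetic: List[Vector] = []
    for _ in range(max(0, n_new)):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Nearest neighbours of `base` among the other minority samples only.
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: _euclidean(base, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def oversample_to_balance(
    X: List[Vector], y: List[int], minority_label: int, k: int = 5, seed: int = 0
) -> Tuple[List[Vector], List[int]]:
    """Hypothetical helper for binary labels: pad the minority class with
    SMOTE samples until both classes have the same count."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)  # assumes exactly two classes
    new = smote(minority, n_majority - len(minority), k=k, seed=seed)
    return X + new, y + [minority_label] * len(new)


if __name__ == "__main__":
    # Toy data: 8 majority samples (label 0) and 3 minority samples (label 1).
    X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
         [0.5, 0.5], [0.2, 0.8], [0.9, 0.1], [0.3, 0.3],
         [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
    y = [0] * 8 + [1] * 3
    X_bal, y_bal = oversample_to_balance(X, y, minority_label=1, k=2, seed=42)
    print(y_bal.count(0), y_bal.count(1))  # prints: 8 8
```

Resampling of this kind should be fitted on the training split only, so that synthetic points do not leak into the data used for precision, recall, F1-score, and AUC-ROC. For real use, the imbalanced-learn library provides `SMOTE`, `ADASYN`, and `RandomOverSampler` in `imblearn.over_sampling`, each exposing `fit_resample(X, y)`.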