Application of Machine Learning Models for Baseball Outcome Prediction
Data science has become an essential component in professional sports, particularly for predicting team performance and outcomes. This study aims to develop and evaluate machine learning models that accurately predict game outcomes in the Chinese Professional Baseball League (CPBL). Method: A total...
| Main Authors: | Tzu-Chien Lo, Chen-Yin Lee, Chien-Lin Chen, Tsung-Yu Hsieh, Che-Hsiu Chen, Yen-Kuang Lin |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-06-01 |
| Series: | Applied Sciences |
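| Subjects: | Chinese Professional Baseball League; performance analysis; sport big data; weight index; explainable AI |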
| Online Access: | https://www.mdpi.com/2076-3417/15/13/7081 |
| _version_ | 1849429020686417920 |
|---|---|
| author | Tzu-Chien Lo; Chen-Yin Lee; Chien-Lin Chen; Tsung-Yu Hsieh; Che-Hsiu Chen; Yen-Kuang Lin |
| author_sort | Tzu-Chien Lo |
| collection | DOAJ |
| description | Data science has become an essential component of professional sports, particularly for predicting team performance and outcomes. This study aims to develop and evaluate machine learning models that accurately predict game outcomes in the Chinese Professional Baseball League (CPBL). Method: A total of 859 games from the 2021 to 2023 regular seasons were analyzed, using both traditional baseball statistics and advanced sabermetric indicators such as Weighted Runs Created Plus (wRC+), Weighted Runs Above Average (wRAA), and Percentage of Leadoff Batters on Base (PLOB%). Five machine learning models (decision tree, logistic regression, neural network, random forest, and XGBoost) were constructed and assessed using five-fold cross-validation. Evaluation metrics included accuracy, F1 score, sensitivity, specificity, and AUC-ROC. Results: Logistic regression and XGBoost achieved the highest performance, with accuracy ranging from 0.89 to 0.93 and AUC-ROC from 0.97 to 0.98. Feature importance and SHapley Additive exPlanations (SHAP) analyses revealed that wRC+ and PLOB% were the most influential predictors, reflecting offensive efficiency and pitching control. Conclusion: The results suggest that combining interpretable machine learning with sabermetrics provides valuable insights for coaches and analysts in professional baseball. Incorporating performance weighting based on game context may further improve model accuracy. This research demonstrates the potential of data-driven strategies in sports analytics and decision-making. |
| format | Article |
| id | doaj-art-b43af96f032e43d2b68fb46c9d02791b |
| institution | Kabale University |
| issn | 2076-3417 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| spelling | doaj-art-b43af96f032e43d2b68fb46c9d02791b; 2025-08-20T03:28:29Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2025-06-01; 15; 13; 7081; 10.3390/app15137081; Application of Machine Learning Models for Baseball Outcome Prediction; Tzu-Chien Lo (0), Chen-Yin Lee (1), Chien-Lin Chen (2), Tsung-Yu Hsieh (3), Che-Hsiu Chen (4), Yen-Kuang Lin (5); (0) Department of Physical Education, Fu Jen Catholic University, New Taipei City 24205, Taiwan; (1) Center for Teacher Education, Fu Jen Catholic University, New Taipei City 24205, Taiwan; (2) Physical Education Office, Fu Jen Catholic University, New Taipei City 24205, Taiwan; (3) Department of Physical Education, Fu Jen Catholic University, New Taipei City 24205, Taiwan; (4) Department of Sports Performance, National Taiwan University of Sport, Taichung 40401, Taiwan; (5) Graduate Institute of Athletics and Coaching Science, National Taiwan Sport University, Taoyuan 33301, Taiwan; https://www.mdpi.com/2076-3417/15/13/7081; Chinese Professional Baseball League; performance analysis; sport big data; weight index; explainable AI |
| title | Application of Machine Learning Models for Baseball Outcome Prediction |
| topic | Chinese Professional Baseball League; performance analysis; sport big data; weight index; explainable AI |
| url | https://www.mdpi.com/2076-3417/15/13/7081 |
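
The description above names five-fold cross-validation and five evaluation metrics (accuracy, F1 score, sensitivity, specificity, AUC-ROC) but the record does not show how they were computed. The following is a minimal pure-Python sketch of those definitions, not the authors' code. The function names, the contiguous (unshuffled, unstratified) fold split, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration; only the game count of 859 and the fold count of five come from the record.

```python
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds.

    Returns one (train_indices, test_indices) pair per fold. The first
    n_samples % k folds receive one extra sample. (Assumption: the study may
    have shuffled or stratified its folds; this sketch does neither.)
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and F1 from 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a class is absent.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the sample count, which
    is acceptable for a few hundred games.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Fold sizes for 859 games split five ways.
fold_sizes = [len(test) for _, test in k_fold_indices(859, k=5)]

# Hypothetical labels (1 = win) and model scores, for illustration only.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed 0.5 threshold

toy_metrics = binary_metrics(toy_labels, toy_preds)
toy_auc = auc_roc(toy_labels, toy_scores)
```

In a full pipeline, each model would be fitted on the `train` indices of a fold, scored on the `test` indices, and the per-fold metrics averaged. The gap between a threshold-based metric such as accuracy and a ranking metric such as AUC-ROC is visible even in the toy data, where one misranked pair lowers AUC-ROC only slightly while two thresholding errors cost a quarter of the accuracy.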