Application of Machine Learning Models for Baseball Outcome Prediction


Bibliographic Details
Main Authors: Tzu-Chien Lo, Chen-Yin Lee, Chien-Lin Chen, Tsung-Yu Hsieh, Che-Hsiu Chen, Yen-Kuang Lin
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Applied Sciences
Subjects:
Online Access: https://www.mdpi.com/2076-3417/15/13/7081
author Tzu-Chien Lo
Chen-Yin Lee
Chien-Lin Chen
Tsung-Yu Hsieh
Che-Hsiu Chen
Yen-Kuang Lin
collection DOAJ
description Data science has become an essential component of professional sports, particularly for predicting team performance and outcomes. This study aims to develop and evaluate machine learning models that accurately predict game outcomes in the Chinese Professional Baseball League (CPBL). Method: A total of 859 games from the 2021 to 2023 regular seasons were analyzed, using both traditional baseball statistics and advanced sabermetric indicators such as Weighted Runs Created Plus (wRC+), Weighted Runs Above Average (wRAA), and Percentage of Leadoff Batters on Base (PLOB%). Five machine learning models (decision tree, logistic regression, neural network, random forest, and XGBoost) were constructed and assessed using five-fold cross-validation. Evaluation metrics included accuracy, F1 score, sensitivity, specificity, and AUC-ROC. Results: Among the models, logistic regression and XGBoost achieved the highest performance, with accuracy ranging from 0.89 to 0.93 and AUC-ROC from 0.97 to 0.98. Feature importance and SHapley Additive exPlanations (SHAP) analysis revealed that wRC+ and PLOB% were the most influential predictors, reflecting offensive efficiency and pitching control. Conclusion: The results suggest that combining interpretable machine learning with sabermetrics provides valuable insights for coaches and analysts in professional baseball. In addition, incorporating performance weighting based on game context may further improve model accuracy. This research demonstrates the potential of data-driven strategies in sports analytics and decision-making.
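The evaluation protocol named in the description (five-fold cross-validation scored by accuracy, F1, sensitivity, specificity, and AUC-ROC) can be illustrated with a minimal Python sketch. This record does not show the authors' implementation, so the function names `classification_metrics`, `auc_roc`, and `five_fold_splits`, the label coding (1 = win, 0 = loss), and the shuffling seed are all assumptions made for illustration only.

```python
import random
from typing import Dict, List, Sequence, Tuple


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and F1 for binary labels (assumed 1 = win)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of failing when a class is absent.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def five_fold_splits(n_games: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle game indices once, then return k (train, test) index pairs."""
    indices = list(range(n_games))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most one
    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for m, fold in enumerate(folds) if m != i for j in fold)
        splits.append((train, test))
    return splits
```

In use, a model would be fitted on each `train` index list and scored on the matching `test` list, with the five metric dictionaries averaged afterwards. With 859 games the folds cannot be equal in size, so four folds hold 172 games and one holds 171.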
format Article
id doaj-art-b43af96f032e43d2b68fb46c9d02791b
institution Kabale University
issn 2076-3417
language English
publishDate 2025-06-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj-art-b43af96f032e43d2b68fb46c9d02791b
2025-08-20T03:28:29Z
eng
MDPI AG
Applied Sciences
2076-3417
2025-06-01
Volume 15, Issue 13, Article 7081
10.3390/app15137081
Application of Machine Learning Models for Baseball Outcome Prediction
[0] Tzu-Chien Lo, Department of Physical Education, Fu Jen Catholic University, New Taipei City 24205, Taiwan
[1] Chen-Yin Lee, Center for Teacher Education, Fu Jen Catholic University, New Taipei City 24205, Taiwan
[2] Chien-Lin Chen, Physical Education Office, Fu Jen Catholic University, New Taipei City 24205, Taiwan
[3] Tsung-Yu Hsieh, Department of Physical Education, Fu Jen Catholic University, New Taipei City 24205, Taiwan
[4] Che-Hsiu Chen, Department of Sports Performance, National Taiwan University of Sport, Taichung 40401, Taiwan
[5] Yen-Kuang Lin, Graduate Institute of Athletics and Coaching Science, National Taiwan Sport University, Taoyuan 33301, Taiwan
https://www.mdpi.com/2076-3417/15/13/7081
Chinese Professional Baseball League
performance analysis
sport big data
weight index
explainable AI
title Application of Machine Learning Models for Baseball Outcome Prediction
topic Chinese Professional Baseball League
performance analysis
sport big data
weight index
explainable AI
url https://www.mdpi.com/2076-3417/15/13/7081