Predicting Olympic Medal Performance for 2028: Machine Learning Models and the Impact of Host and Coaching Effects

This study develops two machine learning models to predict the medal performance of countries at the 2028 Olympic Games while systematically analyzing and quantifying the impacts of the host effect and exceptional coaching on medal gains. The dataset encompasses records of total medals by country, e...

Bibliographic Details
Main Authors: Zhenkai Zhang, Tengfei Ma, Yunpeng Yao, Ningjia Xu, Yujie Gao, Wanwan Xia
Format: Article
Language: English
Published: MDPI AG 2025-07-01
Series: Applied Sciences
Subjects: Olympic medals; K-means; factor analysis; fully connected neural network; XGBoost; host effect
Online Access: https://www.mdpi.com/2076-3417/15/14/7793
_version_ 1849246749246357504
author Zhenkai Zhang
Tengfei Ma
Yunpeng Yao
Ningjia Xu
Yujie Gao
Wanwan Xia
collection DOAJ
description This study develops two machine learning models to predict countries’ medal performance at the 2028 Olympic Games and quantifies the impacts of the host effect and of exceptional coaching on medal gains. The dataset covers total medals by country, event categories, and athlete participation for the Olympic Games held between 1896 and 2024. K-means clustering is used to analyze medal trends, dividing 234 nations into four groups (α₁, α₂, α₃, α₄). Groups α₁, α₂, and α₃ contain medal-winning countries, while α₄ consists of non-medal-winning nations.
For the α₁, α₂, and α₃ groups, 2–3 representative countries from each are selected for trend analysis, with the United States serving as a case study. Ten factors that may influence medal wins are extracted from the dataset, including participant data, the number of events, and medal growth rates. Factor analysis condenses these ten factors into three principal components: the event scale factor (F1), the medal trend factor (F2), and the gender and athletic ability factor (F3). An ARIMA model predicts the factor coefficients for 2028 as 0.9539, 0.7999, and 0.2937, respectively. Four models (random forest, BP neural network, XGBoost, and SVM) are used to predict medal outcomes, with historical data split into training and testing sets to compare their predictive performance. The results show that XGBoost is the best-performing medal prediction model; it projects that the United States will win 57 gold medals and 135 medals in total in 2028. For non-medal-winning countries (α₄), a three-layer fully connected neural network (FCNN) is constructed, achieving an accuracy of 85.5% in testing. In addition, a formula for calculating the host effect and a Bayesian linear regression model for assessing the impact of exceptional coaching on athletes’ medal performance are proposed.
Countries in the α₁ group show a stable overall trend but are significantly affected by the host effect; the α₂ group shows an upward trend; the trend in the α₃ group depends on the athletes’ condition and on whether the events they excel in are included in that year’s Olympics. In the α₄ group, the probabilities of the United Arab Republic (UAR) and Mali (MLI) winning medals at the 2028 Olympic Games are 77.47% and 58.47%, respectively, and four other countries have probabilities exceeding 30%. For the eight most recent Olympic Games, the gain rate of the host effect is 74%. Great coaches can bring an average increase of 0.2 to 0.5 medals per athlete. By integrating clustering, dimensionality reduction, and predictive algorithms, the proposed models provide reliable forecasts and data-driven insights for optimizing national sports strategies. These contributions address the gap in predicting first-time medal wins for non-medal-winning nations and offer guidance for policymakers and sports organizations, though they are constrained by assumptions of stable historical trends, minimal external disruptions, and the exclusion of unknown athletes.
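The grouping step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not state which features or preprocessing were used, so the sketch assumes a single feature (the log of each country's all-time medal total), assumes that countries with zero medals are placed in α₄ by rule, and runs a plain Lloyd's-algorithm K-means with k = 3 on the remaining countries. The country codes and medal totals are hypothetical placeholders, and the names `kmeans_1d` and `group_countries` are illustrative only.

```python
import math


def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on scalars with a deterministic initialisation."""
    ordered = sorted(values)
    # Assumption: start the centroids at evenly spaced order statistics.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def group_countries(medal_totals):
    """Label each country alpha1..alpha4 (alpha4 = no medals so far)."""
    # Assumption: log1p compresses the heavy right tail of medal totals.
    winners = {c: math.log1p(m) for c, m in medal_totals.items() if m > 0}
    # Order centroids so that alpha1 is the highest-medal cluster.
    centroids = sorted(kmeans_1d(list(winners.values()), k=3), reverse=True)
    groups = {}
    for country, total in medal_totals.items():
        if total == 0:
            groups[country] = "alpha4"
        else:
            j = min(range(3), key=lambda i: abs(winners[country] - centroids[i]))
            groups[country] = f"alpha{j + 1}"
    return groups


# Hypothetical medal totals, for illustration only.
totals = {"A": 2600, "B": 1900, "C": 400, "D": 350,
          "E": 12, "F": 8, "G": 0, "H": 0}
for country, group in sorted(group_countries(totals).items()):
    print(country, group)
```

A real analysis would likely cluster on several standardized features (for example, per-Games medal trajectories) rather than on one number per country; the single-feature version is used here only to keep the sketch short.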
format Article
id doaj-art-a3c761734ab8463ca66ce061a04d4aae
institution Kabale University
issn 2076-3417
language English
publishDate 2025-07-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj-art-a3c761734ab8463ca66ce061a04d4aae
2025-08-20T03:58:25Z
eng
MDPI AG
Applied Sciences
2076-3417
2025-07-01
Volume 15, Issue 14, Article 7793
10.3390/app15147793
Predicting Olympic Medal Performance for 2028: Machine Learning Models and the Impact of Host and Coaching Effects
Zhenkai Zhang (School of Physical and Mathematical Sciences, Nanjing Tech University, Nanjing 211816, China)
Tengfei Ma (School of Physical and Mathematical Sciences, Nanjing Tech University, Nanjing 211816, China)
Yunpeng Yao (School of Physical and Mathematical Sciences, Nanjing Tech University, Nanjing 211816, China)
Ningjia Xu (School of Foreign Languages and Literature, Nanjing Tech University, Nanjing 211816, China)
Yujie Gao (School of Electrical Engineering and Control Science, Nanjing Tech University, Nanjing 211816, China)
Wanwan Xia (School of Physical and Mathematical Sciences, Nanjing Tech University, Nanjing 211816, China)
https://www.mdpi.com/2076-3417/15/14/7793
Olympic medals
K-means
factor analysis
fully connected neural network
XGBoost
host effect
title Predicting Olympic Medal Performance for 2028: Machine Learning Models and the Impact of Host and Coaching Effects
topic Olympic medals
K-means
factor analysis
fully connected neural network
XGBoost
host effect
url https://www.mdpi.com/2076-3417/15/14/7793