Attention-Enhanced CNN-LSTM Model for Exercise Oxygen Consumption Prediction with Multi-Source Temporal Features

Dynamic oxygen uptake (VO₂) reflects moment-to-moment changes in oxygen consumption during exercise and underpins training design, performance enhancement, and clinical decision-making. We tackled two key obstacles: the limited fusion of heterogeneous sensor data and inadequate...

Bibliographic Details
Main Authors: Zhen Wang, Yingzhe Song, Lei Pang, Shanjun Li, Gang Sun
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Sensors
Subjects: oxygen uptake; deep learning; neural network; attention mechanism
Online Access: https://www.mdpi.com/1424-8220/25/13/4062
_version_ 1849701815389519872
author Zhen Wang
Yingzhe Song
Lei Pang
Shanjun Li
Gang Sun
author_sort Zhen Wang
collection DOAJ
description Dynamic oxygen uptake (VO₂) reflects moment-to-moment changes in oxygen consumption during exercise and underpins training design, performance enhancement, and clinical decision-making. We tackled two key obstacles, the limited fusion of heterogeneous sensor data and the inadequate modeling of long-range temporal patterns, by integrating wearable accelerometer and heart-rate streams with a convolutional neural network-LSTM (CNN-LSTM) architecture and optional attention modules. Physiological signals and VO₂ were recorded from 21 adults during a resting assessment and cardiopulmonary exercise testing. Pairing accelerometer inputs with heart-rate inputs improved prediction compared with using heart rate alone. The baseline CNN-LSTM reached R² = 0.946, outperforming a plain LSTM (R² = 0.926) thanks to stronger local spatio-temporal feature extraction. Introducing a spatial attention mechanism raised accuracy further (R² = 0.962), whereas temporal attention reduced it (R² = 0.930), indicating that the benefit of attention depends on how well the attended features align with exercise dynamics. Stacking both attentions (spatio-temporal) yielded R² = 0.960, slightly below the value for spatial attention alone, implying that added complexity does not guarantee better performance. Across all models, prediction errors grew during high-intensity bouts, highlighting a bottleneck in capturing non-linear physiological responses under heavy load. These findings inform architecture selection for wearable metabolic monitoring and clarify when attention mechanisms add value.
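The description compares models by their coefficient of determination but does not say how it was computed. The sketch below assumes the standard definition, one minus the ratio of the residual sum of squares to the total sum of squares, and is not taken from the article. The function name `r_squared` and the VO₂ sample values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (standard definition)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) < 2:
        raise ValueError("need at least two samples")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared error between measurement and prediction.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the measurements around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical VO2 samples (mL/kg/min), invented for illustration only;
# they are not data from the study.
vo2_measured = [5.0, 12.0, 20.0, 31.0, 42.0]
vo2_predicted = [6.0, 11.0, 21.0, 30.0, 39.0]
score = r_squared(vo2_measured, vo2_predicted)
print(f"R^2 = {score:.3f}")  # prints "R^2 = 0.985"
```

In this toy series the largest error sits at the highest VO₂ value, so that single sample contributes 9 of the 13 units of residual error. This is only an arithmetic illustration of how errors concentrated at heavy load can pull the score down, not a reproduction of the reported results.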
format Article
id doaj-art-4f2bd56c8fd14b87a32fb41cf1469b02
institution DOAJ
issn 1424-8220
language English
publishDate 2025-06-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj-art-4f2bd56c8fd14b87a32fb41cf1469b02; 2025-08-20T03:17:51Z; eng; MDPI AG; Sensors; ISSN 1424-8220; 2025-06-01; vol. 25, no. 13, article 4062; DOI 10.3390/s25134062
Attention-Enhanced CNN-LSTM Model for Exercise Oxygen Consumption Prediction with Multi-Source Temporal Features
Zhen Wang, Yingzhe Song, Lei Pang, Shanjun Li, Gang Sun (all: Institute of Artificial Intelligence in Sports, Capital University of Physical Education and Sports, Beijing 100191, China)
https://www.mdpi.com/1424-8220/25/13/4062
oxygen uptake; deep learning; neural network; attention mechanism
title Attention-Enhanced CNN-LSTM Model for Exercise Oxygen Consumption Prediction with Multi-Source Temporal Features
title_sort attention enhanced cnn lstm model for exercise oxygen consumption prediction with multi source temporal features
topic oxygen uptake
deep learning
neural network
attention mechanism
url https://www.mdpi.com/1424-8220/25/13/4062