An action decoding framework combined with deep neural network for predicting the semantics of human actions in videos from evoked brain activities
Introduction: Recently, numerous studies have focused on the semantic decoding of perceived images based on functional magnetic resonance imaging (fMRI) activities. However, it remains unclear whether relationships can be established between brain activities and the semantic features of human actions in video stimuli. Here we construct a framework for decoding action semantics by establishing such relationships.

Methods: To make effective use of the small amount of available brain activity data, the proposed method employs an action recognition network pre-trained on video data and based on the expanding three-dimensional (X3D) deep neural network (DNN) framework. To apply brain activities to the action recognition network, we train regression models that learn the relationship between brain activities and deep-layer features. To improve decoding accuracy, we add a non-local attention module to the X3D model to capture long-range temporal and spatial dependencies, propose a multilayer perceptron (MLP) module with a multi-task loss constraint to build a more accurate regression mapping, and perform data augmentation through linear interpolation to expand the dataset and reduce the impact of the small sample size.

Results and discussion: Our findings indicate that the features in the X3D DNN are biologically relevant and capture information useful for perception. The proposed method enriches semantic decoding models. We also conducted experiments with data from different subsets of brain regions known to process visual stimuli; the results suggest that semantic information about human actions is widespread across the entire visual cortex.
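The abstract describes two reusable ideas: a regression mapping from fMRI activity patterns to deep-layer DNN features, and data augmentation by linearly interpolating between samples. As a rough illustration only (the paper's actual regressor, network layers, and data shapes are not specified in this record; all names, shapes, and the choice of closed-form ridge regression here are assumptions), a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the paper's data (all shapes are hypothetical):
# X: fMRI activity patterns, one row per video clip (n_samples x n_voxels)
# Y: deep-layer features of the same clips from a pretrained video DNN
n_samples, n_voxels, n_features = 40, 200, 64
X = rng.normal(size=(n_samples, n_voxels))
Y = rng.normal(size=(n_samples, n_features))

def augment_by_interpolation(X, Y, alpha=0.5):
    """Augment by linear interpolation between consecutive sample pairs."""
    X_new = alpha * X[:-1] + (1 - alpha) * X[1:]
    Y_new = alpha * Y[:-1] + (1 - alpha) * Y[1:]
    return np.vstack([X, X_new]), np.vstack([Y, Y_new])

def fit_ridge(X, Y, lam=1.0):
    """Closed-form ridge regression: W maps brain activity to DNN features."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

X_aug, Y_aug = augment_by_interpolation(X, Y)   # 40 -> 79 training samples
W = fit_ridge(X_aug, Y_aug)                     # (n_voxels x n_features)
Y_pred = X @ W                                  # predicted deep-layer features
```

The predicted features `Y_pred` would then be fed into the downstream action recognition network for semantic decoding; the paper's MLP module with a multi-task loss would replace the simple ridge step here.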
| Main Authors: | Yuanyuan Zhang, Manli Tian, Baolin Liu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Frontiers Media S.A., 2025-02-01 |
| Series: | Frontiers in Neuroinformatics |
| Subjects: | functional magnetic resonance imaging; decoding; action semantic; three-dimension convolutional neural network; multi-subject model |
| Online Access: | https://www.frontiersin.org/articles/10.3389/fninf.2025.1526259/full |
| author | Yuanyuan Zhang; Manli Tian; Baolin Liu |
|---|---|
| collection | DOAJ |
| description | Introduction: Recently, numerous studies have focused on the semantic decoding of perceived images based on functional magnetic resonance imaging (fMRI) activities. However, it remains unclear whether relationships can be established between brain activities and the semantic features of human actions in video stimuli. Here we construct a framework for decoding action semantics by establishing such relationships. Methods: To make effective use of the small amount of available brain activity data, the proposed method employs an action recognition network pre-trained on video data and based on the expanding three-dimensional (X3D) deep neural network (DNN) framework. To apply brain activities to the action recognition network, we train regression models that learn the relationship between brain activities and deep-layer features. To improve decoding accuracy, we add a non-local attention module to the X3D model to capture long-range temporal and spatial dependencies, propose a multilayer perceptron (MLP) module with a multi-task loss constraint to build a more accurate regression mapping, and perform data augmentation through linear interpolation to expand the dataset and reduce the impact of the small sample size. Results and discussion: Our findings indicate that the features in the X3D DNN are biologically relevant and capture information useful for perception. The proposed method enriches semantic decoding models. We also conducted experiments with data from different subsets of brain regions known to process visual stimuli; the results suggest that semantic information about human actions is widespread across the entire visual cortex. |
| format | Article |
| id | doaj-art-6db79ea8334b442394386e6e00f8eee7 |
| institution | DOAJ |
| issn | 1662-5196 |
| language | English |
| publishDate | 2025-02-01 |
| publisher | Frontiers Media S.A. |
| record_format | Article |
| series | Frontiers in Neuroinformatics |
| title | An action decoding framework combined with deep neural network for predicting the semantics of human actions in videos from evoked brain activities |
| topic | functional magnetic resonance imaging; decoding; action semantic; three-dimension convolutional neural network; multi-subject model |
| url | https://www.frontiersin.org/articles/10.3389/fninf.2025.1526259/full |