Appropriate data segmentation improves speech encoding models: Analysis and simulation of electrophysiological recordings.

Bibliographic Details
Main Authors: Ole Bialas, Edmund C Lalor
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0323276
author Ole Bialas
Edmund C Lalor
collection DOAJ
description <h4>Background</h4>In recent decades, studies modeling the neural processing of continuous, naturalistic speech have provided new insights into how speech and language are represented in the brain. However, the linear encoder models commonly used in such studies assume that the underlying data are stationary, varying to a fixed degree around a constant mean. Long, continuous neural recordings may violate this assumption, leading to impaired model performance. We aimed to examine the effect of non-stationary trends in continuous neural recordings on the performance of linear speech encoding models.<h4>Methods</h4>We used temporal response functions (TRFs) to predict continuous neural responses to speech while splitting the data into segments of varying length prior to model fitting. Our hypothesis was that, if the data were non-stationary, segmentation should improve model performance by making individual segments approximately stationary. We simulated and predicted stationary and non-stationary recordings to test our hypothesis under a known ground truth, and we predicted the brain activity of participants who listened to a narrated story to test our hypothesis on actual neural recordings.<h4>Results</h4>Simulations showed that, for stationary data, increasing segmentation steadily decreased model performance. For non-stationary data, however, segmentation initially improved model performance. Modeling of neural recordings yielded similar results: segments of intermediate length (5-15 s) led to improved model performance compared to very short (1-2 s) and very long (30-120 s) segments.<h4>Conclusions</h4>We showed that data segmentation improves the performance of encoding models for both simulated and real neural data, and that this can be explained by the fact that shorter segments approximate stationarity more closely. Thus, the common practice of applying encoding models to long continuous segments of data is suboptimal, and recordings should be segmented prior to modeling.
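The idea described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' pipeline: it assumes a 100 Hz sampling rate, a made-up response kernel, a slow sinusoidal drift as the non-stationary trend, ridge regression as the TRF estimator, and removal of each segment's mean as the step that makes segments approximately stationary. All function names and parameter values are hypothetical.

```python
import numpy as np

def lagged(x, n_lags):
    """Design matrix whose column k holds x delayed by k samples."""
    X = np.zeros((len(x), n_lags))
    for k in range(n_lags):
        X[k:, k] = x[:len(x) - k]
    return X

def fit_trf(X, y, lam=1.0):
    """Ridge-regression estimate of the response function (assumed estimator)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def segment_and_center(X, y, seg_len):
    """Split data into segments of seg_len samples and remove each segment's mean."""
    n = (len(y) // seg_len) * seg_len  # drop samples that do not fill a segment
    Xs = X[:n].reshape(-1, seg_len, X.shape[1])
    ys = y[:n].reshape(-1, seg_len)
    Xs = Xs - Xs.mean(axis=1, keepdims=True)
    ys = ys - ys.mean(axis=1, keepdims=True)
    return Xs.reshape(n, -1), ys.reshape(n)

def score(stim, resp, seg_len, n_lags=20):
    """Fit on the first half, then correlate prediction and data on the second half."""
    X = lagged(stim, n_lags)
    half = len(resp) // 2
    X_train, y_train = segment_and_center(X[:half], resp[:half], seg_len)
    X_test, y_test = segment_and_center(X[half:], resp[half:], seg_len)
    w = fit_trf(X_train, y_train)
    return np.corrcoef(X_test @ w, y_test)[0, 1]

# Simulated recording (all values below are illustrative assumptions).
fs, duration, n_lags = 100, 200, 20            # Hz, seconds, samples
rng = np.random.default_rng(0)
t = np.arange(fs * duration) / fs
stim = rng.standard_normal(len(t))             # stand-in for a speech feature
kernel = np.hanning(n_lags) * np.sin(np.linspace(0, 2 * np.pi, n_lags))
kernel /= np.linalg.norm(kernel)               # unit-norm ground-truth response
evoked = np.convolve(stim, kernel)[:len(t)]
drift = 3 * np.sin(2 * np.pi * t / 100)        # slow non-stationary trend
resp = evoked + drift + rng.standard_normal(len(t))

r_long = score(stim, resp, seg_len=100 * fs)   # one 100 s segment per half
r_short = score(stim, resp, seg_len=10 * fs)   # 10 s segments
print(f"100 s segments: r = {r_long:.2f}")
print(f"10 s segments:  r = {r_short:.2f}")
```

In this toy setting the slow drift inflates the variance of long segments, so the prediction correlation is higher with 10 s segments than with a single 100 s segment. Setting `drift` to zero gives a stationary control, for which the abstract reports that segmentation does not help.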
format Article
id doaj-art-df1db4d9da8e40a286e85b2d9a29d365
institution OA Journals
issn 1932-6203
language English
publishDate 2025-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj-art-df1db4d9da8e40a286e85b2d9a29d365 | 2025-08-20T02:34:13Z | eng | Public Library of Science (PLoS) | PLoS ONE | 1932-6203 | 2025-01-01 | 20 | 5 | e0323276 | 10.1371/journal.pone.0323276 | Appropriate data segmentation improves speech encoding models: Analysis and simulation of electrophysiological recordings. | Ole Bialas | Edmund C Lalor | https://doi.org/10.1371/journal.pone.0323276
title Appropriate data segmentation improves speech encoding models: Analysis and simulation of electrophysiological recordings.
url https://doi.org/10.1371/journal.pone.0323276