A framework of variable-length sequence data preprocessing based on semantic perception

Deep learning frameworks generally apply padding or truncation to variable-length sequences in order to enable efficient batch training. However, padding incurs heavy memory consumption, and truncation inevitably discards part of the original semantic information. To address this dilemma, a variable-length sequence preprocessing framework based on semantic perception is proposed, which leverages a typical unsupervised learning method to reduce representations of different lengths to a common fixed size while minimizing information loss. Under this objective of minimizing information loss, information entropy is adopted to measure semantic richness, weights are assigned to the variable-length representations accordingly, and the weighted representations are fused. Extensive experiments show that the information loss of the proposed strategy is lower than that of truncated embeddings, and that the proposed method is clearly superior in retaining information and achieves promising performance on several text classification datasets.


Bibliographic Details
Main Authors: WANG Xiaodong, WANG Jiwei, ZHONG Zhihao, YANG Huan, YAO Hongjing, GUO Yangming
Format: Article
Language: zho
Published: EDP Sciences 2025-04-01
Series:Xibei Gongye Daxue Xuebao
Subjects:
Online Access:https://www.jnwpu.org/articles/jnwpu/full_html/2025/02/jnwpu2025432p388/jnwpu2025432p388.html
_version_ 1849335943954169856
author WANG Xiaodong
WANG Jiwei
ZHONG Zhihao
YANG Huan
YAO Hongjing
GUO Yangming
author_facet WANG Xiaodong
WANG Jiwei
ZHONG Zhihao
YANG Huan
YAO Hongjing
GUO Yangming
author_sort WANG Xiaodong
collection DOAJ
description Deep learning frameworks generally apply padding or truncation to variable-length sequences in order to enable efficient batch training. However, padding incurs heavy memory consumption, and truncation inevitably discards part of the original semantic information. To address this dilemma, a variable-length sequence preprocessing framework based on semantic perception is proposed, which leverages a typical unsupervised learning method to reduce representations of different lengths to a common fixed size while minimizing information loss. Under this objective of minimizing information loss, information entropy is adopted to measure semantic richness, weights are assigned to the variable-length representations accordingly, and the weighted representations are fused. Extensive experiments show that the information loss of the proposed strategy is lower than that of truncated embeddings, and that the proposed method is clearly superior in retaining information and achieves promising performance on several text classification datasets.
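The entropy-weighted fusion the abstract describes can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation: the choice of per-token distribution (a softmax over embedding dimensions) and the function names `entropy_weights` and `fuse` are assumptions; the paper's actual entropy estimate and fusion rule may differ.

```python
import numpy as np

def entropy_weights(X, eps=1e-12):
    """Weight each token by the Shannon entropy of its embedding distribution.

    X: (n_tokens, dim) array of token embeddings for one sequence.
    Returns weights of shape (n_tokens,) that sum to 1.
    """
    # Turn each token's embedding into a probability distribution
    # over its dimensions (numerically stable softmax).
    e = np.exp(X - X.max(axis=1, keepdims=True))
    p = e / e.sum(axis=1, keepdims=True)
    # Shannon entropy per token: higher entropy is treated here as
    # a proxy for greater semantic richness.
    h = -(p * np.log(p + eps)).sum(axis=1)
    return h / h.sum()

def fuse(X):
    """Collapse a variable-length sequence to one fixed-size vector.

    The output dimension equals the embedding dimension, regardless of
    sequence length, so no padding or truncation is needed.
    """
    w = entropy_weights(X)
    return w @ X  # entropy-weighted average of token embeddings
```

Sequences of length 5 and length 50 both fuse to the same fixed dimensionality, which is the property that lets such representations be batched without padding or truncation.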
format Article
id doaj-art-e1707804c70a476cb8483ee2a6e08740
institution Kabale University
issn 1000-2758
2609-7125
language zho
publishDate 2025-04-01
publisher EDP Sciences
record_format Article
series Xibei Gongye Daxue Xuebao
spelling doaj-art-e1707804c70a476cb8483ee2a6e087402025-08-20T03:45:07ZzhoEDP SciencesXibei Gongye Daxue Xuebao1000-27582609-71252025-04-0143238839710.1051/jnwpu/20254320388jnwpu2025432p388A framework of variable-length sequence data preprocessing based on semantic perceptionWANG Xiaodong0WANG Jiwei1ZHONG Zhihao2YANG Huan3YAO Hongjing4GUO Yangming5School of Computer Science, Northwestern Polytechnical UniversitySchool of Computer Science, Northwestern Polytechnical UniversitySchool of Software, Northwestern Polytechnical UniversitySchool of Computer Science, Northwestern Polytechnical UniversitySchool of Cybersecurity, Northwestern Polytechnical UniversitySchool of Cybersecurity, Northwestern Polytechnical UniversityDeep learning frameworks generally adopt padding or truncation operations toward variable-length sequences in order to use efficient yet intensive batch training. However, padding leads to intensive memory consumption, and truncation inevitably loses the original semantic information. To address this dilemma, a variable-length sequence preprocessing framework based on semantic perception is proposed, which leverages a typical unsupervised learning method to reduce the different dimensionality to the exact size and minimize information loss. Under the theoretical umbrella of minimizing information loss, information entropy is adopted to measure the semantic richness, weights to variable-length representations is assigned, and the semantic richness is used to fuse them. Extensive experiments show that the information loss of the present strategy is less than the truncated embeddings, and the apparent superiority of the present method in gaining more information capability and achieving promising performance on several text classification datasets.https://www.jnwpu.org/articles/jnwpu/full_html/2025/02/jnwpu2025432p388/jnwpu2025432p388.html变长序列数据预处理填充截断语义信息最大化信息
spellingShingle WANG Xiaodong
WANG Jiwei
ZHONG Zhihao
YANG Huan
YAO Hongjing
GUO Yangming
A framework of variable-length sequence data preprocessing based on semantic perception
Xibei Gongye Daxue Xuebao
变长序列 (variable-length sequences)
数据预处理 (data preprocessing)
填充 (padding)
截断 (truncation)
语义信息 (semantic information)
最大化信息 (information maximization)
title A framework of variable-length sequence data preprocessing based on semantic perception
title_full A framework of variable-length sequence data preprocessing based on semantic perception
title_fullStr A framework of variable-length sequence data preprocessing based on semantic perception
title_full_unstemmed A framework of variable-length sequence data preprocessing based on semantic perception
title_short A framework of variable-length sequence data preprocessing based on semantic perception
title_sort framework of variable length sequence data preprocessing based on semantic perception
topic 变长序列 (variable-length sequences)
数据预处理 (data preprocessing)
填充 (padding)
截断 (truncation)
语义信息 (semantic information)
最大化信息 (information maximization)
url https://www.jnwpu.org/articles/jnwpu/full_html/2025/02/jnwpu2025432p388/jnwpu2025432p388.html
work_keys_str_mv AT wangxiaodong aframeworkofvariablelengthsequencedatapreprocessingbasedonsemanticperception
AT wangjiwei aframeworkofvariablelengthsequencedatapreprocessingbasedonsemanticperception
AT zhongzhihao aframeworkofvariablelengthsequencedatapreprocessingbasedonsemanticperception
AT yanghuan aframeworkofvariablelengthsequencedatapreprocessingbasedonsemanticperception
AT yaohongjing aframeworkofvariablelengthsequencedatapreprocessingbasedonsemanticperception
AT guoyangming aframeworkofvariablelengthsequencedatapreprocessingbasedonsemanticperception
AT wangxiaodong frameworkofvariablelengthsequencedatapreprocessingbasedonsemanticperception
AT wangjiwei frameworkofvariablelengthsequencedatapreprocessingbasedonsemanticperception
AT zhongzhihao frameworkofvariablelengthsequencedatapreprocessingbasedonsemanticperception
AT yanghuan frameworkofvariablelengthsequencedatapreprocessingbasedonsemanticperception
AT yaohongjing frameworkofvariablelengthsequencedatapreprocessingbasedonsemanticperception
AT guoyangming frameworkofvariablelengthsequencedatapreprocessingbasedonsemanticperception