A framework of variable-length sequence data preprocessing based on semantic perception
Deep learning frameworks generally apply padding or truncation to variable-length sequences in order to enable efficient batch training. However, padding incurs heavy memory consumption, while truncation inevitably discards part of the original semantic information. To address this dilemma, a variable-length sequence preprocessing framework based on semantic perception is proposed, which leverages a typical unsupervised learning method to reduce representations of different dimensionalities to a uniform size while minimizing information loss. Under this objective, information entropy is adopted to measure semantic richness, weights are assigned to the variable-length representations accordingly, and the weighted representations are fused. Extensive experiments show that the information loss of the proposed strategy is lower than that of truncated embeddings, and that the proposed method captures more information and achieves promising performance on several text classification datasets.
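The abstract describes fusing variable-length token representations into a fixed-size output using information entropy as a weight. The paper's exact unsupervised reduction step is not specified in this record, so the following is only a minimal sketch of the general idea: split a sequence into a fixed number of segments and fuse each segment's tokens with softmax-entropy-derived weights (the `token_entropy` richness proxy and segment-wise fusion are assumptions for illustration, not the authors' published algorithm).

```python
import numpy as np

def token_entropy(vec):
    # Shannon entropy of a softmax-normalized token embedding,
    # used here as a stand-in proxy for that token's semantic richness.
    e = np.exp(vec - vec.max())
    p = e / e.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def entropy_weighted_reduce(seq, target_len):
    """Map a (seq_len, dim) array to (target_len, dim) without truncation:
    split the sequence into target_len contiguous segments and fuse each
    segment's tokens with entropy-derived weights.
    Assumes seq_len >= target_len (shorter inputs would still need padding)."""
    segments = np.array_split(seq, target_len, axis=0)
    out = np.empty((target_len, seq.shape[1]))
    for i, seg in enumerate(segments):
        w = np.array([token_entropy(t) for t in seg])
        w = w / w.sum()                          # normalize weights per segment
        out[i] = (w[:, None] * seg).sum(axis=0)  # entropy-weighted fusion
    return out

# Example: a 13-token, 4-dimensional sequence reduced to 5 fused vectors.
x = np.random.default_rng(0).normal(size=(13, 4))
y = entropy_weighted_reduce(x, 5)
print(y.shape)  # (5, 4)
```

Unlike truncation, every token contributes to the output; unlike padding, the output size is fixed at `target_len` regardless of the input length.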
| Main Authors: | WANG Xiaodong, WANG Jiwei, ZHONG Zhihao, YANG Huan, YAO Hongjing, GUO Yangming |
|---|---|
| Format: | Article |
| Language: | zho |
| Published: | EDP Sciences, 2025-04-01 |
| Series: | Xibei Gongye Daxue Xuebao |
| Subjects: | 变长序列 (variable-length sequences); 数据预处理 (data preprocessing); 填充 (padding); 截断 (truncation); 语义信息 (semantic information); 最大化信息 (information maximization) |
| Online Access: | https://www.jnwpu.org/articles/jnwpu/full_html/2025/02/jnwpu2025432p388/jnwpu2025432p388.html |
| _version_ | 1849335943954169856 |
|---|---|
| author | WANG Xiaodong; WANG Jiwei; ZHONG Zhihao; YANG Huan; YAO Hongjing; GUO Yangming |
| author_facet | WANG Xiaodong; WANG Jiwei; ZHONG Zhihao; YANG Huan; YAO Hongjing; GUO Yangming |
| author_sort | WANG Xiaodong |
| collection | DOAJ |
| description | Deep learning frameworks generally apply padding or truncation to variable-length sequences in order to enable efficient batch training. However, padding incurs heavy memory consumption, while truncation inevitably discards part of the original semantic information. To address this dilemma, a variable-length sequence preprocessing framework based on semantic perception is proposed, which leverages a typical unsupervised learning method to reduce representations of different dimensionalities to a uniform size while minimizing information loss. Under this objective, information entropy is adopted to measure semantic richness, weights are assigned to the variable-length representations accordingly, and the weighted representations are fused. Extensive experiments show that the information loss of the proposed strategy is lower than that of truncated embeddings, and that the proposed method captures more information and achieves promising performance on several text classification datasets. |
| format | Article |
| id | doaj-art-e1707804c70a476cb8483ee2a6e08740 |
| institution | Kabale University |
| issn | 1000-2758 2609-7125 |
| language | zho |
| publishDate | 2025-04-01 |
| publisher | EDP Sciences |
| record_format | Article |
| series | Xibei Gongye Daxue Xuebao |
| spelling | doaj-art-e1707804c70a476cb8483ee2a6e087402025-08-20T03:45:07ZzhoEDP SciencesXibei Gongye Daxue Xuebao1000-27582609-71252025-04-0143238839710.1051/jnwpu/20254320388jnwpu2025432p388A framework of variable-length sequence data preprocessing based on semantic perceptionWANG Xiaodong0WANG Jiwei1ZHONG Zhihao2YANG Huan3YAO Hongjing4GUO Yangming5School of Computer Science, Northwestern Polytechnical UniversitySchool of Computer Science, Northwestern Polytechnical UniversitySchool of Software, Northwestern Polytechnical UniversitySchool of Computer Science, Northwestern Polytechnical UniversitySchool of Cybersecurity, Northwestern Polytechnical UniversitySchool of Cybersecurity, Northwestern Polytechnical UniversityDeep learning frameworks generally adopt padding or truncation operations toward variable-length sequences in order to use efficient yet intensive batch training. However, padding leads to intensive memory consumption, and truncation inevitably loses the original semantic information. To address this dilemma, a variable-length sequence preprocessing framework based on semantic perception is proposed, which leverages a typical unsupervised learning method to reduce the different dimensionality to the exact size and minimize information loss. Under the theoretical umbrella of minimizing information loss, information entropy is adopted to measure the semantic richness, weights to variable-length representations is assigned, and the semantic richness is used to fuse them. Extensive experiments show that the information loss of the present strategy is less than the truncated embeddings, and the apparent superiority of the present method in gaining more information capability and achieving promising performance on several text classification datasets.https://www.jnwpu.org/articles/jnwpu/full_html/2025/02/jnwpu2025432p388/jnwpu2025432p388.html变长序列数据预处理填充截断语义信息最大化信息 |
| spellingShingle | WANG Xiaodong WANG Jiwei ZHONG Zhihao YANG Huan YAO Hongjing GUO Yangming A framework of variable-length sequence data preprocessing based on semantic perception Xibei Gongye Daxue Xuebao 变长序列 数据预处理 填充 截断 语义信息 最大化信息 |
| title | A framework of variable-length sequence data preprocessing based on semantic perception |
| title_full | A framework of variable-length sequence data preprocessing based on semantic perception |
| title_fullStr | A framework of variable-length sequence data preprocessing based on semantic perception |
| title_full_unstemmed | A framework of variable-length sequence data preprocessing based on semantic perception |
| title_short | A framework of variable-length sequence data preprocessing based on semantic perception |
| title_sort | framework of variable length sequence data preprocessing based on semantic perception |
| topic | 变长序列 (variable-length sequences); 数据预处理 (data preprocessing); 填充 (padding); 截断 (truncation); 语义信息 (semantic information); 最大化信息 (information maximization) |
| url | https://www.jnwpu.org/articles/jnwpu/full_html/2025/02/jnwpu2025432p388/jnwpu2025432p388.html |
| work_keys_str_mv | AT wangxiaodong aframeworkofvariablelengthsequencedatapreprocessingbasedonsemanticperception AT wangjiwei aframeworkofvariablelengthsequencedatapreprocessingbasedonsemanticperception AT zhongzhihao aframeworkofvariablelengthsequencedatapreprocessingbasedonsemanticperception AT yanghuan aframeworkofvariablelengthsequencedatapreprocessingbasedonsemanticperception AT yaohongjing aframeworkofvariablelengthsequencedatapreprocessingbasedonsemanticperception AT guoyangming aframeworkofvariablelengthsequencedatapreprocessingbasedonsemanticperception AT wangxiaodong frameworkofvariablelengthsequencedatapreprocessingbasedonsemanticperception AT wangjiwei frameworkofvariablelengthsequencedatapreprocessingbasedonsemanticperception AT zhongzhihao frameworkofvariablelengthsequencedatapreprocessingbasedonsemanticperception AT yanghuan frameworkofvariablelengthsequencedatapreprocessingbasedonsemanticperception AT yaohongjing frameworkofvariablelengthsequencedatapreprocessingbasedonsemanticperception AT guoyangming frameworkofvariablelengthsequencedatapreprocessingbasedonsemanticperception |