Chinese adversarial text generation method based on punctuation insertion
Natural language processing models remain susceptible to adversarial texts. Current methods for generating adversarial texts in Chinese are mainly based on replacing characters with visually similar or homophonic ones. However, when faced with robust pre-trained mo...
| Main Authors: | ZHANG Qian, YAN Qiao |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | POSTS&TELECOM PRESS Co., LTD, 2025-04-01 |
| Series: | 网络与信息安全学报 (Chinese Journal of Network and Information Security) |
| Subjects: | Chinese text classification; adversarial text generation; black-box attack |
| Online Access: | http://www.cjnis.com.cn/thesisDetails#10.11959/j.issn.2096-109x.2025026 |
| author | ZHANG Qian YAN Qiao |
| collection | DOAJ |
| description | Natural language processing models remain susceptible to adversarial texts. Existing methods for generating Chinese adversarial texts mainly replace characters with visually similar or homophonic ones. Against robust pre-trained models, however, these methods require more perturbations, which reduces fluency and readability and yields low-quality adversarial texts. Symbol insertion methods designed for English adversarial texts are not entirely applicable to Chinese, and in a black-box scenario the lack of prior knowledge makes high-quality adversarial texts difficult to generate. This paper proposes a punctuation-based method for generating adversarial texts for Chinese text classification tasks. Under a black-box setting, it combines a novel part-of-speech importance calculation with punctuation insertion to form a character-level perturbation approach suited to Chinese. Experiments show that the proposed method significantly improves the attack success rate on LSTM and BERT models trained on two real-world datasets. The method also avoids direct destruction of the original sentences and preserves their meaning: in the tests it achieved a semantic similarity of up to 97%, significantly better than the baseline methods. |
| format | Article |
| id | doaj-art-cb6d97c9929a48c68f69e95a05442bad |
| institution | OA Journals |
| issn | 2096-109X |
| language | English |
| publishDate | 2025-04-01 |
| publisher | POSTS&TELECOM PRESS Co., LTD |
| record_format | Article |
| series | 网络与信息安全学报 (Chinese Journal of Network and Information Security) |
| title | Chinese adversarial text generation method based on punctuation insertion |
| topic | Chinese text classification adversarial text generation black-box attack |
| url | http://www.cjnis.com.cn/thesisDetails#10.11959/j.issn.2096-109x.2025026 |
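
The description above outlines a black-box attack that ranks positions by importance and then inserts punctuation at the character level. The Python sketch below illustrates that general idea only. It is not the authors' implementation: the record does not specify the part-of-speech importance calculation, so a leave-one-out score drop is swapped in as a stand-in, and `toy_score`, `PUNCTUATION`, `rank_positions`, and `punctuation_attack` are hypothetical names introduced for illustration.

```python
from typing import Callable, List, Optional, Tuple

# Assumed candidate set of Chinese punctuation marks; the record does not list one.
PUNCTUATION = ["，", "。", "、"]


def toy_score(text: str) -> float:
    """Hypothetical black-box classifier: returns a 'positive' probability."""
    hits = sum(text.count(word) for word in ("喜欢", "好看"))
    return min(1.0, 0.2 + 0.4 * hits)


def rank_positions(text: str, score: Callable[[str], float]) -> List[int]:
    """Rank character positions by the score drop seen when each is deleted.

    Stand-in for the paper's part-of-speech importance, which is not given here.
    """
    base = score(text)
    drops = [(base - score(text[:i] + text[i + 1:]), i) for i in range(len(text))]
    drops.sort(key=lambda pair: (-pair[0], pair[1]))  # largest drop first
    return [i for _, i in drops]


def punctuation_attack(
    text: str,
    score: Callable[[str], float],
    threshold: float = 0.5,
    max_inserts: int = 3,
    top_k: int = 2,
) -> Tuple[str, int]:
    """Greedily insert punctuation next to the most important characters."""
    current, inserts = text, 0
    while score(current) >= threshold and inserts < max_inserts:
        best: Optional[str] = None
        best_score = score(current)
        for i in rank_positions(current, score)[:top_k]:
            for offset in (i, i + 1):  # try before and after the character
                for mark in PUNCTUATION:
                    candidate = current[:offset] + mark + current[offset:]
                    candidate_score = score(candidate)
                    if candidate_score < best_score:
                        best, best_score = candidate, candidate_score
        if best is None:  # no insertion lowers the score any further
            break
        current, inserts = best, inserts + 1
    return current, inserts


if __name__ == "__main__":
    original = "我喜欢这部电影，很好看"
    adversarial, count = punctuation_attack(original, toy_score)
    print(adversarial, count, toy_score(adversarial))
```

Because the sketch only inserts marks, every original character is kept in order, which matches the description's point that the original sentence is not directly destroyed. A real attack would query an actual classifier and check semantic similarity, neither of which is modeled here.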