Traffic concealed data detection method based on contrastive learning and pre-trained Transformer

Bibliographic Details
Main Authors: HE Shuai, ZHANG Jingchao, XU Di, JIANG Shuai, GUO Xiaowei, FU Cai
Format: Article
Language: Chinese (zho)
Published: Editorial Department of Journal on Communications 2025-03-01
Series: Tongxin xuebao
Online Access: http://www.joconline.com.cn/thesisDetails#10.11959/j.issn.1000-436x.2025043
Description
Summary: To address the problems of representing massive encrypted traffic, perceiving malicious behaviors, and identifying the ownership of private data, a method for detecting concealed data in traffic is proposed, based on contrastive learning and a pre-trained Transformer. Because encrypted traffic is highly complex and unstructured, and traditional fine-tuning methods perform poorly on downstream tasks in the encrypted traffic domain, data packets are first transformed into tokens similar to those used in natural language processing. A pre-trained Transformer model then converts these shallow representations into a general traffic representation suitable for various downstream encrypted traffic tasks. Concealed data detection is recast as a similarity analysis problem, and a diversity-sensitive Transformer architecture is built on contrastive learning: positive and negative sample pairs sharpen the model's sensitivity to traffic differences, and the information noise contrastive estimation (InfoNCE) loss is used to fine-tune the model on downstream encrypted traffic tasks. Experimental results show that the proposed method outperforms mainstream methods in accuracy, precision, recall, and F1 score.
ISSN: 1000-436X
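
Illustrative note: the summary names InfoNCE as the fine-tuning loss but gives no formula or implementation details. The following is a minimal sketch of the standard InfoNCE loss for a single anchor embedding, not the authors' code. The function names, the use of cosine similarity, the temperature value of 0.1, and the toy two-dimensional embeddings are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: negative log-softmax of the positive pair.

    The temperature default is an assumed value, not taken from the paper.
    """
    # Temperature-scaled similarity scores, with the positive pair first.
    logits = [cosine_similarity(anchor, positive) / temperature]
    logits += [cosine_similarity(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


# Toy embeddings (hypothetical): the positive is close to the anchor,
# the negatives point in different directions.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]

loss_good = info_nce_loss(anchor, positive, negatives)
# Swapping in a dissimilar vector as the "positive" should raise the loss.
loss_bad = info_nce_loss(anchor, negatives[0], [positive, negatives[1]])
```

In this sketch the loss is small when the anchor is more similar to its positive than to any negative, and grows when a negative outranks the positive, which is the property the summary relies on to sharpen sensitivity to traffic differences.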