Fusion of Focal Loss’s cyber threat intelligence entity extraction

Cyber threat intelligence contains a wealth of knowledge about threat behavior. Timely analysis and processing of threat intelligence can help shift defense from passive to active. Today, most threat intelligence that exists in the form of natural language texts contains a large amount...

Bibliographic Details
Main Authors: Yuanbo GUO, Yongfei LI, Qingli CHEN, Chen FANG, Yangyang HU
Format: Article
Language: zho
Published: Editorial Department of Journal on Communications 2022-07-01
Series: Tongxin xuebao
Subjects: cyber security; threat intelligence; entity extraction; label imbalance
Online Access: http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2022132/
_version_ 1841539984365977600
author Yuanbo GUO
Yongfei LI
Qingli CHEN
Chen FANG
Yangyang HU
author_sort Yuanbo GUO
collection DOAJ
description Cyber threat intelligence contains a wealth of knowledge about threat behavior. Timely analysis and processing of threat intelligence can help shift defense from passive to active. Today, most threat intelligence exists as natural language text containing a large amount of unstructured data, which must be converted into structured data by entity extraction methods for subsequent processing. However, because threat intelligence contains numerous specialized terms such as vulnerability names, malware, and APT organizations, and the distribution of entities is extremely unbalanced, the performance of general-domain extraction methods is severely limited when applied to threat intelligence. Therefore, an entity extraction model integrated with Focal Loss was proposed, which improved the cross-entropy loss function and balanced the sample distribution by introducing a balance factor and a modulation coefficient. In addition, because threat intelligence has a complex structure, a wide range of sources, and a large number of professional words, token and character features were added to the model, which effectively alleviated the OOV (out-of-vocabulary) problem in threat intelligence. Experimental results show that, compared with the existing mainstream models BiLSTM and BiLSTM-CRF, the F1 score of the proposed model increased by 7.07% and 4.79% respectively, which verifies the effectiveness of introducing Focal Loss and character features.
format Article
id doaj-art-9a110627b3eb4d808509722b478562a7
institution Kabale University
issn 1000-436X
language zho
publishDate 2022-07-01
publisher Editorial Department of Journal on Communications
record_format Article
series Tongxin xuebao
title Fusion of Focal Loss’s cyber threat intelligence entity extraction
topic cyber security
threat intelligence
entity extraction
label imbalance
url http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2022132/
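
The description field above says the proposed model modifies cross-entropy with a balance factor and a modulation coefficient, but the record does not give the formula or the values used. The sketch below is therefore an illustration only: it assumes the commonly used focal loss form, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), with a single scalar alpha (the balance factor) and gamma (the modulation coefficient). The function names, the scalar alpha, and the defaults alpha=0.25 and gamma=2.0 are assumptions for illustration and may differ from the article's settings.

```python
import math


def softmax(logits):
    """Convert raw label scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Plain cross-entropy for one token: -log(p_t)."""
    return -math.log(softmax(logits)[target])


def focal_loss(logits, target, alpha=0.25, gamma=2.0):
    """Focal loss for one token: -alpha * (1 - p_t)**gamma * log(p_t).

    alpha and gamma are assumed defaults, not values taken from the article.
    """
    p_t = softmax(logits)[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical label scores for one token over three entity labels.
scores = [4.0, 0.0, 0.0]
easy_ratio = focal_loss(scores, 0) / cross_entropy(scores, 0)  # confident, correct
hard_ratio = focal_loss(scores, 1) / cross_entropy(scores, 1)  # low-probability label
print(easy_ratio, hard_ratio)
```

With gamma set to 0 and alpha set to 1, the function reduces to ordinary cross-entropy. For gamma greater than 0, tokens the model already labels confidently are down-weighted far more than hard tokens, which is how this loss is usually motivated for unbalanced label distributions such as the one the description mentions.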