Fusion of Focal Loss’s cyber threat intelligence entity extraction

Bibliographic Details
Main Authors:	Yuanbo GUO, Yongfei LI, Qingli CHEN, Chen FANG, Yangyang HU
Format:	Article
Language:	Chinese (zho)
Published:	Editorial Department of Journal on Communications, 2022-07-01
Series:	Tongxin xuebao (Journal on Communications)
Subjects:	cyber security; threat intelligence; entity extraction; label imbalance
Online Access:	http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2022132/

collection	DOAJ
description	Cyber threat intelligence contains a wealth of knowledge about threat behavior. Timely analysis and processing of threat intelligence can help shift defense from passive to active. Most threat intelligence today exists as natural-language text, that is, large amounts of unstructured data that must be converted into structured data by entity extraction before further processing. However, threat intelligence contains many domain terms, such as vulnerability names, malware, and APT organizations, and the distribution of entities is extremely unbalanced, so extraction methods designed for general domains perform poorly when applied to it. Therefore, an entity extraction model integrating Focal Loss was proposed; it improved the cross-entropy loss function and balanced the sample distribution by introducing a balance factor and a modulation coefficient. In addition, because threat intelligence has a complex structure, comes from a wide range of sources, and contains many professional terms, token and character features were added to the model, which effectively alleviated the OOV (out-of-vocabulary) problem in threat intelligence. Experimental results show that, compared with the mainstream models BiLSTM and BiLSTM-CRF, the proposed model increases the F1 score by 7.07% and 4.79% respectively, which verifies the effectiveness of introducing Focal Loss and character features.
id	doaj-art-9a110627b3eb4d808509722b478562a7
institution	Kabale University
issn	1000-436X
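The abstract describes Focal Loss only as a cross-entropy loss modified by a balance factor and a modulation coefficient. The Python sketch below assumes the widely used per-token form FL(p_t) = -α_t (1 - p_t)^γ log p_t, where p_t is the predicted probability of the gold label, α_t is a per-label balance factor, and γ is the modulation coefficient. The function names, the hypothetical label set, the α values, and the default γ = 2 are illustrative assumptions, not the authors' settings or implementation.

```python
import math
from typing import Sequence


def log_softmax(scores: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one token's label scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def focal_loss(
    logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    alpha: Sequence[float],
    gamma: float = 2.0,
) -> float:
    """Mean focal loss over the tokens of one sequence (hypothetical helper).

    logits  -- one row of unnormalised label scores per token
    targets -- gold label index for each token
    alpha   -- balance factor per label (assumed values, e.g. small for 'O')
    gamma   -- modulation coefficient; gamma=0 with all alpha=1 gives cross-entropy
    """
    if not targets or len(logits) != len(targets):
        raise ValueError("need one logits row per target, and at least one token")
    total = 0.0
    for row, y in zip(logits, targets):
        log_p_t = log_softmax(row)[y]          # log-probability of the gold label
        p_t = math.exp(log_p_t)
        modulation = (1.0 - p_t) ** gamma      # near 0 for confident (easy) tokens
        total += -alpha[y] * modulation * log_p_t
    return total / len(targets)


if __name__ == "__main__":
    # Hypothetical label set: 0 = O, 1 = B-MAL, 2 = I-MAL.
    scores = [[3.0, 0.2, 0.1], [0.4, 1.5, 0.3], [0.2, 0.1, 0.9]]
    gold = [0, 1, 2]
    print(focal_loss(scores, gold, alpha=[0.25, 1.0, 1.0]))
```

With γ = 0 and every α set to 1 the function reduces to ordinary cross-entropy, so the modulation term (1 - p_t)^γ is what shrinks the contribution of tokens the model already classifies confidently, which in threat-intelligence text are typically the abundant O tokens. A smaller α for a frequent label (0.25 for O in the example) additionally scales down each of that label's terms.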
