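Computer science
Cluster analysis
Feature (linguistics)
Similarity (geometry)
Information overload
Information sensitivity
Topic model
Data mining
Information extraction
Named-entity recognition
Information retrieval
Data science
Computer security
World Wide Web
Machine learning
Artificial intelligence
Task (project management)
Image (mathematics)
Management
Economics
Linguistics
Philosophy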
Authors
Chunyan Ma,Jun Jiang,Kai Zhang,Zhengwei Jiang,Peian Yang,Xuren Wang,Huamin Feng
Identifier
DOI:10.1109/trustcom60117.2023.00085
Abstract
To respond promptly to cyber threats that target a specific IT infrastructure (e.g., Windows or Linux), referred to here as fine-grained threats, security analysts require timely and comprehensive threat information. Twitter is a vital source of real-time threat information, but the growing number of data sources makes its abundant content overwhelming. Automatically mining and summarizing fine-grained threat information from Twitter can help security analysts keep the infrastructure secure. Most existing studies focus on classification, which conveys little threat information. Some works cluster tweets by text similarity, relying on text embeddings obtained from pre-trained models; this approach is ill-suited to short text and yields noisy clusters. Several works build topic models, but the resulting topic keywords are incoherent and therefore difficult to understand and analyze. To overcome these challenges, we design FineCTI, a framework that mines threat information related to a specific infrastructure on Twitter and generates a detailed threat information summary that is both machine-readable and human-readable, efficiently reducing information overload. FineCTI optimizes feature extraction with a named entity recognition model and clusters tweets on the extracted features, which effectively reduces the influence of tweet sparsity on the clustering result and improves the V-measure score by 7%. The cluster analysis results show that FineCTI can mine fine-grained threats up to 15 days before the official disclosure date.
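The abstract reports clustering quality as a V-measure score. As a minimal sketch of what that metric computes, assuming the standard definition (the harmonic mean of homogeneity and completeness, both derived from conditional entropies), the following pure-Python function scores a clustering against reference labels. The function names and the toy labels are illustrative assumptions; the abstract does not describe the evaluation data or tooling used for FineCTI.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(labels) in nats."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """Conditional entropy H(target | given) in nats."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum(
        (c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items()
    )


def v_measure(classes, clusters):
    """Harmonic mean of homogeneity and completeness (beta = 1)."""
    h_c = entropy(classes)
    h_k = entropy(clusters)
    # Homogeneity: each cluster contains members of a single class.
    homogeneity = 1.0 if h_c == 0 else 1.0 - conditional_entropy(classes, clusters) / h_c
    # Completeness: all members of a class land in the same cluster.
    completeness = 1.0 if h_k == 0 else 1.0 - conditional_entropy(clusters, classes) / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Hypothetical toy data: reference threat labels for six tweets and two clusterings.
reference = ["cve_a", "cve_a", "cve_a", "cve_b", "cve_b", "cve_b"]
perfect = [1, 1, 1, 0, 0, 0]    # same grouping, different cluster ids
one_blob = [0, 0, 0, 0, 0, 0]   # everything merged into one cluster

print(v_measure(reference, perfect))
print(v_measure(reference, one_blob))
```

Because the score depends only on how items are grouped, not on the cluster ids themselves, a relabelled but otherwise identical clustering scores the same as the reference, while merging every tweet into one cluster is complete but not homogeneous.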