Automatic summarization
Concepts: Computer science, Sentence, Artificial intelligence, Natural language processing, Task (project management), Semantics (computer science), Representation (politics), Multi-document summarization, Domain (mathematical analysis), Information retrieval, Economics, Management, Programming language, Law, Mathematical analysis, Politics, Mathematics, Political science
Authors
Jizhao Zhu, Wenyu Duan, Naitong Yu, Xinlong Pan, Chunlong Fan
Identifier
DOI:10.1007/978-3-031-46664-9_22
Abstract
Extractive automatic summarization methods can quickly and efficiently generate summaries through the steps of scoring sentences, extracting them, and eliminating redundancy. Most current extractive methods use deep learning and treat automatic summarization as a binary classification task. However, for long Chinese texts the effectiveness of automatic summarization is limited by the model's maximum input length, and such methods require a large amount of training data. This paper proposes an unsupervised extractive automatic summarization method that solves the long-text encoding problem by incorporating contextual semantics into sentence-level encoding. First, we obtain semantic representations of sentences using the RoBERTa model. Second, we propose an improved k-Means algorithm to cluster the sentence representations. By defining sparse and dense clusters, we improve the accuracy of summary-sentence selection while preserving as much semantic information from the original text as possible. Experimental results on the CAIL2020 dataset show that our method outperforms the baselines by 6.64/7.68/7.14% on ROUGE-1/2/L, respectively. Moreover, adding domain rules tailored to the dataset's characteristics further improves the results by 4.5/5.36/3.24%.
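To make the pipeline described in the abstract concrete, the sketch below shows a generic unsupervised extractive summarizer: sentences are encoded with a RoBERTa-family model, the sentence vectors are clustered with standard k-Means, and the sentence nearest each centroid is kept. This is only a minimal illustration of the general approach; the checkpoint name (hfl/chinese-roberta-wwm-ext), mean pooling, and the fixed cluster count are assumptions, and the paper's improved k-Means with sparse/dense clusters and its domain rules are not reproduced here.

```python
# Minimal sketch: RoBERTa sentence encoding + k-Means based sentence selection.
# Assumptions (not taken from the paper): checkpoint, mean pooling, plain KMeans.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.cluster import KMeans

MODEL_NAME = "hfl/chinese-roberta-wwm-ext"  # assumed Chinese RoBERTa checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def encode_sentences(sentences):
    """Mean-pool the last hidden states to obtain one vector per sentence."""
    batch = tokenizer(sentences, padding=True, truncation=True,
                      max_length=128, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state          # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()   # (B, T, 1)
    summed = (hidden * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return (summed / counts).numpy()

def extract_summary(sentences, num_clusters=3):
    """Cluster sentence vectors and keep the sentence closest to each centroid,
    returned in original document order."""
    vectors = encode_sentences(sentences)
    km = KMeans(n_clusters=min(num_clusters, len(sentences)),
                n_init=10, random_state=0).fit(vectors)
    chosen = set()
    for centroid in km.cluster_centers_:
        distances = np.linalg.norm(vectors - centroid, axis=1)
        chosen.add(int(distances.argmin()))
    return [sentences[i] for i in sorted(chosen)]

if __name__ == "__main__":
    doc = ["原告与被告于2018年签订借款合同。",
           "被告未按期偿还借款本金及利息。",
           "法院认为借款合同合法有效。",
           "判决被告于本判决生效之日起十日内偿还借款。"]
    print("\n".join(extract_summary(doc, num_clusters=2)))
```

Encoding each sentence separately keeps every input well under the model's length limit, which is the basic idea behind sentence-level encoding for long documents; selecting one sentence per cluster is one simple way to keep the summary both compact and semantically diverse.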