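Automatic summarization
Computer science
Artificial intelligence
Natural language processing
Interpretability
Sentence
Text graph
Vocabulary
Graph
Information retrieval
Theoretical computer science
Linguistics
Philosophy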
Authors
Xiangyu Luo,Jianguo Li,Zhanxuan Chen
Source
Journal: Communications in computer and information science
Date: 2024-01-01
Volume/Issue: : 3-16
Identifier
DOI:10.1007/978-981-99-9637-7_1
Abstract
Currently, abstractive summarization generally outperforms extractive summarization, but its complexity limits its use on long texts. Extractive summarization, though fast, tends to produce redundant output that lacks coherence between sentences. To address both problems for Chinese long-text summarization, we propose a two-stage hybrid model, HGNN-T5 PEGASUS. In the first stage, a heterogeneous graph-based extractive model produces an extractive summary that is shorter than the original text. The heterogeneous graph-based neural network (HGNN) incorporates sentence-level and word-level semantic nodes, which enrich the relationships between sentences. In the second stage, we choose T5 PEGASUS as the base model and add a category prediction mechanism to alleviate the Out-Of-Vocabulary problem and improve fidelity. In addition, a simplified sparse softmax is introduced into T5 PEGASUS to avoid overfitting. To demonstrate the effectiveness of our model, we constructed the SCHOLAT text summarization dataset. Experimental results show that the proposed model outperforms the baseline models on both the NLPCC 2018 and SCHOLAT datasets.
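The abstract says the first-stage graph contains both sentence-level and word-level nodes. As a rough illustration only, the sketch below builds a bipartite word-sentence graph in plain Python. The edge rule (a word node links to every sentence containing it, weighted by occurrence count) and the function names `build_word_sentence_graph` and `sentences_sharing_words` are assumptions made for this example; the abstract does not specify how the authors construct or weight edges.

```python
from collections import defaultdict


def build_word_sentence_graph(sentences):
    """Build a bipartite word-sentence graph (illustrative assumption).

    sentences: list of token lists, already segmented into words.
    Returns (word_nodes, edges), where edges maps
    (word, sentence_index) -> number of occurrences of word in that sentence.
    """
    edges = defaultdict(int)
    for idx, tokens in enumerate(sentences):
        for tok in tokens:
            edges[(tok, idx)] += 1
    word_nodes = sorted({word for word, _ in edges})
    return word_nodes, dict(edges)


def sentences_sharing_words(edges, idx):
    """Return indices of sentences linked to sentence idx via a shared word node."""
    words = {w for (w, s) in edges if s == idx}
    return sorted({s for (w, s) in edges if w in words and s != idx})


if __name__ == "__main__":
    # Hypothetical toy input: three pre-tokenized sentences.
    toy = [["graph", "model"], ["model", "summary"], ["dataset"]]
    word_nodes, edges = build_word_sentence_graph(toy)
    print(word_nodes)
    print(sentences_sharing_words(edges, 0))
```

In this toy setup, two sentences become connected whenever they share a word node, which is one simple way word-level nodes could enrich the relationships between sentences before an extractive model scores them.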