Authors
Zhongyi Wang,Haoxuan Zhang,Jiangping Chen,Haihua Chen
Abstract
The ex-ante novelty measurement of scientific literature is an essential tool for academic data mining and scientific communication: it helps researchers and peer experts quickly identify highly creative articles among large numbers of papers. This paper proposes a framework for measuring the novelty of scientific literature based on contribution sentence analysis. In the first part of the framework, to obtain the best models for contribution sentence identification and classification, we implement eight state-of-the-art deep learning models and compare their performance on each task. The selected contribution sentence identification model achieves the best recall and F1 scores (0.963 and 0.929, respectively), and the best contribution sentence classification model achieves a Micro F1 score of 0.897. In the second part, to represent each contribution sentence, we generate a contribution sentence cloud using the BERTopic model and the backward normal cloud generator. In the third part, we calculate novelty scores for scientific literature using a cloud similarity algorithm. Finally, against a manually constructed gold standard, we perform three comparative experiments with a semantic novelty measurement on the International Conference on Learning Representations (ICLR 2017-2022) dataset. In the correlation analysis, our measurement has a larger correlation coefficient with the gold standard than the semantic novelty measurement (0.805 > 0.580) at a p-value below 0.0001. In the distribution of differences from the gold standard, our measurement has 2,584 (79.2%) articles falling within the range of ±1.5, compared with 1,519 (46.6%) articles for the semantic novelty measurement. The boxplots likewise show that the results of our measurement are closer to the gold standard than those of the semantic novelty measurement.
These experimental results show that our measurement is more feasible and effective than the semantic novelty measurement. Our framework benefits several communities, including researchers, librarians, science evaluation institutions, policymakers, and funding agencies.
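The backward normal cloud generator mentioned in the abstract has a standard formulation in cloud model theory: from a sample it estimates three digital characteristics, expectation (Ex), entropy (En), and hyper-entropy (He). A minimal sketch of that estimator is given below; the accompanying similarity function is only an illustrative assumption (cosine similarity of the characteristic vectors), not the exact cloud similarity algorithm used in the paper.

```python
import numpy as np

def backward_normal_cloud(samples):
    """Estimate the cloud digital characteristics (Ex, En, He) from samples,
    using the standard backward normal cloud generator formulas."""
    x = np.asarray(samples, dtype=float)
    ex = x.mean()                                      # expectation Ex
    en = np.sqrt(np.pi / 2) * np.abs(x - ex).mean()    # entropy En
    he = np.sqrt(np.abs(x.var(ddof=1) - en ** 2))      # hyper-entropy He
    return ex, en, he

def cloud_similarity(c1, c2):
    """Illustrative similarity between two clouds: cosine similarity of
    their (Ex, En, He) vectors. The paper's metric may differ."""
    a, b = np.asarray(c1, dtype=float), np.asarray(c2, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cloud = backward_normal_cloud([1, 2, 3, 4, 5])
```

An identical pair of clouds yields a similarity of 1.0 under this metric, so lower scores would indicate greater divergence between a paper's contribution sentences and prior work.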