A learning-based approach for automatic construction of domain glossary from source code and documentation

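Glossary; Computer science; Documentation; Identifier; Natural language processing; Domain (mathematical analysis); Software documentation; Artificial intelligence; Heuristic; Context (archaeology); Information retrieval; Set (abstract data type); Internal documentation; Source code; Natural language; Software; Software development; Programming language; Linguistics; Software development process; Software construction; Philosophy; Paleontology; Mathematical analysis; Operating system; Biology; Mathematics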

Authors

Chong Wang, Xin Peng, Mingwei Liu, Zhenchang Xing, Xuefang Bai, Bing Xie, Tuo Wang

Identifier

DOI: 10.1145/3338906.3338963

Abstract

A domain glossary that organizes domain-specific concepts together with their aliases and relations is essential for knowledge acquisition and software development. Existing approaches use linguistic heuristics or term-frequency-based statistics to identify domain-specific terms from software documentation, and thus their accuracy is often low. In this paper, we propose a learning-based approach for the automatic construction of a domain glossary from source code and software documentation. The approach uses a set of high-quality seed terms, identified from code identifiers and natural language concept definitions, to train a domain-specific prediction model that recognizes glossary terms based on the lexical and semantic context of the sentences mentioning domain-specific concepts. It then merges the aliases of the same concept into its canonical name, selects a set of explanation sentences for each concept, and identifies "is a", "has a", and "related to" relations between the concepts. We apply our approach to the deep learning and Hadoop domains and harvest 5,382 and 2,069 concepts together with 16,962 and 6,815 relations, respectively. Our evaluation validates the accuracy of the extracted domain glossary and its usefulness for the fusion and acquisition of knowledge from different documents of different projects.
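To make two of the steps named in the abstract more concrete (deriving candidate terms from code identifiers, and merging aliases into a canonical name), here is a minimal Python sketch. It is not the authors' method: the paper trains a prediction model, whereas this toy version only splits identifiers on case and underscore boundaries and groups surface forms that normalize to the same key. The function names `split_identifier` and `merge_aliases`, the normalization rule, and the "most frequent form wins" choice of canonical name are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def split_identifier(identifier: str) -> List[str]:
    """Split a camelCase, snake_case, or spaced name into lowercase words."""
    words: List[str] = []
    # First cut on underscores and any non-word characters (spaces, dots, ...).
    for part in re.split(r"[_\W]+", identifier):
        # Then cut on case boundaries: acronyms, capitalized words, digit runs.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


def merge_aliases(mentions: List[str]) -> Dict[str, Set[str]]:
    """Group surface forms that share a normalized key (illustrative rule only).

    Returns a mapping from a canonical name to its other aliases. The canonical
    name is assumed to be the most frequent surface form in the group.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for mention in mentions:
        key = "".join(split_identifier(mention))
        if key:
            groups[key].append(mention)

    merged: Dict[str, Set[str]] = {}
    for forms in groups.values():
        # Ties are broken by string order so the result is deterministic.
        canonical = max(set(forms), key=lambda f: (forms.count(f), f))
        merged[canonical] = set(forms) - {canonical}
    return merged


if __name__ == "__main__":
    sample = ["DataNode", "data_node", "DataNode", "NameNode", "data node"]
    print(split_identifier("HDFSBlock"))
    print(merge_aliases(sample))
```

A purely lexical rule like this cannot catch aliases that differ in wording (abbreviations, synonyms), which is one reason a context-based model such as the one the abstract describes would be needed in practice.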
