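Computer science
Recommender system
Information retrieval
Data mining
Visibility
Cloud computing
Data science
Domain (mathematical analysis)
Mathematical analysis
Physics
Mathematics
Optics
Operating system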
Authors
Zitong Zhang,Yaseen Ashraf
Identifier
DOI:10.1109/icict58900.2023.00040
Abstract
With the rapid development of cloud data and online collaboration platforms, researchers increasingly make their data publicly available for experimental reproducibility and data reusability. On one hand, sharing data with collaborators increases the visibility of the work. On the other hand, the abundance of data across multiple platforms makes it hard for researchers to find data relevant to their own research. To overcome this challenge, a dataset recommendation system capable of finding relevant datasets from multiple resources would be helpful. In the past two decades, only a few dataset recommendation methods have been implemented, and most are domain-specific or simply recommend datasets based on keywords. We believe that a general dataset recommender system, one that recommends datasets using information either extracted from another dataset or supplied by researchers, can make searching for relevant data more efficient and significantly improve research efficiency. This work adopts an information retrieval (IR) paradigm for dataset recommendation. By extracting summary information from each dataset and generating a profile for each, we use and compare multiple content-based recommendation methods to recommend the most relevant datasets in GEO, SRA, and several other repositories. Our results and evaluations demonstrate the usefulness of, and need for, such a system.
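The abstract does not say which content-based methods were compared. As an illustration only, the sketch below ranks dataset profiles against a query using TF-IDF weighting and cosine similarity, one common content-based approach and not necessarily the authors' method. The profile texts, the `DS-*` identifiers, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical dataset profiles (id -> summary text); not taken from the paper.
PROFILES = {
    "DS-A": "RNA-seq of human breast cancer tumor samples",
    "DS-B": "mouse liver gene expression microarray",
    "DS-C": "16S rRNA sequencing of soil microbial communities",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weight(tokens, idf):
    """Build a sparse TF-IDF vector; tokens unknown to the corpus are dropped."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def build_tfidf(profiles):
    """Return (TF-IDF vectors per profile, smoothed IDF table)."""
    tokenized = {key: tokenize(text) for key, text in profiles.items()}
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {key: weight(tokens, idf) for key, tokens in tokenized.items()}
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(query, profiles, top_k=3):
    """Return up to top_k (id, score) pairs with nonzero similarity, best first."""
    vectors, idf = build_tfidf(profiles)
    query_vec = weight(tokenize(query), idf)
    scored = [(cosine(query_vec, vec), key) for key, vec in vectors.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(key, score) for score, key in scored[:top_k] if score > 0]


if __name__ == "__main__":
    for dataset_id, score in recommend("breast cancer RNA-seq expression", PROFILES):
        print(dataset_id, round(score, 3))
```

The query here is free text supplied by a researcher; passing the profile text of an existing dataset as the query gives the dataset-to-dataset case the abstract mentions.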