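Computer science
Semantic similarity
Similarity (geometry)
Scalability
Semantic analysis
Data mining
Semantic Web
Pipeline (software)
Information retrieval
Semantic computing
Artificial intelligence
Theoretical computer science
Database
Programming language
Image (mathematics)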
Authors
Carsten Felix Draschner, Hajira Jabeen, Jens Lehmann
Source
Journal: International Journal of Semantic Computing
[World Scientific]
Date: 2023-04-18
Volume 17, Issue 02, pp. 199-221
Identifiers
DOI:10.1142/s1793351x23600012
Abstract
In recent years, exciting sources of data have been modeled as knowledge graphs (KGs). This modeling represents both structural relationships and the entity-specific multi-modal data in KGs. In various data analytics pipelines and machine learning (ML), the task of semantic similarity estimation plays a significant role. Assigning similarity values to entity pairs is needed in recommendation systems, clustering, classification, entity matching/disambiguation and many other tasks. Efficient and scalable frameworks are needed to handle the quadratic complexity of all-pair semantic similarity on Big Data KGs. Moreover, heterogeneous KGs demand multi-modal semantic similarity estimation to cover versatile content such as categorical relations between classes or their attribute literals, e.g., strings, timestamps or numeric data. In this paper, we propose the SimE4KG framework as a resource providing generic open-source modules that perform semantic similarity estimation in multi-modal KGs. To justify the computational costs of similarity estimation, the SimE4KG framework generates reproducible, reusable and explainable results. The pipeline outputs a native semantic RDF KG that includes the experiment results, the hyper-parameter setup and explanations of the results, such as the most influential features. For fast and scalable in-memory execution, we implemented the distributed approach using Apache Spark. The entire development of this framework is integrated into the holistic distributed Semantic ANalytics StAck (SANSA).
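To make the idea of multi-modal similarity between KG entities concrete, here is a minimal single-machine Python sketch. It is not SimE4KG's API and does not reproduce its distributed Apache Spark implementation; every name in it (`Entity`, `jaccard`, `scaled_distance_sim`, `feature_similarities`, `similarity`, `all_pairs`, the `num_scale`/`day_scale` parameters and the `ex:` URIs) is hypothetical. The per-modality measures (Jaccard overlap for categorical relations, an inverse scaled distance for numbers and dates, `difflib` ratio for strings), the unweighted mean and the "top feature" explanation are illustrative choices, not the paper's method.

```python
from dataclasses import dataclass, field
from datetime import date
from difflib import SequenceMatcher
from itertools import combinations


@dataclass
class Entity:
    """Hypothetical flattened view of one KG entity, grouped by modality."""
    uri: str
    relations: dict = field(default_factory=dict)  # predicate -> set of object URIs
    numbers: dict = field(default_factory=dict)    # predicate -> numeric literal
    dates: dict = field(default_factory=dict)      # predicate -> datetime.date literal
    strings: dict = field(default_factory=dict)    # predicate -> string literal


def jaccard(a: set, b: set) -> float:
    """Categorical relations: overlap of object sets (two empty sets count as identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def scaled_distance_sim(x: float, y: float, scale: float) -> float:
    """Numeric/temporal literals: 1.0 for equal values, decaying towards 0 as they diverge."""
    return 1.0 / (1.0 + abs(x - y) / scale)


def feature_similarities(a: Entity, b: Entity, num_scale=1.0, day_scale=365.0) -> dict:
    """Per-feature similarity for every predicate both entities share.

    Assumes each predicate belongs to exactly one modality.
    """
    sims = {}
    for p in a.relations.keys() & b.relations.keys():
        sims[p] = jaccard(a.relations[p], b.relations[p])
    for p in a.numbers.keys() & b.numbers.keys():
        sims[p] = scaled_distance_sim(a.numbers[p], b.numbers[p], num_scale)
    for p in a.dates.keys() & b.dates.keys():
        days = (a.dates[p] - b.dates[p]).days  # signed difference in days
        sims[p] = scaled_distance_sim(days, 0, day_scale)
    for p in a.strings.keys() & b.strings.keys():
        sims[p] = SequenceMatcher(None, a.strings[p], b.strings[p]).ratio()
    return sims


def similarity(a: Entity, b: Entity, **scales):
    """Unweighted mean over shared features, plus the feature contributing most to it."""
    sims = feature_similarities(a, b, **scales)
    if not sims:
        return 0.0, None
    score = sum(sims.values()) / len(sims)
    top_feature = max(sims, key=sims.get)  # largest share of the mean
    return score, top_feature


def all_pairs(entities, **scales):
    """Naive all-pair estimation: n*(n-1)/2 comparisons, i.e. quadratic in n."""
    return {(a.uri, b.uri): similarity(a, b, **scales) for a, b in combinations(entities, 2)}


# Usage on three made-up entities.
entities = [
    Entity("ex:a", relations={"ex:genre": {"ex:Drama", "ex:Crime"}},
           numbers={"ex:runtime": 120.0}, dates={"ex:released": date(2000, 1, 1)},
           strings={"ex:title": "Night Train"}),
    Entity("ex:b", relations={"ex:genre": {"ex:Drama"}},
           numbers={"ex:runtime": 118.0}, dates={"ex:released": date(2001, 1, 1)},
           strings={"ex:title": "Night Trains"}),
    Entity("ex:c", relations={"ex:genre": {"ex:Comedy"}},
           numbers={"ex:runtime": 90.0}, strings={"ex:title": "Sunny Day"}),
]

results = all_pairs(entities, num_scale=10.0)
for (u, v), (score, top) in results.items():
    print(f"{u} ~ {v}: {score:.3f} (top feature: {top})")
```

The `all_pairs` helper makes the quadratic cost explicit: three entities already need three comparisons, and a KG with millions of entities needs on the order of 10^12, which is why the abstract argues for a distributed, in-memory execution engine. Returning the highest-scoring feature alongside the score is a very simple stand-in for the kind of explanation ("most influential features") that the framework is described as emitting.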