Keywords
Computer science
Polysemy
Natural language processing
Artificial intelligence
Semantic similarity
Word embedding
Lexicon
Term
Similarity (geometry)
Word
Embedding
Semantic compression
Semantic computing
Linguistics
Semantic Web
Semantic technology
Task (project management)
Image (mathematics)
Authors
Zhuo Zhuang, Yuquan Chen
Identifier
DOI: 10.1007/978-3-030-01716-3_17
Abstract
Word embeddings have recently been widely used to model words in Natural Language Processing (NLP) tasks, including semantic similarity measurement. However, word embeddings cannot capture polysemy, because a polysemous word is represented by a single vector. To address this problem, it is necessary and intuitive to learn multiple embedding vectors for the different senses of a word. We present a novel approach based on a Chinese lexicon to learn sense embeddings. Every sense is represented by a vector consisting of the semantic contributions made by the senses that explain it. To make full use of the lexicon's advantages and address its drawbacks, we perform representation expansion to make sparse embedding vectors dense, and we disambiguate polysemous words in glosses by semantic contribution allocation. Thanks to an intuitive noise-filtering step, we achieve noticeable improvement in both dimensionality reduction and semantic similarity measurement. We perform experiments on a translated version of the Miller-Charles dataset and report state-of-the-art performance on semantic similarity measurement. We also apply our approach to SemEval-2012 Task 4: Evaluating Chinese Word Similarity, which uses a translated version of wordsim353 as the standard dataset, and our approach again noticeably outperforms conventional approaches.
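The abstract does not give the exact formulas, but the core idea of measuring similarity between two polysemous words with multiple sense embeddings is commonly realized by comparing the best-matching pair of sense vectors. The sketch below illustrates that general scheme with cosine similarity; the function names, toy vectors, and the max-over-senses rule are illustrative assumptions, not the authors' published method.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two dense embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def word_similarity(senses_a, senses_b):
    """Similarity of two polysemous words, each given as a list of sense
    embedding vectors: score the best-matching sense pair (an illustrative
    convention, not necessarily the paper's exact aggregation)."""
    return max(cosine(a, b) for a in senses_a for b in senses_b)

# Toy example: word A has two senses, word B has one sense.
senses_a = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
senses_b = [np.array([0.0, 0.9, 0.1])]
print(round(word_similarity(senses_a, senses_b), 3))  # close match on A's second sense
```

Scoring the maximum over sense pairs lets one well-aligned sense dominate, which is why sense-level embeddings can outperform a single averaged word vector on similarity benchmarks such as the ones mentioned above.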