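Computer science
Vector space model
Interpretability
Feature vector
Representation (politics)
Ranking (information retrieval)
Vocabulary
Artificial intelligence
Vector space
Support vector machine
Information retrieval
Mathematics
Linguistics
Philosophy
Geometry
Politics
Political science
Law
Identifier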
DOI:10.1145/3583780.3615282
Abstract
While dense retrieval has become a promising alternative to traditional text retrieval models such as BM25, some recent studies show that multi-vector dense retrieval models are more effective than single-vector models. However, due to a lack of interpretability, why the multi-vector method outperforms its single-vector counterpart has not been fully studied. To fill this research gap, we investigate and compare the behaviors of single-vector and multi-vector models in retrieval. Specifically, we analyze the vocabulary distribution of dense representations by mapping them back to the sparse vocabulary space. Our empirical findings show that the multi-vector representation has more lexical overlap between queries and passages. Additionally, we show that this property of the multi-vector representation can enhance ranking performance when a given passage can fulfill different information needs and thus can be retrieved by different queries. These results shed light on the internal mechanisms of the multi-vector representation and may provide new perspectives for future research.
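
The idea of mapping dense representations back to the vocabulary space and measuring lexical overlap can be illustrated with a minimal sketch. It is not the authors' implementation: the projection matrix `proj`, the max-pooling over vectors, the top-`k` cutoff, and the helper names `top_k_tokens` and `lexical_overlap` are all assumptions made for illustration, and the toy vectors are invented.

```python
import numpy as np


def top_k_tokens(vectors, proj, k):
    """Return the ids of the k vocabulary entries with the highest scores.

    vectors: (n, d) array, one row per dense vector (n = 1 for single-vector).
    proj:    (vocab_size, d) array, a hypothetical dense-to-vocabulary projection.
    """
    logits = vectors @ proj.T      # (n, vocab_size) scores per vector
    pooled = logits.max(axis=0)    # assumed aggregation: max over the n vectors
    top = np.argsort(-pooled)[:k]  # indices of the k largest pooled scores
    return {int(i) for i in top}


def lexical_overlap(query_vecs, passage_vecs, proj, k=5):
    """Count vocabulary ids shared by the top-k sets of a query and a passage."""
    return len(top_k_tokens(query_vecs, proj, k) & top_k_tokens(passage_vecs, proj, k))


# Toy setup: 4-word vocabulary, identity projection (purely illustrative).
proj = np.eye(4)
query_multi = np.array([[3.0, 0.0, 0.0, 0.0],
                        [0.0, 2.0, 0.0, 0.0]])
passage_multi = np.array([[0.0, 5.0, 0.0, 0.0],
                          [0.0, 0.0, 4.0, 0.0]])

# A single-vector stand-in, obtained here by mean pooling (an assumption).
query_single = query_multi.mean(axis=0, keepdims=True)
passage_single = passage_multi.mean(axis=0, keepdims=True)

multi_overlap = lexical_overlap(query_multi, passage_multi, proj, k=2)
single_overlap = lexical_overlap(query_single, passage_single, proj, k=2)
```

In a real analysis, `proj` would come from whatever mechanism ties the encoder's hidden space to its vocabulary, and the overlap would be averaged over many query and passage pairs; the toy values here say nothing about the paper's findings.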