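Rank (graph theory)
Computer science
Similarity (geometry)
Matching (statistics)
Embedding
Unification
Similarity
Learning to rank
Mathematics
Theoretical computer science
Artificial intelligence
Ranking (information retrieval)
Statistics
Image (mathematics)
Combinatorics
Programming language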
Authors
Behrooz Mansouri, Richard Zanibbi, Douglas W. Oard
Identifier
DOI:10.1145/3404835.3462956
Abstract
In Mathematical Information Retrieval (MIR), formulae in a query can be used to find similar formulae in documents. However, because formulae are structurally complex, formula matching requires specialized processing. A formula may be represented by its appearance, as a Symbol Layout Tree (SLT), or by its syntax, as an Operator Tree (OPT). Previous formula retrieval approaches used one or both of these representations and applied unification to improve results for inexact matches (e.g., allowing different variable names to match). Over these representations, models have been used to match full expressions (trees), subexpressions, and paths. More recently, embedding models have been used to represent formulae as vectors. In this paper, the effectiveness of these retrieval models and formula representations is studied to identify their relative strengths and weaknesses. A learning-to-rank model is then proposed that applies SVM-rank to similarity scores from different formula retrieval models used as features. Experimental results on the ARQMath formula retrieval task show that the proposed learning-to-rank model is effective, producing new state-of-the-art results.
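To make the learning-to-rank idea concrete, here is a minimal sketch of a linear pairwise ranker in the style of SVM-rank (a RankSVM). It assumes each candidate formula is described by a hypothetical three-value feature vector (for example, an SLT-based, an OPT-based, and an embedding-based similarity score) plus a graded relevance label. The sketch trains with plain stochastic subgradient descent rather than the solver used by the SVM-rank tool, and the feature names, data, and hyperparameters are invented for illustration; it is not the authors' implementation.

```python
import itertools
import random


def pairwise_differences(queries):
    """Build RankSVM training vectors from per-query candidate lists.

    For each pair of candidates under the same query whose relevance grades
    differ, emit x_better - x_worse. Pairs with equal grades are skipped
    because they impose no ordering constraint.
    """
    diffs = []
    for candidates in queries.values():
        for (xa, ya), (xb, yb) in itertools.combinations(candidates, 2):
            if ya == yb:
                continue
            better, worse = (xa, xb) if ya > yb else (xb, xa)
            diffs.append([b - w for b, w in zip(better, worse)])
    return diffs


def score(w, features):
    """Linear ranking function: dot product of weights and feature scores."""
    return sum(wi * fi for wi, fi in zip(w, features))


def train_rank_svm(diffs, c=1.0, epochs=200, lr=0.01, seed=0):
    """Minimise 0.5*||w||^2 + c * sum_k max(0, 1 - w . d_k).

    Uses stochastic subgradient descent; the L2 term is split evenly over
    the n training pairs so the per-pair gradients sum to the full gradient.
    """
    rng = random.Random(seed)
    n = len(diffs)
    w = [0.0] * len(diffs[0])
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for k in order:
            d = diffs[k]
            hinge_active = score(w, d) < 1.0  # margin violated for this pair
            for i in range(len(w)):
                grad = w[i] / n
                if hinge_active:
                    grad -= c * d[i]
                w[i] -= lr * grad
    return w


# Hypothetical features per candidate: [slt_score, opt_score, embedding_score]
# with graded relevance labels (2 = relevant, 1 = partial, 0 = not relevant).
queries = {
    "q1": [([0.9, 0.8, 0.7], 2), ([0.4, 0.9, 0.2], 1), ([0.1, 0.2, 0.3], 0)],
    "q2": [([0.8, 0.3, 0.9], 2), ([0.7, 0.1, 0.2], 0)],
}

w = train_rank_svm(pairwise_differences(queries))
for qid, candidates in queries.items():
    ranked = sorted(candidates, key=lambda cand: score(w, cand[0]), reverse=True)
    print(qid, [label for _, label in ranked])
# q1 [2, 1, 0]
# q2 [2, 0]
```

Because the ranking function is linear, score(w, a) - score(w, b) equals w · (a - b), so any training pair whose difference vector receives a positive margin is ordered correctly. Building pairs only within a query mirrors how SVM-rank uses query IDs to restrict which candidates are compared.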