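Relevance (law)
Computer science
Information retrieval
Benchmark (surveying)
Query expansion
Selection (genetic algorithm)
Focus (optics)
Data mining
Construct (Python library)
Data science
Artificial intelligence
Law
Political science
Geodesy
Physics
Programming language
Geography
Optics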
Authors
Yixiao Ma, Yunqiu Shao, Yueyue Wu, Yiqun Liu, Ruizhe Zhang, Min Zhang, Shaoping Ma
Identifier
DOI:10.1145/3404835.3463250
Abstract
Legal case retrieval is of vital importance for ensuring justice in different kinds of law systems and has recently received increasing attention in information retrieval (IR) research. However, the relevance judgment criteria of previous retrieval datasets are either not applicable to non-cited relationship cases or not instructive enough for future datasets to follow. Besides, most existing benchmark datasets do not focus on the selection of queries. In this paper, we construct the Chinese Legal Case Retrieval Dataset (LeCaRD), which contains 107 query cases and over 43,000 candidate cases. Queries and results are adopted from criminal cases published by the Supreme People's Court of China. In particular, to address the difficulty in relevance definition, we propose a series of relevance judgment criteria designed by our legal team and corresponding candidate case annotations are conducted by legal experts. Also, we develop a novel query sampling strategy that takes both query difficulty and diversity into consideration. For dataset evaluation, we implemented several existing retrieval models on LeCaRD as baselines. The dataset is now available to the public together with the complete data processing details.