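Computer science
Quality (philosophy)
Vocabulary
Task (project management)
Language model
Information extraction
Data mining
Precision and recall
Data quality
Recall
Information retrieval
Artificial intelligence
Engineering
Metric (unit)
Philosophy
Linguistics
Operations management
Systems engineering
Epistemology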
Authors
Zongcai Huang, Peng Peng, Feng Lü, He Zhang
Abstract
Knowledge-driven GIS increasingly relies on multi-source, multi-type, and multi-model crowd-sensing spatiotemporal data, whose quality is difficult to guarantee and to assess. Quality indicator information appears widely in unstructured web texts, so extracting it is crucial for supplying supplementary quality information for crowd-sensing spatiotemporal data. Recent large language models show potential for this extraction task, but they still struggle to accurately extract the varied quality indicators used for crowd-sensing spatiotemporal data. We therefore designed LLMFT-STQIE, a large language model fine-tuned to extract spatiotemporal quality indicator information from quality description texts. First, we establish a quality indicator vocabulary that is used to determine whether a text contains quality indicator information about spatiotemporal data. Then, we create a two-stage prompt model with QILE and QIVE prompts, each comprising the input text, task type, instructions, the quality indicator vocabulary, output format, and a reference case. The prompt model is built on large language model fine-tuning. LLMFT-STQIE achieves an accuracy of 91% and a recall of 80%, improvements of 23% and 38%, respectively, over untuned large language models. These results show that the proposed method extracts quality indicator information for crowd-sensing spatiotemporal data from web texts easily and accurately. The study also informs strategies for optimizing large language models for specific scenarios and tasks.
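The abstract names the ingredients of the method but not their concrete form. These ingredients are a quality indicator vocabulary used to screen texts, and two prompts (QILE and QIVE), each built from the input text, task type, instructions, vocabulary, output format, and a reference case. The Python sketch below assembles those pieces under stated assumptions. The vocabulary entries, the stage wording, and the helpers `matched_indicators`, `build_prompt`, and `precision_recall` are all hypothetical. QILE is taken here to identify which indicators a text describes and QIVE to extract their values. That is a reading of the acronyms, not something the abstract states. No model call or fine-tuning step is shown.

```python
from __future__ import annotations

import json

# HYPOTHETICAL vocabulary: indicator name -> surface terms that signal it.
# Illustrative entries only; the paper's actual vocabulary is not given here.
QUALITY_VOCAB = {
    "positional accuracy": ["positional accuracy", "position error", "location error"],
    "temporal accuracy": ["temporal accuracy", "time lag", "timestamp error"],
    "completeness": ["completeness", "missing rate", "coverage"],
    "update frequency": ["update frequency", "updated every", "refresh rate"],
}

# HYPOTHETICAL stage definitions (assumed reading of QILE / QIVE):
# stage 1 finds which indicators a text mentions, stage 2 extracts their values.
STAGES = {
    "QILE": {
        "task": "quality indicator identification",
        "instructions": "List every quality indicator from the vocabulary that the input text describes.",
        "output": '{"indicators": ["<indicator name>", ...]}',
    },
    "QIVE": {
        "task": "quality indicator value extraction",
        "instructions": "For each listed indicator, copy the value the input text states for it.",
        "output": '{"<indicator name>": "<value as written>", ...}',
    },
}


def matched_indicators(text: str, vocab: dict[str, list[str]]) -> list[str]:
    """Vocabulary screen: return indicator names whose terms occur in the text."""
    lowered = text.lower()
    return sorted(
        name for name, terms in vocab.items() if any(term in lowered for term in terms)
    )


def build_prompt(stage: str, text: str, vocab: dict[str, list[str]],
                 reference_case: str, indicators: list[str] | None = None) -> str:
    """Assemble a prompt from the components the abstract lists."""
    spec = STAGES[stage]
    parts = [
        f"Task type: {spec['task']}",
        f"Instructions: {spec['instructions']}",
        f"Quality indicator vocabulary: {json.dumps(sorted(vocab))}",
        f"Output format: {spec['output']}",
        f"Reference case: {reference_case}",
    ]
    if indicators:  # the second stage is conditioned on the first stage's result
        parts.append(f"Indicators to extract: {', '.join(indicators)}")
    parts.append(f"Input text: {text}")
    return "\n".join(parts)


def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Set-based scoring of extracted (indicator, value) pairs."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    sample = "The GPS traces have a position error below 5 m and are updated every 10 minutes."
    found = matched_indicators(sample, QUALITY_VOCAB)
    print(found)
    if found:  # only texts that pass the vocabulary screen go on to the model
        print(build_prompt("QILE", sample, QUALITY_VOCAB, "reference case omitted"))
```

Plain substring matching is the simplest possible screen. It will miss paraphrases and can match inside longer words, so it only stands in for whatever matching the authors used. The set-based `precision_recall` helper is one common way to score extraction output. It is not necessarily how the reported 91% and 80% figures were computed.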