Aviation
Computer science
Encoding (memory)
Aviation accident
Algorithm
Artificial intelligence
Data mining
Engineering
Aerospace engineering
Authors
Yubing Gao,Guangyu Zhu,Ya Duan,Jianfeng Mao
Identifier
DOI:10.1109/tase.2024.3359356
Abstract
Automated analysis of aviation safety reports helps prevent future accidents and improve emergency response capabilities. To date, no large-scale aviation text similarity dataset is publicly available, which hinders the application of NLP techniques in the aviation domain. We present an automatically created aviation text similarity dataset of more than 500,000 pairs for fine-tuning pretrained language models. Because technical terms carry specialized meanings that differ from everyday language, we propose an efficient semantic encoding algorithm that improves how well embeddings represent aviation terms. We provide new solutions and revised evaluation metrics for the classification and retrieval of safety reports, confirming the reliability of our dataset and the superiority of our algorithm.
Note to Practitioners: Text representation is an essential task in natural language processing (NLP). A crucial step towards applying NLP to safety report analysis is ensuring that aviation texts are adequately encoded. To address the poor ability of current embeddings to represent technical terms, we automatically create an aviation text similarity dataset and propose a semantic encoding algorithm for aviation terms. The proposed method shows great potential for representing technical terms, and thus supports downstream tasks such as text classification, information retrieval, and question answering.