A two-channel end-to-end network based on dynamic corpus of knowledge graph for intelligent recognition of Traditional Chinese Medicine terminology

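Terminology; Computer science; Artificial intelligence; Natural language processing; Readability; Domain (mathematics); Pronunciation; Classifier (UML); Unified Medical Language System; Linguistics; Mathematics; Philosophy; Programming language; Pure mathematics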

Authors

Yulu Wu, Kun Wang, Xiufeng Liu

Source

Journal: Research Square. Date: 2023-12-07

Links

researchsquare.com, doi.org

Identifiers

DOI: 10.21203/rs.3.rs-3712568/v1

Abstract

The accurate analysis of Traditional Chinese Medicine (TCM) terminology is a research hotspot in the field of TCM, because it supports information exchange between TCM practitioners and patients and thereby more accurate diagnosis and treatment. TCM terminology takes two forms: speech and text. Current methods for TCM terminology recognition mostly rely on deep learning models. However, these methods are hindered by an insufficient corpus and by defects of the end-to-end learning framework, which leads to low recognition accuracy. To address these problems, this paper first combines the text and picture information of TCM terminology and proposes a corpus expansion model for TCM terminology. The model jointly optimizes text and picture information on a knowledge graph, and it supplements the traditional corpus by incrementally constructing a dynamic TCM terminology corpus. Second, a text-speech end-to-end conversion mechanism is used to expand the TCM dynamic speech corpus synchronously and incrementally. The dynamic speech corpus is then used to train a unified streaming and non-streaming two-pass end-to-end model for accurate recognition of spoken TCM terminology. Owing to the language habits of TCM experts, spoken TCM contains many redundant words, which greatly reduces the readability of the transcribed text. We therefore propose a redundant word detection framework based on a directed acyclic graph (DAG) and dynamic programming (DP) to screen out redundant words and identify TCM terminology accurately. The results show that the accuracy of the proposed speech recognition algorithm on spoken TCM terminology is 12.92%, 11.18%, and 19.95% higher than that of three public speech recognition engines, iFlytek, Aliyun, and Baidu, respectively. As an incremental contribution, this paper also optimizes the handling of redundant words on top of speech recognition, which greatly improves the ability to produce text that conforms to TCM terminology.
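The abstract does not describe the internals of the DAG- and DP-based redundant word detector. As a rough illustration of the general idea only, the sketch below builds a word DAG over a transcript from a lexicon, uses dynamic programming to pick the most probable segmentation, and then drops tokens listed as fillers. The lexicon, its frequencies, the filler list, and the function names (`build_dag`, `best_path`, `strip_fillers`) are all hypothetical and are not taken from the paper; a real system would also need context to decide when a word such as "就是" is actually redundant.

```python
import math

# Hypothetical toy lexicon: word -> relative frequency (illustrative values only).
LEXICON = {
    "那个": 50, "就是": 40, "嗯": 60,
    "气虚": 30, "血瘀": 30, "气": 10, "虚": 10, "血": 10, "瘀": 5,
    "患者": 20, "证": 8,
}
# Hypothetical list of spoken filler (redundant) words.
FILLERS = {"那个", "就是", "嗯"}


def build_dag(text, lexicon):
    """Map each start index to the end indices (exclusive) of lexicon words."""
    n = len(text)
    dag = {}
    for i in range(n):
        ends = [j for j in range(i + 1, n + 1) if text[i:j] in lexicon]
        dag[i] = ends or [i + 1]  # fall back to a single character
    return dag


def best_path(text, lexicon):
    """Return the maximum log-probability segmentation via right-to-left DP."""
    log_total = math.log(sum(lexicon.values()))
    n = len(text)
    dag = build_dag(text, lexicon)
    # route[i] = (best log-probability of text[i:], end index of its first word)
    route = {n: (0.0, n)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (math.log(lexicon.get(text[i:j], 1)) - log_total + route[j][0], j)
            for j in dag[i]
        )
    words, i = [], 0
    while i < n:
        j = route[i][1]
        words.append(text[i:j])
        i = j
    return words


def strip_fillers(text):
    """Segment the transcript, then remove tokens that appear in FILLERS."""
    words = best_path(text, LEXICON)
    cleaned = "".join(w for w in words if w not in FILLERS)
    return words, cleaned


if __name__ == "__main__":
    print(strip_fillers("嗯患者就是气虚血瘀证"))
```

With this toy lexicon, the DP prefers the two-character terms "气虚" and "血瘀" over their single-character splits, so the fillers are removed without breaking the terminology apart.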
