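Sentence (law)
Computer science
Artificial intelligence
Word (group theory)
Task (project management)
Representation (politics)
Natural language processing
Selection (genetic algorithm)
Vector space model
Word embedding
Set (abstract data type)
Similarity (geometry)
Feature vector
Feature (linguistics)
Embedding
Pattern recognition (psychology)
Mathematics
Linguistics
Philosophy
Geometry
Management
Politics
Political science
Law
Economics
Image (mathematics)
Programming language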
Authors
Hongpeng Tian,Jia Jiang
Source
Venue: 2020 International Conference on Computer Engineering and Application (ICCEA)
Date: 2020-03-01
Volume/Issue: 36: 659-663
Citations: 1
Identifier
DOI:10.1109/iccea50009.2020.00144
Abstract
Neural-network methods for computing word vectors motivate the construction of sentence representation models. The smooth inverse frequency (SIF) sentence vector model computes word weights only from word frequency information on a general dataset; it does not account for the fact that, for a specific task, different words contribute differently, and it does not correct the weights accordingly. Based on the distribution of feature words across the categories of a dataset, a task contribution factor (TCF) is proposed using an improved information gain feature selection method. Using this factor, a task-contribution-based sentence vector representation model (IIG-SIF) is proposed. In tests on the standard text classification dataset 20 Newsgroups and the text similarity dataset SICK, the IIG-SIF model improves on the original SIF model in both text classification and text similarity calculation.
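For orientation, the SIF weighting that the abstract builds on can be sketched as follows. This is a minimal illustration, not the authors' implementation: the weight a / (a + p(w)) and the removal of the first singular direction follow the commonly described SIF recipe, while the function name, the parameter names, and the optional task_factor dictionary are assumptions made here. In particular, task_factor is only a placeholder for a per-word multiplier such as the TCF; the abstract does not give its formula, so none is implemented.

```python
import numpy as np

def sif_sentence_vectors(sentences, word_vectors, word_prob,
                         a=1e-3, task_factor=None, remove_pc=True):
    """Sketch of SIF-style sentence vectors (all names are illustrative).

    sentences    : list of token lists
    word_vectors : dict token -> 1-D array-like embedding
    word_prob    : dict token -> unigram probability p(w) on a general corpus
    task_factor  : optional dict token -> multiplier (hypothetical stand-in
                   for a task contribution factor; formula not given here)
    """
    dim = len(next(iter(word_vectors.values())))
    rows = []
    for tokens in sentences:
        known = [t for t in tokens if t in word_vectors]
        if not known:
            # No known words: fall back to a zero vector.
            rows.append(np.zeros(dim))
            continue
        # SIF weight: frequent words get smaller weights.
        weights = np.array([a / (a + word_prob.get(t, 0.0)) for t in known])
        if task_factor is not None:
            # Assumed correction: rescale each weight by a per-word factor.
            weights = weights * np.array([task_factor.get(t, 1.0) for t in known])
        vecs = np.array([word_vectors[t] for t in known], dtype=float)
        rows.append(weights @ vecs / len(known))
    X = np.vstack(rows)
    if remove_pc:
        # Remove each vector's projection onto the first right singular vector.
        _, _, vt = np.linalg.svd(X, full_matrices=False)
        u = vt[0]
        X = X - np.outer(X @ u, u)
    return X
```

With task_factor left as None the sketch reduces to plain SIF weighting; passing a dictionary of per-word multipliers shows where a task-specific correction would enter the weighted average.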