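Word2vec
Jaccard index
Computer science
Similarity (geometry)
Index (typography)
Trademark
Latent semantic analysis
Information retrieval
Cloud computing
Artificial intelligence
Data mining
Data science
World Wide Web
Cluster analysis
Image (mathematics)
Embedding
Operating system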
Authors
João Marcos de Rezende,Izabella Martins da Costa Rodrigues,Leandro Colombi Resendo,Karin Satie Komati
Identifier
DOI:10.1080/09537325.2022.2110054
Abstract
Keyword search is the most common tool offered by patent offices; however, their websites do not provide free software for more advanced research. This paper therefore proposes a data-mining framework for patent documents that combines natural language processing techniques with data analysis algorithms. The system has two main goals: technological prospecting analysis and the evaluation of similarity among patents based on their titles and abstracts. For the numerical experiments, we used the US Patent and Trademark Office database, with over a million documents. An S-curve analysis of patents on TFT-LCD, Flash Memory and PDA from 2010 to 2018 showed that the last two technologies are in decline. Word clouds made it possible to see the evolution of the phone from 2010 to 2015. To evaluate the degree of similarity among patents, we investigated Latent Semantic Analysis (LSA), Word2vec and Word Mover's Distance (WMD) in three different case studies. In addition, these methods were compared with the classical Jaccard index. Numerical results show that LSA and WMD produced similar patent indications, while the Jaccard index produced indications that differed from those of the other three methods.
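As an illustration of the classical Jaccard index mentioned in the abstract, the sketch below scores two patent titles by the overlap of their word sets. It is not the authors' implementation: the function names `tokens` and `jaccard`, the lowercase alphanumeric tokenization, and the sample titles are all assumptions made for this example.

```python
import re


def tokens(text):
    """Return the set of lowercase alphanumeric tokens in text (assumed tokenization)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| over the token sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    if not union:
        # Both texts are empty: define the similarity as 0.0 to avoid dividing by zero.
        return 0.0
    return len(ta & tb) / len(union)


# Hypothetical titles, not taken from the paper's dataset.
score = jaccard("flash memory device", "flash memory controller")
print(score)
```

Because the index only counts shared tokens, two titles that describe the same idea with different vocabulary score low, which may be one reason a set-overlap measure can rank patents differently from embedding-based methods.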