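Computer science
Cluster analysis
Topic model
Transformer
Artificial intelligence
Class (philosophy)
Document clustering
Representation (politics)
Natural language processing
Variety (cybernetics)
tf–idf
Embedding
Information retrieval
Machine learning
Data mining
Engineering
Physics
Voltage
Electrical engineering
Politics
Law
Quantum mechanics
Term (time)
Political science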
Source
Journal: Cornell University - arXiv
Date: 2022-01-01
Citations: 964
Identifier
DOI: 10.48550/arxiv.2203.05794
Abstract
Topic models can be useful tools to discover latent topics in collections of documents. Recent studies have shown the feasibility of approaching topic modeling as a clustering task. We present BERTopic, a topic model that extends this process by extracting coherent topic representations through the development of a class-based variation of TF-IDF. More specifically, BERTopic generates document embeddings with pre-trained transformer-based language models, clusters these embeddings, and finally generates topic representations with the class-based TF-IDF procedure. BERTopic generates coherent topics and remains competitive across a variety of benchmarks involving classical models and those that follow the more recent clustering approach to topic modeling.
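The final step named in the abstract, the class-based TF-IDF procedure, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes whitespace tokenization and takes cluster labels as given, so the embedding and clustering steps are omitted. It also assumes the weighting form tf(t, c) * log(1 + A / tf(t)), where A is the average number of words per cluster; the abstract itself does not state a formula. The function names `class_based_tfidf` and `top_terms` are hypothetical.

```python
import math
from collections import Counter


def class_based_tfidf(docs, labels):
    """Return {cluster_label: {term: weight}}.

    Assumed weighting (not stated in the abstract):
        weight(t, c) = tf(t, c) * log(1 + A / tf(t))
    where tf(t, c) is the count of term t in cluster c, tf(t) is the
    count of t over all clusters, and A is the average number of
    words per cluster.
    """
    # Treat all documents in a cluster as one concatenated document.
    class_counts = {}
    for doc, label in zip(docs, labels):
        class_counts.setdefault(label, Counter()).update(doc.lower().split())

    # Frequency of each term across every cluster.
    total_counts = Counter()
    for counts in class_counts.values():
        total_counts.update(counts)

    avg_words_per_class = sum(total_counts.values()) / len(class_counts)

    return {
        label: {
            term: tf * math.log(1 + avg_words_per_class / total_counts[term])
            for term, tf in counts.items()
        }
        for label, counts in class_counts.items()
    }


def top_terms(weights, n=3):
    """Pick the n highest-weighted terms per cluster (ties broken alphabetically)."""
    return {
        label: [t for t, _ in sorted(w.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for label, w in weights.items()
    }


if __name__ == "__main__":
    # Toy corpus with hypothetical, hand-assigned cluster labels.
    docs = ["cat dog cat", "dog cat pet", "stock market stock", "market bank stock"]
    labels = [0, 0, 1, 1]
    print(top_terms(class_based_tfidf(docs, labels), n=2))
```

In a full pipeline the labels would come from clustering the document embeddings rather than being assigned by hand, and the highest-weighted terms per cluster would serve as that cluster's topic representation.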