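Document clustering
Computer science
Cluster analysis
Search engine indexing
Non-negative matrix factorization
Curse of dimensionality
Artificial intelligence
Semantics (computer science)
Linear discriminant analysis
Pattern recognition (psychology)
Matrix decomposition
Location
Data mining
Eigenvector
Quantum mechanics
Philosophy
Programming language
Linguistics
Physics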
Authors
Deng Cai, Xiaofei He, Jiawei Han
Identifier
DOI: 10.1109/tkde.2005.198
Abstract
We propose a novel document clustering method which aims to cluster the documents into different semantic classes. The document space is generally of high dimensionality, and clustering in such a high-dimensional space is often infeasible due to the curse of dimensionality. By using locality preserving indexing (LPI), the documents can be projected into a lower-dimensional semantic space in which the documents related to the same semantics are close to each other. Different from previous document clustering methods based on latent semantic indexing (LSI) or nonnegative matrix factorization (NMF), our method tries to discover both the geometric and discriminating structures of the document space. Theoretical analysis of our method shows that LPI is an unsupervised approximation of the supervised linear discriminant analysis (LDA) method, which provides the intuitive motivation for our method. Extensive experimental evaluations are performed on the Reuters-21578 and TDT2 data sets.
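The abstract describes the pipeline only at a high level: project the documents with LPI, then cluster them in the lower-dimensional space. The following Python sketch is a minimal, non-authoritative illustration of that idea under stated assumptions, not the authors' code. It uses NumPy, normalizes documents to unit length, keeps the nonzero SVD directions, builds a k-nearest-neighbor graph weighted by cosine similarity, solves the locality preserving generalized eigenproblem Y L Y^T a = λ Y D Y^T a (with L = D - W the graph Laplacian and Y the reduced documents), and runs plain k-means on the projected documents. The helper names `lpi_cluster` and `kmeans`, the neighbor count `k`, the SVD threshold, and the farthest-point k-means initialization are hypothetical choices made for this example.

```python
import numpy as np


def kmeans(Z, n_clusters, n_iter=100):
    """Plain Lloyd's k-means on the rows of Z (hypothetical helper)."""
    # Deterministic farthest-point initialization: an assumption of this sketch.
    centers = [Z[0]]
    for _ in range(1, n_clusters):
        gaps = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[int(np.argmax(gaps))])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each document to its nearest center (squared Euclidean distance).
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each center to the mean of its members; keep empty clusters in place.
        new_centers = np.array([
            Z[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def lpi_cluster(X, n_clusters, n_components=None, k=5):
    """Cluster the columns of a terms x documents matrix X (hypothetical helper).

    Assumes every document has at least one nonzero term.
    """
    m = n_clusters if n_components is None else n_components
    # Unit-length documents, so dot products below are cosine similarities.
    X = X / np.linalg.norm(X, axis=0, keepdims=True)
    n = X.shape[1]

    # SVD step (assumption): keep only directions with nonzero singular values,
    # so that Y D Y^T below is invertible.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > 1e-10 * s[0]))
    Y = U[:, :r].T @ X  # r x n reduced documents

    # k-nearest-neighbor graph weighted by cosine similarity, then symmetrized.
    S = X.T @ X
    W = np.zeros((n, n))
    for i in range(n):
        neighbors = [j for j in np.argsort(-S[i]) if j != i][:k]
        W[i, neighbors] = S[i, neighbors]
    W = np.maximum(W, W.T)
    D = np.diag(W.sum(axis=1))
    L = D - W  # graph Laplacian

    # Generalized eigenproblem (Y L Y^T) a = lam (Y D Y^T) a, solved by
    # whitening with B^{-1/2} so that NumPy's symmetric eigh can be used.
    A = Y @ L @ Y.T
    B = Y @ D @ Y.T
    b_vals, V = np.linalg.eigh(B)
    B_inv_sqrt = V @ np.diag(b_vals ** -0.5) @ V.T
    C = B_inv_sqrt @ A @ B_inv_sqrt
    _, Q = np.linalg.eigh((C + C.T) / 2)  # eigenvalues in ascending order
    P = B_inv_sqrt @ Q[:, :m]  # projection vectors for the m smallest eigenvalues

    Z = Y.T @ P  # n x m: each document projected into the LPI subspace
    return kmeans(Z, n_clusters)


if __name__ == "__main__":
    # Toy corpus: rows are documents, columns are term counts; the first three
    # documents use terms 0-2 and the last three use terms 3-5.
    docs = np.array([
        [2, 1, 0, 0, 0, 0],
        [1, 2, 1, 0, 0, 0],
        [0, 1, 2, 0, 0, 0],
        [0, 0, 0, 2, 1, 0],
        [0, 0, 0, 1, 2, 1],
        [0, 0, 0, 0, 1, 2],
    ], dtype=float)
    print(lpi_cluster(docs.T, n_clusters=2, k=2))
```

On the toy corpus, the first three documents share no terms with the last three, so the neighbor graph splits into two components and the smallest-eigenvalue projections place each group at its own point, which k-means then separates. Keeping the smallest eigenvalues selects the directions along which neighboring documents stay closest; if a constant direction appears among them, it shifts every document equally and does not change the k-means result.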