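Autoencoder
Cluster analysis
Missing data
Computer science
Artificial intelligence
Pattern recognition (psychology)
Data mining
Rand index
Data set
Representation (politics)
Fuzzy clustering
Set (abstract data type)
Artificial neural network
Machine learning
Politics
Political science
Programming language
Law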
Authors
Suvra Jyoti Choudhury,Nikhil R. Pal
Source
Journal: IEEE Transactions on Emerging Topics in Computational Intelligence
[Institute of Electrical and Electronics Engineers]
Date: 2021-08-01
Volume (issue): 5 (4): 639-650
Citations: 7
Identifiers
DOI:10.1109/tetci.2019.2949264
Abstract
Most real-life data suffer from missing values. Here we address exploratory analysis, via clustering, of data with missing values. This requires an effective mechanism for handling missing features so that all available information can be used for clustering. We propose two autoencoder-based methods for handling missing data in clustering. The autoencoder is trained in a two-phase scheme using only the part of the given data set that contains no incomplete instances, in such a manner that the autoencoder is better equipped to deal with incomplete data. To cluster the entire data set, which includes instances with missing values, we generate the latent space representation of all instances, with or without missing information. Before the incomplete instances are submitted to the autoencoder, the missing inputs are filled in by a k-nearest neighbor-based rule. The clustering is then done in the latent space using the fuzzy c-means (FCM) algorithm. In the second method, to preserve the “structure” of the input data in the latent space, we extend the first method by adding Sammon's stress as a regularizer to the objective function of the autoencoder. We test the effectiveness of the proposed algorithms on several data sets and compare the results with five state-of-the-art techniques. For comparison, we use two performance indicators: Normalized Mutual Information (NMI) and Adjusted Rand Index (ARI).
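The abstract names three building blocks that can be illustrated without the paper's details: a k-nearest-neighbor fill-in of missing inputs, fuzzy c-means clustering, and Sammon's stress. The following is a minimal NumPy sketch of those pieces, not the authors' implementation. The function names (`knn_fill`, `fuzzy_c_means`, `sammon_stress`), the choice of averaging the k nearest complete rows, the parameter defaults, and the toy data are all assumptions made for illustration. The two-phase autoencoder training is omitted entirely, so the clustering here runs on the filled-in inputs rather than on a learned latent representation.

```python
import numpy as np

def knn_fill(X, k=3):
    """Fill NaNs in each incomplete row with the mean of its k nearest complete rows.
    Assumption: distances use only the features observed in the incomplete row."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    incomplete = np.isnan(X).any(axis=1)
    complete = X[~incomplete]
    for i in np.where(incomplete)[0]:
        obs = ~np.isnan(X[i])
        d = np.linalg.norm(complete[:, obs] - X[i, obs], axis=1)
        nearest = complete[np.argsort(d)[:k]]
        filled[i, ~obs] = nearest[:, ~obs].mean(axis=0)
    return filled

def fuzzy_c_means(Z, c=2, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Standard FCM: alternate center and membership updates until U stabilizes."""
    Z = np.asarray(Z, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((len(Z), c))
    U /= U.sum(axis=1, keepdims=True)          # each row of memberships sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ Z) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        inv = np.fmax(d, 1e-12) ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def sammon_stress(X, Z, eps=1e-12):
    """Sammon's stress between pairwise distances in input space X and latent space Z."""
    iu = np.triu_indices(len(X), k=1)
    dx = np.fmax(np.linalg.norm(X[:, None] - X[None, :], axis=2)[iu], eps)
    dz = np.linalg.norm(Z[:, None] - Z[None, :], axis=2)[iu]
    return float(((dx - dz) ** 2 / dx).sum() / dx.sum())

# Toy data (hypothetical): two well-separated groups, one missing value in each.
X = np.array([[0.0, 0.1], [0.1, 0.0], [0.2, np.nan],
              [5.0, 5.1], [5.1, 5.0], [np.nan, 5.2]])
X_filled = knn_fill(X, k=2)
centers, U = fuzzy_c_means(X_filled, c=2)
labels = U.argmax(axis=1)                      # harden fuzzy memberships for inspection
stress = sammon_stress(X_filled, X_filled)     # identity mapping, so no distortion
```

In a fuller pipeline, `fuzzy_c_means` would receive the encoder's output instead of `X_filled`, and `sammon_stress` would compare the inputs with that latent output as a penalty term. If scikit-learn is available, `sklearn.metrics.normalized_mutual_info_score` and `sklearn.metrics.adjusted_rand_score` compute the two indicators mentioned above from hardened labels and ground-truth classes.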