自编码
聚类分析
嵌入
计算机科学
人工智能
模式识别(心理学)
相关聚类
代表(政治)
特征学习
高斯分布
CURE数据聚类算法
高维数据聚类
数据挖掘
深度学习
物理
量子力学
政治
政治学
法学
作者
Manar D. Samad,Sakib Abrar,Mohammad Bataineh
出处
期刊:Cornell University - arXiv
日期:2023-01-01
被引量:1
标识
DOI:10.48550/arxiv.2301.00802
摘要
The latent space of autoencoders has been improved for clustering image data by jointly learning a t-distributed embedding with a clustering algorithm inspired by the neighborhood embedding concept proposed for data visualization. However, multivariate tabular data pose different challenges in representation learning than image data, where traditional machine learning is often superior to deep tabular data learning. In this paper, we address the challenges of learning tabular data in contrast to image data and present a novel Gaussian Cluster Embedding in Autoencoder Latent Space (G-CEALS) algorithm by replacing t-distributions with multivariate Gaussian clusters. Unlike current methods, the proposed approach independently defines the Gaussian embedding and the target cluster distribution to accommodate any clustering algorithm in representation learning. A trained G-CEALS model extracts a quality embedding for unseen test data. Based on the embedding clustering accuracy, the average rank of the proposed G-CEALS method is 1.4 (0.7), which is superior to all eight baseline clustering and cluster embedding methods on seven tabular data sets. This paper shows one of the first algorithms to jointly learn embedding and clustering to improve multivariate tabular data representation in downstream clustering.
科研通智能强力驱动
Strongly Powered by AbleSci AI