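Categorical variable
Interpretability
Latent variable
Computer science
Latent variable model
Data mining
Artificial intelligence
Latent class model
Probabilistic logic
Segmentation
Mathematics
Machine learning
Mathematical analysis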
Authors
Kai Wang,Jian Li,Fugee Tsung
Identifier
DOI:10.1080/24725854.2022.2106390
Abstract
High-Dimensional (HD) processes have become prevalent in many data-intensive scientific domains and engineering applications. The monitoring of HD categorical data, where each variable of interest is evaluated by attribute levels or nominal values, however, has seldom been studied. As the joint distribution of HD categorical variables can be fully characterized by a high-way contingency table or a high-order tensor, we propose a Probabilistic Tensor Decomposition (PTD) which factorizes a huge tensor into a few latent classes (rank-one tensors) to dramatically reduce the number of model parameters. Moreover, to enable high interpretability of this latent-class-type PTD model, a novel polarization regularization is devised, which makes each latent class focus on only a few vital combinations of attribute levels of categorical variables. An Expectation-Maximization algorithm is designed for parameter estimation from a historical normal dataset in Phase I, and an exponentially weighted moving average control chart is built in Phase II to monitor the proportions of latent classes that act as surrogates for each original categorical vector. Extensive simulations and a real case study validate the superior inference and monitoring performance of our proposed efficient and interpretable method.
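The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it fits a plain latent class model by EM (the polarization regularization is omitted because the abstract does not give its form), and then smooths the posterior class proportions with an EWMA recursion. The function names `fit_latent_class`, `posterior`, and `ewma`, the smoothing weight `lam`, and the synthetic data are all assumptions made for illustration; the charting statistic and control limits of the actual method are not specified here and are left out.

```python
import numpy as np

def posterior(X, pi, theta):
    """Posterior class probabilities for each row of X (the E-step)."""
    # log p(class k) + sum over variables j of log p(x_j | class k)
    log_p = np.log(pi + 1e-12)[None, :] + sum(
        np.log(theta[j][:, X[:, j]]).T for j in range(X.shape[1])
    )
    log_p -= log_p.max(axis=1, keepdims=True)  # stabilize before exponentiating
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def fit_latent_class(X, n_levels, K, n_iter=100, seed=0):
    """EM for an unregularized latent class model (illustrative assumption)."""
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)
    # theta[j] has shape (K, n_levels[j]); each row is a distribution over levels
    theta = [rng.dirichlet(np.ones(L), size=K) for L in n_levels]
    for _ in range(n_iter):
        gamma = posterior(X, pi, theta)           # E-step, shape (N, K)
        pi = gamma.mean(axis=0)                   # M-step: class proportions
        for j, L in enumerate(n_levels):
            onehot = np.eye(L)[X[:, j]]           # (N, L) indicator of observed level
            counts = gamma.T @ onehot + 1e-9      # (K, L) expected counts, kept positive
            theta[j] = counts / counts.sum(axis=1, keepdims=True)
    return pi, theta

def ewma(gamma_stream, pi, lam=0.1):
    """EWMA of posterior class proportions, started at the Phase I estimate pi."""
    z = pi.copy()
    out = []
    for g in gamma_stream:
        z = lam * g + (1 - lam) * z
        out.append(z.copy())
    return np.array(out)

# Synthetic example: 200 observations of 5 categorical variables with 3 levels each.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(200, 5))
pi, theta = fit_latent_class(X, n_levels=[3] * 5, K=2, n_iter=20)
z = ewma(posterior(X[:50], pi, theta), pi)
```

Each row of `z` is a smoothed estimate of the latent class proportions. A monitoring scheme would compare it against the Phase I estimate `pi` with a suitably calibrated limit.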