Cluster analysis
Computer science
Center (category theory)
Algorithm
Artificial intelligence
Crystallography
Chemistry
Authors
Longkun Guo,Chaoqi Jia,Kewen Liao,Zhigang Lü,Minhui Xue
Identifier
DOI:10.1109/tnnls.2025.3574268
Abstract
Many practical applications pose the challenge of exploiting instance-level background knowledge (e.g., subsets of similar or dissimilar data points) in their input data to improve clustering results. In this work, we build on the widely adopted k-center clustering, model its instance-level background knowledge as must-link (ML) and cannot-link (CL) constraint sets, and formulate the constrained k-center problem. Given the long-standing challenge of developing efficient algorithms for constrained clustering problems, we first derive an efficient approximation algorithm for constrained k-center that attains the best possible approximation ratio of 2 using a linear programming (LP)-rounding technique. Recognizing the limitations of LP-rounding algorithms, including high runtime complexity and difficulty of parallelization, we then develop a greedy algorithm that does not rely on the LP and can be efficiently parallelized. This algorithm achieves the same approximation ratio of 2 with lower runtime complexity. Lastly, we empirically evaluate our approximation algorithm against baselines on various real datasets, validating our theoretical findings and demonstrating significant advantages of our algorithm in terms of clustering cost, quality, and runtime.
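For orientation, the sketch below shows the classic farthest-first traversal for the unconstrained k-center problem, which is known to give a 2-approximation, together with a small helper that counts how many ML/CL pairs a nearest-center assignment violates. This is not the constrained algorithm described in the abstract; it only illustrates the baseline problem and why constraints need separate handling. The function names, the use of Euclidean distance, and the representation of constraints as index pairs are all assumptions made for illustration.

```python
import math


def farthest_first_centers(points, k):
    """Farthest-first traversal for unconstrained k-center.

    Returns (center indices, covering radius). Illustrative baseline only;
    it ignores must-link and cannot-link constraints entirely.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [0]  # arbitrary choice: start from the first point
    # dist[i] is the distance from point i to its nearest chosen center
    dist = [math.dist(p, points[0]) for p in points]
    while len(centers) < k:
        # pick the point currently farthest from every chosen center
        nxt = max(range(len(points)), key=dist.__getitem__)
        centers.append(nxt)
        for i, p in enumerate(points):
            dist[i] = min(dist[i], math.dist(p, points[nxt]))
    return centers, max(dist)


def count_violations(points, centers, must_link, cannot_link):
    """Assign each point to its nearest center and count violated pairs.

    must_link and cannot_link are lists of (i, j) index pairs; this pair
    encoding is an assumption made for this sketch.
    """
    label = [min(centers, key=lambda c: math.dist(p, points[c])) for p in points]
    ml_violations = sum(label[a] != label[b] for a, b in must_link)
    cl_violations = sum(label[a] == label[b] for a, b in cannot_link)
    return ml_violations, cl_violations


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (10, 0), (11, 0)]
    chosen, radius = farthest_first_centers(pts, 2)
    print(chosen, radius)
    # A cannot-link pair on two nearby points is violated by plain
    # nearest-center assignment, which motivates constraint-aware methods.
    print(count_violations(pts, chosen, must_link=[], cannot_link=[(0, 1)]))
```

Plain nearest-center assignment places nearby points in the same cluster, so a cannot-link pair between close points is violated; a constrained method has to choose the assignment and the centers with the constraints in view.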