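Computer science
Cluster analysis
Centroid
Series (stratigraphy)
Preprocessor
Measure (data warehouse)
Data mining
Robustness (evolution)
Hierarchical clustering
Scalability
Time series
Algorithm
Pattern recognition (psychology)
Artificial intelligence
Machine learning
Database
Biology
Gene
Biochemistry
Paleontology
Chemistry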
Authors
John Paparrizos,Luis Gravano
Source
Journal: SIGMOD Record
[Association for Computing Machinery]
Date: 2016-06-02
Volume (issue): 45 (1): 69-76
Citations: 161
Identifier
DOI:10.1145/2949741.2949758
Abstract
The proliferation and ubiquity of temporal data across many disciplines has generated substantial interest in the analysis and mining of time series. Clustering is one of the most popular data mining methods, not only due to its exploratory power, but also as a preprocessing step or subroutine for other techniques. In this paper, we describe k-Shape, a novel algorithm for time-series clustering. k-Shape relies on a scalable iterative refinement procedure, which creates homogeneous and well-separated clusters. As its distance measure, k-Shape uses a normalized version of the cross-correlation measure in order to consider the shapes of time series while comparing them. Based on the properties of that distance measure, we develop a method to compute cluster centroids, which are used in every iteration to update the assignment of time series to clusters. An extensive experimental evaluation against partitional, hierarchical, and spectral clustering methods, with the most competitive distance measures, showed the robustness of k-Shape. Overall, k-Shape emerges as a domain-independent, highly accurate, and efficient clustering approach for time series with broad applications.
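The abstract describes the distance measure only as "a normalized version of the cross-correlation measure". As a rough illustration of that idea, the sketch below z-normalizes two equal-length series, computes their cross-correlation at every shift, divides the maximum by the product of the series' norms, and subtracts the result from 1. The function names (`z_normalize`, `shape_distance`), the z-normalization step, and the exact normalization are assumptions made for this example and are not taken from the abstract; the centroid computation and the iterative refinement procedure are not covered.

```python
import numpy as np


def z_normalize(x):
    """Rescale a series to zero mean and unit variance (assumed preprocessing)."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    if std == 0:
        # A constant series has no shape information; map it to all zeros.
        return np.zeros_like(x)
    return (x - x.mean()) / std


def shape_distance(x, y):
    """Distance based on normalized cross-correlation (illustrative sketch).

    Returns 1 - max_shift CC(x, y) / (||x|| * ||y||), which lies in [0, 2]
    and is 0 when the two series have identical shape.
    """
    x = z_normalize(x)
    y = z_normalize(y)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0:
        # At least one series is constant; treat the pair as uncorrelated.
        return 1.0
    # Cross-correlation at every relative shift of the two series.
    cc = np.correlate(x, y, mode="full")
    return 1.0 - cc.max() / denom


if __name__ == "__main__":
    a = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
    b = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0]  # same pulse, shifted right
    print(shape_distance(a, a))
    print(shape_distance(a, b))
```

Because both inputs are z-normalized first, a rescaled or offset copy of a series is at distance 0 from the original, which matches the abstract's stated goal of comparing time series by shape.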