# Detecting Low Complexity Clusters by Skewness and Kurtosis

Detecting Low Complexity Clusters by Skewness and Kurtosis in Data Stream Clustering
Mingzhou (Joe) Song and Hongbin Wang
Proc. 9th International Symposium on Artificial Intelligence and Mathemati
Presented by: Niwan Wattanakitrungroj, 8 December 2009

Outline
- Introduction
- Related work
- Overview
- Computation of statistics for a data stream
- Multivariate normality testing
- Example: Clustering a data stream from a Gaussian mixture model
- Conclusion and future work

Introduction
- Data stream analysis: financial application, telecommunication data management, network monitoring, sensor networks
- Statistical representations
  - How to manipulate clusters: the recognition of a new cluster and a historical one that are supposed to be the same cluster

Related work
- Literature: BIRCH (T. Zhang et al., 1999); CluStream (C. C. Aggarwal et al., 2003); M. Song and H. Wang, 2005 (their early work)
- Cluster representation based on: mean, variance, covariance
- Merging strategy: intra- and inter-cluster distance
- Proposed algorithm:
  - mean, covariance + multivariate skewness and kurtosis
  - merge toward normal clusters based on hypothesis testing of multivariate skewness and kurtosis

Overview
- Detect new clusters: expectation maximization (EM) algorithm
- Merge old and new clusters?
  - 1st phase: mean and covariance
  - 2nd phase: skewness and kurtosis

Computation of statistics for a data stream
- Cluster statistics: mean vector, covariance matrix, skewness, kurtosis
- Data in the old cluster are no longer available!
- How to compute cluster statistics?

Computation of statistics for a data stream (cont.)
- Let 1 and 2 be the indices of the old and new clusters, respectively.
- old cluster + new cluster
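The formulas on this slide did not survive extraction. As a minimal sketch of the idea, assume each cluster is summarized by its point count, mean vector, and population covariance (dividing by n). Under that assumption the summaries of old cluster 1 and new cluster 2 can be combined without revisiting the old data. The function names are illustrative and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mean_cov(points: Sequence[Sequence[float]]) -> Tuple[int, Vector, Matrix]:
    """Count, mean vector, and population covariance (divide by n) of raw points."""
    n, p = len(points), len(points[0])
    mean = [sum(x[i] for x in points) / n for i in range(p)]
    cov = [[sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in points) / n
            for j in range(p)] for i in range(p)]
    return n, mean, cov


def merge_mean_cov(n1: int, mean1: Vector, cov1: Matrix,
                   n2: int, mean2: Vector, cov2: Matrix
                   ) -> Tuple[int, Vector, Matrix]:
    """Combine the summaries of cluster 1 (old) and cluster 2 (new)."""
    n = n1 + n2
    p = len(mean1)
    # The merged mean is the count-weighted average of the two means.
    mean = [(n1 * mean1[i] + n2 * mean2[i]) / n for i in range(p)]
    # Offsets of each cluster mean from the merged mean.
    d1 = [mean1[i] - mean[i] for i in range(p)]
    d2 = [mean2[i] - mean[i] for i in range(p)]
    # Within-cluster spread plus the spread caused by the shifted means.
    cov = [[(n1 * (cov1[i][j] + d1[i] * d1[j])
             + n2 * (cov2[i][j] + d2[i] * d2[j])) / n
            for j in range(p)] for i in range(p)]
    return n, mean, cov
```

Only the three summaries per cluster are needed, so the raw points of the old cluster can be discarded once `mean_cov` has been applied to them.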

Computation of statistics for a data stream (cont.)
- The cross moment
- The central cross moment
- Example: first (cross) moment, second cross moment, third
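The definitions on this slide are missing from the extracted text. The sketch below assumes the usual meaning: a raw cross moment is the average of a product of coordinates, and a central cross moment is the same average taken after subtracting the mean. Raw moments of two clusters combine as a count-weighted average, and central moments can then be derived from the raw ones. All names here are hypothetical.

```python
from itertools import combinations_with_replacement
from math import prod
from typing import Dict, Sequence, Tuple

Index = Tuple[int, ...]


def raw_cross_moments(points: Sequence[Sequence[float]],
                      max_order: int = 4) -> Dict[Index, float]:
    """Raw cross moments E[x_i x_j ...] for all sorted index tuples up to max_order."""
    n, p = len(points), len(points[0])
    moments: Dict[Index, float] = {}
    for order in range(1, max_order + 1):
        for idx in combinations_with_replacement(range(p), order):
            moments[idx] = sum(prod(x[i] for i in idx) for x in points) / n
    return moments


def merge_raw_moments(n1: int, m1: Dict[Index, float],
                      n2: int, m2: Dict[Index, float]) -> Dict[Index, float]:
    """Raw moments are averages, so merging is a count-weighted average."""
    n = n1 + n2
    return {idx: (n1 * m1[idx] + n2 * m2[idx]) / n for idx in m1}


def central_third(m: Dict[Index, float], i: int, j: int, k: int) -> float:
    """Third central cross moment E[(x_i-mu_i)(x_j-mu_j)(x_k-mu_k)] from raw moments."""
    def raw(*idx: int) -> float:
        return m[tuple(sorted(idx))]  # keys are stored as sorted index tuples

    mi, mj, mk = raw(i), raw(j), raw(k)
    return (raw(i, j, k)
            - mi * raw(j, k) - mj * raw(i, k) - mk * raw(i, j)
            + 2.0 * mi * mj * mk)
```

Storing moments up to order four is what skewness (third order) and kurtosis (fourth order) require; the fourth central moment follows from the same kind of expansion.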

Computation of statistics for a data stream (cont.)
Multivariate skewness:
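The formula itself is not recoverable from the extracted text. The symbol b 1, p on the normality testing slide matches Mardia's notation, so, assuming that is the definition intended, the sample multivariate skewness of n points x_1, ..., x_n in p dimensions, with sample mean \bar{x} and sample covariance S (dividing by n), is:

```latex
b_{1,p} = \frac{1}{n^{2}} \sum_{i=1}^{n} \sum_{j=1}^{n}
          \left[ (x_i - \bar{x})^{\top} S^{-1} (x_j - \bar{x}) \right]^{3}
```

Expanding the cube shows that b_{1,p} depends on the data only through S and the third central cross moments, which is why it can be maintained without the original points.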

Computation of statistics for a data stream (cont.)
Multivariate kurtosis:
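The formula is again missing from the extracted text. Assuming Mardia's definition, which the symbol b 2, p on the normality testing slide suggests, the sample multivariate kurtosis of n points in p dimensions, with sample mean \bar{x} and sample covariance S (dividing by n), is:

```latex
b_{2,p} = \frac{1}{n} \sum_{i=1}^{n}
          \left[ (x_i - \bar{x})^{\top} S^{-1} (x_i - \bar{x}) \right]^{2}
```

For normally distributed data and large n this value is close to p(p+2). It depends on the data only through S and the fourth central cross moments.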

Multivariate normality testing
- Multivariate skewness $b_{1,p}$
- Multivariate kurtosis $b_{2,p}$
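The test statistics are not shown in the extracted text. The sketch below assumes Mardia's definitions of b_{1,p} and b_{2,p} and their standard large-sample results: n·b_{1,p}/6 is approximately chi-square with p(p+1)(p+2)/6 degrees of freedom, and (b_{2,p} − p(p+2)) / sqrt(8p(p+2)/n) is approximately standard normal. It works from raw points for clarity, whereas the slides maintain the statistics from cross moments. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mardia_statistics(points: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (b1p, b2p) using the population covariance (divide by n)."""
    n, p = len(points), len(points[0])
    mean = [sum(x[i] for x in points) / n for i in range(p)]
    centered = [[x[i] - mean[i] for i in range(p)] for x in points]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(p)]
           for i in range(p)]
    scaled = [_solve(cov, c) for c in centered]  # S^{-1} (x - mean)
    b1 = sum(sum(u * v for u, v in zip(ci, wj)) ** 3
             for ci in centered for wj in scaled) / n ** 2
    b2 = sum(sum(u * v for u, v in zip(c, w)) ** 2
             for c, w in zip(centered, scaled)) / n
    return b1, b2


def chi2_sf(x: float, df: int) -> float:
    """Upper tail of the chi-square distribution via the incomplete-gamma recurrence."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)                 # Q(1, h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))      # Q(1/2, h)
    while a < df / 2.0 - 1e-9:                   # Q(a+1,h) = Q(a,h) + h^a e^-h / Gamma(a+1)
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q


def mardia_test(points: Sequence[Sequence[float]], alpha: float = 0.05) -> Dict[str, object]:
    """Two-part normality check; 'normal' is True when neither test rejects."""
    n, p = len(points), len(points[0])
    b1, b2 = mardia_statistics(points)
    skew_p = chi2_sf(n * b1 / 6.0, p * (p + 1) * (p + 2) // 6)
    z = (b2 - p * (p + 2)) / math.sqrt(8.0 * p * (p + 2) / n)
    kurt_p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return {"skew_p": skew_p, "kurt_p": kurt_p,
            "normal": skew_p > alpha and kurt_p > alpha}
```

For example, pooling two well-separated groups such as `[[-1.0]] * 50 + [[1.0]] * 50` gives a symmetric but strongly bimodal sample: the skewness test does not reject, while the kurtosis test does, so the pooled data would not be treated as a single normal cluster.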

Example: Clustering a data stream from a Gaussian mixture model
[Figure: clusters in the 1st, 2nd, 3rd, and 4th windows]

Example: Clustering a data stream from a Gaussian mixture model (cont.)
[Figure: clusters in the 1st, 2nd, 3rd, and 4th windows; clusters obtained after merging by mean and covariance, without detection: (a) clusters after 2nd window, (b) clusters after 3rd window, (c) final clusters]

Example: Clustering a data stream from a Gaussian mixture model (cont.)
[Figure: clusters in the 1st, 2nd, 3rd, and 4th windows; clusters obtained after merging with detection of simple clusters by the multivariate skewness and kurtosis: (a) clusters after 2nd window, (b) clusters after 3rd window, (c) final clusters]

Conclusion and future work
- Cluster merging based on higher-order statistics (multivariate skewness and kurtosis):
  - gives a more flexible and accurate representation of clusters than distance-based approaches
  - is based on cross moments, which can be maintained without storing historical data
- Future work: design higher-order statistical testing strategies to uncover regularities beyond normality

Thank you
Q&A

- Slides: 16