A Link-Based Cluster Ensemble Approach for Categorical Data Clustering
Presenter: JIAN-REN CHEN
Authors: Natthakan Iam-On, Tossapon Boongoen, Simon Garrett, and Chris Price
2012, IEEE
Intelligent Database Systems Lab
Outline
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments
Motivation
• Cluster ensembles combine different clustering decisions so as to achieve accuracy superior to that of any individual clustering.
Objectives
• A new link-based approach improves the conventional binary cluster-association matrix by discovering its unknown entries through the similarity between clusters in an ensemble.
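To make the "conventional matrix" concrete: the binary cluster-association matrix (BM) has one row per data point and one column per cluster of every base clustering, with 1 where the point belongs to the cluster and 0 elsewhere. The Python sketch below builds it from an ensemble given as one label vector per base clustering; the function name `binary_association_matrix` and this input format are hypothetical choices for illustration, not taken from the paper.

```python
from typing import List, Sequence, Tuple

Cluster = Tuple[int, str]  # (index of the base clustering, cluster label)


def binary_association_matrix(
    ensemble: Sequence[Sequence[str]],
) -> Tuple[List[Cluster], List[List[int]]]:
    """Build BM from an ensemble given as one label vector per base clustering.

    Returns the column order (one column per cluster) and the N x P 0/1 matrix.
    """
    n_points = len(ensemble[0])
    if any(len(labels) != n_points for labels in ensemble):
        raise ValueError("every base clustering must label the same points")
    columns: List[Cluster] = []
    for t, labels in enumerate(ensemble):
        # dict.fromkeys keeps each distinct label once, in first-seen order.
        columns.extend((t, label) for label in dict.fromkeys(labels))
    # Entry is 1 if point i carries this label in clustering t, else 0 (unknown).
    bm = [
        [1 if ensemble[t][i] == label else 0 for t, label in columns]
        for i in range(n_points)
    ]
    return columns, bm


if __name__ == "__main__":
    # Hypothetical ensemble: two base clusterings of four data points.
    cols, bm = binary_association_matrix([["a", "a", "b", "b"], ["p", "q", "q", "q"]])
    print(cols)
    for row in bm:
        print(row)
```

Every row of BM contains exactly one 1 per base clustering; all remaining 0 entries are the unknown associations that the link-based approach sets out to refine.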
Methodology
• Creating a Cluster Ensemble
• Generating a Refined Matrix (RM)
• Applying a Consensus Function to RM
Methodology: Creating a Cluster Ensemble
• Type I (direct ensemble)
• Type II (full-space ensemble)
• Type III (subspace ensemble)
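A Type I (direct) ensemble needs no base clusterer: as we understand the approach, each categorical attribute is treated as one base clustering, and a record's value for that attribute is its cluster label. The Python sketch below (hypothetical function `direct_ensemble`, assuming records are equal-length sequences of strings) simply transposes the data. Types II and III instead run a base clustering algorithm, for example k-modes, on the full attribute space or on attribute subsets, and are not sketched here.

```python
from typing import List, Sequence


def direct_ensemble(records: Sequence[Sequence[str]]) -> List[List[str]]:
    """Type I ensemble: attribute j of the data becomes base clustering j.

    Each record's value of attribute j is used as its cluster label, so the
    clustering for attribute j has one cluster per distinct value.
    """
    n_attributes = len(records[0])
    if any(len(record) != n_attributes for record in records):
        raise ValueError("all records must have the same number of attributes")
    # Transpose the records-by-attributes table into one label vector per attribute.
    return [[record[j] for record in records] for j in range(n_attributes)]


if __name__ == "__main__":
    data = [("red", "small"), ("red", "large"), ("blue", "large")]
    for j, labels in enumerate(direct_ensemble(data)):
        print(f"attribute {j}: {labels} -> {len(set(labels))} clusters")
```

For example, the three records ("red", "small"), ("red", "large") and ("blue", "large") yield two base clusterings with two clusters each.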
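The second step, generating the refined matrix (RM), can be sketched under the following assumptions, based on our reading of the paper rather than the authors' code. Clusters from all base clusterings become vertices of a graph whose edge weights are the Jaccard overlap of their member sets. The WTQ score of clusters x and y sums 1/W_z over their common neighbors z, where W_z is the total edge weight at z. Similarity is WTQ divided by the largest WTQ value in the ensemble and scaled by a constant DC in (0, 1], taken as 0.9 here as an assumed default. An RM entry is 1 when the point belongs to the cluster, and otherwise the similarity between that cluster and the point's own cluster in the same base clustering. All function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Cluster = Tuple[int, str]  # (index of the base clustering, cluster label)
Graph = Dict[Cluster, Dict[Cluster, float]]


def cluster_members(ensemble: Sequence[Sequence[str]]) -> Dict[Cluster, FrozenSet[int]]:
    """Map every cluster of every base clustering to the set of its data points."""
    members: Dict[Cluster, Set[int]] = {}
    for t, labels in enumerate(ensemble):
        for i, label in enumerate(labels):
            members.setdefault((t, label), set()).add(i)
    return {c: frozenset(s) for c, s in members.items()}


def jaccard_graph(members: Dict[Cluster, FrozenSet[int]]) -> Graph:
    """Link two clusters with weight |Lx ∩ Ly| / |Lx ∪ Ly| when they share points."""
    graph: Graph = {c: {} for c in members}
    for x, y in combinations(members, 2):
        shared = len(members[x] & members[y])
        if shared:
            weight = shared / len(members[x] | members[y])
            graph[x][y] = graph[y][x] = weight
    return graph


def wtq(graph: Graph, x: Cluster, y: Cluster) -> float:
    """Sum 1 / W_z over common neighbors z, with W_z the total edge weight at z."""
    common = graph[x].keys() & graph[y].keys()
    return sum(1.0 / sum(graph[z].values()) for z in common)


def refined_matrix(
    ensemble: Sequence[Sequence[str]], dc: float = 0.9
) -> Tuple[List[Cluster], List[List[float]]]:
    """Build RM: 1 for a point's own cluster, otherwise a WTQ-based similarity."""
    members = cluster_members(ensemble)
    graph = jaccard_graph(members)
    clusters = list(members)
    wtq_max = max((wtq(graph, x, y) for x, y in combinations(clusters, 2)), default=0.0)

    def sim(x: Cluster, y: Cluster) -> float:
        # Normalize by the largest WTQ in the ensemble and scale by DC (assumed 0.9).
        return dc * wtq(graph, x, y) / wtq_max if wtq_max > 0 else 0.0

    rm: List[List[float]] = []
    for i in range(len(ensemble[0])):
        row = []
        for t, label in clusters:
            own = (t, ensemble[t][i])  # cluster of point i in base clustering t
            row.append(1.0 if (t, label) == own else sim((t, label), own))
        rm.append(row)
    return clusters, rm


if __name__ == "__main__":
    clusters, rm = refined_matrix([["a", "a", "b", "b"], ["p", "q", "q", "q"]])
    print(clusters)
    for row in rm:
        print([round(v, 3) for v in row])
```

Two clusters of the same base clustering never share members, so they have no direct edge; WTQ relates them only through clusters of other base clusterings. In the four-point example, the former 0 entries of BM become 0.9 between clusters p and q and about 0.736 between clusters a and b.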
Methodology: Applying a Consensus Function to RM
• Given a graph G = (V, W), SPEC finds the K largest eigenvectors of W.
• These eigenvectors form the columns of another matrix U.
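The consensus step can be written out as below. Assumptions beyond the slide: V contains the N data points and the P clusters of the ensemble, and W is the bipartite weight matrix built from the N × P matrix RM. As in the standard SPEC algorithm of Ng, Jordan and Weiss, each row of U is then normalized to unit length, k-means clusters the rows of the normalized matrix, and the rows belonging to the data points give the final clustering.

```latex
\begin{aligned}
W &= \begin{bmatrix} 0 & RM \\ RM^{\top} & 0 \end{bmatrix} \in \mathbb{R}^{(N+P)\times(N+P)},\\
U &= \begin{bmatrix} u_1 & u_2 & \cdots & u_K \end{bmatrix}, \quad W u_j = \lambda_j u_j,\ \lambda_1 \ge \cdots \ge \lambda_K \text{ the } K \text{ largest eigenvalues},\\
\hat{U}_{ij} &= \frac{U_{ij}}{\sqrt{\sum_{k=1}^{K} U_{ik}^{2}}}, \quad i = 1,\dots,N+P,\ j = 1,\dots,K.
\end{aligned}
```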
Experiments
• Investigated data sets
Conclusions
• The RM is constructed efficiently from the similarity among categorical labels (that is, clusters), measured with the Weighted Triple-Quality (WTQ) similarity algorithm.
• The link-based method usually achieves better clustering results than traditional categorical data clustering algorithms and benchmark cluster ensemble techniques.
Comments
• Advantages
– The link-based method is efficient.
• Applications
– Categorical data clustering