Spectral Clustering Jianping Fan Dept of CS UNC-Charlotte
- Slides: 93
http://webpages.uncc.edu/jfan/itcs4122.html
Key issues for Data Clustering
- Similarity or distance function
- Inter-cluster similarity or distance
- Intra-cluster similarity or distance
- Number of clusters K
- Decision for data clustering
- Objective function: intra-cluster distances are minimized, inter-cluster distances are maximized
SUMMARY OF K-MEANS
Problems of K-means:
- Locations of centers
- Number of clusters K
- Sensitive to outliers
- Data manifolds (shapes of data distributions)
Experiences:
- Centers: random & density scan
- K: start from a small K and separate iteratively, or start from a large K and merge sequentially
- Outliers:
Problems of K-MEANs
- Objective: intra-cluster distances are minimized, inter-cluster distances are maximized
- Distance function: geometry distance
- K-means alternates between two steps: an assignment step and an optimization step
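The two alternating steps named on this slide can be sketched as follows. This is a minimal illustration, assuming squared Euclidean distance and random initial centers; the function name `kmeans`, its parameters, and the toy data are hypothetical and not taken from the slides.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain K-means sketch: alternate assignment and center updates."""
    rng = np.random.default_rng(seed)
    # Assumption: initial centers are k distinct data points picked at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Optimization step: each center moves to the mean of its points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Two compact, well-separated blobs: the easy case for K-means.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centers = kmeans(X, k=2)
```

Because every decision depends only on geometric distance to a center, this procedure favors compact, roughly spherical groups, which is the limitation the following slides discuss.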
Problems of K-MEANs
- The similarity function cannot handle special data manifolds effectively!
- Intra-cluster similarity and inter-cluster similarity are not optimized jointly or simultaneously!
- Pre-selected locations of cluster centers may not be acceptable!
Why K-Means Clustering Fails?
[Figures: expected vs. achieved clustering results]
- Key issues: similarity or distance function; inter-cluster similarity or distance; intra-cluster similarity or distance; number of clusters K; decision for data clustering; objective function
- Number of clusters K may not be an issue here. Objective function?
- Data manifold: relationship rather than distance
- Distance function & decision for data clustering
Key issues for Data Clustering
- Inter-cluster similarity or distance
- Intra-cluster similarity or distance
- Number of clusters K
- Decision for data clustering
- Similarity or distance function
Lecture Outline
- Motivation
- Graph overview and construction
- Spectral Clustering
- Cool implementations
Spectral Clustering Example: 2 Spirals
- The dataset exhibits complex cluster shapes ⇒ K-means performs very poorly in this space due to its bias toward dense spherical clusters.
- Relationship vs. geometry distance
- In the embedded space given by the two leading eigenvectors, the clusters are trivial to separate.
Spectral Clustering
- Similarity representation: relationship
- Inter-cluster similarity
- Intra-cluster similarity
- Objective function
- Number of clusters K
- Decision for clustering
Graph-Based Similarity Representation: considering the data manifold
- Relationship vs. geometry distance
Spectral Clustering Example
- Why does k-means fail? Geometry vs. manifold
Graph-Based Similarity Representation
- Distance vs. relationship
- Number of clusters matters
Lecture Outline
- Motivation
- Graph overview and construction
- Spectral Clustering
- Cool implementations
Graph-based Representation of Data Similarity (Relationship)
Manifold (Shape of Data Distribution)
Graph-based Representation of Data Relationships: Manifold
How to generate such a graph for data relationship representation?
Data Graph Construction
Graph-based Representation of Data Relationships
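The figures for these graph-construction slides did not survive extraction, so the sketch below is not the author's method. It shows three commonly used ways to turn a point set into a relationship graph (an ε-neighborhood graph, a symmetrized k-nearest-neighbor graph, and a fully connected Gaussian graph). All function names, parameters, and the toy data are assumptions made for illustration.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return (diff ** 2).sum(axis=2)

def epsilon_graph(X, eps):
    """Connect two points when they are at most eps apart (0/1 weights)."""
    W = (pairwise_sq_dists(X) <= eps ** 2).astype(float)
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def knn_graph(X, k):
    """Connect each point to its k nearest neighbors, then symmetrize."""
    d2 = pairwise_sq_dists(X)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    idx = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros(d2.shape)
    rows = np.repeat(np.arange(len(X)), k)
    W[rows, idx.ravel()] = 1.0
    return np.maximum(W, W.T)  # keep an edge if either endpoint chose it

def gaussian_graph(X, sigma):
    """Fully connected graph with weights exp(-d^2 / (2 sigma^2))."""
    W = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

# Hypothetical 1-D data: a chain of three points and a distant pair.
X = np.array([[0.0], [1.0], [2.5], [10.0], [11.0]])
W_eps = epsilon_graph(X, eps=1.2)
W_knn = knn_graph(X, k=1)
W_gauss = gaussian_graph(X, sigma=1.0)
```

In all three graphs the two groups are linked weakly or not at all, which is the property a graph cut exploits.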
Graph Cut
Lecture Outline
- Motivation
- Graph overview and construction
- Spectral Clustering: considering intra-cluster similarity and inter-cluster similarity jointly!
- Cool implementations
Key issues for Spectral Clustering
- Relationship function for graph construction
- Inter-cluster similarity or distance
- Intra-cluster similarity or distance
- Objective function
- Number of clusters K
- Decision for data clustering
How to Do Graph Partitioning?
- Citation group identification
- Social group identification
- Hot topic detection
Graph-based Representation of Data Relationships
- Intra-cluster similarity
Spectral Clustering: Graph Cut
- Intra-cluster similarity
- Inter-cluster similarity
Objective function for spectral clustering:
1. Maximize intra-cluster similarity
2. Minimize inter-cluster similarity
Clustering via graph cut on weak connection points: minimize inter-cluster similarity
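To make the two quantities concrete, the sketch below computes the inter-cluster similarity (the total weight of edges crossing a cut) and the intra-cluster similarity (the total weight of edges inside one cluster) on a small weighted graph. The graph, the function names, and the convention of counting each undirected edge once are assumptions for illustration, since the slide formulas were not preserved.

```python
import numpy as np

def cut_value(W, labels, a, b):
    """Inter-cluster similarity: total weight of edges between clusters a and b."""
    in_a, in_b = labels == a, labels == b
    return W[np.ix_(in_a, in_b)].sum()

def intra_similarity(W, labels, a):
    """Intra-cluster similarity: total edge weight inside cluster a.

    W is symmetric, so the block sum counts every edge twice.
    """
    in_a = labels == a
    return W[np.ix_(in_a, in_a)].sum() / 2.0

# Hypothetical graph: two triangles joined by one weak edge (2-3, weight 0.1).
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1), (0, 2, 1), (1, 2, 1),
                (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, 0.1)]:
    W[i, j] = W[j, i] = w

good = np.array([0, 0, 0, 1, 1, 1])  # cut through the weak connection
bad = np.array([0, 0, 1, 1, 1, 1])   # cut through a triangle

cut_good = cut_value(W, good, 0, 1)
cut_bad = cut_value(W, bad, 0, 1)
intra_good = intra_similarity(W, good, 0)
```

Cutting at the weak connection point removes far less edge weight than cutting through a triangle, while keeping all strong edges inside the clusters.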
Inter-cluster similarity
Graph-based Representation of Data Relationships
Graph Cut
Eigenvectors & Eigenvalues
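The content of this slide was not preserved. As a hedged illustration of why eigenvalues matter for graphs, the sketch below uses the unnormalized graph Laplacian L = D - W (an assumption, since the slides may use a different matrix) and checks a standard fact: the number of zero eigenvalues of L equals the number of connected components. The example graph and variable names are hypothetical.

```python
import numpy as np

# Hypothetical graph: two triangles with no edge between them.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0

D = np.diag(W.sum(axis=1))  # degree matrix
L = D - W                   # unnormalized graph Laplacian (assumed choice)

# eigh handles symmetric matrices and returns eigenvalues in ascending order.
evals, evecs = np.linalg.eigh(L)

# Each eigenpair satisfies L v = lambda v.
residual = np.abs(L @ evecs - evecs * evals).max()

# One zero eigenvalue per connected component.
n_components = int(np.sum(np.abs(evals) < 1e-9))
```

For this graph the spectrum is two zeros followed by four eigenvalues equal to 3, so `n_components` is 2, matching the two triangles.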
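Normalized Cut
- A graph G(V, E) can be partitioned into two disjoint sets A and B.
- Cut is defined as: $\mathrm{cut}(A, B) = \sum_{u \in A,\, v \in B} w(u, v)$
- The optimal partition of the graph G is achieved by minimizing the cut: $\min\, \mathrm{cut}(A, B)$
- Association between a partition set and the whole graph: $\mathrm{assoc}(A, V) = \sum_{u \in A,\, t \in V} w(u, t)$
- Normalized cut: $\mathrm{Ncut}(A, B) = \dfrac{\mathrm{cut}(A, B)}{\mathrm{assoc}(A, V)} + \dfrac{\mathrm{cut}(A, B)}{\mathrm{assoc}(B, V)}$
- Normalized cut becomes: $\min_{y} \dfrac{y^{T}(D - W)\,y}{y^{T} D\, y}$
- Normalized cut can be solved by the eigenvalue equation: $(D - W)\,y = \lambda D\, y$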
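A minimal sketch of a two-way normalized cut, assuming the standard relaxation in which the partition is read from the eigenvector with the second-smallest eigenvalue of the generalized problem (D - W) y = λ D y. Splitting at zero is a simplification chosen here; other splitting points are possible. Function names and the example graph are hypothetical.

```python
import numpy as np

def ncut_bipartition(W):
    """Split a connected weighted graph in two via the relaxed normalized cut."""
    d = W.sum(axis=1)                      # node degrees
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Equivalent symmetric problem: D^{-1/2} (D - W) D^{-1/2} z = lambda z.
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    evals, evecs = np.linalg.eigh(L_sym)   # ascending eigenvalues
    y = d_inv_sqrt * evecs[:, 1]           # map back: y = D^{-1/2} z
    return (y > 0).astype(int)             # assumed threshold at zero

def ncut_value(W, labels):
    """Ncut(A, B) = cut/assoc(A, V) + cut/assoc(B, V) for a 0/1 labeling."""
    in_a = labels == 1
    cut = W[np.ix_(in_a, ~in_a)].sum()
    return cut / W[in_a].sum() + cut / W[~in_a].sum()

# Hypothetical graph: two triangles joined by one weak edge (2-3, weight 0.1).
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1), (0, 2, 1), (1, 2, 1),
                (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, 0.1)]:
    W[i, j] = W[j, i] = w

labels = ncut_bipartition(W)
value = ncut_value(W, labels)
```

The sign of an eigenvector is arbitrary, so which triangle receives label 1 may vary; only the grouping is meaningful.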
Extending Binary Normalized Cut to Multi-Class
K-way Min-Max Cut
- Intra-cluster similarity
- Inter-cluster similarity
- Decision function for spectral clustering: minimize inter-cluster similarity while maximizing intra-cluster similarity
Mathematical Description of Spectral Clustering
- Refined decision function for spectral clustering, with further definitions
- Solving the refined decision function
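The formulas on these slides were not preserved. The sketch below evaluates one commonly cited form of the K-way min-max cut objective, the sum over clusters of inter-cluster similarity divided by intra-cluster similarity. Treat that formula, the function name, and the example graph as assumptions rather than the author's exact definition.

```python
import numpy as np

def min_max_cut(W, labels):
    """Sum over clusters of (weight leaving the cluster) / (weight inside it).

    Intra-cluster weight is the full block sum, so each inside edge counts
    twice. A cluster with no inside edges would divide by zero; this sketch
    does not handle that case.
    """
    total = 0.0
    for c in np.unique(labels):
        inside = labels == c
        intra = W[np.ix_(inside, inside)].sum()
        inter = W[np.ix_(inside, ~inside)].sum()
        total += inter / intra
    return total

# Hypothetical graph: two triangles joined by one weak edge (2-3, weight 0.1).
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1), (0, 2, 1), (1, 2, 1),
                (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, 0.1)]:
    W[i, j] = W[j, i] = w

score_good = min_max_cut(W, np.array([0, 0, 0, 1, 1, 1]))
score_bad = min_max_cut(W, np.array([0, 0, 1, 1, 1, 1]))
```

A lower score is better: the partition along the weak edge has low inter-cluster and high intra-cluster similarity at the same time, so it scores far below the partition that cuts through a triangle.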
Spectral Clustering Algorithm (Ng, Jordan, and Weiss)
- Motivation
  - Given a set of points $S = \{s_1, \dots, s_n\}$ in $\mathbb{R}^l$
  - We would like to cluster them into k subsets
Algorithm
- Form the affinity matrix $A \in \mathbb{R}^{n \times n}$
  - Define $A_{ij} = \exp\left(-\|s_i - s_j\|^2 / 2\sigma^2\right)$ if $i \neq j$, and $A_{ii} = 0$
  - The scaling parameter $\sigma^2$ is chosen by the user
- Define D, a diagonal matrix whose (i, i) element is the sum of A's row i
- Form the matrix $L = D^{-1/2} A D^{-1/2}$
- Find $x_1, \dots, x_k$, the eigenvectors of L associated with its k largest eigenvalues
  - These form the columns of the new matrix X
  - Note: we have reduced the dimension from n×n to n×k
- Form the matrix Y
  - Renormalize each of X's rows to have unit length: $Y_{ij} = X_{ij} / \left(\sum_j X_{ij}^2\right)^{1/2}$
  - Treat each row of Y as a point in $\mathbb{R}^k$
  - Cluster into k clusters via K-means
- Final cluster assignment
  - Assign point $s_i$ to cluster j iff row i of Y was assigned to cluster j
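The steps above can be sketched in NumPy as follows. This is an illustrative sketch rather than a reference implementation: the small K-means helper uses a deterministic farthest-point initialization chosen here for reproducibility, and the two-ring data set and the value `sigma=0.5` are assumptions.

```python
import numpy as np

def kmeans(Y, k, n_iter=100):
    """Small K-means helper with farthest-point initialization (assumed choice)."""
    centers = [Y[0]]
    for _ in range(1, k):
        d2 = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[int(d2.argmax())])
    centers = np.array(centers)
    labels = np.zeros(len(Y), dtype=int)
    for _ in range(n_iter):
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def njw_spectral_clustering(S, k, sigma):
    # Affinity matrix with Gaussian weights and a zero diagonal.
    d2 = ((S[:, None, :] - S[None, :, :]) ** 2).sum(axis=2)
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # L = D^{-1/2} A D^{-1/2}, where D holds the row sums of A.
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # eigh sorts eigenvalues in ascending order, so the last k columns
    # belong to the k largest eigenvalues.
    _, evecs = np.linalg.eigh(L)
    X = evecs[:, -k:]
    # Renormalize rows to unit length, then cluster the rows with K-means.
    Y = X / np.linalg.norm(X, axis=1, keepdims=True)
    return kmeans(Y, k)

# Hypothetical data: two concentric rings (radius 1 and radius 4), 40 points each.
theta = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
ring = np.column_stack([np.cos(theta), np.sin(theta)])
S = np.vstack([1.0 * ring, 4.0 * ring])

labels = njw_spectral_clustering(S, k=2, sigma=0.5)
```

Point i receives the cluster of row i of Y, so `labels[:40]` describes the inner ring and `labels[40:]` the outer ring. The rings share a center, so they are not compact groups in the original space, but each becomes a tight group of rows in Y.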
Why?
- If we eventually use K-means, why not just apply K-means to the original data?
- This method allows us to cluster non-convex regions
Some Examples
User’s Prerogative
- Affinity matrix construction
- Choice of scaling factor: realistically, search over the scaling factor and pick the value that gives the tightest clusters
- Choice of k, the number of clusters
- Choice of clustering method
How to select k?
- Eigengap: the difference between two consecutive eigenvalues.
- The most stable clustering is generally given by the value k that maximises the eigengap.
- Largest eigenvalues of Cisi/Medline data: λ1 and λ2 stand out ⇒ choose k = 2
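A minimal sketch of the eigengap heuristic, assuming eigenvalues are sorted in descending order and k is chosen so that the gap between the k-th and (k+1)-th eigenvalue is largest. The indexing convention, the function name, and the example graph are assumptions, since the slide's expression was not preserved.

```python
import numpy as np

def choose_k_by_eigengap(eigenvalues):
    """Return k such that lambda_k - lambda_{k+1} is the largest gap."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # descending
    gaps = lam[:-1] - lam[1:]
    return int(np.argmax(gaps)) + 1  # gaps[0] separates lambda_1 and lambda_2

# Hypothetical graph: two triangles joined by one weak edge (2-3, weight 0.1).
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1), (0, 2, 1), (1, 2, 1),
                (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, 0.1)]:
    W[i, j] = W[j, i] = w

# Eigenvalues of the normalized affinity matrix D^{-1/2} W D^{-1/2}.
d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
evals = np.linalg.eigvalsh(d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :])

k = choose_k_by_eigengap(evals)
```

For this graph the two leading eigenvalues are close to 1 and the rest are close to -0.5, so the largest gap follows the second eigenvalue and the heuristic picks two clusters.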
Recap: The bottom line
Summary
- Spectral clustering can help us in hard clustering problems
- The technique is simple to understand
- The solution comes from solving a simple algebra problem which is not hard to implement
- Great care should be taken in choosing the “starting conditions”
Problems for Spectral Clustering
- Number of clusters K
- Objective function optimization
- Better similarity (relationship) functions
What’s Visual Analytics?
- Initial clustering result & visualization
  - Similarity-preserving data projection: from the high-dimensional space used for data representation to 2D space for visualization
  - Data layout
  - Mistakes induced by data projection
- Human advising via HCI
- Computer interpretation of human advice
  - Must-link vs. not-link
  - Data clustering with constraints