Remaining Lectures in 2009
1. Advanced Clustering and Outlier Detection
2. Advanced Classification and Prediction
3. Top Ten Data Mining Algorithms (short)
4. Course Summary (short)
5. Assignment 5 Student Presentations
Ch. Eick: Introduction to Hierarchical Clustering and DBSCAN
Clustering Part 2: Advanced Clustering and Outlier Detection
1. Hierarchical Clustering
2. More on Density-based Clustering: DENCLUE [EM Top 10-DM-Alg]
3. Cluster Evaluation Measures
4. Outlier Detection
More on Clustering
1. Hierarchical Clustering (to be discussed on Nov. 11)
2. DBSCAN (will be used in the programming project)
Hierarchical Clustering
- Produces a set of nested clusters organized as a hierarchical tree
- Can be visualized as a dendrogram: a tree-like diagram that records the sequences of merges or splits
Agglomerative Clustering Algorithm
- The more popular hierarchical clustering technique
- Basic algorithm is straightforward:
  1. Compute the proximity matrix
  2. Let each data point be a cluster
  3. Repeat
  4.   Merge the two closest clusters
  5.   Update the proximity matrix
  6. Until only a single cluster remains
- Key operation is the computation of the proximity of two clusters
  - Different approaches to defining the distance between clusters distinguish the different algorithms
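The six steps above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference implementation: it uses single-link (MIN) proximity, computes distances on demand instead of maintaining an explicit proximity matrix, and all names and the toy dataset are mine.

```python
# Minimal agglomerative clustering sketch (single-link/MIN proximity).
# Illustrative only; the slides' algorithm keeps an explicit proximity matrix.
from itertools import combinations

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def agglomerative(points):
    # Steps 1-2: each data point starts as its own cluster
    clusters = [[p] for p in points]
    merges = []                      # records the dendrogram's merge order
    while len(clusters) > 1:         # steps 3-6: repeat until one cluster remains
        # find the two closest clusters under MIN (single-link) distance
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: min(euclidean(p, q)
                                      for p in clusters[ij[0]]
                                      for q in clusters[ij[1]]))
        merges.append((clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]   # merge cluster j into i
        del clusters[j]
    return merges

merges = agglomerative([(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)])
print(len(merges))   # 4 merges reduce 5 singleton clusters to 1
```

The `merges` list is exactly the sequence a dendrogram records: reading it top to bottom replays the clustering at every level of the hierarchy.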
Starting Situation
- Start with clusters of individual points and a proximity matrix
(Figure: proximity matrix over the individual points p1, p2, p3, p4, p5, ...)
Intermediate Situation
- After some merging steps, we have some clusters
(Figure: clusters C1..C5 and their proximity matrix)
Intermediate Situation
- We want to merge the two closest clusters (C2 and C5) and update the proximity matrix
After Merging
- The question is: "How do we update the proximity matrix?"
(Figure: proximity matrix after the merge; the entries between C2 ∪ C5 and C1, C3, C4 are marked "?")
How to Define Inter-Cluster Similarity
- MIN
- MAX
- Group Average
- Distance Between Centroids
- Other methods driven by an objective function
  - Ward's Method uses squared error
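The first four definitions in the list can be stated directly as code. A hedged sketch with illustrative clusters (the function and variable names are mine, and proximity is expressed as Euclidean distance, so smaller means more similar):

```python
# Four inter-cluster proximity definitions from the slide, on toy 2-D clusters.
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def proximity(c1, c2, method):
    pairs = [euclidean(p, q) for p in c1 for q in c2]
    if method == "min":            # MIN / single link: closest pair
        return min(pairs)
    if method == "max":            # MAX / complete link: farthest pair
        return max(pairs)
    if method == "group_average":  # average of all pairwise distances
        return sum(pairs) / len(pairs)
    if method == "centroid":       # distance between cluster centroids
        cent = lambda c: tuple(sum(x) / len(c) for x in zip(*c))
        return euclidean(cent(c1), cent(c2))

c1, c2 = [(0, 0), (0, 2)], [(4, 0), (4, 2)]
print(proximity(c1, c2, "min"))        # 4.0
print(proximity(c1, c2, "centroid"))   # 4.0
```

Plugging a different `method` into the merge step of the agglomerative algorithm is exactly what distinguishes single link, complete link, group average, and centroid clustering.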
Cluster Similarity: Group Average
- Proximity of two clusters is the average of pairwise proximity between points in the two clusters
- Need to use average connectivity for scalability, since total proximity favors large clusters
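The second bullet can be checked numerically. In this small sketch (points and names are illustrative), the same geometry is measured against a 1-point cluster and a 3-point cluster: the average pairwise distance barely changes, but the total scales with the number of pairs, which is why summed proximity systematically favors large clusters.

```python
# Why average rather than total: totals scale with |C1| * |C2|.
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def total_and_average(c1, c2):
    pairs = [euclidean(p, q) for p in c1 for q in c2]
    return sum(pairs), sum(pairs) / len(pairs)

small = [(0, 0)]
big   = [(0, 0), (0, 0.1), (0.1, 0)]   # same location, more points
other = [(3, 0)]
print(total_and_average(small, other))  # average about 3, total 3
print(total_and_average(big, other))    # average still about 3, total triples
```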
2009 Teaching of Clustering
Part 1: Basics (September/October)
1. What is Clustering?
2. Partitioning/Representative-based Clustering
   - K-means
   - K-medoids
3. Density-Based Clustering centering on DBSCAN
4. Region Discovery
5. Grid-based Clustering
6. Similarity Assessment
Part 2: Advanced Topics (November)
DBSCAN (http://www2.cs.uh.edu/~ceick/7363/Papers/dbscan.pdf)
- DBSCAN is a density-based algorithm
  - Density = number of points within a specified radius (Eps)
  - Input parameters: MinPts and Eps
  - A point is a core point if it has more than a specified number of points (MinPts) within Eps; these are points in the interior of a cluster
  - A border point has fewer than MinPts within Eps, but is in the neighborhood of a core point
  - A noise point is any point that is not a core point or a border point
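The three point-type definitions above translate directly into a short classifier. A sketch on toy data, with my own names; following the DBSCAN paper's neighborhood definition, the Eps-neighborhood here includes the point itself and the core test uses "at least MinPts":

```python
# Classify points as core, border, or noise for given Eps and MinPts.
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def classify(points, eps, min_pts):
    # Eps-neighborhood of each point (including the point itself)
    neigh = {p: [q for q in points if euclidean(p, q) <= eps]
             for p in points}
    core = {p for p in points if len(neigh[p]) >= min_pts}
    labels = {}
    for p in points:
        if p in core:
            labels[p] = "core"
        elif any(q in core for q in neigh[p]):
            labels[p] = "border"   # near a core point, but not dense itself
        else:
            labels[p] = "noise"
    return labels

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (9, 9)]
labels = classify(pts, eps=1.5, min_pts=4)
print(labels[(0, 0)], labels[(2, 2)], labels[(9, 9)])  # core border noise
```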
DBSCAN: Core, Border, and Noise Points
(Figure: illustration of core, border, and noise points)
DBSCAN Algorithm (simplified view for teaching)
1. Create a graph whose nodes are the points to be clustered
2. For each core point c, create an edge from c to every point p in the Eps-neighborhood of c
3. Set N to the nodes of the graph
4. If N does not contain any core points, terminate
5. Pick a core point c in N
6. Let X be the set of nodes that can be reached from c by going forward;
   a. create a cluster containing X ∪ {c}
   b. N = N \ (X ∪ {c})
7. Continue with step 4
Remarks: points that are not assigned to any cluster are outliers;
http://www2.cs.uh.edu/~ceick/7363/Papers/dbscan.pdf gives a more efficient implementation by performing steps 2 and 6 in parallel
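The simplified graph view above can be sketched as follows. This is a teaching illustration, not the paper's efficient implementation: edges go only from core points to their Eps-neighbors, forward reachability is a BFS, and all names are mine.

```python
# Simplified DBSCAN: clusters are forward-reachable sets from core points.
from collections import deque

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def dbscan(points, eps, min_pts):
    neigh = {p: [q for q in points if q != p and euclidean(p, q) <= eps]
             for p in points}
    core = {p for p in points if len(neigh[p]) + 1 >= min_pts}
    # Step 2: out-edges from core points only (border points get none)
    edges = {p: (neigh[p] if p in core else []) for p in points}
    n, clusters = set(points), []        # Step 3: N = all nodes
    while core & n:                      # Step 4: any core points left in N?
        c = next(iter(core & n))         # Step 5: pick a core point
        x, queue = {c}, deque([c])       # Step 6: forward reachability (BFS)
        while queue:
            for q in edges[queue.popleft()]:
                if q in n and q not in x:
                    x.add(q)
                    queue.append(q)
        clusters.append(x)               # cluster = X ∪ {c}
        n -= x                           # N = N \ (X ∪ {c})
    return clusters, n                   # leftover points are the outliers

clusters, outliers = dbscan(
    [(0, 0), (0, 1), (1, 0), (1, 1), (9, 9)], eps=1.5, min_pts=3)
print(len(clusters), outliers)   # 1 {(9, 9)}
```

Border points are pulled into a cluster because a core point has an edge to them, but they contribute no out-edges of their own, which matches the asymmetric reachability in the slide's step 2.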
DBSCAN: Core, Border and Noise Points
(Figure: original points and point types (core, border and noise); Eps = 10, MinPts = 4)
When DBSCAN Works Well
(Figure: original points and the resulting clusters)
- Resistant to noise
- Can handle clusters of different shapes and sizes
When DBSCAN Does NOT Work Well
(Figures: original points; results for MinPts=4, Eps=9.75 and MinPts=4, Eps=9.12)
Problems with:
- Varying densities
- High-dimensional data
Assignment 3 Dataset: Earthquake
Assignment 3 Dataset: Complex9
http://www2.cs.uh.edu/~ml_kdd/Complex&Diamond/2DData.htm
Dataset: http://www2.cs.uh.edu/~ml_kdd/Complex&Diamond/Complex9.txt
(Figure captions: "K-Means in Weka"; "... in Weka")
DBSCAN: Determining Eps and MinPts
- Idea: for points in a cluster, their k-th nearest neighbors are at roughly the same distance
- Noise points have their k-th nearest neighbor at a farther distance
- So, plot the sorted distance of every point to its k-th nearest neighbor
(Figure: sorted k-distance plot; running DBSCAN for MinPts=4 and Eps=5 separates core points from non-core points)
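The sorted k-distance curve described above is straightforward to compute. A small sketch with an illustrative dataset and my own names; in practice one plots the returned list and reads Eps off the "knee" where the curve jumps:

```python
# k-distance heuristic: sorted distance of every point to its k-th neighbor.
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def k_distances(points, k):
    dists = []
    for p in points:
        others = sorted(euclidean(p, q) for q in points if q != p)
        dists.append(others[k - 1])     # distance to the k-th nearest neighbor
    return sorted(dists)                # plot this curve to find the knee

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (9, 9)]
print(k_distances(pts, k=3))
# the last value jumps: the isolated point's 3rd neighbor is far away
```

Choosing Eps just below the jump makes the dense points core points and leaves the isolated point as noise, which is exactly the separation the slide's plot illustrates.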