Clustering Spatial Data Using Random Walks Author David
- Slides: 18
Clustering Spatial Data Using Random Walks Author : David Harel Yehuda Koren Graduate : Chien-Ming Hsiao
• • • Outline Motivation Objective Introduction Basic Notions Modeling The Data Clustering Using Random Walks – Separators and separating operators – Clustering by separation – Clustering spatial points • Integration with Agglomerative Clustering • Examples • Conclusion • Opinion
Motivation • The characteristics of spatial data pose several difficulties for clustering algorithms • The clusters may have arbitrary shapes and nonuniform sizes – Different cluster may have different densities • The existence of noise may interfere the clustering process
Objective • Present a new approach to clustering spatial data • Seeking efficient clustering algorithms. • Overcoming noise and outliers
Introduction • The heart of the method is in what we shall be calling separating operators. • Their effect is to sharpen the distinction between the weights of inter-cluster edges and intra-cluster edges – By decreasing the former and increasing the latter • It can be used on their own or can be embedded in a classical agglomerative clustering framework.
BASIC NOTIONS • graph-theoretic notions (A higher value means more similar)
BASIC NOTIONS • The probability of a transition from node i to node j • The probability that a random walk originating at s will reach t before returning to s
MODELINE THE DATA • Delaunay triangulation (DT) – Many O(n log n) time and O(n) space algorithms exist for computing the DT of a planar point set. • K-mutual neighborhood – The k-nearest neighbors of each point can be O(n log n) time O(n) space for any fixed arbitrary dimension. • The weight of the edge (a, b) is – d(a, b) is the Euclidean distance between a and b. – ave is the average Euclidean distance between two adjacent points.
CLUSTERING USING RANDOM WALKS • To identifying natural clusters in a graph is to find ways to compute an intimacy relation between the nodes incident to each of the graph’s edges. • Identifying separators is to use an iterative process of separation. – This is a kind of sharpening pass
NS : Separation by neighborhood similarity Definition :
CE : Separation by circular escape Definition :
Clustering spatial points
Integration with Agglomerative Clustering • The separation operators can be used as a preprocessing before activating agglomerative clustering on the graph • Can effectively prevent bad local merging opposing the graph structure. • It is equivalent to a “single link” algorithm preceded by a separation operation
Examples
Conclusion • It is robust in the presence of noise and outliers, and is flexible in handling data of different densities. • The CE operator yields better results than the NS operator • The time complexity of our algorithm applied to n data points is O(n log n)
Opinion • Since the algorithm does not rely on spatial knowledge, we can to try it on other types of data.
END
- Oflinemaps
- She who walks with integrity walks securely meaning
- Flat clustering vs hierarchical clustering
- Partitional clustering
- Flat clustering vs hierarchical clustering
- First author second author third author
- Physics review letter
- Spatial data and attribute data
- Spatial data and attribute data
- Random assignment vs random sampling
- Random assignment vs random selection
- Clustering by passing messages between data points
- Classification and clustering in data mining
- Clustering non numeric data
- Birch clustering algorithm in data mining
- A framework for clustering evolving data streams
- Cluster analysis in data mining
- K-means clustering algorithm in data mining
- Spatial order sentences examples