Densitybased Place Clustering in GeoSocial Networks Jieming Shi
Density-based Place Clustering in Geo-Social Networks Jieming Shi, Dingming Wu, Nikos Mamoulis, David W. Cheung Department of Computer Science, The University of Hong Kong
Clustering Spatial clustering – grouping of spatial objects (geographic places in our case) into clusters Useful for marketing and urban planning Density based clustering divides a large collection of points into densely populated regions
DBSCAN algorithm DBSCAN is one of the most common data clustering algorithms – proposed in 1996 For each place p it finds all the places within the radius ε of p – eps-neighborhood. If the number of places in eps-neighborhood is no less than Min. Pts – p is called a core point -> it will form a cluster or will be a part of cluster Dense eps-neighborhoods are put into the same cluster if they contain the cores of each other
Example 1 Min. Pts = 4 2 3 ε ε … finish ε
DBSCAN result example
Use of geo-social network data Current spatial clustering models disregard information about the people who are related to the clustered places. Social Network with geographic checkins includes: Users Friendship Checkins connections
Motivation Urban planning: land managers are interested in identifying regions with uniform demographic statistics (for example, areas where elderly people prefer to visit or areas with people that have in common special transportation or living needs) Data cleaning: nearby Geo-Social Network locations collected by user check-ins could belong to the same physical place Marketing: if two or more places belong to the same geo-social cluster, the user who likes one place will probably be interested to visit the others
users friendship connections checkins places
Example 1 Example 2
Density-based Clustering Places in Geo-Social Networks (DCPGS)
Input
DCPGS - Geo-social ε-neighborhood definition
DCPGS algorithm idea
Distance functions
Social distance
Alternative ways to compute social distance – (1) Jaccard
Alternative ways to compute social distance – (2) Sim. Rank
Alternative ways to compute social distance – (2) Sim. Rank
Alternative ways to compute social distance – (3) Katz
Alternative ways to compute social distance – (4) Commute Time
Algorithms DCPGS-R and DCPGS-G
DCPGS-R: R-tree based The algorithm uses R-Tree to facilitate the search of geo-social ε-neighborhood for a given place For the sake of efficiency the social network is stored in a hash table – each pair of friends as an entry
Spatial query – uses R-tree The distance has already been computed Compute social and geo-social distance
DCPGS-G: Grid-based Individual R-tree based range queries find all the places within the radius max. D of the given geographic place in O(log n + < #of places within radius max. D >) which will be equal to O(log n) in most cases But when we have millions of places – we need to perform millions of such queries
DCPGS-G: Grid-based
DCPGS-G: Grid-based
DCPGS-G: Grid-based
DCPGS-G: Grid-based
Results
Visualization-based Analysys
Visualization-based Analysys
Visualization-based Analysys
Social Entropy based Evaluation
Social Entropy based Evaluation
Social Entropy based Evaluation Commute. Time, and Katz have the lowest social entropy however, these methods produce small clusters and have too many outliers Jaccard also has low social entropy for the same reason DCPGS is better than Sim. Rank
- Slides: 38