
Testing Cluster Structure of Graphs • Artur Czumaj, DIMAP and Department of Computer Science, University of Warwick • Joint work with Pan Peng and Christian Sohler (TU Dortmund)

Dealing with “Big Data” in Graphs • We want to process graphs quickly – Detect basic properties – Analyze their structure • For large graphs, “quickly” often means: in time constant or sublinear in the size of the graph

Dealing with “Big Data” in Graphs • One approach: how to test basic properties of graphs in the framework of property testing

Fast Testing of Graph Properties • (figure from Fan Chung’s web page)

Clustering in graphs • What is a good clustering? (figure from Fan Chung’s web page)

Clustering in graphs • Same cluster: points are well-connected • Different cluster: points are poorly connected (figure from Fan Chung’s web page)

Clustering in graphs Sublinear algorithms parameterized complexity

Clustering in graphs • Same cluster: points have high conductance • Different cluster: points are separated by a cut (figure from Fan Chung’s web page)

Conductance •
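The definition on this slide did not survive extraction. As a rough illustration only, the sketch below uses the standard volume-based conductance of a cut, i.e. the number of crossing edges divided by the smaller of the two volumes (sums of degrees). Whether the talk uses this or a bounded-degree variant is an assumption; the helper names `conductance` and `two_cliques` are hypothetical.

```python
def conductance(adj, subset):
    """Conductance of the cut (S, V \\ S): crossing edges / min(vol(S), vol(V \\ S)).

    Assumption: standard volume-based definition, not necessarily the talk's.
    """
    s = set(subset)
    crossing = sum(1 for u in s for v in adj[u] if v not in s)
    vol_s = sum(len(adj[u]) for u in s)
    vol_rest = sum(len(adj[u]) for u in adj if u not in s)
    denom = min(vol_s, vol_rest)
    if denom == 0:
        raise ValueError("both sides of the cut must contain at least one edge endpoint")
    return crossing / denom


def two_cliques(k):
    """Two k-cliques on vertices 0..k-1 and k..2k-1, joined by the single edge (k-1, k)."""
    adj = {v: set() for v in range(2 * k)}
    for base in (0, k):
        for i in range(base, base + k):
            for j in range(i + 1, base + k):
                adj[i].add(j)
                adj[j].add(i)
    adj[k - 1].add(k)
    adj[k].add(k - 1)
    return adj


graph = two_cliques(4)
phi_between = conductance(graph, {0, 1, 2, 3})  # cut separating the two cliques
phi_inside = conductance(graph, {0, 1})         # cut through the middle of one clique
print(phi_between, phi_inside)
```

In this toy graph the cut between the two cliques has much lower conductance than a cut that splits a clique, which matches the intuition on the preceding slide: clusters are well connected inside and separated from each other by a sparse cut.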

Framework of property testing • We cannot quickly give a 100% precise answer • We need to approximate • Distinguish graphs that have a specific property from those that are far from having the property

Framework •

Goal •

Testing expansion •

Testing expansion and clustering •

• We would like to have the following algorithm:

Key theorem •

Key properties (completeness) •

Key properties (completeness) • Convergence within a cluster • We prove this using a higher-order Cheeger inequality • Key challenge: even if we expect to stay inside the cluster in a large fraction of random walks, we can sometimes leave the cluster, and then we no longer have “easy” control over the distribution of endpoints.

Key properties (completeness) •

Key properties (completeness) • Convergence outside a cluster

Key properties (completeness) •

Key properties (completeness) • Bound for distribution for clusterable graphs

Key properties (completeness) •

Key properties (soundness) •

Key properties (soundness) •

Key theorem •

Extensions •

Conclusions • Clustering (or clusterability) can be tested fast – by comparing distributions of random walks – by drawing conclusions from the distributions • Tools: – Random sampling – Random walks – Spectral analysis
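To make the idea of comparing random-walk distributions concrete, here is a minimal sketch under several assumptions that do not come from the slides: it uses a lazy random walk (stay put with probability 1/2), computes the endpoint distribution exactly instead of estimating it from samples, and measures closeness by the l2 distance. A sublinear tester would instead sample walk endpoints; the names `walk_distribution`, `l2_distance` and `two_cliques` are hypothetical.

```python
import math


def two_cliques(k):
    """Two k-cliques on vertices 0..k-1 and k..2k-1, joined by the single edge (k-1, k)."""
    adj = {v: set() for v in range(2 * k)}
    for base in (0, k):
        for i in range(base, base + k):
            for j in range(i + 1, base + k):
                adj[i].add(j)
                adj[j].add(i)
    adj[k - 1].add(k)
    adj[k].add(k - 1)
    return adj


def walk_distribution(adj, start, steps):
    """Exact endpoint distribution of a lazy random walk of the given length."""
    dist = {v: 0.0 for v in adj}
    dist[start] = 1.0
    for _ in range(steps):
        nxt = {v: 0.0 for v in adj}
        for u, p in dist.items():
            if p == 0.0:
                continue
            nxt[u] += p / 2                  # lazy step: stay at u
            share = p / (2 * len(adj[u]))    # otherwise move to a uniform neighbour
            for v in adj[u]:
                nxt[v] += share
        dist = nxt
    return dist


def l2_distance(p, q):
    """Euclidean distance between two distributions over the same vertex set."""
    return math.sqrt(sum((p[v] - q[v]) ** 2 for v in p))


graph = two_cliques(8)
steps = 6
d0 = walk_distribution(graph, 0, steps)
d1 = walk_distribution(graph, 1, steps)    # same clique as vertex 0
d15 = walk_distribution(graph, 15, steps)  # the other clique

same_cluster = l2_distance(d0, d1)
different_cluster = l2_distance(d0, d15)
print(same_cluster, different_cluster)
```

Walks started in the same clique end up with nearly identical endpoint distributions after a few steps, while walks started in different cliques stay far apart. The walk length of 6 was picked by hand for this toy graph and is not a parameter taken from the talk.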