Testing Cluster Structure of Graphs
Artur Czumaj
DIMAP and Department of Computer Science, University of Warwick
Joint work with Pan Peng and Christian Sohler (TU Dortmund)

Dealing with “Big Data” in Graphs
• We want to process graphs quickly
  – Detect basic properties
  – Analyze their structure
• For large graphs, “quickly” often means: in time constant or sublinear in the size of the graph

Dealing with “Big Data” in Graphs
One approach:
• How to test basic properties of graphs in the framework of property testing

Fast Testing of Graph Properties • from Fan Chung’s web page

Clustering in graphs • What is a good clustering? from Fan Chung’s web page

Clustering in graphs
• Same cluster: points are well-connected
• Different cluster: points are poorly connected
from Fan Chung’s web page

Clustering in graphs Sublinear algorithms parameterized complexity

Clustering in graphs
• Same cluster: points have high conductance
• Different cluster: points are separated by a cut
from Fan Chung’s web page

Conductance •
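As an illustration, here is a minimal sketch that assumes the standard definition of the conductance of a vertex set S: the number of edges leaving S divided by the smaller of the volumes (degree sums) of S and its complement. The talk may normalize differently (for example by a degree bound), and the names `conductance` and `two_triangles` are hypothetical helpers, not taken from the slides.

```python
from typing import Dict, Hashable, Iterable, Set

# Undirected graph stored as adjacency sets (an assumption of this sketch).
Graph = Dict[Hashable, Set[Hashable]]


def conductance(graph: Graph, subset: Iterable[Hashable]) -> float:
    """Cut edges of `subset` divided by min(vol(S), vol(complement of S))."""
    s = set(subset)
    # Volume of a side = sum of the degrees of its vertices.
    vol_s = sum(len(graph[v]) for v in s)
    vol_rest = sum(len(graph[v]) for v in graph if v not in s)
    if min(vol_s, vol_rest) == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    # Count edges with exactly one endpoint in S.
    cut = sum(1 for u in s for v in graph[u] if v not in s)
    return cut / min(vol_s, vol_rest)


def two_triangles() -> Graph:
    """Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3."""
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    graph: Graph = {v: set() for v in range(6)}
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph
```

On this toy graph, one triangle is separated from the rest by a single edge and so has low conductance (1/7), while a single vertex has conductance 1, matching the intuition that clusters are sets separated by sparse cuts.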

Framework of property testing
• We cannot quickly give a 100% precise answer
• We need to approximate
• Distinguish graphs that have a specific property from those that are far from having the property

Framework •

Goal •

Testing expansion •

Testing expansion and clustering •

• We would like to have the following algorithm:

Key theorem •

Key properties (completeness) •

Key properties (completeness)
• Convergence within a cluster
We prove this using the higher-order Cheeger inequality.
Key challenge: even if we expect to stay inside the cluster in a large fraction of random walks, we can sometimes leave the cluster, and then we don’t have “easy” control over the distribution of endpoints.

Key properties (completeness) • Convergence outside a cluster

Key properties (completeness) • Bound for distribution for clusterable graphs

Key properties (soundness) •

Key theorem •

Extensions •

Conclusions
Clustering (or clusterability) can be tested fast
• by comparing distributions of random walks
• drawing conclusions from the distributions
Tools:
• Random sampling
• Random walks
• Spectral analysis
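The idea of comparing random-walk distributions can be illustrated with a small sketch. This is not the algorithm from the talk: it assumes lazy random walks, computes the endpoint distributions exactly rather than estimating them from sampled walks as a sublinear tester would, and compares them in l2 distance. The helper names and the toy graph are hypothetical.

```python
from typing import Dict, Hashable, Set

# Undirected graph stored as adjacency sets (an assumption of this sketch).
Graph = Dict[Hashable, Set[Hashable]]


def walk_distribution(graph: Graph, start: Hashable, steps: int) -> Dict[Hashable, float]:
    """Exact endpoint distribution of a lazy random walk of `steps` steps from `start`."""
    dist = {v: 0.0 for v in graph}
    dist[start] = 1.0
    for _ in range(steps):
        nxt = {v: 0.0 for v in graph}
        for u, mass in dist.items():
            if mass == 0.0:
                continue
            if not graph[u]:
                nxt[u] += mass  # isolated vertex: the walk cannot move
                continue
            nxt[u] += mass / 2  # lazy step: stay put with probability 1/2
            for v in graph[u]:
                nxt[v] += mass / (2 * len(graph[u]))  # otherwise move to a uniform neighbour
        dist = nxt
    return dist


def l2_distance(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """Euclidean distance between two distributions over the same vertex set."""
    return sum((p[v] - q[v]) ** 2 for v in p) ** 0.5


def two_triangles() -> Graph:
    """Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3."""
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    graph: Graph = {v: set() for v in range(6)}
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


if __name__ == "__main__":
    g = two_triangles()
    p0, p1, p5 = (walk_distribution(g, s, 4) for s in (0, 1, 5))
    print("same cluster:     ", l2_distance(p0, p1))
    print("different cluster:", l2_distance(p0, p5))
```

On the toy graph, walks started from two vertices of the same triangle end up with nearly identical distributions after a few steps, while walks started in different triangles stay far apart; a tester in this spirit would draw its conclusion from such distances.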