Multigraph Sampling of Online Social Networks
Minas Gjoka, Carter Butts, Maciej Kurant, Athina Markopoulou

Outline
• Multigraph sampling
  – Motivation
  – Sampling method
  – Internet Measurements
  – Conclusion

Problem statement
• Obtain a representative sample of online social network (OSN) users by exploration of the social graph.
[Figure: example social graph with nodes A through I]

Motivation for multiple relations
• Principled methods for graph sampling
  – Metropolis-Hastings Random Walk
  – Re-weighted Random Walk
  "Walking in Facebook: A Case Study of Unbiased Sampling of OSNs," INFOCOM '10
• But graph characteristics affect mixing and convergence
  – fragmented social graph
  – highly clustered areas
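
As an illustration of the first method listed above, here is a minimal sketch of a Metropolis-Hastings random walk that targets the uniform distribution over nodes. The adjacency-dict input `adj`, the function name `mhrw`, and the toy star graph are assumptions made for this example; they are not taken from the slides.

```python
import random


def mhrw(adj, start, steps, rng=None):
    """Metropolis-Hastings random walk with a uniform target distribution.

    adj   -- dict mapping each node to a list of its neighbors (undirected graph)
    start -- node where the walk begins
    steps -- number of transitions to perform
    Returns the list of visited nodes, including repeats caused by rejections.
    """
    rng = rng or random.Random()
    current = start
    visited = [current]
    for _ in range(steps):
        # Propose a neighbor chosen uniformly at random.
        candidate = rng.choice(adj[current])
        # Accept with probability min(1, deg(current) / deg(candidate));
        # otherwise stay at the current node for this step.
        if rng.random() < len(adj[current]) / len(adj[candidate]):
            current = candidate
        visited.append(current)
    return visited


# Toy star graph (hypothetical): node 0 is the hub, nodes 1-4 are leaves.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
walk = mhrw(star, start=0, steps=1000, rng=random.Random(1))
```

Because proposals toward low-degree nodes are always accepted and proposals toward high-degree nodes are often rejected, the hub is not over-visited the way it would be under a simple random walk.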

Fragmented social graph
[Figure: Largest Connected Component and Other Connected Components for the Friendship, Event attendance, Group membership, and Union graphs]

Highly clustered social graph
[Figure panels: Friendship, Event attendance, Union]

Proposal
• Graph exploration using multiple user relations
  – perform random walk
  – re-weighting at the end of the walk
  – online convergence diagnostics applicable
• Theoretical benefits
  – faster mixing
  – discovery of isolated components
• Open questions
  – how to combine relations
  – implementation efficiency
  – evaluation of sampling benefits in a realistic scenario
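
The "re-weighting at the end of the walk" step can be sketched as follows. A simple random walk on a connected undirected graph visits each node with probability proportional to its degree, so each sample is weighted by the inverse of its degree (a Hansen-Hurwitz style ratio estimator). The function name `reweighted_mean` and the toy numbers are assumptions for illustration only.

```python
def reweighted_mean(values, degrees):
    """Estimate the population mean of a node attribute from random-walk samples.

    values  -- attribute value observed at each sampled node (repeats allowed)
    degrees -- degree of each sampled node in the graph that was walked
    Each sample gets weight 1/degree to undo the walk's degree bias.
    """
    if not values or len(values) != len(degrees):
        raise ValueError("values and degrees must be non-empty and equal in length")
    weights = [1.0 / d for d in degrees]
    weighted_total = sum(w * x for w, x in zip(weights, values))
    return weighted_total / sum(weights)


# Toy sample (hypothetical): a degree-1 node with value 10 seen once and a
# degree-3 node with value 20 seen three times, as a degree-biased walk
# would tend to produce.
estimate = reweighted_mean([10, 20, 20, 20], [1, 3, 3, 3])
```

When the walk runs over several relations at once, the degree used for the weight would be the node's total degree across those relations.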

[Figure: the same set of users (nodes A through K) shown under three relations: Friends, Events, and Groups]

Combination of multiple relations
• G* = Friends + Events + Groups (G* is a union multigraph)
• G = Friends + Events + Groups (G is a union graph)
[Figure: nodes A through K connected in the union multigraph G* and in the union graph G]
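
The difference between the union multigraph G* and the union graph G shows up in node degrees: G* keeps one edge per relation that links two users, while G collapses parallel edges into one. The sketch below computes both degrees. The edge lists, function names, and the assumption of no self-loops are all hypothetical and chosen only to illustrate the distinction.

```python
from collections import Counter


def union_multigraph_degree(relations):
    """Degree of each node in G*: parallel edges from different relations all count."""
    degree = Counter()
    for edges in relations.values():
        for u, v in edges:
            degree[u] += 1
            degree[v] += 1
    return degree


def union_graph_degree(relations):
    """Degree of each node in G: a pair linked by several relations counts once."""
    unique_edges = set()
    for edges in relations.values():
        for u, v in edges:
            unique_edges.add(frozenset((u, v)))  # assumes u != v (no self-loops)
    degree = Counter()
    for edge in unique_edges:
        for node in edge:
            degree[node] += 1
    return degree


# Toy relations (hypothetical): A and B are linked by both friendship and an event.
relations = {
    "friends": [("A", "B"), ("B", "C")],
    "events": [("A", "B"), ("C", "D")],
}
multi = union_multigraph_degree(relations)   # A and B each gain an extra edge
simple = union_graph_degree(relations)       # the A-B pair is counted once
```

A simple random walk on a connected G* visits each node with probability proportional to its multigraph degree, which is the per-relation degrees added together.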

Multigraph sampling: Implementation efficiency
• Degree information available without enumeration
• Take advantage of pages functionality
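
One way these two points could combine is a two-stage choice of the next hop: pick a relation with probability proportional to the current node's degree in that relation, then pick a uniformly random position within that relation and fetch only the page containing it. This is a sketch under assumptions, not the authors' implementation: `next_hop`, `page_and_offset`, and the `fetch_neighbor` callback are hypothetical names, and the node is assumed to have at least one edge.

```python
import random


def page_and_offset(index, page_size):
    """Map a 0-based position in a member list to (0-based page, offset in page)."""
    return index // page_size, index % page_size


def next_hop(degree_by_relation, fetch_neighbor, rng=None):
    """Choose the next node of a random walk on the union multigraph.

    degree_by_relation -- dict: relation name -> current node's degree in it
                          (assumed to be known without listing the neighbors)
    fetch_neighbor     -- hypothetical callback (relation, index) -> neighbor
    Returns (relation, neighbor). Every multigraph edge of the current node
    is chosen with equal probability 1 / total degree.
    """
    rng = rng or random.Random()
    relations = list(degree_by_relation)
    weights = [degree_by_relation[r] for r in relations]
    # Stage 1: relation chosen in proportion to the node's degree in it.
    relation = rng.choices(relations, weights=weights, k=1)[0]
    # Stage 2: uniform position inside that relation's neighbor list.
    index = rng.randrange(degree_by_relation[relation])
    return relation, fetch_neighbor(relation, index)


# Usage with a stand-in callback that just echoes the request.
relation, neighbor = next_hop(
    {"friends": 2, "events": 0, "groups": 3},
    fetch_neighbor=lambda rel, i: (rel, i),
    rng=random.Random(0),
)
```

In a real crawler the callback would translate the index with `page_and_offset` and request that single page, so the cost per step does not grow with the size of a large group or event.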

Multigraph sampling: Internet Measurements
• Last.fm, an Internet radio service
  – social networking features
  – multiple relations
  – fragmented graph components and highly clustered users expected
• Last.fm relations used
  – Friends
  – Groups
  – Events
  – Neighbors

Data Collection
• Crawling using Last.fm API and HTML scraping
• Sampled node information: userID, country, age, registration time, …

Summary of datasets (Last.fm, July 2010)
Crawl type                         # Total Users   % Unique Users
Friends                            5 x 50K         71%
Events                             5 x 50K         58%
Groups                             5 x 50K         74%
Neighbors                          5 x 50K         53%
Friends-Events-Groups-Neighbors    5 x 50K         76%
UNI                                500K            99%

Comparison to UNI
[Figure: % of Subscribers]

Last.fm Charts Estimation
• Application of sampling

Last.fm Charts Estimation
• Artist Charts

Related Work
• Fastest mixing Markov Chain
  – Boyd et al., SIAM Review 2004
• Sampling in fragmented graphs
  – Ribeiro et al., Frontier Sampling, IMC 2010
• Last.fm studies
  – Konstas et al., SIGIR '09
  – Schifanella et al., WSDM '10

Conclusion
• Introduced multigraph sampling
  – simple and efficient
  – discovers isolated components
  – better approximation of distributions and means
  – multigraph dataset planned for public release
• Future work on multigraph sampling
  – selection of relations
  – weighted relations

Thank you
Questions?