Microsoft Instant Messenger Communication Network How does the








































![Network: Small-world § 6 degrees of separation [Milgram ’ 60 s] § Average distance Network: Small-world § 6 degrees of separation [Milgram ’ 60 s] § Average distance](https://slidetodoc.com/presentation_image_h/060c7df2d5d1191b2b03a7e6434898ea/image-41.jpg)










- Slides: 51
Microsoft Instant Messenger Communication Network How does the world communicate? Jure Leskovec (jure@cs. cmu. edu) Machine Learning Department http: //www. cs. cmu. edu/~jure Joint work with: Eric Horvitz, Microsoft Research
Networks: Why? § Today: large on-line systems leave detailed records of social activity § On-line communities: My. Scace, Facebook § Email, blogging, instant messaging § On-line publications repositories, ar. Xiv, Med. Line § Emerging behavior (need lots of data): § Actions of individual nodes are independent but global patterns and regularities emerge
The Largest Social Network § What is the largest social network in the world (that we can relatively easily obtain)? For the first time we had a chance to look at complete (anonymized) communication of the whole planet (using Microsoft MSN instant messenger network) 3
Instant Messaging • Contact (buddy) list • Messaging window 4
Instant Messaging as a Network Buddy Conversation 5
IM – Phenomena at planetary scale Observe social phenomena at planetary scale: § How does communication change with user demographics (distance, age, sex)? § How does geography affect communication? § What is the structure of the communication network? 6
Communication data The record of communication § Presence data § user status events (login, status change) § Communication data § who talks to whom § Demographics data § user age, sex, location 7
Data description: Presence § Events: § § § Login, Logout Is this first ever login Add/Remove/Block buddy Add unregistered buddy (invite new user) Change of status (busy, away, BRB, Idle, …) § For each event: § User Id § Time 8
Data description: Communication § For every conversation (session) we have a list of users who participated in the conversation § There can be multiple people per conversation § For each conversation and each user: § § § User Id Time Joined Time Left Number of Messages Sent Number of Messages Received 9
Data description: Demographics § For every user (self reported): § § § Age Gender Location (Country, ZIP) Language IP address (we can do reverse geo IP lookup) 10
Data collection § § § Log size: 150 Gb/day Just copying over the network takes 8 to 10 h Parsing and processing takes another 4 to 6 h After parsing and compressing ~ 45 Gb/day Collected data for 30 days of June 2006: § Total: 1. 3 Tb of compressed data 11
Network: Conversations Conversation 12
Data statistics Activity over June 2006 (30 days) § 245 million users logged in § 180 million users engaged in conversations § 17, 5 million new accounts activated § More than 30 billion conversations 13
Data statistics per day Activity on June 1 2006 § 1 billion conversations § 93 million users login § 65 million different users talk (exchange messages) § 1. 5 million invitations for new accounts sent 14
User characteristics: age 15
Age piramid: MSN vs. the world 16
Conversation: Who talks to whom? § Cross gender edges: § 300 male-male and 235 female-female edges § 640 million female-male edges 17
Number of people per conversation § Max number of people simultaneously talking is 20, but conversation can have more people 18
Conversation duration § Most conversations are short 19
Conversations: number of messages Sessions between fewer people run out of steam 20
Time between conversations § Individuals are highly diverse § What is probability to login into the system after t minutes? § Power-law with exponent 1. 5 § Task queuing model [Barabasi] § My email, Darvin’s and Einstein’s letters follow the same pattern 21
Age: Number of conversations User self reported age High Low 22
Age: Total conversation duration User self reported age High Low 23
Age: Messages per conversation User self reported age High Low 24
Age: Messages per unit time User self reported age High Low 25
Who talks to whom: Number of conversations 26
Who talks to whom: Conversation duration 27
Geography and communication § Count the number of users logging in from particular location on the earth 28
How is Europe talking § Logins from Europe 29
Users per geo location Blue circles have more than 1 million logins. 30
Users per capita Fraction of population using MSN: • Iceland: 35% • Spain: 28% • Netherlands, Canada, Sweden, Norway: 26% • France, UK: 18% • USA, Brazil: 8% 31
Communication heat map § For each conversation between geo points (A, B) we increase the intensity on the line between A and B 32
Homophily (gliha v kup štriha) § Probability: Age vs. Age § Correlation: 33
Per country statistics § On a particular typical day… Country USA Brazil France Unknown Spain UK Canada China Turkey Mexico # of logins # of users # of messages Messages per user 38, 319, 363 13, 261, 337 412, 729, 278 31. 12 20, 582, 613 7, 864, 424 467, 972, 522 59. 50 19, 163, 131 6, 475, 858 518, 931, 785 80. 13 18, 444, 352 6, 872, 347 191, 167, 085 27. 81 16, 868, 549 6, 140, 895 503, 759, 240 82. 03 16, 659, 009 5, 724, 826 487, 018, 470 85. 07 14, 558, 692 5, 021, 185 160, 249, 686 31. 91 14, 225, 163 5, 314, 463 101, 003, 729 19. 00 13, 619, 789 4, 696, 555 353, 540, 475 75. 27 10, 756, 989 4, 359, 932 209, 195, 100 47. 98 Note that global usage and market share statistics are higher if we accumulate data over longer time periods. 34
Per typical user per country § On a typical day MSN user from a country … Logins on a Users on a Messages per Country particular day sent user Slovenia 364, 988 130, 884 15, 919, 892 121. 6335992 Malta 122, 846 41, 829 4, 993, 316 119. 3745009 Hungary 1, 214, 268 427, 320 47, 623, 604 111. 4471684 Bosnia 105, 584 35, 689 3, 254, 170 91. 18131637 Teunion 100, 335 33, 399 3, 041, 635 91. 0696428 Gibraltar 19, 096 6, 452 581, 195 90. 07982021 UK 16, 659, 009 5, 724, 826 487, 018, 470 85. 07131396 Macedonia 126, 729 43, 754 3, 669, 977 83. 87751977 Netherlands 7, 399, 160 2, 696, 669 221, 300, 210 82. 06428375 Spain 16, 868, 549 6, 140, 895 503, 759, 240 82. 03352117 Note that global usage and market share numbers are higher if we accumulate data over longer time periods. 35
What about Slovenia (per capita)? Statistic Conversations inside Conversation to outside Total conversations Avg. time inside Avg. time outside Avg. time inside (pct. ) Messages sent inside Messages sent outside Messages inside (pct. ) Number 19, 868, 886 7, 868, 483 27, 737, 369 309. 49 314. 39 0. 4960 9. 78 9. 46 0. 5083 Rank (per capita) 22 48 29 147 80 32 19 36
Who is Slovenia talking to? Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Target Pairs of Number of Avg. time Avg. # of Country people conversations per conv. messages Slovenia 13, 41, 250 19, 868, 886 309. 4 9. 78 USA 61, 794 922, 527 303. 4 9. 14 Spain 27, 650 310, 357 289. 4 7. 97 UK 14, 709 204, 335 325. 4 9. 02 Germany 9, 047 129, 551 350. 3 10. 20 Bosnia 9, 956 114, 509 385. 9 14. 62 Yugoslavia 8, 194 104, 270 381. 7 12. 55 Italy 8, 612 100, 698 358. 8 9. 89 Croatia 6, 838 84, 362 359. 0 11. 00 Turkey 10, 763 77, 651 292. 4 8. 08 Albania 9, 517 76, 440 320. 7 10. 88 Sweden 5, 083 69, 019 306. 9 8. 34 Netherlands 5, 061 68, 287 315. 9 8. 87 Canada 5, 003 60, 617 301. 8 7. 38 37
Instant Messaging as a Network Buddy 38
IM Communication Network § Buddy graph: § 240 million people (people that login in June ’ 06) § 9. 1 billion edges (friendship links) § Communication graph: § There is an edge if the users exchanged at least one message in June 2006 § 180 million people § 1. 3 billion edges § 30 billion conversations 39
Buddy network: Number of buddies § Buddy graph: 240 million nodes, 9. 1 billion edges (~40 buddies per user) 40
Network: Small-world § 6 degrees of separation [Milgram ’ 60 s] § Average distance 5. 5 § 90% of nodes can be reached in < 8 hops Hops Nodes 1 10 2 78 3 396 4 8648 5 3299252 6 28395849 7 79059497 8 52995778 9 10321008 10 1955007 11 518410 12 149945 13 44616 14 13740 15 4476 16 1542 17 536 18 167 19 71 20 29 21 16 22 10 23 3 24 2 25 3
Network: Searchability § Milgram’s experiment showed: v § (1) short paths exist in networks § (2) humans are able to find them § Assume the following setting: § Nodes are scattered on a plane § Given starting node u and we want to reach target node v § Algorithm: always navigate to a neighbor that is geographically closest to target node v § Surprise: Geo-routing finds the short paths (for appropriate distance measure) u 43
Communication network: Clustering § How many triangles are closed? § Clustering normally decays as k-1 § Communication network is highly clustered: k-0. 37 High clustering Low clustering 44
Communication Network Connectivity 45
k-Cores decomposition § What is the structure of the core of the network? 46
k-Cores: core of the network § People with k<20 are the periphery § Core is composed of 79 people, each having 68 edges among them 47
Network robustness § We delete nodes (in some order) and observe how network falls apart: § Number of edges deleted § Size of largest connected component 48
Robustness: Nodes vs. Edges 49
Robustness: Connectivity 50
Conclusion § A first look at planetary scale social network § The largest social network analyzed § Strong presence of homophily: people that communicate share attributes § Well connected: in only few hops one can research most of the network § Very robust: Many (random) people can be removed and the network is still connected 51
References § Leskovec and Horvitz: Worldwide Buzz: Planetary-Scale Views on an Instant. Messaging Network, 2007 § http: //www. cs. cmu. edu/~jure 52