Maps of random walks on complex networks reveal community structure
Martin Rosvall (1, 2) and Carl T. Bergstrom (1, 2)
1 Department of Biology, University of Washington, Seattle, WA 98195-1800
2 Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501
Proceedings of the National Academy of Sciences (PNAS), 2007. Impact Factor: 9.423
Presented by Ronen Hershman (1, 2)
1 Department of Cognitive and Brain Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
2 Zlotowski Center for Neuroscience, Ben-Gurion University of the Negev, Beer-Sheva, Israel

What is a Good Map?
A good map balances two competing risks:
o Oversimplification, which excludes important details.
o An excess of data, which can blur significant relationships.

Describing a Path on a Network
A weighted network: [figure]

Describing a Path on a Network
Is there an alternative? Huffman coding.

Describing a Path on a Network
Huffman coding: consider the string "Duke Blue Devils".
(The Duke Blue Devils men's basketball team is the college basketball program representing Duke University.)

Describing a Path on a Network
Huffman coding: symbol counts for the string "Duke Blue Devils" (sp = space):
e, 3   D, 2   u, 2   l, 2   sp, 2   k, 1   B, 1   v, 1   i, 1   s, 1

Huffman Coding: how does it work?
Start with one leaf per symbol, weighted by its count:
e, 3   D, 2   u, 2   l, 2   sp, 2   k, 1   B, 1   v, 1   i, 1   s, 1
Repeatedly merge the two lowest-weight nodes into a parent whose weight is their sum:
• B + v → 2
• i + s → 2
• k + (B, v) → 3
• D + u → 4
• l + sp → 4
• (i, s) + (k, B, v) → 5
• e + (D, u) → 7
• (l, sp) + (i, s, k, B, v) → 9
• 7 + 9 → 16 (the root)
Now what? Label the two branches under every parent 0 and 1, and read each codeword from the root down to the leaf:
e    00
D    010
u    011
l    100
sp   101
i    1100
s    1101
k    1110
B    11110
v    11111
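The merging procedure on these slides can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the function name `huffman_code` and the tie-breaking counter are assumptions, and ties between equal weights may be broken differently than on the slides, so individual codewords can differ while the total encoded length stays the same.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(text):
    """Return a dict mapping each symbol in `text` to a binary codeword."""
    freqs = Counter(text)
    tiebreak = count()  # unique counter so heap entries with equal weight stay comparable
    # Each heap entry: (weight, tiebreak, {symbol: codeword built so far})
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: only one distinct symbol
        return {sym: "0" for sym in freqs}
    while len(heap) > 1:
        # Merge the two lowest-weight subtrees, as on the slides.
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]


text = "Duke Blue Devils"
code = huffman_code(text)
encoded = "".join(code[ch] for ch in text)
total_bits = len(encoded)
```

For this string the encoding takes 52 bits, compared with 64 bits for a fixed-length code that spends 4 bits on each of the 16 characters (10 distinct symbols need 4 bits each).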

Describing a Path on a Network
A weighted network (Huffman coding): [figure]
• Each codeword specifies a particular node, and the codeword lengths are derived from the node visit frequencies of an infinitely long random walk.
• We can describe the specific 71-step walk in 314 bits (instead of 355 bits with the uniform code).
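The 355-bit figure for the uniform code can be checked with a line of arithmetic. The sketch below assumes the example network has 25 nodes; the slide text does not state the node count, but 25 nodes need 5-bit fixed-length codewords, which is consistent with 355 = 71 × 5.

```python
import math

n_nodes = 25        # ASSUMPTION: node count of the example network (not stated on the slide)
n_steps = 71        # walk length, from the slide
huffman_bits = 314  # Huffman description length, from the slide

# A uniform code gives every node a codeword of the same length.
bits_per_step = math.ceil(math.log2(n_nodes))
uniform_bits = bits_per_step * n_steps

# Fraction of bits saved by the Huffman code relative to the uniform code.
saving = 1 - huffman_bits / uniform_bits
```

Under that assumption the Huffman code saves roughly 12% of the bits for this particular walk.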

Describing a Path on a Network
• In general, we are not interested in the codewords themselves but in the theoretical limit of how concisely we can specify the path.
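That theoretical limit is the Shannon entropy of the symbol frequencies: no uniquely decodable code can use fewer bits per symbol on average, and a Huffman code stays within one bit of it. As a rough check, the sketch below compares the entropy of the "Duke Blue Devils" counts with the average length of the codewords listed on the earlier slide. The variable names are illustrative only.

```python
import math

# Symbol counts and codeword lengths taken from the Huffman slides (" " is the space symbol).
counts = {"e": 3, "D": 2, "u": 2, "l": 2, " ": 2, "k": 1, "B": 1, "v": 1, "i": 1, "s": 1}
code_len = {"e": 2, "D": 3, "u": 3, "l": 3, " ": 3, "i": 4, "s": 4, "k": 4, "B": 5, "v": 5}

total = sum(counts.values())
probs = {s: c / total for s, c in counts.items()}

# Shannon entropy H = -sum(p * log2 p): the lower bound in bits per symbol.
entropy = -sum(p * math.log2(p) for p in probs.values())

# Average codeword length actually achieved by the Huffman code.
avg_len = sum(probs[s] * code_len[s] for s in counts)
```

Here the entropy is about 3.20 bits per symbol and the Huffman code averages 3.25, so the code is already close to the limit. For a random walk on a network, the same calculation would use the node visit frequencies in place of the character counts.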

Highlighting Important Objects
We matched the lengths of the codewords to the frequencies of their use.
• This gives efficient codewords for the nodes.
• This is not a map!
• We need to divide the network into two levels of description.

Highlighting Important Objects
Two levels of description:
• Retaining unique names for large-scale objects.
• Reusing the names for the individual nodes within each module.

Highlighting Important Objects
Two levels of description: how does it work?
• Most U.S. cities have unique names.
• Street names are reused from one city to the next:
o "Main Street".
o "Broadway" and "Washington Avenue".

Highlighting Important Objects
Two levels of description:
• A two-level description lets us describe the path in fewer bits than a one-level description.
• This works because a random walker is statistically likely to spend long periods of time within certain clusters of nodes.

Highlighting Important Objects
Two levels of description:
• A weighted network: [figure]
• Huffman coding: [figure]
• We give each cluster a unique name: [figure]

Highlighting Important Objects
Two levels of description:
• We use a different Huffman code to name the nodes within each cluster.
• Module entries and exits are coded as well.

Highlighting Important Objects
There is a duality between two problems:
• Finding community structure in networks.
• The coding problem.
Finding community structure in networks is equivalent to solving a coding problem!

The description length has two terms: [equation]
• Entropy of movements between modules
• Entropy of movements within modules
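These two terms are the parts of the map equation, which in plain notation reads L(M) = q * H(Q) + sum over modules i of p_i * H(P_i). Here q is the per-step probability of switching modules, H(Q) is the entropy of the module names, p_i is the rate at which module i's codebook is used, and H(P_i) is the entropy of movements within module i, including the exit codeword. The sketch below evaluates this for a small undirected network. It assumes, for illustration only, that visit frequencies are proportional to node strength (which holds for a random walk on a connected undirected network); the function names and the two-triangle example are hypothetical and not taken from the slides.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy in bits of a normalized probability list."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def map_equation(edges, module_of):
    """Two-level description length in bits per step of a random walk.

    edges: list of (u, v, weight) for an UNDIRECTED network (assumption).
    module_of: dict mapping each node to its module label.
    """
    strength = defaultdict(float)     # summed link weight at each node
    exit_weight = defaultdict(float)  # link weight crossing each module's boundary
    for u, v, w in edges:
        strength[u] += w
        strength[v] += w
        if module_of[u] != module_of[v]:
            exit_weight[module_of[u]] += w
            exit_weight[module_of[v]] += w
    total = sum(strength.values())

    visit = {n: s / total for n, s in strength.items()}  # node visit frequencies
    modules = set(module_of.values())
    q = {m: exit_weight[m] / total for m in modules}     # per-step exit probability
    q_total = sum(q.values())

    # First term: entropy of movements between modules.
    length = 0.0
    if q_total > 0:
        length += q_total * entropy([q[m] / q_total for m in modules])

    # Second term: entropy of movements within modules (node visits plus exit).
    for m in modules:
        p_nodes = [visit[n] for n in visit if module_of[n] == m]
        p_within = q[m] + sum(p_nodes)
        length += p_within * entropy([q[m] / p_within] + [p / p_within for p in p_nodes])
    return length


# Hypothetical example: two triangles joined by a single link, all weights 1.
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (3, 4, 1), (4, 5, 1), (3, 5, 1), (2, 3, 1)]
one_module = {n: "A" for n in range(6)}
two_modules = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
L_one = map_equation(edges, one_module)
L_two = map_equation(edges, two_modules)
```

For this toy network the two-module partition gives the shorter description (about 2.32 bits per step versus about 2.56 for a single module), which is the sense in which a good partition compresses the walk.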

Mapping Flow Compared with Maximizing Modularity
• The traditional method:
o Disregards the directions and the weights of the links.
o Loses valuable information about the network structure.

Mapping Flow Compared with Maximizing Modularity
Flow with long persistence times within modules, and limited flow between them. [figure]

Mapping Flow Compared with Maximizing Modularity
Which method should a researcher use? It depends!
Two methods:
• Flow-based approaches such as the map equation.
• Topological methods such as modularity (Newman MEJ, Girvan M, 2004).
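For comparison with the flow-based view, modularity scores a partition purely from link counts: for each module it takes the fraction of links that fall inside the module and subtracts the fraction expected if links were placed at random with the same node degrees. The sketch below is a minimal version for an unweighted, undirected network; the function name and the two-triangle example are illustrative assumptions, not material from the slides.

```python
from collections import defaultdict


def modularity(edges, module_of):
    """Modularity Q of a partition of an unweighted, undirected network.

    edges: list of (u, v) pairs; module_of: dict mapping node to module label.
    """
    n_edges = len(edges)
    internal = defaultdict(int)  # number of links with both ends in the module
    degree = defaultdict(int)    # summed degree of the module's nodes
    for u, v in edges:
        degree[module_of[u]] += 1
        degree[module_of[v]] += 1
        if module_of[u] == module_of[v]:
            internal[module_of[u]] += 1
    # Observed internal fraction minus the fraction expected at random.
    return sum(
        internal[m] / n_edges - (degree[m] / (2 * n_edges)) ** 2
        for m in degree
    )


# Hypothetical example: two triangles joined by a single link.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
one_module = {n: "A" for n in range(6)}
two_modules = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
Q_one = modularity(edges, one_module)
Q_two = modularity(edges, two_modules)
```

On this toy network the two-triangle partition scores higher than the trivial single-module partition, so both approaches agree here. They can diverge on directed or weighted networks, where the pattern of flow carries information that link counts alone do not.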

Mapping Scientific Communication
• Citation patterns among journals allow us to glimpse the flow of, and provide a trace of, communication between scientists.

Mapping Scientific Communication
• 6,128 journals in the sciences and social sciences.
• 6,434,916 citations in the cross-citation network represent a trace of the scientific activity during 2004.
• Citations from articles published in 2004 to articles published in the previous 5 years.

Mapping Scientific Communication
Excluded:
• Journals that published fewer than 12 articles per year.
• Journals that do not cite other journals within the data set.
• Science, Nature, and PNAS: the only three major journals that span a broad range of scientific disciplines.

Discussion
• The additional level of detail in the more narrowly focused map would have been clutter on the full map of science.
• We must find that balance where we eliminate extraneous detail but highlight the relationships among important structures.