Applications of Relative Importance
o Why is relative importance interesting?
  - Web
  - Social Networks
  - Citation Graphs
  - Biological Data
o Graphs become too complex for manual analysis

Existing Techniques
o Web
  - PageRank (Google)
o Social Networks
  - ‘Centrality’
o All focus on global measures of node importance; we’re interested in importance relative to a set of root nodes R

Use Existing Techniques?
o Use a global algorithm on the subgraph surrounding the root nodes?
o That gives no preferential treatment to the root nodes; it just ranks the surrounding nodes.

Organization: Relative Importance Algorithms
o Notation
o Problem Formulation
o General Framework
o Algorithms

Notation
o Digraph
  - G = (V, E)
o Edges
  - Ordered pair of nodes (u, v)
o Graphs are directed, unweighted, simple
o Walks from u to v
  - A path is a walk with no repeated nodes

Notation
o k-short paths: paths of length at most k
o P(u, v): set of paths between u and v
o S_out(u): set of distinct outgoing edges from u
o Similarly, we have S_in(u) for incoming edges

Problem Formulation
1. Given G and r and t, where r, t ∈ V, compute the “importance” of t w.r.t. root node r: I(t|r)

Problem Formulation
2. Given G and node r ∈ V, rank all vertices in T(G), T ⊆ V, w.r.t. r.

Problem Formulation
3. Given G, a set of nodes T(G) to rank, and a set of root nodes R(G) where R ⊆ V, rank all vertices in T w.r.t. R. This is similar to the last case, except that we compute I(t|R) rather than I(t|r).
   Average importance: $I(t|R) = \frac{1}{|R|} \sum_{r \in R} I(t|r)$

Problem Formulation (3 cont’d.)
o Rather than average each node’s importance score, we could define $I(t|R) = \min_{r \in R} I(t|r)$
o This requires ‘important’ nodes to have a high importance score with respect to all nodes in R
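The two ways of combining per-root scores described in formulation 3 can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `importance_wrt_root_set`, the dictionary input, and the reading of the stricter alternative as a minimum over roots are all assumptions.

```python
from statistics import mean


def importance_wrt_root_set(scores_by_root, how="mean"):
    """Combine per-root scores I(t|r) into a single score I(t|R).

    scores_by_root: dict mapping each root r in R to I(t|r) for one fixed t.
    how: "mean" averages over the roots; "min" keeps the weakest score, so t
         must score well against every root (the "min" reading is an assumption).
    """
    values = list(scores_by_root.values())
    if not values:
        raise ValueError("root set R must not be empty")
    if how == "mean":
        return mean(values)
    if how == "min":
        return min(values)
    raise ValueError(f"unknown aggregation: {how!r}")


# Hypothetical scores of one node t against the roots A and F.
scores = {"A": 0.5, "F": 0.25}
avg_score = importance_wrt_root_set(scores, "mean")
min_score = importance_wrt_root_set(scores, "min")
```

Averaging lets one strongly connected root compensate for a weak one, while the minimum only rewards nodes that are close to every root.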

Problem Formulation
4. Given G, rank all nodes, where R = T = V.

General Framework: Weighted Paths
o Nodes are related according to the paths that connect them
o The longer the path, the less importance it conveys:
  $I(t|r) = \sum_{i=1}^{|P(r,t)|} \lambda^{-|p_i|}$
  where $\lambda$ is a scalar coefficient, P(r, t) is a set of paths from r to t, and $p_i$ is the ith path in P(r, t). Importance decays exponentially with path length.
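A small sketch of the weighted-paths idea, assuming each path contributes the scalar coefficient raised to the negative path length (number of edges), which is one way to read "importance decays exponentially". The function name `weighted_path_importance`, the parameter name `lam`, its default value, and the example paths through nodes A and B are illustrative assumptions.

```python
def weighted_path_importance(paths, lam=2.0):
    """Sum lam ** (-length) over a set of paths from r to t.

    paths: iterable of paths, each a list of nodes starting at r and ending at t.
    lam:   scalar decay coefficient (hypothetical default); larger values
           penalise long paths more strongly.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    # A path with n nodes has n - 1 edges; that edge count is its length.
    return sum(lam ** -(len(path) - 1) for path in paths)


# Two paths of length 2, then the same set plus one path of length 3.
shortest = [["R", "C", "T"], ["R", "D", "T"]]
score_shortest = weighted_path_importance(shortest)
score_with_longer = weighted_path_importance(shortest + [["R", "A", "B", "T"]])
```

How the path set P(r, t) is chosen is left open here; the function only scores whatever set it is given.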

How to choose P(r, t)?
o Path examples: graphs a. and b.
o Shortest paths from R to T: {R-C-T, R-D-T}, which fail to capture much of the connectivity from R to T.

Shortest Path
o E.g.: transport cargo from r to t
o Shortest path doesn’t always give a good approximation of importance.
  - E.g.: the web (graph b.)

k-Short Paths of length K
o Idea: paths longer than the shortest ones are often important and should be taken into account
o Fixes the Shortest Paths problem of ignoring longer, important paths
  - E.g.: graph b., 3-short
o Problem: capacity constraints
  - E.g.: network topology
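One way to collect k-short paths is a depth-first enumeration of simple paths with at most k edges. The sketch below assumes the graph is an adjacency dictionary; the function name `k_short_paths` and the small example graph (including the extra nodes A and B) are made up for illustration.

```python
def k_short_paths(graph, source, target, k):
    """Return all paths (no repeated nodes) from source to target with <= k edges.

    graph: dict mapping each node to a list of its successors.
    """
    paths = []

    def extend(path, visited):
        node = path[-1]
        if node == target:
            paths.append(list(path))
            return
        if len(path) - 1 == k:  # edge budget used up
            return
        for nxt in graph.get(node, ()):
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                extend(path, visited)
                path.pop()
                visited.remove(nxt)

    extend([source], {source})
    return paths


# Hypothetical graph: two paths of length 2 and one of length 3 from R to T.
graph = {"R": ["C", "D", "A"], "A": ["B"], "B": ["T"],
         "C": ["T"], "D": ["T"], "T": []}
two_short = k_short_paths(graph, "R", "T", 2)
three_short = k_short_paths(graph, "R", "T", 3)
```

With k = 2 only the two shortest paths are found; raising k to 3 also picks up the longer path through A and B. The number of such paths can grow quickly with k on dense graphs.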

k-Short Node-Disjoint Paths
o No nodes and no edges are repeated
  - Implicitly enforces capacity constraints
  - Motivated by ‘mass flow’, where importance can ‘flow’ along paths
  - E.g.: graph b.
o Breadth-first with some heuristic, with some K and some
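The slide only says "breadth-first with some heuristic", so the following is one possible greedy heuristic rather than the authors' procedure: repeatedly take a shortest remaining path of at most k edges by breadth-first search, then block its interior nodes and its edges. The function names and the example graph are assumptions.

```python
from collections import deque


def _shortest_path(graph, source, target, blocked, used_edges, k):
    """BFS for a shortest path with <= k edges that avoids blocked nodes and used edges."""
    parent = {source: None}
    depth = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        if depth[u] == k:
            continue
        for v in graph.get(u, ()):
            if v in parent or v in blocked or (u, v) in used_edges:
                continue
            parent[v] = u
            depth[v] = depth[u] + 1
            queue.append(v)
    return None


def k_short_node_disjoint_paths(graph, source, target, k):
    """Greedily collect paths that share no interior node and no edge."""
    if source == target:
        return []
    blocked, used_edges, found = set(), set(), []
    while True:
        path = _shortest_path(graph, source, target, blocked, used_edges, k)
        if path is None:
            return found
        found.append(path)
        blocked.update(path[1:-1])              # interior nodes are used up
        used_edges.update(zip(path, path[1:]))  # so are the path's edges


# Hypothetical graph: R-C-D-T exists but reuses nodes C and D.
graph = {"R": ["C", "D"], "C": ["T", "D"], "D": ["T"], "T": []}
paths = k_short_node_disjoint_paths(graph, "R", "T", 3)
```

Being greedy, this does not guarantee the largest possible set of disjoint paths; a maximum-flow formulation would be needed for that.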

Markov Chains & Relative Importance
o Graph viewed as a stochastic process
  - Explanation of Markov Chains
  - Token traversing Chain…
  - Obviously good for modeling the web

Markov Chains & Relative Importance
o Markov Centrality
  - Mean First Passage Time $m_{rt}$: expected number of steps until first arrival at node t starting at node r
    $m_{rt} = \sum_{n=1}^{\infty} n \, f_{rt}^{(n)}$
  - $f_{rt}^{(n)}$: probability that the chain first returns to state t in exactly n steps

Markov Chains & Relative Importance
  - Bias toward ‘central nodes’
  - COMPLEX!!
    o Time: O(|V|^3) (inversion of |V| x |V| transition matrix)
    o Space: O(|V|^2)
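To make the mean first passage time concrete, the sketch below computes it for one target node by fixed-point iteration on m_r = 1 + sum over k != t of P[r][k] * m_k. This is a swapped-in technique chosen to keep the example dependency-free; it is not the matrix-inversion approach whose cost the slide quotes. The function name, the row-stochastic list-of-lists input `P`, and the tolerance are assumptions.

```python
def mean_first_passage_times(P, t, tol=1e-12, max_iter=100_000):
    """Expected number of steps to first reach state t from every other state.

    P: row-stochastic transition matrix as a list of lists.
    t: index of the target state. The entry for t itself is left at 0.0
       (the return time to t is not computed here).
    """
    n = len(P)
    m = [0.0] * n
    for _ in range(max_iter):
        new_m = [0.0] * n
        for r in range(n):
            if r == t:
                continue
            # One step is always taken; further steps only if we did not land on t.
            new_m[r] = 1.0 + sum(P[r][k] * m[k] for k in range(n) if k != t)
        if max(abs(a - b) for a, b in zip(new_m, m)) < tol:
            return new_m
        m = new_m
    raise RuntimeError("no convergence; t may be unreachable from some state")


# Deterministic cycle 0 -> 1 -> 2 -> 0: reaching state 2 takes 2 steps from 0, 1 from 1.
cycle = [[0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0],
         [1.0, 0.0, 0.0]]
m_cycle = mean_first_passage_times(cycle, t=2)

# Fair coin between two states: the expected wait to hit state 1 from state 0 is 2 steps.
coin = [[0.5, 0.5], [0.5, 0.5]]
m_coin = mean_first_passage_times(coin, t=1)
```

Nodes with small mean first passage times from the root set are easy to reach, which is the intuition behind treating them as more central.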

Markov Chains & Relative Importance
o PageRank
  - Uses backlinks to assign importance to web pages

Markov Chains & Relative Importance
o PageRank
  - Less complex
  - Converges logarithmically
  - 322 million links processed in 52 iterations
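For orientation, a textbook-style power iteration for global PageRank might look like the sketch below. It assumes every node appears as a key of the adjacency dictionary; the damping factor 0.85, the handling of nodes without out-links, and the tiny example graph are conventional or illustrative choices, not details taken from the slides.

```python
def pagerank(graph, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank on a dict {node: [successors]}.

    Every node must appear as a key. Rank held by nodes without out-links
    is spread uniformly over all nodes (a common convention, assumed here).
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not graph[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for u in nodes:
            for v in graph[u]:
                new_rank[v] += damping * rank[u] / len(graph[u])
        if sum(abs(new_rank[v] - rank[v]) for v in nodes) < tol:
            return new_rank
        rank = new_rank
    return rank


# Hypothetical three-page web: A and B link to C, C links back to A.
web = {"A": ["C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(web)
```

Page C collects backlinks from both other pages and ends up ranked highest, while B, which nothing links to, keeps only the baseline share.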

Markov Chains & Relative Importance
o Retrofit PageRank such that all nodes in R have a uniform bias at the start
o ‘Surfer’ begins at a root node and traverses the graph, returning to the root set R with probability β at each time-step
o I(t|R) = probability that the surfer visits t during a walk
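The root-biased surfer can be sketched as a power iteration in which, at every step, the surfer jumps back to a uniformly chosen root with some fixed probability and otherwise follows a random out-link. The name `beta` for that probability, its default of 0.3, the treatment of nodes without out-links, and the example graph are assumptions made for this illustration.

```python
def pagerank_with_priors(graph, roots, beta=0.3, tol=1e-10, max_iter=1000):
    """Stationary visit probabilities of a surfer biased toward a root set.

    graph: dict {node: [successors]}; every node must appear as a key.
    roots: collection of root nodes R; the prior is uniform over R.
    beta:  probability of jumping back to the root set at each step
           (hypothetical name and default).
    """
    if not roots:
        raise ValueError("root set R must not be empty")
    nodes = list(graph)
    prior = {v: (1.0 / len(roots) if v in roots else 0.0) for v in nodes}
    rank = dict(prior)  # the surfer starts at a root node
    for _ in range(max_iter):
        new_rank = {v: beta * prior[v] for v in nodes}
        for u in nodes:
            out = graph[u]
            if out:
                share = (1.0 - beta) * rank[u] / len(out)
                for v in out:
                    new_rank[v] += share
            else:
                # No out-links: send the surfer back to the roots (assumption).
                for v in nodes:
                    new_rank[v] += (1.0 - beta) * rank[u] * prior[v]
        if sum(abs(new_rank[v] - rank[v]) for v in nodes) < tol:
            return new_rank
        rank = new_rank
    return rank


# Hypothetical cycle A -> B -> C -> A with root set R = {A}.
ring = {"A": ["B"], "B": ["C"], "C": ["A"]}
scores = pagerank_with_priors(ring, roots={"A"})
```

On a plain cycle global PageRank would score all three nodes equally; with the bias toward A, the score falls off with distance from the root.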

Experiments (Simulated Data)

Experiments (Simulated Data)
o More complex
  - In- and out-degrees changed
  - Shortest path lengths between nodes changed (e.g.: A-B)
o In the analysis that follows, R = {A, F}

Experiments (Simulated Data)
o HITSPa
  Ranking: A  F  G  C  E  H  D  J  I  B
  Scores:  .252  .241  .128  .110  .099  .052  .032  .025  .032  .024
o HITSPh
  Ranking: F  A  D  B  E  I  H  J  G  C
  Scores:  .225  .186  .162  .119  .090  .067  .061  .050  .028  .008

Experiments (Simulated Data)
o MarkovC
  Ranking: J  C  G  H  E  I  F  D  A  B
  Scores:  .180  .133  .130  .129  .111  .101  .069  .051  .047  .044
o KSMarkov
  Ranking: H  G  E  J  C  I  F  D  A  B
  Scores:  .146  .142  .140  .120  .098  .087  .061  .034  .024

Experiments (9/11 Terrorist Network)
o 63 nodes (terrorists)
o 308 edges (interactions)

Rank  PRankP  HITSP  WKPaths  MarkovC  KSMarkov
1     Khemais  Beghal  Atta  Khemais
2     Beghal  Khemais  Al-Shehhi  Beghal
3     Moussaoui  Atta  Moussaoui  Al-Shibh  Moussaoui
4     Maaroufi  Moussaoui  Maaroufi
5     Qatada  Maaroufi  Bensakhria  Jarrah  Qatada
6     Daoudi  Qatada  Daoudi  Hanjour  Daoudi
7     Courtaillier  Bensakhria  Qatada  Al-Omari  Bensakhria
8     Bensakhria  Daoudi  Walid  Khemais  Courtaillier
9     Walid  Courtaillier  Qatada  Walid  Khammoun  Bahaji  Khammoun
10

Conclusion
o Provides a first step toward addressing ‘relative importance’
o Scaling for algorithms such as Markov Chaining can be an issue
o Using different algorithms and comparing results can reveal interesting information
o …Paper Analysis…

References
o White, Smyth. Algorithms for Estimating Relative Importance in Networks. SIGKDD ’03.
o Page, Brin, Motwani, Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Stanford University, Computer Science Department Technical Report.
o Wikipedia on Markov Chains
  - http://en.wikipedia.org/wiki/Markov_chain
  - http://en.wikipedia.org/wiki/Examples_of_Markov_chains