Link Building. Martin Olsen, Department of Computer Science, Aarhus University 1
Outline • Motivation and Introduction • Contribution: Link Building, Communities in Networks, Hedonic Games, Simple Games 2
What is Search Engine Optimization (SEO)? Objective of SEO: a link to your page appears on page 1 of the search results. • "... in 2012, companies will spend almost $9 billion on search engine optimization …" (The New York Times, January 2009) 3
WWW as a Graph [Figure: the web drawn as a graph] 4
PageRank. Random Surfer Perspective. The random surfer zaps with probability 0.15. [Figure: example graph with nodes 1 to 10 and 1000 random surfers, 100 on each node] 5
PageRank. Random Surfer Perspective. The random surfer zaps with probability 0.15. [Figure: distribution of the 1000 random surfers after one tick; two of the node counts are annotated as 143 = 85 + 85/2 + 15 and 355 = 4 · 85 + 15] 6
PageRank. Random Surfer Perspective. The random surfer zaps with probability 0.15. [Figure: stationary distribution of the 1000 random surfers after 50 ticks; node 1: 281, node 2: 280, node 3: 66, node 4: 254, node 6: 43, remaining nodes: 15 each] 7
PageRank. Random Surfer Perspective. The random surfer zaps with probability 0.15. [Figure: stationary distribution as probabilities; node 1: 0.281, node 2: 0.280, node 3: 0.066, node 4: 0.254, node 6: 0.043, remaining nodes: 0.015 each] 8
PageRank. Random Surfer Perspective. The random surfer zaps with probability 0.15. [Figure: stationary distribution as probabilities; node 1: 0.281, node 2: 0.280, node 3: 0.066, node 4: 0.254, node 6: 0.043, remaining nodes: 0.015 each] PageRanking: 1, 2, 4, 3, 6. PageRank is an important ingredient of the ranking mechanism. Relevance counts as well! 9
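The tick-by-tick redistribution of random surfers can be sketched in a few lines. This is a minimal sketch, not the code behind the slides: the 4-node graph `links`, the function names `tick` and `pagerank`, and the treatment of pages without outlinks are assumptions made for illustration.

```python
def tick(surfers, links, zap=0.15):
    """One tick: each surfer zaps to a uniformly random page with
    probability `zap`, otherwise follows a uniformly random outlink."""
    n = len(surfers)
    total = sum(surfers.values())
    # Share that every page receives from the zapping surfers.
    new = {v: zap * total / n for v in surfers}
    for u, count in surfers.items():
        out = links[u]
        if out:
            for v in out:
                new[v] += (1 - zap) * count / len(out)
        else:
            # Assumption: a surfer on a page without outlinks zaps.
            for v in new:
                new[v] += (1 - zap) * count / n
    return new


def pagerank(links, zap=0.15, ticks=50):
    """Start from the uniform distribution and apply `ticks` ticks."""
    n = len(links)
    dist = {v: 1.0 / n for v in links}
    for _ in range(ticks):
        dist = tick(dist, links, zap)
    return dist


# Hypothetical example graph (not the graph shown on the slides).
links = {1: [2], 2: [1], 3: [1, 2], 4: [1]}
ranks = pagerank(links)
ranking = sorted(ranks, key=ranks.get, reverse=True)
```

In this hypothetical graph pages 3 and 4 have no incoming links, so they only ever receive the zapping share 0.15/4.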
Link Building is an Important Aspect of SEO 10
Contribution/Link Building • The Computational Complexity of Link Building (COCOON '08), Olsen • Maximizing PageRank with New Backlinks (submitted), Olsen • MILP for Link Building (in preparation), Olsen, Viglas 11
The Link Building Problem. Formal Definition. LINK BUILDING. Instance: $G(V, E)$, $t \in V$, $k \in \mathbb{Z}^+$. Solution: $S \subseteq V \setminus \{t\}$ with $|S| = k$ maximizing $\pi_t$ after adding $S \times \{t\}$ to $E$. 12
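The definition can be made concrete with an exhaustive search that tries every candidate set of k new backlinks. This is only a sketch under stated assumptions, not an algorithm from the talk: the helper names `pagerank` and `link_building`, the example graph, and the handling of pages without outlinks are hypothetical. It needs one PageRank computation per k-subset, so it is practical for tiny instances only.

```python
from itertools import combinations


def pagerank(nodes, edges, zap=0.15, ticks=100):
    """PageRank by power iteration; `edges` is a set of (u, v) pairs."""
    n = len(nodes)
    out = {u: [v for (a, v) in edges if a == u] for u in nodes}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(ticks):
        new = {v: zap / n for v in nodes}
        for u in nodes:
            # Assumption: a page without outlinks spreads its rank uniformly.
            targets = out[u] or nodes
            for v in targets:
                new[v] += (1 - zap) * rank[u] / len(targets)
        rank = new
    return rank


def link_building(nodes, edges, t, k):
    """Try every set S of k nodes other than t, add the links S x {t},
    and return the set giving t the highest PageRank."""
    best_set, best_rank = None, -1.0
    candidates = [v for v in nodes if v != t]
    for S in combinations(candidates, k):
        new_edges = set(edges) | {(s, t) for s in S}
        r = pagerank(nodes, new_edges)[t]
        if r > best_rank:
            best_set, best_rank = S, r
    return best_set, best_rank


# Hypothetical instance: target t = 1, k = 2 new backlinks.
nodes = [1, 2, 3, 4, 5]
edges = {(1, 2), (2, 3), (3, 1), (4, 3), (5, 4)}
baseline = pagerank(nodes, edges)[1]
best_set, best_rank = link_building(nodes, edges, t=1, k=2)
```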
Link Building is not Trivial [Figure: an 8-node example graph shown three times, in its original form and with two different sets of new links; the PageRank shown for node 1 is 0.272, 0.367 and 0.375 respectively] 13
PageRank Topology Theorem *) [Figure: nodes i, j and 1, annotated with the increase in PageRank] $z_{up}$: the expected number of visits to p for a random surfer starting at u prior to the first zapping event 14
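The quantity defined on this slide, the expected number of visits to p when starting at u before the first zap (written `z[u][p]` below), can be computed by summing over the number of links followed. The sketch assumes a link is followed with probability 0.85 and that every page has at least one outlink; the graph and the function name `expected_visits` are hypothetical. It also uses the standard identity that the PageRank of p equals the per-page zapping share times the sum of `z[u][p]` over all start pages u.

```python
def expected_visits(links, follow=0.85, steps=200):
    """z[u][p]: expected visits to p, starting at u, before the first zap.
    Sums the probability of being at p after t link follows without a zap,
    for t = 0, 1, ..., steps - 1 (a truncated series)."""
    nodes = list(links)
    z = {u: {p: 0.0 for p in nodes} for u in nodes}
    for u in nodes:
        here = {p: 0.0 for p in nodes}
        here[u] = 1.0  # the start itself counts as a visit to u
        for _ in range(steps):
            for p in nodes:
                z[u][p] += here[p]
            nxt = {p: 0.0 for p in nodes}
            for v in nodes:
                for w in links[v]:
                    nxt[w] += follow * here[v] / len(links[v])
            here = nxt
    return z


# Hypothetical graph in which every page has an outlink.
links = {1: [2], 2: [1, 3], 3: [1]}
z = expected_visits(links)
n = len(links)
# PageRank of p: zapping share per page times visits summed over all starts.
rank = {p: 0.15 / n * sum(z[u][p] for u in links) for p in links}
```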
k-REGULAR INDEPENDENT SET ≤FPT LINK BUILDING [Figure: graph with nodes i and j] Does the graph contain an independent set of size k? Can we turn this question into a Link Building problem? 15
k-REGULAR INDEPENDENT SET ≤FPT LINK BUILDING [Figure: reduction graph with nodes i, j, x, y and 1; one solution is labelled OPT!] Basic idea: make $z_{ij}$ relatively big 16
k-REGULAR INDEPENDENT SET ≤FPT LINK BUILDING *): LINK BUILDING is W[1]-hard. LINK BUILDING solvable in time $f(k)\,n^c$ ⇒ k-REGULAR INDEPENDENT SET solvable in time $f(k)\,n^c$ ⇒ W[1] = FPT. Another result: FPTAS for LINK BUILDING ⇒ NP = P. [Figure: reduction graph with nodes i, j, x, y and 1; one solution is labelled OPT!] Basic idea: make $z_{ij}$ relatively big 17
Upper Bound: k = 1 fixed [Figure: the 8-node example graph before and after adding the dashed link; the PageRank shown for node 1 rises from 0.272 to 0.338] The dashed link can be found in time corresponding to O(1) PageRank computations with a randomized scheme *). 18
Upper Bound: Mixed Integer Linear Programming Approach *) [Figure: example graph with a price for a link from each node i; in the modified graph node 5 has the highest PageRank shown, 0.200] Compute the cheapest set of new incoming links that would make node 5 rank highest. 19
A Quiz: Which of the two situations would be optimal for Martin? 20
Contribution/Communities in Networks • Communities in Large Networks: Identification and Ranking (WAW '06), Olsen 21
Communities in Networks. Dolphins in Doubtful Sound [Newman, Girvan '04] [Figure: dolphin social network] 22
What is a Community? Informally: a community C is a set of nodes with relatively many links between them. Assumption/Observation: a CS site has relatively many CS links! Formal definition based on the assumption *): $\forall v \notin C, \forall u \in C: w_{vC} \le w_{uC}$, where $w_{vC}$ is the attention that node v gives to C. 23
A Greedy Approach for Detecting Members of a Community *) [Figure: 1) old C, 2) new C] Repeat until C is a community: • Find $v \notin C$ with maximum attention to C • $C \leftarrow C \cup \{v\}$ • Update attentions. Use two priority queues holding the elements in $C$ and $V \setminus C$. 24
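The greedy loop above can be sketched as follows. Two things are assumptions rather than taken from the slides: the attention of a node to C is modelled as the fraction of its outlinks that point into C, and the two priority queues are replaced by a plain rescan of all nodes to keep the sketch short. The graph `links` and all function names are hypothetical, and `seeds` must be non-empty.

```python
def attention(v, C, links):
    """Assumed definition: fraction of v's outlinks that point into C."""
    out = links[v]
    return sum(1 for u in out if u in C) / len(out) if out else 0.0


def is_community(C, links):
    """No non-member pays more attention to C than some member does."""
    inside = [attention(u, C, links) for u in C]
    outside = [attention(v, C, links) for v in links if v not in C]
    return not outside or max(outside) <= min(inside)


def greedy_community(seeds, links):
    """Grow C from the seed nodes until it satisfies the definition."""
    C = set(seeds)
    while not is_community(C, links):
        # The non-member paying the most attention to C joins next.
        v = max((x for x in links if x not in C),
                key=lambda x: attention(x, C, links))
        C.add(v)
    return C


# Hypothetical graph: two tightly linked groups joined by the link c -> d.
links = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["e", "f"],
    "e": ["d", "f"],
    "f": ["d", "e"],
}
community = greedy_community({"a"}, links)
```

The loop always terminates, because C grows by one node per round and the whole node set trivially satisfies the definition.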
An Experiment. A Danish CS Community
• Crawl of the dk-domain with 180,468 sites in total
• Representatives = 4 CS sites
• CS-Community with 556 sites
• Minimum attention, $\min_{u \in C} w_{uC}$: 15.8%
• Maximum attention, $\max_{v \notin C} w_{vC}$: 15.4%
Ranking:
1) www.daimi.au.dk (CS U Aarhus)
2) www.diku.dk (CS U Copenhagen)
3) www.itu.dk (ITU Copenhagen)
4) www.cs.auc.dk (CS U Aalborg)
5) www.brics.dk (CS PhD School)
6) www.imm.dtu.dk (Informatics/Mathematical modeling DTU Copenhagen)
…
17) www.imada.sdu.dk (CS/Mathematics U Southern Denmark) 25
Other Results • Computing nontrivial communities by the definition given is NP-hard. • A simple model for the evolution of communities is presented. These communities probably obey the definition for large n if the out-degree of the nodes is (log n). 26
Contribution/Hedonic Games • Nash Stability in Additively Separable Hedonic Games Is NP-Hard (CiE '07), Olsen • Extended version: Nash Stability in Additively Separable Hedonic Games and Community Structures (Theory of Computing Systems '09), Olsen 27
An Additively Separable Hedonic Game • Two buffaloes b1 and b2 that hate each other. They are only thirsty if they have a parasite on their back, in which case they have to drink 9 l/h. • Two gigantic parasites p1 and p2. They only want to sit on b1 and b2, respectively. • Five waterholes w1, …, w5 with capacities 1, 2, 3, 4 and 8 l/h, respectively. 28
An Additively Separable Hedonic Game. One Nash equilibrium for the game: [Figure: a partition of the players] PARTITION ≤ NE in ASHG $\in$ NPC *) 29
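The waterhole capacities 1, 2, 3, 4 and 8 l/h sum to 18 l/h, exactly twice the 9 l/h that a thirsty buffalo needs, so the example reads like a PARTITION instance. How the reduction maps such a split to an equilibrium is not spelled out on the slides and is an assumption here; the sketch only searches for a split into two halves of equal sum. The function name `partition` is hypothetical.

```python
from itertools import combinations


def partition(weights):
    """Return two lists with equal sums that together use every weight
    exactly once, or None. Exhaustive search, for tiny instances only."""
    total = sum(weights)
    if total % 2:
        return None
    idx = range(len(weights))
    for r in range(len(weights) + 1):
        for chosen in combinations(idx, r):
            if sum(weights[i] for i in chosen) == total // 2:
                rest = [weights[i] for i in idx if i not in chosen]
                return [weights[i] for i in chosen], rest
    return None


capacities = [1, 2, 3, 4, 8]  # waterholes w1..w5 in l/h, from the slide
halves = partition(capacities)
# halves == ([1, 8], [2, 3, 4]): each half supplies 9 l/h
```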
Community Structures in Networks. Put a 1 on each connection between two dolphins. The community structure is a NE! NE community structure? NEs are NP-hard to compute even with symmetric and positive payoffs *) 30
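Whether a given community structure is a Nash equilibrium of such a game can be checked directly. The sketch below uses the usual notion of Nash stability for additively separable hedonic games: a player's utility is the sum of its payoffs to the other members of its coalition, and no player may strictly gain by moving to another coalition or by going alone. The small graph stands in for the dolphin network and is hypothetical, as are the function names.

```python
def utility(i, coalition, payoff):
    """Sum of i's payoffs to the other members of the coalition."""
    return sum(payoff[i].get(j, 0) for j in coalition if j != i)


def is_nash_stable(partition, payoff):
    """True if no player strictly gains by a unilateral move."""
    for S in partition:
        for i in S:
            current = utility(i, S, payoff)
            # Options: join any other coalition, or go alone (utility 0).
            options = [utility(i, T, payoff) for T in partition if T is not S]
            options.append(0)
            if max(options) > current:
                return False
    return True


# Hypothetical network: two triangles joined by the connection c - d,
# with payoff 1 on each connection (symmetric and positive).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("c", "d"), ("d", "e"), ("e", "f"), ("d", "f")]
payoff = {}
for u, v in edges:
    payoff.setdefault(u, {})[v] = 1
    payoff.setdefault(v, {})[u] = 1

stable = is_nash_stable([{"a", "b", "c"}, {"d", "e", "f"}], payoff)
unstable = is_nash_stable([{"a", "b"}, {"c", "d", "e", "f"}], payoff)
```

In the second partition player c has one friend in its own coalition but two in the other, so it would move.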
Contribution/Simple Games On the Complexity of Problems on Simple Games (submitted) Freixas, Molinero, Olsen, Serna 31
Open Problems/Future Work • In the thesis we show LINK BUILDING $\in$ APX. Is there a PTAS for LINK BUILDING? • Surgical Link Building: isolate the community C; model all pages in $V \setminus C$ as one page; use MILP • Use information on the distribution of PageRank • Does the stuff presented really work? • Thank You! 32
Link Building. A Real World Example Dear X We are trying to get more links to our website to help improve its rating on the search engines. We were wondering if you could put a link to our site … on your webpage or blog. If you have a website or a Blog and put a link to our page on it then to say thank you for each month it is up, I will give you … Source: An e-mail to a colleague X 33
Link Building is not Trivial. 2nd Example [Figure: node 1 together with green and blue nodes] Assumption: obtaining a link from one green node is slightly better for node 1 than obtaining a link from one blue node. Now node 1 can pick three incoming links for free. What should node 1 choose? 34
No FPTAS for LINK BUILDING if NP ≠ P *) [Figure: reduction graph with nodes i, j, x, y and 1; one solution is labelled OPT!] 35
Power Law 36
Fixed Parameter Tractability: FPT and W[1] • FPT (solvable in time $f(k)\,n^c$): k-VERTEX COVER • Complete for W[1]: k-INDEPENDENT SET, k-REGULAR INDEPENDENT SET • LINK BUILDING is W[1]-hard *) 37
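As an illustration of a running time of the form f(k) times a polynomial in n, here is the classic bounded search tree for k-VERTEX COVER. It is a textbook technique and not taken from the talk; the function name and the example graph are hypothetical.

```python
def vertex_cover(edges, k):
    """Return a vertex cover with at most k vertices as a set, or None.
    Bounded search tree: one endpoint of any remaining edge must be in
    the cover, so branch on both endpoints. The depth is at most k, which
    gives at most 2**k branches, each doing work linear in len(edges)."""
    if not edges:
        return set()
    if k == 0:
        return None
    u, v = edges[0]
    for pick in (u, v):
        # Edges touching `pick` are covered; recurse on the rest.
        rest = [e for e in edges if pick not in e]
        sub = vertex_cover(rest, k - 1)
        if sub is not None:
            return sub | {pick}
    return None


# Hypothetical example: the path 1 - 2 - 3 - 4.
path = [(1, 2), (2, 3), (3, 4)]
no_cover = vertex_cover(path, 1)   # None: one vertex cannot cover the path
cover = vertex_cover(path, 2)      # {1, 3}
```

The exponential part depends on k only, which is what membership in FPT requires. The W[1]-hardness of LINK BUILDING stated on the slide suggests that no comparable algorithm is to be expected for it unless W[1] = FPT.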
Upper Bound: Mixed Integer Linear Programming Approach *) [Figure: the example graph before and after the modification, with a price for each possible new link; in the modified graph node 5 has the highest PageRank shown, 0.200] The dashed links show the cheapest modification that will bring node 5 to the top of the ranking, computed using a MILP approach. Alternatively, we could go for the maximum improvement in the ranking for a given budget. 39