Modeling the Spread of Influence on the Blogosphere
Akshay Java, Pranam Kolari, Tim Finin, and Tim Oates
UMBC Tech Report, 04/12/06
Outline
- What is influence?
- Basic Influence Model
- Influence models for the blogosphere
- Results
- Conclusions
What is Influence?
Main Entry: in·flu·ence
Pronunciation: 'in-"flü-&n(t)s, esp Southern in-'
Function: noun
Etymology: Middle English, from Middle French, from Medieval Latin influentia, from Latin influent-, influens, present participle of influere to flow in, from in- + fluere to flow -- more at FLUID
1 a : an ethereal fluid held to flow from the stars and to affect the actions of humans b : an emanation of occult power held to derive from stars
2 : an emanation of spiritual or moral force
3 a : the act or power of producing an effect without apparent exertion of force or direct exercise of command b : corrupt interference with authority for personal gain
4 : the power or capacity of causing an effect in indirect or intangible ways : SWAY
5 : one that exerts influence
- under the influence : affected by alcohol : DRUNK <was arrested for driving under the influence>
NOT This Kind of Influence! ;-)
Motivation
- Influence models have been studied for co-citation graphs
  - David Kempe, Jon Kleinberg, Eva Tardos, "Maximizing the Spread of Influence through a Social Network", KDD 2003
- Applies to blogs also
  - Recent examples: startups, Microsoft Origami, Walmart, DoD
- GOAL: Predict influential blogs
  - Target nodes to help achieve a "Tipping Point"*
* The Tipping Point, Malcolm Gladwell
Influence on the Blogosphere
Post was influenced by NPR, eWeek
Influence Models for the Blogosphere
[Figure: Blog Graph and the corresponding Influence Graph, with edge weights such as 1/3, 2/5, 1/5, and 1/2]
U links to V => U is influenced by V
$W_{u,v} = C_{u,v} / d_v$
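The weighting rule above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes that C_{u,v} counts the parallel links between two blogs, that each influence edge is a blog link reversed (the linked-to blog influences the linking blog), and that d_v is the total number of incoming influence links at the influenced node v. The function name `influence_weights` and the blog names are hypothetical.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def influence_weights(blog_links):
    """Turn blog links into weighted influence edges.

    blog_links: iterable of (source, target) pairs; repeated pairs are
    treated as parallel links (ASSUMPTION: this is what C_{u,v} counts).
    Returns {(influencer, influenced): weight}.
    """
    # Reverse each blog link: if `src` links to `dst`, `dst` influences `src`.
    counts = Counter((dst, src) for src, dst in blog_links)

    # ASSUMPTION: d_v is the number of incoming influence links at v.
    in_degree = defaultdict(int)
    for (_, influenced), c in counts.items():
        in_degree[influenced] += c

    # W_{u,v} = C_{u,v} / d_v, kept as exact fractions.
    return {(u, v): Fraction(c, in_degree[v]) for (u, v), c in counts.items()}


# Hypothetical example: blog A links to B twice, to C twice, and to D once.
links = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"), ("A", "D")]
weights = influence_weights(links)
for edge, w in sorted(weights.items()):
    print(edge, w)
```

Under these assumptions the weights on the edges pointing into any one node sum to 1, which is the property the linear threshold model relies on.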
Basic Influence Models
- Linear Threshold Model
  - $\sum_{w} b_{v,w} \ge \theta_v$, where w ranges over the active neighbors of v
- Cascade Model
  - $P_{v,w}$: probability with which a node can activate each of its neighbors, independent of history
[Figure: Influence Graph with threshold $\theta_v$, weighted edges, and nodes marked Active or Inactive]
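The two models above can be simulated roughly as follows. This is a sketch under assumptions rather than the code behind the slides: edge dictionaries are keyed as (influencer, influenced), thresholds are supplied by the caller instead of being drawn at random, and the function names are hypothetical.

```python
import random


def linear_threshold(weights, seeds, thresholds):
    """Linear threshold model.

    weights[(w, v)] is the influence b_{v,w} of w on v; thresholds[v] is
    theta_v and must be given for every node that can be influenced.
    A node becomes active once the summed weight of its active
    neighbors reaches its threshold.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        incoming = {}
        for (w, v), b in weights.items():
            if w in active and v not in active:
                incoming[v] = incoming.get(v, 0.0) + b
        for v, total in incoming.items():
            if total >= thresholds[v]:
                active.add(v)
                changed = True
    return active


def independent_cascade(probs, seeds, rng):
    """Cascade model.

    probs[(v, w)] is the probability that v activates w. Each newly
    active node gets a single attempt per neighbor, independent of history.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_active = []
        for v in frontier:
            for (src, w), p in probs.items():
                if src == v and w not in active and rng.random() < p:
                    active.add(w)
                    newly_active.append(w)
        frontier = newly_active
    return active


# Hypothetical toy graph: a and c each contribute 0.5 towards b; b fully drives d.
weights = {("a", "b"): 0.5, ("c", "b"): 0.5, ("b", "d"): 1.0}
thresholds = {"b": 0.6, "d": 0.9}
print(linear_threshold(weights, {"a"}, thresholds))
print(linear_threshold(weights, {"a", "c"}, thresholds))
print(independent_cascade(weights, {"a"}, random.Random(0)))
```

In the threshold run, seeding only `a` leaves `b` below its threshold, while seeding both `a` and `c` activates `b` and then `d`. The cascade run reuses the same weights as activation probabilities purely for illustration.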
Node Selection Heuristics
- Inlinks
  - Easily spammed
- Centrality
  - Expensive to compute for very large graphs
- PageRank
  - Requires link information
  - However, it is easy to compute
- Greedy Heuristic
  - Computationally expensive
  - However, it performs better
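To make the cost trade-off in the list above concrete, the sketch below contrasts an inlink ranking with one common reading of the greedy heuristic: hill-climbing on the estimated spread of a cascade, adding the node with the largest estimated gain at each step. The slides do not spell out the procedure, so the function names, the number of simulation runs, and the toy graph are all assumptions.

```python
import random
from collections import Counter


def simulate_cascade(probs, seeds, rng):
    """One independent-cascade run; probs[(v, w)] = P(v activates w)."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_active = []
        for v in frontier:
            for (src, dst), p in probs.items():
                if src == v and dst not in active and rng.random() < p:
                    active.add(dst)
                    newly_active.append(dst)
        frontier = newly_active
    return active


def estimate_spread(probs, seeds, rng, runs=100):
    """Average number of active nodes over `runs` simulated cascades."""
    total = sum(len(simulate_cascade(probs, seeds, rng)) for _ in range(runs))
    return total / runs


def greedy_select(probs, nodes, k, rng, runs=100):
    """Pick up to k seeds, each time adding the node with the best estimated spread."""
    chosen = []
    for _ in range(min(k, len(nodes))):
        candidates = [n for n in nodes if n not in chosen]
        best = max(
            candidates,
            key=lambda n: estimate_spread(probs, chosen + [n], rng, runs),
        )
        chosen.append(best)
    return chosen


def top_by_inlinks(blog_links, k):
    """Cheap baseline: rank blogs by how many links point at them."""
    counts = Counter(dst for _, dst in blog_links)
    return [node for node, _ in counts.most_common(k)]


# Hypothetical toy influence graph with deterministic edges.
probs = {("a", "b"): 1.0, ("b", "c"): 1.0, ("d", "e"): 1.0}
print(greedy_select(probs, ["a", "b", "c", "d", "e"], 2, random.Random(0), runs=5))
print(top_by_inlinks([("x", "y"), ("z", "y"), ("x", "z")], 1))
```

Each greedy step runs `runs` simulations for every remaining candidate, which is why the approach becomes expensive on large graphs, whereas the inlink count needs a single pass over the links.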
Effect of Splogs on Node Selection (indegree vs PageRank)
Almost 54% of the links were from splogs/failed to splogs/failed!
Effect of Splogs on Inlinks
Rank  #inlinks  URL
1     3072      http://www.livejournal.com/users/pics
2     2191      http://www.boing.net
3     2017      http://www.dailykos.com
4     1942      http://www.engadget.com
5     1526      http://profiles.blogdrive.com
6     1242      http://michellemalkin.com
7     1232      http://www.opinionjournal.com
8     1187      http://instapundit.com
9     1124      http://slashdot.org
10    909       http://www.powerlineblog.com
11    905       http://www.huffingtonpost.com/theblog
12    853       http://corner.nationalreview.com
13    733       http://www.talkingpointsmemo.com
14    728       http://www.captainsquartersblog.com/mt
15    711       http://espn-presents2003-world-seriesofpoker.blogspot.com
16    711       http://3-world-series-of-poker-online-3.blogspot.com
17    711       http://worldseries-of-poker-network-tv-show.blogspot.com
18    711       http://wsop2003.blogspot.com
19    711       http://wsop-bracelet1.blogspot.com
20    711       http://worldseries-poker.blogspot.com
21    711       http://worldseries-of-poker-official.blogspot.com
22    711       http://worldseries-of-poker-wsop.blogspot.com
23    711       http://world-series-of-poker-nocd-patch66.blogspot.com
Tightly knit community of splogs: ranks 15-23, each with 711 inlinks
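The distortion visible in the inlink ranking above suggests filtering splogs out before counting. The sketch below assumes a set of splog URLs is already available (the slides do not say how splogs were detected) and drops every link that starts or ends at one. The function name `rank_by_inlinks` and the sample URLs are hypothetical.

```python
from collections import Counter


def rank_by_inlinks(links, splogs=frozenset()):
    """Rank blogs by inlink count after dropping links that touch a splog.

    links: list of (source, target) URL pairs.
    splogs: set of URLs already labeled as splogs (ASSUMPTION: the
    labels come from some external detector).
    Returns (ranking, removed_fraction), where ranking is a list of
    (url, inlinks) pairs in descending order of inlinks.
    """
    kept = [(s, t) for s, t in links if s not in splogs and t not in splogs]
    removed_fraction = 1 - len(kept) / len(links) if links else 0.0
    ranking = Counter(t for _, t in kept).most_common()
    return ranking, removed_fraction


# Hypothetical data: two splogs linking to each other and to a real blog.
links = [
    ("splog1.example", "splog2.example"),
    ("splog2.example", "splog1.example"),
    ("splog1.example", "blogA.example"),
    ("blogB.example", "blogA.example"),
]
ranking, removed = rank_by_inlinks(links, {"splog1.example", "splog2.example"})
print(ranking, removed)
```

Comparing the ranking with and without the `splogs` argument gives a quick view of how much of the top of the list depends on splog links.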
Influence Models (without splog detection)
[Figure: x-axis: number of nodes selected]
Influence Models (After splog removal)
Influence Models (w. r. t. Technorati Ranks)
Conclusions
- Influence models can be applied to blogs, not just co-citation graphs
- Splogs are a problem
- Greedy heuristics work well; PageRank is an inexpensive approximation
Ideas for CIKM 06
- Good or bad influence? Associating sentiment with links.
- Finding influential blogs for a topic. (SVM accuracy 75-85%)
- Community structure of blogs.
Questions, Comments/Feedback?
Thanks!
Acknowledgement:
- Buzzmetrics/Blogpulse for the dataset.