Information Diffusion in Social Media
Kristina Lerman
University of Southern California
CS 599: Social Media Analysis

Information diffusion on Twitter follower graph

Diffusion on networks
• The spread of disease, ideas, behaviors, … on a network can be described as a contagion process in which an active node (infected/informed/adopted) activates its non-active neighbors with some probability
  – … creates a cascade on a network
• How large do cascades become?
• What determines their growth?

Diffusion models
• Complex response: infection requires multiple exposures.
• Non-monotonic exposure response
[Figures: exposure response functions (infection probability vs. number of infected neighbors) for complex contagion and for the threshold model]

Epidemic diffusion model
• Infected nodes propagate contagion to susceptible neighbors with probability m (transmissibility or virality of contagion)
[Figure: exposure response function, infection probability vs. number of infected neighbors]
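The epidemic model on this slide can be sketched as a small simulation. This is only an illustration under stated assumptions: each newly infected node gets a single chance to infect each susceptible neighbor with probability m (independent-cascade style), the graph is a plain dict of neighbor lists, and the function name `simulate_cascade` and the toy ring graph are invented here, not taken from the slides.

```python
import random
from collections import deque


def simulate_cascade(adj, seed, m, rng):
    """Return the set of nodes infected when the contagion starts at `seed`.

    adj: dict mapping node -> list of neighbor nodes (assumed input format)
    m:   transmissibility, the per-contact infection probability
    rng: a random.Random instance, passed in so runs are reproducible
    """
    infected = {seed}
    frontier = deque([seed])
    while frontier:
        node = frontier.popleft()
        # Each newly infected node gets one chance per susceptible neighbor.
        for nbr in adj.get(node, ()):
            if nbr not in infected and rng.random() < m:
                infected.add(nbr)
                frontier.append(nbr)
    return infected


# Toy ring of 6 nodes, invented purely for illustration.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
rng = random.Random(0)
sizes = [len(simulate_cascade(ring, 0, 0.5, rng)) for _ in range(1000)]
mean_size = sum(sizes) / len(sizes)
```

Sweeping m from 0 to 1 and recording the mean cascade size is one way to see how cascade size responds to transmissibility on a given graph.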

Epidemic threshold
• Epidemic threshold t:
  – For m < t, localized cascades (epidemic dies out)
  – For m > t, global cascades
• Epidemic threshold depends on topology only: largest eigenvalue of adjacency matrix of the network
  – True for any network
[Figure: cascade size (0 to N) vs. transmissibility m, with the epidemic threshold marked]
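The slide says only that the threshold depends on the largest eigenvalue of the adjacency matrix. A commonly cited form of this result is t = 1/λ_max, and the sketch below assumes that form. It estimates λ_max by power iteration on a small undirected graph given as neighbor lists; the function names and the example graphs are hypothetical.

```python
import math


def largest_eigenvalue(adj, iters=1000, tol=1e-12):
    """Estimate the largest adjacency eigenvalue of a connected undirected graph.

    adj: list where adj[i] is the list of neighbors of node i (assumed format).
    """
    n = len(adj)
    x = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        # Multiply by (A + I); the shift avoids oscillation on bipartite graphs.
        y = [x[i] + sum(x[j] for j in adj[i]) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        # Rayleigh quotient of the unshifted matrix A at the unit vector y.
        ay = [sum(y[j] for j in adj[i]) for i in range(n)]
        new_lam = sum(y[i] * ay[i] for i in range(n))
        x = y
        if abs(new_lam - lam) < tol:
            lam = new_lam
            break
        lam = new_lam
    return lam


def epidemic_threshold(adj):
    """Assumed form of the threshold: t = 1 / lambda_max."""
    return 1.0 / largest_eigenvalue(adj)


# Complete graph on 4 nodes (lambda_max = 3) and a star with 4 leaves (lambda_max = 2).
k4 = [[j for j in range(4) if j != i] for i in range(4)]
star = [[1, 2, 3, 4], [0], [0], [0], [0]]
t_k4 = epidemic_threshold(k4)
t_star = epidemic_threshold(star)
```

Under this assumption, denser graphs have a larger λ_max and therefore a lower threshold, which is why topology alone fixes where global cascades begin.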

Differences in the Mechanics of Information Diffusion across Topics: Idioms, Political Hashtags and Complex Contagion on Twitter
Daniel M Romero, Brendan Meeder and Jon Kleinberg
Presentation by Aswin Rajkumar

Motivation and Contribution
• Information Diffusion and Topics
  - E.g., controversial political topics have high information diffusion.
  - Scientific study of the variation in diffusion mechanics across topics.
• Contribution of the paper
  - Empirical analysis of real-world data
  - Observation that the mechanics of spread can be described using two variables, stickiness and persistence.
  - Confirmation of sociological theories found in the offline world (diffusion of innovations)

The Study – How?
• Twitter dataset: a snapshot covering a large number of tweets over a period of several months (Aug 2009 to Jan 2010)
• 3 billion messages from over 60 million users
• #Hashtag tokens: the top 500 hashtags
• @Mention network and neighbor set: t mentions from X to Y, with t = 3. Why? Mentions show X’s attention to Y.

The Study – What?
• Adoption and spread of hashtags (diffusion)
• Topics: Politics, Celebrity, Music, Movies, Games, Idioms, Sports and Technology
• Stickiness: the probability that a piece of information will pass from a person who knows or mentions it to another person who is exposed to it.
• Persistence and “complex contagion”, a principle from sociology. Persistence is the relative extent to which repeated exposures to a hashtag continue to have significant marginal effects on adoption (rate of decay).

Complex Contagion
Complex contagion refers to the phenomenon in social networks in which multiple sources of exposure to an innovation are required before an individual adopts the change of behavior. (Wikipedia)

[Figure: P(K) curve annotated with stickiness and persistence]

Analysis – Stickiness and Persistence
• Take the top 500 hashtags
• Classify them into 8 topics or categories
• Construct p(k) curves for each hashtag and average them separately within each category
• Compare the shapes
Political hashtags: high stickiness and persistence
Twitter idioms: high stickiness, low persistence
Example hashtags: #mw2, #mafiawars; #lost, #newmoon; #mj, #brazilwantsjb; #pandora, #thisiswar; #obama, #hcr; #cricket, #nhl; #photoshop, #digg

Twitter Idioms
#cantlivewithout, #iloveitwhen, #musicmonday, #followfriday

Analysis – Subgraph Structure
• Interconnections among early adopters
• Subgraphs for political hashtags: high in-degree, large number of triangles.
• Tie strength: strong, weak.
Credit: Bridge-talent.com

Exposure Curve - Definitions
• K-exposed: a user is k-exposed to a tag h if he has not used h, but is connected to k other users who have used h in the past.
• What’s the probability that a k-exposed user u will use hashtag h in the future?
1) Ordinal Time Estimate
Probability of a k-exposed user u using hashtag h before becoming (k+1)-exposed.
E(k) – number of k-exposed users
I(k) – number of k-exposed users who used h before becoming (k+1)-exposed
P(k) = I(k) / E(k)
2) Snapshot Estimate
Similar, but based on time.
E(k) – number of users k-exposed at t1
I(k) – number of users k-exposed at t1 who used h before t2
P(k) = I(k) / E(k) -> Exposure Curve
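The ordinal time estimate can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes one hashtag, a dict `attends_to` mapping each user to the set of users they are connected to, and a list of users in order of first use. All names and the toy data are invented for the example.

```python
from collections import Counter, defaultdict


def exposure_curve(attends_to, adoption_order):
    """Ordinal-time estimate of P(k) = I(k) / E(k) for one hashtag.

    attends_to:     dict user -> set of users they are connected to (assumed format)
    adoption_order: users in order of their first use of the hashtag
    """
    # Reverse map: who becomes more exposed when a given user adopts.
    watchers = defaultdict(set)
    for user, targets in attends_to.items():
        for target in targets:
            watchers[target].add(user)

    exposure = defaultdict(int)  # current number of adopting neighbors per user
    adopted = set()
    E = Counter()  # E[k]: users who were k-exposed at some point
    I = Counter()  # I[k]: users who adopted while k-exposed

    for user in adoption_order:
        if user in adopted:
            continue
        k = exposure[user]
        if k >= 1:
            I[k] += 1  # adopted before becoming (k+1)-exposed
        adopted.add(user)
        for watcher in watchers[user]:
            if watcher not in adopted:
                exposure[watcher] += 1
                E[exposure[watcher]] += 1

    return {k: I[k] / E[k] for k in sorted(E)}


# Toy data: b and d watch a; c watches a and b. Users a, b, c adopt in that order.
toy_network = {"b": {"a"}, "c": {"a", "b"}, "d": {"a"}}
curve = exposure_curve(toy_network, ["a", "b", "c"])
```

In the toy data, three users become 1-exposed and only b adopts at that level, while c adopts once 2-exposed, so the curve rises from 1/3 at k = 1 to 1 at k = 2.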

Comparison Parameters
• Persistence Parameter: F(P) = A(P) / R(P)
  A(P) – area under the P curve
  R(P) – area of the rectangle of length K and height max(P(k))
  Curve comparisons: increases rapidly and falls vs. increases slowly and saturates vs. rapid increase
• Stickiness Parameter: M(P) = max(P(k))
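Both parameters are easy to compute from a sampled exposure curve. The sketch below assumes the curve is a dict mapping k to P(k) with at least two points and a positive maximum, uses the trapezoid rule for A(P), and takes the rectangle over the same k range. The integration scheme and the example curves are assumptions made for illustration.

```python
def stickiness(p):
    """M(P): the maximum of the exposure curve. `p` maps k -> P(k)."""
    return max(p.values())


def persistence(p):
    """F(P) = A(P) / R(P) for an exposure curve sampled at integer k."""
    ks = sorted(p)
    # A(P): trapezoid-rule area under the curve (an assumed integration scheme).
    area = sum((p[a] + p[b]) / 2 * (b - a) for a, b in zip(ks, ks[1:]))
    # R(P): rectangle spanning the same k range, with height max P(k).
    rect = (ks[-1] - ks[0]) * stickiness(p)
    return area / rect


# Invented curves: one stays at its peak, one decays quickly after k = 1.
flat_curve = {1: 0.02, 2: 0.02, 3: 0.02}
decaying_curve = {1: 0.02, 2: 0.01, 3: 0.0}
f_flat = persistence(flat_curve)
f_decay = persistence(decaying_curve)
```

The two invented curves share the same stickiness but differ in persistence, which is the distinction the two parameters are meant to separate.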

Plots
F(P) = A(P) / R(P) -> Persistence Parameter
M(P) = Max(P(K)) -> Stickiness

Improvements and Related Work
• The @Mention network is not very representative. Also, attention should be from Y to X.
• Considers only average persistence. Median and variance should be analyzed too.
• Other types of networks, e.g., blogs. [Gruhl, Guha, Nowell, Tomkins, Information Diffusion through Blogspace]
• Influence on online behavior, e.g., games. [Woo, Kang, Kim, The Contagion of Malicious Behaviors in Online Games]
• Network structure is dynamic in real life. [Bano, Holthoefer, Wang, Moreno, Bailon, Diffusion Dynamics with Changing Network Composition]

Conclusion
• Hashtags of different topics exhibit different mechanics of spread. Politically controversial hashtags have the highest diffusion.
• Information diffusion depends on the probability of users adopting a hashtag after repeated exposure to it: both the magnitude of the probabilities and their rate of decay.
• Confirms the sociological theory of complex contagion
• Higher in-degree and stronger ties result in better spread.

Questions?

What Stops Social Epidemics? (Ver Steeg et al.)
• Why do information cascades in social media
  – Grow quickly initially
  – But remain much smaller than predicted by epidemic models?
• Information cascades differ from viral contagion:
  – Response to repeated exposure is important on Digg (and Twitter)
  – Drastically alters predictions about size of epidemics

Social news:
• Users submit or vote for (infected by) news stories
• Social network
  – Users follow ‘friends’ to see
    • Stories friends submit
    • Stories friends vote for
• Trending stories
  – Digg promotes most popular stories to its Top News page

How large are cascades in social media?
Number of people who share a message (with a URL)
Digg: 3.5K URLs, 258K users, 1.7M edges
Twitter: 70K URLs, 700K users, 36M edges
Most cascades are less than 1% of total network size!
[Lerman et al. “Social Contagion: An Empirical Study of Information Spread on Digg and Twitter Follower Graphs” arXiv:1202.3162]

Why are these cascades so small?
Standard model of epidemic growth (heterogeneous mean field theory, SIR model, same degree distribution as Digg)
Most cascades fall in this range: the transmissibility of almost all Digg stories falls within the width of this line?!
[Figure: predicted cascade size vs. transmissibility m]

Maybe graph structure is responsible?
[Figure: cascade size vs. transmissibility m, with the epidemic threshold marked, comparing:]
• Mean field prediction (same degree dist.)
• Simulated cascades on a random graph with same degree dist.
• Simulated cascades on the observed Digg graph
Clustering reduces epidemic threshold and cascade size, but not enough!

What about the spreading mechanism? Infected Not Infected ?

Are repeat exposures a big effect? Yes, more than half of the users are exposed to the same information more than once!

How do people respond to repeated exposure?
Not much. We have similar results for Twitter.
Also noted by Romero et al., WWW 2011.
[Figure: exposure response]

Big consequences for cascade growth
• Most people are exposed to a story more than once
• Repeated exposures have little effect
• Growth of epidemics is severely curtailed (especially compared to the Independent Cascade Model)
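The contrast with the Independent Cascade Model can be made concrete by comparing exposure response curves. In the independent cascade model each exposure is a separate chance to infect, so the probability of infection after k exposures is 1 - (1 - m)^k. The second function below is a hypothetical stand-in for a weak response, in which only the first exposure matters; it is an assumption for illustration, not the response measured on Digg.

```python
def icm_response(m, k):
    """Infection probability after k independent exposures, each with probability m."""
    return 1 - (1 - m) ** k


def flat_response(m, k):
    """Hypothetical weak-response curve: only the first exposure has any effect."""
    return m if k >= 1 else 0.0


m = 0.01  # illustrative transmissibility, not a value from the slides
table = [(k, icm_response(m, k), flat_response(m, k)) for k in (1, 2, 5, 10)]
```

The two curves agree at k = 1 and diverge as k grows, so a model that credits every repeat exposure overestimates how many multiply-exposed users get infected.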

Weak response to repeated exposures suppresses outbreaks
Take effect of repeat exposure into account:
• Epidemic threshold unchanged
[Figure: actual Digg cascades and result of simulations vs. transmissibility (m*, λ*)]

How Limited Visibility and Divided Attention Constrain Social Contagion (Hodas & Lerman, 2012)
• Questions
  – How do people respond to exposures to information by friends on social media?
  – What role does content play in information diffusion?
• Findings
  – Users have finite ability to process information
    • Most recently received messages are retweeted, the rest are overlooked
    • Highly connected users (hubs) are far less likely to retweet any message they receive than poorly connected people
  – Reduced susceptibility of hubs to “infections” explains why cascades are small

Mechanics of information diffusion
User must see an item and find it interesting before he/she can spread it (e.g., by retweeting it, voting for or liking it, …)
[Diagram: See? (cognitive, interface) → Interesting? (tastes, content) → Respond (retweet)]

Cognitive factors: Position bias
• People pay more attention to items at the top of the screen or a list of items
  [Payne, The Art of Asking Questions (1951)] [Buscher et al., CHI’09] [Counts & Fisher, ICWSM’11]
  … limits how far down the list/page the user navigates

Measuring position bias
• Amazon Mechanical Turk experiments
• Users were asked to recommend science stories
• We controlled the order in which stories were presented to users
Position bias: stories at top list positions received more recommendations
[Lerman & Hogg (2014) “Leveraging position bias to improve peer recommendation” in PLoS ONE]

Position bias creates a “limited attention”
A new post at the top of the user’s screen is the most likely to be seen.
[Figure: probability to view post (visibility) vs. position]

Position bias creates a “limited attention”
… some time later: newer posts appear at the top, and the post is less likely to be seen.
[Figure: probability to view post vs. position]

Position bias and number of friends
… some time later: newer posts appear at the top, and the post is less likely to be seen.
A post of the same age is even less visible to a highly connected user.
[Figure panels: few friends, many friends]
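A toy calculation can illustrate why the same post is less visible to a hub. Everything in this sketch is an assumption made for illustration and not the model from the slides: new posts are taken to arrive at a rate proportional to the number of friends, a post's position is the number of newer posts above it, and the probability of viewing a position is taken to decay exponentially. The function names, the posting rate and the decay scale are all hypothetical.

```python
import math


def expected_position(n_friends, age_hours, posts_per_friend_per_hour=0.5):
    """Expected number of newer posts above a post of the given age (assumed linear growth)."""
    return n_friends * posts_per_friend_per_hour * age_hours


def visibility(position, scale=20.0):
    """Assumed probability of viewing a post at `position`; 1.0 at the top of the screen."""
    return math.exp(-position / scale)


# The same two-hour-old post, seen by a user with few friends and by a hub.
few = visibility(expected_position(10, age_hours=2.0))
many = visibility(expected_position(500, age_hours=2.0))
```

Under these assumptions the post has been pushed 10 positions down for the user with 10 friends and 500 positions down for the hub, so its visibility to the hub is far lower.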

Friends are a source of distraction
• Users with more friends are more active
• Users with more friends are distracted by more content
• Limited attention makes hubs less susceptible to ‘infection’
[Figure axis: nf]

Users retweet most recent messages
• Users retweet newest messages (at the top of their screen)
• Hubs are much less likely to retweet an older message
[Figure: “Time Response Function” for high connectivity and low connectivity users]

Does content matter?
[Figure: probability to tweet a message, visibility, “virality”; estimated virality]

Do “viral” messages spread farther?
… “viral” messages can reach many or few people
[Figure axis: ln(“virality”)]

How do people respond to multiple exposures?
• Is this evidence for complex contagion?
[Figure: exposure response vs. number of tweeting friends]

“Complex contagion”: artifact of heterogeneity
• Breaking down exposure response by different subpopulations, separated according to the number of friends they follow, reveals a simple, monotonic response
[Figure panels: low connectivity users, high connectivity users]

Summary
• “A meme is not a virus”
  – Information spread ≠ Disease spread
• Big consequences for modeling information spread in social media
• Highly connected people (hubs) act as firewalls to information spread
  – They have a hard time finding messages in their stream
• People have a finite capacity to process information; the more messages they receive, the less likely they are to respond to any given one
  – Information overload actually reduces the size of information cascades