I Tube, You Tube, Everybody Tubes…
Pablo Rodriguez, Telefonica Research, Barcelona
YouTube Video Example
“Content is NOT king”
Content Explosion
• How to search content?
• [Figure: number of TV channels over time: 3 (broadcast, 1950), 40 (analog cable, 1980), 100 (digital cable, 1995), infinite (Internet, today)]
Aggregation and Recommendation
• Infinite choice = overwhelming confusion
• Filters are required to connect users with content that appeals to their interests
Video and Social Networks
• Trends in video services
  - Users generate new videos
  - Users help each other find videos
• Need to understand users and content
  - Video characteristics in YouTube
  - User behavior and potential for recommendations
Particularities of “bite-size bits for high-speed munching” [Wired magazine, Mar 2007]
• Plethora of YouTube clones
• UGC is very different
  - How different?
UGC vs. Non-UGC
• Massive production scale
  - 15 days in YouTube to produce 120 years' worth of movies in IMDb!
• Extreme publishers
  - 1,000 uploads over a few years vs. 100 movies over 50 years
• Short video length
  - 30 sec to 5 min vs. 100-min movies in LoveFilm
• The rest: consumption patterns
User Participation/Finding Videos
• Despite Web 2.0 features, user participation remains low
  - Only 0.16%-0.22% of viewers rate or comment on videos
• 47% of videos have pointers from external sites
  - But requests from such sites account for less than 3% of total views
Goals and Data
• Goals
  - Potential for recommendation systems?
  - Popularity evolution
  - Content duplication
• Data
  - Crawled YouTube and other UGC systems
  - Metadata: video ID, length, views
  - 1.6M Entertainment videos, 250K Science videos
Part 1: Popularity Distribution
• Static popularity characteristics
• Underlying mechanism
Pareto Principle
• The 10% most popular videos account for 80% of total views
• Other online VoD systems show smaller skew!
• [Figure: fraction of aggregate views vs. normalized video ranking]
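A minimal sketch of how such a skew figure could be computed from per-video view counts. The function name `top_share` and the synthetic Zipf-like counts are illustrative assumptions, not the crawled data or the authors' code.

```python
def top_share(views, top_fraction=0.10):
    """Fraction of total views captured by the most-viewed `top_fraction` of videos."""
    if not views:
        raise ValueError("views must be non-empty")
    ranked = sorted(views, reverse=True)
    # Always count at least one video as "top", even for tiny inputs.
    k = max(1, int(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Synthetic, Zipf-like view counts (illustrative only, NOT the crawled data).
views = [1_000_000 // rank for rank in range(1, 1001)]
print(f"share of views held by the top 10%: {top_share(views):.2f}")
```

Sweeping `top_fraction` from 0 to 1 would trace the whole curve of aggregate views against normalized rank.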
Dominant Power-Law Behavior
• Richer-get-richer principle
  - If a video has K views, then users will watch the video with a rate proportional to K
• Power law: $y = x^{-a}$
  - word frequency
  - citations of papers
  - scale of earthquakes
  - web hits
• [Figure: frequency (log) vs. city population (log)]
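A power law of the form y = C·x^(-a) appears as a straight line with slope -a on a log-log plot. The sketch below estimates a by least squares on log(rank) vs. log(views). This is a quick diagnostic under the assumption that views are plotted against rank, not a rigorous estimator and not necessarily the method used in the study. `fit_power_law_exponent` is a hypothetical helper name.

```python
import math


def fit_power_law_exponent(views):
    """Estimate a in views ~ C * rank**(-a) via least squares in log-log space."""
    ranked = sorted((v for v in views if v > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two videos with non-zero views")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(v) for v in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is -a, so negate it.
    return -sxy / sxx


# Synthetic data following an exact power law with a = 0.8 (illustrative only).
synthetic = [1e6 * rank ** -0.8 for rank in range(1, 501)]
print(f"estimated exponent: {fit_power_law_exponent(synthetic):.3f}")
```

On real data the fit is usually restricted to the straight-line part of the curve, since truncated ends bias the slope.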
UGC Video Distribution
• Straight-line waist, truncated at both ends
Focusing on Popular Videos
• Why do popular videos deviate from the power law?
• Fetch-at-most-once [SOSP 2003]
  - Users fetch an immutable object only once, cf. visiting popular web sites many times
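A toy simulation of the fetch-at-most-once effect, as a sketch under simplifying assumptions: every user draws requests from the same Zipf-like preference, and a repeated pick by the same user is simply dropped. The function name and all parameters are hypothetical and are not taken from the SOSP 2003 model or from the talk.

```python
import random
from collections import Counter


def simulate_requests(num_users, num_videos, requests_per_user, fetch_once, seed=0):
    """Count views per video index (0 = most popular) across all users."""
    rng = random.Random(seed)
    # Zipf-like preference: the video at rank r is picked with weight 1/r.
    weights = [1.0 / rank for rank in range(1, num_videos + 1)]
    counts = Counter()
    for _ in range(num_users):
        picks = rng.choices(range(num_videos), weights=weights, k=requests_per_user)
        # With fetch-at-most-once, a user contributes at most one view per video.
        counts.update(set(picks) if fetch_once else picks)
    return counts


repeat = simulate_requests(2000, 200, 50, fetch_once=False)
once = simulate_requests(2000, 200, 50, fetch_once=True)
print("views of the top video with repeated fetching:", repeat[0])
print("views of the top video with fetch-at-most-once:", once[0])
```

With fetch-at-most-once, no video can exceed `num_users` views, which flattens the head of the distribution.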
Why the Unpopular Tail Falls Off
• Natural shape is curved
• Sampling bias or pre-filters
  - Publishers tend to upload interesting videos
• Information filtering or post-filters
  - Search results or suggestions favor popular items
Impact of Post-Filters
• Videos exposed longer to the filtering effect appear more truncated
• [Figure: x-axis: video rank]
Is it Naturally Curved?
• Matlab curve fitting for Science videos
• Candidate models: exponential, Zipf, Zipf + exponential cutoff, log-normal
Is it Naturally Curved?
• Zipf is scale-free, while exponential is scaled
• The underlying mechanism is Zipf, and the truncation is due to bottlenecks
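The scale-free vs. scaled distinction can be checked numerically. For a Zipf curve, the ratio f(2r)/f(r) is the same at every rank. For an exponential, it depends on where r sits relative to a characteristic scale. The functional forms and the parameter values (`a`, `scale`) below are illustrative assumptions, not the fitted values from the Matlab analysis.

```python
import math


def zipf(rank, a=1.0):
    """Pure power law: no characteristic scale."""
    return rank ** -a


def exponential(rank, scale=100.0):
    """Exponential decay with characteristic scale `scale`."""
    return math.exp(-rank / scale)


def zipf_with_cutoff(rank, a=1.0, scale=100.0):
    """Zipf body with an exponential cutoff that truncates the tail."""
    return zipf(rank, a) * exponential(rank, scale)


for rank in (10, 100, 1000):
    zipf_ratio = zipf(2 * rank) / zipf(rank)
    exp_ratio = exponential(2 * rank) / exponential(rank)
    print(f"rank {rank}: Zipf ratio {zipf_ratio:.3f}, exponential ratio {exp_ratio:.3g}")
```

The cutoff model behaves like Zipf for ranks well below `scale` and falls off exponentially beyond it.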
Implication of Our Findings
• “Latent demand for products that is suppressed by bottlenecks in the system” [Chris Anderson, The Long Tail]
• Entertainment: 40% additional views!
• How?
  - Personalized recommendation
  - Enriched metadata
  - Abundant videos
• [Figure: views vs. rankings]
Part 2: Popularity Evolution
• Relationship between popularity and age
Popularity Evolution
• So far, we focused on static popularity
• Now focus on popularity dynamics
• How are requests on any given day distributed across video age?
• 6-day daily trace of Science videos
  - Step 1: Group videos requested at least once by age
  - Step 2: Count request volume per age group
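The two steps can be sketched as follows, assuming the trace for one day is available as (video ID, request count) pairs and that upload dates are known. The data layout, the function name `requests_by_age`, and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date


def requests_by_age(requests, upload_dates, trace_day):
    """Return (request volume per age in days, number of requested videos per age)."""
    volume = defaultdict(int)
    videos = defaultdict(int)
    for video_id, count in requests:
        if count < 1:
            continue  # Step 1 keeps only videos requested at least once.
        age_days = (trace_day - upload_dates[video_id]).days
        videos[age_days] += 1      # Step 1: group requested videos by age.
        volume[age_days] += count  # Step 2: count request volume per age group.
    return dict(volume), dict(videos)


# Made-up sample data for illustration only.
upload_dates = {"a": date(2007, 3, 1), "b": date(2007, 3, 1), "c": date(2007, 2, 1)}
requests = [("a", 5), ("b", 0), ("c", 7)]
volume, videos = requests_by_age(requests, upload_dates, date(2007, 3, 2))
print(volume, videos)
```

Ages could be bucketed more coarsely (day, week, month) by mapping `age_days` to a bucket label before counting.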
Request Volume Across Age
• User preference is relatively insensitive to age: 80% of requests are for videos older than a month
• The probability of a video being watched is 43%, 18%, 17%, and 14% for the first 24 hours, 6 days, 3 weeks, and 1 month, respectively
Part 4: Content Duplication
• Level of duplication
• Birth of duplicates
Content Duplication
• Alias: identical or similar copies of the same content
• Aliases dilute the popularity of a single event
  - Views are distributed across multiple copies
  - Difficulty for recommendation and ranking systems
• Test with 51 volunteers
  - Find aliases using keyword search
  - Identified 1,224 aliases for 184 original videos
The Level of Popularity Dilution
• Popularity diluted by up to a few orders of magnitude
• Aliases often got more requests than the original (e.g., one alias got >1,000 times more requests)
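One way a ranking system could undo the dilution is to sum views over each alias group, once aliases have been identified. This is a sketch only: the mapping `alias_of`, the function name, and the numbers are hypothetical, and the talk does not describe such a procedure.

```python
def undiluted_views(views, alias_of):
    """Sum views of every alias into its original video's total."""
    totals = {}
    for video_id, count in views.items():
        # Videos without an alias entry are treated as originals.
        original = alias_of.get(video_id, video_id)
        totals[original] = totals.get(original, 0) + count
    return totals


# Made-up numbers: one alias far outdraws its original.
views = {"orig": 100, "copy1": 150_000, "copy2": 40, "solo": 7}
alias_of = {"copy1": "orig", "copy2": "orig"}
print(undiluted_views(views, alias_of))
```

Ranking on the aggregated totals rather than on per-copy views would place the event where its combined popularity puts it.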
How Late Do Aliases Appear?
• Significant aliases appear within one week
• Within the first day of posting the original video, there are sometimes more than 80 aliases
Conclusions
• UGC is a new form of video social interaction
• User interaction remains low
• Lots of potential for social recommendations
Questions?
Dataset available at http://an.kaist.ac.kr/traces/IMC2007.html