The Web Changes Everything: How Dynamic Content Affects the Way People Find Online
Jaime Teevan, Microsoft Research (CLUES)
with Sue Dumais, Dan Liebling, Eytan Adar, Jon Elsas, Rich Hughes, Kevyn Collins-Thompson
Tour of Yesterday’s Web
Web Experience Ignores Dynamics
[Figure: timeline, 1996 to 2007, with "Content Changes" and "People Revisit" set against "Today's Browse and Search Experiences"]
But, ignores …
Data Sets
• Large scale Web crawl over time
  – Revisited pages
    • Unique users, visits/user, time between visits
    • 55,000 pages crawled hourly for 18+ months
  – Judged pages (relevance to query)
    • 6 million pages crawled every two days for 6 months
• Revisitation patterns
  – Query logs for "refinding"
  – Live Toolbar logs for "revisited pages"
    • Sample of 2.3 million users for 5 weeks
  – User survey
Web Content Dynamics
[Figure: timeline of content changes, 1996 to 2007]
Measuring Web Page Change
• Summary metrics
  – Number of changes
  – Time between changes
  – Amount of change
• Change curves
  – Fixed starting point
  – Measure similarity over different time intervals
[Figure: change curve; y-axis: Dice similarity (0 to 1.2); x-axis: time from starting point; knot point marked]
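A change curve of the kind described on this slide can be sketched as follows. This is a minimal illustration, assuming pages are compared as sets of whitespace-separated terms; the function names and the snapshot data are hypothetical and not taken from the talk.

```python
def dice_similarity(a, b):
    """Dice coefficient between two collections of terms, treated as sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # assumption: two empty pages count as identical
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


def change_curve(snapshots):
    """Similarity of every later snapshot to the first (the fixed starting point)."""
    start = snapshots[0].split()
    return [dice_similarity(start, snap.split()) for snap in snapshots[1:]]


# Hypothetical snapshots of one page, oldest first (illustrative data only).
snapshots = [
    "grill recipes salads cheese",
    "grill recipes salads bbq",
    "grill recipes bbq ingredient",
    "grill recipes cookbooks ingredient",
]
curve = change_curve(snapshots)
print(curve)
```

In this toy data the similarity drops and then levels off once only the stable terms remain, which is the flattening that a knot point would mark on the curve.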
Measuring Within-Page Change
• DOM structure changes
• Term use changes
  – Divergence from norm
  – Staying power in page
[Figure: example terms (cookbooks, salads, cheese, ingredient, bbq) plotted over time, Sep. to Dec.]
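One rough way to look at a term's staying power is to count the share of snapshots in which it appears. The sketch below is a simplified proxy and an assumption, not the measure used in the underlying research; the snapshot contents are hypothetical and only reuse the example terms from the slide.

```python
from collections import Counter


def staying_power(snapshots):
    """Fraction of snapshots in which each term appears at least once."""
    if not snapshots:
        return {}
    counts = Counter()
    for text in snapshots:
        counts.update(set(text.lower().split()))  # count each term once per snapshot
    return {term: n / len(snapshots) for term, n in counts.items()}


# Hypothetical monthly snapshots (Sep. to Dec.) of one page.
monthly = [
    "cookbooks salads cheese",
    "cookbooks salads ingredient",
    "cookbooks cheese bbq",
    "cookbooks ingredient bbq",
]
power = staying_power(monthly)
for term in sorted(power, key=power.get, reverse=True):
    print(term, power[term])
```

Terms that score near 1.0 are present in almost every snapshot, while low scores indicate transient terms.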
Revisitation on the Web
[Figure: timeline, 1996 to 2007, with "Content Changes" and "People Revisit"]
What is the Last Page You Visited?
• Feb 08: Location
• Mar 08: Dates
• Jun 08: Call for papers
• Aug 08: Submit
• Oct 08: Response date?
• Nov 08: Formatting
• Nov 08: Dates
• Dec 08: Proceedings
• Jan 09: Registration
• Jan 09: Dates
Measuring Revisitation
• Summary metrics
  – Unique visitors
  – Visits/user
  – Time between visits
• Revisitation curves
  – Histogram of revisit intervals
  – Normalized
[Figure: revisitation curve; y-axis: Count (0 to 1.2); x-axis: Time Interval]
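A revisitation curve, as a normalized histogram of revisit intervals, might be computed along these lines. This is a sketch under stated assumptions: the bin edges, labels, and visit log are hypothetical, and normalizing counts so they sum to 1 is only one possible reading of "Normalized".

```python
from bisect import bisect_right

# Hypothetical bin edges in seconds: 1 minute, 1 hour, 1 day, 1 week.
BIN_EDGES = [60, 3600, 86400, 604800]
BIN_LABELS = ["<1 min", "<1 hour", "<1 day", "<1 week", "1 week+"]


def revisit_intervals(visit_times):
    """Gaps between consecutive visits by one user to one page."""
    ordered = sorted(visit_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def revisitation_curve(visits_by_user):
    """Histogram of revisit intervals across users, normalized to sum to 1."""
    counts = [0] * len(BIN_LABELS)
    for visit_times in visits_by_user.values():
        for interval in revisit_intervals(visit_times):
            counts[bisect_right(BIN_EDGES, interval)] += 1
    total = sum(counts)
    if total == 0:
        return {label: 0.0 for label in BIN_LABELS}
    return {label: count / total for label, count in zip(BIN_LABELS, counts)}


# Hypothetical visit timestamps (seconds) for a single page.
visits = {
    "user_a": [0, 30, 90, 4000],
    "user_b": [0, 100000],
}
curve = revisitation_curve(visits)
print(curve)
```

The summary metrics on the slide fall out of the same log: unique visitors is the number of keys, and visits/user is the average list length.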
Four Revisitation Patterns
• Fast
  – Hub-and-spoke
  – Navigation within site
• Hybrid
  – High quality fast pages
• Medium
  – Popular homepages
  – Mail and Web applications
• Slow
  – Entry pages, bank pages
  – Accessed via search engine
Search and Revisitation
• How to measure?
• Repeat query (33%)
  – mit cs department
• Repeat click (39%)
  – http://csail.mit.edu
  – Query csail
• Big opportunity (43%)
  – Navigational (24%)

                Repeat Click   New Click   Total
Repeat Query        29%           4%        33%
New Query           10%          57%        67%
Total               39%          61%
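The percentages on this slide are internally consistent, which a short check makes visible. The sketch below only re-adds the four cell values reported on the slide. Reading the 43% "big opportunity" figure as the share of queries with a repeat query or a repeat click is an inference from the arithmetic, not something the slide states.

```python
# Percent of all queries in each cell, as reported on the slide.
cells = {
    ("repeat query", "repeat click"): 29,
    ("repeat query", "new click"): 4,
    ("new query", "repeat click"): 10,
    ("new query", "new click"): 57,
}


def marginal(cells, position, label):
    """Sum the cells whose key matches `label` at `position` (0 = query, 1 = click)."""
    return sum(value for key, value in cells.items() if key[position] == label)


repeat_query = marginal(cells, 0, "repeat query")
repeat_click = marginal(cells, 1, "repeat click")
# Everything except the "new query, new click" cell involves some repetition.
any_repeat = 100 - cells[("new query", "new click")]

print(repeat_query, repeat_click, any_repeat)
```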
Repeat Clicks for Repeat Queries
[Figure: share of Repeat Click vs. New Click (0% to 100%) across Minutes, Days, Weeks]
How Revisitation and Change Relate
[Figure: timeline, 1996 to 2007, with "Content Changes" and "People Revisit"]
Possible Relationships
• Interested in change
  – Monitor
• Effect change
  – Transact
• Change unimportant
  – Find
• Change can interfere
  – Re-find
Understanding the Relationship
• Compare summary metrics
• Revisits: Unique visitors, visits/user, interval
• Change: Number, interval, Dice

                     Number of changes   Time between changes   Dice coefficient
2 visits/user             172.91               133.26                0.82
3 visits/user             200.51               119.24                0.82
4 visits/user             234.32               109.59                0.81
5 or 6 visits/user        269.63                94.54                0.82
7+ visits/user            341.43                81.80                0.81
Comparing Change and Revisit Curves
• Three pages
  – New York Times
  – Woot.com
  – Costco
• Similar change patterns
• Different revisitation
  – NYT: Fast (news, forums)
  – Woot: Medium
  – Costco: Slow (retail)
[Figure: curves for NYT, Woot, and Costco; y-axis: 0 to 1.2; x-axis: Time]
Within-Page Relationship
• Page elements change at different rates
• Pages revisited at different rates
• Resonance can serve as a filter for interesting content
Revisitation and Change
• Web content changes
  – Page-level changes, within-page change
• People revisit Web content
• Relating revisitation and change allows us to
  – Identify pages for which change is important
  – Identify interesting components within a page
• Building support using this relationship
  – Web browsers, search engines
Building Support: Web Browsers
• Highlight interesting differences (DiffIE)
• Notify user of interesting changes
• Choose content for mobile viewing
• Build better history
DiffIE toolbar: Changes to page since last visit
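The idea of showing what changed on a page since the last visit can be illustrated with a word-level diff. This is only a sketch using Python's standard `difflib`; it is not how DiffIE is implemented, and the `highlight_changes` function, the `[[ ]]` markers, and the sample text are all hypothetical.

```python
import difflib


def highlight_changes(old_text, new_text):
    """Wrap words that are new or altered since the cached copy in [[ ]] markers."""
    old_words, new_words = old_text.split(), new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    parts = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        segment = " ".join(new_words[j1:j2])
        if not segment:
            continue  # pure deletion: nothing to show in the current page
        if tag in ("insert", "replace"):
            parts.append(f"[[{segment}]]")
        else:
            parts.append(segment)
    return " ".join(parts)


# Hypothetical cached copy from the last visit and the current page text.
cached = "Submission deadline: March 1. Location to be announced."
current = "Submission deadline: March 8. Location: Boston."
highlighted = highlight_changes(cached, current)
print(highlighted)
```

A real browser extension would compare page structure rather than flat text, so treat this as a way to see the concept, not a design.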
Monitor
Find Expected New Content
Find Unexpected Important Content
Serendipitous Encounter
Understand a Web Page
Attend to Activity
Edit
[Figure: uses of change arranged between Unexpected and Expected: Attend to Activity, Find Expected New Content, Serendipitous Encounter, Find Unexpected Important Content, Monitor, Edit, Understand a Web Page]
Building Support: Search Engines
• Index dynamic and static content differently
• Crawl changed content that will be revisited
• Identify when people are open to advertising
• Surface change in snippet
Thank you.
Jaime Teevan
http://research.microsoft.com/~teevan

Change: Adar, E., J. Teevan, S. T. Dumais, and J. L. Elsas. The Web changes everything: Understanding the dynamics of Web content. WSDM 2009 (Best Student Paper).
Change: Elsas, J., S. Dumais, D. Liebling, and K. Collins-Thompson. Temporal retrieval models. Under submission to SIGIR 2009.
Revisitation: Adar, E., J. Teevan, and S. T. Dumais. Large scale analysis of Web revisitation patterns. CHI 2008 (Best Paper).
Relationship: Adar, E., J. Teevan, and S. T. Dumais. Resonance on the Web: Web dynamics and revisitation patterns. CHI 2009.
Relationship: Teevan, J., S. T. Dumais, D. J. Liebling, and R. Hughes. Changing How People View Changes on the Web. Under submission to UIST 2009.