Web of Objects
Ralph Rabbat, Peter Mika, Philip Bohannon
October 2011
Outline
• Vision and opportunity
• Use cases
• Adopted approach
2 12/1/2020
Today’s world is a Web of Pages
All these pages come from structured knowledge about people, places, and things
[Figure: knowledge graph. Carlos Zambrano plays for Chicago Cubs; Chicago Cubs is a MLB team; Chicago Cubs plays in Chicago; Barack Obama is from Chicago; 10% off tickets are for Chicago Cubs]
This underlying world is WOO, the Web of Objects
[Figure: knowledge graph. Carlos Zambrano plays for Chicago Cubs; Chicago Cubs is a MLB team; Chicago Cubs plays in Chicago; Barack Obama is from Chicago; 10% off tickets are for Chicago Cubs]
Today our knowledge of this world is siloed, incomplete, inconsistent, inaccurate, and hard to reuse
[Figure: the knowledge graph split across separate silos (Shopping, Upcoming, Local, Finance, Entertainment, Sports). Node labels: Carlos Zambrano, Chicago Cubs, MLB team, Chicago, Scott Roy, 10% off tickets. Edge labels: plays for, isa, plays in, for, from]
Our vision is a single shared knowledge base: accurate, scalable, and easy to reuse
[Figure: knowledge graph. Carlos Zambrano plays for Chicago Cubs; Chicago Cubs is a MLB team; Chicago Cubs plays in Chicago; Barack Obama is from Chicago; 10% off tickets are for Chicago Cubs]
Knowledge comes from many sources
[Figure: coverage chart with axes Entities and Attributes. Region: show times and other information for US movies from source B. Example cell: show times for Harry Potter and the Deathly Hallows Part II]
Combining these requires working with complementary, parallel, and overlapping sources
[Figure: coverage chart with axes Entities and Attributes. Regions: cast and show time information for global movies from licensed feeds; cast information for US movies from source A; cast information for global movies from Wikipedia]
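One way to picture combining complementary and overlapping sources is a priority-ordered merge over entities and attributes. The sketch below is an illustration only, not the WOO implementation: the `blend` function, the entity ids, and the show times are made up for the example, and "earlier source wins" is an assumed conflict rule.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]
Source = Dict[str, Record]  # entity id -> attribute dict


def blend(sources: List[Tuple[str, Source]]) -> Dict[str, Record]:
    """Merge sources listed in priority order.

    Complementary sources add new entities or new attributes;
    when sources overlap on an attribute, the earlier source wins
    (an assumed policy, chosen only to keep the sketch short).
    """
    blended: Dict[str, Record] = {}
    for _source_name, source in sources:
        for entity_id, attrs in source.items():
            target = blended.setdefault(entity_id, {})
            for key, value in attrs.items():
                # Keep the value from the higher-priority source if present.
                target.setdefault(key, value)
    return blended


# Hypothetical entity ids and illustrative values.
source_a = {"movie:hp7-2": {"cast": ["Daniel Radcliffe", "Emma Watson"]}}
source_b = {"movie:hp7-2": {"show_times": ["19:30", "22:00"]}}
wikipedia = {
    "movie:hp7-2": {"cast": ["Daniel Radcliffe"]},
    "movie:global-example": {"cast": ["Example Actor"]},
}

kb = blend([("source A", source_a), ("source B", source_b), ("Wikipedia", wikipedia)])
```

Here source B is complementary to source A (it adds show times), while Wikipedia overlaps with source A on cast and extends coverage to an entity the other two lack.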
There is a tremendous opportunity to do this directly from Web pages, reverse engineering the Web
[Figure: coverage chart with axes Entities and Attributes. Region: information from structured data extraction on billions of Web pages]
Value #1: Breadth, depth, and accuracy at scale
• We show many entities we shouldn’t
• WOO improves our breadth, depth, and accuracy by combining knowledge from alternative sources, and by modernizing how we do matching, blending, and de-duping
[Figure: sets labeled "Up-to-date correct entities", "Real entities", and "Dups, errors, and outdated entities"; example listing with callouts "No photo", "Incorrect store URL", "No business hours"]
Value #2: Agility launching new experiences
• Answers instead of links
  › WOO lets us quickly create entity-centric DD modules using the existing knowledge in the KB
• Related knowledge in context
  › The integrated KB lets us show relevant knowledge from one Yahoo property on other properties and off network
• Emerging markets and tail pages
  › The KB gets us deep into the tail by combining and blending knowledge from many sources
Use Cases
Events, performers, and venues
Local
Shopping
Search
• Enhanced result with deep links, rating, address
• Faceted search to restrict results
Adopted Approach
Science at scale
• Data is ingested from web extraction, feeds, and editorial content (billions of objects)
• Data integration using Hadoop clusters
  › Schema matching (schema.org)
  › Object reconciliation
  › Blending
• Data quality assessment
• Information extraction
  › Text, e.g. news content
  › Web pages
• Enrichment
  › Feature computation based on user behavior, social signals, and web content
• Serving and ranking
  › Selecting the right objects to show by query, user, geography, etc.
Companies like Greenplum and Aster Data (now Teradata) have built SQL interfaces to the grid, but these are mostly limited to data analytics jobs. WOO is one of the few projects to build a semantic knowledge system on the grid.
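The object reconciliation and blending steps named on this slide can be sketched on a single machine as "group records that refer to the same real-world object, then merge each group". This is a minimal illustration under stated assumptions, not the Hadoop pipeline itself: the matching rule (normalized name plus postal code), the majority-vote blend, and all field names and records are hypothetical.

```python
import re
from collections import Counter, defaultdict


def match_key(record):
    """Assumed matching rule: normalized name plus postal code."""
    name = re.sub(r"[^a-z0-9 ]", "", record["name"].lower())
    name = " ".join(name.split())  # collapse repeated whitespace
    return (name, record.get("postal_code", ""))


def reconcile(records):
    """Group records that share a match key into clusters."""
    clusters = defaultdict(list)
    for record in records:
        clusters[match_key(record)].append(record)
    return list(clusters.values())


def blend_cluster(cluster):
    """Merge one cluster: per attribute, keep the most frequent non-empty value."""
    blended = {}
    keys = {key for record in cluster for key in record}
    for key in sorted(keys):
        values = [record[key] for record in cluster if record.get(key)]
        if values:
            blended[key] = Counter(values).most_common(1)[0][0]
    return blended


# Made-up records, as if from a feed and from web extraction.
records = [
    {"name": "Joe's Pizza", "postal_code": "60613", "phone": "555-0100"},
    {"name": "JOES PIZZA", "postal_code": "60613", "hours": "11am-10pm"},
    {"name": "Lakeview Books", "postal_code": "60657"},
]

objects = [blend_cluster(cluster) for cluster in reconcile(records)]
```

The two pizza records collapse into one object that carries both the phone number and the business hours. At grid scale, the match key would play the role of the grouping key in a MapReduce-style job, though the slides do not describe the actual matching logic.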
WOO ontology
• Primary use case is data validation
  › During information extraction and throughout the WOO platform
  › No reasoning
• OWL 2 ontology
  › Automatic documentation
  › Change management
  › Conversion to Yahoo internal schema language
  › Protégé OWL as editorial tool
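Using an ontology for validation without reasoning amounts to checking each object against the classes and properties the ontology declares. The sketch below shows that idea with a plain dictionary standing in for the ontology; the class names, property names, and expected types are hypothetical, since the WOO ontology is available only internally.

```python
# Hypothetical stand-in for an ontology: class -> {property: expected Python type}.
ONTOLOGY = {
    "Movie": {"name": str, "releaseYear": int, "cast": list},
    "Event": {"name": str, "venue": str, "startDate": str},
}


def validate(obj):
    """Return a list of validation errors; an empty list means the object is valid.

    Only declared classes, declared properties, and value types are checked.
    Nothing is inferred, which mirrors the "no reasoning" point on the slide.
    """
    cls = obj.get("type")
    schema = ONTOLOGY.get(cls)
    if schema is None:
        return [f"unknown class: {cls!r}"]

    errors = []
    for prop, value in obj.items():
        if prop == "type":
            continue
        expected = schema.get(prop)
        if expected is None:
            errors.append(f"{cls}.{prop}: property not defined for class")
        elif not isinstance(value, expected):
            errors.append(
                f"{cls}.{prop}: expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )
    return errors


good = {"type": "Movie", "name": "Example Film", "releaseYear": 2011}
bad = {"type": "Movie", "name": "Example Film", "releaseYear": "2011", "rating": 5}

good_errors = validate(good)
bad_errors = validate(bad)
```

For `bad`, the result contains one error for the string-valued `releaseYear` and one for the undeclared `rating` property.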
WOO ontology, contd.
• Covers Yahoo’s domains of interest
  › Movies, Music, TV, Business listings, Events, Finance, Sports, Autos, …
  › 250 classes and 800 properties (Sept. 2011)
  › Available only internally
  › Developed over 1.5 years by Yahoo’s editorial team
• Aligned with schema.org
  › schema.org covers only a subset of the WOO ontology
Acknowledgements
Tremendous effort from the team at Yahoo!
• Yahoo! Knowledge and Personalization Group
• Yahoo! Research
• Editorial team
• Operations
• Web of Objects Engineering team
• Quality Assurance
• Yahoo! Properties owners: Search, Listings, and Media
• Content Agility team
• Cloud Platform team
Schema.org
• Agreement between Bing, Google, and Yahoo on what markup webmasters should use
  › Helps adoption by reducing fragmentation
  › Pre-competitive: each party will continue to build competing products independently
• Schema.org covers areas of interest to all three parties
  › Business listings (local), creative works (video), recipes, reviews
  › Expected to open up to external contributions for non-core areas as well
• Schema.org is aligned with Yahoo’s own efforts in building a Web of Objects
  › Schema.org targets some of the core entity types
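As an illustration of what a schema.org description of a business listing can look like, the sketch below builds one in Python and serializes it. The slides do not say which markup syntax webmasters would use, so JSON-LD is an assumption here. The helper name and all values are hypothetical, while `LocalBusiness`, `PostalAddress`, and the property names are schema.org vocabulary terms.

```python
import json


def local_business_jsonld(name, street, city, telephone, url):
    """Build a schema.org LocalBusiness description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": name,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
        },
        "telephone": telephone,
        "url": url,
    }


# Placeholder values for illustration only.
markup = local_business_jsonld(
    name="Example Pizza",
    street="123 Example St",
    city="Chicago",
    telephone="+1-555-0100",
    url="https://example.com",
)

serialized = json.dumps(markup, indent=2)
print(serialized)
```

A page carrying such a description gives an extractor typed entities and attributes directly, instead of requiring it to reverse engineer them from page layout.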
- Slides: 23