Deco: Declarative Crowdsourcing
Litian Ma

Presentation Outline
• Overview
• Running Examples
• Data Model
• Query Language
• Query Processing
• System Architecture
• Experiments

Overview
• Conventional data management, extended to incorporate “human computation”.
• (Figure: declarative queries, a DBMS-like system, the Web.)

Main Challenges
• Resolving disagreeing human opinions.
• How does the database system interact with human workers?
• How can external sources be used in addition to the crowd?
• Choosing the right data model and query language.
• Materialization of crowdsourced data.
• An efficient query processor.

Running Example
• Restaurant(name, address, rating, cuisine)
• AddrInfo(address, city, zip)

Data Model
(Figure labels: Schema designer: relations and other stuff; End user: relations; Conceptual Schema; automatic (system); Raw Schema; DBMS.)

Conceptual Schema
• Relations
  ◦ Like Restaurant and AddrInfo.
• Partitioning of attributes in each conceptual relation
  ◦ Anchor attributes (identifiers).
  ◦ Dependent attribute-groups (properties).
• Fetch rules
  ◦ How to obtain data from external sources, including humans.
• Resolution rules
  ◦ Reconcile inconsistent or uncertain values.

Raw Schema
• Tables actually stored in the DBMS.
• For each relation R in the conceptual schema:
  ◦ One anchor table whose attributes are the anchor attributes of R.
  ◦ One dependent table for each dependent attribute-group D in R, containing the attributes in the resolution rule for D.

Components of the Data Model
• The fetch-resolve-join sequence is a logical concept.
• The steps may interleave.
• Conceptual data is not materialized.

Conceptual Relations - Restaurant
• Restaurant(name, address, rating, cuisine)
• AddrInfo(address, city, zip)
Dependent attribute-groups are enclosed in square brackets:
• Restaurant(name, address, [rating], [cuisine])
• AddrInfo(address, [city, zip])

Resolution Rules

Fetch Rules

Fetch Rules Verification

Raw Schema
• Recap:
  ◦ One anchor table.
  ◦ One dependent table for each dependent attribute-group.
• Raw tables for the running example:
  ◦ RestA(name, address)
  ◦ RestD1(name, address, rating)
  ◦ RestD2(name, cuisine)
  ◦ AddrA(address)
  ◦ AddrD1(address, city, zip)
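
The mapping from conceptual relations to raw tables can be sketched in Python. This is a minimal illustration, not Deco's implementation: the `ConceptualRelation` and `DependentGroup` classes, the `prefix` naming scheme, and the `raw_tables` function are hypothetical, and the resolution-rule left-hand sides (rating depends on name and address, cuisine on name alone) are inferred from the raw tables listed in the recap.

```python
from dataclasses import dataclass, field


@dataclass
class DependentGroup:
    """A dependent attribute-group plus the anchors its resolution rule uses."""
    attributes: list[str]
    rule_anchors: list[str]  # assumed left-hand side of the resolution rule


@dataclass
class ConceptualRelation:
    name: str
    prefix: str  # hypothetical prefix for raw-table names, e.g. "Rest"
    anchors: list[str]
    groups: list[DependentGroup] = field(default_factory=list)


def raw_tables(rel: ConceptualRelation) -> dict[str, list[str]]:
    """Return {raw table name: attribute list} for one conceptual relation."""
    tables = {f"{rel.prefix}A": list(rel.anchors)}  # one anchor table
    for i, group in enumerate(rel.groups, start=1):
        # One dependent table per group, holding the attributes
        # that appear in that group's resolution rule.
        tables[f"{rel.prefix}D{i}"] = group.rule_anchors + group.attributes
    return tables


restaurant = ConceptualRelation(
    name="Restaurant", prefix="Rest", anchors=["name", "address"],
    groups=[
        DependentGroup(["rating"], rule_anchors=["name", "address"]),
        DependentGroup(["cuisine"], rule_anchors=["name"]),
    ],
)
addr_info = ConceptualRelation(
    name="AddrInfo", prefix="Addr", anchors=["address"],
    groups=[DependentGroup(["city", "zip"], rule_anchors=["address"])],
)

for rel in (restaurant, addr_info):
    for table, attrs in raw_tables(rel).items():
        print(f"{table}({', '.join(attrs)})")
```

Under these assumptions the printed tables match the five raw tables in the recap.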

Valid Instance
• Obtained by starting with the current contents of the raw tables and logically performing:
  ◦ Fetch: add tuples to Deco tables.
  ◦ Resolve: resolve dependent attributes.
  ◦ Join: full outer join of Deco tables for each relation.
• The result is a set of data for the conceptual relations.
• These are logical steps: they need not actually be performed, nor performed in this order.
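
The fetch, resolve, and join steps can be sketched over small in-memory tables. Everything here is an assumption made for illustration: the sample data, the `fetch_cuisine` stand-in for a fetch rule, and the choice of averaging ratings and taking a majority vote on cuisine as resolution functions.

```python
from collections import Counter, defaultdict
from statistics import mean

# Raw tables as lists of tuples; the contents are made-up sample data.
rest_a = [("Bytes", "1 Main St"), ("Tandoor", "9 Oak Ave")]
rest_d1 = [("Bytes", "1 Main St", 4), ("Bytes", "1 Main St", 5)]  # name, address, rating
rest_d2 = [("Bytes", "Thai"), ("Bytes", "Thai"), ("Bytes", "Lao")]  # name, cuisine


def fetch_cuisine(name):
    """Stand-in for a fetch rule; a real one would ask the crowd or a web source."""
    return {"Tandoor": "Indian"}.get(name)


def resolve(rows, key_len, func):
    """Group raw rows by their key columns and reduce the values with func."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[:key_len]].append(row[key_len])
    return {key: func(values) for key, values in groups.items()}


def majority(values):
    return Counter(values).most_common(1)[0][0]


def full_outer_join(anchors, ratings, cuisines):
    """Join anchors with resolved ratings on (name, address) and cuisines on name."""
    rows, seen_names = [], set()
    # Anchor keys plus any (name, address) keys that only appear in ratings.
    keys = list(dict.fromkeys(list(anchors) + list(ratings)))
    for name, address in keys:
        seen_names.add(name)
        rows.append((name, address, ratings.get((name, address)), cuisines.get((name,))))
    # Cuisine rows with no matching anchor still appear, padded with None.
    for (name,), cuisine in cuisines.items():
        if name not in seen_names:
            rows.append((name, None, None, cuisine))
    return rows


# Fetch: add tuples to the raw tables.
answer = fetch_cuisine("Tandoor")
if answer is not None:
    rest_d2.append(("Tandoor", answer))

# Resolve: one value per key for each dependent attribute-group.
ratings = resolve(rest_d1, 2, mean)       # keyed by (name, address)
cuisines = resolve(rest_d2, 1, majority)  # keyed by (name,)

# Join: assemble the conceptual Restaurant relation.
restaurant = full_outer_join(rest_a, ratings, cuisines)
for row in restaurant:
    print(row)
```

In this sketch a missing dependent value shows up as `None`, which plays the role of NULL in the conceptual relation.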

Metadata
• Extra columns in the raw tables.
• Not a first-class part of the data model, but crucial for some crowdsourcing applications.
• Deals with the messy aspects of using crowdsourced data.
• Examples:
  ◦ Data expiration
  ◦ Worker quality
  ◦ Voting
  ◦ Confidence scores
  ◦ Etc.
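
As one illustration of metadata, data expiration could be handled by storing a timestamp next to each raw tuple and dropping stale tuples before resolution. This is only a sketch: the `RawRating` layout, the `worker` and `fetched_at` columns, and the one-week `MAX_AGE` policy are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawRating:
    name: str
    rating: int
    # Hypothetical metadata columns; not part of the conceptual relation.
    worker: str
    fetched_at: int  # seconds since some epoch


MAX_AGE = 7 * 24 * 3600  # assumed policy: answers expire after one week


def unexpired(rows, now):
    """Keep only tuples whose age is within MAX_AGE."""
    return [r for r in rows if now - r.fetched_at <= MAX_AGE]


rows = [
    RawRating("Bytes", 4, worker="w1", fetched_at=0),
    RawRating("Bytes", 5, worker="w2", fetched_at=900_000),
]
fresh = unexpired(rows, now=1_000_000)
print([r.worker for r in fresh])  # only the recent answer from w2 remains
```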

Query Language
• A Deco query Q is a relational query over the conceptual relations.
• The answer to Q is the result of evaluating Q over some valid instance of the database.
• Without further constraints the answer may be empty!
• AtLeast 5: at least 5 tuples with non-NULL attributes will be returned.

Query Processing
• Push-Pull Hybrid Execution Model
  ◦ Incremental Push
    ▪ Borrows ideas from incremental view maintenance.
    ▪ Result of a fetch rule -> update to one or more base tables -> propagated to the view (conceptual table).
  ◦ Asynchronous Pull
    ▪ Borrows ideas from asynchronous iteration.
    ▪ Initiates multiple new fetches in parallel and feeds more tuples back to the plan as soon as possible.
  ◦ Two Phases
    ▪ Materialization: try to answer the query using the raw tables.
    ▪ Accretion: issue fetch rules to obtain more results.
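
The two phases can be sketched for a single relation and a query that needs a minimum number of result tuples. This is a simplified, sequential illustration under stated assumptions: `run_query`, `complete`, and the `fetch_more` iterator (a stand-in for issuing fetch rules) are hypothetical names, and the sketch does not model incremental push or parallel asynchronous fetches.

```python
from typing import Callable, Iterator

Row = dict  # one conceptual tuple, e.g. {"name": ..., "cuisine": ...}


def complete(row: Row) -> bool:
    """A result counts only if none of its attributes is NULL (None)."""
    return all(value is not None for value in row.values())


def run_query(
    stored: list[Row],
    predicate: Callable[[Row], bool],
    at_least: int,
    fetch_more: Iterator[Row],
) -> list[Row]:
    # Phase 1, materialization: try to answer from what is already stored.
    results = [r for r in stored if complete(r) and predicate(r)]

    # Phase 2, accretion: issue fetches until enough results exist
    # or the simulated source runs dry.
    while len(results) < at_least:
        row = next(fetch_more, None)
        if row is None:
            break
        stored.append(row)  # assumed: fetched data is kept for later queries
        if complete(row) and predicate(row):
            results.append(row)
    return results


stored = [
    {"name": "Bytes", "cuisine": "Thai"},
    {"name": "Tandoor", "cuisine": None},  # incomplete, so never a result
]
crowd = iter([
    {"name": "Lotus", "cuisine": "Thai"},
    {"name": "Roma", "cuisine": "Italian"},
    {"name": "Siam", "cuisine": "Thai"},
])
thai = run_query(stored, lambda r: r["cuisine"] == "Thai", at_least=3, fetch_more=crowd)
print([r["name"] for r in thai])
```

Here one result comes from the materialization phase and the other two from accretion; the non-matching fetched tuple is stored but not returned.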

Query Plans

System Design

Experiment Setup
Country(name, [language], [capital])

Benchmark Query
Country(name, [language], [capital])

Experiment 1 - Fetch Configurations

Experiment 2 – Query Plans
• Plan “Down”
  ◦ Pushes all predicates down as far as possible.
  ◦ Similar to the reverse-fetch query plan.
• Plan “Up”
  ◦ Applies the predicate pull-up transformation.
  ◦ Similar to the filter-later query plan.

Thanks!
