Schema Mediation and Query Processing in Peer Data Management Systems

Presenter: Jie Zhao
Supervisor: Rachel Pottinger
Sept. 29, 2006

Preliminaries
• Datalog
    ◦ Q(x) :- Airport(x, Vancouver)
      (head: Q(x); body: Airport(x, Vancouver))
    ◦ Airport:
        Code   City
        SEA    Seattle
        YVR    Vancouver
• Mapping for heterogeneous schemas
    ◦ Correspondences between two schemas
    ◦ A medium for exchanging data, transferring queries, etc.
• PDMS (Peer Data Management System)
    ◦ Each peer has a database
    ◦ Peers can leave or join the network voluntarily
    ◦ Mappings between some peers are provided
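To make the Datalog notation concrete, the query on this slide can be read as a filter over the Airport table. The following is a minimal Python sketch and not part of the original talk; the names `airport` and `q` are chosen here for illustration only.

```python
# Airport(Code, City) as a set of tuples, copied from the table on the slide.
airport = {("SEA", "Seattle"), ("YVR", "Vancouver")}


def q(airport_rows):
    """Q(x) :- Airport(x, Vancouver): codes of airports whose city is Vancouver."""
    return {code for (code, city) in airport_rows if city == "Vancouver"}


print(q(airport))  # {'YVR'}
```

The head variable x corresponds to the value that is returned, and the constant Vancouver in the body becomes the selection condition.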

A general query answering case in PDMS
[Figure: peers UBC, UW, and UT, each with a local schema and a local database; UBC and UW are connected by Mapping UBC_UW, and UW and UT by Mapping UW_UT.]

A general query answering case in PDMS
[Figure: the same three peers; query Q over UBC is passed along Mapping UBC_UW as query Q’ over UW, and along Mapping UW_UT as query Q” over UT.]

Previous methods can only query the local schema
• Local Schema UW, assumed relation: conf-paper(title, venue, year, pages)
• Local Schema UBC, assumed relation: conf-paper(title, venue, year, URL)
• The two peers are connected by Mapping UW_UBC
• Query that a UW user can ask: q(x) :- conf-paper(t, v, y, x).
• A UW user can never ask for information about URL!

What we’d like to improve…
• Access more information, e.g. URL
• Get rid of the restrictive query format, e.g. local schema only
• Improve the comprehensibility of the PDMS
• Reconsider the difficulties and complexity raised by mapping composition
• Make good use of indirect mapping information
We have a method for mediated schema creation in PDMS that addresses all of these

Challenges
• How to create the mediated schema without a centralized authority?
• How to arrive at the same mediated schema wherever mediation starts?
• How can an automatically created mediated schema be comprehensible to users?
• How can human intervention be minimized?
• Where to store the mediated schema, and how to update it?

Related Work
• Bernstein et al.: a vision for incorporating database research into the P2P scenario
• Piazza project: provides a complete prototype for query answering in PDMS
• Fagin et al.: use SO (second-order) logic as the mapping language
• HePToX: XQuery reformulation
• Hyperion: uses both data-level and schema-level mappings to specify the correspondences between acquainted peers
• PeerDB: uses keywords as the basis for relation matching

Outline
• Semantics in Conjunctive Mappings
• Peer Schema Mediation
• Updating the Mediated Schema
• A Study of Mapping Composition
• Experimental Study

Introducing concepts into conjunctive mappings
• A conjunctive mapping has the following form:
    conf-paper(title, venue, yr) :- UW.conf-paper(title, venue, yr, pages)
    conf-paper(title, venue, yr) :- UBC.conf-paper(title, venue, yr, URL)
    ◦ IDB name: “conf-paper”
    ◦ Component: each Datalog query above is a component
    ◦ Subgoal: each relation in the body, e.g. “UW.conf-paper(title, venue, yr, pages)”

Introducing concepts into conjunctive mappings (Cont.)
• Intuitively, a concept describes the common object across different schemas
• Informally, two mappings CM1 and CM2 have the same concept if:
    ◦ CM1 and CM2 have the same IDB names
    ◦ The queries Q1 and Q2 constructed from the overlapping subgoals of CM1 and CM2 are equivalent
    ◦ The subgoals are compatible

Introducing concepts into conjunctive mappings (Cont.)
• Mappings that express the same concept:
    ◦ Mapping 1, from UW to UBC:
        Paper(title, venue) :- UW.paper(title, venue, yr, pages)
        Paper(title, venue) :- UBC.paper(title, venue, author, URL)
    ◦ Mapping 2, from UBC to UT:
        Paper(title, author) :- UBC.paper(title, venue, author, URL)
        Paper(title, author) :- UT.paper(title, author, area)
• Mappings that do not express the same concept:
    ◦ Mapping 1, from A to B:
        Manager(x, y) :- A.Mgr(x, y)
        Manager(x, y) :- B.Mgr1(x, y)
    ◦ Mapping 2, from B to C:
        Manager(x) :- B.Mgr1(x, x)
        Manager(x) :- C.SelfMgr(x)
• Mappings are checked for compatibility before they are merged
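The two examples above can be told apart mechanically. The sketch below is a deliberately simplified stand-in for the concept check and is not the algorithm used in MePSys: it compares IDB names and then, instead of testing full query equivalence, only compares which argument positions of each shared subgoal carry the same variable. The classes `Atom` and `Component` and the function `may_share_concept` are hypothetical names introduced here, and the sketch assumes no self-joins (one subgoal per relation).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    relation: str   # e.g. "UBC.paper", or an IDB name such as "Paper"
    args: tuple     # variable names in positional order


@dataclass(frozen=True)
class Component:
    head: Atom
    body: tuple     # tuple of Atom subgoals


def comp(head, head_args, rel, rel_args):
    """Build a one-subgoal component such as Paper(t, v) :- UW.paper(t, v, y, p)."""
    return Component(Atom(head, tuple(head_args)), (Atom(rel, tuple(rel_args)),))


def equality_pattern(atom):
    """Mgr1(x, y) -> (0, 1); Mgr1(x, x) -> (0, 0)."""
    first_seen = {}
    return tuple(first_seen.setdefault(v, i) for i, v in enumerate(atom.args))


def subgoals(mapping):
    """Relation name -> subgoal. Assumption: each relation occurs once."""
    return {a.relation: a for c in mapping for a in c.body}


def may_share_concept(cm1, cm2):
    # Condition 1 from the slide: same IDB names.
    if {c.head.relation for c in cm1} != {c.head.relation for c in cm2}:
        return False
    # Conditions 2 and 3, approximated: every overlapping subgoal must repeat
    # variables in the same positions. Assumption of this sketch: mappings
    # with no overlapping subgoal are not considered to share a concept.
    s1, s2 = subgoals(cm1), subgoals(cm2)
    shared = s1.keys() & s2.keys()
    return bool(shared) and all(
        equality_pattern(s1[r]) == equality_pattern(s2[r]) for r in shared
    )


paper_1 = [comp("Paper", ["title", "venue"], "UW.paper", ["title", "venue", "yr", "pages"]),
           comp("Paper", ["title", "venue"], "UBC.paper", ["title", "venue", "author", "URL"])]
paper_2 = [comp("Paper", ["title", "author"], "UBC.paper", ["title", "venue", "author", "URL"]),
           comp("Paper", ["title", "author"], "UT.paper", ["title", "author", "area"])]
mgr_1 = [comp("Manager", ["x", "y"], "A.Mgr", ["x", "y"]),
         comp("Manager", ["x", "y"], "B.Mgr1", ["x", "y"])]
mgr_2 = [comp("Manager", ["x"], "B.Mgr1", ["x", "x"]),
         comp("Manager", ["x"], "C.SelfMgr", ["x"])]

print(may_share_concept(paper_1, paper_2))  # True
print(may_share_concept(mgr_1, mgr_2))      # False
```

The Manager pair is rejected because B.Mgr1(x, y) and B.Mgr1(x, x) constrain the shared relation differently, which is the kind of self-restrictive pattern the slide flags.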

Pottinger’s Schema Mediation Algorithm for DIS
[Figure: local schemas UW and UBC, each with a local database, are related by Mapping UW_UBC; Mediated Schema M is connected to them by Mapping M_UW and Mapping M_UBC.]
• Base of our approach

Peer Schema Mediation – How the system works

Schema Mediation Strategy
• As explained on the previous slide
• Merging two schemas is based on MappingTables

MappingTable creation
• Purpose:
    ◦ Relate a relation in M for a concept with the subgoals from mappings
    ◦ Transform unstructured mapping information into a structured form
    ◦ Make it easy to reconstruct the original mappings from the MappingTables
• Indirect mapping information is easy to represent in a MappingTable but hard to represent with mappings alone
• Example: [figure]

Merge Two MappingTables
• The MappingTable merging process follows these general principles:
    ◦ Related attributes are positioned in the same column
    ◦ Unrelated attributes are in different columns
    ◦ Overlapping local relations in the two MappingTables determine the indirect mapping information

Merge Two MappingTables (Cont.)
[Figure: M3, the result of merging M1 and M2]
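The merging principles can be illustrated with a small Python sketch. The slide's own tables are not preserved in this transcript, so the tables `m1` and `m2` below are hypothetical ones built from the two Paper mappings shown earlier (UW to UBC, and UBC to UT). The representation, a dictionary from local relation to {column: local attribute}, and the function `merge_mapping_tables` are assumptions of this sketch, not the data structure used in MePSys.

```python
def merge_mapping_tables(t1, t2):
    """Merge two tables of the form {relation: {column: local_attribute}}."""
    # Related attributes: an overlapping relation that stores the same local
    # attribute in a column of t1 and a column of t2 ties those columns together.
    rename = {}
    for rel in t1.keys() & t2.keys():
        col_of_attr = {attr: col for col, attr in t1[rel].items()}
        for col2, attr in t2[rel].items():
            if attr in col_of_attr:
                rename[col2] = col_of_attr[attr]

    # Unrelated attributes stay in different columns; a simple suffix avoids
    # an accidental name clash with an existing t1 column.
    cols1 = {c for row in t1.values() for c in row}
    cols2 = {c for row in t2.values() for c in row}
    for col2 in sorted(cols2 - set(rename)):
        rename[col2] = col2 if col2 not in cols1 else col2 + "_2"

    merged = {rel: dict(row) for rel, row in t1.items()}
    for rel, row in t2.items():
        target = merged.setdefault(rel, {})
        for col2, attr in row.items():
            target[rename[col2]] = attr
    return merged


# Paper(title, venue) from UW.paper and UBC.paper
m1 = {"UW.paper": {"title": "title", "venue": "venue"},
      "UBC.paper": {"title": "title", "venue": "venue"}}
# Paper(title, author) from UBC.paper and UT.paper
m2 = {"UBC.paper": {"title": "title", "author": "author"},
      "UT.paper": {"title": "title", "author": "author"}}

m3 = merge_mapping_tables(m1, m2)
for relation, row in sorted(m3.items()):
    print(relation, row)

# Indirect mapping information: UW and UT were never mapped directly,
# yet they now share a column through the overlapping relation UBC.paper.
print(m3["UW.paper"].keys() & m3["UT.paper"].keys())  # {'title'}
```

The overlapping relation UBC.paper is what lets the title columns of the two tables line up, which is how the indirect correspondence between UW.paper and UT.paper becomes visible.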

Compute GLAV Mappings for Each Local Peer

Query Reformulation
• Reformulate queries in both directions
    ◦ Q over E → Q’ over M
    ◦ Q’ over M → Q over E
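For the direction from M to a local peer E, a much-simplified illustration is a column lookup: given the mediated columns a query asks for, find the local relations that can supply all of them and name the corresponding local attributes. The sketch below is not the reformulation algorithm of the talk. The table `conf_paper_table` is a hypothetical MappingTable for a mediated conf-paper relation that is assumed to expose both pages and URL, and `reformulate` is a name introduced here.

```python
# Hypothetical MappingTable: local relation -> {mediated column: local attribute}
conf_paper_table = {
    "UW.conf-paper": {"title": "title", "venue": "venue", "yr": "yr", "pages": "pages"},
    "UBC.conf-paper": {"title": "title", "venue": "venue", "yr": "yr", "URL": "URL"},
}


def reformulate(mapping_table, wanted_columns):
    """Return {local relation: [local attributes]} for every local relation
    that covers all of the requested mediated columns."""
    plans = {}
    for relation, columns in mapping_table.items():
        if all(c in columns for c in wanted_columns):
            plans[relation] = [columns[c] for c in wanted_columns]
    return plans


# A query over M asking for title and URL can only be answered at UBC.
print(reformulate(conf_paper_table, ["title", "URL"]))
# A query asking for title and venue can be sent to both peers.
print(reformulate(conf_paper_table, ["title", "venue"]))
```

Under these assumptions, a user at UW who queries the mediated relation can reach the URL attribute stored at UBC, which the local UW schema alone does not offer.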

Information that each peer maintains in the system set-up phase
• Each peer E stores:
    ◦ E’s local database schema
    ◦ A list of mappings between E and its acquaintances
    ◦ A current version of the mediated schema M
    ◦ The MappingTable set that corresponds to M
    ◦ GLAV mappings from M to E

Adding a Peer to the Network
• Some peers build applications over M after the system setup phase
• When a new peer joins, M changes. How should the already-built applications be handled?
    ◦ Keep the transformation information so that old applications remain usable
[Figure: (a) right after the system setup phase; (b) some time later, D joins]

Dropping a Peer from the Network
• Strategy One: a peer’s leaving the network triggers a schema mediation process from the very beginning
    ◦ BAD: too much system work is spent on schema mediation alone
• Strategy Two: redo the schema mediation once every assigned period
    ◦ Two ways to know X is leaving:
        1. X notifies another node before departure
        2. Another peer pings or communicates with X
    ◦ BAD: the previously created mediated schema becomes useless
• Strategy Three:
    ◦ X leaves without notifying others
    ◦ X’s acquaintance Y recognizes that X has left
    ◦ Y computes the new mediated schema
    ◦ BAD:
        - Y needs to be able to recognize which relations in the MappingTable come from X
        - Peers can easily lose connection with others

Dropping a Peer from the Network (Cont.)
• Strategy Four: X wants to leave:
    ◦ X calculates a new mediated schema
    ◦ X assigns its acquaintance another acquaintance from its acquaintance list
    ◦ “Removal” operator: given M and the peer X that is to be removed, compute the remaining part
    ◦ The removed part can consist of relations or of attributes in relations
• Good because:
    ◦ All previously constructed applications remain available
    ◦ All peers are still connected
    ◦ No redundant work results: mediation does not start from the beginning
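The “Removal” operator can be sketched over a MappingTable-like structure. This is an illustration under stated assumptions rather than the operator defined in the talk: the mediated schema is assumed to be stored as {mediated relation: {local relation: {column: local attribute}}}, local relation names are assumed to be prefixed with the peer name (as in "UT.paper"), and `remove_peer` and `schema_of` are hypothetical names.

```python
def remove_peer(mapping_tables, peer):
    """Drop every row contributed by `peer`; mediated relations that lose
    all of their rows disappear from the result."""
    prefix = peer + "."
    remaining = {}
    for mediated_rel, rows in mapping_tables.items():
        kept = {rel: dict(cols) for rel, cols in rows.items()
                if not rel.startswith(prefix)}
        if kept:
            remaining[mediated_rel] = kept
    return remaining


def schema_of(mapping_tables):
    """Mediated relation -> sorted list of columns still backed by some peer."""
    return {rel: sorted({c for cols in rows.values() for c in cols})
            for rel, rows in mapping_tables.items()}


tables = {
    "Paper": {
        "UW.paper": {"title": "title", "venue": "venue"},
        "UT.paper": {"title": "title", "author": "author"},
    },
    "Course": {"UT.course": {"name": "name"}},
}

print(schema_of(tables))
# {'Paper': ['author', 'title', 'venue'], 'Course': ['name']}
print(schema_of(remove_peer(tables, "UT")))
# {'Paper': ['title', 'venue']}
```

In this hypothetical data, removing UT drops a whole relation (Course) and a single attribute (author of Paper), matching the two kinds of removed parts named on the slide, while the rest of the mediated schema is reused rather than recomputed.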

Information that each peer maintains in the system steady state
• Each peer stores the following information:
    ◦ Local schema
    ◦ Mappings to its acquaintances
    ◦ Current mediated schema, MappingTables, and mappings to its own schema
    ◦ Previous versions of the mediated schema on which the local peer has built applications, and mappings from them to the new mediated schema
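The per-peer state listed above maps naturally onto a small record type. The following Python sketch is only one possible layout: the class `PeerState`, its field names, and the `install` method are hypothetical, and schemas and mappings are kept as opaque dictionaries and lists.

```python
from dataclasses import dataclass, field


@dataclass
class PeerState:
    local_schema: dict            # relation name -> list of attributes
    acquaintance_mappings: dict   # acquaintance name -> mapping (opaque here)
    mediated_schema: dict         # current mediated schema M
    mapping_tables: dict          # MappingTables that correspond to M
    mappings_to_local: list       # mappings from M to the local schema
    version: int = 1
    # Older versions of M that local applications were built on:
    # version number -> (old mediated schema, mapping from it to the next M)
    previous_versions: dict = field(default_factory=dict)

    def install(self, new_schema, new_tables, new_mappings,
                old_to_new, has_applications):
        """Switch to a new mediated schema. The old version is kept only if
        local applications depend on it. Simplification: a later update would
        also have to refresh the stored old-to-new mappings; omitted here."""
        if has_applications:
            self.previous_versions[self.version] = (self.mediated_schema, old_to_new)
        self.mediated_schema = new_schema
        self.mapping_tables = new_tables
        self.mappings_to_local = new_mappings
        self.version += 1


peer = PeerState(
    local_schema={"paper": ["title", "venue", "yr", "pages"]},
    acquaintance_mappings={"UBC": "Mapping UW_UBC"},
    mediated_schema={"Paper": ["title", "venue"]},
    mapping_tables={},
    mappings_to_local=[],
)
peer.install({"Paper": ["title", "venue", "author"]}, {}, [],
             old_to_new={"Paper": "Paper"}, has_applications=True)
print(peer.version, sorted(peer.previous_versions))  # 2 [1]
```

Keeping the old schema together with a mapping to its successor is what allows applications written against an earlier version of M to keep working after peers join or leave.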

A Study of Mapping Composition
• MePSys only considers input mappings that:
    ◦ Have the same concept
    ◦ Ignore such complicated factors as self-join and self-restrictive components
• Our approach transforms the problem of mapping composition into another one: using the mediated schema to relate different schemas

Some facts
• [Madhavan and Halevy] The number of composed mappings does not depend on the number of input mappings
• [Madhavan and Halevy] The composition of finite mappings may result in an infinite set of composed mappings
• [Fagin et al.] The composition of two mappings expressed in first-order logic might not be expressible in first-order logic

Analysis for the Study
• We compared Piazza, the SO logic algorithm, and MePSys
• Whether the Piazza method is expressive depends entirely on whether existential attributes in the second schema are mapped to the third schema
• The second-order logic mapping composition algorithm can handle cases with composed non-identical self-join components
    ◦ However, the results are hard to understand
• MePSys does not handle patterns with self-restrictive components
    ◦ Mappings in such patterns do not support concepts
• MePSys has yet to realize the mediation of schemas if mappings contain composed non-identical self-join components
• Aside from these two special groups of patterns, using the mediated schema to relate different sources is decidable

System Settings
• FreePastry
    ◦ A P2P network layer that uses an efficient routing strategy
    ◦ Each node maintains a routing table
    ◦ Keeps track of its immediate neighbors
    ◦ Notifies applications of message arrival, node failures, etc.
• Emulab
    ◦ Network emulation testbed
    ◦ Access to different machines to emulate nodes in a real network
    ◦ 900 M memory with a 2992.787 MHz processor
• Input schemas and mappings
    ◦ Input schema follows the TPC-H standard
    ◦ Avg. number of acquaintances per peer
    ◦ Avg. number of relations per peer schema
    ◦ Avg. number of attributes in a relation

Experiment 1: Schema Mediation in MePSys

Experiment 2: Query Reformulation
• For queries of similar size (less than 1 k), time can be decidable

Experiment 2: Query Reformulation (Cont.)
• In the maximum case, 10 query reformulations take only 2% of the total time

Experiment 3: Updating the Mediated Schema
• Computing a new mediated schema always takes less than 2% of the total time
• Updating takes almost no time

Our contributions
• MePSys, in which a mediated schema is created dynamically and any information in the network can be queried without additional global services
• An efficient algorithm, PSM, to create a mediated schema in PDMS and further create mappings to local sources
• The idea of automatically detecting specific concepts in mappings
• A study of how mapping composition impacts query reformulation with existing approaches
• A solution to the problem of updating the mediated schema
• Experiments on the efficiency and scalability of MePSys

Future Work
• Explore the semantic issues that arise when a broader range of mappings is considered, e.g. mappings with self-joins, mappings with different IDB names, etc.
• Consider more optimization issues in the future system
• Design a better approach to updating the mediated schema for local schema evolution

Acknowledgement

Thank you! Questions?