Schema Mediation and Query Processing in Peer Data Management Systems
Schema Mediation and Query Processing in Peer Data Management Systems
Presenter: Jie Zhao
Supervisor: Rachel Pottinger
Sept. 29, 2006
Preliminaries
- Datalog: in the query Q(x) :- Airport(x, Vancouver), Q(x) is the head and Airport(x, Vancouver) is the body; over the Airport relation below, the query returns YVR

| Code | City |
|------|------|
| SEA | Seattle |
| YVR | Vancouver |

- Mapping for heterogeneous schemas
  - Correspondences between two schemas
  - A medium for exchanging data, transferring queries, etc.
- PDMS (Peer Data Management System)
  - Each peer has a database
  - Peers can leave or join the network voluntarily
  - Mappings are provided between some pairs of peers
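The Datalog example above can be made concrete with a tiny evaluator. This is a minimal sketch, assuming a single subgoal in the body and an in-memory set of tuples; `Var` and `evaluate` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A Datalog variable; any other term is treated as a constant."""
    name: str


def evaluate(head, body, relation):
    """Evaluate head :- R(body) for a single body atom over a set of tuples.

    head:     list of Var objects returned by the query
    body:     one term per attribute of R (Var or constant)
    relation: set of tuples with the same arity as body
    """
    answers = set()
    for tup in relation:
        binding = {}
        matched = True
        for term, value in zip(body, tup):
            if isinstance(term, Var):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term.name, value) != value:
                    matched = False
                    break
            elif term != value:  # constants must match exactly
                matched = False
                break
        if matched:
            answers.add(tuple(binding[v.name] for v in head))
    return answers


# Airport(Code, City) from the slide
airport = {("SEA", "Seattle"), ("YVR", "Vancouver")}
x = Var("x")
print(evaluate([x], [x, "Vancouver"], airport))  # {('YVR',)}
```

The constant `Vancouver` in the body acts as a selection, and the head variable `x` acts as a projection onto the airport code.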
A general query answering case in PDMS
[Figure: three peers, UBC, UW and UT, each with a local schema and a local database; mapping UBC_UW connects UBC and UW, and mapping UW_UT connects UW and UT]
A general query answering case in PDMS (cont.)
[Figure: the same three peers; a query Q over UBC is reformulated into Q' over UW through mapping UBC_UW, and then into Q'' over UT through mapping UW_UT]
Previous methods can only access data through the local schema
- UW local schema: conf-paper(title, venue, year, pages)
- UBC local schema: conf-paper(title, venue, year, URL)
- Mapping UW_UBC relates the two peers
- A query that a UW user can ask: q(x) :- conf-paper(t, v, y, x), which returns pages
- The UW user can never ask about URLs, even though UBC stores them!
What we’d like to improve…
- Access more information, e.g., URL
- Get rid of the restrictive query format, e.g., queries over the local schema only
- Improve the comprehensibility of the PDMS
- Reconsider the difficulties and complexity raised by mapping composition
- Make good use of indirect mapping information

We have a method for mediated schema creation in PDMS that solves all of these.
Challenges
- How can the mediated schema be created without a centralized authority?
- How can mediation produce the same mediated schema wherever it starts?
- How can an automatically created mediated schema be comprehensible to users?
- How can human intervention be minimized?
- Where should the mediated schema be stored, and how should it be updated?
Related Work
- Bernstein et al.: a vision for incorporating database research into the P2P scenario
- Piazza project: provides a complete prototype for query answering in PDMS
- Fagin et al.: use second-order (SO) logic as the mapping language
- HePToX: XQuery reformulation
- Hyperion: uses both data-level and schema-level mappings to specify the correspondences between acquainted peers
- PeerDB: uses keywords as the basis for relation matching
Outline
- Semantics in Conjunctive Mappings
- Peer Schema Mediation
- Updating the Mediated Schema
- A Study of Mapping Composition
- Experimental Study
Introducing concepts into conjunctive mappings
- A conjunctive mapping has the following form:
  - conf-paper(title, venue, yr) :- UW.conf-paper(title, venue, yr, pages)
  - conf-paper(title, venue, yr) :- UBC.conf-paper(title, venue, yr, URL)
- IDB name: "conf-paper"
- Component: each Datalog query above is a component
- Subgoal: each relation in the body, e.g., "UW.conf-paper(title, venue, yr, pages)"
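The three terms above (IDB name, component, subgoal) map naturally onto a small data structure. This is a hypothetical representation for illustration only, not MePSys's internal format; `Atom`, `Component`, and `ConjunctiveMapping` are names chosen here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    relation: str   # relation name, e.g. "UW.conf-paper"
    args: tuple     # variable names in attribute order

    def __str__(self):
        return f"{self.relation}({', '.join(self.args)})"


@dataclass(frozen=True)
class Component:
    head: Atom      # IDB atom; its relation is the IDB name
    body: tuple     # subgoals (Atom objects)

    def __str__(self):
        return f"{self.head} :- {', '.join(map(str, self.body))}"


@dataclass(frozen=True)
class ConjunctiveMapping:
    components: tuple

    def idb_names(self):
        """IDB names used in the heads of all components."""
        return {c.head.relation for c in self.components}

    def subgoals(self):
        """Every body atom of every component, in order."""
        return [g for c in self.components for g in c.body]


head = Atom("conf-paper", ("title", "venue", "yr"))
uw_ubc = ConjunctiveMapping((
    Component(head, (Atom("UW.conf-paper", ("title", "venue", "yr", "pages")),)),
    Component(head, (Atom("UBC.conf-paper", ("title", "venue", "yr", "URL")),)),
))
for comp in uw_ubc.components:
    print(comp)
```

Printing the components reproduces the two Datalog rules from the slide.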
Introducing concepts into conjunctive mappings (cont.)
- Intuitively, a concept describes a common object across different schemas
- Informally, two mappings CM1 and CM2 have the same concept if:
  - CM1 and CM2 have the same IDB names
  - the queries Q1 and Q2, constructed from the overlapping subgoals of CM1 and CM2, are equivalent
  - their subgoals are compatible
Introducing concepts into conjunctive mappings (cont.)
- Mappings that express the same concept:
  - Mapping 1, from UW to UBC:
    - Paper(title, venue) :- UW.paper(title, venue, yr, pages)
    - Paper(title, venue) :- UBC.paper(title, venue, author, URL)
  - Mapping 2, from UBC to UT:
    - Paper(title, author) :- UBC.paper(title, venue, author, URL)
    - Paper(title, author) :- UT.paper(title, author, area)
- Mappings that do not express the same concept:
  - Mapping 1, from A to B:
    - Manager(x, y) :- A.Mgr(x, y)
    - Manager(x, y) :- B.Mgr1(x, y)
  - Mapping 2, from B to C:
    - Manager(x) :- B.Mgr1(x, x)
    - Manager(x) :- C.SelfMgr(x)
- Mappings are checked for compatibility before they are merged
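The two examples above can be separated by a deliberately simplified check: same IDB names, at least one overlapping local relation, and overlapping subgoals that are identical up to variable renaming. This is a sketch under assumptions, not the formal definition of a concept; it ignores heads and the broader compatibility test, and `equality_pattern`, `subgoal_patterns`, and `same_concept` are hypothetical helpers. A mapping is written as a list of `(idb_name, head_args, [(relation, args), ...])`.

```python
def equality_pattern(args):
    """Rename each variable to the index of its first occurrence.

    B.Mgr1(x, y) -> (0, 1) and B.Mgr1(x, x) -> (0, 0): two atoms over the
    same relation are equal up to variable renaming iff their patterns match.
    """
    first_seen = {}
    return tuple(first_seen.setdefault(a, i) for i, a in enumerate(args))


def subgoal_patterns(mapping):
    """Collect, per local relation, the equality patterns of its subgoals."""
    patterns = {}
    for _idb, _head, body in mapping:
        for relation, args in body:
            patterns.setdefault(relation, set()).add(equality_pattern(args))
    return patterns


def same_concept(m1, m2):
    """Simplified check: same IDB names and equivalent overlapping subgoals."""
    if {c[0] for c in m1} != {c[0] for c in m2}:
        return False
    p1, p2 = subgoal_patterns(m1), subgoal_patterns(m2)
    overlap = p1.keys() & p2.keys()
    return bool(overlap) and all(p1[r] == p2[r] for r in overlap)


paper_1 = [("Paper", ("title", "venue"), [("UW.paper", ("title", "venue", "yr", "pages"))]),
           ("Paper", ("title", "venue"), [("UBC.paper", ("title", "venue", "author", "URL"))])]
paper_2 = [("Paper", ("title", "author"), [("UBC.paper", ("title", "venue", "author", "URL"))]),
           ("Paper", ("title", "author"), [("UT.paper", ("title", "author", "area"))])]
mgr_1 = [("Manager", ("x", "y"), [("A.Mgr", ("x", "y"))]),
         ("Manager", ("x", "y"), [("B.Mgr1", ("x", "y"))])]
mgr_2 = [("Manager", ("x",), [("B.Mgr1", ("x", "x"))]),
         ("Manager", ("x",), [("C.SelfMgr", ("x",))])]

print(same_concept(paper_1, paper_2))  # True
print(same_concept(mgr_1, mgr_2))      # False: B.Mgr1(x, x) is self-restrictive
```

The Manager pair fails because the shared subgoal B.Mgr1 appears once with two distinct variables and once with a repeated one.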
Pottinger’s Schema Mediation Algorithm for DIS (data integration systems)
[Figure: given local schemas UW and UBC, their local databases, and the mapping UW_UBC, the algorithm creates a mediated schema M together with mappings M_UW and M_UBC]
- This algorithm is the base of our approach
Peer Schema Mediation – How the system works
Schema Mediation Strategy
- As explained on the previous slide
- Merging two schemas is based on MappingTables
MappingTable creation
- Purpose:
  - Relate a relation in M for a concept with the subgoals from the mappings
  - Transform unstructured mapping information into a structured form
  - Make it easy to reconstruct the original mappings from the MappingTables
- Indirect mapping information is easy to represent in a MappingTable but hard to represent with mappings
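The purpose list above can be illustrated with a hypothetical MappingTable layout: one column per head variable of the mediated relation, one row per local relation, and each cell recording which local attribute position feeds that column. This is a minimal sketch assuming one subgoal per component and safe rules (every head variable appears in the body); the layout and the names `build_mapping_table` and `reconstruct` are assumptions, not the actual MePSys structures.

```python
def build_mapping_table(idb, head_args, components):
    """Build a MappingTable for one concept.

    idb:        IDB name shared by every component, e.g. "Paper"
    head_args:  head variables; each becomes a column of the mediated relation
    components: list of (local_relation, body_args), one subgoal per component
    """
    rows = {}
    for relation, body_args in components:
        # Each cell stores the position of the local attribute that feeds the
        # column; tuple.index raises ValueError for an unsafe rule.
        cells = {col: body_args.index(col) for col in head_args}
        rows[relation] = {"attrs": tuple(body_args), "cells": cells}
    return {"idb": idb, "columns": list(head_args), "rows": rows}


def reconstruct(table):
    """Rebuild the original Datalog components from a MappingTable."""
    head = f"{table['idb']}({', '.join(table['columns'])})"
    rules = []
    for relation, row in table["rows"].items():
        body = list(row["attrs"])
        for col, pos in row["cells"].items():
            body[pos] = col  # the column name is the variable shared with the head
        rules.append(f"{head} :- {relation}({', '.join(body)})")
    return rules


m1 = build_mapping_table("Paper", ("title", "venue"), [
    ("UW.paper", ("title", "venue", "yr", "pages")),
    ("UBC.paper", ("title", "venue", "author", "URL")),
])
print(m1["rows"]["UBC.paper"]["cells"])  # {'title': 0, 'venue': 1}
for rule in reconstruct(m1):
    print(rule)
```

Keeping the full attribute list of each local relation is what makes reconstruction lossless in this sketch.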
Merge Two MappingTables
- The MappingTable merging process follows these general principles:
  - Related attributes are positioned in the same column
  - Unrelated attributes are placed in different columns
  - Local relations that appear in both MappingTables are how we determine the indirect mapping information
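The three merging principles above can be sketched on a simplified, hypothetical table layout `{"columns": [...], "rows": {local_relation: {column: local_attribute}}}`, using the two Paper mappings from the concept slides. This is an illustration under those assumptions, not the PSM algorithm itself; `merge_tables` is a hypothetical name.

```python
def merge_tables(t1, t2):
    """Merge two MappingTables for the same concept.

    Columns of t2 are aligned with columns of t1 through the local relations
    that appear in both tables (the overlapping local relations).
    """
    shared = sorted(t1["rows"].keys() & t2["rows"].keys())
    columns = list(t1["columns"])
    rename = {}  # t2 column -> merged column
    for c2 in t2["columns"]:
        # Related attributes: a shared local relation feeds c2 and some t1
        # column c1 from the same local attribute, so they share a column.
        match = next(
            (c1 for rel in shared
                for c1, attr in t1["rows"][rel].items()
                if t2["rows"][rel].get(c2) == attr),
            None,
        )
        if match is None:
            # Unrelated attribute: it gets its own column (renamed on a clash).
            match = c2 if c2 not in columns else f"{c2}_{len(columns)}"
            columns.append(match)
        rename[c2] = match
    rows = {rel: dict(cells) for rel, cells in t1["rows"].items()}
    for rel, cells in t2["rows"].items():
        merged = rows.setdefault(rel, {})
        for c2, attr in cells.items():
            merged[rename[c2]] = attr
    return {"columns": columns, "rows": rows}


m1 = {"columns": ["title", "venue"],
      "rows": {"UW.paper": {"title": "title", "venue": "venue"},
               "UBC.paper": {"title": "title", "venue": "venue"}}}
m2 = {"columns": ["title", "author"],
      "rows": {"UBC.paper": {"title": "title", "author": "author"},
               "UT.paper": {"title": "title", "author": "author"}}}
m3 = merge_tables(m1, m2)
print(m3["columns"])  # ['title', 'venue', 'author']
```

Because UBC.paper appears in both tables, the merged table now relates UW.paper and UT.paper through the shared `title` column, which is indirect mapping information that neither input mapping states directly.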
Merge Two MappingTables (cont.)
[Figure: M3, the result of merging M1 and M2]
Compute GLAV Mappings for Each Local Peer
Query Reformulation
- Queries are reformulated in both directions:
  - Q over E → Q' over M
  - Q' over M → Q over E
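Both reformulation directions can be sketched with a hypothetical merged MappingTable for the Paper concept (mediated column → local attribute, per local relation). This is a simplified illustration assuming queries that only project attributes of one local relation; `PAPER_TABLE`, `to_mediated`, `to_local`, and `reformulate` are hypothetical names, not MePSys functions.

```python
# Hypothetical merged MappingTable for the Paper concept.
PAPER_TABLE = {
    "UW.paper": {"title": "title", "venue": "venue"},
    "UBC.paper": {"title": "title", "venue": "venue", "author": "author"},
    "UT.paper": {"title": "title", "author": "author"},
}


def to_mediated(relation, attrs, table):
    """Q over E -> Q' over M: local attributes become mediated columns."""
    inverse = {attr: col for col, attr in table[relation].items()}
    if any(a not in inverse for a in attrs):
        return None  # some requested attribute is not mapped into M
    return [inverse[a] for a in attrs]


def to_local(columns, relation, table):
    """Q' over M -> Q over E: mediated columns become local attributes."""
    cells = table[relation]
    if any(c not in cells for c in columns):
        return None  # this peer cannot supply every requested column
    return [cells[c] for c in columns]


def reformulate(source, attrs, target, table):
    """Chain both directions: query over one peer -> M -> another peer."""
    cols = to_mediated(source, attrs, table)
    return None if cols is None else to_local(cols, target, table)


print(reformulate("UW.paper", ["title"], "UT.paper", PAPER_TABLE))  # ['title']
print(reformulate("UW.paper", ["venue"], "UT.paper", PAPER_TABLE))  # None
print(to_local(["title", "author"], "UBC.paper", PAPER_TABLE))      # ['title', 'author']
```

The last call shows a query posed directly over M asking for `author`, an attribute the UW schema lacks, which a query over the UW local schema alone could not express.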
Information that each peer maintains in the system set-up phase
- Each peer E stores:
  - E's local database schema
  - A list of mappings between E and its acquaintances
  - The current version of the mediated schema M
  - The set of MappingTables corresponding to M
  - GLAV mappings from M to E
Adding a Peer to the Network
- After the system set-up phase, some peers build applications over M
- When a new peer joins, M changes; how should the already-built applications be handled?
  - Keep the transformation information so that old applications remain usable

[Figure: (a) the network right after the system set-up phase; (b) the network some time later, after peer D joins]
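Keeping old applications usable can be sketched as a chain of column maps between successive versions of the mediated schema, composed to translate a column that an old application refers to. This is a sketch under assumptions: the version history, the column names (including the `Paper.conf` rename), and `translate` are all hypothetical, and storing one map per version step rather than a direct map to the newest version is a choice made here for brevity.

```python
# Hypothetical history: VERSION_MAPS[i] maps columns of mediated-schema
# version i to columns of version i + 1. A column missing from a map was
# dropped in the newer version.
VERSION_MAPS = [
    {"Paper.title": "Paper.title", "Paper.venue": "Paper.venue"},  # v0 -> v1
    {"Paper.title": "Paper.title", "Paper.venue": "Paper.conf"},   # v1 -> v2
]


def translate(column, from_version, maps):
    """Translate a column used by an application built on an old version
    into the corresponding column of the current mediated schema."""
    for step in maps[from_version:]:
        if column not in step:
            return None  # the column no longer exists in the newer schema
        column = step[column]
    return column


print(translate("Paper.venue", 0, VERSION_MAPS))  # Paper.conf
print(translate("Paper.pages", 0, VERSION_MAPS))  # None
```

An application built on the current version passes `from_version=len(VERSION_MAPS)`, and its columns come back unchanged.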
Dropping a Peer from the Network
- Strategy One: a peer leaving the network triggers schema mediation from the very beginning
  - BAD: too much system work is spent on schema mediation alone
- Strategy Two: redo schema mediation once every assigned period
  - Two ways to know that X is leaving:
    1. X notifies another node before departing
    2. Other peers ping or otherwise communicate with X
  - BAD: the previously created mediated schema becomes useless
- Strategy Three:
  - X leaves without notifying others
  - X's acquaintance Y recognizes that X has left
  - Y computes the new mediated schema
  - BAD:
    - Y needs to be able to recognize which relations in the MappingTable come from X
    - Peers can easily lose their connections to others
Dropping a Peer from the Network (cont.)
- Strategy Four: X wants to leave
  - X calculates a new mediated schema
  - X assigns each of its acquaintances another acquaintance from X's acquaintance list
  - "Removal" operator: given M and the peer X to be removed, compute the remaining part of M
  - The removed part can be relations or attributes in relations
- Good because:
  - All previously constructed applications remain available
  - All peers are still connected
  - No redundant work results: mediation does not start from the beginning
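The "Removal" operator can be sketched on a MappingTable: drop the local relations owned by X, then drop every mediated column that no remaining relation still supplies, so both relations and attributes can disappear. This is a minimal sketch assuming relation names are prefixed by the owning peer (e.g., `UT.paper`) and a hypothetical table layout; `remove_peer` is a hypothetical name, and re-linking X's acquaintances is not modeled.

```python
def remove_peer(table, peer):
    """Return the part of a MappingTable that remains after `peer` leaves."""
    # Drop the local relations that belong to the departing peer.
    rows = {rel: cells for rel, cells in table["rows"].items()
            if rel.split(".", 1)[0] != peer}
    # Keep only the mediated columns that some remaining relation supplies.
    covered = {col for cells in rows.values() for col in cells}
    columns = [col for col in table["columns"] if col in covered]
    return {"columns": columns, "rows": rows}


# Hypothetical merged table for the Paper concept.
m3 = {"columns": ["title", "venue", "author"],
      "rows": {"UW.paper": {"title": "title", "venue": "venue"},
               "UBC.paper": {"title": "title", "venue": "venue", "author": "author"},
               "UT.paper": {"title": "title", "author": "author"}}}

without_ut = remove_peer(m3, "UT")
print(without_ut["columns"])  # ['title', 'venue', 'author']
only_uw = remove_peer(without_ut, "UBC")
print(only_uw["columns"])     # ['title', 'venue']
```

Removing UT drops only a relation, because UBC still supplies `author`; removing UBC afterwards also drops the `author` attribute.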
Information that each peer maintains in the system steady state
- Each peer stores the following information:
  - Its local schema
  - Mappings to its acquaintances
  - The current mediated schema, its MappingTables, and mappings to the peer's own schema
  - Previous versions of the mediated schema on which the local peer has built applications, and mappings from them to the new mediated schema
A Study of Mapping Composition
- MePSys only considers input mappings that:
  - express the same concept
  - do not involve complicating factors such as self-join and self-restrictive components
- Our approach transforms the problem of mapping composition into another problem: using the mediated schema to relate different schemas
Some facts
- [Madhavan and Halevy] The number of composed mappings does not depend on the number of input mappings
- [Madhavan and Halevy] The composition of finite mappings may result in an infinite set of composed mappings
- [Fagin et al.] The composition of two mappings expressed in first-order logic might not be expressible in first-order logic
Analysis for the Study
- We compared Piazza, the SO logic algorithm, and MePSys
- Whether the Piazza method is expressive or not depends entirely on whether the existential attributes in the second schema are mapped to the third schema
- The second-order logic mapping composition algorithm can handle cases with composed non-identical self-join components
  - However, its results are hard to understand
- MePSys does not handle patterns with self-restrictive components
  - Mappings in such patterns do not support concepts
- MePSys cannot yet mediate schemas when the mappings contain composed non-identical self-join components
- Aside from these two special groups of patterns, using the mediated schema to relate different sources is decidable.
System Settings
- FreePastry
  - A P2P network layer with an efficient routing strategy
  - Each node maintains a routing table
  - Each node keeps track of its immediate neighbors
  - Notifies applications of message arrival, node failures, etc.
- Emulab
  - Network emulation testbed
  - Provides access to different machines to emulate nodes in a real network
  - 900 MB of memory with a 2992.787 MHz processor
- Input schemas and mappings
  - Input schemas follow the TPC-H standard
  - Average number of acquaintances per peer
  - Average number of relations per peer schema
  - Average number of attributes per relation
Experiment 1: Schema Mediation in MePSys
Experiment 2: Query Reformulation
- For queries of similar size (less than 1 k), the reformulation time is predictable
Experiment 2: Query Reformulation (cont.)
- In the maximum case, performing query reformulation 10 times takes only 2% of the total time
Experiment 3: Updating the Mediated Schema
- Computing a new mediated schema always takes less than 2% of the total time
- Updating takes almost no time
Our contributions
- MePSys, in which a mediated schema is created dynamically and any information in the network can be queried without additional global services
- An efficient algorithm, PSM, that creates a mediated schema in a PDMS and then creates mappings to the local sources
- The idea of automatically detecting specific concepts in mappings
- A study of how mapping composition impacts query reformulation with existing approaches
- A solution to the problem of updating the mediated schema
- Experiments on the efficiency and scalability of MePSys
Future Work
- Explore the semantic issues that arise when a broader range of mappings is considered, i.e., mappings with self-joins, mappings with different IDB names, etc.
- Consider more optimization issues in the future system
- Design a better approach to updating the mediated schema as local schemas evolve
Acknowledgement
Thank you! Questions?