Data Exchange with DataMetadata Translations MAD Algorithm Paolo
- Slides: 40
Data Exchange with Data-Metadata Translations MAD Algorithm Paolo Papotti Mauricio A. Hernández Wang-Chiew Tan
Data Exchange “Scientia potentia est” • What is Data Exchange? : • The process of taking data built under a source schema and transforming it into data built under a target schema • Data Exchange is the restructuring of data
Data Exchange – why? 1. Today when companies merge they also merge information sources.
Data Exchange – why? 2. When several institutions are working on a joint venture – a combined database is
Data Exchange – why? 3. Refreshing and updating data base scheme
Few problems with data exchange 1. The labels in the Source Schema and the values Target Schema could be very different 2. Data could be kept in a plethora of ways For instance: Car price could be stored in Shekels and in U. S dollars 3. Data could be lost in the exchange process if the Source Schema and Target Schema don’t correspond well
Data Exchange v In the past Data Exchange was done manually, taking many resources such as time and money. v Many researchers struggle with ways of improving data exchange
Antique Car Dealership Schema Buy-A-Wreck Car Model price Agent-id Vauxhall 14 360, 000 48 Ford 430, 000 66 Model T Id Name 48 66 Car cars model Comm ission Nigel Dodds Vauxhall 14 0. 03 Ian Paisley T 0. 04 Ford Car AGENTS Schema Clunkers –R-Us Location List-price Automobil Seniorit e y Agentname Belfast, NR 650000 Morris 8 Gerry Adams Newry, NR 500000 Bentley Mark V 2 1 Martin Mc. Guiness Clunker table
Matching Examples Schema Buy-A-Wreck Name Nigel Dodds Ian Paisley Schema Clunkers –R-Us Agent- name Nigel Dodds Ian Paisley
Matching Examples Schema Buy-A-Wreck Car model Vauxhall 14 Ford T Schema Clunkers –R-Us Automobile Vauxhall 14 Ford T
Matching Examples Schema Buy-A-Wreck Car type price Agent-id Vauxh 14 all 360, 000 48 Model 430, 000 T 66 Ford Id Commission 48 0. 03 66 0. 04 cars Car AGENTS Schema Clunkers –R-Us List-price Car model 370800 Vauxhall 14 447200 Ford Model T
How do we match? • Creating mappings: 1. schema matching: find matches 2. create query expressions: for automated data translation or exchange Schema Matching Create Query expressions
Data Exchange 1. There may be no way to transform an instance given all of our constraints. 2. There may be numerous ways to transform the instance (possibly infinitely many). 3. We must identify and justify a best suited choice of solutions for our need.
Data Exchange - Summery To conclude: 1. Data exchange is exchanging data from a Source Schema to a Target Schema 2. It is a greatly dealt problem in the computerized world 3. Some Data exchange scenarios deal with Metadata Source schema S S Target schema T T
What is Metadata? • Metadata: Data on Data. Metadata can come as: Video Audio Image Text
Why Do we need Meta – Data? Meta-Data helps us to understand data Can anyone tell what these numbers mean? Jan 120 223 89 Feb 83 168 56
Why Do we need Meta – Data? After adding Meta-Data… Umbrella Sales Month USA UK Italy Jan 120 223 89 Feb 83 168 56
Why Do we need Meta – Data? We all know this picture…
Why Do we need Meta – Data? What is this picture all about?
Why Do we need Meta – Data? Sir Edward Carson signing the Ulster Covenant
Why Do we need Meta – Data?
Why Do we need Meta – Data? Wall Street, New York City, New York.
Data-Metadata Translations 23 • Data exchange scenarios may involve metadata transformations. • Transforming the data in the Stock Ticker table to metadata in the Stock Quotes table is vital in the stock exchange world.
Data-Metadata Translations • • Mapping systems support Data-to-Data transformations with fixed schemas (Clio). Goal: Extend mapping systems to support Data-Metadata Translations.
Data Exchange Clio • One software developed for simple graphic data exchange is “Clio” • Clio corresponded values between the source scheme and the target scheme • However, the Clio solution did not provide answers for possible data exchange scenarios that involve Metadata • the solution involving Metadata is based on Clio
Clio interface
Metadata-to-Data How can we transform the following source data into the corresponding target? Source. Sales month USA UK Italy Jan 120 223 89 Feb 83 168 56 Target. Sales month country units Jan Jan Feb Feb USA UK Italy 120 223 89 83 168 56 Schema mapping m 1 Source: Rcd Target: Rcd Sales: Set. Of Rcd month “USA” USA country UK units Italy 27 m 1: for $s in Source. Sales exists $t in Target. Sales where $t. month = $s. month and $t. country = “USA” and $t. units = $s. USA
Metadata-to-Data How can we transform the following source data into the corresponding target? Source. Sales month USA UK Italy Jan 120 223 89 Feb 83 168 56 Target. Sales month country units Jan Jan Feb Feb USA UK Italy 120 223 89 83 168 56 Schema mapping m 2 Source: Rcd Sales: Set. Of Rcd month “UK” USA UK Italy 28 Target: Rcd Sales: Set. Of Rcd month country units m 1: for $s in Source. Sales exists $t in Target. Sales where $t. month = $s. month and $t. country = “USA” and $t. units = $s. USA m 2: for $s in Source. Sales exists $t in Target. Sales where $t. month = $s. month and $t. country = “UK” and $t. units = $s. UK
Metadata-to-Data How can we transform the following source data into the corresponding target? Source. Sales month USA UK Italy Jan 120 223 89 Feb 83 168 56 Target. Sales month country units Jan Jan Feb Feb USA UK Italy 120 223 89 83 168 56 Schema mapping m 3 Source: Rcd Sales: Set. Of Rcd month “Italy” USA UK Italy 29 Target: Rcd Sales: Set. Of Rcd month country units m 1: for $s in Source. Sales exists $t in Target. Sales where $t. month = $s. month and $t. country = “USA” and $t. units = $s. USA m 2: for $s in Source. Sales exists $t in Target. Sales where $t. month = $s. month and $t. country = “UK” and $t. units = $s. UK m 3: for $s in Source. Sales exists $t in Target. Sales where $t. month = $s. month and $t. country = “Italy” and $t. units = $s. Italy
Metadata-to-Data: Our solution Metadat. A-Data (MAD) mapping: Source: Rcd Sales: Set. Of Rcd month USA countries UK label Italy Target: Rcd Sales: Set. Of Rcd month country units Source. Sales Jan 120 223 89 Feb 83 168 56 value Select the elements to group for exists where and 30 Placeholder $s in Source. Sales, $c in {“USA”, “UK”, “Italy”} $t in Target. Sales $t. month = $s. month $t. country = $c $t. units = $s. ($c) Copy elements’ labels Target. Sales Jan USA 120 Jan UK 223 Jan Italy 89 Feb USA 83 Feb UK 168 Feb Italy 56 Copy elements’ values Set of labels (strings) Is a label value Dynamic selection of the source element
Data-to-Metadata Now we want to support the opposite operation The target schema depends on the source data We define a target template: Nested Dynamic Output Schemas (ndos) Source: Rcd Target: Rcd Stock. Ticker: Set. Of Rcd Stockquotes: Set. Of Rcd time symbols Dynamic price label value element Run-time: The dynamic element defines the target instance and the target schema. 31
Data-to-Metadata: Heterogeneous records Consider this mapping and this source instance: Source: Rcd Target: Rcd Stock. Ticker: Set. Of Rcd Stockquotes: Set. Of Rcd time symbols price label value Source Instance Stock. Ticker (time: 0900, Symbol : MSFT, Stock. Ticker (time: 0900, Symbol : IBM, Stock. Ticker (time: 0905, Symbol : MSFT, First alternative: Heterogeneous target records Computed Target Instance Stockquotes (time: 0900, MSFT: 27. 20 ) Stockquotes (time: 0900, IBM: 120. 00 ) Stockquotes (time: 0905, MSFT: 27. 30 ) Price: 27. 20 ) Price: 120. 00 ) Price: 27. 30 ) There are two possible interpretations for the target ndos: Computed Target Schema Target: Rcd Stockquotes: Set. Of Rcd time symbols: Choice MSFT IBM
Data-to-Metadata: Homogenous records Consider this mapping and this source instance: Source: Rcd Target: Rcd Stock. Ticker: Set. Of Rcd Stockquotes: Set. Of Rcd time symbols price label value Source Instance Stock. Ticker (time: 0900, Symbol : MSFT Stock. Ticker (time: 0900, Symbol : IBM Stock. Ticker (time: 0905, Symbol : MSFT Price: 27. 20 ) Price: 120. 00 ) Price: 27. 30 ) Second alternative: Homogeneous target records Computed Target Instance Stockquotes (time: 0900, MSFT: 27. 20, IBM: null ) Stockquotes (time: 0900, MSFT: null , IBM: 120. 00 ) Stockquotes (time: 0905, MSFT: 27. 30, IBM: null ) There are two possible interpretations for the target ndos: Computed Target Schema Target: Rcd Stockquotes: Set. Of Rcd time MSFT IBM
Data-to-Metadata: Homogenous records Source: Rcd Target: Rcd Stock. Ticker: Set. Of Rcd Stockquotes: Set. Of Rcd time symbols price label value Stockquotes(time: 0900, MSFT : 27. 20 ) Stockquotes(time: 0900, IBM : 120. 00 ) Stockquotes(time: 0905, MSFT : 27. 30 ) Natural solution for semistructured data models (XSD, DTD, JSON) Homogeneity Constraint: “For every pair of tuples t 1 and t 2, if a is a label in t 1, then a is a label in t 2” Stockquotes(time: 0900, MSFT : 27. 20, IBM: null ) Stockquotes(time: 0900, MSFT : null , IBM: 120. 00) Stockquotes(time: 0905, MSFT : 27. 30, IBM: null ) 34 The Homogenous approach is a MAD improvemnet
MAD Mapping Metadat. A-Data(MAD) mapping three steps: 1. Preliminary mapping Ø How do we map the Source schema to the Target schema Ø Preliminary mapping for <<D>> includes the metadata label and the value label of <<D>>.
Target. Sales Source. Sales month USA UK Italy Jan 120 223 89 Feb 83 168 56 36 { $x 1 Source. Sales. By. Countries, $x 2 <<countries>>; $x 3=$x 1. ($x 2) } Source: Rcd Sales. By. Countries: month USA UK Italy Set. Of Rcd countries label value Target: Rcd Sales: Set. Of Rcd month country units Jan Jan Feb Feb USA UK Italy 120 223 89 83 168 56 Preliminary Mapping Label Value Transfer
MAD Mapping 2. Skeletons: Ø n x m matrix of skeletons is constructed for the set of source preliminary mapping and the set of target preliminary mapping while each entry(i, j) can be potential mapping. 3. Creating MAD Mapping: Ø At this stage, the value correspondences need to be matched against the preliminary mapping in order to factor them into the appropriate skeletons. Source. Sales. country Target. Country. Sales. country Matched against one or more source mappings 37 Matched against one or more target mappings
MAD Mapping Generation Example Source : { $x 1 Source. Sales. By. Country, $x 2 <<countries>>; $x 3: =$x 1. ($x 2) } Target : { $y 1 Target. Sales} Source. Sales. By. Countries. <<countries>> Target. Sales. country Source. Sales. By. Countries. &<<countries>> Target. Sales. units Source: Rcd Target: Rcd Sales. By. Country: Set. Of Rcd Sales: Set. Of Rcd month USA country countries UK units label Italy value
MAD vs Clio Data exchange with data-metadata support: Data to Data is a special case Source schema S GUI Target schema T Declarative (internal) representation Executable code (XSLT, XSLT XQuery, Java) Java § New construct to iterate over elements’ labels: placeholder § Target schema can be incomplete: nested dynamic output schema (ndos) § New mapping & query generation algorithms
Fin. 40
- Real exchange rate vs nominal exchange rate
- Voluntary exchange
- Gas exchange key events in gas exchange
- Composée de deux translations
- Reflection rotation translation dilation
- Translations of trigonometric graphs
- Translate
- Translation vs reflection
- Translations between user and system
- Lesson 9-2 transformations
- 9-2 translations answers
- Relation de chasles
- Lesson 9-2 translations
- 9-2 translations
- Horizontal phase shift
- Reflection symmetry examples
- Translations rotations and reflections
- Translations reflections and rotations
- Translations art
- Translations of quadratic functions
- What are the properties of translations
- Lesson 9-2 transformations
- Congruence transformation
- L
- Translation et vecteur
- Translations brian friel summary
- Lesson 1 translations
- Turkish bible translations
- Back translations
- Translations of absolute value functions
- Translate the following sentences into spanish
- The politics of translation spivak summary
- Algebraic translations
- Journalistic document translations
- Journalistic translation definition
- Reflection translation rotation dilation
- Translation shapes examples
- Translation du vecteur
- How to find absolute deviation
- A* vs ao* algorithm
- Adri wessels