PhD Program in Information Engineering, XVI doctoral cycle (II cycle, New Series)
From Data to Information: the MOMIS system
Dr. Eng. Francesco Guerra
Advisor: Prof. Sonia Bergamaschi
Francesco Guerra – DBGroup@unimo

Outline
• Intelligent Integration of Information
• Matching
• The MOMIS system
  - MOMIS in the Semantic Web
  - MOMIS as the basis of a virtual marketplace
  - MOMIS to manage collaborative processes (the WINK project)
  - MOMIS as a semantic search engine (the SEWASIE project)

Intelligent Integration of Information
• Distinguishing elements:
  - Kinds of managed sources
  - The Global-as-View vs. the Local-as-View approach
  - Data model
  - Building the Global View
  - Querying the Global View
  - Description Logics techniques
  - Updating the Global View

Intelligent Integration of Information: the systems

Matching comparison
• Distinguishing elements:
  - Different kinds of mapping representation (granularity, cardinality)
  - Mapping extraction (structure and instance analysis, lexical analysis, exploitation of external tools)

Matching comparison
Extended from: E. Rahm and P. A. Bernstein. A survey of approaches to automatic schema matching. VLDB Journal, 10(4): 334-350, 2001.

The MOMIS System
• MOMIS (Mediator envirOnment for Multiple Information Sources) is a framework for information extraction and integration from both structured and semi-structured data sources.
  - For information extraction, MOMIS introduces ODLI3, an object-oriented language with an underlying Description Logic, derived from the ODMG standard.
  - Information integration is then performed semi-automatically, by exploiting the knowledge in a Common Thesaurus and the ODLI3 descriptions of the source schemas with a combination of clustering techniques and Description Logics.
• The integration process produces a virtual integrated view of the underlying sources (the Global Virtual View), for which mapping rules and integrity constraints are specified to handle heterogeneity.
• The MOMIS system, based on a conventional wrapper/mediator architecture, provides methods and open tools for data management in Internet-based information systems through a CORBA-2 interface.
• MOMIS was developed as a collaboration between the University of Modena and Reggio Emilia and the Universities of Milano and Brescia.

The MOMIS System
Distributed information is stored in multiple, heterogeneous sources.
• Source integration provides a Global Schema (a virtual view).
• The Global Schema lets the user send a single query and transparently get a unified answer from all the sources involved.
• All information is available at http://www.dbgroup.unimo.it
• Projects: INTERDATA (1999-2000); D2I (From Data to Information) (2001-2002), both "Scientific research programs of significant national interest"; WINK (Web-linked Integration of Network-based Knowledge) (2002-2003); SEWASIE (SEmantic Webs and AgentS in Integrated Economies) (2002-2005)

The MOMIS System: Architecture

[Figure: the MOMIS integration process. Phases: wrapping, Common Thesaurus generation, GVV generation. Labels: ODLI3 local schemas 1 ... N; schema-derived, lexicon-derived, user-supplied and inferred relationships feeding the Common Thesaurus; manual annotation with WordNet synsets; semi-automatic annotation; global classes; mapping tables.]

Local sources annotation
• The integration designer has to manually choose the appropriate WordNet (www.cogsci.princeton.edu/~wn/) meaning for each element of the conceptual schema provided by the wrappers.
• Motivations for the annotation:
  1. Exploiting the semantics associated with the names of the schemas and structures of the information sources
  2. Having a well-known meaning for each term of the sources
• The annotation phase consists of two steps:
  1. Word form choice. The WordNet morphological processor aids the designer by suggesting a word form corresponding to the given term.
  2. Meaning choice. The designer can map an element to zero, one or more senses. The designer can choose among the senses already in WordNet and can also add new senses to the database.

Global Virtual View annotation
• The GVV has to be annotated to become "exportable knowledge".
• Annotating a GVV means providing each Global Class with a name and a set of meanings.
• Starting from the annotations of the local sources and the mappings between the GVV and the local ontologies, we have developed a semi-automatic methodology to generate the annotations of the GVV.

GVV annotation
WordNet meanings:
  essay#1 = an analytic or interpretive literary composition
  publication#2 = a copy of a printed work offered for distribution
  article#1 = nonfictional prose forming an independent part of a publication
Annotated local classes:
  CS.Essay = <essay, {essay#1}>
  CS.Publication = <publication, {publication#2}>
  UNI.Article = <article, {article#1}>
Common Thesaurus (CT) relationships:
  UNI.Article NT CS.Publication
  CS.Essay NT CS.Publication
A global class:
  GlobalClass1 = {CS.Essay, CS.Publication, UNI.Article}
The annotated global class (name taken from the local class with the broadest meaning; meanings collected from all local classes):
  GlobalClass1 = <publication, {essay#1, publication#2, article#1}>
Broadest local classes of a global class GC:
  $BLC_{GC} = \{\, LC \in GC \mid \nexists\, y \in GC,\ (LC\ \mathrm{NT}\ y) \lor (y\ \mathrm{BT}\ LC) \,\}$
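The annotation of a global class can be sketched in a few lines of Python. This is only an illustration built on the example above, not the MOMIS implementation: the data structures and the function names `broadest_local_classes` and `annotate_global_class` are assumptions, and so is the rule of taking the first candidate when several broadest classes exist.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding of the example: local class -> (name, WordNet meanings).
annotations: Dict[str, Tuple[str, Set[str]]] = {
    "CS.Essay": ("essay", {"essay#1"}),
    "CS.Publication": ("publication", {"publication#2"}),
    "UNI.Article": ("article", {"article#1"}),
}

# Common Thesaurus NT relationships as (narrower, broader) pairs.
# "y BT LC" is the inverse reading of "LC NT y", so one set covers both.
nt_pairs: Set[Tuple[str, str]] = {
    ("UNI.Article", "CS.Publication"),
    ("CS.Essay", "CS.Publication"),
}


def broadest_local_classes(global_class: List[str],
                           nt: Set[Tuple[str, str]]) -> List[str]:
    """Members of the global class that are not narrower than another member."""
    members = set(global_class)
    return [
        lc for lc in global_class
        if not any((lc, y) in nt for y in members if y != lc)
    ]


def annotate_global_class(global_class: List[str],
                          annotations: Dict[str, Tuple[str, Set[str]]],
                          nt: Set[Tuple[str, str]]) -> Tuple[str, Set[str]]:
    """Name from a broadest local class, meanings as the union of local meanings."""
    broadest = broadest_local_classes(global_class, nt)
    # Assumption: with several candidates the designer would pick one;
    # this sketch simply takes the first.
    name = annotations[broadest[0]][0]
    meanings: Set[str] = set()
    for lc in global_class:
        meanings |= annotations[lc][1]
    return name, meanings


global_class_1 = ["CS.Essay", "CS.Publication", "UNI.Article"]
gc_name, gc_meanings = annotate_global_class(global_class_1, annotations, nt_pairs)
```

On the example data the only broadest class is CS.Publication, so the global class is named "publication" and carries all three meanings.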

Updating the GVV
An existing GVV can change:
  1) By adding a new source to the system
  2) By updating the schema of an existing data source
  3) By deleting a previously integrated source
Adding a new source: two possible scenarios
• Integration from scratch: the integration process is applied again; in this case only the Common Thesaurus of the previous GVV can be exploited.
• Integration with the GVV: the process exploits the "automatically annotated" GVV and the Common Thesaurus.

Adding a new source
[Figure: the annotated GVV and the new sources (XML, RDB, OODB) are described in ODLI3 (GVV and new sources annotated). A new Common Thesaurus collects intra/inter-schema relationships (new sources only), lexicon relationships, relationships inserted by the user and inferred relationships. Cluster generation yields the new GVV (GlobalClass1, GlobalClass2, GlobalClass3) and the global schema / local schema mapping.]

Adding a new source
• Three scenarios:
  - A global class of the new integrated schema is composed of exactly one old global class and one or more new local classes
  - A global class of the new integrated schema is composed of new local classes only
  - A global class of the new integrated schema is composed of more than one global class of the old GVV and at least one local class of the new source
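The three scenarios can be told apart by counting what a new cluster contains. The snippet below is a minimal sketch of that case analysis; the function name `classify_new_global_class` and the returned labels are hypothetical, chosen only for illustration.

```python
from typing import List


def classify_new_global_class(old_global_classes: List[str],
                              new_local_classes: List[str]) -> str:
    """Classify a cluster of the new GVV by the elements it is made of."""
    n_old = len(old_global_classes)
    n_new = len(new_local_classes)
    if n_new == 0:
        # No class of the new source is involved: outside the three scenarios.
        return "unchanged"
    if n_old == 1:
        return "extends one old global class"
    if n_old == 0:
        return "new local classes only"
    return "merges several old global classes"


examples = [
    classify_new_global_class(["GlobalClass1"], ["NEW.Book"]),
    classify_new_global_class([], ["NEW.Author"]),
    classify_new_global_class(["GlobalClass1", "GlobalClass2"], ["NEW.Paper"]),
]
```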

GVV: integrated ontology
• A GVV may be thought of as a domain ontology for the integrated sources; the usual approach in the Semantic Web instead assumes the "a priori" existence of an ontology, connected to the sources by means of semantic markup.
[Figure labels: Semantic Web, MOMIS, Ontology, Ontology Builder]

GVV: integrated ontology
• The MOMIS ontology is composed of the following components:
  - Global Virtual View
  - Mapping rules
  - Integrity constraint rules
  - Intensional and extensional inter-schema and intra-schema relationships (Common Thesaurus)
• We express the ontology in the ODLI3 language or as an OWL file.

Using the MOMIS system
• The MOMIS system has been used:
  - To create a virtual marketplace
  - To support collaborative processes within the European WINK project
  - To build an advanced semantic search engine within the European SEWASIE project (under development)

SEWASIE
• SEWASIE (SEmantic Webs and AgentS in Integrated Economies) is a research project funded by the EU on the Semantic Web action line (May 2002 to April 2005).
• The consortium:
  - Università degli Studi di Modena e Reggio Emilia (Italy)
  - CNA SERVIZI Modena s.c.a.r.l. (Italy)
  - Università degli Studi di Roma "La Sapienza" (Italy)
  - Rheinisch Westfaelische Technische Hochschule Aachen (Germany)
  - Libera Università di Bolzano (Italy)
  - Thinking Networks AG (Germany)
  - IBM Italia SPA (Italy)
  - Fraunhofer-Gesellschaft Institut Angewandte Informationstechnik (Germany)

SEWASIE Objectives
The SEWASIE project aims to develop an advanced search engine enabling intelligent access to heterogeneous data sources on the web, via semantic enrichment, to provide the basis for structured web-based communication. The project pursues the following aims:
• To develop an agent-based, secure, scalable and distributed system architecture for semantic search (based on ontologies) and for structured web-based communication.
• To develop a general framework for query management and information reconciliation based on semantically enriched data and a trusted agent structure.
• To develop an information brokering component which includes methods for collecting, contextualizing and visualizing data.
• To provide the end user with an efficient interface for formulating queries through a graphical representation and for intelligent navigation through the semantically enriched information space.

The SEWASIE architecture
• The SEWASIE system realizes a virtual network, the SEWASIE Virtual Network (SVN), whose nodes are SEWASIE Information Nodes (SINodes): multi-database, mediator-based systems, each including a Virtual Data Store, an Ontology Builder and a Query Manager.
• Brokering Agents maintain the knowledge related to the SEWASIE Virtual Network and the user profiles.
• In the query solving phase, starting from a specified SINode, a Query Agent accesses other SINodes and thus collects partial answers.
• To select the SINodes useful for solving a query, a Query Agent interacts with one or more Brokering Agents.

The SEWASIE architecture
[Figure: three layers. User interface layer: users, User Profile, Monitor Profiles, User Interface, Communication Agent, Communication Tool, Communication Interface, Query Interface, Metadata Interface, OLAP Reports, Visualisation, Monitoring. Intermediaries layer: Monitoring Agent (MA), Brokering Agents (BA) with ontology maps, Query Agent, query results, SEWASIE interconnection infrastructure. Information layer: SINodes with Virtual Data Store, Query Manager, Metadata Repository, Ontology and Ontology Builder; wrappers with semantic enrichment over structured databases (RDBs), semi-structured databases (XML) and unstructured text documents (HTML, wrapped HTML→XML); OLAP Tool.]

Future Work
• Ontology evolution within an SINode
  - Update of existing sources
  - Deletion of previously integrated sources
• Extending WordNet
  - If an element of a source description has no corresponding concept in WordNet, the designer may add a new meaning and the relationships connecting it to existing meanings.
• Multilingual functionalities
  - SEWASIE multilingual technologies will allow users to share information and resources available all over the world, while preserving their original local qualities.
  - Enrichment of the multilingual lexicon ontology with the aid of statistical analysis techniques for multilingual text corpora (for example, techniques for the generation of multilingual dictionaries).

Participation in national and European research projects
• D2I (From Data to Information) project, funded by MIUR: "Scientific research program of significant national interest" (2000-2001)
• "Software agents and electronic commerce: legal, technological and psycho-social profiles" project, funded by MIUR: "Scientific research program of significant national interest" (2001-2002)
• "Technologies for enriching and providing access to content" project, funded by the Fondo Speciale Innovazione 2000 (2001-2002)
• SEWASIE (SEmantic Web and AgentS in Integrated Economies) project, funded by the European Community (2002-2005)
• WINK (Web-linked Integration of Network-based Knowledge) project, funded by the European Community (EUTIST-AMI cluster) (2002-2003)


Global Instance Computation
• To define a Global Class we have to define the following elements:
  - Mapping table: defines the mapping between the attributes of the global class and the attributes of the local classes
  - Join condition: we assume that a join condition exists between each pair of overlapping relations, to identify the tuples corresponding to the same object and fuse them
  - Full disjunction: the global class contains a single tuple for each real-world object, resulting from the merge of all the different tuples that represent it

Global Instance Computation
S(l1) = (firstn, lastn, year, e_mail)
S(l2) = (name, e_mail, dept_code, s_code)
• Two functions:
  - Global function: renames the attributes of the local classes into attributes of the global class
  - Local function: converts a tuple of elements of a local class by means of suitable functions, such as string concatenation
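As a minimal sketch of how the two functions can work together, the snippet below maps rows of the local classes l1 and l2 onto a global class. The global attribute set (name, year, e_mail, dept_code, s_code), the dictionary layout of the mapping table, and the names `mapping_l1`, `mapping_l2` and `to_global` are assumptions made for illustration; they are not the MOMIS data structures.

```python
from typing import Callable, Dict, Tuple

Row = Dict[str, str]
# Hypothetical mapping entry: (local attributes, conversion function).
MappingEntry = Tuple[Tuple[str, ...], Callable[..., str]]


def identity(value: str) -> str:
    """Conversion used when a local attribute is only renamed."""
    return value


def concat_name(first: str, last: str) -> str:
    """Local function: builds one value from two local attributes."""
    return f"{first} {last}"


# Global attribute -> how to obtain it from S(l1) = (firstn, lastn, year, e_mail).
mapping_l1: Dict[str, MappingEntry] = {
    "name": (("firstn", "lastn"), concat_name),
    "year": (("year",), identity),
    "e_mail": (("e_mail",), identity),
}

# Global attribute -> how to obtain it from S(l2) = (name, e_mail, dept_code, s_code).
mapping_l2: Dict[str, MappingEntry] = {
    "name": (("name",), identity),
    "e_mail": (("e_mail",), identity),
    "dept_code": (("dept_code",), identity),
    "s_code": (("s_code",), identity),
}


def to_global(row: Row, mapping: Dict[str, MappingEntry]) -> Row:
    """Rewrite a local row in terms of the global class attributes."""
    return {
        global_attr: convert(*(row[attr] for attr in local_attrs))
        for global_attr, (local_attrs, convert) in mapping.items()
    }


l1_row = {"firstn": "Ada", "lastn": "Rossi", "year": "2", "e_mail": "ada@example.org"}
global_row = to_global(l1_row, mapping_l1)
```

Here `global_row` holds the name "Ada Rossi" obtained by concatenation, while year and e_mail are carried over unchanged.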

Global Instance Computation
• Semantic Homogeneity property condition
[Figure labels: Join Attribute, Full Disjunction]

Global Instance Computation
• Semantic Homogeneity property condition not verified:
  - Resolution functions:
    · Random
    · Priority
    · User-defined function
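The following sketch shows how a resolution function can settle conflicting values while tuples are merged. It is a simplification under stated assumptions: only two sources are handled (where full disjunction reduces to a full outer join), the join attribute is assumed unique within each source, rows are assumed already expressed in global attributes, and every name (`full_outer_merge`, `prefer_first`, `pick_random`) is hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[str]]
Resolver = Callable[[str, Optional[str], Optional[str]], Optional[str]]


def prefer_first(attribute: str, left: Optional[str],
                 right: Optional[str]) -> Optional[str]:
    """Priority resolution: the first source wins whenever it has a value."""
    return left if left is not None else right


def pick_random(attribute: str, left: Optional[str],
                right: Optional[str]) -> Optional[str]:
    """Random resolution: choose among the values that are present."""
    candidates = [value for value in (left, right) if value is not None]
    return random.choice(candidates) if candidates else None


def full_outer_merge(left_rows: List[Row], right_rows: List[Row],
                     join_attribute: str, global_attributes: List[str],
                     resolve: Resolver = prefer_first) -> List[Row]:
    """Fuse rows describing the same object and keep the unmatched ones."""
    right_by_key = {row[join_attribute]: row for row in right_rows}
    seen = set()
    result: List[Row] = []
    for left in left_rows:
        key = left[join_attribute]
        right = right_by_key.get(key, {})
        seen.add(key)
        result.append({a: resolve(a, left.get(a), right.get(a))
                       for a in global_attributes})
    for key, right in right_by_key.items():
        if key not in seen:
            result.append({a: right.get(a) for a in global_attributes})
    return result


source_1 = [{"name": "Ada Rossi", "year": "2", "e_mail": "ada@example.org"}]
source_2 = [
    {"name": "A. Rossi", "e_mail": "ada@example.org", "dept_code": "D1", "s_code": "S1"},
    {"name": "Bo Neri", "e_mail": "bo@example.org", "dept_code": "D2", "s_code": "S2"},
]
attributes = ["name", "year", "e_mail", "dept_code", "s_code"]
merged = full_outer_merge(source_1, source_2, "e_mail", attributes)
```

With the priority resolver the conflicting name is taken from the first source, and the person known only to the second source is kept with a missing year. A user-defined function fits the same `Resolver` signature.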

Example
University source (relational):
  Department(dept_code, dept_name, budget)
  Research_Staff(name, e_mail, dept_code, s_code) FK dept_code REF Department, s_code REF Section
  School_Member(name, school, year, e_mail)
  Section(s_code, section_name, length, room_code) FK room_code REF Department, s_code REF
  Room(room_code, seats_number, notes)
Tax_Position source (XML):
  <!ELEMENT ListOfStudent (Student*)>
  <!ELEMENT Student (name, s_code, school_name, e_mail, tax_fee)>
  <!ELEMENT name (#PCDATA)>

Example
Computer_Science source (object):
  CS_Person(first_name, last_name)
  Professor: CS_Person(belongs_to: Division, rank)
  Student: CS_Person(year, takes: set<Course>, rank, e_mail)
  Division(description, address: Location)
  Location(city, street, number, country)
  Course(course_name, tought_by: Professor)

Source Acquisition Module

Common Thesaurus (Domain Ontology)
A set of terminological relationships between class and attribute names (terms) that expresses both intra-schema and inter-schema knowledge.
Relationships added to the Common Thesaurus:
  (1) schema-derived
  (2) lexicon-derived
  (3) designer-supplied
  (4) inferred, exploiting ODB-Tools capabilities

Schema-derived relationships
Terminological and extensional intra-schema relationships
• RT relationships derived from foreign keys in a relational schema
    UNI.Section RT UNI.Department
• BT/NT relationships derived from inheritance relationships in an object-oriented schema or from integrity constraints in a relational schema
    CS.Student NT CS.CS_Person
    CS.Professor NT CS.CS_Person
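A small sketch of how such relationships could be derived from schema metadata. The input dictionaries and the function names are hypothetical and cover only the two cases named on this slide (foreign keys and inheritance); they do not reproduce the MOMIS extraction code.

```python
from typing import Dict, List, Set, Tuple

Relationship = Tuple[str, str, str]  # (term, kind, term)


def relationships_from_foreign_keys(
        schema: str, foreign_keys: Dict[str, List[str]]) -> Set[Relationship]:
    """One RT relationship per foreign key: referencing class RT referenced class."""
    return {
        (f"{schema}.{table}", "RT", f"{schema}.{target}")
        for table, targets in foreign_keys.items()
        for target in targets
    }


def relationships_from_inheritance(
        schema: str, superclasses: Dict[str, List[str]]) -> Set[Relationship]:
    """NT from subclass to superclass, plus the inverse BT relationship."""
    relationships: Set[Relationship] = set()
    for cls, parents in superclasses.items():
        for parent in parents:
            relationships.add((f"{schema}.{cls}", "NT", f"{schema}.{parent}"))
            relationships.add((f"{schema}.{parent}", "BT", f"{schema}.{cls}"))
    return relationships


# Inputs mirroring the relationships listed on the slide.
uni_relationships = relationships_from_foreign_keys(
    "UNI", {"Section": ["Department"]})
cs_relationships = relationships_from_inheritance(
    "CS", {"Student": ["CS_Person"], "Professor": ["CS_Person"]})
```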

Schema Derived Relationships

Lexicon-derived relationships
Extracted from the WordNet lexical database (Princeton University): 129,625 lemmas organized in 99,759 synonym sets (synsets).
• Synonymy
• Polysemy
Examples:
  Tax_position_xml.Student.name SYN University.School_member.name
  CS.Professor NT CS.CS_Person
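One way to picture the synonymy case: two annotated terms that share at least one sense give rise to a SYN relationship. The sketch below assumes the terms have already been annotated; the sense identifiers `name#1` and `budget#1` are made up for the example, and NT/BT relationships (which would need WordNet hypernym links) are not covered.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical annotations: term -> chosen senses (identifiers are illustrative).
annotated_terms: Dict[str, Set[str]] = {
    "Tax_position_xml.Student.name": {"name#1"},
    "University.School_member.name": {"name#1"},
    "University.Department.budget": {"budget#1"},
}


def synonym_relationships(
        annotated: Dict[str, Set[str]]) -> List[Tuple[str, str, str]]:
    """Emit 'a SYN b' for every pair of terms sharing at least one sense."""
    relationships = []
    for term_a, term_b in combinations(sorted(annotated), 2):
        if annotated[term_a] & annotated[term_b]:
            relationships.append((term_a, "SYN", term_b))
    return relationships


syn_relationships = synonym_relationships(annotated_terms)
```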

Lexicon Derived Relationships

Inferred relationships
By exploiting Description Logics techniques (the ODB-Tools system), a new set of terminological relationships is inferred, for example:
  University.Research_Staff RT CS.Course

Common Thesaurus

Mediator global schema
Global schema generation (interaction with the ARTEMIS module):
• Affinity calculation
• Cluster generation
• Global attributes and mapping table generation
A global class gc_i is generated for each cluster Cl_i.
SI-Designer builds the set of attributes to be associated with the cluster:
  - Union of the attributes of all the classes belonging to the cluster
  - Fusion of "similar attributes"
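The union-and-fusion step can be sketched with a small union-find structure. This is an illustration under explicit assumptions, not the SI-Designer algorithm: attributes with the same name are treated as similar, further similar pairs are supplied by hand (in MOMIS they would come from the Common Thesaurus), the global attribute takes the name of the first local attribute in its group, and `build_global_class` is a hypothetical name.

```python
from typing import Dict, List, Tuple

QualifiedAttr = Tuple[str, str]  # (local class, attribute)


def build_global_class(
        cluster: Dict[str, List[str]],
        similar_pairs: List[Tuple[QualifiedAttr, QualifiedAttr]],
) -> Dict[str, Dict[str, str]]:
    """Return a mapping table: global attribute -> {local class: local attribute}."""
    parent: Dict[QualifiedAttr, QualifiedAttr] = {
        (cls, attr): (cls, attr)
        for cls, attrs in cluster.items() for attr in attrs
    }

    def find(item: QualifiedAttr) -> QualifiedAttr:
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # path compression
            item = parent[item]
        return item

    def union(first: QualifiedAttr, second: QualifiedAttr) -> None:
        root_first, root_second = find(first), find(second)
        if root_first != root_second:
            parent[root_second] = root_first

    # Assumption: attributes sharing a name are fused automatically.
    first_with_name: Dict[str, QualifiedAttr] = {}
    for qualified in list(parent):
        name = qualified[1]
        if name in first_with_name:
            union(first_with_name[name], qualified)
        else:
            first_with_name[name] = qualified

    for first, second in similar_pairs:
        union(first, second)

    groups: Dict[QualifiedAttr, List[QualifiedAttr]] = {}
    for qualified in list(parent):  # declaration order
        groups.setdefault(find(qualified), []).append(qualified)

    return {
        members[0][1]: {cls: attr for cls, attr in members}
        for members in groups.values()
    }


cluster = {
    "University.School_Member": ["name", "school", "year", "e_mail"],
    "Tax_Position.Student": ["name", "s_code", "school_name", "e_mail", "tax_fee"],
}
similar = [(("University.School_Member", "school"),
            ("Tax_Position.Student", "school_name"))]
mapping_table = build_global_class(cluster, similar)
```

The result contains six global attributes; `school` maps to `school` in the University source and to `school_name` in the Tax_Position source, while `year` and `tax_fee` each map to a single local class.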

Affinity tree and Cluster

Mapping table example
  - Each global class includes mapping rules between global and local attributes (and/or relationships, default/null values)
  - A mapping table is generated for each global class gc_i

Mapping table