Ontology Resource x Matching Mapping x Schema Instance

{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Components of the same challenge?
Invited Talk, International Workshop on Ontology Matching, collocated with the 5th International Semantic Web Conference (ISWC-2006), November 5, 2006, Athens, GA
Professor Amit Sheth
Special Thanks: Meena Nagarajan
Acknowledgment: SemDis project, funded by NSF

Information System needs and Ontology Matching goals
[Figure: three generations of information systems and the level each one works at.]
• Generation III (information brokering), 1997...: Semantics (Ontology, Context, Relationships, KB)
  – SemDis, ISIS, VideoAnywhere, Semantic Web, some DL-II projects, InfoQuilt, Semagix SCORE, Applied Semantics, OBSERVER
• Generation II (mediators), 1990s: Metadata (Domain model)
  – VisualHarness, InfoHarness, InfoSleuth, KMed, DL-I projects, Infoscopes, HERMES, SIMS, Garlic, TSIMMIS, Harvest, RUFUS, ...
• Generation I (federated DB / multidatabases), 1980s: Data (Schema, “semantic data modeling”)
  – Mermaid, DDTS, Multibase, MRDSM, ADDS, IISS, Omnibase, ...

Information systems - From mediators to information brokering
• Mediators between heterogeneous information sources
  – InfoHarness, VisualHarness, InfoSleuth, SIMS, Garlic etc. Circa 1992-1996.
[Figure: VisualHarness architecture. End users' Web browsers and IH clients connect over the Internet to the IH server and IH administrative tools, which use a metadata database (Metabase, Oracle) and repositories 1..m built over the raw data of the information resources: image, audio, text, video.]

Information systems - From mediators to information brokers
• Information brokers
  – InfoQuilt, OBSERVER etc. Circa 1996-2000.
[Figure: information brokering. Information consumers (corporations, people, programs, universities, government) issue user queries; the information brokering layer uses domain specific ontologies; information providers (information systems, data repositories, newswires, corporations, research labs, universities) supply the data.]

Need for querying across multiple ontologies
OBSERVER, circa 1994, 1996-2002
[Figure: OBSERVER architecture. The user's query processor works with the IRM (interontologies relationships) and with several nodes, each consisting of a query processor, ontologies, a mappings/ontology server, and repositories.]

Ontology Matching – goals
• Goals of ontology matching (and mapping, or integration)
  – Shallow analysis to identify dependencies for integration
  – Deeper analysis to create mappings for query based transformations / integration
  – Integrate schemas to create a global schema
  – Integrate instance bases
Sheth, Review of a real world experience in database schema integration (Bellcore, ca. 1993)

Ontology Matching – changing notions
• Given the distributed nature of modeling domains and metadata, the need for matching advanced to Information Integration
• Now
  – Query processing is no longer limited to multiple databases or ontologies; it spans multiple domains and sources of information
  – Exploiting structured, semi-structured and unstructured data sources, multi-model Web sources

The process of Ontology Matching
• Different for purposes of merging / aligning ontologies
  – The relationships that need to be discovered are limited to equivalence / inclusion / disjointness / overlap mappings
• Different for purposes ranging from information integration to analytics to discovery
  – Need for discovering more complex mappings
    • Named relationships / associations
    • Graph based / numerical mappings

Top down and bottom up view to ontology matching
• Top down: schema + instance integration to provide information integration
• Bottom up: exploit external data sources to drive schema matching

A step back DB vs. Ontology - Fundamental differences

Schema integration goals – DB vs. Ontology
• DB schema integration goal
  – “Defining an integrated view of the data for all applications using the data.”
• Ontology schema integration goal
  – “Defining an agreement between multiple ontology schemas modeled for the same domain.”

Goals are different because of differences in:
• The modeling paradigms
  – A database schema is a model for the data that one or more applications intend to use.
  – An ontology is a model of knowledge for a bounded region of interest (also known as a domain).
• Data vs. knowledge: a DB instance base is not the same as an ontology instance base
  – A database models data to be used by one or more applications
  – An ontology models knowledge about a domain, independent of the application

Modeling Database vs. Ontology schemas - Fundamental differences
Axis of comparison: Database schemas vs. Ontology schemas
• Modeling perspective
  – Database schemas: intended to model data being used by one or more applications
  – Ontology schemas: intended to model a domain
• Structure vs. semantics
  – Database schemas: emphasis while modeling is on the structure of the tables
  – Ontology schemas: emphasis while modeling is on the semantics of the domain – emphasis on relationships, also facts/knowledge/ground truth

• Agreement
  – Database schemas: limited to a syntactic agreement between applications using the data
  – Ontology schemas: symbolizes agreement on the modeling of a domain, possibly used by applications in varying contexts
• Instance level modeling / expressiveness of the metadata
  – Database schemas: limited expressivity in capturing instance level metadata, due to static schemas
  – Ontology schemas: more expressive instance level modeling
• Context of modeling
  – Database schemas: well defined by the applications using the data
  – Ontology schemas: modeling of a domain irrespective of applications
In both cases, however, the schema is only an abstraction of the real world; the real power/semantics lies at the instance level.
The choice of modeling paradigm affects the possible space of heterogeneities and therefore the process of matching.

The space of heterogeneities in DB schema integration
• Conflicts/heterogeneities in DB schema integration
  – Model / representation: relational vs. network vs. hierarchical models
  – Structural / schematic:
    • Domain incompatibilities
    • Entity definition incompatibilities
    • Data value incompatibilities
    • Abstraction level incompatibilities
• Largely syntactic and structural; relatively few semantic conflicts
(Sheth/Kashyap 1992, Kim/Seo 1993, Kashyap/Sheth 1996)

The space of heterogeneities in ontology schema integration
• Conflicts/heterogeneities in ontology schema integration
  – Significant conflicts in perception of a domain – semantic conflicts
  – Other heterogeneities are similar to those in the DB world
    • Model / representation: OWL/RDF; topic maps etc.
    • Structural: modeling as an entity vs. an attribute/property; generalization vs. abstraction etc.
• Largely semantic conflicts; comparable syntactic conflicts

Key Observations
• There are significant philosophical differences in how a DB schema and an ontology schema are modeled
• In spite of these distinctions, many schema matching techniques overlap significantly.
• Have we advanced the state of the art in ontology schema matching?

Schema Integration – DB vs. Ontology Have we advanced the state of art ?

Schema Integration – techniques used
Schema matching techniques: schema level
• Syntactic
  – Linguistic: matching names, descriptions, namespaces etc.
  – Constraint-based: constraint matches on data types, value ranges, uniqueness, cardinalities etc.
Information exploited
• DB: matching table and column level names and constraints
• Ontology: matching class, property/relationship, and attribute level names and constraints
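
The linguistic and constraint-based techniques above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the element names, the constraint dictionaries, the `w_name` weight, and the use of the standard library's `difflib.SequenceMatcher` as the string similarity. It is not the matcher of any particular system named in the talk.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Linguistic match: similarity of two element names after light normalization."""
    def norm(s: str) -> str:
        return s.lower().replace("_", "").replace("-", "")
    return SequenceMatcher(None, norm(a), norm(b)).ratio()


def constraint_similarity(c1: dict, c2: dict) -> float:
    """Constraint-based match: share of common constraint keys whose values agree."""
    keys = set(c1) & set(c2)
    if not keys:
        return 0.0
    return sum(c1[k] == c2[k] for k in keys) / len(keys)


def match_score(e1: dict, e2: dict, w_name: float = 0.6) -> float:
    """Weighted combination of both signals (the weight is an arbitrary assumption)."""
    return (w_name * name_similarity(e1["name"], e2["name"])
            + (1 - w_name) * constraint_similarity(e1["constraints"], e2["constraints"]))


# Hypothetical schema elements: a DB column and two ontology properties.
db_column = {"name": "emp_name", "constraints": {"type": "string", "unique": False}}
onto_property = {"name": "EmpName", "constraints": {"type": "string", "unique": False}}
unrelated = {"name": "salary", "constraints": {"type": "decimal", "unique": False}}

print(match_score(db_column, onto_property))
print(match_score(db_column, unrelated))
```

The same scoring applies unchanged whether the elements come from tables and columns or from classes and properties, which is the overlap between DB and ontology matching that the slide points to.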

Schema Integration – techniques used
Schema matching techniques: schema level
• Structural
  – Constraint-based: tree / graph structure matching
Information exploited
• DB: matching structures of relational tables
• Ontology: matching class hierarchies and structures
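
As a rough illustration of structure matching, the sketch below compares two nodes by the overlap of their immediate children. This one-level Jaccard comparison is a deliberately simplified stand-in for full tree or graph matching, and the table, class, and attribute names are hypothetical.

```python
def child_overlap(children_a, children_b) -> float:
    """Jaccard overlap between two sets of child names (case-insensitive)."""
    a = {c.lower() for c in children_a}
    b = {c.lower() for c in children_b}
    if not a and not b:
        return 1.0  # two leaves are structurally identical at this level
    return len(a & b) / len(a | b)


def structural_similarity(tree_a: dict, node_a: str, tree_b: dict, node_b: str) -> float:
    """Compare two nodes by the overlap of their immediate children."""
    return child_overlap(tree_a.get(node_a, []), tree_b.get(node_b, []))


# Hypothetical structures: a relational table with its columns,
# and an ontology class with its properties.
db_schema = {"Employee": ["name", "dept", "salary"]}
ontology = {"Person": ["name", "dept", "email"]}

score = structural_similarity(db_schema, "Employee", ontology, "Person")
print(score)
```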

Schema Integration – techniques used
Schema matching techniques: instance level
Information exploited (DB and Ontology)
• Linguistic
  – IR techniques, word frequencies, key terms, combination of key terms etc.
• Constraint based
  – Numerical value patterns and ranges, useful for recognizing phone numbers etc.
• Hybrid approaches use a combination of all techniques
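
A constraint-based instance-level match can be sketched as follows: two columns (or properties) are candidate matches when most of their values follow the same value pattern. The phone-number regular expression, the 0.8 threshold, and the sample values are assumptions chosen for illustration only.

```python
import re

# Assumed pattern for North American style phone numbers; purely illustrative.
PHONE = re.compile(r"^\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$")


def fraction_matching(values, pattern=PHONE) -> float:
    """Share of non-empty values that follow the given pattern."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return 0.0
    return sum(bool(pattern.match(v)) for v in cleaned) / len(cleaned)


def same_value_pattern(col_a, col_b, threshold: float = 0.8) -> bool:
    """Two columns are match candidates if both mostly follow the pattern."""
    return (fraction_matching(col_a) >= threshold
            and fraction_matching(col_b) >= threshold)


# Hypothetical instance values from a DB column and an ontology property.
print(same_value_pattern(["555-010-0199", "(555) 010 0123"], ["555.010.0142"]))
print(same_value_pattern(["555-010-0199"], ["Athens"]))
```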

Discovered semantic relationships
• State of the art – in DBs and ontologies
  – Relationships with set semantics: overlap / disjointness / exclusion / equivalence / subsumption
  – Their logical encodings are what they mean
• Of more interest is discovering arbitrary named relationships
  – Relationships such as works_for or causes have “real-world” semantics. Their encoding in first order logic lacks semantic grounding.
• Matching and mapping are closely tied. The ability to capture complex mappings (e.g., semantic proximity) puts significantly different demands on matching
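
Relationships with set semantics can be read directly off the instance sets of two concepts, which is why their logical encoding is their meaning. The sketch below is a minimal illustration under the assumption that both extensions are available as Python sets; it folds "exclusion" into disjointness, which is a simplification of the list above.

```python
def set_relationship(a: set, b: set) -> str:
    """Classify the set-semantic relationship between two concept extensions."""
    if a == b:
        return "equivalence"
    if a < b or b < a:  # proper subset in either direction
        return "subsumption"
    if a & b:
        return "overlap"
    return "disjointness"


# Hypothetical instance identifiers.
print(set_relationship({1, 2}, {1, 2}))
print(set_relationship({1}, {1, 2}))
print(set_relationship({1, 2}, {2, 3}))
print(set_relationship({1}, {2}))
```

No comparable computation exists for a named relationship such as works_for or causes: nothing in the two instance sets alone says which real-world relationship holds between them.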

Key Observation
• DB and ontology schema matching techniques overlap significantly
  – Not much advancement since DB schema integration efforts
• Ontologies formalize the semantics of a domain, but matching is still primarily syntactic / structural.
  – The semantics of ‘named relationships’ is largely unexploited
• The real semantics lies in the relationships connecting entities
  – Modeled as first class objects in ontologies
  – In DBs, they are not explicit and have to be inferred

(Complex) named relationships and Ontology Matching

(Complex) named relationships example
[Figure: graph of named relationships. Nodes: VOLCANO, LOCATION, ASH RAIN, PYROCLASTIC FLOW, ENVIRON., WEATHER, BUILDING, PEOPLE, PLANT. Edge labels: DESTROYS, KILLS, COOLS TEMP.]

Discovering such (complex) named relationships
• Matching techniques have exhausted schema + instance properties
• Ontology modeling decouples schema + instance base
  – Tremendous opportunity to exploit knowledge present outside the ontology knowledge base (external structured, semi-structured and unstructured data sources)

Knowledge discovery and validation
[Figure: relevant docs from PubMed etc. and DBs; query and update.]
• Prediction of
  – Pathways
  – Symptoms of diseases
  – Other complex relationships

A Vision for Ontology Matching: SIMPLE TO COMPLEX MATCHES
Discovering simple to complex matches – from schema, instances and corpus
• Ontologies
  – Possible identifiable matches: equivalence / inclusion / overlap / disjointness
• Semantic metadata
• Heterogeneous data
  – Possible to identify more complex relationships from the corpus.
  – Example text: “Today, the Food and Drug Administration (FDA) is announcing that it has asked Pfizer, Inc. to voluntarily withdraw Bextra from the market. Pfizer has agreed to suspend sales and marketing of Bextra in the , pending further discussions with the agency.”

Corpus based schema matching

The Intuition
[Figure: at the schema level (UMLS), the classes Biologically active substance, Lipid, and Disease or Syndrome are linked by the relationships affects, complicates, and causes. At the instance level (MeSH), Fish Oils and Raynaud’s Disease are attached to the schema by instance_of, and the relationship between them is unknown (????). Document counts shown for PubMed: 9284 documents, 4733 documents, 5 documents.]

The Method – Identify entities and relationships in the parse tree
[Figure: parse tree annotated with modifiers, modified entities, and composite entities.]

Key Observation
• What is interesting is not the entity “estrogen” or “endometrium”
• The real knowledge lies in the complex and modified entities
  – “an excessive endogenous stimulation by estrogen”
• Current KR frameworks do not model this. Capturing this might affect the way we think of matching and mapping.

Converting candidate relationships to ontology matches
• Linguistic and statistical challenges:
  – Variations of entities, relationships and associations
• Translating instance level findings to the schema level
  – GOING FROM several discovered relationships like “Deficiency in magnesium causes migraine” TO “substance X causes condition Y”
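
One simple way to picture the step from instance level findings to the schema level is to replace each entity in a discovered triple by its class and keep the patterns that recur. The sketch below assumes the triples and the entity-to-class map are already available; the placeholder entity names, the class names, and the `min_support` threshold are hypothetical, and the linguistic variation problem listed above is not addressed.

```python
from collections import Counter


def generalize(triples, type_of, min_support=2):
    """Lift (subject, relation, object) triples to (class, relation, class) patterns.

    Only patterns seen at least `min_support` times are kept.
    """
    counts = Counter()
    for subj, rel, obj in triples:
        if subj in type_of and obj in type_of:
            counts[(type_of[subj], rel, type_of[obj])] += 1
    return {pattern: n for pattern, n in counts.items() if n >= min_support}


# Hypothetical placeholder data, not extracted from any real corpus.
TYPE_OF = {
    "substance_a": "Substance",
    "substance_b": "Substance",
    "condition_p": "Condition",
    "condition_q": "Condition",
}
TRIPLES = [
    ("substance_a", "causes", "condition_p"),
    ("substance_b", "causes", "condition_q"),
    ("substance_a", "inhibits", "condition_q"),
]

print(generalize(TRIPLES, TYPE_OF))
```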

Discovery vs. validation of relationships – two sides of the coin
• Discovering complex relationships from text is a hard problem
  – Natural language challenges (not all sentences are well formed)
• Validating complex relationships / hypotheses is relatively simpler

Corpus based hypothesis validation
• Does magnesium alleviate effects of migraine in patients?
[Figure: one possible hypothesized connection between magnesium and migraine. Nodes: Magnesium, Calcium Channel Blockers, Stress, Migraine, Patient. Edge labels: isa, inhibit, affectedBy. The hypothesis is posed as a complex query against PubMed, and the supporting document sets are retrieved.]
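
In its crudest form, validating a hypothesis against a corpus means retrieving the documents that mention all of its terms together. The sketch below uses plain substring matching over a made-up three-document corpus; the document ids and texts are invented for illustration and are not PubMed content, and a real complex query would also constrain the relationships between the terms.

```python
def supporting_documents(corpus: dict, terms) -> list:
    """Return ids of documents that mention every term of the hypothesis."""
    wanted = [t.lower() for t in terms]
    return sorted(
        doc_id
        for doc_id, text in corpus.items()
        if all(t in text.lower() for t in wanted)
    )


# Made-up toy corpus.
TOY_CORPUS = {
    "d1": "Magnesium levels were recorded for migraine patients.",
    "d2": "Notes on calcium channel blockers and stress.",
    "d3": "Notes on stress, migraine and magnesium.",
}

print(supporting_documents(TOY_CORPUS, ["magnesium", "migraine"]))
```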

From matching to mappings – several challenges
• Mappings are not always simple mathematical / string transformations
• Examples of complex mappings
  – Associations / paths between classes
  – Graph based / form fitting functions
[Figure: association graph over the entities E1: Reviewer, E2: Paper, E3: Person, E4: Paper, E5: Person, E6: Person, E7: Submission, connected by author_of and knows edges.]
[Figure annotation: Number of earthquakes with magnitude > 7 almost constant. So if at all, then nuclear tests only cause earthquakes with magnitude < 7.]
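
An association between two entities can be represented as a path of named relationships, which a breadth-first search finds in a small graph. In the sketch below the entity labels follow the figure, but the edge list is hypothetical, since the transcript does not preserve which entities each author_of or knows edge connects. Edges are treated as undirected so that an association can be followed in either direction.

```python
from collections import deque


def find_association(edges, start, goal):
    """Breadth-first search for a shortest path of named relationships."""
    graph = {}
    for src, label, dst in edges:
        graph.setdefault(src, []).append((label, dst))
        graph.setdefault(dst, []).append((label, src))  # undirected traversal

    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for label, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [label, nxt]))
    return None  # no association found


# Hypothetical edges; only the node labels come from the figure.
EDGES = [
    ("E1:Reviewer", "author_of", "E2:Paper"),
    ("E6:Person", "author_of", "E2:Paper"),
    ("E6:Person", "author_of", "E7:Submission"),
    ("E3:Person", "knows", "E5:Person"),
    ("E5:Person", "author_of", "E4:Paper"),
]

print(find_association(EDGES, "E1:Reviewer", "E7:Submission"))
```

A mapping of this kind is a path, not a string or arithmetic transformation, which is the point the slide makes about complex mappings.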

The take home message

A world beyond simple matches and mappings
• The distinction between schema and instances is slowly disappearing
• Integrating new and external data sources, and mining and analyzing them, is gaining importance.
• Tremendous opportunities and challenges in using more information than what is modeled in a schema and captured in an instance base.
Need to go beyond well-mannered schemas and knowledge representations, and relatively simpler mappings

For more information
LSDIS Lab: http://lsdis.cs.uga.edu
Kno.e.sis Center: http://www.knoesis.org