Inexact Matching of Ontology Graphs Using Expectation Maximization

  • Slides: 18

Inexact Matching of Ontology Graphs Using Expectation-Maximization
Prashant Doshi, Christopher Thomas
LSDIS Lab, Dept. of Computer Science, University of Georgia

Motivating Example
[Figure: candidate ontology match between Weapons ontology 1 (model) and Weapons ontology 2 (data)]

Ontology Matching
• Problem: Match nodes and edges (if labeled) of different ontologies
  – Essential step in ontology engineering

Types of Match
• Exact matches
  – Isomorphisms with edge consistency
  – Bijection
  – E.g., GLUE (Doan 02), BayesOWL (Ding 05), FALCON-AO (Hu 05), OMEN (Mitra 05)
• Inexact matches
  – Homomorphisms with edge consistency
  – Many-one or many-many
  – E.g., this approach (many-one)
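
The difference between an exact and an inexact match can be made concrete with a small check. The sketch below is only illustrative: it assumes ontologies are given as sets of directed (source, target) edges and that a match is a dictionary from data nodes to model nodes. The function names and the toy node names are hypothetical, not taken from the slides.

```python
def is_edge_consistent(mapping, data_edges, model_edges):
    """Return True if every data edge maps onto a model edge.

    mapping: dict from data node to model node (many-one is allowed).
    data_edges, model_edges: sets of directed (source, target) pairs.
    """
    return all(
        (mapping[u], mapping[v]) in model_edges
        for (u, v) in data_edges
    )


def is_bijection(mapping, model_nodes):
    """An exact match additionally needs a one-to-one, onto mapping."""
    images = list(mapping.values())
    return len(set(images)) == len(images) and set(images) == set(model_nodes)


# Hypothetical toy graphs (node names are illustrative only).
model_edges = {("Weapon", "Ship")}
data_edges = {("Arms", "Warship"), ("Arms", "Destroyer")}
many_one = {"Arms": "Weapon", "Warship": "Ship", "Destroyer": "Ship"}

print(is_edge_consistent(many_one, data_edges, model_edges))  # edge consistency
print(is_bijection(many_one, {"Weapon", "Ship"}))             # exactness
```

Here two data nodes collapse onto one model node, so the mapping is an edge-consistent homomorphism but not a bijection, which is the many-one case the slide describes.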

Overview of Our Approach
• Exploit structural and lexical similarity
  – Graph structure
  – Node and edge labels
• Formulation within the iterative Expectation-Maximization (EM) scheme
  – May converge to local maxima
  [Figure: match quality plotted over the space of matches]
• Suitable for taxonomies, but can be used for edge-labeled ontologies using reification

Edge-Labeled Ontology Graphs
Reification
[Figure: edge-labeled graph and its reified bipartite graph (Hayes & Gutierrez 04)]
• Each distinct edge label becomes a node
• Dummy nodes are introduced to preserve the relations
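
One possible way to carry out the reification is sketched below. This is an assumption-laden illustration, not the exact construction of Hayes & Gutierrez: it assumes the ontology is a set of (subject, label, object) triples, creates one node per distinct label, and gives every triple its own dummy node. The `_stmt` and `label:` naming schemes are hypothetical.

```python
def reify(triples):
    """Turn labeled edges (subject, label, object) into unlabeled edges.

    Assumed construction: each distinct label becomes one node, and each
    triple gets its own dummy node that ties subject, label, and object.
    """
    edges = set()
    label_nodes = set()
    for i, (subj, label, obj) in enumerate(sorted(triples)):
        dummy = f"_stmt{i}"            # hypothetical naming scheme
        label_node = f"label:{label}"  # one node per distinct label
        label_nodes.add(label_node)
        edges.add((subj, dummy))
        edges.add((dummy, obj))
        edges.add((dummy, label_node))
    return edges, label_nodes


# Hypothetical triples for illustration.
triples = {
    ("Destroyer", "carries", "Missile"),
    ("Submarine", "carries", "Torpedo"),
}
edges, label_nodes = reify(triples)
```

Every resulting edge touches exactly one dummy node, so the output is bipartite between dummy nodes and all other nodes, and the result can be matched as an unlabeled graph.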

Background: EM
• Developed by Dempster, Laird and Rubin (1977)
• Maximum likelihood estimate of an underlying model from observed data (X) in the presence of missing values (Y)
• E-step
  – Evaluate the likelihood of different models (M_{n+1}) given a seed model (M_n)
• M-step
  – Choose the best model and use it in the next iteration
• Generalized M-step
  – Select a model that is better than the current one
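
The iteration described above can be sketched as a short loop. This is a generic skeleton under stated assumptions, not the authors' implementation: `q` and `propose` are hypothetical caller-supplied functions, where `q(candidate, current)` plays the role of the E-step score and `propose(current)` yields candidate models.

```python
def generalized_em(seed_model, q, propose, max_iters=100):
    """Generic generalized-EM loop.

    q(candidate, current) -> float: expected log-likelihood (E-step score).
    propose(current) -> iterable of candidate models.
    """
    current = seed_model
    for _ in range(max_iters):
        baseline = q(current, current)
        improved = None
        for candidate in propose(current):
            if q(candidate, current) > baseline:
                improved = candidate   # generalized M-step: any better model
                break
        if improved is None:
            break                      # no proposed candidate improves: stop
        current = improved
    return current


# Toy usage: models are integers and the score peaks at 5.
best = generalized_em(
    seed_model=0,
    q=lambda cand, cur: -(cand - 5) ** 2,
    propose=lambda m: [m - 1, m + 1],
)
```

A standard M-step would instead pick the highest-scoring candidate; the generalized variant accepts the first one that beats the current model, which is cheaper per iteration but can stop at a local maximum.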

Graph Matching Using GEM
• Treat the match assignments as the model
  – Mixture model
• Given a data node, the correspondence with some model node is a hidden variable
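
For orientation, the textbook expected log-likelihood of a mixture model can be written in this setting as follows. This is the standard general form, not necessarily the exact equation on the original slide. The notation is an assumption: x_a ranges over the data nodes V_d, y_\alpha over the model nodes V_m, and \pi_\alpha is the mixing weight of model node y_\alpha.

```latex
Q(M_{n+1} \mid M_{n}) =
  \sum_{x_a \in V_d} \sum_{y_\alpha \in V_m}
    P(y_\alpha \mid x_a, M_{n}) \,
    \log \left[ \pi_\alpha^{(n+1)} \, P(x_a \mid y_\alpha, M_{n+1}) \right]
```

The posterior P(y_\alpha | x_a, M_n) is computed under the current match, while the term inside the logarithm is evaluated under the candidate match.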

E-Step becomes
[Equation: not preserved in the extracted slide text]
• Above equation is simplified considerably
• Involves finding the lexical similarity between the node labels
• We use the generalized M-step

String Similarity Measures
• String distance metrics (Cohen et al. 03):
  – Exact string match
  – Substring match
  – N-gram score
  – Sequence alignment score (Smith & Waterman 81)
S1: Modern Naval Ship
S2: Naval Warship
[Figure: alignment mask 000000 11111 0001111 marking the matched characters]
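
Two of the listed measures can be sketched in a few lines. The scoring parameters (match = 2, mismatch = -1, gap = -1), the trigram default, and the Dice normalization are assumptions chosen for illustration; the slides do not state which settings were used.

```python
def ngram_score(s1, s2, n=3):
    """Dice overlap of character n-gram sets, in [0, 1]."""
    a = {s1[i:i + n] for i in range(len(s1) - n + 1)}
    b = {s2[i:i + n] for i in range(len(s2) - n + 1)}
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def smith_waterman(s1, s2, match=2, mismatch=-1, gap=-1):
    """Best local alignment score (Smith-Waterman, linear gap penalty)."""
    prev = [0] * (len(s2) + 1)   # previous row of the score matrix
    best = 0
    for c1 in s1:
        curr = [0]
        for j, c2 in enumerate(s2, start=1):
            diag = prev[j - 1] + (match if c1 == c2 else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


s1 = "modern naval ship"
s2 = "naval warship"
print(ngram_score(s1, s2), smith_waterman(s1, s2))
```

Because the alignment is local, the unmatched prefix "modern" costs nothing, which is why this measure suits labels that share only a substring.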

Model Sampling
• Model space is large:
  – Random sampling from the model space
  – Combine sampling with intuitive heuristics
[Figure: Map-Parent heuristic, generating M_{n+1} from M_n]
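
Random sampling from the model space could look like the sketch below. It covers only the plain random-sampling part and assumes a match is a dictionary from data nodes to model nodes. The Map-Parent heuristic is not reproduced because the slide text does not define it. The function name and the parameters `k` and `flips` are hypothetical.

```python
import random


def sample_candidates(mapping, model_nodes, k=5, flips=1, rng=None):
    """Draw k candidate many-one mappings near the current one.

    Each candidate reassigns `flips` randomly chosen data nodes to
    randomly chosen model nodes; all other assignments are kept.
    """
    rng = rng or random.Random()
    data_nodes = sorted(mapping)
    targets = sorted(model_nodes)
    candidates = []
    for _ in range(k):
        candidate = dict(mapping)  # copy, so the current match is untouched
        for node in rng.sample(data_nodes, min(flips, len(data_nodes))):
            candidate[node] = rng.choice(targets)
        candidates.append(candidate)
    return candidates


current = {"a": "X", "b": "Y"}
cands = sample_candidates(current, {"X", "Y", "Z"}, k=4, rng=random.Random(0))
```

Keeping candidates close to the current match trades coverage of the model space for cheaper iterations; a heuristic would bias which nodes are reassigned and to what.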

Simple Example
Q(M_1'|M_0) = 52.56
Q(M_1|M_0) = 51.57
[Figure: seed match M_0 with candidate matches M_1 and M_1']

Computational Complexity
• Complexity of the E-step is O((|V_d||V_m|)^2)
• In the M-step, if we generate K samples within a sample set, the worst-case complexity is O(K(|V_d||V_m|)^2)
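
To get a feel for these bounds, the operation counts can be evaluated for concrete sizes. The sketch below ignores constant factors, and the graph sizes are hypothetical, not those of the ontologies in the talk.

```python
def e_step_cost(n_data, n_model):
    """Operation count proportional to (|V_d| * |V_m|)^2."""
    return (n_data * n_model) ** 2


def m_step_cost(n_data, n_model, k):
    """Worst case when each of K sampled models is scored once."""
    return k * e_step_cost(n_data, n_model)


# Hypothetical sizes: 20 data nodes, 30 model nodes, 10 samples.
print(e_step_cost(20, 30))      # (20 * 30)^2 = 360000
print(m_step_cost(20, 30, 10))  # 10 * 360000 = 3600000
```

Doubling both graphs multiplies the cost by 16, which is why the sampling budget K matters for larger ontologies.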

Performance
Weapons ontologies from the I3CON repository
Matching heuristics speed up convergence

Lexical Match: Recall = 77.8%, Precision = 63.6%

GEM Match: Recall = 100%, Precision = 90%
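
Precision and recall for a match can be computed from two sets of correspondences, as in the sketch below. The pairs used here are hypothetical and do not reproduce the reported results; they only show how the two figures are derived.

```python
def precision_recall(found, reference):
    """Precision and recall of a set of (data, model) correspondences."""
    found, reference = set(found), set(reference)
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical correspondences for illustration only.
reference = {("a", "X"), ("b", "Y"), ("c", "Y")}
found = {("a", "X"), ("b", "Y"), ("c", "Z"), ("d", "Z")}
precision, recall = precision_recall(found, reference)
```

Precision penalizes spurious correspondences, while recall penalizes missed ones, so a match with 100% recall and lower precision has found every reference pair plus some extras.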

Discussion
• A principled technique for inexact matching of ontology schemas using Generalized EM
  – Considers structural and label similarity
  – Produces the most likely match
• Many-one correspondence allows mapping between clusters of different semantic granularity
• Computational complexity is an issue
  – More efficient ways to cover the model space are needed

Thank you Questions
