Inexact Matching of Ontology Graphs Using Expectation Maximization
Prashant Doshi, Christopher Thomas
LSDIS Lab, Dept. of Computer Science, University of Georgia

Motivating Example
[Figure: Weapons ontology 1 (model) and Weapons ontology 2 (data), with a candidate ontology match between them]

Ontology Matching
• Problem: Match nodes and edges (if labeled) of different ontologies
  – Essential step in ontology engineering
Types of Match
• Exact matches
  – Isomorphisms with edge consistency
  – Bijection
  – E.g. GLUE (Doan 02), BayesOWL (Ding 05), FALCON-AO (Hu 05), OMEN (Mitra 05)
• Inexact matches
  – Homomorphisms with edge consistency
  – Many-one or many-many
  – E.g. this approach (many-one)

Overview of Our Approach
• Exploit structural and lexical similarity
  – Graph structure
  – Node and edge labels
• Formulation within the iterative Expectation-Maximization (EM) scheme
  – May converge to local maxima
• Suitable for taxonomies but can be used for edge-labeled ontologies using reification
[Figure: Match Quality plotted over the Space of Matches]

Edge-Labeled Ontology Graphs: Reification
[Figure: an edge-labeled graph and its reified bipartite graph (Hayes & Gutierrez 04)]
• Each distinct edge label becomes a node
• Dummy nodes are introduced to preserve the relations
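The reification step can be illustrated with a minimal Python sketch. It shows one plausible reading of the two bullets, not necessarily the exact construction of Hayes & Gutierrez or of the authors: every distinct edge label becomes a single shared node, and every labeled edge gets its own dummy node that ties subject, object and label together. The function name `reify`, the triple format and the `("label", …)` / `("stmt", …)` node tags are assumptions made for illustration.

```python
def reify(triples):
    """Turn labeled edges (subject, label, object) into an unlabeled graph.

    Assumed scheme (illustrative only): one node per distinct edge label,
    plus one dummy node per labeled edge that links subject, object and label.
    """
    nodes = set()
    edges = []
    for i, (subject, label, obj) in enumerate(triples):
        label_node = ("label", label)   # shared by all edges with this label
        dummy = ("stmt", i)             # unique per labeled edge
        nodes.update([subject, obj, label_node, dummy])
        # Every edge touches exactly one dummy node, so the result is
        # bipartite: dummy nodes on one side, everything else on the other.
        edges += [(subject, dummy), (dummy, obj), (dummy, label_node)]
    return nodes, edges


triples = [("Tank", "hasPart", "Gun"), ("Ship", "hasPart", "Gun")]
nodes, edges = reify(triples)
```

In this sketch the two `hasPart` edges share one label node but keep separate dummy nodes, so the original relations can still be read back from the unlabeled graph.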

Background: EM
• Developed by Dempster, Laird and Rubin (1977)
• Maximum likelihood estimate of an underlying model from observed data (X) in the presence of missing values (Y)
• E-step
  – Evaluate the likelihood of different models (M^{n+1}) given a seed model (M^n)
• M-step
  – Choose the best model and use it in the next iteration
• Generalized M-step
  – Select a model that is better than the current one
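The iteration described above can be sketched as a short generic loop. This is a sketch under stated assumptions, not the authors' implementation: `q_value(candidate, current)` stands in for the E-step score Q(candidate | current), and `propose(current)` stands in for whatever procedure generates candidate models; both names are hypothetical.

```python
def generalized_em(seed, q_value, propose, max_iters=100):
    """Generic generalized-EM loop (illustrative sketch).

    q_value(candidate, current): hypothetical E-step score Q(candidate | current).
    propose(current): hypothetical generator of candidate models.
    """
    current = seed
    for _ in range(max_iters):
        baseline = q_value(current, current)
        # Generalized M-step: accept any candidate that improves on the
        # current model, rather than searching for the global maximizer.
        better = [m for m in propose(current) if q_value(m, current) > baseline]
        if not better:
            return current      # no improving candidate: a (possibly local) maximum
        current = better[0]
    return current


# Toy usage: models are integers and the score peaks at 7.
best = generalized_em(
    seed=0,
    q_value=lambda m, _cur: -(m - 7) ** 2,
    propose=lambda m: [m - 1, m + 1],
)
```

Because the loop stops as soon as no proposed candidate scores higher than the current model, the result depends on the seed and on what `propose` generates.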

Graph Matching Using GEM
• Treat the match assignments as the model
  – Mixture model
• Given a data node, the correspondence with some model node is a hidden variable

E-Step
[Equation: the E-step expression for the match model]
• The above equation simplifies considerably
• Involves finding the lexical similarity between the node labels
• We use the generalized M-step

String Similarity Measures
• String distance metrics (Cohen et al. 03):
  – Exact string match
  – Substring match
  – N-gram score
  – Sequence alignment score (Smith & Waterman 81)
• Example
  – S1: Modern Naval Ship
  – S2: Naval Warship
  – Alignment masks as shown on the slide: 000000 11111 0001111
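Two of the listed measures can be sketched in a few lines of Python. The slides do not give the scoring details, so everything here is an assumption for illustration: the n-gram score is taken to be the Dice coefficient over character trigrams, and the alignment score is a plain Smith-Waterman local alignment with match = 2, mismatch = -1 and gap = -1.

```python
def ngrams(s, n=3):
    """Set of character n-grams of s (case-insensitive)."""
    s = s.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_score(a, b, n=3):
    """Dice coefficient over character n-grams (assumed definition)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score; the scoring weights are assumptions."""
    a, b = a.lower(), b.lower()
    prev = [0] * (len(b) + 1)   # previous row of the dynamic-programming table
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: a cell never drops below zero.
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


s1, s2 = "Modern Naval Ship", "Naval Warship"
print(ngram_score(s1, s2), smith_waterman(s1, s2))
```

For the slide's example, both measures reward the shared "Naval" and "ship" fragments even though an exact string match fails and neither string is a substring of the other.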

Model Sampling
• Model space is large:
  – Random sampling from the model space
  – Combine sampling with intuitive heuristics
• Map-Parent heuristic
[Figure: a match M^n and the match M^{n+1} obtained from it with the Map-Parent heuristic]
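A minimal sketch of the two sampling ideas follows, with a many-one match represented as a dictionary from data nodes to model nodes. The slides do not define the Map-Parent heuristic, so `map_parent` is only a guess based on its name (if a data node maps to a model node, also map the data node's parent to the model node's parent). All function and parameter names are hypothetical.

```python
import random


def sample_matches(match, model_nodes, k, rng, flips=1):
    """Draw k candidate matches by randomly reassigning `flips` data nodes."""
    data_nodes = sorted(match)
    samples = []
    for _ in range(k):
        candidate = dict(match)
        for d in rng.sample(data_nodes, flips):
            candidate[d] = rng.choice(model_nodes)
        samples.append(candidate)
    return samples


def map_parent(match, data_parent, model_parent):
    """Assumed reading of Map-Parent: propagate each correspondence upward.

    data_parent / model_parent map a node to its parent (roots are absent).
    If several children disagree about a parent, the last one visited wins.
    """
    candidate = dict(match)
    for d, m in match.items():
        dp, mp = data_parent.get(d), model_parent.get(m)
        if dp is not None and mp is not None:
            candidate[dp] = mp
    return candidate


match = {"tank": "Tank", "vehicle": "Weapon"}
improved = map_parent(match, {"tank": "vehicle"}, {"Tank": "Vehicle"})
candidates = sample_matches(match, ["Tank", "Vehicle", "Weapon"], k=5,
                            rng=random.Random(0))
```

Candidates produced either way would then be scored in the E-step, and any candidate that scores higher than the current match can be accepted by the generalized M-step.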

Simple Example
[Figure: seed match M^0 and two candidate matches, M^1 and M^{1'}]
• Q(M^{1'} | M^0) = 52.56
• Q(M^1 | M^0) = 51.57

Computational Complexity
• Complexity of the E-step is O((|V_d| |V_m|)^2)
• In the M-step, if we generate K samples within a sample set, the worst-case complexity is O(K (|V_d| |V_m|)^2)
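To get a feel for how quickly these bounds grow, the sketch below simply evaluates them for given graph sizes. The returned numbers are operation counts up to a constant factor, not measured running times, and the function names are illustrative.

```python
def e_step_cost(n_data, n_model):
    """Order-of-magnitude cost of one E-step: (|V_d| * |V_m|)^2."""
    return (n_data * n_model) ** 2


def m_step_cost(n_data, n_model, k):
    """Worst-case cost of an M-step that scores k sampled models."""
    return k * e_step_cost(n_data, n_model)


# Doubling both graphs multiplies the E-step bound by 16.
print(e_step_cost(10, 20), e_step_cost(20, 40), m_step_cost(10, 20, k=5))
```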

Performance
• Weapons ontologies from the I3CON repository
• Matching heuristics speed up convergence

• Lexical match: Recall = 77.8%, Precision = 63.6%
• GEM match: Recall = 100%, Precision = 90%
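For reference, precision and recall of a match against a reference alignment can be computed as in the sketch below. The correspondence counts used in the usage example are not given on the slides; they are assumed values chosen only because they reproduce the reported percentages (a reference alignment of 9 correspondences, 11 proposed with 7 correct for the lexical match, 10 proposed with 9 correct for the GEM match).

```python
def precision_recall(proposed, reference):
    """Precision and recall of a set of correspondences against a reference."""
    proposed, reference = set(proposed), set(reference)
    correct = len(proposed & reference)
    precision = correct / len(proposed) if proposed else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical correspondences, identified by integers for brevity.
reference = set(range(9))
lexical = set(range(7)) | {100, 101, 102, 103}   # 7 correct out of 11 proposed
gem = set(range(9)) | {100}                      # 9 correct out of 10 proposed

for name, proposed in [("Lexical", lexical), ("GEM", gem)]:
    p, r = precision_recall(proposed, reference)
    print(f"{name}: recall = {r:.1%}, precision = {p:.1%}")
```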

Discussion
• A principled technique for inexact matching of ontology schemas using Generalized EM
  – Considers structural and label similarity
  – Produces the most likely match
• Many-one correspondence allows mapping between clusters of different semantic granularity
• Computational complexity is an issue
  – More efficient ways to cover the model space are needed

Thank you
Questions?