
Weighted Partonomy-Taxonomy Trees with Local Similarity Measures for Semantic Buyer-Seller Match-Making
Lu Yang, Marcel Ball, Virendra C. Bhavsar and Harold Boley
BASeWEB, May 8, 2005

Outline
• Introduction
• Motivation
• Partonomy Similarity Algorithm
  – Tree representation
  – Tree simplicity
  – Partonomy similarity
• Experimental Results
• Node Label Similarity
  – Inner-node similarity
  – Leaf-node similarity
• Conclusion

Introduction
• Buyer-seller matching in e-business and e-learning
[Figure: a multi-agent system. Components shown: User, Web Browser, Main Server, User Info, User Profiles, User Agents, Cafe-1 to Cafe-n, Matcher 1 to Matcher n, and a link to other sites (network).]

Introduction
• An e-learning scenario
[Figure: Learner 1 to Learner n and Course Provider 1 to Course Provider m meet in a Cafe that contains a Matcher.]
H. Boley, V. C. Bhavsar, D. Hirtle, A. Singh, Z. Sun and L. Yang, A match-making system for learners and learning objects. Learning & Leading with Technology, International Society for Technology in Education, Eugene, OR, 2005 (to appear).

Motivation
• Metadata for buyers and sellers
  – Keywords/keyphrases
• Tree similarity

Tree representation
• Characteristics of our trees
  – Node-labeled, arc-labeled and arc-weighted
  – Arcs are labeled in lexicographical order
  – Weights of sibling arcs sum to 1

Example tree:
Car
  Make (0.3): Ford
  Model (0.2): Explorer
  Year (0.5): 2002
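The three characteristics above can be sketched in Python. This is only an illustration: the `Node` class and its method names are hypothetical and do not come from the slides, and the tolerance on the weight sum is an assumption made so that rounded weights such as 0.3333 + 0.3333 + 0.3334 still pass.

```python
from dataclasses import dataclass, field
from math import isclose


@dataclass
class Node:
    """A node-labeled tree; `arcs` maps arc label -> (weight, subtree)."""
    label: str
    arcs: dict = field(default_factory=dict)

    def sorted_arcs(self):
        # Arcs are visited in lexicographic order of their labels.
        return sorted(self.arcs.items())

    def is_valid(self):
        # A leaf is trivially valid; otherwise sibling weights must sum to 1
        # (within an assumed tolerance) and every subtree must be valid too.
        if not self.arcs:
            return True
        total = sum(weight for weight, _ in self.arcs.values())
        return isclose(total, 1.0, abs_tol=1e-3) and all(
            child.is_valid() for _, child in self.arcs.values()
        )


# The Car example from the slide.
car = Node("Car", {
    "Make": (0.3, Node("Ford")),
    "Model": (0.2, Node("Explorer")),
    "Year": (0.5, Node("2002")),
})
```

Storing arcs in a mapping keyed by label makes it cheap to find identically labeled arcs when two trees are compared later.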

Tree representation
– Serialization of trees: Weighted Object-Oriented RuleML (WOO RuleML)
– XML attributes for arc weights and subelements for arc labels

    <Cterm>
      <Ctor>Car</Ctor>
      <slot weight="0.3"><Ind>Make</Ind><Ind>Ford</Ind></slot>
      <slot weight="0.2"><Ind>Model</Ind><Ind>Explorer</Ind></slot>
      <slot weight="0.5"><Ind>Year</Ind><Ind>2002</Ind></slot>
    </Cterm>

Tree serialization in WOO RuleML
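As a rough sketch, the serialization above can be read back into a nested structure with Python's standard `xml.etree.ElementTree` module. The function name `parse_cterm` and the tuple layout are hypothetical, and the handling of a nested `<Cterm>` as a slot filler is an assumption about how deeper trees would be serialized, since the slide only shows a flat example.

```python
import xml.etree.ElementTree as ET

WOO_RULEML = """
<Cterm>
  <Ctor>Car</Ctor>
  <slot weight="0.3"><Ind>Make</Ind><Ind>Ford</Ind></slot>
  <slot weight="0.2"><Ind>Model</Ind><Ind>Explorer</Ind></slot>
  <slot weight="0.5"><Ind>Year</Ind><Ind>2002</Ind></slot>
</Cterm>
"""


def parse_cterm(elem):
    """Return (root_label, {arc_label: (weight, subtree)}) for a <Cterm>."""
    root_label = elem.findtext("Ctor")
    arcs = {}
    for slot in elem.findall("slot"):
        # First child holds the arc label, second child holds the filler.
        name_elem, value_elem = list(slot)
        weight = float(slot.get("weight"))
        if value_elem.tag == "Cterm":
            subtree = parse_cterm(value_elem)   # assumed nesting for inner nodes
        else:
            subtree = (value_elem.text, {})     # leaf node
        arcs[name_elem.text] = (weight, subtree)
    return root_label, arcs


label, arcs = parse_cterm(ET.fromstring(WOO_RULEML))
```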

Tree simplicity
[Figure: example tree with tree simplicity 0.0563. Root A has arcs a (to B) and b (0.7, to C); B has arcs c (0.2) and d (0.8) to leaves D and E; C has arcs e (0.1) and f (0.9) to leaves F and G. Level values: depth 0 (0.9), depth 1 (0.45), depth 2 (0.225).]
– The deeper the leaf node, the less its contribution to the tree simplicity
  ✓ Depth degradation index (0.9)
  ✓ Depth degradation factor (0.5)
– Reciprocal of tree breadth
L. Yang, B. Sarker, V. C. Bhavsar and H. Boley, A weighted-tree simplicity algorithm for similarity matching of partial product descriptions (submitted for publication).

Tree simplicity – Computation

\[
\check{S}(T) =
\begin{cases}
D_I \cdot (D_F)^{d} & \text{if } T \text{ is a leaf node,} \\
\dfrac{1}{m} \sum_{j=1}^{m} w_j \, \check{S}(T_j) & \text{otherwise.}
\end{cases}
\]

Š(T): the simplicity value of a single tree T
D_I and D_F: depth degradation index and depth degradation factor
d: depth of a leaf node
m: root node degree of tree T that is not a leaf
w_j: arc weight of the jth arc below the root node of tree T
T_j: subtree below the jth arc with arc weight w_j
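A minimal Python sketch of this computation, under one reading of the definitions: a leaf at depth d contributes DI · DF^d, and an inner node averages the weighted simplicities of its m subtrees (the "reciprocal of tree breadth"). The tuple layout and function names are hypothetical. The example tree is a reconstruction of the one on the previous slide; the weight 0.3 on arc a is an assumption, chosen so that sibling weights sum to 1.

```python
DI = 0.9  # depth degradation index (value given on the slides)
DF = 0.5  # depth degradation factor (value given on the slides)


def simplicity(tree, depth=0):
    """tree = (label, [(arc_label, weight, subtree), ...]); leaves have no arcs."""
    _, arcs = tree
    if not arcs:
        # Leaf: the deeper it sits, the less it contributes.
        return DI * DF ** depth
    m = len(arcs)  # breadth below this node
    return sum(w * simplicity(sub, depth + 1) for _, w, sub in arcs) / m


def leaf(name):
    return (name, [])


# Reconstructed example; the weight of arc "a" (0.3) is assumed.
example = ("A", [
    ("a", 0.3, ("B", [("c", 0.2, leaf("D")), ("d", 0.8, leaf("E"))])),
    ("b", 0.7, ("C", [("e", 0.1, leaf("F")), ("f", 0.9, leaf("G"))])),
])
value = simplicity(example)
```

Under these assumptions `value` comes out at about 0.05625, which agrees with the 0.0563 shown for the example tree.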

Partonomy similarity – Simple trees
[Figure: tree t = Car with arcs Make (0.3) to Ford and Model (0.7) to Mustang; tree t´ = Car (or House) with arcs Make (0.3) to Ford and Model (0.7) to Escape.]
• Inner nodes: Car vs. Car gives 1; Car vs. House gives 0
• Leaf nodes: Ford vs. Ford gives 1; Mustang vs. Escape gives 0

Partonomy similarity – Complex trees
[Figure: two lom trees t and t´. Tree t has arcs educational (edu-set), general (gen-set) and technical (tec-set) with weights 0.3334, 0.3333 and 0.3333; tree t´ has arcs general (gen-set) and technical (tec-set). Lower-level arcs include language, title, format and platform; leaves include en, "Introduction to Oracle", "Basic Oracle", HTML, WinXP and *.]
* : Don't Care

\[
\sum_i s_i \, \frac{w_i + w'_i}{2}
\quad\longrightarrow\quad
\sum_i A(s_i) \, \frac{w_i + w'_i}{2},
\qquad A(s_i) \ge s_i
\]

Partonomy similarity – Main functions
• Three main functions (Relfun)
  – Treesim(t, t'): recursively compares any (unordered) pair of trees; parameters N and i
  – Treemap(l, l'): recursively maps two lists, l and l', of labeled and weighted arcs; descends into identically labeled subtrees
  – Treeplicity(i, t): decreases the similarity with decreasing simplicity
V. C. Bhavsar, H. Boley and L. Yang, A weighted-tree similarity algorithm for multi-agent systems in e-business environments. Computational Intelligence, 2004, 20(4): 584-602.
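The interplay of Treesim and Treemap can be sketched in Python as follows. This is a simplified illustration, not the Relfun implementation: the node-equality bonus N = 0.1 is an assumed value (it is consistent with the experiment results reported on the slides), the adjustment function A is taken to be the square root as one choice satisfying A(s) ≥ s on [0, 1], and arcs that occur in only one tree simply contribute 0 instead of going through Treeplicity. The helper `auto` is hypothetical.

```python
from math import sqrt

N = 0.1  # node-equality bonus; assumed value


def treesim(t, u, adjust=sqrt):
    """t, u = (label, {arc_label: (weight, subtree)}); leaves have empty dicts."""
    (label_t, arcs_t), (label_u, arcs_u) = t, u
    if label_t != label_u:
        return 0.0          # different root labels: no similarity
    if not arcs_t and not arcs_u:
        return 1.0          # two identical leaves
    return N + (1 - N) * treemap(arcs_t, arcs_u, adjust)


def treemap(arcs_t, arcs_u, adjust):
    """Sum A(s_i) * (w_i + w'_i) / 2 over identically labeled arcs."""
    total = 0.0
    for arc in sorted(arcs_t.keys() & arcs_u.keys()):
        (w_t, sub_t), (w_u, sub_u) = arcs_t[arc], arcs_u[arc]
        s = treesim(sub_t, sub_u, adjust)
        total += adjust(s) * (w_t + w_u) / 2
    # Simplification: arcs present in only one tree contribute 0 here;
    # the Treeplicity step for missing subtrees is not modeled.
    return total


def auto(year, weights):
    """Build an auto tree with arcs make, model, year (hypothetical helper)."""
    make_w, model_w, year_w = weights
    return ("auto", {
        "make": (make_w, ("ford", {})),
        "model": (model_w, ("explorer", {})),
        "year": (year_w, (year, {})),
    })


sim_weighted = treesim(auto("2002", (0.3, 0.2, 0.5)), auto("1999", (0.3, 0.2, 0.5)))
sim_uniform = treesim(auto("2002", (0.3333, 0.3333, 0.3334)),
                      auto("1999", (0.3333, 0.3333, 0.3334)))
```

With these assumptions the two calls give roughly 0.55 and 0.70, matching the values reported for the identical-structure experiment (trees that differ only in the year leaf).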

Similarity of simple trees
[Table: Experiments 1 and 2 compare pairs of two-arc auto trees (t1 vs. t2, t3 vs. t4) with arcs make and year, leaves ford or chrysler and 2002 or 1998, and arc weights 0.0, 0.5 and 1.0. Results shown: 0.1, 0.55 and 1.0.]

Similarity of simple trees (Cont’d)
[Table: Experiment 3. Trees t1 and t3: auto with a single arc model (1.0) to mustang. Trees t2 and t4: auto with arcs make, model and year to ford, explorer and 2000; weights shown are 0.45, 0.45 and 0.1 for t2, and 0.05 and 0.9 for t4. Results: 0.2823 (t1 vs. t2) and 0.1203 (t3 vs. t4).]

Similarity of identical tree structures
Experiment 4

Trees      Arc weights (make, model, year)    Leaves                                          Result
t1 vs. t2  0.3, 0.2, 0.5 in both trees        ford, explorer, 2002 vs. ford, explorer, 1999   0.55
t3 vs. t4  0.3333 / 0.3334 (equal thirds)     ford, explorer, 2002 vs. ford, explorer, 1999   0.7000

Similarity of complex trees
[Figure: trees t and t´, both rooted at A with arcs b, c and d to subtrees B, C and D; leaves include B1 to B4, C1 to C4, D1, E and F; arc weights include 0.25, 0.3333, 0.3334, 0.5 and 1.0. Similarity values shown: 0.8160, 0.9316, 0.8996, 0.9230, 0.9647, 0.9793.]

Similarity of complex trees (Cont’d)
[Figure: the same trees t and t´ rooted at A with subtrees B, C and D; here leaf E also appears next to F. Similarity values shown: 0.8555, 0.9626, 0.9314, 0.9499, 0.9824, 0.9902.]

Similarity of complex trees (Cont’d)
[Figure: the same trees t and t´ rooted at A; here one tree has subtrees B, * and D, where * is Don't Care. Similarity values shown: 0.9134, 0.9697, 0.9530, 0.9641, 0.9844, 0.9910.]

Node label similarity
• For inner nodes and leaf nodes
  – Exact string matching: binary result, 0.0 or 1.0
  – Permutation of strings: “Java Programming” vs. “Programming in Java”

\[
\text{similarity} = \frac{\text{number of identical words}}{\text{maximum length of the two strings}}
\]

Example: for two node labels “a b c” and “a b d e”, their similarity is 2/4 = 0.5.

Node label similarity (Cont’d)
Example: node labels “electric chair” and “committee chair” have similarity 1/2 = 0.5
• Is this semantic similarity meaningful?
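The string-permutation measure from the two slides above can be sketched in Python. The function name is hypothetical, and two details are assumptions: string length is counted in words (which is what the 2/4 example implies), and repeated words are matched as a multiset.

```python
from collections import Counter


def permutation_similarity(label_a, label_b):
    """Shared words divided by the word count of the longer label."""
    words_a, words_b = label_a.split(), label_b.split()
    if not words_a or not words_b:
        return 0.0
    # Multiset intersection counts each shared word at most as often
    # as it occurs in both labels.
    shared = sum((Counter(words_a) & Counter(words_b)).values())
    return shared / max(len(words_a), len(words_b))


abc = permutation_similarity("a b c", "a b d e")
chairs = permutation_similarity("electric chair", "committee chair")
```

Both calls return 0.5, which illustrates the concern raised on the slide: the measure is purely lexical, so “electric chair” and “committee chair” score the same as any other pair sharing half their words.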

Node label similarity – Inner nodes vs. leaf nodes
• Inner nodes: class-oriented
  – Inner node labels can be classes
  – Classes are located in a taxonomy tree
  – Taxonomic class similarity measures
• Leaf nodes: type-oriented
  – Address, currency, date, price and so on
  – Type similarity measures (local similarity measures)

Node label similarity
• Non-semantic matching
  – Exact string matching (both inner and leaf nodes)
  – String permutation (both inner and leaf nodes)
• Semantic matching
  – Taxonomic class similarity (inner nodes)
  – Type similarity (leaf nodes)

Inner node similarity – Partonomy trees
[Figure: two course trees with arcs Credit, Duration, Textbook and Tuition.
t1: root Distributed Programming; weights shown: 0.2, 0.4, 0.1, 0.3; leaves: 3, 2 months, “Introduction to Distributed Programming”, $800.
t2: root Object-Oriented Programming; weights shown: 0.1, 0.2, 0.5, 0.2; leaves: 3, 3 months, “Object-Oriented Programming Essentials”, $1000.]

Inner node similarity – Taxonomy tree
[Figure: taxonomy tree rooted at Programming Techniques, with classes General, Applicative Programming, Automatic Programming, Concurrent Programming, Sequential Programming and Object-Oriented Programming; Parallel Programming and Distributed Programming appear one level lower. Arc weights shown: 0.3, 0.5, 0.7, 0.4, 0.2, 0.3, 0.9, 0.5.]
Arc weights
• same level of a subtree: do not need to add up to 1
• assigned by human experts or extracted from documents
A. Singh, Weighted tree metadata extraction. MCS Thesis (in preparation), University of New Brunswick, Fredericton, Canada, 2005.

Inner node similarity – Taxonomic class similarity
[Figure: the taxonomy tree rooted at Programming Techniques, with red arrows marking the paths from the two compared classes up to their nearest common ancestor.]
• red arrows stop at the nearest common ancestor
• the product of subsumption factors on the two paths = 0.018
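A possible Python sketch of this measure: walk from each class up to the nearest common ancestor and multiply the subsumption factors on both paths. The taxonomy fragment below is hypothetical. In particular, which factor sits on which arc is an assumption, picked so that the pair Distributed Programming and Object-Oriented Programming yields the 0.018 quoted on the slide.

```python
# child -> (parent, subsumption factor on the arc from parent to child)
# Hypothetical fragment; the factor-to-arc assignment is assumed.
PARENT = {
    "Object-Oriented Programming": ("Programming Techniques", 0.3),
    "Concurrent Programming": ("Programming Techniques", 0.2),
    "Distributed Programming": ("Concurrent Programming", 0.3),
    "Parallel Programming": ("Concurrent Programming", 0.9),
}


def path_to_root(cls):
    """Return [(ancestor, product of factors from ancestor down to cls), ...]."""
    path, product = [(cls, 1.0)], 1.0
    while cls in PARENT:
        cls, factor = PARENT[cls]
        product *= factor
        path.append((cls, product))
    return path


def taxonomic_similarity(a, b):
    up_a = dict(path_to_root(a))
    for ancestor, product_b in path_to_root(b):
        if ancestor in up_a:   # first hit is the nearest common ancestor
            return up_a[ancestor] * product_b
    return 0.0                 # no common ancestor


sim = taxonomic_similarity("Distributed Programming", "Object-Oriented Programming")
```

A class compared with itself gets 1.0 in this sketch, because both paths are empty and the product over no factors is 1.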

Inner node similarity – Integration of taxonomy tree into partonomy trees
• Taxonomy tree
  – extra taxonomic class similarity measures
• Semantic similarity without
  – changing our partonomy similarity algorithm
  – losing taxonomic semantic similarity
• Encode (subsections of) the taxonomy tree into partonomy trees
www.teclantic.ca

Inner node similarity – Encoding taxonomy tree into partonomy tree
[Figure: encoded taxonomy tree. Root Programming Techniques with arcs Applicative Programming, Automatic Programming, Concurrent Programming, General, Object-Oriented Programming and Sequential Programming (weights shown: 0.1, 0.1, 0.15, 0.3, 0.2, 0.15); below Concurrent Programming, arcs Distributed Programming (0.6) and Parallel Programming (0.4). All leaves are * (Don't Care).]

Inner node similarity – Encoding taxonomy tree into partonomy tree (Cont’d)
[Figure: encoded partonomy trees t1 and t2. Each has root course with arcs Classification, Credit, Duration, Title and Tuition; the Classification arc (weight 0.65 shown) leads to a taxonomy subtree rooted at Programming Techniques whose leaves are * (Don't Care).
t1: credit 3, duration 2 months, title Distributed Programming, tuition $800; taxonomy subtree through Concurrent Programming and Sequential Programming down to Distributed Programming and Parallel Programming.
t2: credit 3, duration 3 months, title Object-Oriented Programming, tuition $1000; taxonomy subtree with Sequential Programming and Object-Oriented Programming.]

Leaf node similarity (local similarity)
• Different leaf node types: different type similarity measures
• Various leaf node types
  – “Price”-typed leaf nodes, e.g. for a buyer ≤ $800: [0, Max]; for a seller ≥ $1000: [Min, ∞]

Leaf node similarity (local similarity)
Example: “Date”-typed leaf nodes
[Figure: t1 = Project with start_date (0.5): Jan 20, 2004 and end_date (0.5): Feb 18, 2005; t2 = Project with start_date (0.5): May 3, 2004 and end_date (0.5): Nov 3, 2004. Similarity shown: 0.74.]

\[
DS(d_1, d_2) =
\begin{cases}
0.0 & \text{if } |d_1 - d_2| \ge 365, \\
1 - \dfrac{|d_1 - d_2|}{365} & \text{otherwise.}
\end{cases}
\]
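The date measure DS can be sketched with Python's standard `datetime.date`. The function name is hypothetical, and the final combination step is an assumption rather than something stated on the slide: it uses a node-equality bonus N = 0.1, no adjustment of the local similarities, and the two arc weights of 0.5. The pairing of dates to start_date and end_date follows one reading of the figure.

```python
from datetime import date


def date_similarity(d1, d2):
    """DS(d1, d2): 0.0 for gaps of 365 days or more, else 1 - gap/365."""
    gap = abs((d1 - d2).days)
    return 0.0 if gap >= 365 else 1 - gap / 365


start = date_similarity(date(2004, 1, 20), date(2004, 5, 3))   # gap of 104 days
end = date_similarity(date(2005, 2, 18), date(2004, 11, 3))    # gap of 107 days

# Assumed combination: root bonus N = 0.1, identity adjustment,
# both arc weights 0.5 in both trees.
N = 0.1
project_similarity = N + (1 - N) * (0.5 * start + 0.5 * end)
```

Under these assumptions `project_similarity` rounds to 0.74, the value shown for the two Project trees.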

Conclusion
• Arc-labeled and arc-weighted trees
• Partonomy similarity algorithm
  – Traverses trees top-down
  – Computes similarity bottom-up
• Node label similarity
  – Exact string matching (inner and leaf nodes)
  – String permutation (inner and leaf nodes)
  – Taxonomic class similarity (inner nodes)
    ✓ Taxonomy tree
    ✓ Encoding taxonomy tree into partonomy tree
  – Type similarity (leaf nodes)
    ✓ Date-typed similarity measures

Questions?