Weighted Partonomy-Taxonomy Trees with Local Similarity Measures for Semantic Buyer-Seller Match-Making
Lu Yang, Marcel Ball, Virendra C. Bhavsar and Harold Boley
BASeWEB, May 8, 2005

Outline
• Introduction
• Motivation
• Partonomy Similarity Algorithm
  – Tree representation
  – Tree simplicity
  – Partonomy similarity
• Experimental Results
• Node Label Similarity
  – Inner-node similarity
  – Leaf-node similarity
• Conclusion

Introduction
• Buyer-seller matching in e-business and e-learning
[Figure: a multi-agent system. Components shown: User, Web Browser, Main Server, User Info, User Profiles, User Agents, Cafe-1 with Matcher 1 through Cafe-n with Matcher n, and a link to other sites (network).]

Introduction
• An e-learning scenario
[Figure: Learner 1, Learner 2, ..., Learner n and Course Provider 1, Course Provider 2, ..., Course Provider m meet in a Cafe that contains a Matcher.]
H. Boley, V. C. Bhavsar, D. Hirtle, A. Singh, Z. Sun and L. Yang, A match-making system for learners and learning objects. Learning & Leading with Technology, International Society for Technology in Education, Eugene, OR, 2005 (to appear).

Motivation
• Metadata for buyers and sellers
  – Keywords/keyphrases
  – Tree similarity

Tree representation
• Characteristics of our trees
  – Node-labeled, arc-labeled and arc-weighted
  – Arcs are labeled in lexicographical order
  – Arc weights below each node sum to 1
[Figure: tree with root Car and arcs Make (0.3) → Ford, Model (0.2) → Explorer, Year (0.5) → 2002.]

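A minimal Python sketch of such a tree, using the Car example from this slide. The class name `Node`, the helper `is_well_formed` and the tolerance on the weight sum are hypothetical choices made for illustration, not part of the original system.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node-labeled tree whose outgoing arcs carry a label and a weight."""
    label: str
    # arcs maps arc label -> (arc weight, subtree); empty for a leaf
    arcs: dict = field(default_factory=dict)

    def is_leaf(self) -> bool:
        return not self.arcs


def is_well_formed(tree: Node, tol: float = 1e-9) -> bool:
    """Check the slide's constraints at every inner node:
    arc labels in lexicographical order and arc weights summing to 1."""
    if tree.is_leaf():
        return True
    labels = list(tree.arcs)
    weights = [w for w, _ in tree.arcs.values()]
    if labels != sorted(labels) or abs(sum(weights) - 1.0) > tol:
        return False
    return all(is_well_formed(sub, tol) for _, sub in tree.arcs.values())


car = Node("Car", {
    "Make": (0.3, Node("Ford")),
    "Model": (0.2, Node("Explorer")),
    "Year": (0.5, Node("2002")),
})
print(is_well_formed(car))
```

Keeping the arcs in a dictionary keyed by arc label makes it cheap to find identically labeled arcs in two trees, which is what the similarity algorithm later needs.
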
Tree representation – Serialization of trees
– Weighted Object-Oriented RuleML (WOO RuleML)
– XML attributes for arc weights and subelements for arc labels

  <Cterm>
    <Ctor>Car</Ctor>
    <slot weight="0.3"><Ind>Make</Ind><Ind>Ford</Ind></slot>
    <slot weight="0.2"><Ind>Model</Ind><Ind>Explorer</Ind></slot>
    <slot weight="0.5"><Ind>Year</Ind><Ind>2002</Ind></slot>
  </Cterm>

Tree serialization in WOO RuleML

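The serialization can be read back with Python's standard XML parser. The sketch below is an illustration only: the function name `parse_cterm` is hypothetical, and it assumes that each `<slot>` holds the arc label first and the filler second, with nested `<Cterm>` elements as the way to express deeper trees.

```python
import xml.etree.ElementTree as ET

WOO_RULEML = """
<Cterm>
  <Ctor>Car</Ctor>
  <slot weight="0.3"><Ind>Make</Ind><Ind>Ford</Ind></slot>
  <slot weight="0.2"><Ind>Model</Ind><Ind>Explorer</Ind></slot>
  <slot weight="0.5"><Ind>Year</Ind><Ind>2002</Ind></slot>
</Cterm>
"""


def parse_cterm(elem):
    """Turn a <Cterm> into (root label, {arc label: (weight, subtree)}).

    Assumption: the first child of a <slot> is the arc label and the
    second is the filler, either an <Ind> leaf or a nested <Cterm>.
    """
    root = elem.findtext("Ctor")
    arcs = {}
    for slot in elem.findall("slot"):
        name, filler = list(slot)
        weight = float(slot.get("weight"))
        if filler.tag == "Cterm":
            subtree = parse_cterm(filler)
        else:
            subtree = (filler.text, {})  # a leaf has no outgoing arcs
        arcs[name.text] = (weight, subtree)
    return root, arcs


car = parse_cterm(ET.fromstring(WOO_RULEML))
print(car)
```
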
Tree simplicity
[Figure: example tree over nodes A to G with arc labels a to f and arc weights between 0.1 and 0.9; annotations (0.9) at the root A, (0.45) at C and (0.225) at G; tree simplicity: 0.0563.]
– The deeper the leaf node, the less its contribution to the tree simplicity
  ✓ Depth degradation index (0.9)
  ✓ Depth degradation factor (0.5)
– Reciprocal of tree breadth
L. Yang, B. Sarker, V. C. Bhavsar and H. Boley, A weighted-tree simplicity algorithm for similarity matching of partial product descriptions (submitted for publication).

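Tree simplicity – Computation
$$\check{S}(T) = \begin{cases} D_I \cdot (D_F)^{d} & \text{if } T \text{ is a leaf node,} \\ \dfrac{1}{m} \sum_{j=1}^{m} w_j \, \check{S}(T_j) & \text{otherwise.} \end{cases}$$
– $\check{S}(T)$: the simplicity value of a single tree $T$
– $D_I$ and $D_F$: depth degradation index and depth degradation factor
– $d$: depth of a leaf node
– $m$: degree of the root node of a tree $T$ that is not a leaf
– $w_j$: arc weight of the $j$th arc below the root node of tree $T$
– $T_j$: subtree below the $j$th arc with arc weight $w_j$
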
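A minimal sketch of this computation in Python, assuming the recursion implied by the symbol list: a leaf at depth d contributes DI · DF^d, and an inner node with m arcs contributes (1/m) times the weighted sum over its subtrees. The tree shape below (a full binary tree of depth 2 over the nodes A to G) and the assignment of weights to arcs are assumptions read from the garbled figure; because the weights at each node sum to 1 and all leaves sit at depth 2, the result does not depend on that assignment.

```python
def simplicity(tree, di=0.9, df=0.5, depth=0):
    """tree = (label, {arc label: (weight, subtree)}).

    Assumed recursion: a leaf at depth d yields di * df**d; an inner node
    with m arcs yields (1/m) * sum(w_j * simplicity(T_j)).
    """
    _, arcs = tree
    if not arcs:
        return di * df ** depth
    m = len(arcs)  # breadth at this node; 1/m is its reciprocal
    total = sum(w * simplicity(sub, di, df, depth + 1)
                for w, sub in arcs.values())
    return total / m


def leaf(label):
    return (label, {})


# Hypothetical reading of the example figure.
tree = ("A", {
    "a": (0.7, ("B", {"c": (0.2, leaf("D")), "d": (0.8, leaf("E"))})),
    "b": (0.3, ("C", {"e": (0.1, leaf("F")), "f": (0.9, leaf("G"))})),
})
print(round(simplicity(tree), 4))
```

Under these assumptions the value rounds to the 0.0563 shown on the previous slide: each leaf contributes 0.225, each depth-1 subtree 0.225 / 2, and the root half of that again.
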
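Partonomy similarity – Simple trees
[Figure: tree t has root Car with arcs Make (0.3) → Ford and Model (0.7) → Mustang; tree t′ has root Car (alternatively House) with arcs Make (0.3) → Ford and Model (0.7) → Escape. Comparisons of inner nodes and of leaf nodes are annotated with 1 for identical labels and 0 for different labels.]
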
Partonomy similarity – Complex trees
[Figure: two learning-object metadata trees t and t′ with root lom. Arcs shown include educational → edu-set, general → gen-set (with language and title) and technical → tec-set (with format and platform); leaves include en, "Introduction to Oracle", "Basic Oracle", HTML and WinXP; arc weights range from 0.1 to 0.9. * : Don't Care.]
– Weighted arc similarity: $\sum_i s_i \,(w_i + w'_i)/2$
– With an adjusting function $A$, where $A(s_i) \ge s_i$: $\sum_i A(s_i)\,(w_i + w'_i)/2$

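The weighted sum over identically labeled arcs can be sketched as follows. The function name `fanout_similarity`, the sample values and the choice A = sqrt are assumptions for illustration; the square root is used here only because it satisfies A(s) ≥ s on [0, 1].

```python
import math


def fanout_similarity(pairs, adjust=math.sqrt):
    """pairs: (s_i, w_i, w'_i) for arcs with identical labels in both trees.

    Returns sum_i A(s_i) * (w_i + w'_i) / 2, i.e. each subtree similarity
    s_i is adjusted and then weighted by the arithmetic mean of the two
    arc weights.
    """
    return sum(adjust(s) * (w + w2) / 2 for s, w, w2 in pairs)


# Hypothetical fan-out with two shared arcs.
pairs = [(1.0, 0.3, 0.4), (0.25, 0.7, 0.6)]
plain = fanout_similarity(pairs, adjust=lambda s: s)
adjusted = fanout_similarity(pairs)
print(plain, adjusted)
```

Averaging the two arc weights lets both the buyer's and the seller's preferences influence how much a given subtree counts.
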
Partonomy similarity – Main functions
• Three main functions (Relfun)
  – Treesim(t, t'): recursively compares any (unordered) pair of trees; parameters N and i
  – Treemap(l, l'): recursively maps two lists, l and l', of labeled and weighted arcs; descends into identically labeled subtrees
  – Treeplicity(i, t): decreases the similarity with decreasing simplicity
V. C. Bhavsar, H. Boley and L. Yang, A weighted-tree similarity algorithm for multi-agent systems in e-business environments. Computational Intelligence, 2004, 20(4): 584-602.

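The interplay of the first two functions can be sketched in Python rather than Relfun. This is a simplified illustration, not the published algorithm: it assumes that identical inner-node labels earn a fixed fraction N and that the remaining 1 − N is filled by the weighted arc similarity; arcs present in only one tree contribute nothing here, whereas the slides hand that case to Treeplicity. The value N = 0.1 is an assumption, chosen because it reproduces the results 0.55 and 0.7000 reported for identically structured trees in the experiments.

```python
def treesim(t, t2, n=0.1, adjust=lambda s: s):
    """t, t2 = (label, {arc label: (weight, subtree)}).

    Different labels give 0.0; identical leaves give 1.0; identical inner
    labels give n + (1 - n) * (weighted similarity of shared arcs).
    """
    label, arcs = t
    label2, arcs2 = t2
    if label != label2:
        return 0.0
    if not arcs and not arcs2:
        return 1.0
    shared = arcs.keys() & arcs2.keys()  # identically labeled arcs
    mapped = sum(
        adjust(treesim(arcs[a][1], arcs2[a][1], n, adjust))
        * (arcs[a][0] + arcs2[a][0]) / 2
        for a in shared
    )
    return n + (1 - n) * mapped


def auto(year, w_make, w_model, w_year):
    return ("auto", {
        "make": (w_make, ("ford", {})),
        "model": (w_model, ("explorer", {})),
        "year": (w_year, (year, {})),
    })


t1, t2 = auto("2002", 0.3, 0.2, 0.5), auto("1999", 0.3, 0.2, 0.5)
print(round(treesim(t1, t2), 4))
```

With these weights only make and model match, so the sketch yields 0.1 + 0.9 · (0.3 + 0.2) = 0.55.
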
Similarity of simple trees
[Figure: experiments on auto trees with the two arcs make and year. Arc weights shown: 0.0, 0.5 and 1.0.
Experiment 1: t1 (make → ford, year → 2002) vs. t2 (make → chrysler, year → 1998); result 0.1.
Experiment 2: trees with make → ford and year → 2002 or 1998; results 0.55 (t1 vs. t2) and 1.0 (t3 vs. t4).]

Similarity of simple trees (Cont’d)
[Figure: Experiment 3.
t1: auto with arc model (1.0) → mustang; t2: auto with arcs make (0.45) → ford, model (0.45) → explorer, year (0.1) → 2000; result 0.2823.
t3: auto with arc model (1.0) → mustang; t4: auto with arcs make (0.05) → ford, model (0.9) → explorer, year → 2000; result 0.1203.]

Similarity of identical tree structures
[Figure: Experiment 4, auto trees with arcs make, model and year.
t1: make (0.3) → ford, model (0.2) → explorer, year (0.5) → 2002; t2: same arcs and weights with year → 1999; result 0.55.
t3 and t4: same leaves as t1 and t2 with arc weights 0.3334 and 0.3333; result 0.7000.]

Similarity of complex trees
[Figure: trees t and t′ with root A, arcs b, c, d to inner nodes B, C, D, and deeper arcs b1 to b4, c1 to c4 and d1; leaf labels include B1 to B4, C1, C3, C4, D1, E and F; arc weights 0.25, 0.3333, 0.3334, 0.5 and 1.0. Similarity values shown: 0.8160, 0.9316, 0.8996, 0.9230, 0.9647, 0.9793.]

Similarity of complex trees (Cont’d)
[Figure: the same pair of trees t and t′ with root A and inner nodes B, C, D, with a modified leaf labeling (E, F). Similarity values shown: 0.8555, 0.9626, 0.9314, 0.9499, 0.9824, 0.9902.]

Similarity of complex trees (Cont’d)
[Figure: the same pair of trees t and t′ with root A, where inner node C of t′ is replaced by * (Don't Care), giving inner nodes B, *, D. Similarity values shown: 0.9134, 0.9697, 0.9530, 0.9641, 0.9844, 0.9910.]

Node label similarity
• For inner nodes and leaf nodes
  – Exact string matching: binary result, 0.0 or 1.0
  – Permutation of strings, e.g. "Java Programming" vs. "Programming in Java":
    similarity = (number of identical words) / (maximum length of the two strings, in words)
Example: for two node labels "a b c" and "a b d e", their similarity is 2/4 = 0.5

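The string-permutation measure is small enough to sketch directly. The function name `permutation_similarity` is hypothetical, and two details are assumptions not fixed by the slide: words are compared case-sensitively, and repeated words are counted once.

```python
def permutation_similarity(a: str, b: str) -> float:
    """Number of shared words divided by the word count of the longer label."""
    words_a, words_b = a.split(), b.split()
    if not words_a or not words_b:
        return 0.0
    shared = len(set(words_a) & set(words_b))
    return shared / max(len(words_a), len(words_b))


print(permutation_similarity("a b c", "a b d e"))                   # 0.5
print(permutation_similarity("electric chair", "committee chair"))  # 0.5
```

The second call shows the weakness of the measure: two unrelated labels still score 0.5 because they share one word out of two.
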
Node label similarity (Cont’d)
Example: for the node labels "electric chair" and "committee chair", the string-permutation similarity is 1/2 = 0.5
• Is this a meaningful semantic similarity?

Node label similarity – Inner nodes vs. leaf nodes
• Inner nodes: class-oriented
  – Inner node labels can be classes
  – Classes are located in a taxonomy tree
  – Taxonomic class similarity measures
• Leaf nodes: type-oriented
  – Address, currency, date, price and so on
  – Type similarity measures (local similarity measures)

Node label similarity
• Non-semantic matching
  – Exact string matching (both inner and leaf nodes)
  – String permutation (both inner and leaf nodes)
• Semantic matching
  – Taxonomic class similarity (inner nodes)
  – Type similarity (leaf nodes)

Inner node similarity – Partonomy trees
[Figure: two course partonomy trees with arcs Credit, Duration, Textbook and Tuition.
t1: root Distributed Programming; arc weights 0.2, 0.4, 0.1 and 0.3; leaves 3 (Credit), 2 months (Duration), "Introduction to Distributed Programming" (Textbook), $800 (Tuition).
t2: root Object-Oriented Programming; arc weights 0.1, 0.2, 0.5 and 0.2; leaves 3 (Credit), 3 months (Duration), "Object-Oriented Programming Essentials" (Textbook), $1000 (Tuition).]

Inner node similarity – Taxonomy tree
[Figure: taxonomy tree with root Programming Techniques and subclasses General, Applicative Programming, Automatic Programming, Concurrent Programming, Sequential Programming and Object-Oriented Programming; Concurrent Programming has subclasses Parallel Programming and Distributed Programming. Arc weights shown: 0.3, 0.5, 0.7, 0.4, 0.2, 0.3, 0.9, 0.5.]
Arc weights
• at the same level of a subtree, do not need to add up to 1
• are assigned by human experts or extracted from documents
A. Singh, Weighted tree metadata extraction. MCS Thesis (in preparation), University of New Brunswick, Fredericton, Canada, 2005.

Inner node similarity – Taxonomic class similarity
[Figure: the Programming Techniques taxonomy tree with its arc weights; red arrows mark the paths from two classes up to their nearest common ancestor.]
• Red arrows stop at the nearest common ancestor
• The product of subsumption factors on the two paths = 0.018

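A minimal sketch of this measure, assuming the two compared classes are Distributed Programming and Object-Oriented Programming (the root labels of the two partonomy trees) and that their nearest common ancestor is Programming Techniques. The assignment of factors to arcs cannot be read reliably from the figure; the values below are hypothetical and were chosen so that the product matches the slide's 0.018.

```python
# child -> (parent, subsumption factor); factors are illustrative assumptions
PARENT = {
    "Object-Oriented Programming": ("Programming Techniques", 0.3),
    "Concurrent Programming": ("Programming Techniques", 0.2),
    "Distributed Programming": ("Concurrent Programming", 0.3),
    "Parallel Programming": ("Concurrent Programming", 0.9),
}


def path_to_root(cls):
    """Return [(ancestor, product of factors from cls up to ancestor), ...],
    starting with cls itself at product 1.0."""
    path, product = [(cls, 1.0)], 1.0
    while cls in PARENT:
        cls, factor = PARENT[cls]
        product *= factor
        path.append((cls, product))
    return path


def taxonomic_similarity(c1, c2):
    """Multiply the subsumption factors on both paths up to the nearest
    common ancestor; 0.0 if the classes share no ancestor."""
    up1 = dict(path_to_root(c1))
    for ancestor, product2 in path_to_root(c2):
        if ancestor in up1:  # first hit walking up from c2 is the nearest one
            return up1[ancestor] * product2
    return 0.0


sim = taxonomic_similarity("Distributed Programming",
                           "Object-Oriented Programming")
print(round(sim, 3))
```
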
Inner node similarity – Integration of taxonomy tree into partonomy trees
• Taxonomy tree
  – Requires extra taxonomic class similarity measures
• Semantic similarity without
  – changing our partonomy similarity algorithm
  – losing taxonomic semantic similarity
• Encode (subsections of) the taxonomy tree into partonomy trees
www.teclantic.ca

Inner node similarity – Encoding taxonomy tree into partonomy tree
[Figure: encoded taxonomy tree. Root Programming Techniques with arcs Applicative Programming, Automatic Programming, Concurrent Programming, General, Object-Oriented Programming and Sequential Programming (arc weights 0.1, 0.1, 0.15, 0.3, 0.2, 0.15); Concurrent Programming has arcs Distributed Programming (0.6) and Parallel Programming (0.4). All leaves are * (Don't Care).]

Inner node similarity – Encoding taxonomy tree into partonomy tree (Cont’d)
[Figure: encoded partonomy trees t1 and t2 with root course and arcs Classification, Credit, Duration, Title and Tuition. Leaves include 3, 2 months, Distributed Programming and $800 in one tree and 3, 3 months, Object-Oriented Programming and $1000 in the other. In each tree a taxonomy arc leads to an encoded subsection of Programming Techniques (Concurrent Programming with Distributed Programming and Parallel Programming, Sequential Programming, Object-Oriented Programming) whose leaves are * (Don't Care).]

Leaf node similarity (local similarity)
• Different leaf node types → different type similarity measures
• Various leaf node types
  – "Price"-typed leaf nodes, e.g.
    for buyer: ≤ $800, i.e. [0, Max]
    for seller: ≥ $1000, i.e. [Min, ∞]

Leaf node similarity (local similarity)
Example: "Date"-typed leaf nodes
[Figure: two Project trees with arcs start_date (0.5) and end_date (0.5). t1: start_date → Jan 20, 2004, end_date → Feb 18, 2005; t2: start_date → May 3, 2004, end_date → Nov 3, 2004. Tree similarity: 0.74.]
$$DS(d_1, d_2) = \begin{cases} 0.0 & \text{if } |d_1 - d_2| \ge 365, \\ 1 - \dfrac{|d_1 - d_2|}{365} & \text{otherwise.} \end{cases}$$

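The date measure translates directly into Python's `datetime` arithmetic, with |d1 − d2| taken as a number of days. The pairing of dates to trees is read from the garbled figure, and the combination step assumes a node-equality fraction N = 0.1 with no adjusting function; both are assumptions, although together they reproduce the 0.74 shown on the slide.

```python
from datetime import date


def date_similarity(d1: date, d2: date) -> float:
    """DS(d1, d2): 0.0 if the dates are 365 or more days apart,
    otherwise 1 - |d1 - d2| / 365."""
    gap = abs((d1 - d2).days)
    return 0.0 if gap >= 365 else 1.0 - gap / 365


start = date_similarity(date(2004, 5, 3), date(2004, 1, 20))   # 104 days apart
end = date_similarity(date(2004, 11, 3), date(2005, 2, 18))    # 107 days apart

# Combine the two leaf similarities with the arc weights 0.5 and 0.5.
n = 0.1  # assumed node-equality fraction for the identical root label Project
project = n + (1 - n) * (0.5 * start + 0.5 * end)
print(round(project, 2))  # 0.74
```
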
Conclusion
• Arc-labeled and arc-weighted trees
• Partonomy similarity algorithm
  – Traverses trees top-down
  – Computes similarity bottom-up
• Node label similarity
  – Exact string matching (inner and leaf nodes)
  – String permutation (inner and leaf nodes)
  – Taxonomic class similarity (inner nodes)
    ✓ Taxonomy tree
    ✓ Encoding taxonomy tree into partonomy tree
  – Type similarity (leaf nodes)
    ✓ Date-typed similarity measures

Questions?