Gelbe Seiten Thesaurus and ISO 25964 relationships in VocBench 3
Roland Wingerter, Gelbe Seiten Marketing GmbH
EU Publication Office Workshop, Luxembourg, June 4th/5th, 2019
Topics
• Migration of YP Thesaurus to VocBench 3
• Classes and relationships used
• Benefits
• Discussion of possible changes to the data model
• Future developments
Migration of YP Thesaurus to VocBench 3
Task
• Migrate a term-oriented thesaurus with 18 relationship types, two of them with relation weights
• Languages: German, (English), (French)
• Diverse ordering systems grouping thesaurus terms:
  • 150 subject groups
  • 16 clusters
  • 2 two-level hierarchies for access to headings (for the online application and the mobile app, respectively)
• Various custom attributes for terms
• Ability to export the thesaurus in zThes XML format
Mapping of thesaurus elements to VB 3
Thesaurus element | Mapping in VB 3 | Purpose
Topical Hierarchies | gs:Topic, skos:ConceptScheme (1st level); TopConcept (2nd level) | Top-down access to headings
Clusters | gs:Cluster, skos:ConceptScheme | Data monitoring
Subject Groups | gs:SubjectGroup, skos:ConceptScheme | Internal structuring
Heading | skos:Concept | Indexing and Search
Access Term | gs:AccessTerm subClassOf owl:Thing | Indexing and Search
Controlled KW | gs:ControlledKeyword subClassOf owl:Thing | (Indexing and) Search
Brand | gs:Brand a skos:Concept | (Indexing and Search)
Custom attributes | Instances of OWL classes | Information for users / internal structuring
Mapping of hierarchical relations
English tag | Meaning | German tag | Representation in VB 3 | Used for | Frq | Pct
BT / NT | Broader / Narrower Term | OB / UB | skos:broader / skos:narrower | (TopConcept) | n/a |
BTG / NTG | Broader / Narrower Term generic | OBA / UBA | iso-thes:broaderGeneric / iso-thes:narrowerGeneric | Heading – Heading | 22,111 | 70.5%
BTP / NTP | Broader / Narrower Term partitive | OBP / UBP | iso-thes:broaderPartitive / iso-thes:narrowerPartitive | Heading – Heading | 420 | 1.3%
(BTR) / (NTR) | Broader / Narrower Term restricted | OBB / UBB | gs:broaderRestricted / gs:narrowerRestricted | Heading – Heading | 62 | 0.2%
(BTO) / (NTO) | Broader / Narrower Term (other) | OBZ / UBZ | gs:broaderOther / gs:narrowerOther | Heading – Heading | 8,748 | 27.9%
Sum | | | | | 31,341 | 100.0%
Mapping of associative relations
English tag | Meaning | German tag | Representation in VB 3 | Used for | Frq
RT | Related Term | VB | skos:related | Heading – Heading | 14,437
(BRD) / (PRD) | Brand / Product or Service | MRK / PRD | gs:brand / gs:productOrService | | 3,035
? | (Warning) | VV | gs:differentFrom | Brand – Heading | 194
? | Antonym | GG | gs:antonym | Heading – Heading | 20
(KWH) / (HKW) | Controlled Keyword – Heading / Heading – Controlled Keyword | QKB / BQK | gs:ControlledKeywordRelationship (gs:controlledKeyword, gs:keyword, gs:weight) | Nondescriptor – Heading | 2,358
Mapping of equivalence relations
English tag | Meaning | German tag | Representation in VB 3 | Used for | Frq
USE / UF | Use synonym or near synonym / Used for synonym or near synonym | BS / BF | skosxl:prefLabel / skosxl:altLabel | Nondescriptor – Heading | 24,226
(ID) | Identical meaning | ID | skos:exactMatch | Heading – Heading | 1,382
(QS) | Near synonym | QS | skos:closeMatch | Heading – Heading | 396
USE / UFC | Use combination of simple descriptors / Used for combination of simple descriptors | BK | iso-thes:CompoundEquivalence (iso-thes:usePlus, iso-thes:UFPlus) | Nondescriptor – Heading | 161
(UA) / (UFA) | 1. Use (one of) several alternative descriptors for an ambiguous term. 2. Nondescriptor pointing to several descriptors that will be found in a search. | BSA / BFA | gs:AccessRelationship (gs:access, gs:entry, gs:weight) | Nondescriptor – Heading | 36,480
Class frequency
Class | Frq
skos:Concept | 22,355
gs:Brand | 2,190
gs:AccessEntry | 1,142
skosxl:Label | 69,135
skos:ConceptScheme | 287
gs:Cluster | 17
gs:Topic | 102
gs:SubjectGroup | 167
gs:AccessRelationship | 34,610
gs:ControlledKeywordRelationship | 2,358
iso-thes:CompoundEquivalence | 80
Relation frequency
By type:
Type | Frq | Pct
ASSOCIATIVE | 20,044 | 17.6%
EQUIVALENT | 62,645 | 54.9%
HIERARCHIC | 31,341 | 27.5%
Sum | 114,030 | 100.0%
Hierarchic relations:
Relation | Frq | Pct
BTG / NTG | 22,111 | 70.5%
BTP / NTP | 420 | 1.3%
(BTR) / (NTR) | 62 | 0.2%
(BTO) / (NTO) | 8,748 | 27.9%
Sum | 31,341 | 100.0%
Equivalence relations:
Relation | Frq | Pct
USE / UF | 24,226 | 38.7%
(ID) | 1,382 | 2.2%
(QS) | 396 | 0.6%
USE / UFC | 161 | 0.3%
(UA) / (UFA) | 36,480 | 58.2%
Sum | 62,645 | 100.0%
Associative relations:
Relation | Frq | Pct
RT | 14,437 | 72.0%
Brand – Product | 3,035 | 15.1%
DifferentFrom | 194 | 1.0%
Antonym | 20 | 0.1%
Controlled Keyword – Heading | 2,358 | 11.8%
Sum | 20,044 | 100.0%
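The percentages per relation type follow directly from the frequencies on this slide. As a quick plausibility check, the following minimal Python sketch recomputes them; the function name `shares` is only illustrative and not part of the presentation.

```python
# Frequencies per relation type, taken from the slide.
RELATION_COUNTS = {
    "ASSOCIATIVE": 20044,
    "EQUIVALENT": 62645,
    "HIERARCHIC": 31341,
}


def shares(counts):
    """Return the total and each entry's share in percent (one decimal)."""
    total = sum(counts.values())
    pct = {name: round(100 * n / total, 1) for name, n in counts.items()}
    return total, pct


total, pct = shares(RELATION_COUNTS)
print(total)
print(pct)
```

The same function can be applied to the per-relation tables (hierarchic, equivalence, associative) to check their percentage columns.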
Benefits of the migration to VocBench
• The thesaurus can be maintained and enhanced on an open, flexible, web-based platform.
• It is based on standards (OWL, RDFS, SKOS-XL), so there is no vendor lock-in.
• It opens up new perspectives (LOD, OntoLex dictionaries, knowledge representation).
• Using BTG / NTG for query expansion seems promising and can easily be tested with SPARQL.
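To illustrate the idea of query expansion via BTG / NTG, here is a minimal Python sketch that collects a heading together with all of its narrower-generic descendants. The toy hierarchy and the function name `expand_query` are assumptions made for illustration; they are not taken from the YP thesaurus. In SPARQL, a comparable traversal could presumably be expressed with a property path over iso-thes:narrowerGeneric.

```python
from collections import deque

# Hypothetical toy data: heading -> narrower-generic (NTG) headings.
NARROWER_GENERIC = {
    "Craftsmen": ["Carpenters", "Painters"],
    "Carpenters": ["Cabinet makers"],
}


def expand_query(heading, narrower, max_depth=None):
    """Return the heading plus its NTG descendants in breadth-first order.

    max_depth limits how many NTG levels are followed (None = no limit).
    """
    result = [heading]
    queue = deque([(heading, 0)])
    while queue:
        current, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not descend below the requested level
        for child in narrower.get(current, []):
            if child not in result:  # guards against cycles and duplicates
                result.append(child)
                queue.append((child, depth + 1))
    return result


full = expand_query("Craftsmen", NARROWER_GENERIC)
one_level = expand_query("Craftsmen", NARROWER_GENERIC, max_depth=1)
print(full)
print(one_level)
```

Limiting the depth is one possible way to trade recall against precision when testing the expansion.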
Changes proposed to the YP model
• Make these a subclass of skos:Concept
• Create class gs:AttributeValue
Discussion of CompoundEquivalence
AccessRelationship – proposed change
• v. 1.0: Current state
• v. 1.1: Inverse relation added
• v. 1.2: More changes
Compound Equivalence proposal OR
AccessRelationship in VB 3
Facets of YP Headings (prefLabels)
Facet | Examples | Frq | Pct
Thing, Product | Cars, Food, Windows, Pharmaceutical Products | 9,244 | 49%
Activity, Service | Car repair, Health care, Tax Consulting | 5,212 | 28%
Organization | Banks, Hotels, Pharmacies, Restaurants | 2,217 | 12%
Person, Profession | Carpenters, Lawyers, Photographers | 1,764 | 9%
Domain | Astronomy, Geology | 256 | 1%
Others | n/a | 217 | 1%
Sum | | 18,910 | 100%
• Facets can be used for quality control: headings linked by BTG / NTG must belong to the same facet.
• Think of these facets as classes.
• Organization is an Agent, Person is an Agent.
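The quality-control rule above (BTG / NTG pairs must share a facet) can be sketched as a simple check. This is a minimal illustration in Python: the facet assignments for "Cars" and "Car repair" follow the table, while the heading "Sports cars", the BTG pairs and the function name `facet_violations` are hypothetical.

```python
# Facet per heading; "Sports cars" is a hypothetical heading for illustration.
FACET_OF = {
    "Cars": "Thing, Product",
    "Sports cars": "Thing, Product",
    "Car repair": "Activity, Service",
}

# Hypothetical BTG pairs as (narrower heading, broader heading).
BTG_PAIRS = [
    ("Sports cars", "Cars"),
    ("Car repair", "Cars"),
]


def facet_violations(btg_pairs, facet_of):
    """Return BTG pairs whose headings differ in facet or lack a facet."""
    violations = []
    for narrower, broader in btg_pairs:
        narrower_facet = facet_of.get(narrower)
        broader_facet = facet_of.get(broader)
        if (
            narrower_facet is None
            or broader_facet is None
            or narrower_facet != broader_facet
        ):
            violations.append((narrower, broader))
    return violations


violations = facet_violations(BTG_PAIRS, FACET_OF)
print(violations)
```

Treating a missing facet as a violation is a design choice of this sketch; it surfaces headings that still need to be classified.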
Challenge I
• Headings that are Things/Products are underspecified: the German YP do not differentiate between manufacturing, wholesale and retail.
• In the past we tried to remedy this by adding numerous nondescriptors that are compounds with the words listed below.
Frequent components of altLabels: handel@de, hersteller@de, händler@de, fabrik@de, verkauf@de, großhandel@de, laden@de, geschäft@de, handlung@de
Challenge II
• A thesaurus struggles to cope with lexical and syntactic variety. Given the frequency of compounds in German on the one hand, and synonyms, spelling variants, abbreviations and full forms on the other, it is tedious and hopeless to simply add as many altLabel variations as possible.
• Decomposition using a resource like WordNet (and possibly rules for the automatic generation of altLabels) would be a more systematic approach.
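As a rough illustration of what rule-based generation and decomposition of altLabels might look like, the following Python sketch uses the component words listed under Challenge I. It is a deliberately naive assumption: it ignores German linking elements (Fugenelemente) unless one is passed in explicitly, does not consult WordNet or any other lexical resource, and the function names are hypothetical.

```python
# Component words from the "Challenge I" slide (without the @de language tag).
COMPONENTS = [
    "handel", "hersteller", "händler", "fabrik", "verkauf",
    "großhandel", "laden", "geschäft", "handlung",
]


def alt_label_candidates(stem, components=COMPONENTS, linking=""):
    """Naively build compound altLabel candidates: stem + linking + component."""
    return [f"{stem}{linking}{component}" for component in components]


def split_compound(label, components=COMPONENTS):
    """Split a label into (modifier, component) if it ends in a known component.

    Longer components are tried first so that "großhandel" wins over "handel".
    Returns None if no component matches or nothing would remain as modifier.
    """
    lowered = label.lower()
    for component in sorted(components, key=len, reverse=True):
        if lowered.endswith(component) and len(label) > len(component):
            return label[: -len(component)], component
    return None


candidates = alt_label_candidates("Möbel")
print(candidates[:3])
print(split_compound("Möbelgroßhandel"))
```

Candidates produced this way would still need review, since not every generated compound is an attested German word.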
Description of Headings
• Describe what an Agent does
• Add missing default Actions to a Heading
Heading | Action | Object
Bakeries | Retail | Bakery Product
Bakery goods | Wholesale | Bakery Product
Pharmacy | Retail | Pharmaceutical Product
Car parts and accessories | Manufacturing, Wholesale, Retail |
Description of Headings
• Decompose compounds and make their role explicit
• Use Lexical Concepts / Word Senses from WordNet. Inherit synonyms, semantic relations and more.
Heading | Action | Object
Car repair | Repair | Car
Health care | Care | Health
Tax consulting | Consulting | Tax
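The Action / Object decomposition shown in the table can be sketched as a small data structure. The Python example below is only an illustration: the class `HeadingDescription` and the rule "first word is the Object, second word is the Action" are assumptions that happen to fit the three English examples on this slide; a real implementation would rely on a lexical resource such as WordNet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeadingDescription:
    """A heading with its roles made explicit (hypothetical structure)."""
    heading: str
    action: str
    obj: str


def describe_two_word_heading(heading):
    """Naive rule for 'Object Action' headings such as 'Car repair'."""
    words = heading.split()
    if len(words) != 2:
        raise ValueError(f"expected a two-word heading, got: {heading!r}")
    obj, action = words
    return HeadingDescription(heading, action.capitalize(), obj.capitalize())


descriptions = [
    describe_two_word_heading(h)
    for h in ("Car repair", "Health care", "Tax consulting")
]
for d in descriptions:
    print(d.heading, "->", "Action:", d.action, "| Object:", d.obj)
```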
Use LOD resources
Describe concepts with Lexical Concepts
A simple example
Car Repair
• Action: Repair
• Object: Car
• Brand: [empty]
• Action: [empty]
• Object: [empty]
Let's try some more …
Action | Object | Brand | Action | Object