ArchiWordNet: Integrating WordNet with Domain-Specific Knowledge

Luisa Bentivogli (1), Andrea Bocco (2), Emanuele Pianta (1)
(1) ITC-irst, Trento, Italy
(2) Politecnico di Torino, Italy

Outline
• ArchiWordNet: a WordNet-like thesaurus
• Adopting and adapting the MultiWordNet model
• Integrating ArchiWordNet with MultiWordNet
• Conclusion and future work

GWC 2004 - Brno, January 20-23, 2004


ArchiWordNet: a WordNet-like thesaurus
• A bilingual English/Italian thesaurus for the “Architecture and Construction” domain
  – structured according to the WordNet model
  – fully integrated with MultiWordNet
• MultiWordNet: a multilingual lexical database in which the Italian WordNet is strictly aligned with Princeton’s English WordNet

Motivation
• Still Image Server, an architecture image archive available at the Polytechnic of Turin
  – need for a thesaurus:
    • Image cataloguing (minimize subjectivity)
    • Image retrieval (minimize ambiguity)
• No exhaustive thesauri for the architecture domain are available

Why the (Multi)WordNet model?
• A rich and rigorous structure
  – synonyms
  – many relations explicitly and homogeneously encoded
• Allows for a more powerful and expressive retrieval mechanism
  – no ambiguities
  – extended search with related concepts
• Is more suitable for educational purposes

Why integrated with MultiWN?
• General and multilingual framework for the specialized knowledge
• Integrated access allowing for more flexible retrieval of the information
• Information already existing in the generic (Multi)WordNet can be exploited in the creation of the specialized one


Adopting the MultiWN model
• Sources:
  – Specialized sources
    • Art and Architecture Thesaurus (AAT)
    • Construction Indexing Manual of CI/SfB
    • International and national standards (ISO, CEN, UNI)
    • Architecture and building dictionaries
    • Domain literature
  – MultiWN itself
• Issues:
  – Reorganize specialized sources to make them compatible with the MultiWN model
  – Modify MultiWN synsets to make them suitable for representing the specialized domain

Reorganizing domain-specific sources
[Figure: AAT hierarchy compared with ArchiWN hierarchy]

Tailoring MultiWN synsets
• MultiWN synsets considered appropriate by the domain experts are included in ArchiWN
• Several options are available:
  – add synonyms to or delete synonyms from MultiWN synsets
  – modify the MultiWN definitions of the synsets
  – delete and add relations between synsets

New relations for ArchiWN
• HAS-FORM (n/n)
  – {tympanum} HAS-FORM {triangle, trigon, …}
• HAS-ROLE (n/n)
  – {metal section} HAS-ROLE {upright, vertical}
• HAS-FUNCTION (n/v)
  – {beam} HAS-FUNCTION {to hold, to support, …}
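The three relations above each carry a part-of-speech signature (n/n or n/v). A minimal sketch of how such typed relations could be represented and checked follows. The `Synset` class, the `RELATION_POS` table and the `add_relation` method are hypothetical names introduced for illustration only; they are not the MultiWN data model, and the elided synonyms ("…") on the slide are left out rather than guessed.

```python
from dataclasses import dataclass, field

# Part-of-speech signature (source, target) of each new relation,
# as given on the slide: n = noun, v = verb.
RELATION_POS = {
    "HAS-FORM": ("n", "n"),
    "HAS-ROLE": ("n", "n"),
    "HAS-FUNCTION": ("n", "v"),
}


@dataclass
class Synset:
    """Hypothetical minimal synset: a POS tag, synonyms, typed relations."""
    pos: str
    lemmas: tuple
    relations: dict = field(default_factory=dict)

    def add_relation(self, name, target):
        # Reject a link whose endpoints do not match the relation signature.
        src_pos, tgt_pos = RELATION_POS[name]
        if (self.pos, target.pos) != (src_pos, tgt_pos):
            raise ValueError(f"{name} expects {src_pos}/{tgt_pos}")
        self.relations.setdefault(name, []).append(target)


# Examples taken from the slide (only the synonyms actually listed there).
tympanum = Synset("n", ("tympanum",))
triangle = Synset("n", ("triangle", "trigon"))
beam = Synset("n", ("beam",))
support = Synset("v", ("hold", "support"))

tympanum.add_relation("HAS-FORM", triangle)
beam.add_relation("HAS-FUNCTION", support)
```

Checking the signature at insertion time is one possible design choice; it keeps a noun/verb relation such as HAS-FUNCTION from being attached between two nouns by mistake.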


Integrating ArchiWN with MultiWN
• 5,000 terms grouped in 13 semantic areas => the main ArchiWN hierarchies
  – Architectural styles
  – Materials
  – Construction products
  – Techniques
  – Tools
  – Components of buildings
  – Single buildings and building complexes
  – Physical properties
  – Conditions
  – Disciplines
  – People
  – Documents
  – Drawings and representations

Integration issues
• Identify the MultiWN nodes at which to insert the ArchiWN hierarchies
• Include the ArchiWN hierarchies in MultiWN
• Handle the overlaps between terms present in both MultiWN and ArchiWN
• Handle possible inconsistencies in the hierarchies

The integration methodology
• Basic operations
  – performed on single MultiWN synsets
• Complex procedures (plug-ins)
  – apply to entire hierarchies

Basic operations
• eclipse a synset
• tag a synset with the “architecture and construction” domain label
• add relations to or delete relations from a synset
• add or delete synonyms in a synset
• modify the synset definition
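The basic operations listed above can be sketched as small functions acting on a single synset. Everything here is an illustrative assumption: the `Synset` fields, the function names, and in particular the reading of "eclipse" as a flag that hides a synset from lookups without deleting it, which the slides do not spell out. The definition strings are the ones shown for {wall} elsewhere in this presentation.

```python
from dataclasses import dataclass, field

DOMAIN_LABEL = "architecture and construction"


@dataclass
class Synset:
    """Hypothetical synset record; not the actual MultiWN schema."""
    synonyms: list
    definition: str
    relations: set = field(default_factory=set)  # (relation name, target id)
    domains: set = field(default_factory=set)
    eclipsed: bool = False


def eclipse(synset):
    # Assumed semantics: hide the synset instead of deleting it.
    synset.eclipsed = True


def tag_domain(synset, label=DOMAIN_LABEL):
    synset.domains.add(label)


def add_relation(synset, name, target_id):
    synset.relations.add((name, target_id))


def delete_relation(synset, name, target_id):
    synset.relations.discard((name, target_id))


def add_synonym(synset, lemma):
    if lemma not in synset.synonyms:
        synset.synonyms.append(lemma)


def delete_synonym(synset, lemma):
    if lemma in synset.synonyms:
        synset.synonyms.remove(lemma)


def modify_definition(synset, text):
    synset.definition = text


def visible(synsets):
    # Eclipsed synsets are skipped when browsing the integrated resource.
    return [s for s in synsets if not s.eclipsed]


wall = Synset(
    ["wall"],
    "an architectural partition with a height and length greater than its "
    "thickness; used to divide or enclose an area or to support another "
    "structure",
)
tag_domain(wall)
modify_definition(
    wall,
    "an architectural partition with a height and length greater than its "
    "thickness; used to divide or enclose an area",
)
```

Keeping each operation local to one synset mirrors the slide's distinction between basic operations and the complex procedures that act on entire hierarchies.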

Complex procedures
• Substitutive plug-in
• Integrative plug-in
• Hyponymic plug-in
• Inverse plug-in


Results
• 13 ArchiWN semantic areas plugged into 18 MultiWN synsets
  – 11 ArchiWN semantic areas (12 hierarchies) directly plugged into MultiWN
    • 4 substitutive plug-ins
    • 8 integrative plug-ins
  – 2 ArchiWN semantic areas (6 hierarchies) required a reorganization of some MultiWN sub-hierarchies
    • 4 hyponymic plug-ins
    • 2 inverse plug-ins
    • extensive synset eclipsing
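The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below assumes, which the slide does not state explicitly, that each plug-in attaches one hierarchy at one MultiWN synset, so that plug-ins, hierarchies and plug-in synsets can be counted together. The variable names are illustrative.

```python
# Plug-in counts exactly as reported on the Results slide.
direct_plugins = {"substitutive": 4, "integrative": 8}
reorganization_plugins = {"hyponymic": 4, "inverse": 2}

# Assumption (not stated on the slide): one plug-in = one hierarchy
# attached at one MultiWN synset.
direct_hierarchies = sum(direct_plugins.values())
reorganized_hierarchies = sum(reorganization_plugins.values())
plug_in_synsets = direct_hierarchies + reorganized_hierarchies
semantic_areas = 11 + 2

print(direct_hierarchies, reorganized_hierarchies, plug_in_synsets, semantic_areas)
# prints: 12 6 18 13
```

Under that assumption the totals agree with the slide: 12 and 6 hierarchies, 18 MultiWN synsets, 13 semantic areas.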

ArchiWN up to now
• “Single buildings and building complexes” sub-hierarchy
  – 900 synsets
  – Italian and English synonyms
  – accurate definitions
• Work done manually using the MultiWN graphical interface, which allows the user
  – to modify existing synsets and relations
  – to create new synsets


Conclusions
• It is possible to integrate ArchiWN with MultiWN
• MultiWN itself can be widely exploited in the creation of ArchiWN hierarchies
• Advantages of interdisciplinary cooperation
  – with respect to specialized thesauri
    • formalized structure
    • inheritance of linguistically oriented information from the generic WordNet
  – with respect to lexical resources
    • many synsets will be associated with images

Future work
• Continue enriching the “Single buildings and building complexes” hierarchy and populating the remaining hierarchies
• Industrial applications: multilingual specialized lexicon of approximately 1,000 synsets for the window and curtain wall industry
• Agreement for the future use of ArchiWN by the Piemonte region in the cataloguing of its architectural cultural heritage

Details

Direct plug-ins (Sub = substitutive, Int = integrative)

ArchiWN semantic area            MultiWN synset(s)
Architectural styles             architectural style/1
Materials                        material/1, substance/1
Construction products            building material/1
Techniques                       technique/1
Tools                            tool/1
Physical properties              physical property/1
Conditions                       condition/1
Disciplines                      discipline/1
People                           person/
Documents                        document/1
Drawings and representations     drawing/2, representation/2

Plug-in type labels as they appear on the slide: Sub, Sub, Int (first group); Int, Int (second group)

Reorganizations
• Components of buildings: structure/1, component/3, region/1 (Hypo)
• Single buildings and building complexes: structure/AWN, building/1, building complex/1 (Hypo, Inverse)

Term overlapping
• ITC-irst provides the Polytechnic with lists of terms:
  – synsets tagged with the “architecture” label in WN-Domains
  – hyponyms of WordNet plug-in synsets
• WN-Domains: 2,595 synsets
  – Architecture = 155 synsets
    • Town planning = 444 synsets
    • Building industry = 1,541 synsets
    • Furniture = 455 synsets
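As a quick consistency check on the WN-Domains figures above, the four per-domain counts add up to the stated total of 2,595. The dictionary name below is illustrative; the numbers are taken directly from the slide.

```python
# Synset counts per WN-Domains label, as listed on the slide.
wn_domains_counts = {
    "Architecture": 155,
    "Town planning": 444,
    "Building industry": 1541,
    "Furniture": 455,
}

total = sum(wn_domains_counts.values())
print(total)  # prints: 2595
```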

Hyponyms of plug-in synsets

ArchiWN semantic area            MultiWN synset(s)
Architectural styles             architectural style/1
Materials                        material/1, substance/1
Construction products            building material/1
Techniques                       technique/1
Tools                            tool/1
Physical properties              physical property/1
Conditions                       condition/1
Disciplines                      discipline/1
People                           person/
Documents                        document/1
Drawings and representations     drawing/2, representation/2

Plug-in type markers as listed on the slide: S, S, S, I, I, I, I, I
Hyponym counts as listed on the slide: 12; 1,266; 6,054; 95; 301; 103; 1,721; 464; 6,068; 328; 26; 159

Reorganization of:
– Components of buildings
– Single buildings and building complexes
[Diagram: MultiWN nodes entity/1, object/1, artifact/1, part/4, location/1, structure/1, component/3, region/1, with eclipsing; hypo plug-ins for structure (AWN), architectural component and architectural space; inverse plug-ins for building/1 and building complex/1; further nodes: building element; room, area, building space; open space]

Modifying MultiWN definition
• WordNet: {wall – “an architectural partition with a height and length greater than its thickness; used to divide or enclose an area or to support another structure”}
• After modification [diagram with ISA links among partition, divider, support, wall, structural_wall, bearing_wall]:
  – wall: “an architectural partition with a height and length greater than its thickness; used to divide or enclose an area”
  – structural_wall, bearing_wall: “any wall supporting a floor or the roof of a building”