Building a Knowledge Base of Morphosyntactic Terminology
Will Lewis, Scott Farrar, and Terry Langendoen, University of Arizona
IRCS Workshop on Linguistic Databases, University of Pennsylvania, Dec. 11-13, 2001
2/13/2022 1
E-MELD components
• Electronic Metastructure for Endangered Languages Data
  – Metadata standards for endangered language data
  – Markup recommendations for structuring endangered language data for presentation and analysis on the World Wide Web
  – “Best practice” showcase
Linguistic markup
• Major extant recommendations are those of the Text Encoding Initiative (TEI).
  – Speech transcription (Chapter 11)
  – Print dictionaries (12)
  – Linking, segmentation, and alignment (14)
  – Simple analytic markup (15)
  – Feature structures (16)
  – Feature system declarations (26)
TEI data interchange model
• Project “A” data in A’s markup scheme: map to TEI interchange format
• Send to Project “B”: map to Project “B” local markup scheme
• Result: Project “A” data in B’s markup scheme
• Projects that use the TEI format directly avoid translations at both ends.
Data comparison model
[Figure: diagram with the labels "User", "language resource A", and "language resource B"]
Metatagging
• Comparing linguistic data sets which use different markup definitions
  – Assume a common markup language, XML.
  – Methods are needed by which the semantics as well as the syntax of each tagging scheme can be compared.
• Both A and B use “absolutive”. Do they mean the same thing?
• A uses “possessive” where B uses “genitive”. How do they compare?
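One way to picture the comparison problem is to map each project's tags onto shared concepts and compare the concepts rather than the tag strings. The sketch below is only an illustration: the concept names and both mappings are hypothetical and are not taken from the slides, and the claim that “possessive” and “genitive” land on the same concept is an assumption made for the example.

```python
# Hypothetical mappings from each project's local tags to shared concepts.
# The concept names and the mappings themselves are illustrative assumptions.
TAGS_A = {"absolutive": "AbsolutiveCase", "possessive": "GenitiveCase"}
TAGS_B = {"absolutive": "AbsolutiveCase", "genitive": "GenitiveCase"}


def compare_tags(tag_a, tag_b, map_a, map_b):
    """Compare two local tags by the shared concepts they are mapped to."""
    concept_a = map_a.get(tag_a)
    concept_b = map_b.get(tag_b)
    if concept_a is None or concept_b is None:
        return "unmapped"
    if concept_a == concept_b:
        return "same concept"
    return "different concepts"


print(compare_tags("absolutive", "absolutive", TAGS_A, TAGS_B))
print(compare_tags("possessive", "genitive", TAGS_A, TAGS_B))
print(compare_tags("possessive", "absolutive", TAGS_A, TAGS_B))
```

Identical tag strings are not trusted here: two projects that both write “absolutive” compare as equal only if their mappings point to the same concept.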
Knowledge Base
• A collection of structured objects along with the tools necessary for manipulating them (Sowa 2000).
• Key components of a KB
  – Ontology
  – Computational tools
  – Logic
Ontology
• Concept hierarchy
• Enriched taxonomy
• Acts as an interlingua
• Eliminates need for a gold standard of linguistic terminology
• Desiderata
  – Representational language of ontology must interface with tools of the knowledge base.
  – Grounded in linguistic practice
The E-MELD Ontology
• Contents
  – Knowledge base of linguistic terminology
  – Current version concentrates specifically on morphosyntactic terminology
• Structure
  – Inheritance, is-a
  – Multiple inheritance
  – Mereological (part-whole) relations
  – And other relations
Ontology Fragment
[Figure: ontology fragment rooted at TOP, with the nodes Constituent, Morpheme, Construction, Clause, Phrase, and Word, and links labeled "Is part of"]
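The relation types named for the ontology (is-a inheritance, multiple inheritance, and part-whole links) can be sketched with two small adjacency tables. This is a minimal illustration, not the E-MELD ontology itself: the arrangement of the concepts below is an assumption, since the original figure's layout is not recoverable here, and "Clitic" is a hypothetical concept added only to show a node with two parents.

```python
# Assumed is-a links (child -> list of parents). The arrangement is illustrative.
IS_A = {
    "Constituent": ["TOP"],
    "Morpheme": ["Constituent"],
    "Word": ["Constituent"],
    "Phrase": ["Constituent"],
    "Clause": ["Constituent"],
    "Clitic": ["Morpheme", "Word"],  # hypothetical: demonstrates multiple inheritance
}

# Assumed part-whole links (part -> list of wholes).
PART_OF = {
    "Morpheme": ["Word"],
    "Word": ["Phrase"],
    "Phrase": ["Clause"],
}


def reachable(start, relation):
    """Return every concept reachable from `start` by following `relation` transitively."""
    seen = set()
    stack = list(relation.get(start, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(relation.get(node, []))
    return seen


print(sorted(reachable("Clitic", IS_A)))
print(sorted(reachable("Morpheme", PART_OF)))
```

Keeping is-a and part-of in separate tables keeps the two relations from being conflated: a Morpheme is part of a Word in this sketch, but it is not a kind of Word.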
The Knowledge Base Editor
• Protégé:
  – Provides an extensible architecture for developing knowledge systems
  – Uses CLIPS-style formatting for representing data (class, slot, instance, frame, facet)
  – Provides JDBC (Java Database Connectivity) support
  – Is widely used in the knowledge engineering community
  – Is freeware and open source
Protégé Development Environment
Considerations for an Ontology of Linguistic Terminology
• Category subsuming the linguistic objects and processes (e.g., Morpheme, Suffixation)
• Category subsuming what may be called grammatical properties of the first category (e.g., AccusativeCase, PerfectiveAspect)
• Category subsuming all possible tags a researcher wishes to use to describe a given language (e.g., Acc, Gen, VBD)
Building a Knowledge Base
• Starting from scratch is very time-consuming and difficult
• Ontology development is a bottleneck
• Solution: choose an existing ontology
• SUMO (Suggested Upper Merged Ontology)
  – Goal: standard upper ontology that can be used for any application
  – Top-level categories already defined
E-MELD System Architecture
[Figure: components shown are the End user GUI, the EMELD Query Engine, the Knowledge Base of Terminology (including Ontology), and XML data for Hopi, Nahuatl, and Yaqui]
• Sample query: "ergative P2"
• Returns: examples of ergative constructions from P2 languages
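A query of this kind could be resolved roughly as follows: the engine looks up each resource's local tags in the knowledge base, then returns the examples whose tags map to the requested concept. The sketch below is a guess at the general shape, not the E-MELD implementation. The XML layout, the `props` attribute, the resource names, the tag mappings, and the placeholder example text are all hypothetical, and treating "P2" as a language-level property is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical knowledge base: (resource, local tag) -> shared concept.
TAG_TO_CONCEPT = {
    ("LangA", "ERG"): "ErgativeCase",
    ("LangB", "erg"): "ErgativeCase",
    ("LangB", "abs"): "AbsolutiveCase",
}

# Hypothetical XML resources with placeholder content (no real language data).
RESOURCES = {
    "LangA": '<resource props="P2"><ex tag="ERG">placeholder A1</ex></resource>',
    "LangB": '<resource props="P2"><ex tag="erg">placeholder B1</ex>'
             '<ex tag="abs">placeholder B2</ex></resource>',
    "LangC": '<resource props=""><ex tag="ERG">placeholder C1</ex></resource>',
}


def query(concept, language_property):
    """Return (resource, example text) pairs matching a concept and a language property."""
    hits = []
    for name, xml_text in RESOURCES.items():
        root = ET.fromstring(xml_text)
        # Skip resources that do not declare the requested language-level property.
        if language_property not in root.get("props", "").split():
            continue
        for ex in root.iter("ex"):
            # Resolve the local tag through the knowledge base before comparing.
            if TAG_TO_CONCEPT.get((name, ex.get("tag"))) == concept:
                hits.append((name, ex.text))
    return hits


print(query("ErgativeCase", "P2"))
```

Because matching goes through the tag-to-concept table, LangA's "ERG" and LangB's "erg" are both found by a single query, while LangC is skipped for lacking the requested property.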
Future Directions
• Subject the existing morphosyntactic ontology to peer review
• Merge existing tagsets into the ontology
• Extend the ontology beyond the morphosyntactic domain
• Develop tools that use the ontology