Morphology beyond inflection Building a wordformation based lexicon
Morphology beyond inflection. Building a wordformation based lexicon for Latin Eleonora Litta Modignani Picozzi MSCA Fellow CIRCSE Research Centre Università Cattolica del Sacro Cuore, Milano Marie Skłodowska-Curie grant agreement No 658332-WFL
Inflectional vs derivational morphology (for Latin) • Word formation relations are not treated by computational lexical resources or morphological analysers. • Word formation sits halfway between morphology and semantics. • Word formation rules (WFRs) do not just build new words: they build words that share a semantic core, which can be useful for NLP tools. • The first tools of this kind were built for modern languages (Czech, Croatian, German).
The Word Formation Latin (WFL) project • Started by Passarotti and Mambrini in 2012 • Awarded Marie Curie Fellowship • Definitive derivational lexicon for Latin.
The lexical basis: LemLat • Morphological analyser for Latin. • Data collected from three dictionaries: – Georges and Georges, Ausführliches Lateinisch-Deutsches Handwörterbuch (1913-1918) – Glare, Oxford Latin Dictionary (1982) – Gradenwitz, Laterculi vocum latinarum (1904). • 40,014 lexical entries, 432 lemmas, and 26,205 lemmas from Forcellini’s Onomasticon (1940). • Lexical basis for WFL: LES (LExical Segment) archive and List of Lemmas.
Word formation Item-and-Arrangement model: word forms are 1) simple morphemes or 2) concatenations of morphemes satisfying the following conditions: • Baudouin’s assumption that both base and affixes are lexical elements (i.e. they are both morphemes) • They are dualistic, having both form and meaning (Bloomfield’s “sign-base” morpheme theory) • They both exist in a lexicon (Bloomfield’s “lexical morpheme” theory)
Formalising WFL Word formation based lexicon built in three steps: 1) WFRs are detected 2) WFRs are applied to the lexical data 3) Results are manually checked and evaluated
Types of word formation 1) Derivation: a. Affixal: – Prefixal: duco => con-duco – Suffixal: amo => am-a-bil-is b. Conversion: bonus (adj.) => bonum (noun) 2) Compounding: magnus + facio = magnificus
Detection of word formation rules (WFRs) • Semi-automatic finding of affixal rules (Passarotti & Mambrini 2012). • List of possible PoS combinations for conversion (e.g. V-to-V, V-to-N, V-to-A, etc.) and compounding (A+V=N, A+V=A, etc.). • WFRs formalised into a table according to category of change, type of word formation, input PoS and output PoS. • 50 additional rules found so far while working through the data.
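As an illustration of the rule table described above, here is a minimal sketch of how one row per WFR might be represented. The field names (category, wf_type, input_pos, output_pos, affix) are assumptions made for this example, not the column names of the actual WFL table; the four sample rules are taken from the examples on the "Types of word formation" slide.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class WFR:
    """One word formation rule (hypothetical layout, not the WFL schema)."""
    category: str                 # "derivation" or "compounding"
    wf_type: str                  # "prefix", "suffix", "conversion", "compound"
    input_pos: Tuple[str, ...]    # one PoS for derivation, two for compounding
    output_pos: str
    affix: Optional[str] = None   # None for conversion and compounding


RULES = [
    WFR("derivation", "prefix", ("V",), "V", affix="con"),      # duco => con-duco
    WFR("derivation", "suffix", ("V",), "A", affix="bil"),      # amo => am-a-bil-is
    WFR("derivation", "conversion", ("A",), "N"),               # bonus => bonum
    WFR("compounding", "compound", ("A", "V"), "A"),            # magnus + facio = magnificus
]


def label(rule: WFR) -> str:
    """Render a rule in the slide notation, e.g. 'V-to-V' or 'A+V=A'."""
    if rule.category == "compounding":
        return "+".join(rule.input_pos) + "=" + rule.output_pos
    return rule.input_pos[0] + "-to-" + rule.output_pos


labels = [label(r) for r in RULES]
print(labels)
```

Keeping the input PoS as a tuple lets derivation and compounding rules share one structure, which is one possible reading of "category of change" in the table.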
WFR list
MySQL relational database [diagram labels: wfr; LES archive; Lemmas; MySQL queries; wfr_rule(s); WFR Master Table]
WFR paired candidates: V-to-V prefix sub-
Finding prefixal and suffixal candidates
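A rough sketch of how prefixal candidates such as the V-to-V sub- pairs might be proposed: pair every verb that starts with the prefix with the verb left over once the prefix is stripped, if that verb is also in the lemma list. This is plain string matching written for illustration; the project itself runs MySQL queries over the LES archive and the list of lemmas, and the function name and toy lemma list below are assumptions.

```python
def prefix_candidates(lemmas, prefix, pos="V"):
    """Return (base, derived) pairs where derived == prefix + base.

    lemmas: iterable of (lemma, pos) pairs. Only lemmas tagged `pos`
    are considered, on both sides of the pair.
    """
    pool = {lemma for lemma, p in lemmas if p == pos}
    pairs = []
    for derived in sorted(pool):
        if derived.startswith(prefix) and len(derived) > len(prefix):
            base = derived[len(prefix):]
            if base in pool:
                pairs.append((base, derived))
    return pairs


# Toy lemma list (hypothetical sample, not project data).
LEMMAS = [
    ("duco", "V"), ("subduco", "V"),
    ("eo", "V"), ("subeo", "V"),
    ("iacio", "V"), ("subicio", "V"),   # vowel change: missed by plain concatenation
    ("limo", "V"), ("sublimo", "V"),    # proposed, but probably derived from sublimis
    ("sublimis", "A"),
]

pairs = prefix_candidates(LEMMAS, "sub")
print(pairs)
```

The toy data shows both failure modes: subicio is missed because the base vowel changes, and limo/sublimo is proposed although it is most likely not a genuine sub- derivation, which is why the candidate pairs still need manual checking.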
Manual checking and disambiguating • Conversion N2-to-N1: focaria (n.) ‘kitchen maid’ • Conversion A-to-N1: focarius (n.) ‘kitchen servant’ • Conversion A-to-N2: focarius (adj.) ‘belonging to fire’
Evaluation • Precision is higher for rules involving lower morphotactic mutation (e.g. prefixal rules) • Precision is lower for obscure WFRs (e.g. compounding) • Precision can also depend on workflow • Recall will need to be calculated once we have finished finding WFRs
Viewing the relations • Visualisation query system (by Chris Culy) • Browsing options: – By WFR – By Affix – By PoS – By Lemma
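For reference, a minimal sketch of the two measures mentioned above, assuming the automatically proposed pairs and the manually confirmed pairs are both available as sets. The pairs used here are invented toy data, not project results.

```python
def precision(proposed, confirmed):
    """Share of proposed pairs that manual checking confirmed."""
    proposed = set(proposed)
    if not proposed:
        return 0.0
    return len(proposed & set(confirmed)) / len(proposed)


def recall(proposed, gold):
    """Share of all true pairs (gold) that were proposed."""
    gold = set(gold)
    if not gold:
        return 0.0
    return len(set(proposed) & gold) / len(gold)


# Toy data (hypothetical): three proposed pairs, two confirmed by hand,
# and one true pair that the automatic step missed.
proposed = {("duco", "subduco"), ("eo", "subeo"), ("limo", "sublimo")}
confirmed = {("duco", "subduco"), ("eo", "subeo")}
gold = confirmed | {("iacio", "subicio")}

p = precision(proposed, confirmed)
r = recall(proposed, gold)
print(round(p, 3), round(r, 3))
```

Recall needs the full set of true pairs, which is only known once rule finding is complete; precision can be computed rule by rule as manual checking proceeds.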
Thank you! WFL team • Eleonora Litta • Marco Passarotti Visualisations • Chris Culy (http://chrisculy.net/) DB Engineering • Paolo Ruffolo Website • http://progetti.unicatt.it/progetti-milan-wfl-home • https://www.facebook.com/wordformationlatin/ WFL has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 658332-WFL.