Across Languages and Paradigms: New Perspectives on Resource Acquisition, Grammar Engineering and Application
ACL Workshop on Deep Linguistic Processing, Prague, June 28, 2007
Anette Frank, Computational Linguistics Department, University of Heidelberg
Deep Linguistic Processing: Decades of Research and Development
- Formalisms
- Algorithms
- Processing platforms
- Grammar engineering
- Linguistic research
- (Application)
Deep Linguistic Processing: Tremendous Developments
- Open ends, but a solid base to work on
- Scalable and efficient systems
- Broad linguistic coverage
- Reconciling robustness and precision
- Sometimes different, sometimes similar strategies
Deep Linguistic Processing: Where Do We Stand?
Are we ready to cope with applications? Can we join forces?
Research questions:
- Diversity: commonalities and differences
- Common research themes
- Shared perspectives: resource acquisition, cross-framework exchange, applications
Granularity – Ambiguity – Complexity: The Bermuda Triangle of Deep Linguistic Processing
Complexity, granularity, ambiguity
Granularity – Ambiguity – Complexity: The Bermuda Triangle of Deep Linguistic Processing
- Complexity: processing
- Granularity: linguistic modelling
- Ambiguity: representation and selection
Granularity – Ambiguity – Complexity: "Packed" Disjunctions in CFG Chart Parsing (Maxwell & Kaplan, 1989)
Ambiguity as disjunction in the lexicon:
fish: N (↑ PRED) = 'fish' { (↑ NUM) = sg | (↑ NUM) = pl }.
sheep: N (↑ PRED) = 'sheep' { (↑ NUM) = sg | (↑ NUM) = pl }.
Complexity: the readings multiply.
- The sheep-sg liked the fish-sg
- The sheep-sg liked the fish-pl
- The sheep-pl liked the fish-sg
- The sheep-pl liked the fish-pl
Granularity – Ambiguity – Complexity: "Packed" Disjunctions in CFG Chart Parsing
Complexity is reduced by packed (factorised) readings:
The sheep {sg | pl} liked the fish {sg | pl}
The disjunctive entries for fish and sheep are the same as on the previous slide.
Granularity – Ambiguity – Complexity: "Packed" Disjunctions in CFG Chart Parsing
Boolean constraint solving: each disjunct is indexed by a choice variable.
- sheep: p: sg, ¬p: pl
- fish: q: sg, ¬q: pl
The packed structure is solved under (p ∨ ¬p) ∧ (q ∨ ¬q).
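The contexted-fact idea behind packed disjunctions fits in a few lines of Python. This is a minimal illustration, not Maxwell & Kaplan's algorithm. The encoding of contexts as (variable, polarity) pairs is a hypothetical choice, and so are the names `contexted_facts` and `unpack`. Each fact is stored once under its choice literal, and full readings are produced only when the choice space is enumerated.

```python
from itertools import product

# Hypothetical encoding: a context literal is (choice_variable, polarity);
# ("p", True) stands for p, ("p", False) for ¬p.
contexted_facts = [
    (("p", True), ("sheep", "NUM", "sg")),
    (("p", False), ("sheep", "NUM", "pl")),
    (("q", True), ("fish", "NUM", "sg")),
    (("q", False), ("fish", "NUM", "pl")),
]
choice_vars = ["p", "q"]  # (p ∨ ¬p) ∧ (q ∨ ¬q): every assignment is a solution

def unpack(facts, variables):
    """Return one reading per truth assignment to the choice variables."""
    readings = []
    for values in product([True, False], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        # Keep only the facts whose context literal holds under this assignment.
        reading = {
            (word, feature): value
            for (var, polarity), (word, feature, value) in facts
            if assignment[var] == polarity
        }
        readings.append(reading)
    return readings

readings = unpack(contexted_facts, choice_vars)
for r in readings:
    print("The sheep-" + r[("sheep", "NUM")], "liked the fish-" + r[("fish", "NUM")])
```

This prints the four multiplied readings listed earlier, in the same order. With two ambiguities the packed and unpacked sizes happen to coincide (four facts, four readings). With n independent binary choices, however, the packed form grows as 2n facts while the number of readings grows as 2**n. That is why unpacking is best deferred until selection.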
Granularity – Ambiguity – Complexity: "Packed" Disjunction in Unification-based Parsing (Oepen & Carroll, 2000)
Granularity of agreement: person (1st, 2nd, 3rd) × number (sg, pl)
Ambiguity via disjunctive feature structures, e.g. for a non-3rd-sg verb (non-3rdsg-verb):
AGR { [NUMBER sg, PERSON 1st | 2nd] | [NUMBER pl] }
Granularity – Ambiguity – Complexity: "Packed" Disjunction in Unification-based Parsing (Oepen & Carroll, 2000)
Disjunctive types eliminate disjunctions in feature space.
- Person hierarchy with intermediate types 1st-or-3rd and non-3rd over 1st, 2nd, 3rd
- Number: sg, pl
non-3rdsg-verb: AGR { [NUMBER sg, PERSON non-3rd] | [NUMBER pl] }
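Here is a minimal Python sketch of how a single type can stand in for a disjunction. It rests on a simplifying assumption: each agreement type is modelled extensionally, as the set of atomic (person, number) combinations it covers. Unification then becomes set intersection, and an empty result is a unification failure. The type names and the helpers `unify` and `name_of` are illustrative. They are not the type hierarchy of any particular grammar.

```python
from itertools import product

PERSONS = ("1st", "2nd", "3rd")
NUMBERS = ("sg", "pl")
ALL_AGR = frozenset(product(PERSONS, NUMBERS))

# Each named type denotes the (person, number) pairs it is compatible with.
TYPES = {
    "agr": ALL_AGR,
    "1sg": frozenset({("1st", "sg")}),
    "3sg": frozenset({("3rd", "sg")}),
    "pl": frozenset((person, "pl") for person in PERSONS),
    # One type replaces the disjunction {[NUM sg, PERS non-3rd] | [NUM pl]}.
    "non-3sg": ALL_AGR - {("3rd", "sg")},
}

def unify(t1, t2):
    """Intersect two type denotations; return None on unification failure."""
    result = TYPES[t1] & TYPES[t2]
    return result or None

def name_of(denotation):
    """Return the named type with exactly this denotation, or None."""
    for name, den in TYPES.items():
        if den == denotation:
            return name
    return None

# A non-3sg verb such as 'like' with subjects of different agreement types:
print(name_of(unify("non-3sg", "pl")))   # pl
print(name_of(unify("non-3sg", "1sg")))  # 1sg
print(unify("non-3sg", "3sg"))           # None
```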
Granularity – Ambiguity – Complexity: Processing Largely Solved, Start Worrying About SELECTION
- Complexity: processing
- Granularity: linguistic modelling
- Ambiguity: representation and selection
Granularity – Ambiguity – Complexity: Applications Need the Correct Reading
Across all frameworks:
- Processing (packed) ambiguity
- Efficient algorithms for unpacking
- Reading selection: statistical, or preference-based (OT)
Multilingual Grammar Development
- ParGram, Matrix/DELPH-IN
- Principles of grammar engineering and best practice
- Cross-lingual research: formalising grammar within a unified theory
- Formalism, platform and theory development
Grammar Engineering: Multilingual Grammar Development
- Improved algorithms and tools
- New insights and modelling via new phenomena
- Cross-linguistic variation and generalisations
- Linguistic diversity: exploring "new" linguistic properties
Multilingual Grammar Development: Applications
- Web search, e-science, entailment recognition, QA, email response, text summarization, CALL (computer-assisted language learning), machine translation
- Grammar engineering
- Linguistic diversity: exploring "new" linguistic properties
Deep Linguistic Processing: Where Do We Stand?
Are we ready to cope with applications? Can we join forces?
Research questions:
- Diversity: commonalities and differences
- Novel common research themes
- Shared perspectives: resource acquisition, cross-framework exchange, applications
Diversity: Grammar Formalisms
- A small number of "established" frameworks
- Lexicalisation, unification, generalisation, semantics, computation
- Joint efforts and sharing resources
Frameworks: MG, CG, CUG, CCG, GPSG, HPSG, FTAG, TG/GG, TAG, LTAG, FUG, DG, LFG, PATR
Diversity
- Early 90s: "migration"
- Millennium: cross-framework evaluation, standardised corpora and new evaluation measures
- Common research agendas: multilingual grammar engineering, statistical modelling, robustness and lexical acquisition, grammar induction
- Cross-framework collaboration and interchange: would it be fruitful? How? Or are we too different, after all?
Learning from Diversity
- Different frameworks, similar tendencies
- Different frameworks, different foci and problems
- Grammar architectures, formalisms and processing algorithms, and their impact on research issues and directions
Learning from Diversity
- Views on grammar architecture: integrated, parallel, derivation
- Abstraction techniques
- Main representation levels: constituents, functions, argument structure, semantics
- Concentration on special aspects: theory-driven?
- Breadth of languages and phenomena
- Techniques for induction, robustness, precision, coverage
Diversity and Foci (to be taken with a grain of salt)
- Architecture: integrated (CCG, HPSG); parallel (LFG, LTAG)
- Abstraction/generalisation: lexical types? (CCG); types (HPSG); templates (LFG); tree families (LTAG)
- Main focus: semantics (CCG); syntax and semantics (HPSG); f-structure (LFG); derivation (LTAG)
- Special devices: type raising (CCG); word order domains (HPSG); functional uncertainty (LFG); adjunction (LTAG)
- Breadth of languages: two cells marked "?"
- Computation: supertagging (CCG); packing (HPSG, LFG); supertagging (LTAG)
- Induction; statistical selection
Views on Grammar Architecture: Integrated (HPSG, (C)CG) vs. Parallel (LFG, TAGs)
HPSG/CCG analyses solve syntax and semantics in concord, e.g.:
- Semantics of prepositions
- Intersective or scopal modifiers
- Semantics of specifiers and quantifiers
- Semantics of tense, aspect, case
This needs more competence and makes it harder to treat new languages quickly.
Views on Grammar Architecture: Integrated (HPSG, (C)CG) vs. Parallel (LFG, TAGs)
LFG/LTAG analyses may concentrate on syntactic aspects in isolation:
- Work can be split between syntax and semantics experts
- Makes it easier to look at various languages from a comparative perspective
- Might overlook important syntax-semantics interactions
Views on Grammar Architecture: Main Representation Levels at the Syntax-Semantics Interface
- HPSG and (C)CG: semantic composition tightly joined with constituent structure
- Cross-linguistic variation: word order and non-configurational languages; generalisation across languages
- LFG and L/FTAG: function- or dependency-driven semantic composition
Main Representation Levels: Lexicalisation, Argument Structure
- CCG: lexical types
- HPSG: lexical types
- LFG: lexical entry
- LTAG: tree
Each focuses differently on semantics, arguments, constituents and functions; comparative cross-lingual aspects.
Special Phenomena
- Abstract "correspondence" of framework-specific constructs and techniques
- Work out differences in complex areas that strain all formalisms
Special Phenomena: Straining Theories
Coordination:
- Shared constituents: subcategorisation and semantics
- Agreement: feature indeterminacy and feature resolution
- Asymmetric coordination
- (Non-constituent coordination)
Complex predicate formation
Special Phenomena: Coordination, Shared Constituents (Subcategorisation and Semantics)
- Derivation-oriented approach: special operation on trees, special treatment in parsing
- Functional approach: distribution of features across set elements
- Semantics: resource sharing
- Set-valued features and coindexation (Cat vs. Index)
Special Phenomena: Coordination in CCG, Shared Constituents (Subcategorisation and Semantics)
Coordination: and := (X\X)/X
John  eats       cookies  and   drinks     beer
NP    (S\NP)/NP  NP       conj  (S\NP)/NP  NP
"eats cookies" ⇒ S\NP; "drinks beer" ⇒ S\NP; coordination ⇒ S\NP; with "John" ⇒ S
- Type raising: flexibly building constituents
- Resource sharing in semantic composition
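A small CKY-style recogniser in Python makes the CCG treatment concrete. It is a sketch under stated assumptions, not a full CCG parser:
- Categories are nested tuples.
- The rules are limited to forward and backward application, forward composition, and a coordination rule. That rule instantiates `and := (X\X)/X` with the category X of its right conjunct.
- Type raising is applied only lexically, and only as NP ⇒ S/(S\NP).

All function names are hypothetical. So is the second test sentence, "John likes and Mary hates beer", which does not appear on the slide.

```python
from itertools import product

NP, S, CONJ = "NP", "S", "CONJ"

def fwd(result, arg):
    """Build the category result/arg."""
    return (result, "/", arg)

def bwd(result, arg):
    """Build the category result\\arg."""
    return (result, "\\", arg)

TV = fwd(bwd(S, NP), NP)  # transitive verb: (S\NP)/NP

def is_complex(cat, slash):
    return isinstance(cat, tuple) and cat[1] == slash

def combine(left, right):
    """Apply the binary rules to two adjacent categories."""
    out = []
    if is_complex(left, "/") and left[2] == right:      # X/Y  Y    => X
        out.append(left[0])
    if is_complex(right, "\\") and right[2] == left:    # Y    X\Y  => X
        out.append(right[0])
    if is_complex(left, "/") and is_complex(right, "/") and left[2] == right[0]:
        out.append(fwd(left[0], right[2]))              # X/Y  Y/Z  => X/Z
    if left == CONJ:                                    # and := (X\X)/X
        out.append(bwd(right, right))
    return out

def lexical(cat):
    """Lexical type raising, restricted to NP => S/(S\\NP)."""
    return {cat, fwd(S, bwd(S, NP))} if cat == NP else {cat}

def recognise(categories):
    """CKY over a category sequence; return the categories of the whole string."""
    n = len(categories)
    chart = {(i, i + 1): lexical(c) for i, c in enumerate(categories)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = set()
            for k in range(i + 1, j):
                for left, right in product(chart[(i, k)], chart[(k, j)]):
                    cell.update(combine(left, right))
            chart[(i, j)] = cell
    return chart[(0, n)]

# John eats cookies and drinks beer: constituent (VP) coordination
print(S in recognise([NP, TV, NP, CONJ, TV, NP]))   # True
# John likes and Mary hates beer: needs type raising + composition
print(S in recognise([NP, TV, CONJ, NP, TV, NP]))   # True
print(S in recognise([NP, NP]))                     # False
```

The second sentence succeeds only because type raising and forward composition turn "John likes" and "Mary hates" into constituents of category S/NP. The coordination rule can then conjoin them. This is the flexible constituency the slide refers to.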
Tendencies
- CCG, LTAG: semantics construction via derivation vs. constituency
- LFG, HPSG: agreement via distribution vs. identity
Coordination: Feature Indeterminacy
Ich habe gegessen, was übrig bleibt. ('I have eaten what remains.')
"was" is at once OBJ (acc) of "gegessen" and SUBJ (nom) of "bleibt".
Feature-based accounts, LFG: closed sets
(↑ CASE) = {nom, acc}
acc ∈ (↑ OBJ CASE)
nom ∈ (↑ SUBJ CASE)
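A short Python sketch shows the contrast between an atomic CASE value and a closed set. It is a minimal illustration under one assumption: the shared f-structure of "was" is reduced to its CASE value. Equations are then checked by simple equality, and membership constraints by set membership. The helper names are hypothetical.

```python
def atomic_case_ok(required_cases):
    """Equations (↑ GF CASE) = c on one shared, atomic CASE value."""
    value = None
    for case in required_cases:
        if value is not None and value != case:
            return False  # e.g. acc vs. nom: unification clash
        value = case
    return True

def set_case_ok(case_set, required_cases):
    """Membership constraints c ∈ (↑ GF CASE) on one shared closed set."""
    return all(case in case_set for case in required_cases)

# 'was' is OBJ of 'gegessen' (acc) and SUBJ of 'bleibt' (nom) at once.
required = ["acc", "nom"]
print(atomic_case_ok(required))                          # False
print(set_case_ok(frozenset({"nom", "acc"}), required))  # True
print(set_case_ok(frozenset({"acc"}), required))         # False
```

An unambiguously accusative form (the last call) fails exactly where the indeterminate set {nom, acc} succeeds.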
Coordination: Feature Indeterminacy
Ich habe gegessen, was übrig bleibt. ('I have eaten what remains.')
HPSG: typed feature structures, complex types and subsumption; "was" bears the complex case type p-nom-acc.
Feature Indeterminacy and Resolution
- (Typed) feature-based accounts: closed sets or complex types
- Feature distribution vs. identity, and resolution in semantics
- Feature resolution: closed sets vs. complex types
- Closest-conjunct agreement: LFG and HPSG accounts
- Non-feature-centered theories (TAG, CCG) need (similar) special devices
Coordination: Asymmetry
Interaction of constituency, word order and subcategorisation
VP coordination:
Der Jäger [ging in den Wald] und [fing einen Hasen].
The hunter [went into the forest] and [caught a rabbit].
Asymmetric coordination:
[In den Wald ging der Jäger] und [fing einen Hasen].
[Into the forest went the hunter] and [caught a rabbit].
Coordination: Asymmetry
No distribution over conjuncts: violation of completeness
Coordination: Asymmetry
Separating constituency and word order: constituency vs. word order domains
Ordering principles: vf < cf < mc < vc
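Separating word order from constituency means that linear precedence is checked over a flat domain of field-labelled elements, not over tree daughters. Below is a minimal Python sketch that takes the field labels and their order from the slide. It adds two assumptions of its own:
- the vf and cf fields hold at most one element each, as in a German verb-second clause;
- the finite verb occupies cf.

The function name is hypothetical.

```python
FIELD_ORDER = ("vf", "cf", "mc", "vc")           # vf < cf < mc < vc
RANK = {field: i for i, field in enumerate(FIELD_ORDER)}
SINGLETON_FIELDS = ("vf", "cf")                  # assumption: at most one element each

def respects_order(domain):
    """domain: (phrase, field) pairs in surface order, independent of constituency."""
    ranks = [RANK[field] for _, field in domain]
    if any(a > b for a, b in zip(ranks, ranks[1:])):
        return False                             # a later field precedes an earlier one
    fields = [field for _, field in domain]
    return all(fields.count(f) <= 1 for f in SINGLETON_FIELDS)

first_conjunct = [("In den Wald", "vf"), ("ging", "cf"), ("der Jäger", "mc")]
print(respects_order(first_conjunct))                          # True
print(respects_order([("ging", "cf"), ("In den Wald", "vf")])) # False
```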
Coordination: Asymmetry
Asymmetric projection of a grammaticalised discourse function (GDF: SUBJ/TOPIC/FOCUS)
→ domain extension and subordination (cf. modal subordination)
Coordination: Asymmetry in CCG
Flexibly building constituents (somehow...):
In den Wald ging der Jäger und fing einen Hasen
Categories: S, NP, [(S\NP)], &, S\NP ⇒ S (sic!) (Kathol 1993)
Special Phenomena: Coordination
- CCG, HPSG: decoupling of constituency and word order; explaining word order restrictions
- LFG: classification of grammatical functions; non-isomorphism of c- and f-structures
- LTAG: ?
Special Phenomena: Complex Predicates
Argument structure composition in syntax:
- Relation changes: Marie a fait lire le livre à Jean. ('Marie had Jean read the book.') (Jean lit le livre. 'Jean reads the book.')
- Long cliticization: Marie lui a fait lire le livre. ('Marie had him read the book.')
- Long reflexives: Jean s’est fait écraser par une voiture. ('Jean got run over by a car.')
Combining two predicates into a single predicate:
- Relation changes (function demotion: <SUBJ, OBJ, SUBJ → OBJ2>)
- Extended domain: cliticization, reflexivization, passivization
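The function demotion in "faire"-causatives can be sketched as a merge of two argument structures into one. This minimal Python illustration makes simplifying assumptions:
- Grammatical functions are plain dictionary keys.
- The causee (the embedded SUBJ) is demoted to OBJ2 when the embedded verb already has an OBJ. Otherwise it becomes OBJ, as in "Marie a fait partir Jean" ('Marie made Jean leave').

The function name is hypothetical, and the sketch ignores clitics, reflexives and passives.

```python
def compose_causative(causer, embedded):
    """Fuse 'faire' with an embedded predicate into one argument structure.

    causer: the SUBJ of 'faire'; embedded: grammatical functions of the
    embedded verb. The embedded SUBJ (the causee) is demoted.
    """
    composed = {"SUBJ": causer}
    causee = embedded["SUBJ"]
    if "OBJ" in embedded:
        composed["OBJ"] = embedded["OBJ"]   # le livre
        composed["OBJ2"] = causee           # à Jean
    else:
        composed["OBJ"] = causee            # intransitive: causee becomes OBJ
    return composed

# Marie a fait lire le livre à Jean.  (Jean lit le livre.)
print(compose_causative("Marie", {"SUBJ": "Jean", "OBJ": "le livre"}))
# {'SUBJ': 'Marie', 'OBJ': 'le livre', 'OBJ2': 'Jean'}
```

Because the result is a single predicate, the OBJ2 can surface as a clitic on the causative verb, as in "Marie lui a fait lire le livre".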
Special Phenomena: Complex Predicates
Non-isomorphic projection to a functional clause nucleus
Special Phenomena: Complex Predicates
Fusion of functional nuclei:
- Restriction operator
- Splitting projections (m-/f-/a-structure)
- HPSG: word order domains
Diversity and Theory-driven Insights
- Diversity is a good thing!
- Understanding what works well, why and how
- Learning from each other's strengths (and challenges)
- There is NO WAY we can "integrate" or "migrate" theories
Deep Linguistic Processing: Where Do We Stand?
Are we ready to cope with applications? Can we join forces?
Research questions:
- Diversity: commonalities and differences
- Novel common research themes
- Shared perspectives: resource acquisition, cross-framework exchange, applications
Common Research Themes
- Stochastic selection
- Grammar induction
- Evaluation
- Robustness and lexical acquisition
Common Themes: Stochastic Selection
- Log-linear models
- Linguistically motivated features
- Studying the relation between statistical weights and OT preference models
- Creation of training material and gold standards
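A log-linear model scores each candidate analysis by a weighted sum of its feature values and normalises over all analyses of the sentence:

p(t | s) = exp(Σ_i w_i f_i(t)) / Σ_t' exp(Σ_i w_i f_i(t'))

The Python sketch below computes these probabilities for hand-set weights. The feature names and weight values are invented for illustration. No training step (such as estimating weights from a treebank) is shown.

```python
import math

def loglinear_probs(analyses, weights):
    """analyses: one {feature: value} dict per candidate parse of a sentence."""
    scores = [sum(weights.get(f, 0.0) * v for f, v in feats.items())
              for feats in analyses]
    top = max(scores)                     # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical features, e.g. OT-style preference marks turned into counts.
weights = {"pp_attach_verb": 0.8, "pp_attach_noun": 0.2, "fragment": -2.0}
analyses = [
    {"pp_attach_verb": 1},
    {"pp_attach_noun": 1},
    {"fragment": 1},
]
probs = loglinear_probs(analyses, weights)
best = max(range(len(analyses)), key=probs.__getitem__)
print(best, [round(p, 3) for p in probs])
```

Turning OT-style preference marks into features like these is one possible way to relate statistical weights to OT preferences. Here the marks simply become binary features.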
Common Themes: Grammar Induction
- Corpus-based techniques: inducing grammars from (enriched) treebanks
- Similar techniques, with framework-specific crafting: linguistic enrichment, treebank conversion, extraction
- Framework-specific "special" areas of grammar: coordination, relative clauses, ...
- Cross-framework talk
Common Themes: Evaluation
- Parseval
- Beyond Parseval: grammatical relations
- Dependency banks: PARC Dependency Bank; TIGER Dependency/LFG/RMRS Bank
- Semantics: PropBank, RMRS, DRS?
- Knowledge bases?
- (Large-scale) applications
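Evaluation beyond Parseval compares sets of grammatical-relation tuples rather than bracketings. Here is a minimal Python sketch of that comparison. It assumes a simplified (relation, head, dependent) triple format, not the exact format of any particular dependency bank.

```python
def precision_recall_f1(gold, test):
    """Compare two collections of (relation, head, dependent) triples."""
    gold, test = set(gold), set(test)
    correct = len(gold & test)
    precision = correct / len(test) if test else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# John eats cookies and drinks beer: the gold standard shares the subject.
gold = {("subj", "eat", "John"), ("obj", "eat", "cookie"),
        ("subj", "drink", "John"), ("obj", "drink", "beer")}
test = {("subj", "eat", "John"), ("obj", "eat", "cookie"),
        ("obj", "drink", "beer")}  # parser missed the shared subject
p, r, f = precision_recall_f1(gold, test)
print(p, r, round(f, 3))  # 1.0 0.75 0.857
```

Unlike a bracketing score, the missing triple directly exposes the resource-sharing error in the coordination.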
(Less) Common Theme: Robustness
- Lexical acquisition
- Relaxation of constraint-based grammars
- Learning: in vitro / in vivo (shallow and deep features)
- Error mining
- Partial parsing, fragment collection
- Preference-based ranking (OT)
- Catching errors
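Error mining looks for word forms that occur suspiciously often in sentences the grammar cannot parse. Below is a simplified, unigram-only Python sketch with two illustrative assumptions:
- the suspicion score of a form is the share of its sentences that failed to parse;
- forms seen in fewer than `min_count` sentences are ignored.

Fuller error-mining setups may also look at longer n-grams.

```python
from collections import defaultdict

def suspicion_scores(corpus, min_count=2):
    """corpus: (tokens, parsed_ok) pairs. Map each sufficiently frequent word
    form to the fraction of its sentences that failed to parse."""
    total = defaultdict(int)
    failed = defaultdict(int)
    for tokens, parsed_ok in corpus:
        for form in set(tokens):          # count each form once per sentence
            total[form] += 1
            if not parsed_ok:
                failed[form] += 1
    return {form: failed[form] / total[form]
            for form in total if total[form] >= min_count}

corpus = [
    ("the hunter went home".split(), True),
    ("the hunter went into the forest".split(), True),
    ("into the forest went the hunter and caught a rabbit".split(), False),
    ("went and caught".split(), False),
]
scores = suspicion_scores(corpus)
for form, score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:3]:
    print(form, round(score, 2))
```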
Challenges and New Directions: Large-scale Multilingual Grammar Induction from Treebanks
- Availability and design differences
- Cross-lingual reuse of annotation/conversion algorithms?
- Different grammar frameworks, special constructions
- Grammar projection across aligned corpora: within a framework, or across frameworks via a common representation
Different Types of Grammars
- Manually crafted grammars: linguistic modelling and generalisations
- Corpus-based grammars with stochastic weights
Can these types of grammars "talk to each other"?
- Aligning corpus-based and manual grammar analyses
- Stochastic weights and OT
- Interfacing analyses: pruning or voting
- Rule compaction and data analysis
- LTAG tree extraction from corpora
LTAG Induction from Treebanks
- Large-scale tree family construction
- Mapping against (converted) treebank trees
- Extraction of "known linguistic structures": isomorphic corpus-based / manually crafted grammars
- Remnants: gaps, under-researched areas, "noisy" language
- Dependency graph analysis
Cross-lingual Grammar Development
- Grammar Matrix: explorations into the space of languages
- Space of typological properties and implications (LX = arbitrary language)
- Across theories? Language-specific properties (word order, extraction, binding) vs. framework-specific constructs
- Frameworks HPSG, CCG, LFG, LTAG, DG, with language-specific instances HPSG_LX, CCG_LX, LFG_LX, LTAG_LX, DG_LX
Semantics
- Structural sentential semantics: quantifier scope, plurals, modification, tense and aspect
- Semantic formalisms: underspecification, packing
- Automatically inducing syntax-semantics interfaces?
Still far from "real" and cross-lingual semantics:
- Syntactified semantics
- Lexical semantics, compounds, multiwords, idioms and constructions
- Mapping to ontologies and KR
- Discourse and dialogue semantics
Cross-framework Interchange
- Lexicon and semantics: open-class lexica, common abstract lexical types
- Layered grammar architectures: linking lexical semantics to shared ontologies
- Interfacing with discourse and dialogue models (anaphora, discourse relations), based on common or comparable semantic representations
- Many equivalence results: CLLS, RMRS, UDRT
- Glue: LFG/LTAG/HPSG; (R)MRS, (U)DRT, λ-calculus
Applications
DNLP: precise natural language analysis in parsing and generation
- Satisfying high user expectations
- Applications are gaining depth (again): (S)MT, dialogue, e-science, QA
- Combined with statistical methods
DNLP for Applications: Parsing & Generation
- Flexibility, fine-grainedness, precision, reliability
- Expressiveness, naturalness, understanding