Probabilistic Parsing Ling 571 Fei Xia Week 5: 10/25-10/27/05
Outline • Lexicalized CFG (recap) • Hw 5 and Project 2 • Parsing evaluation measures: ParseVal • Collins’ parser • TAG • Parsing summary
Lexicalized CFG recap
Important equations
Lexicalized CFG • Lexicalized rules: • Sparse data problem – First generate the head – Then generate the unlexicalized rule
Lexicalized models
An example • he likes her
Head-head probability
Head-rule probability
Estimate parameters
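A minimal sketch of relative-frequency (maximum likelihood) estimation for the two parameter types. It assumes a head-rule probability of the form P(rule | category, head) and a head-head probability of the form P(head | category, parent head); these conditioning contexts, the toy events, and the function name relative_frequency are illustrative assumptions, not the course's definitions. No smoothing is applied.

```python
from collections import Counter

def relative_frequency(events):
    """Estimate P(outcome | context) from a list of (context, outcome) pairs."""
    joint = Counter(events)
    context_totals = Counter(context for context, _ in events)
    return {(c, o): n / context_totals[c] for (c, o), n in joint.items()}

# Hypothetical head-rule events: ((category, head), unlexicalized rule)
head_rule_events = [
    (("VP", "likes"), "VP -> V NP"),
    (("VP", "likes"), "VP -> V NP"),
    (("VP", "likes"), "VP -> V"),
]
# Hypothetical head-head events: ((category, parent head), head of the child)
head_head_events = [
    (("NP", "likes"), "her"),
    (("NP", "likes"), "he"),
]

p_rule = relative_frequency(head_rule_events)
p_head = relative_frequency(head_head_events)
print(p_rule[(("VP", "likes"), "VP -> V NP")])
print(p_head[(("NP", "likes"), "her")])
```

With real treebank counts many contexts are seen rarely or never, which is why smoothing is needed on top of these raw estimates.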
Building a statistical tool • Design a model: – Objective function: generative model vs. discriminative model – Decomposition: independence assumption – The types of parameters and parameter size • Training: estimate model parameters – Supervised vs. unsupervised – Smoothing methods • Decoding:
Team Project 1 (Hw 5) • Form a team: programming language, schedule, expertise, etc. • Understand the lexicalized model • Design the training algorithm • Work out the decoding (parsing) algorithm: augment the CYK algorithm. • Illustrate the algorithms with a real example.
Team Project 2 • Task: parse real data with a real grammar extracted from a treebank. • Parser: PCFG or lexicalized PCFG • Training data: English Penn Treebank Sections 02-21 • Development data: Section 00
Team Project 2 (cont) • Hw 6: extract PCFG from the treebank • Hw 7: make sure your parser works given real grammar and real sentences; measure parsing performance • Hw 8: improve parsing results • Hw 10: write a report and give a presentation
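The PCFG extraction step can be sketched as reading off one rule per tree node and normalizing the counts per left-hand side. This is a minimal sketch, assuming each tree is a single bracketed string with no empty categories or function tags; real Penn Treebank files need extra preprocessing. The function names and the two-tree toy treebank are hypothetical.

```python
from collections import Counter

def parse_tree(text):
    """Parse one bracketed tree into nested (label, children); leaves are strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # skip the closing ")"
        return (label, children)

    return read()

def count_rules(tree, counts):
    """Add one count for the rule at each node of the tree."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if not isinstance(child, str):
            count_rules(child, counts)

def extract_pcfg(trees):
    """Return {(lhs, rhs): probability} estimated by relative frequency."""
    counts = Counter()
    for text in trees:
        count_rules(parse_tree(text), counts)
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

toy_treebank = [
    "(S (NP (PN they)) (VP (V draft) (NP (N policies))))",
    "(S (NP (PN they)) (VP (V sleep)))",
]
pcfg = extract_pcfg(toy_treebank)
for (lhs, rhs), p in sorted(pcfg.items()):
    print(f"{lhs} -> {' '.join(rhs)}  {p:.3f}")
```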
Parsing evaluation measures
Evaluation of parsers: ParseVal • Labeled recall: • Labeled precision: • Labeled F-measure: • Complete match: % of sentences where recall and precision are both 100% • Average crossing: # of crossing brackets per sentence • No crossing: % of sentences with no crossing brackets.
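The formulas for the first three measures did not survive on this slide. The standard definitions, which agree with the worked examples that follow, are given here as a reference sketch; a constituent counts as correct when its label and span both match a gold-standard constituent.

```latex
\mathrm{LR} = \frac{\#\,\text{correct constituents in the parser output}}
                   {\#\,\text{constituents in the gold standard}}
\qquad
\mathrm{LP} = \frac{\#\,\text{correct constituents in the parser output}}
                   {\#\,\text{constituents in the parser output}}
\qquad
F = \frac{2 \cdot \mathrm{LP} \cdot \mathrm{LR}}{\mathrm{LP} + \mathrm{LR}}
```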
An example Gold standard: (VP (V saw) (NP (Det the) (N man)) (PP (P with) (NP (Det a) (N telescope)))) Parser output: (VP (V saw) (NP (NP (Det the) (N man)) (PP (P with) (NP (Det a) (N telescope)))))
ParseVal measures • Gold standard: (VP, 1, 6), (NP, 2, 3), (PP, 4, 6), (NP, 5, 6) • System output: (VP, 1, 6), (NP, 2, 6), (NP, 2, 3), (PP, 4, 6), (NP, 5, 6) • Recall=4/4, Prec=4/5, crossing=0
A different annotation Gold standard: (VP (V saw) (NP (Det the) (N’ (N man))) (PP (P with) (NP (Det a) (N’ (N telescope))))) Parser output: (VP (V saw) (NP (Det the) (N’ (N man) (PP (P with) (NP (Det a) (N’ (N telescope)))))))
ParseVal measures (cont) • Gold standard: (VP, 1, 6), (NP, 2, 3), (N’, 3, 3), (PP, 4, 6), (NP, 5, 6), (N’, 6, 6) • System output: (VP, 1, 6), (NP, 2, 6), (N’, 3, 6), (PP, 4, 6), (NP, 5, 6), (N’, 6, 6) • Recall=4/6, Prec=4/6, crossing=1
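The measures can be computed directly from the two constituent lists. A minimal sketch, assuming constituents are (label, start, end) triples with inclusive word positions as on the slide. The function name parseval is hypothetical, and this is not EVALB's implementation; EVALB applies further conventions set in its parameter file.

```python
def parseval(gold, system):
    """Return (labeled recall, labeled precision, F-measure, crossing count)."""
    gold_left = list(gold)
    matched = 0
    for constituent in system:
        if constituent in gold_left:
            gold_left.remove(constituent)  # a gold constituent matches only once
            matched += 1
    recall = matched / len(gold)
    precision = matched / len(system)
    f = 2 * precision * recall / (precision + recall) if matched else 0.0

    def crosses(a, b):
        # Two spans cross if they overlap but neither contains the other.
        (s1, e1), (s2, e2) = a, b
        return s1 < s2 <= e1 < e2 or s2 < s1 <= e2 < e1

    # Count system constituents that cross at least one gold constituent.
    crossing = sum(
        any(crosses((s, e), (gs, ge)) for _, gs, ge in gold)
        for _, s, e in system
    )
    return recall, precision, f, crossing

gold = [("VP", 1, 6), ("NP", 2, 3), ("N'", 3, 3),
        ("PP", 4, 6), ("NP", 5, 6), ("N'", 6, 6)]
system = [("VP", 1, 6), ("NP", 2, 6), ("N'", 3, 6),
          ("PP", 4, 6), ("NP", 5, 6), ("N'", 6, 6)]
recall, precision, f, crossing = parseval(gold, system)
print(recall, precision, f, crossing)
```

On these lists the sketch reproduces the slide's figures: recall 4/6, precision 4/6, and one crossing bracket, caused by the system's (N’, 3, 6) overlapping the gold (NP, 2, 3).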
EVALB • A tool that calculates ParseVal measures • To run it: evalb -p parameter_file gold_file system_output • A copy is available in my dropbox • You will need it for Team Project 2
Summary of parsing evaluation measures • ParseVal is widely used; F-measure is the most important measure • The results depend on annotation style • EVALB is a tool that calculates ParseVal measures • Other measures are used too, e.g., accuracy of dependency links
History-based models
History-based models • History-based approaches map (T, S) into a decision sequence • Probability of tree T for sentence S is:
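The equation itself is missing from this slide. The usual chain-rule form for history-based models is sketched below, where d_1, ..., d_n is the decision sequence for (T, S); the symbol names are assumptions.

```latex
P(T, S) = \prod_{i=1}^{n} P(d_i \mid d_1, \ldots, d_{i-1})
```

In practice the full history d_1, ..., d_{i-1} is replaced by a small set of features of it, which is where the independence assumptions of a particular model enter.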
History-based models (cont) • PCFGs can be viewed as a history-based model • There are other history-based models – Magerman’s parser (1995) – Collins’ parsers (1996, 1997, …) – Charniak’s parsers (1996, 1997, …) – Ratnaparkhi’s parser (1997)
Collins’ models • Model 1: Generative model of (Collins, 1996) • Model 2: Add complement/adjunct distinction • Model 3: Add wh-movement
Model 1 • First generate the head constituent label • Then generate left and right dependents
Model 1 (cont)
An example Sentence: Last week Marks bought Brooks.
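The formulas for Model 1 are missing from these slides. As the model is commonly presented, a rule with parent P, head word h, head child H, left dependents L_1 ... L_n and right dependents R_1 ... R_m is generated head-outward, with a STOP symbol ending each side. The sketch below omits the distance features and uses assumed symbol names.

```latex
P(L_n \ldots L_1\, H\, R_1 \ldots R_m \mid P, h)
  = P_h(H \mid P, h)
    \prod_{i=1}^{n+1} P_l(L_i \mid P, H, h)
    \prod_{i=1}^{m+1} P_r(R_i \mid P, H, h),
  \qquad L_{n+1} = R_{m+1} = \mathrm{STOP}

% Top-level rule of the example sentence:
% S(bought) -> NP(week) NP(Marks) VP(bought)
P_h(\mathrm{VP} \mid \mathrm{S}, \mathit{bought})
  \cdot P_l(\mathrm{NP}(\mathit{Marks}) \mid \mathrm{S}, \mathrm{VP}, \mathit{bought})
  \cdot P_l(\mathrm{NP}(\mathit{week}) \mid \mathrm{S}, \mathrm{VP}, \mathit{bought})
  \cdot P_l(\mathrm{STOP} \mid \mathrm{S}, \mathrm{VP}, \mathit{bought})
  \cdot P_r(\mathrm{STOP} \mid \mathrm{S}, \mathrm{VP}, \mathit{bought})
```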
Model 2 • Generate a head label H • Choose left and right subcat frames • Generate left and right arguments • Generate left and right modifiers
An example
Model 3 • Add traces and wh-movement • Given that the LHS of a rule has a gap, there are three ways to pass down the gap – Head: S(+gap) → NP VP(+gap) – Left: S(+gap) → NP(+gap) VP – Right: SBAR(that)(+gap) → WHNP(that) S(+gap)
Parsing results
Model    LR     LP
Model 1  87.4%  88.1%
Model 2  88.1%  88.6%
Model 3  88.1%  88.6%
Tree Adjoining Grammar (TAG)
TAG • TAG basics • Extensions of TAG – Lexicalized TAG (LTAG) – Synchronous TAG (STAG) – Multi-component TAG (MCTAG) – …
TAG basics • A tree-rewriting formalism (Joshi et al., 1975) • It can generate mildly context-sensitive languages. • The primitive elements of a TAG are elementary trees. • Elementary trees are combined by two operations: substitution and adjoining. • TAG has been used in – parsing, semantics, discourse, etc. – machine translation, summarization, generation, etc.
Two types of elementary trees • Initial tree: (S NP↓ (VP (V draft) NP↓)) • Auxiliary tree: (VP (ADVP (ADV still)) VP*)
Substitution operation
They draft policies
Adjoining operation (figure: an auxiliary tree with root Y and foot node Y* adjoins at a node labeled Y)
They still draft policies
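The two operations can be sketched on small trees. This is a minimal sketch, assuming an initial tree for "draft" with two NP substitution sites, an auxiliary tree for "still" with a VP foot node, and hypothetical NP trees for "they" and "policies". The Node class and the function names are illustrative and not taken from any TAG toolkit.

```python
import copy

class Node:
    def __init__(self, label, children=None, word=None, subst=False, foot=False):
        self.label = label
        self.children = children or []
        self.word = word      # lexical item, for leaf nodes
        self.subst = subst    # True for a substitution site (e.g. NP↓)
        self.foot = foot      # True for the foot node of an auxiliary tree (e.g. VP*)

def find(node, pred):
    """Return the first node in pre-order that satisfies pred, or None."""
    if pred(node):
        return node
    for child in node.children:
        hit = find(child, pred)
        if hit is not None:
            return hit
    return None

def substitute(tree, initial):
    """Fill the leftmost substitution site whose label matches initial's root."""
    tree, initial = copy.deepcopy(tree), copy.deepcopy(initial)
    site = find(tree, lambda n: n.subst and n.label == initial.label)
    site.children, site.word, site.subst = initial.children, initial.word, False
    return tree

def adjoin(tree, auxiliary):
    """Adjoin auxiliary at the first internal node labeled like its root."""
    tree, auxiliary = copy.deepcopy(tree), copy.deepcopy(auxiliary)
    target = find(tree, lambda n: n.label == auxiliary.label and n.children)
    foot = find(auxiliary, lambda n: n.foot)
    # The material below the target moves under the foot node ...
    foot.children, foot.foot = target.children, False
    # ... and the auxiliary tree's structure takes the target's place.
    target.children = auxiliary.children
    return tree

def words(node):
    """Read the yield of a tree from left to right."""
    if node.word is not None:
        return [node.word]
    return [w for child in node.children for w in words(child)]

draft = Node("S", [Node("NP", subst=True),
                   Node("VP", [Node("V", word="draft"), Node("NP", subst=True)])])
they = Node("NP", [Node("PN", word="they")])
policies = Node("NP", [Node("N", word="policies")])
still = Node("VP", [Node("ADVP", [Node("ADV", word="still")]),
                    Node("VP", foot=True)])

derived_plain = substitute(substitute(draft, they), policies)
derived_still = adjoin(derived_plain, still)
print(" ".join(words(derived_plain)))
print(" ".join(words(derived_still)))
```

Substitution only fills frontier nodes, whereas adjoining splices a tree into the middle of another one; that second operation is what takes TAG beyond context-free power.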
Derivation tree (figure panels: elementary trees, derived tree, derivation tree)
Derived tree vs. derivation tree • The mapping is not 1-to-1. • Finding the best derivation is not the same as finding the best derived tree.
Wh-movement • What do they draft? (figure: TAG trees for this sentence)
Long-distance wh-movement • What does John think they draft? (figure: TAG trees for this sentence)
Who did you have dinner with? (figure: TAG trees for this sentence)
TAG extensions • Lexicalized TAG (LTAG) • Synchronous TAG (STAG) • Multi-component TAG (MCTAG) • …
STAG • The primitive elements in STAG are elementary tree pairs. • Used for MT
Summary of TAG • A formalism beyond CFG • Primitive elements are trees, not rules • Extended domain of locality • Two operations: substitution and adjoining • Parsing algorithm: • Statistical parser for TAG • Algorithms for extracting TAG from treebanks.
Parsing summary
Types of parsers • Phrase structure vs. dependency tree • Statistical vs. rule-based • Grammar-based or not • Supervised vs. unsupervised • Our focus: – Phrase structure – Mainly statistical – Mainly grammar-based: CFG, TAG – Supervised
Grammars • Chomsky hierarchy: – Unrestricted grammar (type 0) – Context-sensitive grammar – Context-free grammar – Regular grammar • Human languages are beyond context-free • Other formalisms – HPSG, LFG – TAG – Dependency grammars
Parsing algorithms for CFG • Top-down • Bottom-up • Top-down with bottom-up filter • Earley algorithm • CYK algorithm – Requires the CFG to be in CNF – Can be augmented to deal with PCFG, lexicalized CFG, etc.
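One way CYK can be augmented for a PCFG is to store, for each span and nonterminal, the probability and tree of the best analysis found so far. A minimal sketch, assuming a grammar in CNF given as two dictionaries; the function name pcyk, the rule encoding, and the toy grammar are hypothetical.

```python
from collections import defaultdict

def pcyk(words, lexical, binary, start="S"):
    """Viterbi CYK for a PCFG in Chomsky normal form.

    lexical: {(A, word): prob} for rules A -> word
    binary:  {(A, B, C): prob} for rules A -> B C
    Returns (best probability, best tree), or (0.0, None) if there is no parse.
    """
    n = len(words)
    best = defaultdict(dict)  # best[(i, j)][A] = (prob, tree) for words[i:j]
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                best[(i, i + 1)][a] = (p, (a, w))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point
                for (a, b, c), p in binary.items():
                    if b in best[(i, k)] and c in best[(k, j)]:
                        pb, tb = best[(i, k)][b]
                        pc, tc = best[(k, j)][c]
                        prob = p * pb * pc
                        # Keep only the most probable analysis per nonterminal.
                        if prob > best[(i, j)].get(a, (0.0, None))[0]:
                            best[(i, j)][a] = (prob, (a, tb, tc))
    return best[(0, n)].get(start, (0.0, None))

lexical = {("NP", "they"): 0.5, ("NP", "policies"): 0.5, ("V", "draft"): 1.0}
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
prob, tree = pcyk("they draft policies".split(), lexical, binary)
print(prob, tree)
```

Looping over every binary rule at every split point keeps the sketch short; a practical parser would index rules by their right-hand side and prune low-probability entries.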
Extensions of CFG • PCFG: find the most likely parse trees • Lexicalized CFG: – Uses weaker independence assumptions – Accounts for certain types of lexical and structural dependencies
Beyond CFG • History-based models – Collins’ parsers • TAG – Tree-rewriting formalism – Mildly context-sensitive grammar – Many extensions: LTAG, STAG, …
Statistical approach • Modeling – Choose the objective function – Decompose the function: • Common equations: joint, conditional, marginal probabilities • Independence assumptions • Training: – Supervised vs. unsupervised – Smoothing • Decoding – Dynamic programming – Pruning
Evaluation of parsers • Accuracy: ParseVal • Robustness • Resources needed • Efficiency • Richness
Other things • Converting into CNF: – CFG – PCFG – Lexicalized CFG • Treebank annotation – Tagset: syntactic labels, POS tags, function tags, empty categories – Format: indentation, brackets
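For a PCFG, the binarization step of CNF conversion can preserve tree probabilities by giving the original probability to the first new rule and probability 1.0 to the rest. A minimal sketch covering only that step (unit productions and terminals mixed into long rules are not handled); the function name and the naming scheme for new symbols are assumptions.

```python
def binarize(rules):
    """Binarize PCFG rules that have more than two right-hand-side symbols.

    rules: {(lhs, rhs_tuple): prob}. A rule A -> B C D with probability p
    becomes A -> B A|C_D (p) and A|C_D -> C D (1.0).
    """
    out = {}
    for (lhs, rhs), p in rules.items():
        while len(rhs) > 2:
            new_sym = f"{lhs}|{'_'.join(rhs[1:])}"
            out[(lhs, (rhs[0], new_sym))] = p
            # The remainder is generated deterministically from the new symbol.
            lhs, rhs, p = new_sym, rhs[1:], 1.0
        out[(lhs, rhs)] = p
    return out

rules = {("VP", ("V", "NP", "PP")): 0.3, ("VP", ("V", "NP")): 0.7}
cnf_rules = binarize(rules)
for (lhs, rhs), p in cnf_rules.items():
    print(lhs, "->", " ".join(rhs), p)
```

After parsing, the new symbols can be removed from the tree to recover the original flat rule, so the evaluation is still done against the treebank's annotation.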