Probabilistic and Lexicalized Parsing
1/19/2022, COMS 4705 – Fall 2004
Probabilistic CFGs
• The probabilistic model
– Assigning probabilities to parse trees
– Uses: disambiguation, LM for ASR, faster parsing
• Getting the probabilities for the model
• Parsing with probabilities
– Slight modification to the dynamic programming approach
– Task: find the max probability tree for an input string
Probability Model
• Attach probabilities to grammar rules
• Expansions for a given non-terminal sum to 1
VP -> V         .55
VP -> V NP      .40
VP -> V NP NP   .05
– Read this as P(specific rule | LHS)
– "What's the probability that VP will expand to V, given that we have a VP?"
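These rule probabilities can be represented as a small lookup table; a minimal Python sketch using the slide's illustrative values:

```python
# A tiny PCFG fragment: each (LHS, RHS) rule maps to P(RHS | LHS).
# Probabilities are the illustrative values from the slide.
pcfg = {
    ("VP", ("V",)): 0.55,
    ("VP", ("V", "NP")): 0.40,
    ("VP", ("V", "NP", "NP")): 0.05,
}

def expansion_total(grammar, lhs):
    """Sum the probabilities of all expansions of a non-terminal;
    a well-formed PCFG gives 1.0 for every LHS."""
    return sum(p for (l, _), p in grammar.items() if l == lhs)
```

Checking `expansion_total(pcfg, "VP")` confirms the VP expansions sum to 1.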
Probability of a Derivation
• A derivation (tree) consists of the set of grammar rules that are in the parse tree
• The probability of a tree is just the product of the probabilities of the rules in the derivation
• Note the independence assumption – why don't we use conditional probabilities?
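The product-of-rules model fits in a few lines; a minimal sketch, where the grammar fragment and its probabilities are made up for illustration:

```python
from math import prod

def tree_prob(rule_list, grammar):
    """P(tree) = product of the probabilities of the rules used in the
    derivation, assuming rule applications are independent (the PCFG assumption)."""
    return prod(grammar[r] for r in rule_list)

# Hypothetical grammar fragment with made-up probabilities.
grammar = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.40,
    ("NP", ("John",)): 0.3,
    ("NP", ("Mary",)): 0.3,
    ("V", ("called",)): 1.0,
}
# Rules in the derivation of "John called Mary".
derivation = [("S", ("NP", "VP")), ("NP", ("John",)),
              ("VP", ("V", "NP")), ("V", ("called",)), ("NP", ("Mary",))]
```

With these numbers, `tree_prob(derivation, grammar)` is 1.0 × 0.3 × 0.4 × 1.0 × 0.3 = 0.036.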
Probability of a Sentence
• The probability of a word sequence (sentence) is the probability of its tree in the unambiguous case
• It is the sum of the probabilities of the possible trees in the ambiguous case
Getting the Probabilities
• From an annotated database
– E.g., the Penn Treebank
– To get the probability for a particular VP rule, just count all the times the rule is used and divide by the number of VPs overall
– What if you have no treebank (e.g., for a 'new' language)?
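The counting recipe can be written down directly; a toy sketch with made-up counts (not real treebank figures):

```python
from collections import Counter

def estimate_pcfg(rule_uses):
    """MLE estimate from a treebank:
    P(LHS -> RHS) = count(LHS -> RHS) / count(LHS)."""
    rule_counts = Counter(rule_uses)
    lhs_counts = Counter(lhs for lhs, _ in rule_uses)
    return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}

# Toy "treebank": every VP rule occurrence observed in the parses.
uses = ([("VP", ("V",))] * 11
        + [("VP", ("V", "NP"))] * 8
        + [("VP", ("V", "NP", "NP"))] * 1)
probs = estimate_pcfg(uses)
```

With 20 VPs total, this recovers P(VP -> V) = 11/20 = 0.55, and the estimates for each LHS sum to 1 by construction.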
Assumptions
• We have a grammar to parse with
• We have a large, robust dictionary with parts of speech
• We have a parser
• Given all that… we can parse probabilistically
Typical Approach
• Bottom-up (CYK) dynamic programming approach
• Assign probabilities to constituents as they are completed and placed in the table
• Use the max probability for each constituent going up the tree
How do we fill in the DP table?
• Say we're completing a final part of a parse – finding an S, e.g., via the rule S -> NP VP, where S spans positions 0 to j, NP spans 0 to i, and VP spans i to j
• The probability of S is P(S -> NP VP) × P(NP) × P(VP)
• P(NP) and P(VP) (the highlighted part on the slide) are already known, since we're doing bottom-up parsing – we don't need to recalculate the probabilities of constituents lower in the tree
Using the Maxima
• P(NP) is known
• But what if there are multiple NPs for the span of text in question (0 to i)?
• Take the max (Why?)
• This does not mean that other kinds of constituents for the same span are ignored (i.e., they might be in the solution)
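One cell-filling step – combine two sub-spans and keep only the max per non-terminal – might look like this (a hypothetical helper, binary rules only):

```python
def combine(rules, left, right):
    """Given the best probabilities for a left and right sub-span
    (dicts: non-terminal -> max probability), return the best probability
    of each parent non-terminal, keeping only the max per non-terminal."""
    best = {}
    for (parent, (b, c)), p_rule in rules.items():
        if b in left and c in right:
            p = p_rule * left[b] * right[c]
            if p > best.get(parent, 0.0):
                best[parent] = p
    return best

# Hypothetical inputs: the surviving (max) analyses of two adjacent spans.
rules = {("S", ("NP", "VP")): 1.0}
result = combine(rules, {"NP": 0.05}, {"VP": 0.2})
```

Note that `left` and `right` each carry one entry per non-terminal (the max already taken), but entries for *different* non-terminals all survive, matching the last bullet above.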
CYK Parsing: John called Mary from Denver
• S -> NP VP
• VP -> V NP
• NP -> NP PP
• VP -> VP PP
• PP -> P NP
• NP -> John, Mary, Denver
• V -> called
• P -> from
Example
[S [NP John] [VP [VP [V called] [NP Mary]] [PP [P from] [NP Denver]]]]
(the PP attaches to the VP)
Example
[S [NP John] [VP [V called] [NP [NP Mary] [PP [P from] [NP Denver]]]]]
(the PP attaches to the NP)
Example
John   called   Mary   from   Denver
(an empty CYK chart, to be filled in)
Base Case: A -> w
Fill the diagonal of the chart from the lexicon: NP (John), V (called), NP (Mary), P (from), NP (Denver).

Recursive Cases: A -> B C
Fill longer spans by combining adjacent sub-spans: VP (called Mary), S (John called Mary), PP (from Denver), NP (Mary from Denver), then two analyses for "called Mary from Denver" – VP1 via VP -> V NP and VP2 via VP -> VP PP – and finally S for the whole sentence. Spans with no analysis are marked X. The completed chart:

         John   called   Mary   from   Denver
John     NP     X        S      X      S
called          V        VP     X      VP1, VP2
Mary                     NP     X      NP
from                            P      PP
Denver                                 NP
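Putting the trace together, a probabilistic CYK parser over the slide's grammar can be sketched as follows. The rule and lexical probabilities are invented for illustration (the slides give none); note that each non-terminal's expansions, lexical ones included, sum to 1:

```python
from collections import defaultdict

# The slide's grammar, with made-up probabilities.
rules = {
    ("S",  ("NP", "VP")): 1.0,
    ("VP", ("V",  "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("NP", ("NP", "PP")): 0.2,
    ("PP", ("P",  "NP")): 1.0,
}
lexicon = {"John": ("NP", 0.3), "Mary": ("NP", 0.3), "Denver": ("NP", 0.2),
           "called": ("V", 1.0), "from": ("P", 1.0)}

def cyk(words):
    n = len(words)
    chart = defaultdict(dict)  # chart[(i, j)]: non-terminal -> best prob over words[i:j]
    for i, w in enumerate(words):              # base case: A -> w
        cat, p = lexicon[w]
        chart[(i, i + 1)][cat] = p
    for span in range(2, n + 1):               # recursive case: A -> B C
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # every split point
                for (a, (b, c)), p_rule in rules.items():
                    if b in chart[(i, k)] and c in chart[(k, j)]:
                        p = p_rule * chart[(i, k)][b] * chart[(k, j)][c]
                        if p > chart[(i, j)].get(a, 0.0):
                            chart[(i, j)][a] = p
    return chart

chart = cyk("John called Mary from Denver".split())
```

With these made-up numbers the VP-attachment analysis (VP -> VP PP) wins over the NP-attachment one, and `chart[(0, 5)]["S"]` holds the probability of the best full parse.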
Problems with PCFGs
• The probability model we're using is just based on the rules in the derivation…
– It doesn't use the words in any real way
– E.g., PP attachment often depends on the verb, its object, and the preposition (I ate pickles with a fork. I ate pickles with relish.)
– It doesn't take into account where in the derivation a rule is used
– E.g., pronouns are more often subjects than objects (She hates Mary. / Mary hates her.)
Solution
• Add lexical dependencies to the scheme…
– Add the predilections of particular words into the probabilities in the derivation
– I.e., condition the rule probabilities on the actual words
Heads
• Make use of the notion of the head of a phrase, e.g.
– The head of an NP is its noun
– The head of a VP is its verb
– The head of a PP is its preposition
• Phrasal heads
– Can 'take the place of' whole phrases, in some sense
– Define the most important characteristics of the phrase
– Phrases are generally identified by their heads
Example (correct parse)
[tree figure, labeled "Attribute grammar"]
Example (wrong)
[tree figure]
How?
• We started with rule probabilities
– VP -> V NP PP    P(rule | VP)
• E.g., the count of this rule divided by the number of VPs in a treebank
• Now we want lexicalized probabilities
– VP(dumped) -> V(dumped) NP(sacks) PP(in)
– P(r | VP ^ dumped is the verb ^ sacks is the head of the NP ^ in is the head of the PP)
– Not likely to have significant counts in any treebank
Declare Independence
• So, exploit independence assumptions and collect the statistics you can…
• Focus on capturing two things
– Verb subcategorization
• Particular verbs have affinities for particular VPs
– Objects have affinities for their predicates (mostly their mothers and grandmothers)
• Some objects fit better with some predicates than others
Verb Subcategorization
• Condition particular VP rules on their head… so for r: VP -> V NP PP,
P(r | VP)
becomes
P(r | VP ^ dumped)
• What's the count? The number of times this rule was used with dump, divided by the total number of VPs that dump appears in
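The subcategorization counts turn into probabilities the same way as before, now keyed on the (head verb, expansion) pair. The counts below are made up:

```python
from collections import Counter

def subcat_probs(observations):
    """P(rule | VP, head verb): count each (verb, expansion) pair and
    normalize by the verb's total number of VPs."""
    pair_counts = Counter(observations)
    verb_counts = Counter(v for v, _ in observations)
    return {(v, rhs): c / verb_counts[v] for (v, rhs), c in pair_counts.items()}

# Made-up observations: expansions of VPs headed by "dumped" in a toy treebank.
obs = [("dumped", ("V", "NP", "PP"))] * 6 + [("dumped", ("V", "NP"))] * 4
probs = subcat_probs(obs)
```

Here P(V NP PP | VP, dumped) = 6/10 = 0.6, capturing dump's affinity for the ditransitive-with-PP frame.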
Preferences
• Subcategorization captures the affinity between VP heads (verbs) and the VP rules they go with
• What about the affinity between VP heads and the heads of the other daughters of the VP?
Example (correct parse)
[tree figure]
Example (wrong)
[tree figure]
Preferences
• The issue here is the attachment of the PP
• So the affinities we care about are the ones between dumped and into vs. sacks and into
– Count the times dumped is the head of a constituent that has a PP daughter with into as its head, and normalize (alternatively, P(into | PP, dumped is mother's head))
– Vs. the situation where sacks is a constituent with into as the head of a PP daughter (or, P(into | PP, sacks is mother's head))
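The attachment preference can be estimated from such counts; a sketch with an invented helper and invented numbers, just to illustrate the comparison:

```python
from collections import Counter

def prep_given_head(counts, head, prep):
    """Estimate P(prep | a PP daughter attaches under this head):
    count(head, prep) normalized over all PP daughters of the head."""
    total = sum(c for (h, _), c in counts.items() if h == head)
    return counts[(head, prep)] / total if total else 0.0

# Invented counts of (head, preposition of its PP daughter) pairs.
counts = Counter({("dumped", "into"): 50, ("dumped", "with"): 10,
                  ("sacks", "into"): 2,   ("sacks", "of"): 40})

dumped_pref = prep_given_head(counts, "dumped", "into")  # 50/60
sacks_pref = prep_given_head(counts, "sacks", "into")    # 2/42
```

Since into-PPs are seen under dumped far more often than under sacks, the parser would prefer the VP attachment for "dumped sacks into…".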
Another Example
• Consider the VPs
– ate spaghetti with gusto
– ate spaghetti with marinara
• The affinity of gusto for eat is much larger than its affinity for spaghetti
• On the other hand, the affinity of marinara for spaghetti is much higher than its affinity for ate
But Not a Head Probability Relationship
• Note the relationship here is more distant and doesn't involve a headword, since gusto and marinara aren't the heads of the PPs (Hindle & Rooth '91)
[VP(ate) [VP(ate) [V ate] [NP spaghetti]] [PP(with) with gusto]]
[VP(ate) [V ate] [NP(spaghetti) [NP spaghetti] [PP(with) with marinara]]]
Next Time
• Midterm
– Covers everything assigned and all lectures up through today
– Short answers (2-3 sentences), exercises, longer answers (1 paragraph)
– Closed book; calculators allowed but no laptops