Basic Parsing with Context-Free Grammars CS 4705
Analyzing Linguistic Units
• Morphological parsing:
  – analyze words into morphemes and affixes
  – rule-based, FSAs, FSTs
• N-grams for Language Modeling
• POS Tagging
• Syntactic parsing:
  – identify constituents and their relationships
  – to see if a sentence is grammatical
  – to assign an abstract representation of meaning
Syntactic Parsing
• Declarative formalisms like CFGs define the legal strings of a language, but don’t specify how to recognize or assign structure to them
• Parsing algorithms specify how to recognize the strings of a language and assign each string one (or more) syntactic analyses
• Parsing is useful for grammar checking, semantic analysis, MT, QA, information extraction, speech recognition… and almost every task in NLP
Parsing as a Form of Search
• Searching FSAs
  – Finding the right path through the automaton
  – Search space defined by structure of FSA
• Searching CFGs
  – Finding the right parse tree among all possible parse trees
  – Search space defined by the grammar
• Constraints provided by the input sentence and the automaton or grammar
CFG for Fragment of English
S -> NP VP
S -> Aux NP VP
S -> VP
NP -> Det Nom
NP -> PropN
Nom -> N
Nom -> Nom PP
VP -> V
VP -> V NP
PP -> Prep NP
N -> book | flight | meal | money
V -> book | include | prefer
Aux -> does
Prep -> from | to | on
PropN -> Houston | TWA
Det -> that | this | a
Parse Tree for ‘Book that flight’ for Prior CFG
[S [VP [V Book] [NP [Det that] [Nom [N flight]]]]]
Rule Expansion
S -> NP VP
S -> Aux NP VP
S -> VP (1)
VP -> V
VP -> V NP (2)
NP -> Det Nom (3)
NP -> PropN
Nom -> N (4)
Nom -> Nom PP
PP -> Prep NP
N -> book | flight | meal | money
V -> book | include | prefer
Aux -> does
Prep -> from | to | on
PropN -> Houston | TWA
Det -> that | this | a
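The numbered rules (1) to (4) on this slide can be replayed as a leftmost derivation of the POS sequence for ‘Book that flight’. The sketch below is only an illustration: the `RULES` table and the `derive` helper are hypothetical names, not part of the course material.

```python
# Rules (1)-(4) as numbered on the slide: number -> (lhs, rhs).
RULES = {
    1: ("S", ["VP"]),
    2: ("VP", ["V", "NP"]),
    3: ("NP", ["Det", "Nom"]),
    4: ("Nom", ["N"]),
}


def derive(start, rule_numbers):
    """Apply the given rules in order, each time rewriting the leftmost
    occurrence of the rule's left-hand side. Returns every sentential form."""
    form = [start]
    steps = [list(form)]
    for number in rule_numbers:
        lhs, rhs = RULES[number]
        i = form.index(lhs)      # raises ValueError if the rule does not apply
        form[i:i + 1] = rhs      # replace the lhs symbol by the rhs symbols
        steps.append(list(form))
    return steps


for step in derive("S", [1, 2, 3, 4]):
    print(" ".join(step))
```

The last sentential form, V Det N, is the POS sequence that matches Book that flight.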
Top-Down Parser
• Builds from the root S node to the leaves
• Assuming we build all trees in parallel:
  – Find all trees with root S (or all rules w/lhs S)
  – Next expand all constituents in these trees/rules
  – Continue until leaves are POS
  – Candidate trees failing to match POS of input string are rejected (e.g. Book that flight matches only one subtree)
Top-Down Search Space for CFG (expanding only leftmost leaves)
[Figure: search tree. Ply 1: S. Ply 2: S -> NP VP; S -> Aux NP VP; S -> VP. Later plies: the leftmost leaves expanded with NP -> Det Nom, NP -> PropN, VP -> V NP, VP -> V.]
Bottom-Up Parsing
• Parser begins with words of input and builds up trees, applying grammar rules whose rhs match
  – Book that flight: N Det N, or V Det N
  – ‘Book’ ambiguous (2 POS appear in grammar)
  – Parse continues until an S root node is reached or no further node expansion is possible
Two Candidates: One Successful Parse
• Successful: [S [VP [V Book] [NP [Det that] [Nom [N flight]]]]]
• Failed (no S): [VP [V Book]] [NP [Det that] [Nom [N flight]]]
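Bottom-up search can be sketched as repeated reduction: start from the words, replace a word by one of its POS tags or a matched rhs by its lhs, and stop when only S remains. The code below is a minimal recognizer under stated assumptions: the grammar is the fragment from the CFG slide as reconstructed here, the search is breadth-first over symbol sequences, and `RULES`, `LEXICON`, `successors` and `bottom_up_recognize` are hypothetical names. It returns only yes/no, not trees.

```python
from collections import deque

# Assumed encoding of the CFG fragment: (lhs, rhs) pairs.
RULES = [
    ("S", ("NP", "VP")), ("S", ("Aux", "NP", "VP")), ("S", ("VP",)),
    ("NP", ("Det", "Nom")), ("NP", ("PropN",)),
    ("Nom", ("N",)), ("Nom", ("Nom", "PP")),
    ("VP", ("V",)), ("VP", ("V", "NP")),
    ("PP", ("Prep", "NP")),
]
LEXICON = {
    "book": ["N", "V"], "flight": ["N"], "meal": ["N"], "money": ["N"],
    "include": ["V"], "prefer": ["V"], "does": ["Aux"],
    "from": ["Prep"], "to": ["Prep"], "on": ["Prep"],
    "houston": ["PropN"], "twa": ["PropN"],
    "that": ["Det"], "this": ["Det"], "a": ["Det"],
}


def successors(state):
    """Yield every sequence reachable from `state` by one reduction."""
    # Replace a word by one of its POS tags ('book' yields both N and V).
    for i, symbol in enumerate(state):
        for tag in LEXICON.get(symbol, ()):
            yield state[:i] + (tag,) + state[i + 1:]
    # Replace a contiguous match of some rhs by its lhs.
    for lhs, rhs in RULES:
        n = len(rhs)
        for i in range(len(state) - n + 1):
            if state[i:i + n] == rhs:
                yield state[:i] + (lhs,) + state[i + n:]


def bottom_up_recognize(words):
    """True if the words can be reduced to a single S."""
    start = tuple(w.lower() for w in words)
    queue, seen = deque([start]), {start}
    while queue:
        state = queue.popleft()
        if state == ("S",):
            return True
        for nxt in successors(state):
            if nxt not in seen:   # sequences never grow, so the search is finite
                seen.add(nxt)
                queue.append(nxt)
    return False


print(bottom_up_recognize("Book that flight".split()))
print(bottom_up_recognize("that flight".split()))
```

Sequences such as N Det Nom or VP NP are explored and then abandoned because nothing reduces them to S; that is the wasted work that the bottom-up strategy incurs.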
What’s right/wrong with…
• Top-Down parsers: they never explore illegal parses (e.g. ones which can’t form an S), but waste time on trees that can never match the input
• Bottom-Up parsers: they never explore trees inconsistent with the input, but waste time exploring illegal parses (with no S root)
• For both: find a control strategy; how to explore the search space efficiently?
  – Pursue all parses in parallel, or backtrack, or …?
  – Which rule to apply next?
  – Which node to expand next?
A Possible Top-Down Parsing Strategy
• Depth-first search:
  – Agenda of search states: expand search space incrementally, exploring most recently generated state (tree) each time
  – When you reach a state (tree) inconsistent with input, backtrack to most recent unexplored state (tree)
• Which node to expand?
  – Leftmost or rightmost
• Which grammar rule to use?
  – Order in the grammar? How?
Top-Down, Depth-First, Left-Right Strategy
• Initialize agenda with ‘S’ tree and ptr to first word and make this current search state (cur)
• Loop until successful parse or empty agenda
  – Apply all applicable grammar rules to leftmost unexpanded node of cur
    • If this node is a POS category and matches that of the current input, push this onto agenda
    • Otherwise push new trees onto agenda
  – Pop new cur from agenda
• Example: Does this flight include a meal?
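The agenda loop above can be sketched as a small recognizer. This is a simplification under stated assumptions, not the algorithm from the textbook figure: a search state is just the list of symbols still to be expanded plus a pointer into the input (no tree is built), the grammar is the fragment from the CFG slide, and the left-recursive rule Nom -> Nom PP is left out because depth-first expansion could loop on it. `GRAMMAR`, `LEXICON` and `top_down_recognize` are hypothetical names.

```python
# Assumed encoding of the CFG fragment, rules tried in the order listed.
# Nom -> Nom PP is omitted on purpose (left recursion).
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nom"], ["PropN"]],
    "Nom": [["N"]],
    "VP": [["V"], ["V", "NP"]],
    "PP": [["Prep", "NP"]],
}
LEXICON = {
    "N": {"book", "flight", "meal", "money"},
    "V": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "PropN": {"houston", "twa"},
    "Det": {"that", "this", "a"},
}


def top_down_recognize(words, grammar=GRAMMAR, lexicon=LEXICON):
    """Depth-first, left-to-right, top-down recognition with an agenda."""
    words = [w.lower() for w in words]
    agenda = [(("S",), 0)]           # (symbols still to expand, next word index)
    while agenda:
        symbols, pos = agenda.pop()  # most recently generated state first
        if not symbols:
            if pos == len(words):    # everything expanded, all input consumed
                return True
            continue                 # leftover input: dead end, backtrack
        first, rest = symbols[0], symbols[1:]
        if first in lexicon:
            # POS category: keep the state only if it matches the current word.
            if pos < len(words) and words[pos] in lexicon[first]:
                agenda.append((rest, pos + 1))
        else:
            # Push in reverse so the first rule in the grammar is popped first.
            for rhs in reversed(grammar[first]):
                agenda.append((tuple(rhs) + rest, pos))
    return False


print(top_down_recognize("Does this flight include a meal".split()))
```

On this input the parser first tries S -> NP VP, fails because ‘does’ is neither Det nor PropN, backtracks to S -> Aux NP VP, and later backtracks once more from VP -> V to VP -> V NP.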
Fig 10.7 CFG
Left Corners: Top-Down Parsing with Bottom-Up Filtering
• We saw: Top-Down, depth-first, L2R parsing
  – Expands non-terminals along the tree’s left edge down to leftmost leaf of tree
  – Moves on to expand down to next leftmost leaf…
  – Note: In a successful parse, the current input word will be the first word in the derivation of the node the parser is currently processing
  – So… look ahead to the left-corner of the tree
    • B is a left-corner of A if A =*=> B α
    • Build table with left-corners of all non-terminals in grammar and consult before applying rule
Left Corners
Left-Corner Table for CFG
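A left-corner table can be computed as a transitive closure over the first symbol of each rule's rhs. The sketch below assumes the grammar is the fragment from the CFG slide as reconstructed here; `GRAMMAR`, `POS`, `left_corner_table` and `pos_left_corners` are hypothetical names, and the result is what this code computes, not a copy of the table on the slide.

```python
# Assumed encoding of the CFG fragment from the earlier slide.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nom"], ["PropN"]],
    "Nom": [["N"], ["Nom", "PP"]],
    "VP": [["V"], ["V", "NP"]],
    "PP": [["Prep", "NP"]],
}
POS = {"N", "V", "Aux", "Prep", "PropN", "Det"}


def left_corner_table(grammar):
    """Map each non-terminal A to every category B that can begin a
    derivation from A in one or more steps."""
    # Direct left corners: the first symbol of each right-hand side.
    table = {lhs: {rhs[0] for rhs in rules} for lhs, rules in grammar.items()}
    changed = True
    while changed:                  # iterate until nothing new is added
        changed = False
        for corners in table.values():
            for c in list(corners):
                extra = table.get(c, set()) - corners
                if extra:           # left corners of a left corner also count
                    corners |= extra
                    changed = True
    return table


def pos_left_corners(grammar, pos_tags=POS):
    """Keep only the POS categories, which is what the parser checks
    against the current input word."""
    return {lhs: corners & pos_tags
            for lhs, corners in left_corner_table(grammar).items()}


for lhs, corners in pos_left_corners(GRAMMAR).items():
    print(lhs, sorted(corners))
```

As a filter: with ‘does’ (Aux) as the current word, the parser can skip S -> NP VP, because Aux is not among the POS left corners of NP.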
Left Recursion vs. Right Recursion
• Depth-first search will never terminate if grammar is left recursive (e.g. NP --> NP PP)
• Solutions:
  – Rewrite the grammar (automatically?) to a weakly equivalent one which is not left-recursive, e.g. The man {on the hill with the telescope…}
      NP --> NP PP (wanted: Nom plus a sequence of PPs)
      NP --> Nom PP
      NP --> Nom
      Nom --> Det N
    …becomes…
      NP --> Nom NP’
      Nom --> Det N
      NP’ --> PP NP’ (wanted: a sequence of PPs)
      NP’ --> ε
    • Not so obvious what these rules mean…
  – Harder to detect and eliminate non-immediate left recursion
      NP --> Nom PP
      Nom --> NP
  – Fix depth of search explicitly
  – Rule ordering: non-recursive rules first
    • NP --> Det Nom
    • NP --> NP PP
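The "rewrite the grammar" solution can be sketched for the immediate case with the standard textbook transformation: rules A --> A α | β become A --> β A’ and A’ --> α A’ | ε. The function name `remove_immediate_left_recursion` and the list-of-lists encoding (an empty list standing for ε) are assumptions made for illustration.

```python
def remove_immediate_left_recursion(lhs, alternatives):
    """Rewrite A --> A alpha | beta as A --> beta A', A' --> alpha A' | epsilon.

    `alternatives` is a list of right-hand sides (lists of symbols).
    The empty list [] stands for epsilon. Only immediate left recursion
    is handled; indirect cases such as NP --> Nom PP, Nom --> NP are not.
    """
    new = lhs + "'"
    # alpha parts: what follows the recursive lhs at the start of a rule.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == lhs]
    # beta parts: the alternatives that do not start with lhs.
    others = [alt for alt in alternatives if not alt or alt[0] != lhs]
    if not recursive:
        return {lhs: alternatives}   # nothing to rewrite
    return {
        lhs: [beta + [new] for beta in others],
        new: [alpha + [new] for alpha in recursive] + [[]],
    }


rewritten = remove_immediate_left_recursion(
    "NP", [["NP", "PP"], ["Nom", "PP"], ["Nom"]]
)
for left, alts in rewritten.items():
    for alt in alts:
        print(left, "-->", " ".join(alt) if alt else "ε")
```

For the NP rules on the slide this produces NP --> Nom PP NP’ and NP --> Nom NP’ together with NP’ --> PP NP’ and NP’ --> ε. The slide's version keeps only NP --> Nom NP’; the extra rule NP --> Nom PP NP’ generates nothing new, since NP’ already covers any sequence of PPs.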
The city hall parking lot in town
NP NP NP PP
NP --> Det Nom
NP --> Adj Nom
NP Nom Nom NP Nom N
PP --> Prep NP
N --> city | hall | lot | town
Adj --> parking
Prep --> to | for | in
Structural ambiguity:
• Multiple legal structures
  – Attachment (e.g. I saw a man on a hill with a telescope)
  – Coordination (e.g. younger cats and dogs)
  – NP bracketing (e.g. Spanish language teachers)
NP vs. VP Attachment
• Solution?
  – Return all possible parses and disambiguate using “other methods”
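To make "return all possible parses" concrete, the sketch below enumerates every parse of the attachment example with a memoized search over word spans. Everything in it is an assumption made for illustration: the small binary grammar (`BINARY_RULES`), the lexicon (with ‘I’ tagged directly as NP) and the function `all_parses` are hypothetical, and the span-based search is a swapped-in technique, not one of the parsers described on the earlier slides.

```python
from functools import lru_cache

# Hypothetical toy grammar: (lhs, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),    # PP attached to a noun phrase
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),    # PP attached to a verb phrase
    ("PP", "Prep", "NP"),
]
LEXICON = {
    "i": ["NP"], "saw": ["V"], "a": ["Det"],
    "man": ["N"], "hill": ["N"], "telescope": ["N"],
    "on": ["Prep"], "with": ["Prep"],
}


def all_parses(words, root="S"):
    """Return every parse of `words` as a bracketed string."""
    words = tuple(w.lower() for w in words)

    @lru_cache(maxsize=None)
    def parses(cat, i, j):
        """All trees of category `cat` that span words[i:j]."""
        found = []
        if j - i == 1 and cat in LEXICON.get(words[i], ()):
            found.append(f"[{cat} {words[i]}]")
        for lhs, left, right in BINARY_RULES:
            if lhs != cat:
                continue
            for k in range(i + 1, j):    # every split point inside the span
                for l_tree in parses(left, i, k):
                    for r_tree in parses(right, k, j):
                        found.append(f"[{cat} {l_tree} {r_tree}]")
        return tuple(found)

    return list(parses(root, 0, len(words)))


sentence = "I saw a man on a hill with a telescope".split()
trees = all_parses(sentence)
print(len(trees))
for tree in trees:
    print(tree)
```

Under this toy grammar the sentence has five parses, which differ only in where the two PPs attach; choosing among them is left to the “other methods”.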
Summing Up
• Parsing is a search problem which may be implemented with many control strategies
  – Top-Down or Bottom-Up approaches each have problems
• Combining the two solves some but not all issues
  – Left recursion
  – Syntactic ambiguity
• Next time: Making use of statistical information about syntactic constituents
  – Read Ch 11