CSA 3050: NLP Algorithms
Parsing Algorithms 2
• Problems with DFTD Parser
• Earley Parsing Algorithm
October 2005 CSA 3180: Parsing Algorithms 2
Problems with the DFTD (Depth-First Top-Down) Parser
• Left Recursion
• Ambiguity
• Inefficiency
Left Recursion
• A grammar is left recursive if it contains at least one non-terminal A for which A ⇒* αAβ and α ⇒* ε, for some strings of symbols α and β.
• Intuitive idea: the derivation of that category includes itself along its leftmost branch.
• Examples: NP → NP PP; NP → NP and NP; NP → DetP Nominal together with DetP → NP 's
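The definition above can be checked mechanically. The following is a minimal sketch, assuming a grammar stored as a dictionary from each non-terminal to a list of right-hand sides; the grammar shown and the function names are illustrative and not part of the lecture. It finds the non-terminals that can derive the empty string, then follows leftmost branches to see which categories can reach themselves.

```python
def nullable_symbols(grammar):
    """Return the non-terminals that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in nullable:
                continue
            # An empty RHS, or one made only of nullable symbols, makes lhs nullable.
            if any(all(sym in nullable for sym in rhs) for rhs in alternatives):
                nullable.add(lhs)
                changed = True
    return nullable


def left_recursive_symbols(grammar):
    """Return the non-terminals A with A =>* alpha A beta and alpha =>* epsilon."""
    nullable = nullable_symbols(grammar)
    # left[A]: non-terminals that can appear on the leftmost branch below A.
    left = {a: set() for a in grammar}
    for lhs, alternatives in grammar.items():
        for rhs in alternatives:
            for sym in rhs:
                if sym in grammar:
                    left[lhs].add(sym)
                if sym not in nullable:
                    break  # nothing further right can be leftmost
    # Transitive closure of the "leftmost branch" relation.
    changed = True
    while changed:
        changed = False
        for a in grammar:
            for b in list(left[a]):
                new = left[b] - left[a]
                if new:
                    left[a] |= new
                    changed = True
    return {a for a in grammar if a in left[a]}


# Illustrative grammar (an assumption): symbols without rules act as terminals.
grammar = {
    "S": [["NP", "VP"]],
    "NP": [["DetP", "Nominal"], ["Pronoun"]],
    "DetP": [["NP", "'s"], ["Det"]],
    "Nominal": [["Noun"]],
    "VP": [["VP", "PP"], ["Verb", "NP"]],
    "PP": [["Prep", "NP"]],
}
print(sorted(left_recursive_symbols(grammar)))
```

Here VP is directly left recursive, while NP and DetP are left recursive only indirectly, through each other.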
Infinite Search
Dealing with Left Recursion
• Reformulate the grammar
  – A → A β | α
    as
  – A → α A'
    A' → β A' | ε
  – Disadvantage: different (and probably unnatural) parse trees.
• Use a different parse algorithm
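The reformulation can be sketched in a few lines. This is a minimal illustration that handles only immediate left recursion for a single non-terminal; the function name and the tuple-based rule representation are assumptions made for the example.

```python
def eliminate_immediate_left_recursion(lhs, alternatives):
    """Rewrite A -> A b | a  as  A -> a A',  A' -> b A' | epsilon.

    alternatives is a list of tuples of symbols; the empty tuple stands
    for epsilon. Returns a dict mapping each non-terminal to its new
    alternatives.
    """
    # The "beta" parts: what follows the leading A in each recursive rule.
    # A useless rule A -> A (empty beta) is dropped.
    betas = [alt[1:] for alt in alternatives if alt[:1] == (lhs,) and len(alt) > 1]
    # The "alpha" parts: alternatives that do not start with A.
    alphas = [alt for alt in alternatives if alt[:1] != (lhs,)]
    if not betas:
        return {lhs: alphas}
    if not alphas:
        raise ValueError(f"{lhs} has no non-recursive alternative")
    new = lhs + "'"
    return {
        lhs: [alpha + (new,) for alpha in alphas],
        new: [beta + (new,) for beta in betas] + [()],
    }


rules = eliminate_immediate_left_recursion(
    "NP", [("NP", "PP"), ("Det", "Nominal")]
)
for lhs, alternatives in rules.items():
    for rhs in alternatives:
        print(lhs, "→", " ".join(rhs) or "ε")
```

For NP → NP PP | Det Nominal this yields NP → Det Nominal NP' and NP' → PP NP' | ε. The PP modifiers now hang off a right-branching chain of NP' nodes, which is the unnatural tree shape noted as a disadvantage.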
Ambiguity
• Coordination ambiguity: different scope of conjunction:
  Black cats and dogs like to play
• Attachment ambiguity: a constituent can be added to the parse tree in different places:
  I shot an elephant in my pyjamas
• VP → VP PP, NP → NP PP
Catalan Numbers
The nth Catalan number counts the ways of dissecting a polygon with n+2 sides into triangles by drawing nonintersecting diagonals.

No. of PPs   No. of parses
2            2
3            5
4            14
5            132
6            469
7            1430
8            4867
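For reference, Catalan numbers can be computed directly. The sketch below uses the standard closed form C(n) = (2n choose n) / (n + 1), which is a well-known identity and not something stated on the slide; the function name is illustrative.

```python
from math import comb


def catalan(n):
    """nth Catalan number via the closed form C(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


# C(1) .. C(9)
print([catalan(n) for n in range(1, 10)])
# → [1, 2, 5, 14, 42, 132, 429, 1430, 4862]
```

These computed values can be compared against the parse counts in the table; the rapid growth is the point of the slide.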
Handling Disambiguation
• Statistical disambiguation
• Semantic knowledge
Repeated Parsing of Subtrees
Constituent                                       Times parsed
a flight                                          4
from Indianapolis                                 3
to Houston                                        2
on TWA                                            1
a flight from Indianapolis                        3
a flight from Indianapolis to Houston             2
a flight from Indianapolis to Houston on TWA      1
Earley Algorithm
• Dynamic programming: the solution involves filling in a table of solutions to subproblems.
• Parallel top-down search
• Worst-case complexity = O(N³) in the length N of the sentence.
• The table, called a chart, contains N+1 entries:
  ● book ● that ● flight ●
  0      1      2        3
The Chart
• Each table entry contains a list of states.
• Each state represents all partial parses that have been reached so far at that point in the sentence.
• States are represented using dotted rules containing information about
  – Rule/subtree: which rule has been used
  – Progress: the dot indicates how much of the rule's RHS has been recognised
  – Position: the text segment to which this parse applies
Examples of Dotted Rules
• Initial S rule: S → ● VP, [0, 0]
• Partially recognised NP: NP → Det ● Nominal, [1, 2]
• Fully recognised VP: VP → V NP ●, [0, 3]
• These states can also be represented graphically.
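One possible way to represent a dotted rule in code is sketched below. The class name State and its field names are assumptions made for this illustration; the three pieces of information (rule, progress, position) are the ones listed on the previous slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    lhs: str      # rule: left-hand side category
    rhs: tuple    # rule: right-hand side symbols
    dot: int      # progress: number of RHS symbols recognised so far
    start: int    # position: where the recognised segment begins
    end: int      # position: where the recognised segment ends

    def is_complete(self):
        """True when nothing is left after the dot."""
        return self.dot == len(self.rhs)

    def next_category(self):
        """The symbol immediately after the dot, or None if complete."""
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, "●")
        return f"{self.lhs} → {' '.join(symbols)}, [{self.start}, {self.end}]"


initial = State("S", ("VP",), 0, 0, 0)
partial = State("NP", ("Det", "Nominal"), 1, 1, 2)
print(initial)                   # S → ● VP, [0, 0]
print(partial)                   # NP → Det ● Nominal, [1, 2]
print(partial.next_category())   # Nominal
```

Making the class frozen keeps states hashable, so duplicates are easy to detect when states are added to a chart entry.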
The Chart
Earley Algorithm
• Main algorithm: proceeds through each text position, applying one of the three operators below to each state.
• Predictor: creates "initial states" (i.e. states whose RHS is completely unparsed).
• Scanner: checks the current input when the next category to be recognised is a pre-terminal.
• Completer: when a state is "complete" (nothing after the dot), advances all states to the left that are looking for the associated category.
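The three operators can be sketched as a small recogniser. This is a minimal illustration under stated assumptions, not the lecture's pseudocode: the toy grammar and lexicon are invented for the example, a dummy start rule γ → S is added, the grammar is assumed to contain no empty rules, and the scanner advances the dot directly over the pre-terminal instead of adding a separate lexical state.

```python
# Toy grammar and lexicon (assumptions for this example only).
GRAMMAR = {
    "S": [("VP",)],
    "VP": [("Verb", "NP")],
    "NP": [("Det", "Nominal")],
    "Nominal": [("Noun",)],
}
LEXICON = {"book": {"Verb", "Noun"}, "that": {"Det"}, "flight": {"Noun"}}


def recognise(words, grammar, lexicon, start="S"):
    """Return True if words can be derived from start."""
    # A state is (lhs, rhs, dot, origin); its end position is the index
    # of the chart entry that holds it.
    chart = [[] for _ in range(len(words) + 1)]

    def enqueue(state, position):
        if state not in chart[position]:
            chart[position].append(state)

    enqueue(("γ", (start,), 0, 0), 0)
    for i in range(len(words) + 1):
        # chart[i] may grow while it is being traversed; the loop also
        # visits the states appended along the way.
        for lhs, rhs, dot, origin in chart[i]:
            if dot == len(rhs):
                # Completer: advance states that were waiting for lhs.
                for l2, r2, d2, o2 in chart[origin]:
                    if d2 < len(r2) and r2[d2] == lhs:
                        enqueue((l2, r2, d2 + 1, o2), i)
            elif rhs[dot] in grammar:
                # Predictor: add an initial state for each expansion.
                for expansion in grammar[rhs[dot]]:
                    enqueue((rhs[dot], expansion, 0, i), i)
            elif i < len(words) and rhs[dot] in lexicon.get(words[i], ()):
                # Scanner: the next word has the expected pre-terminal.
                enqueue((lhs, rhs, dot + 1, origin), i + 1)
    return ("γ", (start,), 1, 0) in chart[len(words)]


print(recognise("book that flight".split(), GRAMMAR, LEXICON))   # True
print(recognise("book flight that".split(), GRAMMAR, LEXICON))   # False
```

Because a state is added to a chart entry only if it is not already there, a left-recursive rule is predicted once per position instead of looping forever.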
Earley Algorithm – Main Function
Earley Algorithm – Sub Functions
Retrieving Trees
• To turn the recogniser into a parser, the representation of each state must also include information about the completed states that generated its constituents.
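A minimal sketch of this idea follows. Each state carries one extra field holding the subtrees built so far for its recognised constituents, which is a simplification of storing pointers to completed states. The toy grammar, the lexicon, and the function name parse are assumptions for the example, and the grammar is assumed to have no empty rules and no cyclic unary rules.

```python
# Toy grammar and lexicon (assumptions for this example only).
GRAMMAR = {
    "S": [("VP",)],
    "VP": [("Verb", "NP")],
    "NP": [("Det", "Nominal")],
    "Nominal": [("Noun",)],
}
LEXICON = {"book": {"Verb", "Noun"}, "that": {"Det"}, "flight": {"Noun"}}


def parse(words, grammar, lexicon, start="S"):
    """Return a list of parse trees, each a nested tuple (category, child, ...)."""
    # A state is (lhs, rhs, dot, origin, kids); kids is the extra field
    # holding one subtree per recognised RHS symbol.
    chart = [[] for _ in range(len(words) + 1)]

    def enqueue(state, position):
        if state not in chart[position]:
            chart[position].append(state)

    enqueue(("γ", (start,), 0, 0, ()), 0)
    for i in range(len(words) + 1):
        for lhs, rhs, dot, origin, kids in chart[i]:
            if dot == len(rhs):
                # Completer: hand the finished subtree to each waiting state.
                subtree = (lhs,) + kids
                for l2, r2, d2, o2, k2 in chart[origin]:
                    if d2 < len(r2) and r2[d2] == lhs:
                        enqueue((l2, r2, d2 + 1, o2, k2 + (subtree,)), i)
            elif rhs[dot] in grammar:
                # Predictor: initial states start with no subtrees.
                for expansion in grammar[rhs[dot]]:
                    enqueue((rhs[dot], expansion, 0, i, ()), i)
            elif i < len(words) and rhs[dot] in lexicon.get(words[i], ()):
                # Scanner: record a (pre-terminal, word) leaf.
                leaf = (rhs[dot], words[i])
                enqueue((lhs, rhs, dot + 1, origin, kids + (leaf,)), i + 1)
    return [kids[0] for lhs, rhs, dot, origin, kids in chart[-1]
            if lhs == "γ" and dot == 1 and origin == 0]


trees = parse("book that flight".split(), GRAMMAR, LEXICON)
print(trees[0])
```

Storing whole subtrees in the state keeps the sketch short, but an ambiguous sentence then produces one state per distinct tree, so the chart can grow exponentially. Keeping pointers to completed states, as the slide describes, avoids copying trees.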
Chart[3] ↑ Extra Field