
Basic Parsing Algorithms: Earley Parser and Left Corner Parsing
Alexandr Chernov
Recent Advances in Parsing Technology

Chomsky hierarchy
● Type-0 grammars (unrestricted grammars) include all formal grammars
● Type-1 grammars (context-sensitive grammars) generate the context-sensitive languages
● Type-2 grammars (context-free grammars) generate the context-free languages
● Type-3 grammars (regular grammars) generate the regular languages

Context-free Grammar
● A context-free grammar (for short, CFG) is a quadruple G = (V, Σ, P, S), where
 – V is a finite set of symbols called the vocabulary (or set of grammar symbols);
 – Σ ⊆ V is the set of terminal symbols (for short, terminals);
 – S ∈ (V − Σ) is a designated symbol called the start symbol;
 – P ⊆ (V − Σ) × V* is a finite set of productions (or rewrite rules, or rules).
● The set N = V − Σ is called the set of nonterminal symbols (for short, nonterminals). Thus, P ⊆ N × V*, and every production ⟨A, α⟩ is also denoted as A → α.
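The definition above can be written down almost literally as a data structure. The following is a minimal sketch, not a standard API: the class name `CFG`, its field names, and the toy grammar are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """A context-free grammar G = (V, Sigma, P, S). All names here are illustrative."""
    vocabulary: frozenset   # V
    terminals: frozenset    # Sigma, expected to be a subset of V
    productions: frozenset  # P, pairs (A, alpha) with alpha a tuple of symbols
    start: str              # S

    @property
    def nonterminals(self):
        # N = V - Sigma
        return self.vocabulary - self.terminals

    def is_well_formed(self):
        # Check Sigma <= V, S in N, and P <= N x V*
        n = self.nonterminals
        return (
            self.terminals <= self.vocabulary
            and self.start in n
            and all(
                lhs in n and all(sym in self.vocabulary for sym in alpha)
                for lhs, alpha in self.productions
            )
        )


# Hypothetical toy grammar used only to exercise the checks.
toy = CFG(
    vocabulary=frozenset({"S", "NP", "VP", "Det", "N", "the"}),
    terminals=frozenset({"the"}),
    productions=frozenset({
        ("S", ("NP", "VP")),
        ("NP", ("Det", "N")),
        ("Det", ("the",)),
    }),
    start="S",
)

print(toy.is_well_formed())      # True
print(sorted(toy.nonterminals))  # ['Det', 'N', 'NP', 'S', 'VP']
```

Storing each right-hand side as a tuple keeps productions hashable, so P can be a set of pairs exactly as in the definition.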

Rewrite Rules
S → NP VP
NP → Det N
Det → the
NP → the N
. . .

Formal Grammar
● Terminals
 – Letters, numbers, words (cannot be broken down into "smaller" units)
● Nonterminals
 – Syntactic variable (category), formula, arithmetic expression

Parsers
➢ Parsing algorithms for context-free grammars play an important role in the implementation of:
 ➢ compilers and interpreters for programming languages
 ➢ programs which "understand" or translate natural languages

Two common types of parsers
➢ The main task of parsing is to connect the root node S with the leaves of the tree, that is, the input.
➢ Top-down parsers start constructing the parse tree from the root and move down towards the leaves. They are easy to implement, but work only with restricted grammars. Examples:
 ➢ Predictive parsers (e.g., LL(k))
➢ Bottom-up parsers build the nodes at the bottom of the parse tree first. They are suitable for automatic parser generation and handle a larger class of grammars. Examples:
 ➢ Shift-reduce parsers (e.g., LR(k) parsers)
➢ Both are general techniques that can be made to work for all languages (but not all grammars!).

Basic Parsing Algorithms
➢ Earley parser
➢ Chart parser
➢ CKY (Cocke-Younger-Kasami)
➢ Head Driven / Left Corner Parsing

Earley Parser
➢ Can parse all context-free languages
➢ Complexity O(n³), where n is the length of the parsed string; O(n²) for unambiguous grammars
➢ Top-down dynamic programming algorithm
http://jayearley.com/

Special Symbols
➢ ┤ – right terminator
➢ . (dot) – position between terminals/nonterminals in a production, e.g. E → .E+T, E → E.+T
➢ Φ – left-hand side of the added top-level production Φ → E ┤, which covers the complete input

Earley Parser's Steps
➢ Predictor (applicable to a state when there is a nonterminal to the right of the dot)
➢ Scanner (applicable if there is a terminal to the right of the dot)
➢ Completer (applicable to a state if its dot is at the end of its production)

Earley Parser Algorithm
● Grammar AE, root E:
 E → T | E+T
 T → P | T*P
 P → a
● Input string = a+a*a
● State sets:
 S0 (x1 = a): Φ → .E ┤   E → .E+T   E → .T   T → .T*P   T → .P   P → .a
 S1 (x2 = +): P → a.   T → P.   E → T.   T → T.*P   Φ → E. ┤   E → E.+T
 S2 (x3 = a): E → E+.T   T → .T*P   T → .P   P → .a
 S3 (x4 = *): P → a.   T → P.   E → E+T.   T → T.*P
 S4 (x5 = a): T → T*.P   P → .a
 S5 (x6 = ┤): P → a.   T → T*P.   E → E+T.   T → T.*P   Φ → E. ┤   E → E.+T
 S6: Φ → E ┤.
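The three steps (predictor, scanner, completer) can be sketched as a small recognizer over grammar AE. This is an illustrative sketch under stated assumptions, not Earley's original formulation: it uses no lookahead, so some of its state sets can contain states that the trace above omits, and it assumes a grammar without empty productions. The names `State`, `earley_sets`, and `recognizes` are made up for this example.

```python
from collections import namedtuple

# An Earley state: production lhs -> rhs, dot position, index of the origin set.
State = namedtuple("State", "lhs rhs dot origin")

GRAMMAR = {  # grammar AE from the slide
    "E": [("T",), ("E", "+", "T")],
    "T": [("P",), ("T", "*", "P")],
    "P": [("a",)],
}
END = "┤"  # right terminator
TOP = "Φ"  # left-hand side of the added top-level production


def earley_sets(tokens, grammar=GRAMMAR, root="E"):
    """Build the state sets S0..Sn+1 for the token sequence (no lookahead)."""
    tokens = list(tokens) + [END]
    sets = [[] for _ in range(len(tokens) + 1)]

    def add(i, state):
        if state not in sets[i]:
            sets[i].append(state)

    add(0, State(TOP, (root, END), 0, 0))
    for i in range(len(tokens) + 1):
        for state in sets[i]:  # the list may grow while we iterate over it
            if state.dot == len(state.rhs):
                # Completer: advance every state that was waiting for state.lhs.
                for old in sets[state.origin]:
                    if old.dot < len(old.rhs) and old.rhs[old.dot] == state.lhs:
                        add(i, old._replace(dot=old.dot + 1))
            elif state.rhs[state.dot] in grammar:
                # Predictor: expand the nonterminal to the right of the dot.
                for rhs in grammar[state.rhs[state.dot]]:
                    add(i, State(state.rhs[state.dot], rhs, 0, i))
            elif i < len(tokens) and state.rhs[state.dot] == tokens[i]:
                # Scanner: move the dot over a matching terminal.
                add(i + 1, state._replace(dot=state.dot + 1))
    return sets


def recognizes(tokens, grammar=GRAMMAR, root="E"):
    """True if the final set contains the completed top-level production."""
    final = State(TOP, (root, END), 2, 0)
    return final in earley_sets(tokens, grammar, root)[-1]


print(len(earley_sets("a+a*a")))  # 7 state sets, S0 to S6
print(recognizes("a+a*a"))        # True
print(recognizes("a+*a"))         # False
```

Each state set is a list rather than a set so that states added during the loop are still processed in the same pass.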

Left-Corner Parsing
● For some grammars, top-down prediction can fail to terminate (for example, with left-recursive rules), so a bottom-up parser is needed
● Going Wrong with Top-down Parsing
 – Input string: John died
 – Grammar:
 S → NP VP
 NP → Det N
 NP → PN
 VP → IV
 Det → the
 N → robber
 PN → John
 IV → died
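One way to see what goes wrong: a naive top-down recognizer expands rules in grammar order without consulting the input, so for "John died" it predicts NP → Det N and Det → the before it backtracks to NP → PN. The sketch below records every rule it tries; the function name `expand` and the `trace` list are assumptions made for this illustration.

```python
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("NP", ("PN",)),
    ("VP", ("IV",)),
    ("Det", ("the",)),
    ("N", ("robber",)),
    ("PN", ("John",)),
    ("IV", ("died",)),
]
NONTERMINALS = {lhs for lhs, _ in RULES}


def expand(cats, words, pos, trace):
    """Yield every end position at which the category sequence `cats` matches."""
    if not cats:
        yield pos
        return
    first, rest = cats[0], cats[1:]
    if first in NONTERMINALS:
        # Top-down prediction: try each rule for `first`, in grammar order.
        for lhs, rhs in RULES:
            if lhs == first:
                trace.append(f"{lhs} -> {' '.join(rhs)}")
                yield from expand(rhs + rest, words, pos, trace)
    elif pos < len(words) and words[pos] == first:
        # The predicted terminal matches the next input word.
        yield from expand(rest, words, pos + 1, trace)


trace = []
ends = list(expand(("S",), "John died".split(), 0, trace))
print(ends)  # [2]
for step in trace:
    print(step)
```

The second and third entries of the trace (NP -> Det N, Det -> the) are the wasted predictions; with a left-recursive rule such as the hypothetical NP -> NP PP, the same procedure would recurse without ever reading a word.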

Left-Corner Parsing
● Going Wrong with Bottom-up Parsing
 – Input string: The plant died
 – Grammar:
 S → NP VP
 NP → Det N
 VP → IV
 VP → TV NP
 TV → plant
 IV → died
 Det → the
 N → plant

Left-Corner Parsing
● The key idea of left-corner parsing is to combine top-down and bottom-up processing
 – The left corner of a rule is the first symbol on its right-hand side: NP in S → NP VP, IV in VP → IV, John in PN → John
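The left corner of each rule, and the symbols reachable by following left corners repeatedly, can be computed with a short fixed-point loop. This is a sketch for illustration: it uses the "John died" grammar from the neighboring slides, and the function names `left_corner` and `left_corner_closure` are assumptions.

```python
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("NP", ("PN",)),
    ("VP", ("IV",)),
    ("Det", ("the",)),
    ("N", ("robber",)),
    ("PN", ("John",)),
    ("IV", ("died",)),
]


def left_corner(rule):
    """The left corner of a rule is the first symbol of its right-hand side."""
    _, rhs = rule
    return rhs[0]


def left_corner_closure(rules):
    """Map each category to every symbol reachable by following left corners."""
    table = {lhs: {lhs} for lhs, _ in rules}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            first = rhs[0]
            new = {first} | table.get(first, set())
            if not new <= table[lhs]:
                table[lhs] |= new
                changed = True
    return table


print(left_corner(("S", ("NP", "VP"))))         # NP
print(sorted(left_corner_closure(RULES)["S"]))  # ['Det', 'John', 'NP', 'PN', 'S', 'the']
```

A table like this lets a left-corner parser reject a word early when it cannot start the category currently being looked for.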

Left-Corner Parsing
● How does it work?
 – Input string: John died
 – Parse tree (figure): S, NP, VP, PN, IV, died
 – Grammar:
 S → NP VP
 NP → Det N
 NP → PN
 VP → IV
 Det → the
 N → robber
 PN → John
 IV → died
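A left-corner recognizer for this grammar can be sketched as three mutually recursive generators: the next input word is taken bottom-up as a left corner, a rule with that left corner is selected, and the remaining right-hand side is predicted top-down. This is a minimal sketch, not a definitive implementation; the function names are assumptions, no left-corner table is used to filter rules, and cyclic unary rules (which this grammar does not have) would make it loop.

```python
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("NP", ("PN",)),
    ("VP", ("IV",)),
    ("Det", ("the",)),
    ("N", ("robber",)),
    ("PN", ("John",)),
    ("IV", ("died",)),
]


def parse(goal, words, pos=0):
    """Yield every position where a `goal` phrase starting at `pos` can end."""
    if pos < len(words):
        # Bottom-up step: the next word is the first left corner.
        yield from complete(words[pos], goal, words, pos + 1)


def complete(found, goal, words, pos):
    """Grow the recognized symbol `found` upwards until it becomes `goal`."""
    if found == goal:
        yield pos
    for lhs, rhs in RULES:
        if rhs[0] == found:
            # Top-down step: the rest of the rule must be recognized next.
            for end in parse_sequence(rhs[1:], words, pos):
                yield from complete(lhs, goal, words, end)


def parse_sequence(cats, words, pos):
    """Recognize the categories in `cats` one after another, starting at `pos`."""
    if not cats:
        yield pos
    else:
        for mid in parse(cats[0], words, pos):
            yield from parse_sequence(cats[1:], words, mid)


def recognizes(sentence):
    words = sentence.split()
    return len(words) in parse("S", words)


print(recognizes("John died"))        # True
print(recognizes("the robber died"))  # True
print(recognizes("John the"))         # False
```

For "John died" the recognizer never touches NP → Det N, because Det is not the left corner of anything built from the word John.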

Head-Corner Parsing
● A head-corner parser starts by locating a potential head of the phrase and then proceeds by parsing the daughters to the left and the right of the head
● The head-corner parser is a generalization of the left-corner parser
● The left-corner parser is 10% faster

Head-Corner Parsing
● The daughters left of the head are parsed from right to left (starting from the head)
● The daughters right of the head are parsed from left to right (starting from the head)
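The order in which the daughters are visited can be made concrete with a small helper. This is only an illustration of the traversal order, not a head-corner parser; the function name `daughter_order`, the symbols A, B, H, C, D, and the choice of head position are hypothetical.

```python
def daughter_order(rhs, head_index):
    """Return (head, left daughters, right daughters) in head-corner parsing order."""
    head = [rhs[head_index]]
    left = list(reversed(rhs[:head_index]))  # right to left, moving away from the head
    right = list(rhs[head_index + 1:])       # left to right, moving away from the head
    return head, left, right


# Hypothetical rule X -> A B H C D with head H at index 2.
print(daughter_order(("A", "B", "H", "C", "D"), 2))
# (['H'], ['B', 'A'], ['C', 'D'])
```

With the head at index 0 the left list is empty and the order is the plain left-to-right order of a left-corner parser, which matches the view of head-corner parsing as a generalization.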


Head-Corner Parsing
● Input string:
 – Time flies like an arrow

Summary
● Bottom-up parsing analyzes unknown data relationships by identifying the most fundamental units first and then inferring higher-order structures from them
● Top-down parsing analyzes unknown data relationships by hypothesizing general parse tree structures and then checking whether the known fundamental structures are compatible with the hypothesis

Possible uses
● Chart parsers can be used for parsing computer languages. Earley parsers in particular have been used in compilers, where their ability to parse with arbitrary CFGs eases the task of writing the grammar for a particular language.
● A left-corner parser can be used for processing natural languages, provided that it can handle ambiguity

Thank you for your attention
Questions?

Sources
● Jay Earley. An efficient context-free parsing algorithm. Communications of the ACM, 13(2): 94–102, 1970
● Gertjan van Noord. An efficient implementation of the head-corner parser. Computational Linguistics, 23(3): 425–456, 1997
● http://cs.union.edu/~striegnk/courses/nlp-with-prolog/html/node53.html