Parsing
Prof. Busch - LSU
Compiler
Program file (input string) → Lexical analyzer → Parser → Machine code (output)
Lexical analyzer:
• Recognizes the lexemes of the input program file: keywords (if, then, else, while, …), integers, identifiers (variables), etc.
• Is built with DFAs (based on the theory of regular languages)
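To make the DFA idea concrete, here is a minimal Python sketch of a lexer driven by a small DFA transition table. It assumes a toy language whose lexemes are integers, identifiers, the keywords listed above, and the operators `+` and `*`; the state names and the `tokenize` function are hypothetical, not part of any real compiler.

```python
KEYWORDS = {"if", "then", "else", "while"}

def char_class(c: str) -> str:
    """Map a character to the input-alphabet class read by the DFA."""
    if c.isdigit():
        return "digit"
    if c.isalpha() or c == "_":
        return "letter"
    return "other"

# DFA transitions: (state, character class) -> next state.
# "int" accepts integers; "ident" accepts identifiers and keywords.
TRANSITIONS = {
    ("start", "digit"): "int",
    ("int", "digit"): "int",
    ("start", "letter"): "ident",
    ("ident", "letter"): "ident",
    ("ident", "digit"): "ident",
}

def tokenize(text: str) -> list[tuple[str, str]]:
    tokens = []
    i = 0
    while i < len(text):
        c = text[i]
        if c.isspace():
            i += 1
            continue
        if c in "+*":                      # single-character operators
            tokens.append(("OP", c))
            i += 1
            continue
        # Run the DFA for as long as a transition exists (longest match).
        state, j = "start", i
        while j < len(text) and (state, char_class(text[j])) in TRANSITIONS:
            state = TRANSITIONS[(state, char_class(text[j]))]
            j += 1
        if state == "start":
            raise ValueError(f"unexpected character {c!r} at position {i}")
        lexeme = text[i:j]
        if state == "int":
            tokens.append(("INT", lexeme))
        elif lexeme in KEYWORDS:           # keywords are identifiers with reserved spelling
            tokens.append(("KEYWORD", lexeme))
        else:
            tokens.append(("IDENT", lexeme))
        i = j
    return tokens

print(tokenize("if x1 then 10 + 2 * 5"))
```

Keywords are recognized by the same accepting state as identifiers and then looked up in a table, which keeps the DFA small.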
Parser:
• Knows the grammar of the programming language to be compiled
• Constructs a derivation (and derivation tree) for the input program file (input string)
• Converts the derivation to machine code
The parser finds the derivation of a particular input file.
Example:
Grammar: $E \to E + E \mid E * E \mid \mathrm{INT}$
Input string: 10 + 2 * 5
Derivation: $E \Rightarrow E + E \Rightarrow E + E * E \Rightarrow 10 + E * E \Rightarrow 10 + 2 * E \Rightarrow 10 + 2 * 5$
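Each step of a derivation rewrites exactly one variable using one production. The minimal Python sketch below checks that the example derivation obeys this rule. It assumes sentential forms are token lists and treats `INT` as matching any integer literal; the helper names `matches` and `is_step` are hypothetical.

```python
# Right-hand sides of E -> E + E | E * E | INT, as token lists.
PRODUCTIONS = [["E", "+", "E"], ["E", "*", "E"], ["INT"]]

def matches(rhs: list[str], tokens: list[str]) -> bool:
    """True if tokens spell out rhs; the symbol INT matches any integer literal."""
    return len(rhs) == len(tokens) and all(
        tok.isdigit() if sym == "INT" else sym == tok
        for sym, tok in zip(rhs, tokens)
    )

def is_step(prev: list[str], nxt: list[str]) -> bool:
    """True if nxt follows from prev by rewriting one occurrence of E."""
    for i, sym in enumerate(prev):
        if sym != "E":
            continue
        for rhs in PRODUCTIONS:
            n = len(rhs)
            if (prev[:i] == nxt[:i]                  # unchanged left context
                    and prev[i + 1:] == nxt[i + n:]  # unchanged right context
                    and matches(rhs, nxt[i:i + n])):
                return True
    return False

derivation = [
    ["E"],
    ["E", "+", "E"],
    ["E", "+", "E", "*", "E"],
    ["10", "+", "E", "*", "E"],
    ["10", "+", "2", "*", "E"],
    ["10", "+", "2", "*", "5"],
]
print(all(is_step(a, b) for a, b in zip(derivation, derivation[1:])))  # True
```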
Derivation tree: the root $E$ (labeled $b$) has children $E$, $+$, $E$; the left child derives 10, and the right child (labeled $a$) has children $E$, $*$, $E$, which derive 2 and 5.
Machine code:
mult a, 2, 5
add b, 10, a
Derivation trees are used to build machine code.
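A minimal Python sketch of how the derivation tree can be turned into the machine code above by a post-order walk. It assumes the tree is stored as nested tuples and that temporaries are named a, b, c, and so on, in the order instructions are emitted; the `mult`/`add` mnemonics follow the slide, while `emit` and `OPCODES` are hypothetical names.

```python
import string

OPCODES = {"+": "add", "*": "mult"}

def emit(tree, code, temps):
    """Post-order walk: generate code for both children first, then one
    instruction for this operator node. Returns the operand holding its value."""
    if isinstance(tree, int):            # leaf: integer literal (E -> INT)
        return str(tree)
    left, op, right = tree               # internal node: E -> E op E
    lhs = emit(left, code, temps)
    rhs = emit(right, code, temps)
    target = temps.pop(0)                # next unused temporary name
    code.append(f"{OPCODES[op]} {target}, {lhs}, {rhs}")
    return target

tree = (10, "+", (2, "*", 5))            # derivation tree of 10 + 2 * 5
code = []
result = emit(tree, code, list(string.ascii_lowercase))
print("\n".join(code))
# mult a, 2, 5
# add b, 10, a
```

Because children are visited before their parent, the multiplication node (deeper in the tree) receives the first temporary, `a`, and the root receives `b`, matching the slide.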
A Simple (Exhaustive) Parser
We will build an exhaustive search parser that examines all possible derivations.
Input: a string and a grammar → Exhaustive Parser → Output: a derivation of the string
Example: find a derivation of the input string $aabb$ in the grammar $S \to SS \mid aSb \mid bSa \mid \varepsilon$.
Exhaustive Search
Phase 1: to find a derivation of $aabb$, list all possible derivations of length 1:
$S \Rightarrow SS$, $S \Rightarrow aSb$, $S \Rightarrow bSa$, $S \Rightarrow \varepsilon$
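Phase 1 (continued): $S \Rightarrow bSa$ and $S \Rightarrow \varepsilon$ cannot possibly produce $aabb$ ($bSa$ already begins with the terminal $b$, and $\varepsilon$ is a finished string other than $aabb$), so they are discarded.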
In Phase 2, explore the next step of each derivation kept from Phase 1, namely $S \Rightarrow SS$ and $S \Rightarrow aSb$.
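Phase 2: to find a derivation of $aabb$, extend each Phase 1 derivation by one step.
From $S \Rightarrow SS$: $S \Rightarrow SS \Rightarrow SSS$, $S \Rightarrow SS \Rightarrow aSbS$, $S \Rightarrow SS \Rightarrow bSaS$, $S \Rightarrow SS \Rightarrow S$
From $S \Rightarrow aSb$: $S \Rightarrow aSb \Rightarrow aSSb$, $S \Rightarrow aSb \Rightarrow aaSbb$, $S \Rightarrow aSb \Rightarrow abSab$, $S \Rightarrow aSb \Rightarrow ab$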
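Phase 2 (continued): derivations that can no longer produce $aabb$, such as $S \Rightarrow SS \Rightarrow bSaS$, $S \Rightarrow aSb \Rightarrow abSab$, and $S \Rightarrow aSb \Rightarrow ab$, are discarded. In Phase 3, explore all possible next steps of the remaining derivations.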
Phase 3: a possible derivation of $aabb$ is
$S \Rightarrow aSb \Rightarrow aaSbb \Rightarrow aabb$
Final result of the exhaustive search: the exhaustive parser takes the input string $aabb$ and outputs the derivation found in Phase 3.
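The phased search above can be sketched in Python as a breadth-first search over derivations, where each phase rewrites the leftmost variable of every surviving sentential form. This is a minimal sketch under stated assumptions: variables are single uppercase letters, terminals are lowercase, `""` encodes $\varepsilon$, and the pruning test (the terminals before the first variable must be a prefix of $w$, and the form may not contain more terminals than $w$) is one simple choice. Since the example grammar has an $\varepsilon$-production, the search is capped at a hypothetical `max_phases` limit, defaulting to $2|w|$.

```python
from typing import Optional

def leftmost_variable(form: str) -> int:
    """Index of the leftmost variable (uppercase letter) in form, or -1 if none."""
    for i, c in enumerate(form):
        if c.isupper():
            return i
    return -1

def may_produce(form: str, w: str) -> bool:
    """Cheap pruning test: False if form clearly cannot derive w."""
    i = leftmost_variable(form)
    if i == -1:                                   # only terminals left
        return form == w
    terminals = sum(1 for c in form if not c.isupper())
    # Terminals before the first variable never change again, and
    # terminals are never erased, so both checks are safe.
    return w.startswith(form[:i]) and terminals <= len(w)

def exhaustive_parse(grammar: dict[str, list[str]], start: str, w: str,
                     max_phases: Optional[int] = None) -> Optional[list[str]]:
    """Phase-by-phase search; returns a leftmost derivation of w, or None."""
    phases = 2 * len(w) if max_phases is None else max_phases
    frontier = [[start]]                          # derivations that survive pruning
    for _ in range(phases):
        next_frontier = []
        for derivation in frontier:
            form = derivation[-1]
            i = leftmost_variable(form)           # expand the leftmost variable
            for rhs in grammar[form[i]]:
                new_form = form[:i] + rhs + form[i + 1:]
                if new_form == w:
                    return derivation + [new_form]
                if may_produce(new_form, w):
                    next_frontier.append(derivation + [new_form])
        frontier = next_frontier
    return None

G = {"S": ["SS", "aSb", "bSa", ""]}               # "" encodes S -> epsilon
print(" => ".join(exhaustive_parse(G, "S", "aabb")))
# S => aSb => aaSbb => aabb
```

Without the cap, a string outside the language would keep the search running forever here, because $S \to SS$ can grow sentential forms without adding terminals.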
Time Complexity
Suppose that the grammar does not have productions of the form $A \to \varepsilon$ ($\varepsilon$-productions) or $A \to B$ (unit productions).
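Since there are no $\varepsilon$-productions, no derivation step shortens a sentential form. Hence, for any derivation of a string of terminals $w$,
$S \Rightarrow x_1 \Rightarrow x_2 \Rightarrow \cdots \Rightarrow x_k \Rightarrow \cdots \Rightarrow w$,
it holds that $|x_i| \le |w|$ for all $i$.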
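Since there are no unit productions, each derivation step either lengthens the sentential form (right-hand side of length at least 2) or replaces a variable by a terminal (a right-hand side of length 1 must be a terminal):
1. At most $|w|$ derivation steps are needed to produce a sentential form $u$ with at most $|w|$ variables.
2. At most $|w|$ derivation steps are needed to convert the variables of $u$ to the string of terminals $w$.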
Therefore, at most $2|w|$ derivation steps are required to produce $w$, so the exhaustive search requires at most $2|w|$ phases.
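The two preconditions of this bound are easy to check mechanically. Minimal Python sketch, assuming single-letter uppercase variables and productions stored as strings (with `""` standing for $\varepsilon$); the function name `phase_bound` and the $\varepsilon$-free grammar used below are illustrative choices, not taken from the slides.

```python
def phase_bound(grammar: dict[str, list[str]], w: str) -> int:
    """Return the bound 2|w| on the number of exhaustive-search phases.
    Raises ValueError if the grammar has an epsilon-production (A -> "")
    or a unit production (A -> B), since the bound then does not apply."""
    for head, bodies in grammar.items():
        for rhs in bodies:
            if rhs == "":
                raise ValueError(f"epsilon-production: {head} -> ε")
            if len(rhs) == 1 and rhs.isupper():
                raise ValueError(f"unit production: {head} -> {rhs}")
    return 2 * len(w)

# An epsilon-free grammar (illustrative): S -> SS | aSb | bSa | ab | ba
print(phase_bound({"S": ["SS", "aSb", "bSa", "ab", "ba"]}, "aabb"))  # 8
```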
Suppose the grammar has $k$ productions. Possible derivation choices to be examined in phase 1: at most $k$.
Choices for phase 2: at most $k \cdot k = k^2$ (choices of phase 1 × number of productions).
In general, choices for phase $i$: at most $k^{i-1} \cdot k = k^i$ (choices of phase $i-1$ × number of productions).
Total exploration choices for the string $w$, summing over phases 1 through $2|w|$:
$k + k^2 + \cdots + k^{2|w|} = O(k^{2|w|})$
This is exponential in the string length: extremely bad!
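The total is a geometric sum, worked out below. The numeric line assumes, purely for illustration, $k = 4$ productions and a string of length $|w| = 4$.

```latex
\sum_{i=1}^{2|w|} k^{i} \;=\; \frac{k^{2|w|+1}-k}{k-1} \;=\; O\!\left(k^{2|w|}\right) \qquad (k \ge 2)

% Illustration: k = 4, |w| = 4, so 2|w| = 8 phases
\sum_{i=1}^{8} 4^{i} \;=\; \frac{4^{9}-4}{3} \;=\; \frac{262144-4}{3} \;=\; 87380
```

Even for a four-symbol string, tens of thousands of derivation choices may need to be examined in the worst case.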
Faster Parsers
For general context-free grammars, we next give a parsing algorithm that parses a string $w$ in time $|w|^3$. This time is very close to worst-case optimal, since parsing can be used to solve the (Boolean) matrix multiplication problem.
The CYK Parsing Algorithm
Input:
• An arbitrary grammar $G$ in Chomsky Normal Form
• A string $w$
Output: determine whether $w \in L(G)$
Number of steps: $|w|^3$ (approximately)
The algorithm can easily be converted to a parser.
Basic Idea
Consider a grammar $G$ in Chomsky Normal Form. Denote by $F(w)$ the set of variables that generate a string $w$:
$X \in F(w)$ if $X \Rightarrow^{*} w$
Suppose that we have computed $F(w)$. Check whether $S \in F(w)$:
• YES: $w \in L(G)$
• NO: $w \notin L(G)$
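$F(w)$ can be computed recursively. Write $w = uv$ with prefix $u$ and suffix $v$ (both non-empty). If $X \in F(u)$, $Y \in F(v)$, and there is a production $H \to XY$, then $H \in F(w)$, because $H \Rightarrow XY \Rightarrow^{*} uY \Rightarrow^{*} uv$.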
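Examine all prefix-suffix decompositions $w = uv$, with prefix length $|u| = 1, 2, \ldots, |w|-1$, and collect the set of variables that generate $w$ through each one. Result:
$F(w) = \bigcup_{w = uv,\ u, v \neq \varepsilon} \{\, H : H \to XY,\ X \in F(u),\ Y \in F(v) \,\}$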
At the basis of the recursion we have strings of length 1, i.e., single symbols $\sigma$:
$F(\sigma) = \{\, X : X \to \sigma \,\}$
This is very easy to find.
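The recursion, including its basis, translates almost line by line into a memoized function. Minimal Python sketch, assuming single-letter variables and the illustrative CNF grammar $S \to AB$, $A \to BB \mid a$, $B \to AB \mid b$ encoded as tuples; the names `BINARY`, `TERMINAL`, and `F` (chosen to mirror the notation) are hypothetical. `functools.lru_cache` makes sure each substring's set is computed only once.

```python
from functools import lru_cache

# Illustrative CNF grammar: S -> AB, A -> BB | a, B -> AB | b
BINARY = [("S", "A", "B"), ("A", "B", "B"), ("B", "A", "B")]   # H -> X Y
TERMINAL = [("A", "a"), ("B", "b")]                            # X -> sigma

@lru_cache(maxsize=None)
def F(w: str) -> frozenset:
    """Set of variables that generate the non-empty string w."""
    if len(w) == 1:                               # basis: F(sigma) = {X : X -> sigma}
        return frozenset(x for x, s in TERMINAL if s == w)
    result = set()
    for split in range(1, len(w)):                # every prefix-suffix decomposition
        u, v = w[:split], w[split:]
        fu, fv = F(u), F(v)
        for h, x, y in BINARY:
            if x in fu and y in fv:               # H -> XY, X =>* u, Y =>* v
                result.add(h)
    return frozenset(result)

print(sorted(F("aabbb")))   # ['B', 'S']
```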
Remark: the whole algorithm can be implemented with dynamic programming: first compute $F$ for smaller substrings of $w$, then use these results to compute $F$ for larger substrings of $w$.
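A bottom-up version fills a table by increasing substring length, in the order the remark describes. Minimal Python sketch; the tuple encoding of productions and the example grammar are illustrative assumptions, and the guard for the empty string assumes the grammar has no $S \to \varepsilon$ production.

```python
# Illustrative CNF grammar: S -> AB, A -> BB | a, B -> AB | b
BINARY = [("S", "A", "B"), ("A", "B", "B"), ("B", "A", "B")]   # H -> X Y
TERMINAL = [("A", "a"), ("B", "b")]                            # X -> sigma

def cyk(w: str, start: str = "S", binary=BINARY, terminal=TERMINAL) -> bool:
    """Bottom-up CYK membership test.
    table[i][length] holds F(w[i:i+length])."""
    n = len(w)
    if n == 0:
        return False                                  # assumes no S -> epsilon
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, sigma in enumerate(w):                     # basis: length 1
        table[i][1] = {x for x, s in terminal if s == sigma}
    for length in range(2, n + 1):                    # then longer substrings
        for i in range(n - length + 1):
            for split in range(1, length):            # prefix length |u|
                left = table[i][split]                # F(u)
                right = table[i + split][length - split]  # F(v)
                for h, x, y in binary:
                    if x in left and y in right:
                        table[i][length].add(h)
    return start in table[0][n]

print(cyk("aabbb"))   # True
```

The three nested loops over start position, length, and split point give the cubic running time, multiplied by the number of binary productions.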
Example:
• Grammar $G$: $S \to AB$, $A \to BB \mid a$, $B \to AB \mid b$
• Determine whether $w = aabbb \in L(G)$
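Decompose the string $aabbb$ into all possible substrings, grouped by length:
Length 1: a, a, b, b, b
Length 2: aa, ab, bb, bb
Length 3: aab, abb, bbb
Length 4: aabb, abbb
Length 5: aabbb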
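The decomposition above can be generated mechanically. Minimal Python sketch; `substrings_by_length` is a hypothetical helper name, and substrings are listed by starting position, so repeated substrings such as `bb` appear once per occurrence.

```python
def substrings_by_length(w: str) -> dict[int, list[str]]:
    """Group every substring w[i:i+length] by its length, one entry per position."""
    return {
        length: [w[i:i + length] for i in range(len(w) - length + 1)]
        for length in range(1, len(w) + 1)
    }

for length, subs in substrings_by_length("aabbb").items():
    print(length, subs)
# 1 ['a', 'a', 'b', 'b', 'b']
# 2 ['aa', 'ab', 'bb', 'bb']
# 3 ['aab', 'abb', 'bbb']
# 4 ['aabb', 'abbb']
# 5 ['aabbb']
```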
Length 1: $F(a) = \{A\}$ (from $A \to a$) and $F(b) = \{B\}$ (from $B \to b$).
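Length 2:
• $aa$: prefix $a$, suffix $a$, with $F(a) = \{A\}$. There is no production of the form $H \to AA$; thus $F(aa) = \emptyset$.
• $ab$: prefix $a$, suffix $b$, with $F(a) = \{A\}$ and $F(b) = \{B\}$. There are two productions of the form $H \to AB$ ($S \to AB$ and $B \to AB$); thus $F(ab) = \{S, B\}$.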
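Length 3, substring $aab$:
Decomposition 1: prefix $a$, suffix $ab$, with $F(a) = \{A\}$ and $F(ab) = \{S, B\}$. There is no production of the form $H \to AS$; there are 2 productions of the form $H \to AB$ ($S \to AB$ and $B \to AB$), contributing $\{S, B\}$.
Decomposition 2: prefix $aa$, suffix $b$, with $F(aa) = \emptyset$ and $F(b) = \{B\}$. There is no production of the form $H \to XB$ with $X \in F(aa)$, so this decomposition contributes nothing.
Thus $F(aab) = \{S, B\}$.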
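Continuing in the same way up to length 5 gives $F(aabbb) = \{S, B\}$. Since $S \in F(aabbb)$, we conclude that $aabbb \in L(G)$.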
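Approximate time complexity: $|w|^3$. The number of substrings of $w$ is about $|w|^2$ (exactly $|w|(|w|+1)/2$), and the number of prefix-suffix decompositions for a substring is at most $|w| - 1$, so the total work is about $|w|^2 \cdot |w| = |w|^3$ for a fixed grammar.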
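Counting more precisely, with $n = |w|$: there are $n - \ell + 1$ substrings of length $\ell$, and each has $\ell - 1$ prefix-suffix decompositions. The sums below treat the grammar size as a constant, which is an assumption, since each decomposition also scans the productions.

```latex
\#\text{substrings} = \sum_{\ell=1}^{n} (n-\ell+1) = \frac{n(n+1)}{2},
\qquad
\#\text{decompositions} = \sum_{\ell=1}^{n} (n-\ell+1)(\ell-1) = \frac{n^{3}-n}{6} = O(n^{3})

% For w = aabbb (n = 5): 15 substrings and (125 - 5)/6 = 20 prefix-suffix decompositions
```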