Syntax Analysis
by Neng-Fa Zhou

[Figure: source program → lexical analyzer → tokens → syntax analyzer (parser) → parse tree → semantic analyzer]

The Role of the Parser
• Construct a parse tree
• Report and recover from errors
• Collect information into symbol tables

Context-free Grammars
G = (Σ, N, P, S)
  – Σ is a finite set of terminals
  – N is a finite set of non-terminals
  – P is a finite set of production rules
  – S is the start symbol

CFG: Examples
• Arithmetic expressions
  E ::= T | E + T | E - T
  T ::= F | T * F | T / F
  F ::= id | (E)
• Statements
  IfStatement ::= if E then Statement else Statement

CFG vs. Regular Expressions
• CFG is more expressive than RE
  – Every language that can be described by regular expressions can also be described by a CFG
• Example languages that are CFG but not RE
  – if-then-else statement, {a^n b^n | n>=1}
• Non-CFG
  – L1 = {wcw | w is in (a|b)*}
  – L2 = {a^n b^m c^n d^m | n>=1 and m>=1}

Derivations
  αAβ ⇒ αγβ if A ::= γ
  α ⇒* α
  if α ⇒* β and β ⇒ γ then α ⇒* γ
  α is a sentential form if S ⇒* α
  α is a sentence if it contains only terminal symbols

Derivations
• Leftmost derivation
  αAβ ⇒ αγβ if α is a string of terminals
• Rightmost derivation
  αAβ ⇒ αγβ if β is a string of terminals

Parse Trees
• A parse tree is any tree in which
  – The root is labeled with S
  – Each leaf is labeled with a token a or ε
  – Each interior node is labeled by a nonterminal
  – If an interior node is labeled A and has children labeled X1, ..., Xn, then A ::= X1...Xn is a production.

Parse Trees and Derivations
  E ::= E + E | E * E | E - E | ( E ) | id

Ambiguity
• A grammar that produces more than one parse tree for some sentence is said to be ambiguous.
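To make the definition concrete, here is a minimal Python sketch that counts the parse trees of a token string under the grammar E ::= E + E | E * E | E - E | ( E ) | id from the previous slide. The function name `count_parse_trees` and the token spelling are assumptions made for this illustration; a count greater than 1 for some sentence shows that the grammar is ambiguous.

```python
from functools import lru_cache

OPERATORS = {"+", "-", "*"}

def count_parse_trees(tokens):
    """Count parse trees for tokens under E ::= E+E | E*E | E-E | (E) | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct parse trees deriving tokens[i:j] from E.
        total = 0
        if j - i == 1 and tokens[i] == "id":
            total += 1                                # E ::= id
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)              # E ::= ( E )
        for k in range(i + 1, j - 1):                 # E ::= E op E, op at k
            if tokens[k] in OPERATORS:
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees("id + id * id".split()))      # two trees: ambiguous
print(count_parse_trees("( id + id ) * id".split()))  # one tree
```

The sentence `id + id * id` has two trees because either operator can sit at the root, which is the situation the precedence-aware grammar E/T/F avoids.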

Eliminating Ambiguity
• Rewrite productions to take the precedence of operators into account
• Rewrite productions so that each else is matched with the closest unmatched then
  stmt ::= matched_stmt | unmatched_stmt
  matched_stmt ::= if E then matched_stmt else matched_stmt | other
  unmatched_stmt ::= if E then stmt | if E then matched_stmt else unmatched_stmt

Eliminating Left-Recursion
• Direct left-recursion
  A ::= Aα | β
  becomes
  A ::= βA'
  A' ::= αA' | ε
• General case
  A ::= Aα1 | ... | Aαm | β1 | ... | βn
  becomes
  A ::= β1A' | ... | βnA'
  A' ::= α1A' | ... | αmA' | ε
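The transformation above can be sketched in a few lines of Python. The representation is an assumption made for this example: each alternative is a list of symbols, the empty list stands for ε, and the new nonterminal is named by appending a prime. The sketch also assumes every α is non-empty.

```python
def eliminate_direct_left_recursion(nt, alternatives):
    """Rewrite A ::= A a1 | ... | A am | b1 | ... | bn without left recursion.

    alternatives is a list of symbol lists; [] represents epsilon.
    Returns a dict mapping each resulting nonterminal to its alternatives.
    """
    # The alpha parts: what follows the leading A in left-recursive alternatives.
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    # The beta parts: alternatives that do not start with A.
    betas = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not alphas:
        return {nt: alternatives}      # nothing to do
    new_nt = nt + "'"
    return {
        nt: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],
    }

# E ::= E + T | E - T | T
print(eliminate_direct_left_recursion(
    "E", [["E", "+", "T"], ["E", "-", "T"], ["T"]]))
```

For the expression rule this yields E ::= T E' and E' ::= + T E' | - T E' | ε.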

Eliminating Indirect Left-Recursion
• Indirect left-recursion
  S ::= Aa | b
  A ::= Ac | Sd | ε
• Algorithm
  Arrange the nonterminals in some order A1, ..., An.
  for (i in 1..n) {
    for (j in 1..i-1) {
      replace each production of the form Ai ::= Ajγ
      by the productions Ai ::= δ1γ | δ2γ | ... | δkγ
      where Aj ::= δ1 | δ2 | ... | δk
    }
    eliminate the immediate left recursion among Ai productions
  }
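A minimal Python sketch of the algorithm, applied to the example grammar on this slide. The dictionary representation (insertion order gives the ordering A1, ..., An, the empty list stands for ε) and the primed names are assumptions of this sketch, not part of the slides.

```python
def remove_left_recursion(grammar):
    """grammar maps each nonterminal to a list of alternatives (symbol lists)."""
    g = {nt: [list(alt) for alt in alts] for nt, alts in grammar.items()}
    order = list(g)                          # A1, ..., An
    for i, ai in enumerate(order):
        for aj in order[:i]:
            # Replace Ai ::= Aj gamma by Ai ::= delta gamma for each Aj ::= delta.
            expanded = []
            for alt in g[ai]:
                if alt and alt[0] == aj:
                    expanded += [delta + alt[1:] for delta in g[aj]]
                else:
                    expanded.append(alt)
            g[ai] = expanded
        # Eliminate immediate left recursion among the Ai productions.
        alphas = [alt[1:] for alt in g[ai] if alt and alt[0] == ai]
        if alphas:
            betas = [alt for alt in g[ai] if not alt or alt[0] != ai]
            new_nt = ai + "'"
            g[ai] = [beta + [new_nt] for beta in betas]
            g[new_nt] = [alpha + [new_nt] for alpha in alphas] + [[]]
    return g

example = {
    "S": [["A", "a"], ["b"]],
    "A": [["A", "c"], ["S", "d"], []],
}
for head, alts in remove_left_recursion(example).items():
    print(head, "::=", " | ".join(" ".join(alt) or "ε" for alt in alts))
```

With the order S, A the result is S ::= Aa | b, A ::= bdA' | A', A' ::= cA' | adA' | ε.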

Left Factoring
  A ::= αβ1 | ... | αβn | γ
  becomes
  A ::= αA' | γ
  A' ::= β1 | ... | βn
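As a rough illustration, the following Python sketch performs one round of left factoring on the alternatives of a single nonterminal. Grouping alternatives by their first symbol and naming the new nonterminals with primes are choices made for this sketch; the empty list stands for ε.

```python
def common_prefix(sequences):
    """Longest common prefix of a list of symbol lists."""
    prefix = []
    for column in zip(*sequences):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix

def left_factor(nt, alternatives):
    """One round of left factoring for the alternatives of nt."""
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[0] if alt else None, []).append(alt)
    result = {nt: []}
    primes = 0
    for first, group in groups.items():
        if first is None or len(group) == 1:
            result[nt].extend(group)         # nothing to factor
            continue
        alpha = common_prefix(group)
        primes += 1
        new_nt = nt + "'" * primes
        result[nt].append(alpha + [new_nt])  # A ::= alpha A'
        result[new_nt] = [alt[len(alpha):] for alt in group]  # A' ::= beta_i
    return result

# S ::= i E t S | i E t S e S | a
print(left_factor("S", [["i", "E", "t", "S"],
                        ["i", "E", "t", "S", "e", "S"],
                        ["a"]]))
```

For the if-then-else rule this gives S ::= i E t S S' | a and S' ::= ε | e S.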

Top-Down Parsing
• Start from the start symbol and build the parse tree top-down
• Apply a production to a nonterminal. The right-hand side of the production becomes the children of the nonterminal
• Match terminal symbols with the input
• May require backtracking
• Some grammars are backtrack-free (predictive)

Construct Parse Trees Top-Down
  – Start with the tree of one node labeled with the start symbol and repeat the following steps until the fringe of the parse tree matches the input string
    1. At a node labeled A, select a production with A on its LHS and, for each symbol on its RHS, construct the appropriate child
    2. When a terminal is added to the fringe that doesn't match the input string, backtrack
    3. Find the next node to be expanded
  – Minimize the number of backtracks

Example
Left-recursive
  E ::= T | E + T | E - T
  T ::= F | T * F | T / F
  F ::= id | number | (E)
Right-recursive
  E ::= T E'
  E' ::= + T E' | - T E' | ε
  T ::= F T'
  T' ::= * F T' | / F T' | ε
  F ::= id | number | (E)
Input: x - 2 * y

Control Top-Down Parsing
• Heuristics
  – Use input string to guide search
• Backtrack-free search
  – Lookahead is necessary
    • Predictive parsing

Predictive Parsing: FIRST and FOLLOW
• FIRST(X)
  – If X is a terminal
    • FIRST(X) = {X}
  – If X ::= ε
    • Add ε to FIRST(X)
  – If X ::= Y1 Y2 … Yk
    • Add everything in FIRST(Yi) except for ε to FIRST(X) if Y1…Yi-1 =>* ε
    • Add ε to FIRST(X) if Y1…Yk =>* ε

Predictive Parsing: FIRST and FOLLOW
• FOLLOW(X)
  – Add $ to FOLLOW(S)
  – If A ::= αBβ
    • Add everything in FIRST(β) except for ε to FOLLOW(B)
  – If A ::= αBβ (β =>* ε) or A ::= αB
    • Add everything in FOLLOW(A) to FOLLOW(B)
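The FIRST and FOLLOW rules can be run to a fixed point. The sketch below is one possible Python rendering, assuming a grammar given as a dictionary from nonterminals to alternatives (symbol lists, with the empty list for ε); the names `compute_first`, `compute_follow`, and `first_of_string` are made up for this example. It uses the right-recursive expression grammar E ::= TE', E' ::= +TE' | ε, T ::= FT', T' ::= *FT' | ε, F ::= (E) | id.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of_string(symbols, first):
    """FIRST of a symbol string, given the FIRST sets computed so far."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}   # terminal: {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)              # every symbol can derive epsilon
    return result

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:               # repeat until no set grows
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                new = first_of_string(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            for alt in alts:
                for i, b in enumerate(alt):
                    if b not in grammar:
                        continue                     # only nonterminals
                    rest = first_of_string(alt[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:                  # beta =>* epsilon
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow

FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, FIRST, "E")
for nt in GRAMMAR:
    print(nt, sorted(FIRST[nt]), sorted(FOLLOW[nt]))
```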

Recursive Descent Parsing

  match(expected_token){
    if (input_token != expected_token)
      error();
    else
      input_token = next_token();
  }

  main(){
    input_token = next_token();
    exp();
    match(EOS);
  }

  exp(){
    switch (input_token) {
    case ID, NUM, L_PAREN:
      term();
      exp_prime();
      return;
    default:
      error();
    }
  }

  exp_prime(){
    switch (input_token){
    case PLUS:
      match(PLUS);
      term();
      exp_prime();
      break;
    case MINUS:
      match(MINUS);
      term();
      exp_prime();
      break;
    case R_PAREN, EOS:
      break;
    default:
      error();
    }
  }
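For readers who want to run the idea, here is a hedged Python sketch of a recursive descent recognizer for the right-recursive expression grammar. The slide only shows `match`, `main`, `exp`, and `exp_prime`; the `term`, `term_prime`, and `factor` routines, the `TIMES` and `DIVIDE` token names, and the `Parser` class itself are assumptions filled in from the grammar, not taken from the slides.

```python
class ParseError(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["EOS"]
        self.pos = 0

    @property
    def input_token(self):
        return self.tokens[self.pos]

    def error(self):
        raise ParseError(f"unexpected {self.input_token} at position {self.pos}")

    def match(self, expected):
        if self.input_token != expected:
            self.error()
        self.pos += 1

    def parse(self):
        self.exp()
        self.match("EOS")

    def exp(self):                       # E ::= T E'
        if self.input_token in ("ID", "NUM", "L_PAREN"):
            self.term()
            self.exp_prime()
        else:
            self.error()

    def exp_prime(self):                 # E' ::= + T E' | - T E' | epsilon
        if self.input_token in ("PLUS", "MINUS"):
            self.match(self.input_token)
            self.term()
            self.exp_prime()
        elif self.input_token not in ("R_PAREN", "EOS"):
            self.error()

    def term(self):                      # T ::= F T'
        self.factor()
        self.term_prime()

    def term_prime(self):                # T' ::= * F T' | / F T' | epsilon
        if self.input_token in ("TIMES", "DIVIDE"):
            self.match(self.input_token)
            self.factor()
            self.term_prime()
        elif self.input_token not in ("PLUS", "MINUS", "R_PAREN", "EOS"):
            self.error()

    def factor(self):                    # F ::= id | number | ( E )
        if self.input_token in ("ID", "NUM"):
            self.match(self.input_token)
        elif self.input_token == "L_PAREN":
            self.match("L_PAREN")
            self.exp()
            self.match("R_PAREN")
        else:
            self.error()

def accepts(tokens):
    try:
        Parser(tokens).parse()
        return True
    except ParseError:
        return False

# x - 2 * y
print(accepts(["ID", "MINUS", "NUM", "TIMES", "ID"]))
```

Each ε-alternative is taken only when the current token is in the FOLLOW set of the nonterminal, which is what makes the parser predictive rather than backtracking.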

Top-Down Parsing (Nonrecursive predictive parser)
[Figure: nonrecursive predictive parser, with a component labeled "parsing table"]

Parsing Table Construction

  for each production p = (A ::= α) {
    for each terminal a in FIRST(α), add p to M[A, a];
    if ε is in FIRST(α)
      for each terminal b (including $) in FOLLOW(A)
        add p to M[A, b];
  }
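The loop above translates almost directly into Python. In this sketch the FIRST and FOLLOW sets of the right-recursive expression grammar are written out by hand to keep the example short, and a small stack-based driver (`ll1_parse`, a name invented here) is added as an illustration of how a nonrecursive predictive parser might consult the table M.

```python
EPS = "ε"

PRODUCTIONS = [
    ("E",  ("T", "E'")),
    ("E'", ("+", "T", "E'")),
    ("E'", ()),
    ("T",  ("F", "T'")),
    ("T'", ("*", "F", "T'")),
    ("T'", ()),
    ("F",  ("(", "E", ")")),
    ("F",  ("id",)),
]
# Hand-written sets for this grammar (assumed input to the construction).
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}

def first_of(symbols):
    """FIRST of a symbol string."""
    out = set()
    for s in symbols:
        f = FIRST.get(s, {s})            # a terminal's FIRST set is itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out

def build_table(productions):
    table = {}
    for p in productions:
        head, body = p
        f = first_of(body)
        lookaheads = f - {EPS}
        if EPS in f:
            lookaheads |= FOLLOW[head]
        for a in lookaheads:
            table.setdefault((head, a), []).append(p)   # add p to M[A, a]
    return table

def ll1_parse(tokens, table, start="E"):
    """Table-driven recognizer; returns True if tokens form a sentence."""
    stack = ["$", start]
    tokens = list(tokens) + ["$"]
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in FIRST:                                # nonterminal
            entries = table.get((top, look))
            if not entries:
                return False
            stack.extend(reversed(entries[0][1]))       # push RHS, leftmost on top
        elif top == look:
            pos += 1                                    # match a terminal
        else:
            return False
    return pos == len(tokens)

M = build_table(PRODUCTIONS)
print(all(len(entry) == 1 for entry in M.values()))     # no multiply-defined entry
print(ll1_parse("id + id * id".split(), M))
```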

Example
  E ::= E+T | T
  T ::= T*F | F
  F ::= (E) | id

After eliminating left recursion:
  E ::= TE'
  E' ::= +TE' | ε
  T ::= FT'
  T' ::= *FT' | ε
  F ::= (E) | id

LL(1) Grammar
A grammar is said to be LL(1) if |M[A, a]| <= 1 for each nonterminal A and terminal a.
• Example (non-LL(1) grammar)
  S ::= iEtS | iEtSeS | a
  E ::= b
  After left factoring (still not LL(1): M[S', e] contains both S' ::= eS and S' ::= ε):
  S ::= iEtSS' | a
  S' ::= eS | ε
  E ::= b

Bottom-Up Parsing
• Start from the input sequence of tokens
• Apply a production to the sentential form and rewrite it to a new one
• Keep track of the current sentential form

Construct Parse Trees Bottom-Up
  1. α = the given string of tokens;
  2. Repeat (reduction)
     2.1 Match the RHS of a production with a substring of α
     2.2 Replace the RHS with the LHS of the production
     until no production rule applies (backtrack) or α becomes the start symbol (success);

Example
  S ::= aABe
  A ::= Abc | b
  B ::= d

  abbcde
  aAbcde
  aAde
  aABe
  S
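The reduce-and-backtrack procedure can be sketched as a depth-first search over sentential forms. This is only an illustration of the naive method, assuming a grammar with no ε-productions; the function name `reduce_to_start` and the tuple representation are choices made for this sketch.

```python
PRODUCTIONS = [
    ("S", ("a", "A", "B", "e")),
    ("A", ("A", "b", "c")),
    ("A", ("b",)),
    ("B", ("d",)),
]

def reduce_to_start(form, productions, start, seen=None):
    """Return the list of sentential forms from form up to start, or None."""
    form = tuple(form)
    seen = set() if seen is None else seen
    if form == (start,):
        return [form]                     # success
    if form in seen:
        return None                       # already explored without success
    seen.add(form)
    for head, body in productions:
        n = len(body)
        for i in range(len(form) - n + 1):
            if form[i:i + n] == body:     # RHS matches a substring
                reduced = form[:i] + (head,) + form[i + n:]
                rest = reduce_to_start(reduced, productions, start, seen)
                if rest is not None:
                    return [form] + rest
    return None                           # no rule leads to start: backtrack

for step in reduce_to_start("abbcde", PRODUCTIONS, "S"):
    print("".join(step))
```

Note that blindly reducing the second b of aAbcde to A would lead to a dead end; the search recovers by backtracking, which is the cost that handle-based parsing avoids.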

Control Bottom-Up Parsing
• Handle
  – A substring that matches the right side of a production
  – Applying the production to the substring results in a right-sentential form, i.e., a sentential form occurring in a rightmost derivation
• Example
  E ::= E+E
  E ::= E*E
  E ::= (E)
  E ::= id

  id + id * id
  E + E * id
  E + E * E

Bottom-Up Parsing: Shift-Reduce Parsing

  push '$' onto the stack;
  token = nextToken();
  repeat
    if (there is a handle A ::= β on top of the stack) {
      /* reduce β to A */
      pop β off the stack;
      push A onto the stack;
    } else {
      /* shift */
      shift token onto the stack;
      token = nextToken();
    }
  until (top of stack is S and token is eof)

A Problem in Shift-Reduce Parser
The stack has to be scanned to see whether a handle appears on it. Use a state to uniquely identify a part of a handle (viable prefix) so that stack scanning becomes unnecessary.

LR Parser
[Figure: LR parser]

LR Parsing Program
[Figure: LR parsing program]

Example
  (1) E ::= E+T
  (2) E ::= T
  (3) T ::= T*F
  (4) T ::= F
  (5) F ::= (E)
  (6) F ::= id
Input: id * id + id

LR Grammars
• LR grammar
  – A grammar is said to be an LR grammar if we can construct an LR parsing table for it without conflicting entries.
• LR(k) grammar
  – lookahead of up to k input symbols
• SLR(1) and LALR(1) grammars

SLR Parsing Tables
• LR(0) item
  – A production with a dot at some position of the RHS
    A ::= • XYZ    (we are expecting XYZ)
    A ::= X • YZ
    A ::= XY • Z
    A ::= XYZ •    (we have seen XYZ)

Closure of a Set of Items I
Example
  E' ::= E
  E ::= E+T | T
  T ::= T*F | F
  F ::= (E) | id

  closure({E' ::= • E}) = ?

The goto Operation
  goto(I, X) = closure({A ::= αX • β | A ::= α • Xβ is in I})
• Example
  I = {E' ::= E •, E ::= E • + T}
  goto(I, +) = ?
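Both questions (the closure of {E' ::= • E} and goto(I, +)) can be answered mechanically. The following Python sketch is one way to do it, assuming items are encoded as pairs (production index, dot position) over the augmented expression grammar; the encoding and the `show` helper are inventions of this example. The closure rule used is the standard one: whenever the dot stands before a nonterminal B, add B ::= • γ for every production of B.

```python
GRAMMAR = [                       # production 0 is the augmented E' ::= E
    ("E'", ("E",)),
    ("E",  ("E", "+", "T")),
    ("E",  ("T",)),
    ("T",  ("T", "*", "F")),
    ("T",  ("F",)),
    ("F",  ("(", "E", ")")),
    ("F",  ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}

def closure(items):
    """items: iterable of (production index, dot position) pairs."""
    result = set(items)
    work = list(result)
    while work:
        p, dot = work.pop()
        body = GRAMMAR[p][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            # Dot is before nonterminal B: add B ::= . gamma for each B-production.
            for q, (head, _) in enumerate(GRAMMAR):
                if head == body[dot] and (q, 0) not in result:
                    result.add((q, 0))
                    work.append((q, 0))
    return frozenset(result)

def goto(items, symbol):
    """Move the dot over symbol in every item that allows it, then close."""
    moved = {(p, dot + 1) for p, dot in items
             if dot < len(GRAMMAR[p][1]) and GRAMMAR[p][1][dot] == symbol}
    return closure(moved)

def show(item):
    p, dot = item
    head, body = GRAMMAR[p]
    return f"{head} ::= {' '.join(body[:dot] + ('•',) + body[dot:])}"

for item in sorted(closure({(0, 0)})):
    print(show(item))
print()
for item in sorted(goto({(0, 1), (1, 1)}, "+")):
    print(show(item))
```

The first loop prints the seven items of closure({E' ::= • E}); the second prints the five items of goto(I, +), starting with E ::= E + • T.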

Canonical LR(0) Collection of Sets of Items
[Figure: transitions between the item sets, with edges labeled E, T, F, (]

Constructing SLR Parsing Table
  1. Construct C = {I0, I1, …, In}, the collection of sets of LR(0) items for G' (augmented grammar).
  2. If [A ::= α • aβ] is in Ii, where a is a terminal, and goto(Ii, a) = Ij, then set action[i, a] to "shift j".
  3. If [S' ::= S •] is in Ii, then set action[i, $] to "accept".
  4. If [A ::= α •] is in Ii, then set action[i, a] to "reduce A ::= α" for all a in FOLLOW(A).
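The four steps can be sketched end to end in Python. This is an illustrative sketch, not the slides' own program: the item encoding (production index, dot position), the hand-written FOLLOW sets, the separate `goto_table` for nonterminal transitions, and the `lr_parse` driver are all assumptions of this example. It is self-contained, so `closure` and `goto` are defined here as well.

```python
GRAMMAR = [                       # production 0 is the augmented E' ::= E
    ("E'", ("E",)), ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}
FOLLOW = {"E": {"+", ")", "$"}, "T": {"+", "*", ")", "$"},
          "F": {"+", "*", ")", "$"}}          # written by hand for this grammar

def closure(items):
    result, work = set(items), list(items)
    while work:
        p, dot = work.pop()
        body = GRAMMAR[p][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            for q, (head, _) in enumerate(GRAMMAR):
                if head == body[dot] and (q, 0) not in result:
                    result.add((q, 0))
                    work.append((q, 0))
    return frozenset(result)

def goto(items, symbol):
    return closure({(p, d + 1) for p, d in items
                    if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol})

def set_action(action, key, value):
    if action.setdefault(key, value) != value:
        raise ValueError(f"conflict at {key}: grammar is not SLR(1)")

def build_slr_table():
    states = [closure({(0, 0)})]              # step 1: build C incrementally
    index = {states[0]: 0}
    action, goto_table = {}, {}
    symbols = {s for _, body in GRAMMAR for s in body}
    i = 0
    while i < len(states):
        for x in symbols:
            target = goto(states[i], x)
            if not target:
                continue
            if target not in index:
                index[target] = len(states)
                states.append(target)
            if x in NONTERMINALS:
                goto_table[i, x] = index[target]
            else:                             # step 2: shift
                set_action(action, (i, x), ("shift", index[target]))
        for p, dot in states[i]:
            head, body = GRAMMAR[p]
            if dot == len(body):
                if p == 0:                    # step 3: accept
                    set_action(action, (i, "$"), ("accept",))
                else:                         # step 4: reduce on FOLLOW(head)
                    for a in FOLLOW[head]:
                        set_action(action, (i, a), ("reduce", p))
        i += 1
    return action, goto_table

def lr_parse(tokens, action, goto_table):
    """Return the production numbers reduced, in order, or None on error."""
    stack, pos, reductions = [0], 0, []
    tokens = list(tokens) + ["$"]
    while True:
        act = action.get((stack[-1], tokens[pos]))
        if act is None:
            return None
        if act[0] == "shift":
            stack.append(act[1])
            pos += 1
        elif act[0] == "reduce":
            head, body = GRAMMAR[act[1]]
            del stack[len(stack) - len(body):]
            stack.append(goto_table[stack[-1], head])
            reductions.append(act[1])
        else:
            return reductions

ACTION, GOTO = build_slr_table()
print(lr_parse("id * id + id".split(), ACTION, GOTO))
```

The returned list is the reverse of a rightmost derivation, using the production numbering (1) to (6) of the example grammar.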

Example
  (1) E ::= E+T
  (2) E ::= T
  (3) T ::= T*F
  (4) T ::= F
  (5) F ::= (E)
  (6) F ::= id
Input: id * id + id

Unambiguous Grammars that are not SLR(1)
  S ::= L = R
  S ::= R
  L ::= * R
  L ::= id
  R ::= L

LR(1) Parsing Tables
• LR(1) item
  – LR(0) item + one lookahead terminal
• [A ::= α •, a]
  – reduce α to A only if the next symbol is a

LALR(1)
• Treat item closures Ii and Ij as one state if Ii and Ij differ from each other only in lookahead terminals.

Descriptive Power of Different Grammars
  LR(1) > LALR(1) > SLR(1)