CSCE 531 Compiler Construction Lecture 6 Grammar Modifications


CSCE 531 Compiler Construction
Lecture 6: Grammar Modifications

Topics
• Grammars for expressions and if-then-else
• Formal proofs of L(G)
• Top-down parsing
• Left factoring
• Removing left recursion

Readings: 4.3-4.4
Homework: 4.1, 4.2a, 4.6a, 4.11a
January 30, 2006

Overview

Last Time
• Should have mentioned DFA minimization
• Grammars, Derivations, Ambiguity
• Lec 05 - Grammars: Slides 1-27

Today's Lecture
• Ambiguity in classic programming language grammars
  - Expressions
  - If-Then-Else
• Top-Down parsing

References
• Sections 4.3-4.4
• Parse demos
  - http://ag-kastens.uni-paderborn.de/lehre/material/compiler/parsdemo/
• Chomsky Hierarchy: types of grammars and recognizers
  - http://en.wikipedia.org/wiki/Chomsky_hierarchy
• Homework: 4.1, 4.2a, 4.6a, 4.11a

CSCE 531 Spring 2018

DFA Minimization

Algorithm 3.6 in text. We will not cover this algorithm other than this slide.
• Partition the states into F and Q-F (final and non-final states).
• Refine the partitioning as much as possible.
• Refinement: a string x = x1 x2 … xt distinguishes between two states Si and Sk if, starting in each and following the path determined by x, one ends in an accepting state and the other ends in a non-accepting state.

[Figure: from Si the path x leads to Sa (accepting); from Sk the path x leads to Sna (non-accepting).]
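The partition-and-refine idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the text's Algorithm 3.6 verbatim: the function name `minimize_partition`, the dictionary encoding of δ, and the assumption that δ is total (every state has a transition on every symbol) are choices made here for the example.

```python
def minimize_partition(states, alphabet, delta, finals):
    """Return the blocks of mutually indistinguishable states.

    delta maps (state, symbol) -> state and is assumed to be total.
    """
    # Initial partition: final states F and non-final states Q - F.
    partition = [b for b in (set(finals), set(states) - set(finals)) if b]
    changed = True
    while changed:
        changed = False
        block_of = {s: i for i, block in enumerate(partition) for s in block}
        refined = []
        for block in partition:
            # States stay together only if every symbol sends them
            # into the same block of the current partition.
            groups = {}
            for s in block:
                signature = tuple(block_of[delta[(s, a)]] for a in alphabet)
                groups.setdefault(signature, set()).add(s)
            refined.extend(groups.values())
            if len(groups) > 1:
                changed = True
        partition = refined
    return partition
```

Each block of the returned partition becomes one state of the minimized DFA; two states end up in different blocks exactly when some string distinguishes them.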

LM Derivation of 5 * X + 3 * Y + 17

Grammar:
E → E + T | E - T | T
T → T * F | T / F | F
F → id | num | ( E )

Leftmost derivation (the slide draws the parse tree alongside):
E ⇒ E+T
  ⇒ E+T+T
  ⇒ T+T+T
  ⇒ T*F+T+T
  ⇒ F*F+T+T
  ⇒ num*F+T+T
  ⇒ num*id+T+T
  ⇒ num*id+T*F+T
  ⇒ num*id+F*F+T
  ⇒ num*id+num*F+T
  ⇒ …

Notes on rewritten grammar
• It is more complex: more nonterminals, more productions.
• It requires more steps in the derivation.
• But it does eliminate the ambiguity, so we make the right choices in derivations.

Ambiguous Grammar 2: If-else

Another classic ambiguity problem in programming languages is the IF-ELSE.

Stmt → if Expr then Stmt
     | if Expr then Stmt else Stmt
     | other stmts

Abbreviated:
S → if E then S | if E then S else S | OS

Ambiguity

This sentential form has two derivations (two parse trees):

if Expr1 then if Expr2 then Stmt1 else Stmt2

The else can pair with either the inner if (Expr2) or the outer if (Expr1).

Removing the ambiguity

To eliminate the ambiguity
• We must rewrite the grammar to avoid generating the problem.
• We must associate each else with the innermost unmatched if.

[Diagram labels: S, withElse]

Removing the IF-ELSE Ambiguity

Original grammar:
Stmt → if Expr then Stmt
     | if Expr then Stmt else Stmt
     | other stmts

Rewritten grammar:
Stmt          → MatchedStmt | UnmatchedStmt
MatchedStmt   → if Expr then MatchedStmt else MatchedStmt
              | OtherStatements
UnmatchedStmt → if Expr then Stmt
              | if Expr then MatchedStmt else UnmatchedStmt
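In a hand-written parser, the same "innermost unmatched if" pairing is usually obtained by consuming an else greedily as soon as one is seen. The sketch below illustrates that approach rather than transcribing the matched/unmatched grammar: `parse_stmt` is a hypothetical helper, a single token stands in for Expr, and any token other than `if` counts as an "other" statement.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement at tokens[pos]; return (tree, next_pos)."""
    if tokens[pos] == "if":
        cond = tokens[pos + 1]            # one token stands in for Expr
        if tokens[pos + 2] != "then":
            raise SyntaxError("expected 'then'")
        then_branch, pos = parse_stmt(tokens, pos + 3)
        # Greedy rule: an 'else' seen here belongs to this, the innermost
        # still-open 'if'.
        if pos < len(tokens) and tokens[pos] == "else":
            else_branch, pos = parse_stmt(tokens, pos + 1)
            return ("if", cond, then_branch, else_branch), pos
        return ("if", cond, then_branch), pos
    return tokens[pos], pos + 1           # any other statement


tree, _ = parse_stmt("if E1 then if E2 then S1 else S2".split())
print(tree)
```

For the ambiguous sentence from the earlier slide, the else ends up inside the inner if, so the outer if has no else branch.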

Ambiguity

if Expr1 then if Expr2 then Stmt1 else Stmt2

Ambiguity that is more than Grammar

The examples of ambiguity that we have looked at are solved by tweaking the CFG.

Overloading can create deeper ambiguity:
    a = f(17)
In some languages, f could be either a function or a subscripted variable.

Disambiguating this requires semantics, not just syntax
• Declarations and type information say what "f" is.
• Requires an extra-grammatical solution.
• Must handle these with a different mechanism.
• Step outside the grammar rather than use a more complex grammar.

Regular versus Context free Languages

A regular language is a set of strings that can be:
a. Recognized by a DFA,
b. Recognized by an NFA, or (/and)
c. Denoted by regular expressions.

Example of non-regular languages?

A context-free language is one that is generated by a context-free grammar.
    S → 0 S 1 | ε
This grammar generates {0^n 1^n | n >= 0}, a standard example of a language that is context free but not regular.

Formal verification of L(G)

Example 4.7:
• Induction on the length of the derivation of a sentential form
• Formulate the inductive hypothesis in terms of sentential forms
• Basis step: n = 1
• Assume derivations of length n satisfy the inductive hypothesis.
• Show that derivations of length n+1 also satisfy it.

Regular Grammars (Linear Grammars)

A right-linear grammar is a restricted form of context-free grammar in which the productions have a special form:
• N → T* N2
• N → T*
• where N and N2 (possibly the same) are nonterminals and T* is a string of tokens.
• In these productions, if there is a nonterminal on the right-hand side then it is the last symbol.
• Right-linear and left-linear grammars are also called regular grammars. Why?

DFA ⇒ Right-linear Grammar

Consider a DFA M = (Q, Σ, δ, q0, F) (notice the re-ordering, and Q!)

Construct a grammar G = (N, T, P, S) where
• N = Q, i.e. each state corresponds to a nonterminal (the start symbol S corresponds to q0)
• T = Σ
• For each transition δ(si, a) = sj, we have a production Si → a Sj
• For each state sf in F, we add a production Sf → ε

Then L(M) = L(G). How would we formally prove this?

Thus regular languages are a subset of the context-free languages.
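The construction is mechanical enough to sketch in code. The representation below is an assumption made for illustration: δ is a dictionary keyed by (state, symbol), each production is a tuple `(a, target)` for `state → a target`, and the empty tuple stands for ε. The helper `generates` is also hypothetical; it simply replays a derivation in the resulting grammar.

```python
def dfa_to_right_linear(states, delta, start, finals):
    """Build right-linear productions from a DFA; states double as nonterminals."""
    productions = {q: [] for q in states}
    for (q, a), target in delta.items():
        productions[q].append((a, target))   # q -> a target
    for q in finals:
        productions[q].append(())            # q -> epsilon
    return productions, start


def generates(productions, start, word):
    """Check whether the grammar derives `word` by following one production per symbol."""
    current = start
    for ch in word:
        targets = [rhs[1] for rhs in productions[current] if rhs and rhs[0] == ch]
        if not targets:
            return False
        current = targets[0]                 # unique, since the source was a DFA
    return () in productions[current]


# Example DFA over {a, b} accepting strings that end in 'a'.
delta = {("q0", "a"): "q1", ("q0", "b"): "q0",
         ("q1", "a"): "q1", ("q1", "b"): "q0"}
grammar, start = dfa_to_right_linear({"q0", "q1"}, delta, "q0", {"q1"})
```

Because every derivation step mirrors one DFA transition, a proof of L(M) = L(G) can proceed by induction on the length of the input string.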

Example: DFA ⇒ Regular Grammar (Fig 3.23, p. 117)

N0 → a N1 | b N0
N1 → a N1 | b N2
…
N3 → …

Chomsky Hierarchy

Noam Chomsky, linguist: formal levels of grammars
• Regular grammars: N → T* N
• Context-free grammars: N → (N ∪ T)*
• Context-sensitive grammars: αNω → αβω
  - We can rewrite N as β, but only in the "context" α … ω
• Unrestricted grammars: α → β with α and β in (N ∪ T)*

Recognizers:
• DFA (regular)
• Pushdown automata: DFA augmented with a stack (context-free)
• Linear bounded Turing machine (context-sensitive)

http://en.wikipedia.org/wiki/Chomsky_hierarchy

Non-Context Free Languages

Certain languages cannot have a context-free grammar that generates them; they are not context-free languages.

Examples
• Σ = {a, b, c}, L = {wcw | w is in Σ*}
• {a^n b^n c^n | n > 0}

However they are context sensitive, or are they?
    S → abc | aSBc
    cB → Bc
    bB → bb

Alternative form of context-sensitive productions: α → β satisfying |α| <= |β|
• Well, not relevant for this course.
• We would eliminate any non-context-free construct from a programming language! (at least for parsing)

Parsing Techniques

• Top-down parsers
  - Start at the root and try to generate the parse tree
  - Pick a production and try to match the input
  - If we make a bad choice then backtrack and try another choice
  - Grammars that allow backtrack-free parsing sometimes exist
• Bottom-up parsers
  - Start at the leaves and grow toward the root
  - As input is consumed, encode possibilities in an internal state
  - Start in a state valid for legal first tokens
  - Bottom-up parsers handle a large class of grammars

Top-down Parsing Algorithm

Add the start symbol as the root of the parse tree
While the frontier of the parse tree != input {
    Pick the "leftmost" nonterminal in the frontier, A
    Choose an A-production, A → β1 β2 … βk, and expand the tree
        (other choices are saved on a stack)
    If a token is added to the frontier that does not match the input,
        backtrack and choose another production
        (if we run out of choices the parse fails)
}

We now will look at modifications to grammars to facilitate top-down parsing.
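A backtracking top-down recognizer along these lines can be sketched with Python generators, which play the role of the stack of saved choices. Two assumptions are made here: the grammar is written in a right-recursive form of the expression grammar (a left-recursive production such as E → E + T would send this procedure into infinite recursion), and the names `GRAMMAR`, `derive`, and `accepts` are invented for the example.

```python
# Right-recursive expression grammar; [] is the empty (epsilon) alternative.
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], ["-", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], ["/", "F", "T'"], []],
    "F": [["id"], ["num"], ["(", "E", ")"]],
}


def derive(symbols, tokens, pos):
    """Yield each input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Leftmost nonterminal: try every production in turn. Resuming the
        # generator after a failure is the backtracking step.
        for production in GRAMMAR[head]:
            yield from derive(production + rest, tokens, pos)
    elif pos < len(tokens) and tokens[pos] == head:
        yield from derive(rest, tokens, pos + 1)
    # A terminal mismatch yields nothing, so the caller tries its next choice.


def accepts(tokens, start="E"):
    """True if some sequence of choices derives exactly the token list."""
    return any(end == len(tokens) for end in derive([start], tokens, 0))
```

For instance, `accepts("num * id + num * id + num".split())` explores and discards several ε choices before it finds a derivation that consumes all of the input.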

Reconsider Our Expression Grammar

First we number the productions for documentation:
1. E → E + T
2. E → E - T
3. E → T
4. T → T * F
5. T → T / F
6. T → F
7. F → id
8. F → num
9. F → ( E )

Example: 5 * X + 3 * Y + 17
Token seq.: num * id + num * id + num

Prod num   Sentential form
1          E+T             How did we choose this one?
1          E+T+T           How/why?
3          T+T+T           ?
4          T*F+T+T
6          F*F+T+T
8          num*F+T+T
7          num*id+T+T      ?

How do we choose which production?
• It should be guided by trying to match the input.
• E.g., if the next input symbol is the token "if" and we are choosing between
  - S → if Expr then S else S
  - S → while Expr do S
  what choice is best? Well, the choice is obvious!

How do we choose which production? (continued)
• But if the next input symbol is the token "if" and we are choosing between
  - S → if Expr then S else S
  - S → if Expr then S
  what choice is best? Well, now the choice is not obvious!

Other Grammar Modifications to Guide Parser

• Left Factoring
  - Stmt → if Expr then Stmt else Stmt
         | if Expr then Stmt
  - If the next tokens are "if" and "id" then we have no basis to choose; in fact we have to look ahead to see the "else".
  - Stmt → if Expr then Stmt Rest
    Rest → else Stmt | ε
• Left Recursion
  - A → Aα | β
  - Why recursive? A ⇒ Aα ⇒ Aαα ⇒ … ⇒ Aα^n ⇒ βα^n
  - What do we do? A → βA' and A' → αA' | ε
  - A ⇒ βA' ⇒ βαA' ⇒ βααA' ⇒ … ⇒ βα^n A' ⇒ βα^n

General Left Factoring

Algorithm 4.2
Input: a grammar G
Output: an equivalent left-factored grammar
Method: For each nonterminal A
• Find the longest prefix α common to two or more A-productions:
      A → αβ1 | αβ2 | … | αβm | ξ
  where ξ represents the A-productions that don't start with the prefix α.
• Replace them with
      A → αA' | ξ
      A' → β1 | β2 | … | βm
• Repeat until no two alternatives for a nonterminal share a common prefix.
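The method above can be sketched as follows. The encoding is an assumption for illustration: a grammar is a dictionary from nonterminal to a list of alternatives, each alternative a tuple of symbols, with the empty tuple standing for ε. Fresh nonterminals are named by appending a prime, and the function names are hypothetical.

```python
def common_prefix(a, b):
    """Longest common prefix of two symbol tuples."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]


def left_factor(grammar):
    """Repeatedly factor out the longest prefix shared by two or more alternatives."""
    grammar = {nt: [tuple(alt) for alt in alts] for nt, alts in grammar.items()}
    work = list(grammar)
    while work:
        nt = work.pop()
        alts = grammar[nt]
        best = ()
        for i in range(len(alts)):
            for j in range(i + 1, len(alts)):
                prefix = common_prefix(alts[i], alts[j])
                if len(prefix) > len(best):
                    best = prefix
        if not best:
            continue                      # nothing left to factor for nt
        new_nt = nt + "'"
        while new_nt in grammar:          # keep the new name unique
            new_nt += "'"
        with_prefix = [alt for alt in alts if alt[:len(best)] == best]
        others = [alt for alt in alts if alt[:len(best)] != best]
        grammar[nt] = [best + (new_nt,)] + others              # A  -> alpha A' | xi
        grammar[new_nt] = [alt[len(best):] for alt in with_prefix]  # A' -> beta_i
        work.extend([nt, new_nt])         # both may need further factoring
    return grammar


factored = left_factor({
    "Stmt": [("if", "Expr", "then", "Stmt", "else", "Stmt"),
             ("if", "Expr", "then", "Stmt"),
             ("other",)],
})
```

Applied to the if-then-else productions, the shared prefix `if Expr then Stmt` moves into a single alternative and the choice between `else Stmt` and ε is deferred to the new nonterminal.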

Left Factoring

A graphical explanation for the same idea:

    A → αβ1 | αβ2 | αβ3
becomes …
    A → αZ
    Z → β1 | β2 | βn

[Figure: before factoring, A branches three ways to αβ1, αβ2, αβ3; after factoring, A leads to α and then Z branches to β1, β2, β3.]

From Engineering a Compiler by Keith D. Cooper and Linda Torczon

Left Factoring Graphically

[Figure: Factor expands to Identifier, Identifier [ ExprList ], or Identifier ( ExprList ); all three start with Identifier, so there is no basis for choice. After left factoring, Factor expands to Identifier followed by one of: nothing, [ ExprList ], or ( ExprList ); the next word determines the correct choice.]

Eliminating Left Recursion: Expr Grammar

General approach for immediate left recursion
• Replace A → Aα | β
• with A → βA' and A' → αA' | ε

So for the expression grammar
    E → E + T | E - T | T
we rewrite the E productions as
• E → T E'
• E' → + T E' | - T E' | ε

Eliminating Left Recursion: Expr Grammar

Replace T → T * F | T / F | F with
    T → F T'
    T' → * F T' | / F T' | ε

No replacing is needed for the F productions, so the grammar becomes:
    E → T E'
    E' → + T E' | - T E' | ε
    T → F T'
    T' → * F T' | / F T' | ε
    F → id | num | ( E )
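With left recursion removed, the grammar can be parsed by a recursive-descent recognizer with one procedure per nonterminal and a single token of lookahead. The sketch below is one possible rendering, not taken from the slides; the class name `Parser`, the end-marker token `"$"`, and the use of `SyntaxError` for failures are assumptions made for the example.

```python
class Parser:
    """Recursive-descent recognizer for the right-recursive expression grammar."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks end of input (assumed)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1

    def E(self):                 # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):           # E' -> + T E' | - T E' | epsilon
        if self.peek() in ("+", "-"):
            self.match(self.peek())
            self.T()
            self.E_prime()

    def T(self):                 # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):           # T' -> * F T' | / F T' | epsilon
        if self.peek() in ("*", "/"):
            self.match(self.peek())
            self.F()
            self.T_prime()

    def F(self):                 # F -> id | num | ( E )
        if self.peek() in ("id", "num"):
            self.match(self.peek())
        elif self.peek() == "(":
            self.match("(")
            self.E()
            self.match(")")
        else:
            raise SyntaxError(f"unexpected token {self.peek()!r}")


def parse(tokens):
    """Return True if tokens form an expression; raise SyntaxError otherwise."""
    parser = Parser(tokens)
    parser.E()
    parser.match("$")
    return True
```

Each procedure picks its alternative by looking only at the next token and taking the ε alternative when nothing else applies, so no backtracking is needed.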

Eliminating Immediate Left Recursion

In general, consider all the A-productions
    A → Aα1 | Aα2 | … | Aαn | β1 | β2 | … | βm
Replace them with
    A → β1A' | β2A' | … | βmA'
    A' → α1A' | α2A' | … | αnA' | ε

But not all left recursion is immediate (in general, A is left recursive if A ⇒+ Aβ). Consider
    S → Aa | Bb | c
    A → Ca | aA | a
    C → Sc
    B → bB | b
Then S ⇒ Aa ⇒ Caa ⇒ Scaa
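The replacement rule for immediate left recursion translates almost directly into code. In this sketch the function name `eliminate_immediate`, the tuple encoding of alternatives (empty tuple for ε), and the primed name for the new nonterminal are assumptions for illustration.

```python
def eliminate_immediate(nt, alternatives):
    """Rewrite A -> A a1 | ... | A an | b1 | ... | bm without immediate left recursion."""
    # The alpha_i: tails of the alternatives that begin with nt itself.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    # The beta_j: alternatives that do not begin with nt.
    others = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not recursive:
        return {nt: list(alternatives)}
    new_nt = nt + "'"
    return {
        nt: [beta + (new_nt,) for beta in others],                   # A  -> beta_j A'
        new_nt: [alpha + (new_nt,) for alpha in recursive] + [()],   # A' -> alpha_i A' | eps
    }


result = eliminate_immediate("E", [("E", "+", "T"), ("E", "-", "T"), ("T",)])
```

For the E-productions of the expression grammar this yields E → T E' and E' → + T E' | - T E' | ε.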

Eliminating Left Recursion

Algorithm 4.1 Eliminating Left Recursion
Input: Grammar with no cycles or ε-productions
Output: Equivalent grammar with no left recursion

Arrange the nonterminals in order A1, A2, …, An
for i = 1 to n do
    for j = 1 to i-1 do
        replace each production of the form Ai → Aj ξ
        by the productions Ai → δ1ξ | δ2ξ | … | δkξ
        where Aj → δ1 | δ2 | … | δk are all the current Aj-productions
    end
    Eliminate immediate left recursion among the Ai-productions
end
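The two nested loops can be sketched as below. As before, the encoding is an assumption: alternatives are tuples of symbols, the empty tuple is ε, new nonterminals are named with a prime, and the function name is hypothetical. The input is expected to have no cycles or ε-productions, as the algorithm requires.

```python
def eliminate_left_recursion(grammar, order):
    """Remove all left recursion, processing nonterminals in the given order."""
    g = {nt: [tuple(alt) for alt in alts] for nt, alts in grammar.items()}
    for i, ai in enumerate(order):
        # Inner loop: substitute earlier nonterminals that appear in leading position.
        for aj in order[:i]:
            expanded = []
            for alt in g[ai]:
                if alt and alt[0] == aj:
                    expanded.extend(delta + alt[1:] for delta in g[aj])
                else:
                    expanded.append(alt)
            g[ai] = expanded
        # Last step of the outer loop: remove immediate left recursion on ai.
        recursive = [alt[1:] for alt in g[ai] if alt and alt[0] == ai]
        if recursive:
            others = [alt for alt in g[ai] if not alt or alt[0] != ai]
            new_nt = ai + "'"
            g[ai] = [beta + (new_nt,) for beta in others]
            g[new_nt] = [alpha + (new_nt,) for alpha in recursive] + [()]
    return g


example = {
    "G": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("E", "~", "T"), ("id",)],
}
fixed = eliminate_left_recursion(example, ["G", "E", "T"])
```

The example grammar with order G, E, T is the one used in the Engineering a Compiler walk-through: the indirect recursion T → E ~ T becomes T → T E' ~ T after substitution and is then removed as immediate left recursion.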

Eliminating Left Recursion

How does this algorithm work?
1. Impose an arbitrary order on the nonterminals
2. Outer loop cycles through the nonterminals in that order
3. Inner loop ensures that a production expanding Ai has no nonterminal Aj as the first symbol of its rhs, for j < i
4. Last step in the outer loop converts any direct recursion on Ai to right recursion using the transformation shown earlier
5. New nonterminals are added at the end of the order and have no left recursion

At the start of the ith outer loop iteration:
for all k < i, no production that expands Ak has a nonterminal As as the first symbol of its rhs, for s < k

Example

Order of symbols: G, E, T

G → E
E → E + T
E → T
T → E ~ T
T → id

From Engineering a Compiler by Keith D. Cooper and Linda Torczon

Example

Order of symbols: G, E, T

1. Ai = G
    G → E
    E → E + T
    E → T
    T → E ~ T
    T → id

Example

Order of symbols: G, E, T

1. Ai = G         2. Ai = E
G → E             G → E
E → E + T         E → T E'
E → T             E' → + T E'
T → E ~ T         E' → ε
T → id            T → E ~ T
                  T → id

Example

Order of symbols: G, E, T

2. Ai = E         3. Ai = T, As = E
G → E             G → E
E → T E'          E → T E'
E' → + T E'       E' → + T E'
E' → ε            E' → ε
T → E ~ T         T → T E' ~ T
T → id            T → id

Example

Order of symbols: G, E, T

3. Ai = T, As = E     4. Ai = T
G → E                 G → E
E → T E'              E → T E'
E' → + T E'           E' → + T E'
E' → ε                E' → ε
T → T E' ~ T          T → id T'
T → id                T' → E' ~ T T'
                      T' → ε

Predictive Parsing

Basic idea
Given A → α | β, the parser should be able to choose between α and β.

FIRST sets
For some rhs α in G, define FIRST(α) as the set of tokens that appear as the first symbol in some string that derives from α.
That is, x ∈ FIRST(α) iff α ⇒* x γ, for some γ.

If A → α and A → β both appear in the grammar, and FIRST(α) ∩ FIRST(β) = ∅, this would appear to allow the parser to make a correct choice with a lookahead of exactly one symbol! (If there are no ε-productions then it does.)
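FIRST sets are typically computed by iterating to a fixed point. The following sketch assumes the same tuple-based grammar encoding as before (empty tuple for ε) and uses the marker string `"ε"` inside a FIRST set to record that the symbol string can derive the empty string; the function names and that marker are choices made for this example.

```python
EPS = "ε"   # marker meaning "can derive the empty string" (assumed convention)


def first_of_string(symbols, first, grammar):
    """FIRST of a string of grammar symbols, given the current FIRST table."""
    result = set()
    for sym in symbols:
        if sym not in grammar:            # terminal: it is the first token
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:         # sym cannot vanish, so stop here
            return result
    result.add(EPS)                       # every symbol can vanish (or string is empty)
    return result


def first_sets(grammar):
    """Iterate until no FIRST set grows any further."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                before = len(first[nt])
                first[nt] |= first_of_string(alt, first, grammar)
                changed |= len(first[nt]) != before
    return first


EXPR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ("-", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ("/", "F", "T'"), ()],
    "F": [("id",), ("num",), ("(", "E", ")")],
}
FIRST = first_sets(EXPR)
```

For the right-recursive expression grammar, the alternatives of each nonterminal have pairwise disjoint FIRST sets; the ε alternatives of E' and T' are the case flagged in the slide's closing remark, where FIRST alone does not settle the choice.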