CSCE 531 Compiler Construction Lecture 6 Grammar Modifications
- Slides: 38
CSCE 531 Compiler Construction, Lecture 6: Grammar Modifications
Topics
- Grammars for expressions and if-then-else
- Formal proofs of L(G)
- Top-down parsing
- Left factoring
- Removing left recursion
Readings: 4.3-4.4
Homework: 4.1, 4.2a, 4.6a, 4.11a
January 30, 2006
Overview
Last Time
- Should have mentioned DFA minimization
- Grammars, Derivations, Ambiguity
- Lec 05 - Grammars: Slides 1-27
Today's Lecture
- Ambiguity in classic programming language grammars
  - Expressions
  - If-Then-Else
- Top-Down parsing
References
- Sections 4.3-4.4
- Parse demos: http://ag-kastens.uni-paderborn.de/lehre/material/compiler/parsdemo/
- Chomsky Hierarchy (types of grammars and recognizers): http://en.wikipedia.org/wiki/Chomsky_hierarchy
- Homework: 4.1, 4.2a, 4.6a, 4.11a
CSCE 531 Spring 2018
DFA Minimization
Algorithm 3.6 in text. We will not cover this algorithm other than this slide.
- Partition states into F and Q-F (final and non-final states).
- Refine the partitioning as much as possible.
- Refinement: a string x = x1 x2 … xt distinguishes between two states Si and Sk if, starting in each and following the path determined by x, one ends in an accepting state and the other ends in a non-accepting state.
[Figure: from Si, x leads to an accepting state Sa; from Sk, x leads to a non-accepting state Sna.]
LM Derivation of 5 * X + 3 * Y + 17
Grammar:
    E → E + T | E - T | T
    T → T * F | T / F | F
    F → id | num | ( E )
Leftmost derivation:
    E ⇒ E + T
      ⇒ E + T + T
      ⇒ T + T + T
      ⇒ T * F + T + T
      ⇒ F * F + T + T
      ⇒ num * F + T + T
      ⇒ num * id + T + T
      ⇒ num * id + T * F + T
      ⇒ num * id + F * F + T
      ⇒ num * id + num * F + T
      ⇒ …
[Figure: parse tree]
Notes on rewritten grammar
- It is more complex: more nonterminals, more productions.
- It requires more steps in the derivation.
- But it does eliminate the ambiguity, so we make the right choices in derivations.
Ambiguous Grammar 2: If-else
Another classic ambiguity problem in programming languages is the IF-ELSE.
    Stmt → if Expr then Stmt
         | if Expr then Stmt else Stmt
         | other stmts
Abbreviated:
    S → if E then S | if E then S else S | OS
Ambiguity
This sentential form has two distinct parse trees (two leftmost derivations), because the else can attach to either if:
    if Expr1 then if Expr2 then Stmt1 else Stmt2
Removing the ambiguity
To eliminate the ambiguity:
- We must rewrite the grammar to avoid generating the problem.
- We must associate each else with the innermost unmatched if.
Removing the IF-ELSE Ambiguity
Original grammar:
    Stmt → if Expr then Stmt
         | if Expr then Stmt else Stmt
         | other stmts
Rewritten grammar:
    Stmt          → MatchedStmt | UnmatchedStmt
    MatchedStmt   → if Expr then MatchedStmt else MatchedStmt
                  | OtherStatements
    UnmatchedStmt → if Expr then Stmt
                  | if Expr then MatchedStmt else UnmatchedStmt
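Ambiguity (revisited)
    if Expr1 then if Expr2 then Stmt1 else Stmt2
With the MatchedStmt/UnmatchedStmt grammar, the statement between a then and its else must be matched, so the else can only pair with the inner if.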
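The innermost-if rule can be sketched as a tiny recursive-descent parser. This is only an illustration, not the matched/unmatched grammar itself: it assumes a simplified token stream in which every Expr and every non-if statement is a single token, and the name `parse_stmt` is hypothetical. It greedily attaches an else to the nearest open if, which produces the one tree the rewritten grammar allows.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement starting at tokens[pos]; return (tree, next_pos).

    Assumption: each Expr and each non-if statement is a single token.
    """
    if tokens[pos] == "if":
        cond = tokens[pos + 1]
        assert tokens[pos + 2] == "then", "expected 'then'"
        then_branch, pos = parse_stmt(tokens, pos + 3)
        # Greedy rule: an else seen here belongs to this (innermost open) if.
        if pos < len(tokens) and tokens[pos] == "else":
            else_branch, pos = parse_stmt(tokens, pos + 1)
            return ("if", cond, then_branch, else_branch), pos
        return ("if", cond, then_branch), pos
    return tokens[pos], pos + 1


tokens = "if Expr1 then if Expr2 then Stmt1 else Stmt2".split()
tree, _ = parse_stmt(tokens)
print(tree)  # ('if', 'Expr1', ('if', 'Expr2', 'Stmt1', 'Stmt2'))
```

The outer if ends up with no else branch, and the inner if owns both Stmt1 and Stmt2.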
Ambiguity that is more than Grammar
The examples of ambiguity that we have looked at are solved by tweaking the CFG.
Overloading can create deeper ambiguity:
    a = f(17)
In some languages, f could be either a function or a subscripted variable.
Disambiguating this requires semantics, not just syntax:
- Declarations and type information say what "f" is.
- Requires an extra-grammatical solution.
- Must handle these with a different mechanism.
- Step outside the grammar rather than use a more complex grammar.
Regular versus Context free Languages
A regular language is a set of strings that can be:
  a. Recognized by a DFA,
  b. Recognized by an NFA, or (/and)
  c. Denoted by a regular expression.
Example of non-regular languages?
A context free language is one that is generated by a context free grammar, for example:
    S → 0 S 1 | ε
This grammar generates { 0ⁿ1ⁿ | n ≥ 0 }, which is context free but not regular.
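As a small illustration of the grammar S → 0 S 1 | ε, here is a sketch of a recognizer that follows the two productions directly. The function name `derives` is made up for this example. No DFA can do the same job, because matching the 0s against the 1s needs unbounded counting.

```python
def derives(s: str) -> bool:
    """Return True if s can be derived from S -> 0 S 1 | ε."""
    if s == "":
        return True                      # S -> ε
    if len(s) >= 2 and s[0] == "0" and s[-1] == "1":
        return derives(s[1:-1])          # S -> 0 S 1
    return False


for word in ["", "01", "000111", "001", "0101"]:
    print(repr(word), derives(word))
```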
Formal verification of L(G)
Example 4.7:
- Induction on the length of the derivation of a sentential form
- Formulate the inductive hypothesis in terms of sentential forms
- Basis step: n = 1
- Assume derivations of length n satisfy the inductive hypothesis.
- Show that derivations of length n+1 also satisfy it.
Regular Grammars (Linear Grammars)
A right-linear grammar is a restricted form of context free grammar in which the productions have a special form:
    N → T* N2
    N → T*
- N and N2 (possibly the same) are nonterminals and T* is a string of tokens.
- In these productions, if there is a nonterminal on the right hand side then it is the last symbol.
- Right-linear and left-linear grammars are also called regular grammars. Why?
DFA → Right-linear Grammar
Consider DFA M = (Q, Σ, δ, q0, F) (notice re-ordering! and Q!)
Construct a grammar G = (N, T, P, S) where
- N = Q, i.e. each state corresponds to a nonterminal
- T = Σ
- For each transition δ(si, a) = sj, we have a production
      Si → a Sj
- For each state S in F we add a production
      S → ε
Then L(M) = L(G). How would we formally prove this?
Thus regular languages are a subset of the context free languages.
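The construction can be sketched in a few lines. The representation is an assumption made for this example: `delta` is a dict from (state, symbol) to state, productions are tuples, and the empty tuple stands for ε. The three-state DFA below is a made-up example (strings over {a, b} ending in "ab"), not the one from the text. The helper `grammar_accepts` is also hypothetical; it simply replays a word against the generated productions.

```python
def dfa_to_grammar(states, delta, start, finals):
    """Build right-linear productions from a DFA.

    delta maps (state, symbol) -> state. Returns (start, productions), where
    productions maps each nonterminal to a list of right-hand sides.
    """
    productions = {q: [] for q in states}
    for (src, symbol), dst in delta.items():
        productions[src].append((symbol, dst))   # Si -> a Sj
    for q in finals:
        productions[q].append(())                # S -> ε for each accepting S
    return start, productions


def grammar_accepts(start, productions, word):
    """Check whether the right-linear grammar derives word."""
    current = {start}
    for ch in word:
        current = {rhs[1] for nt in current
                   for rhs in productions[nt] if rhs and rhs[0] == ch}
    return any(() in productions[nt] for nt in current)


# Hypothetical DFA: accepts strings over {a, b} that end in "ab".
delta = {("q0", "a"): "q1", ("q0", "b"): "q0",
         ("q1", "a"): "q1", ("q1", "b"): "q2",
         ("q2", "a"): "q1", ("q2", "b"): "q0"}
start, prods = dfa_to_grammar(["q0", "q1", "q2"], delta, "q0", {"q2"})
for nt, rhss in prods.items():
    print(nt, "->", " | ".join(" ".join(r) or "ε" for r in rhss))
```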
Example: DFA → Regular Grammar
Fig 3.23, p. 117
    N0 → a N1 | b N0
    N1 → a N1 | b N2
    …
    N3 → …
Chomsky Hierarchy
Noam Chomsky, linguist: formal levels of grammars
- Regular grammars: N → T* N
- Context-free grammars: N → (N ∪ T)*
- Context-sensitive grammars: αNω → αβω
  - We can rewrite N → β, but only in the "context" αNω
- Unrestricted grammars: α → β with α and β in (N ∪ T)*
Recognizers:
- DFA (regular)
- Pushdown automata: finite automaton augmented with a stack (context free)
- Linear bounded Turing machine (context sensitive)
http://en.wikipedia.org/wiki/Chomsky_hierarchy
Non-Context Free Languages
Certain languages cannot have a context free grammar that generates them; they are not context free languages.
Examples
- Σ = {a, b, c}, L = { wcw | w is in Σ* }
- { aⁿbⁿcⁿ | n > 0 }
However they are context sensitive, or are they?
    S  → abc | aSBc
    cB → Bc
    bB → bb
Alternative form of context-sensitive productions: α → β with |α| <= |β|
- Well, not relevant for this course.
- We would eliminate any non-context-free construct from a programming language! (at least for parsing)
Parsing Techniques
- Top-down parsers
  - Start at the root and try to generate the parse tree
  - Pick a production and try to match the input
  - If we make a bad choice then backtrack and try another choice
  - Grammars that allow backtrack-free parsing sometimes exist
- Bottom-up parsers
  - Start at the leaves and grow toward the root
  - As input is consumed, encode possibilities in an internal state
  - Start in a state valid for legal first tokens
  - Bottom-up parsers handle a large class of grammars
Top-down Parsing Algorithm
Add the start symbol as the root of the parse tree
While the frontier of the parse tree != input {
    Pick the "leftmost" nonterminal in the frontier, A
    Choose an A-production, A → β1 β2 … βk, and expand the tree (other choices saved on stack)
    If a token is added to the frontier that does not match the input,
        backtrack and choose another production
        (if we run out of choices the parse fails)
}
We now will look at modifications to grammars to facilitate top-down parsing.
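A minimal sketch of this backtracking loop follows. It makes several assumptions: the grammar is the right-recursive expression grammar with E' and T', an empty list stands for ε, and Python generators take the place of the explicit stack of saved choices. The names `GRAMMAR`, `expand`, and `accepts` are hypothetical. The sketch only terminates on grammars without left recursion, which is one reason the grammar modifications matter.

```python
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], ["-", "T", "E'"], []],   # [] stands for ε
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], ["/", "F", "T'"], []],
    "F":  [["id"], ["num"], ["(", "E", ")"]],
}


def expand(frontier, tokens, pos):
    """Yield every input position at which `frontier` can finish matching.

    `frontier` is the unmatched part of the parse-tree frontier; its first
    symbol is always the leftmost one still to be handled.
    """
    if not frontier:
        yield pos
        return
    head, rest = frontier[0], frontier[1:]
    if head in GRAMMAR:
        # Try each production in turn; exhausting one is the "backtrack" step.
        for rhs in GRAMMAR[head]:
            yield from expand(rhs + rest, tokens, pos)
    elif pos < len(tokens) and tokens[pos] == head:
        yield from expand(rest, tokens, pos + 1)      # token matches the input


def accepts(tokens, start="E"):
    """True if some sequence of choices consumes exactly the whole input."""
    return any(end == len(tokens) for end in expand([start], tokens, 0))


print(accepts("num * id + num * id + num".split()))
print(accepts("num + * id".split()))
```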
Reconsider Our Expression Grammar
First we number the productions for documentation:
    1. E → E + T
    2. E → E - T
    3. E → T
    4. T → T * F
    5. T → T / F
    6. T → F
    7. F → id
    8. F → num
    9. F → ( E )
Example: 5 * X + 3 * Y + 17
Token seq.: num * id + num * id + num
    Prod num   Sentential form
    1          E + T                How did we choose this one?
    1          E + T + T            How/why?
    3          T + T + T            ?
    4          T * F + T + T
    6          F * F + T + T
    8          num * F + T + T
    7          num * id + T + T     ?
How do we choose which production?
- It should be guided by trying to match the input.
- E.g., if the next input symbol is the token "if" and we are choosing between
      S → if Expr then S else S
      S → while Expr do S
  what choice is best? Well, the choice is obvious!
How do we choose which production? (continued)
- But if the next input symbol is the token "if" and we are choosing between
      S → if Expr then S else S
      S → if Expr then S
  what choice is best? Well, now the choice is not obvious!
Other Grammar Modifications to Guide Parser
- Left Factoring
      Stmt → if Expr then Stmt else Stmt
           | if Expr then Stmt
  - If the next tokens are "if" and "id" then we have no basis to choose; in fact we have to look ahead to see the "else".
  - Factor out the common prefix:
      Stmt → if Expr then Stmt Rest
      Rest → else Stmt | ε
- Left Recursion
      A → Aα | β
  - Why recursive? A ⇒ Aα ⇒ Aαα ⇒ … ⇒ Aαⁿ ⇒ βαⁿ
  - What do we do? Replace with A → βA' and A' → αA' | ε
      A ⇒ βA' ⇒ βαA' ⇒ βααA' ⇒ … ⇒ βαⁿA' ⇒ βαⁿ
General Left Factoring
Algorithm 4.2
Input: a grammar G
Output: an equivalent left-factored grammar.
Method: For each nonterminal A
- find the longest prefix α common to two or more A-productions:
      A → αβ1 | αβ2 | … | αβm | ξ
  where ξ represents the A-productions that don't start with the prefix α
- Replace with
      A  → αA' | ξ
      A' → β1 | β2 | … | βm
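The method can be sketched as follows. This is a simplified variant rather than a literal transcription of Algorithm 4.2: it groups alternatives by their first symbol, factors out the prefix shared by the whole group, and repeats until no two alternatives of a nonterminal start with the same symbol. The representation is assumed for this example (alternatives are tuples of symbols, the empty tuple stands for ε, fresh nonterminals are named by appending a prime), and `left_factor` and `common_prefix` are hypothetical names.

```python
def common_prefix(alternatives):
    """Longest run of symbols shared by the start of every alternative."""
    prefix = []
    for column in zip(*alternatives):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return tuple(prefix)


def left_factor(grammar):
    """Return a left-factored copy of `grammar` (nonterminal -> alternatives)."""
    result = {nt: [tuple(alt) for alt in alts] for nt, alts in grammar.items()}
    changed = True
    while changed:
        changed = False
        for nt in list(result):
            groups = {}
            for alt in result[nt]:
                groups.setdefault(alt[:1], []).append(alt)
            for first, group in groups.items():
                if not first or len(group) < 2:
                    continue                       # ε or nothing to factor
                prefix = common_prefix(group)      # the α of the slide
                fresh = nt + "'"
                while fresh in result:             # pick an unused name for A'
                    fresh += "'"
                others = [alt for alt in result[nt] if alt not in group]
                result[nt] = [prefix + (fresh,)] + others              # A  -> αA' | ξ
                result[fresh] = [alt[len(prefix):] for alt in group]   # A' -> β1 | …
                changed = True
                break
    return result


grammar = {"Stmt": [("if", "Expr", "then", "Stmt", "else", "Stmt"),
                    ("if", "Expr", "then", "Stmt"),
                    ("other",)]}
for nt, alts in left_factor(grammar).items():
    print(nt, "->", " | ".join(" ".join(a) or "ε" for a in alts))
```

On the if-then-else grammar this yields Stmt → if Expr then Stmt Stmt' | other and Stmt' → else Stmt | ε.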
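Left Factoring
A graphical explanation for the same idea:
    A → αβ1 | αβ2 | αβ3
becomes …
    A → αZ
    Z → β1 | β2 | β3
[Figure: before factoring, A branches to three paths that each begin with α; after factoring, a single α edge leads to Z, which branches to β1, β2, β3.]
From Engineering a Compiler by Keith D. Cooper and Linda Torczon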
Left Factoring Graphically
[Figure: before factoring, Factor branches to Identifier, Identifier [ ExprList ], and Identifier ( ExprList ), so there is no basis for choice. After factoring, Factor has a single Identifier branch followed by an optional [ ExprList ] or ( ExprList ), and the next word determines the correct choice.]
From Engineering a Compiler by Keith D. Cooper and Linda Torczon
Eliminating Left Recursion: Expr Grammar
General approach for immediate left recursion:
- Replace A → Aα | β
- with A → βA' and A' → αA' | ε
So for the expression grammar
    E → E + T | E - T | T
we rewrite the E productions as
    E  → T E'
    E' → + T E' | - T E' | ε
Eliminating Left Recursion: Expr Grammar
Replace
    T → T * F | F
with
    T  → F T'
    T' → * F T' | ε
No replacing is needed for the F productions, so the grammar becomes:
    E  → T E'
    E' → + T E' | - T E' | ε
    T  → F T'
    T' → * F T' | / F T' | ε
    F  → id | num | ( E )
Eliminating Immediate Left Recursion
In general consider all the A productions
    A → Aα1 | Aα2 | … | Aαn | β1 | β2 | … | βm
Replace them with
    A  → β1A' | β2A' | … | βmA'
    A' → α1A' | α2A' | … | αnA' | ε
But not all left recursion is immediate. In general, a grammar is left recursive if A ⇒+ Aβ for some nonterminal A. Consider
    S → Aa | Bb | c
    A → Ca | aA | a
    C → Sc
    B → bB | b
Then S ⇒ Aa ⇒ Caa ⇒ Scaa
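The replacement rule for immediate left recursion is mechanical enough to sketch directly. The representation is an assumption for illustration: alternatives are tuples of symbols, the empty tuple stands for ε, and the new nonterminal is named by appending a prime, which is assumed to be unused. The function name `eliminate_immediate` is hypothetical.

```python
def eliminate_immediate(nt, alternatives):
    """Rewrite A -> Aα1 | … | Aαn | β1 | … | βm without immediate left recursion.

    Returns a dict holding the productions for A and, if needed, for A'.
    """
    alphas = [alt[1:] for alt in alternatives if alt[:1] == (nt,)]   # the αi
    betas = [alt for alt in alternatives if alt[:1] != (nt,)]        # the βj
    if not alphas:
        return {nt: list(alternatives)}       # nothing to do
    new = nt + "'"
    return {
        nt: [beta + (new,) for beta in betas],                # A  -> βj A'
        new: [alpha + (new,) for alpha in alphas] + [()],     # A' -> αi A' | ε
    }


rewritten = eliminate_immediate("E", [("E", "+", "T"), ("E", "-", "T"), ("T",)])
for nt, alts in rewritten.items():
    print(nt, "->", " | ".join(" ".join(a) or "ε" for a in alts))
```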
Eliminating Left Recursion
Algorithm 4.1 Eliminating Left Recursion
Input: Grammar with no cycles or ε-productions
Output: Equivalent grammar with no left recursion
Arrange the nonterminals in order A1, A2, …, An
for i = 1 to n do
    for j = 1 to i-1 do
        replace each production of the form Ai → Aj ξ
        by the productions Ai → δ1ξ | δ2ξ | … | δkξ
        where Aj → δ1 | δ2 | … | δk are all the current Aj-productions
    end
    Eliminate immediate left recursion in the Ai-productions
end
Eliminating Left Recursion
How does this algorithm work?
1. Impose an arbitrary order on the nonterminals.
2. The outer loop cycles through the nonterminals in that order.
3. The inner loop ensures that no production expanding Ai has a nonterminal Aj as the first symbol of its rhs, for j < i.
4. The last step in the outer loop converts any direct recursion on Ai to right recursion using the transformation shown earlier.
5. New nonterminals are added at the end of the order and have no left recursion.
At the start of the ith outer loop iteration: for all k < i, no production that expands Ak has a nonterminal As as the first symbol of its rhs, for s < k.
Example
Order of symbols: G, E, T
Original grammar:
    G → E
    E → E + T
    E → T
    T → E ~ T
    T → id
1. Ai = G: nothing to do.
2. Ai = E: eliminate the immediate left recursion on E.
    G  → E
    E  → T E'
    E' → + T E'
    E' → ε
    T  → E ~ T
    T  → id
3. Ai = T, As = E: substitute the E-productions into T → E ~ T.
    G  → E
    E  → T E'
    E' → + T E'
    E' → ε
    T  → T E' ~ T
    T  → id
4. Ai = T: eliminate the immediate left recursion on T.
    G  → E
    E  → T E'
    E' → + T E'
    E' → ε
    T  → id T'
    T' → E' ~ T T'
    T' → ε
From Engineering a Compiler by Keith D. Cooper and Linda Torczon
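The whole procedure, run on this example with the order G, E, T, can be sketched as below. The representation is assumed for illustration (alternatives are tuples, the empty tuple stands for ε, primed names are taken to be unused), and `eliminate_left_recursion` is a hypothetical name. It follows the two loops of Algorithm 4.1 but makes no attempt to check the "no cycles or ε-productions" precondition.

```python
def eliminate_left_recursion(grammar, order):
    """Remove left recursion, processing nonterminals in the given order."""
    g = {nt: [tuple(alt) for alt in alts] for nt, alts in grammar.items()}
    for i, ai in enumerate(order):
        # Inner loop: replace Ai -> Aj ξ by Ai -> δ ξ for every Aj -> δ, j < i.
        for aj in order[:i]:
            expanded = []
            for alt in g[ai]:
                if alt[:1] == (aj,):
                    expanded += [delta + alt[1:] for delta in g[aj]]
                else:
                    expanded.append(alt)
            g[ai] = expanded
        # Last step of the outer loop: remove immediate left recursion on Ai.
        alphas = [alt[1:] for alt in g[ai] if alt[:1] == (ai,)]
        if alphas:
            betas = [alt for alt in g[ai] if alt[:1] != (ai,)]
            new = ai + "'"
            g[ai] = [beta + (new,) for beta in betas]
            g[new] = [alpha + (new,) for alpha in alphas] + [()]
    return g


grammar = {
    "G": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("E", "~", "T"), ("id",)],
}
result = eliminate_left_recursion(grammar, ["G", "E", "T"])
for nt, alts in result.items():
    print(nt, "->", " | ".join(" ".join(a) or "ε" for a in alts))
```

The printed grammar should match the final step of the example: T → id T' and T' → E' ~ T T' | ε.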
Predictive Parsing
Basic idea
- Given A → α | β, the parser should be able to choose between α and β.
FIRST sets
- For some rhs α ∈ G, define FIRST(α) as the set of tokens that appear as the first symbol in some string that derives from α.
- That is, x ∈ FIRST(α) iff α ⇒* x γ, for some γ.
- If A → α and A → β both appear in the grammar, we would like FIRST(α) ∩ FIRST(β) = ∅.
This would appear to allow the parser to make a correct choice with a lookahead of exactly one symbol! (If there are no ε-productions, then it does.)
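FIRST sets can be computed with a simple fixed-point iteration. The sketch below rests on some assumptions: it uses the right-recursive expression grammar with E' and T', writes the empty alternative as an empty tuple, and follows the common textbook convention of putting the marker ε into FIRST when a symbol can derive the empty string (the definition on the slide lists tokens only). The names `first_of` and `first_sets` are hypothetical.

```python
EPSILON = "ε"

GRAMMAR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ("-", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ("/", "F", "T'"), ()],
    "F":  [("id",), ("num",), ("(", "E", ")")],
}


def first_of(symbols, first):
    """FIRST of a string of grammar symbols, given current nonterminal sets."""
    result = set()
    for sym in symbols:
        if sym not in first:              # terminal: it starts the string
            result.add(sym)
            return result
        result |= first[sym] - {EPSILON}
        if EPSILON not in first[sym]:     # sym cannot vanish, so stop here
            return result
    result.add(EPSILON)                   # every symbol can vanish (or none)
    return result


def first_sets(grammar):
    """Iterate until no FIRST set grows any further."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                before = len(first[nt])
                first[nt] |= first_of(alt, first)
                changed |= len(first[nt]) != before
    return first


first = first_sets(GRAMMAR)
for nt in GRAMMAR:
    print(nt, sorted(first[nt]))
```

With these sets, the alternatives of E' begin with +, -, or nothing, so one token of lookahead separates the first two; deciding when to take the ε alternative is the part the slide's caveat about ε-productions refers to.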