Cse 321 Programming Languages and Compilers Lecture 7
- Slides: 43
Cse 321, Programming Languages and Compilers
Lecture #7, Feb. 5, 2007
• Grammars
• Top down parsing
• Transition Diagrams
• Ambiguity
• Left recursion
• Refactoring by adding levels
• Recursive descent parsing
• Predictive Parsers
• First and Follow
• Parsing tables

Assignments
• Reading
  – Chapter 3, pages 73-106
  – Quiz on Wednesday
• Mid Term exam
  – Monday, Feb. 19, 2007. Time: in class.
• Next Homework
  – On the web page, and the last page of this handout
  – Due date to be negotiated. Recall Project #1 is due next Wednesday. I promised no homework today or Wednesday.
• Project 1
  – Recall Project #1, the scanner, is due Feb. 14th

Grammars 1
• Grammar
  – A set of tokens (terminals): T
  – A set of non-terminals: N
  – A set of productions { lhs ::= rhs , ... }
    » lhs in N
    » rhs is a sequence of symbols from N ∪ T
  – A start symbol: S (in N)
• Shorthands
  – Provide only the productions
    » All lhs symbols comprise N
    » All other symbols comprise T
    » lhs of first production is S

Grammars 2
• Rewriting rules
  – Pick a non-terminal to replace. Which order?
    » left-to-right
    » right-to-left
• Derivations (a list of productions used to derive a string from a grammar).
• A sentence of G; the language L(G)
  – Start with S
  – A sentence contains only terminal symbols
  – L(G) is all sentences derivable from S in 1 or more steps

Grammars 3
• Parse trees.
  – Graphical representations of derivations.
  – The leaves of a fully filled out parse tree form a sentence.
• Context Free Grammars
  – How do they compare to regular expressions?
  – Nesting (matched ()'s) requires CFG's; RE's are not powerful enough.
• Ambiguity
  – A string has two different parse trees (equivalently, two leftmost derivations)
  – E ::= E + E | E * E | id
    » x+x*y
• Left-recursion
  – E ::= E + E | E * E | id
  – Makes certain top-down parsers loop
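
The ambiguity of E ::= E + E | E * E | id on the string x+x*y can be made concrete with a small sketch. The datatype `exp` and the helper `fringe` are hypothetical names introduced only for this illustration; they are not part of the lecture code.

```sml
(* Parse trees for  E ::= E + E | E * E | id  (names are illustrative) *)
datatype exp = Id of string
             | Plus of exp * exp
             | Times of exp * exp

(* fringe t: the leaves of t read left to right, i.e. the sentence *)
fun fringe (Id x) = [x]
  | fringe (Plus (a, b)) = fringe a @ ["+"] @ fringe b
  | fringe (Times (a, b)) = fringe a @ ["*"] @ fringe b

(* Two different trees for the same sentence x+x*y *)
val t1 = Plus (Id "x", Times (Id "x", Id "y"))   (* x + (x*y) *)
val t2 = Times (Plus (Id "x", Id "x"), Id "y")   (* (x+x) * y *)

val sameSentence = (fringe t1 = fringe t2)   (* both trees spell x+x*y *)
val sameTree = (t1 = t2)                     (* but they are different trees *)
```

Both trees have the same leaves, yet they group the operators differently, which is exactly what it means for the grammar to be ambiguous.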

Top Down Parsing
• Begin with the start symbol and try to derive the parse tree from the root.
• Consider the grammar
    Exp ::= id | Exp + Exp | Exp * Exp | ( Exp )
  derives x, x+x+x, x*y, x+y*z, ...

Example Parse (top down)
  stack                              input

  Exp                                x+y*z

         Exp                         y*z
        / | \
     Exp  +  Exp
      |
    id(x)

Top Down Parse (cont)
  stack                              input

         Exp                         y*z
        / | \
     Exp  +  Exp
      |     / | \
   id(x) Exp  *  Exp

         Exp                         z
        / | \
     Exp  +  Exp
      |     / | \
   id(x) Exp  *  Exp
          |
        id(y)

Top Down Parse (cont.)
         Exp
        / | \
     Exp  +  Exp
      |     / | \
   id(x) Exp  *  Exp
          |       |
        id(y)   id(z)

Transition Diagrams
• Transition diagrams for predictive parsers
  – One diagram for each non-terminal
  – Shouldn't have left recursion (left factored)
  – Diagrams can (recursively) mention each other
    E  -> T E'
    E' -> + T E' | <empty>
    T  -> F T'
    T' -> * F T' | <empty>
    F  -> ( E ) | id
  [Figure: one transition diagram each for E, E', T, T', and F, with edges labeled T, E', +, F, T', *, (, E, ), and id]

Problems with Top Down Parsing
• Backtracking may be necessary:
  – S ::= ee | bAc | bAe
  – A ::= d | cA
  try on string "bcde"
• Infinite loops possible from (indirect) left recursive grammars.
  – E ::= E + id | id
• Ambiguity is a problem when a unique parse is not possible.
• These often require extensive grammar restructuring (grammar debugging).
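
To see why backtracking is needed for S ::= ee | bAc | bAe, here is a minimal sketch of a backtracking recognizer. It uses the "list of successes" idea, which is a technique chosen for this illustration and not something the lecture prescribes: each parser returns every possible remaining input, so a failed alternative simply contributes nothing. The names `lit`, `seq`, `alt`, and `accepts` are hypothetical.

```sml
(* A parser maps the remaining input to every possible remaining input
   after a successful match; [] means failure. *)
type parser = char list -> char list list

(* match one literal character *)
fun lit (c : char) : parser =
  fn (x :: xs) => if x = c then [xs] else []
   | [] => []

(* p followed by q *)
fun seq (p : parser, q : parser) : parser =
  fn inp => List.concat (map q (p inp))

(* p or q: both alternatives are kept, which is the backtracking *)
fun alt (p : parser, q : parser) : parser =
  fn inp => p inp @ q inp

(* A ::= d | cA *)
fun A inp = alt (lit #"d", seq (lit #"c", A)) inp

(* S ::= ee | bAc | bAe *)
fun S inp =
  alt (seq (lit #"e", lit #"e"),
       alt (seq (lit #"b", seq (A, lit #"c")),
            seq (lit #"b", seq (A, lit #"e")))) inp

(* accepts s: some way of parsing S consumes all of s *)
fun accepts s = List.exists null (S (explode s))
```

On "bcde" the alternative bAc matches "bcd" and then fails on the final character, so only the alternative bAe yields a result. A parser that commits to the first alternative would have to undo that work.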

Grammar Transformations
• Removing ambiguity.
• Removing Left Recursion
• Backtracking and Factoring

Removing ambiguity.
• Adding levels to a grammar
  – E ::= E + E | E * E | id | ( E )     (ambiguous)
  – E ::= E + T | T                      (with levels added)
  – T ::= T * F | F
  – F ::= id | ( E )
• The dangling else grammar.
  – st ::= if exp then st else st
         | if exp then st
         | id := exp
  – Note that the following has two possible parses
      if x=2 then if x=3 then y:=2 else y:=4

      if x=2 then (if x=3 then y:=2 ) else y:=4
      if x=2 then (if x=3 then y:=2 else y:=4)

Adding levels (cont)
• Original grammar
    st ::= if exp then st else st
         | if exp then st
         | id := exp
• Assume that every st between then and else must be matched, i.e. it must have both a then and an else.
• New grammar with additional levels
    st      -> match
             | unmatch
    match   -> if exp then match else match
             | id := exp
    unmatch -> if exp then st
             | if exp then match else unmatch

Removing Left Recursion
• Top down recursive descent parsers require non-left recursive grammars
• Technique: Left Factoring
    » E  ::= E + E | E * E | id
  becomes
    » E  ::= id E'
    » E' ::= + E E' | * E E' | ε
• General Technique to remove direct left recursion
  – For every non-terminal with productions
    » T ::= T n | T m     (left recursive productions)
          | a | b         (non-left recursive productions)
  – T derives strings of the form (a | b) (n | m)*. They begin with "a" or "b" because those are the rhs of the non-left recursive productions.
  – Make a new non-terminal T'
  – Remove the old productions
  – Add the following productions
    » T  ::= a T' | b T'
    » T' ::= n T' | m T' | ε
  [Figure: left-leaning parse tree, T with children T and n, repeated, ending in a at the bottom left]
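
The general technique for direct left recursion is mechanical enough to sketch as a function. This is only an illustration under stated assumptions: a right-hand side is represented as a list of symbol names, the empty list stands for ε, and the fresh non-terminal is named by appending a prime. The names `rhs` and `removeLeftRec` are hypothetical.

```sml
(* A right-hand side is a list of symbol names; [] stands for epsilon. *)
type rhs = string list

(* removeLeftRec (t, alts): given all alternatives of non-terminal t,
   return the new alternatives for t and for the fresh non-terminal t'. *)
fun removeLeftRec (t : string, alts : rhs list) =
  let
    val t' = t ^ "'"
    fun isLeftRec (x :: _) = (x = t)
      | isLeftRec [] = false
    (* from T ::= T n | T m keep the tails n and m *)
    val tails = map tl (List.filter isLeftRec alts)
    (* the non-left recursive alternatives a and b *)
    val bases = List.filter (not o isLeftRec) alts
  in
    if null tails
    then [(t, alts)]                                   (* nothing to do *)
    else [(t, map (fn a => a @ [t']) bases),           (* T  ::= a T' | b T' *)
          (t', map (fn n => n @ [t']) tails @ [[]])]   (* T' ::= n T' | m T' | epsilon *)
  end

(* E ::= E + T | T   becomes   E ::= T E'   and   E' ::= + T E' | epsilon *)
val result = removeLeftRec ("E", [["E", "+", "T"], ["T"]])
```

The sketch handles only direct left recursion in a single non-terminal; indirect left recursion needs an extra substitution step that is not shown here.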

Backtracking and Factoring
• Backtracking may be necessary:
  – S ::= ee | bAc | bAe
  – A ::= d | cA
• try on string "bcde"
    S -> bAc    (by S -> bAc)
      -> bcAc   (by A -> cA)
      -> bcdc   (by A -> d)
• But this is the wrong answer! bcdc is not bcde, so the parser must back up and try S -> bAe instead.
• Factoring a grammar
  – Factor common prefixes and make the different postfixes into a new non-terminal
  – S ::= ee | bAQ
  – Q ::= c | e
  – A ::= d | cA

Recursive Descent Parsing
• One procedure (function) for each non-terminal.
• Procedures are often (mutually) recursive.
• They can return a bool (the input matches that non-terminal) or more often they return a data-structure (the input builds this parse tree)
• Need to control the lexical analyzer (requiring it to "back-up" on occasion)
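
As a small illustration of "one function per non-terminal", here is a sketch of a recognizer (the bool-returning variant) for the left-factored expression grammar E -> T E', E' -> + T E' | <empty>, T -> F T', T' -> * F T' | <empty>, F -> ( E ) | id. Everything named here (`tok`, `ParseError`, `expect`, `recognize`, and the `parse...` functions) is hypothetical, and unlike the lecture's parser this sketch threads the remaining token list through the functions instead of using references.

```sml
datatype tok = PLUS | TIMES | LP | RP | ID of string | EOF

exception ParseError of string

(* expect t ts: consume token t from the front of ts, or fail *)
fun expect t (x :: xs) =
      if x = t then xs else raise ParseError "unexpected token"
  | expect _ [] = raise ParseError "unexpected end of input"

(* Each function takes the remaining tokens and returns what is left. *)

(* E -> T E' *)
fun parseE ts = parseE' (parseT ts)
(* E' -> + T E' | <empty> *)
and parseE' (PLUS :: ts) = parseE' (parseT ts)
  | parseE' ts = ts
(* T -> F T' *)
and parseT ts = parseT' (parseF ts)
(* T' -> * F T' | <empty> *)
and parseT' (TIMES :: ts) = parseT' (parseF ts)
  | parseT' ts = ts
(* F -> ( E ) | id *)
and parseF (LP :: ts) = expect RP (parseE ts)
  | parseF (ID _ :: ts) = ts
  | parseF _ = raise ParseError "expected ( or id"

(* recognize ts: true if ts is one expression followed by EOF *)
fun recognize ts =
  (case parseE ts of
       [EOF] => true
     | _ => false)
  handle ParseError _ => false
```

Because the grammar has no left recursion and each alternative starts with a different token, one token of lookahead is enough and no backing up is needed.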

R.D. parser for R.E.'s
• Build an instance of the datatype:
    datatype Re
      = empty of int
      | simple of string * int
      | concat of Re * Re
      | closure of Re
      | union of Re * Re;
• The lexical analyzer
    datatype token
      = Done | Bar | Star | Hash
      | Leftparen | Rightparen
      | Single of string;

Ambiguous grammar
    1. RE -> RE bar RE
    2. RE -> RE RE
    3. RE -> RE *
    4. RE -> id
    5. RE -> #
    6. RE -> ( RE )
• Transform grammar by layering
• Tightest binding operators (*) at the lowest layer
• Layers are Alt, then Concat, then Closure, then Simple.
    Alt     -> Alt bar Concat
    Alt     -> Concat
    Concat  -> Concat Closure
    Concat  -> Closure
    Closure -> simple star
    Closure -> simple
    simple  -> id | ( Alt ) | #

Left Recursive Grammar
    Alt     -> Alt bar Concat
    Alt     -> Concat
    Concat  -> Concat Closure
    Concat  -> Closure
    Closure -> simple star
    Closure -> simple
    simple  -> id | ( Alt ) | #
For every non-terminal with productions
    T ::= T n | T m     (left recursive productions)
        | a | b         (non-left recursive productions)
  1. Make a new non-terminal T'
  2. Remove the old productions
  3. Add the following productions
       T  ::= a T' | b T'
       T' ::= n T' | m T' | ε
Result:
    Alt        ::= Concat moreAlt
    moreAlt    ::= Bar Concat moreAlt
                 | ε
    Concat     ::= Closure moreConcat
    moreConcat ::= Closure moreConcat
                 | ε
    Closure    ::= Simple Star
                 | Simple
    Simple     ::= Id
                 | ( Alt )
                 | #

Lookahead and the Lexer
    val lookahead = ref Done;
    val input = ref [Done];
    val location = ref 0;

    fun nextloc () = (location := (!location) + 1; !location)

    fun init s =
      (location := 0;
       input := lexan s;
       lookahead := hd(!input);
       input := tl(!input))
• Lexes the whole input
• Stores it in the variable input
• Keeps track of next token (so that backup is possible)

Matching a single Terminal
    fun match t =
      if (!lookahead) = t
      then (if null(!input)
            then lookahead := Done
            else (lookahead := hd(!input);
                  input := tl(!input)))
      else error ("looking for: " ^ (tok2str t) ^
                  " found: " ^ (tok2str (!lookahead)));
• Match one token
• Advance the input
• Handle the end of file correctly
• Report errors in a sensible way
• This function will be called a lot!!
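
The slides call `lexan`, `tok2str`, and `error` without showing their definitions. The following is a minimal sketch of what they might look like, not the course's actual code. It assumes that white space is skipped, that every other character becomes a one-character `Single` token, and that `error` raises an exception; the exception name `ParseError` and the text "<end>" are made up for this sketch.

```sml
datatype token = Done | Bar | Star | Hash | Leftparen | Rightparen
               | Single of string;

(* lexan s: turn the characters of s into tokens, ending with Done.
   Assumption: white space is skipped and any other character is a
   one-character Single token. *)
fun lexan (s : string) : token list =
  let
    fun lex [] = [Done]
      | lex (#"|" :: cs) = Bar :: lex cs
      | lex (#"*" :: cs) = Star :: lex cs
      | lex (#"#" :: cs) = Hash :: lex cs
      | lex (#"(" :: cs) = Leftparen :: lex cs
      | lex (#")" :: cs) = Rightparen :: lex cs
      | lex (c :: cs) =
          if Char.isSpace c then lex cs
          else Single (str c) :: lex cs
  in
    lex (explode s)
  end

(* tok2str t: printable form of a token, used in error messages *)
fun tok2str Done = "<end>"
  | tok2str Bar = "|"
  | tok2str Star = "*"
  | tok2str Hash = "#"
  | tok2str Leftparen = "("
  | tok2str Rightparen = ")"
  | tok2str (Single s) = s

(* error msg: abandon the parse; the exception name is an assumption *)
exception ParseError of string
fun error msg = raise ParseError msg
```

With these definitions in scope, `match` type checks because `error` can return any type.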

moreAlt and moreConcat
When we removed left recursion, we added non-terminals that might recognize ε, i.e. moreAlt and moreConcat.
Observe the shape of parse trees using those productions.
    moreConcat ::= Closure moreConcat
                 | <empty>

       moreConcat
        /      \
    closure   moreConcat
      ...      /      \
          closure   moreConcat
            ...         |
                        ε
They always end in ε at the far right of the tree.

1 Function for each NT
• A simple way to write a parser for a language is the technique called recursive descent parsing.
• Each non-terminal is represented by a function that returns a syntax item corresponding to the element that production parses.
• If it can't parse that element it raises an error.
• When a production might return the empty string we need to handle that by using the SML 'a option datatype.
    Alt        : unit -> Re
    moreAlt    : unit -> Re option
    Concat     : unit -> Re
    moreConcat : unit -> Re option
    Closure    : unit -> Re
    Simple     : unit -> Re

    Alt        ::= Concat moreAlt
    moreAlt    ::= Bar Alt moreAlt
                 | ε
    Concat     ::= Closure moreConcat
    moreConcat ::= Closure moreConcat
                 | ε
    Closure    ::= Simple Star
                 | Simple
    Simple     ::= Id
                 | ( Alt )
                 | #

Alt ::= Concat moreAlt
    fun Alt () =
      let val x = Concat ()
          val y = moreAlt ()
      in case y of
           NONE => x
         | SOME z => union(x, z)
      end

moreAlt ::= Bar Alt moreAlt | ε
    and moreAlt () =
      case (!lookahead) of
        Bar => let val _ = match Bar
                   val x = Alt()
                   val y = moreAlt ()
               in case y of
                    NONE => SOME x
                  | (SOME z) => SOME(union(x, z))
               end
      | _ => NONE
Note: "and" separates mutually recursive functions.

Concat ::= Closure moreConcat
    and Concat () =
      let val x = Closure ()
          val y = moreConcat ()
      in case y of
           NONE => x
         | SOME z => concat(x, z)
      end

moreConcat ::= Closure moreConcat | ε
    and moreConcat () =
      case (!lookahead) of
        (Single _ | Leftparen | Hash) =>
          let val x = Closure()
              val y = moreConcat()
          in case y of
               NONE => SOME x
             | SOME z => SOME(concat(x, z))
          end
      | _ => NONE

Closure ::= Simple Star | Simple
A simple form of left factoring is used here.
    and Closure () =
      let val x = Simple()
      in case !lookahead of
           Star => (match Star; closure x)
         | other => x
      end

Simple ::= Id | ( Alt ) | #
    and Simple () =
      case !lookahead of
        Single c => let val _ = match (Single c)
                        val n = nextloc()
                    in simple(c, n) end
      | Leftparen => let val _ = match Leftparen
                         val x = Alt()
                         val _ = match Rightparen
                     in x end
      | Hash => let val _ = match Hash
                    val n = nextloc()
                in empty n end
      | x => error ("In Simple no match: " ^ (tok2str x));

Top Level Parser
    fun parse s =
      let val _ = init s
          val ans = Alt()
          val _ = match Done
      in ans end;

    parse "a(b*|c)#";

    concat (simple ("a", 1),
            concat (union (closure (simple ("b", 2)),
                           simple ("c", 4)),
                    empty 8))

Predictive Parsers
• Using a stack to avoid recursion. Encoding the diagrams in a table
• The Nullable, First, and Follow functions
  – Nullable: Can a symbol derive the empty string? False for every terminal symbol.
  – First: all the terminals that a non-terminal could possibly derive as its first symbol.
    » term or nonterm -> set( term )
    » sequence(term + nonterm) -> set( term )
  – Follow: all the terminals that could immediately follow the string derived from a non-terminal.
    » non-term -> set( term )

Example First and Follow Sets
    E  ::= T E' $
    E' ::= + T E'
    E' ::= ε
    T  ::= F T'
    T' ::= * F T'
    T' ::= ε
    F  ::= ( E )
    F  ::= id

    First E  = {"(", "id"}        Follow E = Follow E' = {")", "$"}
    First F  = {"(", "id"}        Follow F             = {"+", "*", ")", "$"}
    First T  = {"(", "id"}        Follow T = Follow T' = {"+", ")", "$"}
    First E' = {"+", ε}
    First T' = {"*", ε}
• First of a terminal is itself.
• First can be extended to a sequence of symbols.

Nullable
• If ε is in First(symbol) then that symbol is nullable.
• Sometimes, rather than let ε be a symbol, we derive an additional function nullable.
    E  ::= T E' $
    E' ::= + T E'
    E' ::= ε
    T  ::= F T'
    T' ::= * F T'
    T' ::= ε
    F  ::= ( E )
    F  ::= id
• Nullable(E') = true
• Nullable(T') = true
• Nullable for all other symbols is false

Computing First
• Use the following rules until no more terminals can be added to any FIRST set.
  1) If X is a term, FIRST(X) = {X}
  2) If X ::= ε is a production then add ε to FIRST(X) (or set nullable of X to true).
  3) If X is a non-term and
     – X ::= Y1 Y2 ... Yk
     – add a to FIRST(X)
       » if a in FIRST(Yi) and
       » for all j<i, ε in FIRST(Yj)
• E.g. if Y1 can derive ε then if a is in FIRST(Y2) it is surely in FIRST(X) as well.

Example First Computation
    E  ::= T E' $
    E' ::= + T E'
    E' ::= ε
    T  ::= F T'
    T' ::= * F T'
    T' ::= ε
    F  ::= ( E )
    F  ::= id
• Terminals
  – First($) = {$}  First(*) = {*}  First(+) = {+}  ...
• Empty Productions
  – add ε to First(E'), add ε to First(T')
• Other NonTerminals
  – Computing from the lowest layer (F) up
    » First(F) = {id, ( }
    » First(T') = { ε, * }
    » First(T) = First(F) = {id, ( }
    » First(E') = { ε, + }
    » First(E) = First(T) = {id, ( }
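
The "repeat until nothing changes" rules for FIRST can be sketched as a fixed-point iteration. This is one possible encoding, not the lecture's code. It assumes productions are (lhs, rhs) pairs of symbol names, sets are lists without duplicates, and ε is tracked in a separate `nullable` list rather than stored inside the FIRST sets. All helper names (`member`, `union`, `fix`, `lookup`, `firstSeq`) are hypothetical.

```sml
(* A production is (lhs, rhs); an empty rhs stands for epsilon. *)
type production = string * string list

val grammar : production list =
  [("E",  ["T", "E'", "$"]),
   ("E'", ["+", "T", "E'"]),
   ("E'", []),
   ("T",  ["F", "T'"]),
   ("T'", ["*", "F", "T'"]),
   ("T'", []),
   ("F",  ["(", "E", ")"]),
   ("F",  ["id"])]

fun member x xs = List.exists (fn y => y = x) xs
fun union (xs, ys) = xs @ List.filter (fn y => not (member y xs)) ys

fun isNonterm x = List.exists (fn (l, _) => l = x) grammar
val nonterms = foldl (fn ((l, _), acc) => union (acc, [l])) [] grammar

(* apply step repeatedly until the result stops changing *)
fun fix step x = let val y = step x in if y = x then x else fix step y end

(* nullable: the non-terminals that can derive epsilon (rule 2) *)
val nullable =
  fix (fn ns =>
         foldl (fn ((l, rhs), acc) =>
                  if List.all (fn y => member y acc) rhs
                  then union (acc, [l]) else acc)
               ns grammar)
      []

fun lookup tbl x =
  case List.find (fn (n, _) => n = x) tbl of
      SOME (_, s) => s
    | NONE => []

(* firstSeq tbl ys: FIRST of the sequence ys, given the table so far (rule 3) *)
fun firstSeq tbl [] = []
  | firstSeq tbl (y :: ys) =
      if isNonterm y
      then union (lookup tbl y,
                  if member y nullable then firstSeq tbl ys else [])
      else [y]                       (* rule 1: FIRST of a terminal is itself *)

(* first: table from each non-terminal to its FIRST set, without epsilon *)
val first =
  fix (fn tbl =>
         map (fn (n, s) =>
                (n, foldl (fn ((l, rhs), acc) =>
                             if l = n then union (acc, firstSeq tbl rhs)
                             else acc)
                          s grammar))
             tbl)
      (map (fn n => (n, [])) nonterms)
```

Evaluating `lookup first "E"` gives the terminals ( and id, and `nullable` contains E' and T', in agreement with the sets worked out by hand above.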

Computing Follow
• Use the following rules until nothing can be added to any follow set.
  1) Place $ (the end of input marker) in FOLLOW(S) where S is the start symbol.
  2) If A ::= a B b then everything in FIRST(b) except ε is in FOLLOW(B)
  3) If there is a production A ::= a B, or A ::= a B b where FIRST(b) contains ε (i.e. b can derive the empty string), then everything in FOLLOW(A) is in FOLLOW(B)

Ex. Follow Computation
    E  ::= T E' $
    E' ::= + T E'
    E' ::= ε
    T  ::= F T'
    T' ::= * F T'
    T' ::= ε
    F  ::= ( E )
    F  ::= id
• Rule 1, Start symbol
  – Add $ to Follow(E)
• Rule 2, Productions with embedded nonterms
  – Add First( ")" ) = { ) } to Follow(E)
  – Add First($) = { $ } to Follow(E')
  – Add First(E') = {+, ε} to Follow(T), leaving out ε
  – Add First(T') = {*, ε} to Follow(F), leaving out ε
• Rule 3, Nonterm in last position
  – Add Follow(E') to Follow(E') (doesn't do much)
  – Add Follow(E') to Follow(T) since E' --> ε
  – Add Follow(T) to Follow(T')
  – Add Follow(T) to Follow(F) since T' --> ε
  – Add Follow(T') to Follow(F) since T' --> ε
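
The three FOLLOW rules can also be run as a fixed-point iteration. The sketch below makes two assumptions that should be kept in mind. First, it writes the start production as E ::= T E' and lets rule 1 supply the end marker $, rather than writing $ inside the production. Second, it hard-codes FIRST and nullable for this one grammar instead of computing them. All function names are hypothetical.

```sml
type production = string * string list

(* Assumption: $ comes from rule 1, so it is not written in E's production. *)
val grammar : production list =
  [("E",  ["T", "E'"]),
   ("E'", ["+", "T", "E'"]),
   ("E'", []),
   ("T",  ["F", "T'"]),
   ("T'", ["*", "F", "T'"]),
   ("T'", []),
   ("F",  ["(", "E", ")"]),
   ("F",  ["id"])]

(* FIRST (without epsilon) and nullable, hard-coded for this grammar *)
fun first "E" = ["(", "id"]
  | first "T" = ["(", "id"]
  | first "F" = ["(", "id"]
  | first "E'" = ["+"]
  | first "T'" = ["*"]
  | first t = [t]                         (* a terminal *)
fun nullable x = (x = "E'" orelse x = "T'")
fun isNonterm x = List.exists (fn (l, _) => l = x) grammar

fun member x xs = List.exists (fn y => y = x) xs
fun union (xs, ys) = xs @ List.filter (fn y => not (member y xs)) ys

(* firstSeq ys = (FIRST of the sequence ys, whether ys can derive epsilon) *)
fun firstSeq [] = ([], true)
  | firstSeq (y :: ys) =
      if nullable y
      then let val (s, n) = firstSeq ys in (union (first y, s), n) end
      else (first y, false)

fun lookup tbl x =
  case List.find (fn (n, _) => n = x) tbl of
      SOME (_, s) => s
    | NONE => []

(* addTo (tbl, b, s): add the set s to the entry for b *)
fun addTo (tbl, b, s) =
  map (fn (n, old) => if n = b then (n, union (old, s)) else (n, old)) tbl

(* one pass over every occurrence  A ::= alpha B beta *)
fun step tbl =
  let
    fun scan (_, [], tbl) = tbl
      | scan (a, b :: beta, tbl) =
          if isNonterm b
          then
            let val (s, eps) = firstSeq beta
                val tbl1 = addTo (tbl, b, s)                    (* rule 2 *)
                val tbl2 = if eps
                           then addTo (tbl1, b, lookup tbl1 a)  (* rule 3 *)
                           else tbl1
            in scan (a, beta, tbl2) end
          else scan (a, beta, tbl)
  in
    foldl (fn ((a, rhs), t) => scan (a, rhs, t)) tbl grammar
  end

fun fix f x = let val y = f x in if y = x then x else fix f y end

(* rule 1: $ starts out in FOLLOW of the start symbol E *)
val follow =
  fix step [("E", ["$"]), ("E'", []), ("T", []), ("T'", []), ("F", [])]
```

Under these assumptions the iteration settles with ) and $ in FOLLOW(E) and FOLLOW(E'), with +, ), $ in FOLLOW(T) and FOLLOW(T'), and with +, *, ), $ in FOLLOW(F).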

Table from First and Follow
  1. For each production A -> alpha do 2 & 3
  2. For each a in First alpha do add A -> alpha to M[A, a]
  3. If ε is in First alpha, add A -> alpha to M[A, b] for each terminal b in Follow A. If ε is in First alpha and $ is in Follow A, add A -> alpha to M[A, $].

    First E  = {"(", "id"}        Follow E = Follow E' = {")", "$"}
    First F  = {"(", "id"}        Follow F             = {"+", "*", ")", "$"}
    First T  = {"(", "id"}        Follow T = Follow T' = {"+", ")", "$"}
    First E' = {"+", ε}
    First T' = {"*", ε}

    1  E  ::= T E' $
    2  E' ::= + T E'
    3       | ε
    4  T  ::= F T'
    5  T' ::= * F T'
    6       | ε
    7  F  ::= ( E )
    8       | id

    M[A, t]      terminals
    nonterms     +     *     )     (     id    $
    E                              1     1
    E'           2           3                 3
    T                              4     4
    T'           6     5     6                 6
    F                              7     8

Predictive Parsing Table
           id       +         *         (        )      $
    E      T E'                         T E'
    E'              + T E'                       ε      ε
    T      F T'                         F T'
    T'              ε         * F T'             ε      ε
    F      id                           ( E )

Table Driven Algorithm
    push start symbol
    repeat
      begin
        let X = top of stack, A = next input
        if terminal(X)
          then if X = A
                 then pop X; remove A
                 else error()
          else (* nonterminal(X) *)
            begin
              if M[X, A] = Y1 Y2 ... Yk
                then pop X; push Yk Yk-1 ... Y1
                else error()
            end
      end
    until stack is empty, input = $
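
The table driven algorithm can be sketched for the expression grammar as follows. This is an illustration, not the lecture's code: the table `M` is written out by hand as a function from (non-terminal, terminal) to a right-hand side, tokens are plain strings, the stack is a list whose head is the top, and errors are reported by returning false. The names `M`, `isNonterm`, `run`, and `parse` are hypothetical.

```sml
(* M (X, a): the right-hand side to use for non-terminal X on input a.
   NONE marks an empty (error) entry of the table. *)
fun M ("E",  "id") = SOME ["T", "E'"]
  | M ("E",  "(")  = SOME ["T", "E'"]
  | M ("E'", "+")  = SOME ["+", "T", "E'"]
  | M ("E'", ")")  = SOME []
  | M ("E'", "$")  = SOME []
  | M ("T",  "id") = SOME ["F", "T'"]
  | M ("T",  "(")  = SOME ["F", "T'"]
  | M ("T'", "+")  = SOME []
  | M ("T'", "*")  = SOME ["*", "F", "T'"]
  | M ("T'", ")")  = SOME []
  | M ("T'", "$")  = SOME []
  | M ("F",  "id") = SOME ["id"]
  | M ("F",  "(")  = SOME ["(", "E", ")"]
  | M _            = NONE

fun isNonterm x = List.exists (fn n => n = x) ["E", "E'", "T", "T'", "F"]

(* run (stack, input): the head of the stack list is the top of the stack *)
fun run ([], ["$"]) = true                   (* stack empty, input = $ *)
  | run (x :: stack, a :: rest) =
      if isNonterm x
      then (case M (x, a) of
                (* pop X; push Yk ... Y1, so that Y1 ends up on top *)
                SOME ys => run (ys @ stack, a :: rest)
              | NONE => false)
      else x = a andalso run (stack, rest)   (* terminal: pop X; remove A *)
  | run _ = false

(* parse input: push the start symbol and append the end marker *)
fun parse input = run (["E"], input @ ["$"])

val ok = parse ["id", "+", "id"]
val bad = parse ["id", "+"]
```

Prepending the right-hand side to the stack list has the same effect as pushing Yk first and Y1 last, since the head of the list is treated as the top.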

Example Parse
Parsing x + y $ with the predictive parsing table (the top of the stack is on the right):
    Stack            Input
    E                x + y $
    E' T             x + y $
    E' T' F          x + y $
    E' T' id         x + y $
    E' T'            + y $
    E'               + y $
    E' T +           + y $
    E' T             y $
    E' T' F          y $
    E' T' id         y $
    E' T'            $
    E'               $
                     $

CS 321 Prog Lang & Compilers                    Assignment # 7
Assigned: Feb 5, 2007                           Due: Wed. Feb 14, 2007

Cut and paste the following into your solution file.
=================================
datatype RE
  = Empty
  | Union of RE * RE
  | Concat of RE * RE
  | Star of RE
  | C of char;
================================
The purpose of today's homework is to write functions analogous to "first" and "follow" for context free grammars from today's lecture. There are two differences: the functions you will write for homework are for regular expressions, not context free grammars, and since REs don't have non-terminal and terminal symbols, the functions are for a complete RE rather than a symbol.

Write 3 ML functions
1) Write (nullable: RE -> bool). Returns a boolean, true if the empty string is a member of the set of strings recognized by the RE, false otherwise.
2) Write (first: RE -> char list). Returns a char list which contains those characters which may appear as the first character in the strings recognized by that RE.
3) Write (last: RE -> char list). Returns a char list which contains those characters which may appear as the last character in the strings recognized by that RE.

All these functions are simple functions defined with pattern matching, with one clause for each constructor of RE.