Cse 321 Programming Languages and Compilers Lecture 7


Cse 321, Programming Languages and Compilers
Lecture #7, Feb. 5, 2007

  • Grammars
  • Top-down parsing
  • Transition diagrams
  • Ambiguity
  • Left recursion
  • Refactoring by adding levels
  • Recursive descent parsing
  • Predictive parsers
  • First and Follow
  • Parsing tables

10/23/2021

Assignments

  • Reading
      – Chapter 3, pages 73-106
      – Quiz on Wednesday
  • Midterm exam
      – Monday, Feb. 19, 2007. Time: in class.
  • Next homework
      – On the web page, and on the last page of this handout
      – Due date to be negotiated. Recall that Project #1 is due next Wednesday. I promised no homework today or Wednesday.
  • Project 1
      – Recall that Project #1, the scanner, is due Feb. 14th

Grammars 1

  • Grammar
      – A set of tokens (terminals): T
      – A set of non-terminals: N
      – A set of productions { lhs ::= rhs , ... }
          » lhs in N
          » rhs is a sequence of symbols from N ∪ T
      – A start symbol: S (in N)
  • Shorthands
      – Provide only the productions
          » All lhs symbols comprise N
          » All other symbols comprise T
          » The lhs of the first production is S

Grammars 2

  • Rewriting rules
      – Pick a non-terminal to replace. Which order?
          » left-to-right
          » right-to-left
  • Derivations (a list of productions used to derive a string from a grammar)
  • Sentences of G: L(G)
      – Start with S
      – Contain only terminal symbols
      – L(G) is the set of all such strings derivable from S in 1 or more steps

Grammars 3

  • Parse trees
      – Graphical representations of derivations.
      – The leaves of a fully filled-out parse tree, read left to right, form a sentence.
  • Context-free grammars
      – How do they compare to regular expressions?
      – Nesting (matched parentheses) requires a CFG; REs are not powerful enough.
  • Ambiguity
      – A string has two different parse trees (two distinct leftmost derivations)
      – E ::= E + E | E * E | id
          » x+x*y
  • Left recursion
      – E ::= E + E | E * E | id
      – Makes certain top-down parsers loop
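The ambiguity of x+x*y can be made concrete with a small SML sketch. The datatype `Exp` and the helpers `leaves`, `eval`, and `env` are illustrative names that do not appear in the lecture. Both trees have the same leaves, so they are parses of the same sentence, yet they evaluate differently.

```sml
(* Hypothetical parse-tree type for E ::= E + E | E * E | id *)
datatype Exp = Id of string
             | Plus of Exp * Exp
             | Times of Exp * Exp

(* The two parse trees for x+x*y *)
val t1 = Plus (Id "x", Times (Id "x", Id "y"))    (* groups as x + (x * y) *)
val t2 = Times (Plus (Id "x", Id "x"), Id "y")    (* groups as (x + x) * y *)

(* leaves: read the leaves of a tree from left to right *)
fun leaves (Id s) = [s]
  | leaves (Plus (a, b)) = leaves a @ ["+"] @ leaves b
  | leaves (Times (a, b)) = leaves a @ ["*"] @ leaves b

(* eval: interpret a tree, given integer values for the identifiers *)
fun eval env (Id s) = env s
  | eval env (Plus (a, b)) = eval env a + eval env b
  | eval env (Times (a, b)) = eval env a * eval env b

(* Sample values, chosen only for illustration *)
fun env "x" = 2
  | env "y" = 5
  | env _ = 0

val sameSentence = (leaves t1 = leaves t2)   (* true: same leaves, different trees *)
val v1 = eval env t1                         (* 2 + 2 * 5 grouped to the right: 12 *)
val v2 = eval env t2                         (* 2 + 2 grouped first: 20 *)
```

Because the two trees give different values, a parser for this grammar has to commit to one of them, which is what the grammar transformations later in the lecture arrange.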

Top Down Parsing

  • Begin with the start symbol and try to derive the parse tree from the root.
  • Consider the grammar

        Exp ::= id | Exp + Exp | Exp * Exp | ( Exp )

    which derives x, x+x+x, x*y, x+y*z, ...

Example Parse (top down)

    stack                        input

        Exp                      x+y*z

        Exp                      y*z
      /  |  \
    Exp  +  Exp
     |
    id(x)

Top Down Parse (cont.)

    stack                        input

        Exp                      y*z
      /  |  \
    Exp  +  Exp
     |     / | \
    id(x) Exp * Exp

        Exp                      z
      /  |  \
    Exp  +  Exp
     |     / | \
    id(x) Exp * Exp
           |
          id(y)

Top Down Parse (cont.)

        Exp
      /  |  \
    Exp  +  Exp
     |     / | \
    id(x) Exp * Exp
           |     |
          id(y) id(z)

Transition Diagrams

  • Transition diagrams for predictive parsers
      – One diagram for each non-terminal
      – The grammar should have no left recursion and should be left factored
      – Diagrams can (recursively) mention each other

    E  -> T E'
    E' -> + T E' | <empty>
    T  -> F T'
    T' -> * F T' | <empty>
    F  -> ( E ) | id

    [Figure: one transition diagram each for E, E', T, T', and F, with edges labeled by the symbols of the right-hand sides]

Problems with Top Down Parsing

  • Backtracking may be necessary:
      – S ::= ee | bAc | bAe
      – A ::= d | cA
      – Try it on the string "bcde"
  • Infinite loops are possible with (possibly indirect) left-recursive grammars.
      – E ::= E + id | id
  • Ambiguity is a problem when a unique parse is not possible.
  • These often require extensive grammar restructuring (grammar debugging).

Grammar Transformations

  • Removing ambiguity
  • Removing left recursion
  • Backtracking and factoring

Removing ambiguity

  • Adding levels to a grammar
      – Ambiguous grammar:
            E ::= E + E | E * E | id | ( E )
      – Grammar with levels:
            E ::= E + T | T
            T ::= T * F | F
            F ::= id | ( E )
  • The dangling else grammar
      – st ::= if exp then st else st
             | if exp then st
             | id := exp
      – Note that the following has two possible parses:

            if x=2 then if x=3 then y:=2 else y:=4

            if x=2 then (if x=3 then y:=2) else y:=4
            if x=2 then (if x=3 then y:=2 else y:=4)

Adding levels (cont.)

  • Original grammar

        st ::= if exp then st else st
             | if exp then st
             | id := exp

  • Assume that every st between then and else must be matched, i.e. it must have both a then and an else.
  • New grammar with additional levels

        st      -> match
                 | unmatch
        match   -> if exp then match else match
                 | id := exp
        unmatch -> if exp then st
                 | if exp then match else unmatch

Removing Left Recursion

  • Top-down recursive descent parsers require grammars that are not left recursive
  • Technique: replace the left recursion with right recursion on a new non-terminal
      » E  ::= E + E | E * E | id
    becomes
      » E  ::= id E'
      » E' ::= + E E' | * E E' | ε
  • General technique to remove direct left recursion
      – For every non-terminal with productions
          » T ::= T n | T m     (left-recursive productions)
                | a | b         (non-left-recursive productions)
        T generates the strings (a | b) (n | m)*. They start with "a" or "b" because those are the right-hand sides of the non-left-recursive productions.
      – Make a new non-terminal T'
      – Remove the old productions
      – Add the following productions
          » T  ::= a T' | b T'
          » T' ::= n T' | m T' | ε

    [Figure: left-leaning parse tree for T, with n on each right branch and a at the bottom left]
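The general technique above can be sketched as a small SML function. This is only an illustration under stated assumptions: symbols are plain strings, an alternative is a list of symbols with the empty list standing for ε, and the names `alt` and `removeLeftRec` are hypothetical rather than taken from the lecture.

```sml
(* An alternative (one right-hand side) is a list of symbols; [] stands for epsilon. *)
type alt = string list

(* removeLeftRec (t, alts) returns the productions that replace those of t.
   Left-recursive alternatives  T ::= T n  are split from the others  T ::= a,
   and a new non-terminal named t ^ "'" carries the repetition. *)
fun removeLeftRec (t : string, alts : alt list) =
  let
    val t' = t ^ "'"
    fun isLeftRec (x :: _) = (x = t)
      | isLeftRec [] = false
    val (recs, bases) = List.partition isLeftRec alts
    val tails = map tl recs                  (* the n and m parts, without the leading T *)
  in
    if null recs
    then [(t, alts)]                         (* nothing to do *)
    else [ (t,  map (fn a => a @ [t']) bases),          (* T  ::= a T' | b T' *)
           (t', map (fn n => n @ [t']) tails @ [[]]) ]  (* T' ::= n T' | m T' | epsilon *)
  end

(* Example: E ::= E + T | T *)
val result = removeLeftRec ("E", [["E", "+", "T"], ["T"]])
```

For the example, `result` holds E ::= T E' and E' ::= + T E' | ε. The sketch handles direct left recursion only; indirect left recursion would need the non-terminals to be ordered and substituted first.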

Backtracking and Factoring

  • Backtracking may be necessary:
      – S ::= ee | bAc | bAe
      – A ::= d | cA
  • Try it on the string "bcde"

        S -> bAc     (by S -> bAc)
          -> bcAc    (by A -> cA)
          -> bcdc    (by A -> d)

  • But this is the wrong answer! It derives bcdc, not bcde, so the parser must back up and try S -> bAe.
  • Factoring a grammar
      – Factor common prefixes and make the different suffixes into a new non-terminal
      – S ::= ee | bAQ
      – Q ::= c | e
      – A ::= d | cA

Recursive Descent Parsing

  • One procedure (function) for each non-terminal.
  • Procedures are often (mutually) recursive.
  • They can return a bool (the input matches that non-terminal) or, more often, a data structure (the parse tree built from the input).
  • Need to control the lexical analyzer (requiring it to "back up" on occasion).

R.D. parser for R.E.'s

  • Build an instance of the datatype:

        datatype Re
          = empty of int
          | simple of string * int
          | concat of Re * Re
          | closure of Re
          | union of Re * Re;

  • The lexical analyzer

        datatype token
          = Done | Bar | Star | Hash
          | Leftparen | Rightparen
          | Single of string;
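The parser code later in the lecture calls `lexan` and `tok2str` without showing them. The following is a minimal sketch of what they might look like, assuming every character other than |, *, #, ( and ) is a one-character `Single` token and that the token list ends with `Done`. The bodies are guesses for illustration, not the course's actual lexer.

```sml
datatype token
  = Done | Bar | Star | Hash
  | Leftparen | Rightparen
  | Single of string;

(* lexan : string -> token list
   Assumption: one token per character, with Done appended as an end marker. *)
fun lexan (s : string) : token list =
  let
    fun tok #"|" = Bar
      | tok #"*" = Star
      | tok #"#" = Hash
      | tok #"(" = Leftparen
      | tok #")" = Rightparen
      | tok c    = Single (String.str c)
  in
    map tok (explode s) @ [Done]
  end

(* tok2str : token -> string, used when building error messages *)
fun tok2str Done = "Done"
  | tok2str Bar = "|"
  | tok2str Star = "*"
  | tok2str Hash = "#"
  | tok2str Leftparen = "("
  | tok2str Rightparen = ")"
  | tok2str (Single s) = s

val example = lexan "a(b*|c)#"
```

Appending `Done` means the token list is never empty, so taking the head of the list for the first lookahead token cannot fail even on the empty string.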

Ambiguous grammar

    1. RE -> RE bar RE
    2. RE -> RE RE
    3. RE -> RE *
    4. RE -> id
    5. RE -> #
    6. RE -> ( RE )

  • Transform the grammar by layering
  • Tightest binding operators (*) at the lowest layer
  • Layers are Alt, then Concat, then Closure, then Simple.

    Alt     -> Alt bar Concat
    Alt     -> Concat
    Concat  -> Concat Closure
    Concat  -> Closure
    Closure -> Simple star
    Closure -> Simple
    Simple  -> id | ( Alt ) | #

Left Recursive Grammar

    Alt     -> Alt bar Concat
    Alt     -> Concat
    Concat  -> Concat Closure
    Concat  -> Closure
    Closure -> Simple star
    Closure -> Simple
    Simple  -> id | ( Alt ) | #

  1. For every non-terminal with productions
         T ::= T n | T m     (left-recursive productions)
             | a | b         (non-left-recursive productions)
  2. Make a new non-terminal T'
  3. Remove the old productions
  4. Add the following productions
         T  ::= a T' | b T'
         T' ::= n T' | m T' | ε

  Result:

    Alt        ::= Concat moreAlt
    moreAlt    ::= Bar Concat moreAlt
                 | ε
    Concat     ::= Closure moreConcat
    moreConcat ::= Closure moreConcat
                 | ε
    Closure    ::= Simple Star
                 | Simple
    Simple     ::= Id
                 | ( Alt )
                 | #

Lookahead and the Lexer

    val lookahead = ref Done;
    val input = ref [Done];
    val location = ref 0;

    fun nextloc () = (location := (!location) + 1; !location)

    fun init s =
      (location := 0;
       input := lexan s;
       lookahead := hd(!input);
       input := tl(!input))

  • Lexes the whole input
  • Stores it in the variable input
  • Keeps track of the next token (so that backup is possible)

Matching a single Terminal

    fun match t =
      if (!lookahead) = t
      then (if null(!input)
            then lookahead := Done
            else (lookahead := hd(!input);
                  input := tl(!input)))
      else error ("looking for: " ^ (tok2str t) ^
                  " found: " ^ (tok2str (!lookahead)));

  • Match one token
  • Advance the input
  • Handle the end of file correctly
  • Report errors in a sensible way
  • This function will be called a lot!!

moreAlt and moreConcat

When we removed left recursion, we added non-terminals that might recognize ε, i.e. moreAlt and moreConcat.

Observe the shape of parse trees using those productions.

    moreConcat ::= Closure moreConcat
                 | <empty>

        moreConcat
         /      \
    closure    moreConcat
      ...       /      \
           closure    moreConcat
             ...          |
                          ε

They always end in ε at the far right of the tree.

1 Function for each NT

  • A simple way to write a parser for a language is the technique called recursive descent parsing.
  • Each non-terminal is represented by a function that returns a syntax item corresponding to the element that production parses.
  • If it can't parse that element it raises an error.
  • When a production might return the empty string we need to handle that by using the SML 'a option datatype.

    Alt        : unit -> Re
    moreAlt    : unit -> Re option
    Concat     : unit -> Re
    moreConcat : unit -> Re option
    Closure    : unit -> Re
    Simple     : unit -> Re

    Alt        ::= Concat moreAlt
    moreAlt    ::= Bar Alt moreAlt
                 | ε
    Concat     ::= Closure moreConcat
    moreConcat ::= Closure moreConcat
                 | ε
    Closure    ::= Simple Star
                 | Simple
    Simple     ::= Id
                 | ( Alt )
                 | #

Alt ::= Concat moreAlt

    fun Alt () =
      let val x = Concat ()
          val y = moreAlt ()
      in case y of
           NONE => x
         | SOME z => union(x, z)
      end

moreAlt ::= Bar Alt moreAlt | ε

    and moreAlt () =
      case (!lookahead) of
        Bar => let val _ = match Bar
                   val x = Alt()
                   val y = moreAlt ()
               in case y of
                    NONE => SOME x
                  | (SOME z) => SOME(union(x, z))
               end
      | _ => NONE

"and" separates mutually recursive functions.

Concat ::= Closure moreConcat

    and Concat () =
      let val x = Closure ()
          val y = moreConcat ()
      in case y of
           NONE => x
         | SOME z => concat(x, z)
      end

moreConcat ::= Closure moreConcat | ε

    and moreConcat () =
      case (!lookahead) of
        (* the or-pattern below is an SML/NJ extension to Standard ML *)
        (Single _ | Leftparen | Hash) =>
          let val x = Closure()
              val y = moreConcat()
          in case y of
               NONE => SOME x
             | SOME z => SOME(concat(x, z))
          end
      | _ => NONE

Closure ::= Simple Star | Simple

A simple form of left factoring is used here.

    and Closure () =
      let val x = Simple()
      in case !lookahead of
           Star => (match Star; closure x)
         | other => x
      end

Simple ::= Id | ( Alt ) | #

    and Simple () =
      case !lookahead of
        Single c => let val _ = match (Single c)
                        val n = nextloc()
                    in simple(c, n) end
      | Leftparen => let val _ = match Leftparen
                         val x = Alt();
                         val _ = match Rightparen
                     in x end
      | Hash => let val _ = match Hash
                    val n = nextloc()
                in empty n end
      | x => error ("In Simple no match: " ^ (tok2str x));

Top Level Parser

    fun parse s =
      let val _ = init s
          val ans = Alt()
          val _ = match Done
      in ans end;

    parse "a(b*|c)#";

    concat (simple ("a", 1),
            concat (union (closure (simple ("b", 2)),
                           simple ("c", 4)),
                    empty 8))

Predictive Parsers

  • Using a stack to avoid recursion; encoding the diagrams in a table
  • The Nullable, First, and Follow functions
      – Nullable: can a symbol derive the empty string? False for every terminal symbol.
      – First: all the terminals that a non-terminal could possibly derive as its first symbol.
          » term or nonterm -> set(term)
          » sequence(term + nonterm) -> set(term)
      – Follow: all the terminals that could immediately follow the string derived from a non-terminal.
          » nonterm -> set(term)

Example First and Follow Sets

    E  ::= T E' $
    E' ::= + T E'
         | ε
    T  ::= F T'
    T' ::= * F T'
         | ε
    F  ::= ( E )
         | id

    First E  = {"(", "id"}        Follow E = Follow E' = {")", "$"}
    First F  = {"(", "id"}        Follow F             = {"+", "*", ")", "$"}
    First T  = {"(", "id"}        Follow T = Follow T' = {"+", ")", "$"}
    First E' = {"+", ε}
    First T' = {"*", ε}

  • First of a terminal is itself.
  • First can be extended to a sequence of symbols.

Nullable

  • If ε is in First(symbol) then that symbol is nullable.
  • Sometimes, rather than let ε be a symbol, we derive an additional function nullable.

    E  ::= T E' $
    E' ::= + T E'
         | ε
    T  ::= F T'
    T' ::= * F T'
         | ε
    F  ::= ( E )
         | id

  • Nullable(E') = true
  • Nullable(T') = true
  • Nullable for all other symbols is false

Computing First

  • Use the following rules until no more terminals can be added to any FIRST set.
      1) If X is a terminal, FIRST(X) = {X}
      2) If X ::= ε is a production then add ε to FIRST(X) (or set nullable of X to true).
      3) If X is a non-terminal and
          – X ::= Y1 Y2 ... Yk
          – add a to FIRST(X)
              » if a is in FIRST(Yi) and
              » for all j < i, ε is in FIRST(Yj)
  • E.g. if Y1 can derive ε, then if a is in FIRST(Y2) it is surely in FIRST(X) as well.
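The rules above amount to iterating until nothing changes, which can be sketched in SML as follows. The representation is an assumption made for illustration: symbols are strings, productions are (lhs, rhs) pairs with the empty list for ε, and ε is tracked through a separate `nullable` function instead of being stored in the FIRST sets (the alternative that rule 2 mentions). All helper names are hypothetical.

```sml
(* The lecture's grammar as (lhs, rhs) pairs; [] is the epsilon right-hand side. *)
val grammar =
  [ ("E",  ["T", "E'", "$"]),
    ("E'", ["+", "T", "E'"]), ("E'", []),
    ("T",  ["F", "T'"]),
    ("T'", ["*", "F", "T'"]), ("T'", []),
    ("F",  ["(", "E", ")"]),  ("F",  ["id"]) ]

val nonterms = ["E", "E'", "T", "T'", "F"]

fun member x xs = List.exists (fn y => y = x) xs
fun setUnion (xs, ys) = xs @ List.filter (fn y => not (member y xs)) ys
fun isNonterm x = member x nonterms

(* Association-list lookup for per-non-terminal tables *)
fun lookup tbl x =
  case List.find (fn (k, _) => k = x) tbl of
    SOME (_, v) => v
  | NONE => raise Fail ("unknown non-terminal " ^ x)

(* Apply step repeatedly until the table stops changing *)
fun fix step tbl =
  let val tbl' = step tbl
  in if tbl' = tbl then tbl else fix step tbl' end

(* X is nullable if some production X ::= Y1 ... Yk has every Yi nullable;
   k = 0 is the epsilon production (rule 2). *)
val nullTbl =
  let
    fun nullSym tbl y = isNonterm y andalso lookup tbl y
    fun step tbl =
      map (fn (x, b) =>
             (x, b orelse
                 List.exists (fn (l, rhs) => l = x andalso List.all (nullSym tbl) rhs)
                             grammar))
          tbl
  in fix step (map (fn x => (x, false)) nonterms) end

fun nullable y = isNonterm y andalso lookup nullTbl y

(* FIRST of a sequence (rule 3): take FIRST(Yi) while all earlier Yj are nullable *)
fun firstSeq tbl [] = []
  | firstSeq tbl (y :: ys) =
      let val fy = if isNonterm y then lookup tbl y else [y]   (* rule 1 for terminals *)
      in if nullable y then setUnion (fy, firstSeq tbl ys) else fy end

val firstTbl =
  let
    fun step tbl =
      map (fn (x, fs) =>
             (x, foldl (fn ((l, rhs), acc) =>
                          if l = x then setUnion (acc, firstSeq tbl rhs) else acc)
                       fs grammar))
          tbl
  in fix step (map (fn x => (x, [])) nonterms) end

fun first x = lookup firstTbl x
```

Each pass only ever adds terminals to a set, and there are finitely many terminals, so the iteration must stop. Lists stand in for sets here to keep the sketch short; a real implementation would likely use a proper set structure.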

Example First Computation

  • Terminals
      – First($) = {$}, First(*) = {*}, First(+) = {+}, ...
  • Empty productions
      – Add ε to First(E'), add ε to First(T')
  • Other non-terminals
      – Computing from the lowest layer (F) up
          » First(F) = { id, ( }
          » First(T') = { ε, * }
          » First(T) = First(F) = { id, ( }
          » First(E') = { ε, + }
          » First(E) = First(T) = { id, ( }

    E  ::= T E' $
    E' ::= + T E'
         | ε
    T  ::= F T'
    T' ::= * F T'
         | ε
    F  ::= ( E )
         | id

Computing Follow

  • Use the following rules until nothing can be added to any FOLLOW set. Here a and b stand for sequences of symbols.
      1) Place $ (the end-of-input marker) in FOLLOW(S), where S is the start symbol.
      2) If A ::= a B b then everything in FIRST(b) except ε is in FOLLOW(B).
      3) If there is a production A ::= a B, or A ::= a B b where FIRST(b) contains ε (i.e. b can derive the empty string), then everything in FOLLOW(A) is in FOLLOW(B).
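The three rules can be sketched in SML in the same iterate-until-stable style. Two assumptions are made and should be read as such. First, the FIRST sets (without ε) and `nullable` are hard-coded from the lecture's example instead of being computed. Second, the end marker $ is supplied by rule 1 instead of being written inside E's production, which is the reading under which the result agrees with the Follow sets listed in the lecture's example. All helper names are hypothetical.

```sml
(* Assumption: "$" is not written inside E's production; rule 1 supplies it. *)
val grammar =
  [ ("E",  ["T", "E'"]),
    ("E'", ["+", "T", "E'"]), ("E'", []),
    ("T",  ["F", "T'"]),
    ("T'", ["*", "F", "T'"]), ("T'", []),
    ("F",  ["(", "E", ")"]),  ("F",  ["id"]) ]

val nonterms = ["E", "E'", "T", "T'", "F"]

(* FIRST (without epsilon) and nullable, copied from the lecture's example *)
fun first "E"  = ["(", "id"]
  | first "T"  = ["(", "id"]
  | first "F"  = ["(", "id"]
  | first "E'" = ["+"]
  | first "T'" = ["*"]
  | first t    = [t]                 (* FIRST of a terminal is itself *)
fun nullable x = (x = "E'" orelse x = "T'")

fun member x xs = List.exists (fn y => y = x) xs
fun setUnion (xs, ys) = xs @ List.filter (fn y => not (member y xs)) ys
fun lookup tbl x =
  case List.find (fn (k, _) => k = x) tbl of
    SOME (_, v) => v
  | NONE => []

(* FIRST of a sequence of symbols, without epsilon *)
fun firstSeq [] = []
  | firstSeq (y :: ys) =
      if nullable y then setUnion (first y, firstSeq ys) else first y

(* What the production a ::= rhs contributes to FOLLOW(b) under the current table *)
fun contrib tbl b (a, rhs) =
  let
    fun go [] = []
      | go (x :: beta) =
          if x = b
          then setUnion (firstSeq beta,                              (* rule 2 *)
                         setUnion (if List.all nullable beta
                                   then lookup tbl a else [],        (* rule 3 *)
                                   go beta))
          else go beta
  in go rhs end

fun followTable start =
  let
    fun step tbl =
      map (fn (b, fs) =>
             (b, foldl (fn (p, acc) => setUnion (acc, contrib tbl b p)) fs grammar))
          tbl
    fun fix tbl =
      let val tbl' = step tbl
      in if tbl' = tbl then tbl else fix tbl' end
  in
    fix (map (fn n => (n, if n = start then ["$"] else [])) nonterms)   (* rule 1 *)
  end

(* Lists are used as sets, so compare them without regard to order *)
fun sameSet (xs, ys) =
  List.all (fn x => member x ys) xs andalso List.all (fn y => member y xs) ys

val followTbl = followTable "E"
```

Rule 3 is the reason iteration is needed: FOLLOW(T') depends on FOLLOW(T), which depends on FOLLOW(E'), which depends on FOLLOW(E), so an addition to one set can propagate through several others on later passes.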

Ex. Follow Computation

  • Rule 1, start symbol
      – Add $ to Follow(E)
  • Rule 2, productions with embedded non-terminals
      – Add First( ) ) = { ) } to Follow(E)
      – Add First($) = { $ } to Follow(E')
      – Add First(E') without ε, i.e. { + }, to Follow(T)
      – Add First(T') without ε, i.e. { * }, to Follow(F)
  • Rule 3, non-terminal in last position
      – Add Follow(E') to Follow(E') (doesn't do much)
      – Add Follow(T) to Follow(T')
      – Add Follow(T) to Follow(F) since T' --> ε
      – Add Follow(T') to Follow(F) since T' --> ε

    E  ::= T E' $
    E' ::= + T E'
         | ε
    T  ::= F T'
    T' ::= * F T'
         | ε
    F  ::= ( E )
         | id

Table from First and Follow

  1. For each production A -> alpha do steps 2 and 3.
  2. For each a in First alpha, add A -> alpha to M[A, a].
  3. If ε is in First alpha, add A -> alpha to M[A, b] for each terminal b in Follow A. If ε is in First alpha and $ is in Follow A, add A -> alpha to M[A, $].

    First E  = {"(", "id"}        Follow E = Follow E' = {")", "$"}
    First F  = {"(", "id"}        Follow F             = {"+", "*", ")", "$"}
    First T  = {"(", "id"}        Follow T = Follow T' = {"+", ")", "$"}
    First E' = {"+", ε}
    First T' = {"*", ε}

    1   E  ::= T E' $
    2   E' ::= + T E'
    3        | ε
    4   T  ::= F T'
    5   T' ::= * F T'
    6        | ε
    7   F  ::= ( E )
    8        | id

    M[A, t]                  terminals
                    +     *     )     (     id    $
    nonterms  E                       1     1
              E'    2           3                 3
              T                       4     4
              T'    6     5     6                 6
              F                       7     8

Predictive Parsing Table

              id        +          *          (          )     $
    E         T E'                            T E'
    E'                  + T E'                           ε     ε
    T         F T'                            F T'
    T'                  ε          * F T'                ε     ε
    F         id                              ( E )

Table Driven Algorithm

    push start symbol
    repeat
      begin
        let X = top of stack, A = next input
        if terminal(X)
          then if X = A
                 then pop X; remove A
                 else error()
          else (* nonterminal(X) *)
            begin
              if M[X, A] = Y1 Y2 ... Yk
                then pop X; push Yk Yk-1 ... Y1
                else error()
            end
      end
    until stack is empty, input = $
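The loop above can be sketched in SML, under stated assumptions. The table is written as a function `table` that returns the right-hand side for M[X, a] or `NONE` for an error entry. The entry for E is taken to be T E' without the end marker. The stack is a list with its top at the head. The names `table`, `isNonterm`, and `parse` are illustrative and this `parse` only reports success or failure.

```sml
(* M[X, a] for the lecture's expression grammar; [] is the epsilon right-hand side,
   NONE is an error entry. *)
fun table ("E", a) =
      if a = "id" orelse a = "(" then SOME ["T", "E'"] else NONE
  | table ("E'", "+") = SOME ["+", "T", "E'"]
  | table ("E'", a) =
      if a = ")" orelse a = "$" then SOME [] else NONE
  | table ("T", a) =
      if a = "id" orelse a = "(" then SOME ["F", "T'"] else NONE
  | table ("T'", "*") = SOME ["*", "F", "T'"]
  | table ("T'", a) =
      if a = "+" orelse a = ")" orelse a = "$" then SOME [] else NONE
  | table ("F", "id") = SOME ["id"]
  | table ("F", "(") = SOME ["(", "E", ")"]
  | table _ = NONE

fun isNonterm x = List.exists (fn n => n = x) ["E", "E'", "T", "T'", "F"]

(* parse : string list -> bool
   The input is a list of terminal names ending in "$". *)
fun parse input =
  let
    fun loop ([], ["$"]) = true               (* stack empty and only $ left: accept *)
      | loop (x :: stack, a :: rest) =
          if isNonterm x
          then (case table (x, a) of
                  SOME rhs => loop (rhs @ stack, a :: rest)  (* pop X; Y1 ends up on top *)
                | NONE => false)                             (* error entry *)
          else x = a andalso loop (stack, rest)              (* match a terminal *)
      | loop _ = false
  in
    loop (["E"], input)
  end

val ok  = parse ["id", "+", "id", "*", "id", "$"]
val bad = parse ["id", "+", "$"]
```

Prepending the right-hand side to a head-first list has the same effect as pushing Yk first and Y1 last, so Y1 is the next symbol examined. Returning a bool keeps the sketch short; building a tree would require attaching an action to each table entry.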

Example Parse

    Stack (top at right)      Input

    E                         x + y $
    E' T' F                   x + y $
    E' T' id                  x + y $
    E' T'                     + y $
    E'                        + y $
    E' T +                    + y $
    E' T' F                   y $
    E' T' id                  y $
    E' T'                     $
    E'                        $
                              $

CS 321 Prog Lang & Compilers          Assignment #7
Assigned: Feb 5, 2007                 Due: Wed. Feb 14, 2007

Cut and paste the following into your solution file.

=================================
datatype RE
  = Empty
  | Union of RE * RE
  | Concat of RE * RE
  | Star of RE
  | C of char;
================================

The purpose of today's homework is to write functions analogous to "first" and "follow" for context-free grammars from today's lecture. There are two differences. The functions you will write for homework are for regular expressions, not context-free grammars, and since REs don't have non-terminal and terminal symbols, the functions are for a complete RE rather than a symbol.

Write 3 ML functions:

  1) Write (nullable: RE -> bool). Returns a bool: true if the empty string is a member of the set of strings recognized by the RE, false otherwise.
  2) Write (first: RE -> char list). Returns a char list which contains those characters which may appear as the first character in the strings recognized by that RE.
  3) Write (last: RE -> char list). Returns a char list which contains those characters which may appear as the last character in the strings recognized by that RE.

All these functions are simple functions defined with pattern matching, with one clause for each constructor of RE.