COS 320 Compilers, David Walker

last time
• context free grammars (Appel 3.1)
  – terminals, non-terminals, rules
  – derivations & parse trees
  – ambiguous grammars
• recursive descent parsers (Appel 3.2)
  – parse LL(k) grammars
  – easy to write as ML programs
  – algorithms for automatic construction from a CFG

non-terminals: S, E, L
terminals: NUM, IF, THEN, ELSE, BEGIN, END, PRINT, ;, =
rules:
  1. S ::= IF E THEN S ELSE S        4. L ::= END
  2.     | BEGIN S L                 5.     | ; S L
  3.     | PRINT E                   6. E ::= NUM = NUM

  (* getToken and error are assumed to be supplied by the lexer / driver *)
  datatype token = NUM | IF | THEN | ELSE | BEGIN | END | PRINT | SEMI | EQ

  val tok = ref (getToken ())
  fun advance () = tok := getToken ()
  fun eat t = if (!tok = t) then advance () else error ()

  (* one function per non-terminal, one case arm per rule *)
  fun S () = case !tok of
      IF    => (eat IF; E (); eat THEN; S (); eat ELSE; S ())
    | BEGIN => (eat BEGIN; S (); L ())
    | PRINT => (eat PRINT; E ())
    | _     => error ()
  and L () = case !tok of
      END  => eat END
    | SEMI => (eat SEMI; S (); L ())
    | _    => error ()
  and E () = (eat NUM; eat EQ; eat NUM)

Constructing RD Parsers
• To construct an RD parser, we need to know which rule to apply when
  – we have seen a non-terminal X
  – we see the next terminal a in the input
• We apply rule X ::= s when
  – a is the first symbol that can be generated by string s, OR
  – s can derive the empty string (is nullable) and a is the first symbol in any string that can follow X

Computing Nullable Sets
• Non-terminal X is Nullable only if the following constraints are satisfied (computed using iterative analysis)
  – base case:
    • if (X ::= ) then X is Nullable
  – inductive case:
    • if (X ::= A B C ...) and A, B, C, ... are all Nullable then X is Nullable
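A minimal ML sketch of the iteration described above, assuming a grammar is given as a list of (left-hand side, right-hand side) pairs. The names symbol, rule, grammar, step and fixpoint are illustrative and not taken from the course code; the grammar is the small Z/Y/X grammar used in this deck's worked example.

```sml
(* A grammar symbol is a terminal or a non-terminal; names are plain strings. *)
datatype symbol = T of string | N of string

(* A rule is (left-hand side, right-hand side); [] is the empty string. *)
type rule = string * symbol list

(* Illustrative grammar: Z ::= X Y Z | d,  Y ::= c | (empty),  X ::= a | b Y e *)
val grammar : rule list =
  [ ("Z", [N "X", N "Y", N "Z"]), ("Z", [T "d"]),
    ("Y", [T "c"]),               ("Y", []),
    ("X", [T "a"]),               ("X", [T "b", N "Y", T "e"]) ]

fun member (x : string) xs = List.exists (fn y => y = x) xs

(* A right-hand side is nullable when every symbol in it is a non-terminal
   already known to be nullable; the empty right-hand side is the base case. *)
fun rhsNullable known rhs =
  List.all (fn N x => member x known | T _ => false) rhs

(* One pass over the rules, adding any newly discovered nullable non-terminal. *)
fun step known =
  List.foldl
    (fn ((lhs, rhs), acc) =>
        if rhsNullable acc rhs andalso not (member lhs acc)
        then lhs :: acc else acc)
    known grammar

(* Repeat until a pass adds nothing: the fixed point. *)
fun fixpoint known =
  let val next = step known
  in if length next = length known then known else fixpoint next end

val nullable = fixpoint []
```

For this grammar only Y ends up in nullable, because Y ::= (empty) is the only rule whose right-hand side contains no terminal and no non-nullable non-terminal.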

Computing First Sets
• First(X) is computed iteratively
  – base case:
    • if T is a terminal symbol then First(T) = {T}
  – inductive case:
    • if X is a non-terminal and (X ::= A B C ...) then
      – First(X) = First(X) U First(A B C ...)
        where First(A B C ...) = F1 U F2 U F3 U ... and
        » F1 = First(A)
        » F2 = First(B), if A is Nullable
        » F3 = First(C), if A is Nullable & B is Nullable
        » ...
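The First computation above can be sketched in ML as follows. This is a sketch under assumptions: the set of nullable non-terminals is taken as already known (here hard-coded as ["Y"] for the deck's example grammar), sets are plain string lists, and the names union, lookup, firstOfString, step and fixpoint are made up for illustration.

```sml
datatype symbol = T of string | N of string
type rule = string * symbol list

(* Illustrative grammar: Z ::= X Y Z | d,  Y ::= c | (empty),  X ::= a | b Y e *)
val grammar : rule list =
  [ ("Z", [N "X", N "Y", N "Z"]), ("Z", [T "d"]),
    ("Y", [T "c"]),               ("Y", []),
    ("X", [T "a"]),               ("X", [T "b", N "Y", T "e"]) ]

(* Assumed input: the nullable non-terminals, computed beforehand. *)
val nullable = ["Y"]

fun member (x : string) xs = List.exists (fn y => y = x) xs

(* union (xs, ys): ys extended with the elements of xs it does not yet contain. *)
fun union (xs, ys) =
  List.foldl (fn (x, acc) => if member x acc then acc else acc @ [x]) ys xs

(* Table lookup; a non-terminal with no entry yet has the empty set. *)
fun lookup (x : string) tbl =
  case List.find (fn (k, _) => k = x) tbl of
      SOME (_, v) => v
    | NONE => []

(* First of a symbol string: F1 U F2 U ..., where each later Fi is included
   only while every symbol before it is Nullable. *)
fun firstOfString tbl [] = []
  | firstOfString tbl (T t :: _) = [t]
  | firstOfString tbl (N x :: rest) =
      if member x nullable
      then union (firstOfString tbl rest, lookup x tbl)
      else lookup x tbl

(* One round: First(X) = First(X) U First(rhs) for every rule X ::= rhs. *)
fun step tbl =
  List.foldl
    (fn ((lhs, rhs), acc) =>
        let val updated = union (firstOfString acc rhs, lookup lhs acc)
        in (lhs, updated) :: List.filter (fn (k, _) => k <> lhs) acc end)
    tbl grammar

(* Sets only grow, so the table is stable once its total size stops changing. *)
fun total tbl = List.foldl (fn ((_, s), n) => n + length s) 0 tbl

fun fixpoint tbl =
  let val next = step tbl
  in if total next = total tbl then next else fixpoint next end

val first = fixpoint []
```

Because a round here updates the table rule by rule, the number of rounds it takes need not match a hand computation that updates all rows at once; the fixed point itself is the same.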

Computing Follow Sets
• Follow(X) is computed iteratively
  – base case:
    • initially, we assume nothing in particular follows X
      – (Follow(X) is initially { })
  – inductive case:
    • if (Y ::= s1 X s2) for any strings s1, s2 then
      – Follow(X) = First(s2) U Follow(X)
    • if (Y ::= s1 X s2) for any strings s1, s2 then
      – Follow(X) = Follow(Y) U Follow(X), if s2 is Nullable
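A sketch of the Follow iteration in ML, under the assumption that Nullable and First are already available. Both are hard-coded here with the values from this deck's example grammar; the helper names visit, step and fixpoint are illustrative only.

```sml
datatype symbol = T of string | N of string
type rule = string * symbol list

(* Illustrative grammar: Z ::= X Y Z | d,  Y ::= c | (empty),  X ::= a | b Y e *)
val grammar : rule list =
  [ ("Z", [N "X", N "Y", N "Z"]), ("Z", [T "d"]),
    ("Y", [T "c"]),               ("Y", []),
    ("X", [T "a"]),               ("X", [T "b", N "Y", T "e"]) ]

(* Assumed inputs, computed beforehand for this grammar. *)
val nullable = ["Y"]
val first = [("Z", ["d", "a", "b"]), ("Y", ["c"]), ("X", ["a", "b"])]

fun member (x : string) xs = List.exists (fn y => y = x) xs
fun union (xs, ys) =
  List.foldl (fn (x, acc) => if member x acc then acc else acc @ [x]) ys xs
fun lookup (x : string) tbl =
  case List.find (fn (k, _) => k = x) tbl of
      SOME (_, v) => v
    | NONE => []

fun stringNullable syms =
  List.all (fn N x => member x nullable | T _ => false) syms

fun firstOfString [] = []
  | firstOfString (T t :: _) = [t]
  | firstOfString (N x :: rest) =
      if member x nullable
      then union (firstOfString rest, lookup x first)
      else lookup x first

(* Walk the right-hand side of lhs ::= s1 X s2.  At each non-terminal X:
   add First(s2), and also Follow(lhs) when s2 is Nullable. *)
fun visit lhs [] tbl = tbl
  | visit lhs (T _ :: rest) tbl = visit lhs rest tbl
  | visit lhs (N x :: rest) tbl =
      let
        val fromFirst = firstOfString rest
        val fromLhs = if stringNullable rest then lookup lhs tbl else []
        val updated = union (fromLhs, union (fromFirst, lookup x tbl))
        val tbl' = (x, updated) :: List.filter (fn (k, _) => k <> x) tbl
      in visit lhs rest tbl' end

fun step tbl = List.foldl (fn ((lhs, rhs), acc) => visit lhs rhs acc) tbl grammar

(* Sets only grow, so stop when the total size no longer changes. *)
fun total tbl = List.foldl (fn ((_, s), n) => n + length s) 0 tbl

fun fixpoint tbl =
  let val next = step tbl
  in if total next = total tbl then next else fixpoint next end

(* Base case: every Follow set starts out empty. *)
val follow = fixpoint []
```

Note that e never reaches Follow(X) in this grammar: X occurs only in Z ::= X Y Z, and the string Y Z after it is not Nullable.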

building a predictive parser

Grammar:
  Z ::= X Y Z      Y ::= c      X ::= a
  Z ::= d          Y ::=        X ::= b Y e

Table to fill in, one row per non-terminal (Z, Y, X): nullable, first, follow

nullable, base case:
       nullable
  Z    no
  Y    yes
  X    no

after one round of induction, we realize we have reached a fixed point

building a predictive parser (same grammar)

first, base case:
       nullable   first
  Z    no         d
  Y    yes        c
  X    no         a, b

after one round of induction, no fixed point:
       nullable   first
  Z    no         d, a, b
  Y    yes        c
  X    no         a, b

after two rounds of induction, no more changes ==> fixed point

building a predictive parser (same grammar)

follow, base case:
       nullable   first      follow
  Z    no         d, a, b    {}
  Y    yes        c          {}
  X    no         a, b       {}

after one round of induction, no fixed point:
       nullable   first      follow
  Z    no         d, a, b    {}
  Y    yes        c          e, d, a, b
  X    no         a, b       c, d, a, b

after two rounds of induction, fixed point (but notice, computing Follow(X) before Follow(Y) would have required 3rd round)

Grammar:
  Z ::= X Y Z      Y ::= c      X ::= a
  Z ::= d          Y ::=        X ::= b Y e

Computed Sets:
       nullable   first      follow
  Z    no         d, a, b    {}
  Y    yes        c          e, d, a, b
  X    no         a, b       c, d, a, b

Build parsing table where row X, col T tells parser which clause to execute in function X with next-token T:
• if T ∈ First(s) then enter (X ::= s) in row X, col T
• if s is Nullable and T ∈ Follow(X) enter (X ::= s) in row X, col T

Entries, in the order they are added:
  1. Z ::= X Y Z   in row Z, cols a, b       (First(X Y Z) = {a, b})
  2. Z ::= d       in row Z, col d
  3. Y ::= c       in row Y, col c
  4. Y ::=         in row Y, cols a, b, d, e (empty string is Nullable; Follow(Y) = {e, d, a, b})
  5. X ::= a       in row X, col a
     X ::= b Y e   in row X, col b

         a             b             c          d          e
  Z      Z ::= X Y Z   Z ::= X Y Z              Z ::= d
  Y      Y ::=         Y ::=         Y ::= c    Y ::=      Y ::=
  X      X ::= a       X ::= b Y e

What are the blanks? --> syntax errors

Is it possible to put 2 grammar rules in the same box?

Grammar, with the rule Z ::= d e added:
  Z ::= X Y Z      Y ::= c      X ::= a
  Z ::= d          Y ::=        X ::= b Y e
  Z ::= d e

Computed Sets:
       nullable   first      follow
  Z    no         d, a, b    {}
  Y    yes        c          e, d, a, b
  X    no         a, b       c, d, a, b

         a             b             c          d                    e
  Z      Z ::= X Y Z   Z ::= X Y Z              Z ::= d, Z ::= d e
  Y      Y ::=         Y ::=         Y ::= c    Y ::=                Y ::=
  X      X ::= a       X ::= b Y e

Both Z ::= d and Z ::= d e start with d, so both land in row Z, col d.

predictive parsing tables
• if a predictive parsing table constructed this way contains no duplicate entries, the grammar is called LL(1)
  – Left-to-right parse, Left-most derivation, 1 symbol lookahead
• if not, the grammar is not LL(1)
• in an LL(k) parsing table, columns include every k-length sequence of terminals:
    aa  ab  ba  bb  ac  ca  ...
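The duplicate-entry test can be sketched in ML. Assumptions: Nullable, First and Follow are hard-coded with the values computed for the example grammar, a table cell is a (non-terminal, terminal) pair, and the function names columns, entries and conflicts are invented for this sketch. The extra rule Z ::= d e used in the usage note is a hypothetical second rule starting with d, added only to provoke a conflict.

```sml
datatype symbol = T of string | N of string
type rule = string * symbol list

(* Illustrative grammar: Z ::= X Y Z | d,  Y ::= c | (empty),  X ::= a | b Y e *)
val grammar : rule list =
  [ ("Z", [N "X", N "Y", N "Z"]), ("Z", [T "d"]),
    ("Y", [T "c"]),               ("Y", []),
    ("X", [T "a"]),               ("X", [T "b", N "Y", T "e"]) ]

(* Assumed inputs: the sets computed for this grammar. *)
val nullable = ["Y"]
val first  = [("Z", ["d", "a", "b"]), ("Y", ["c"]), ("X", ["a", "b"])]
val follow = [("Z", []), ("Y", ["e", "d", "a", "b"]), ("X", ["c", "d", "a", "b"])]

fun member (x : string) xs = List.exists (fn y => y = x) xs
fun lookup (x : string) tbl =
  case List.find (fn (k, _) => k = x) tbl of
      SOME (_, v) => v
    | NONE => []

(* Remove repeated elements, keeping the first occurrence of each. *)
fun dedupe [] = []
  | dedupe (x :: xs) = x :: dedupe (List.filter (fn y => y <> x) xs)

fun stringNullable syms =
  List.all (fn N x => member x nullable | T _ => false) syms

fun firstOfString [] = []
  | firstOfString (T t :: _) = [t]
  | firstOfString (N x :: rest) =
      if member x nullable
      then lookup x first @ firstOfString rest
      else lookup x first

(* Columns T in which rule X ::= s is entered:
   T in First(s), or s Nullable and T in Follow(X). *)
fun columns (lhs, rhs) =
  dedupe (firstOfString rhs @
          (if stringNullable rhs then lookup lhs follow else []))

(* All table entries as ((row, column), rule). *)
fun entries rules =
  List.concat
    (map (fn (r as (lhs, _)) => map (fn t => ((lhs, t), r)) (columns r)) rules)

(* Cells holding two or more rules; an empty result means the table has no
   duplicate entries. *)
fun conflicts rules =
  let
    val es = entries rules
    val cells = dedupe (map (fn (cell, _) => cell) es)
    fun rulesIn cell = List.filter (fn (c, _) => c = cell) es
  in List.filter (fn cell => length (rulesIn cell) > 1) cells end
```

As a usage example, conflicts grammar evaluates to the empty list, while conflicts (("Z", [T "d", T "e"]) :: grammar) reports the cell ("Z", "d"), since two rules for Z then compete for next-token d.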

another trick
• Previously, we saw that grammars with left-recursion were problematic, but could be transformed into LL(1) in some cases
• the example non-LL(1) grammar we just saw:
    Z ::= X Y Z      Y ::= c      X ::= a
    Z ::= d          Y ::=        X ::= b Y e
    Z ::= d e
• how do we fix it?
• solution here is left-factoring:
    Z ::= X Y Z      Y ::= c      X ::= a
    Z ::= d W        Y ::=        X ::= b Y e
    W ::=
    W ::= e

summary of RD parsing
• CFGs are good at specifying programming language structure
• parsing general CFGs is expensive so we define parsers for simpler classes of CFG
  – LL(k), LR(k)
• we can build a recursive descent parser for LL(k) grammars by:
  – computing nullable, first and follow sets
  – constructing a parse table from the sets
  – checking for duplicate entries, which indicates failure
  – creating an ML program from the parse table
• if parser construction fails we can
  – rewrite the grammar (left factoring, eliminating left recursion) and try again
  – try to build a parser using some other method... such as using a bottom-up parsing technique

Bottom-up (Shift-Reduce) Parsing


shift-reduce parsing
• shift-reduce parsing
  – aka: bottom-up parsing
  – aka: LR(k): Left-to-right parse, Rightmost derivation, k-token lookahead
• more powerful than LL(k) parsers
• LALR variant:
  – the basis for parsers for most modern programming languages
  – implemented in tools such as ML-Yacc

shift-reduce parsing example

Grammar:
  A ::= S EOF
  L ::= L ; S
  L ::= S
  S ::= ( L )
  S ::= id = num

Input from lexer: ( id = num ; id = num ) EOF

  action taken              state of parse so far    yet to read
  (start)                                            ( id = num ; id = num ) EOF
  SHIFT                     (                        id = num ; id = num ) EOF
  SHIFT                     ( id                     = num ; id = num ) EOF
  SHIFT                     ( id =                   num ; id = num ) EOF
  SHIFT                     ( id = num               ; id = num ) EOF
  REDUCE S ::= id = num     ( S                      ; id = num ) EOF
  REDUCE L ::= S            ( L                      ; id = num ) EOF
  SHIFT                     ( L ;                    id = num ) EOF
  SHIFT                     ( L ; id                 = num ) EOF
  SHIFT                     ( L ; id =               num ) EOF
  SHIFT                     ( L ; id = num           ) EOF
  REDUCE S ::= id = num     ( L ; S                  ) EOF
  REDUCE L ::= L ; S        ( L                      ) EOF
  SHIFT                     ( L )                    EOF
  REDUCE S ::= ( L )        S                        EOF
  SHIFT                     S EOF
  REDUCE A ::= S EOF        A
  ACCEPT                    A

A successful parse! Is this grammar LL(1)?

Shift-reduce algorithm
• Parser keeps track of
  – position in current input (what input to read next)
  – a stack of terminal & non-terminal symbols representing the “parse so far”
• Based on next input symbol & stack, parser table indicates
  – shift: push next input on to top of stack
  – reduce R:
    • top of stack should match RHS of rule
    • replace top of stack with LHS of rule
  – error
  – accept (we shift EOF & can reduce what remains on stack to start symbol)
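The stack manipulation behind shift and reduce can be sketched in ML. This is not a full parser: there is no parser table, so the sequence of actions is supplied by hand (here, the actions from the example trace for ( id = num ; id = num ) EOF). The names action, apply and ParseError are assumptions made for this sketch.

```sml
(* Symbols on the stack and in the input are plain strings. *)
datatype action = Shift
                | Reduce of string * string list   (* rule: LHS, RHS *)

exception ParseError of string

(* startsWith (prefix, xs): does the list xs begin with prefix? *)
fun startsWith ([], _) = true
  | startsWith (_, []) = false
  | startsWith (x :: xs, y :: ys) = (x : string) = y andalso startsWith (xs, ys)

(* A parse state is (stack, remaining input); the top of the stack is the
   head of the list, so a rule's RHS appears on the stack in reverse. *)
fun apply (Shift, (stack, tok :: rest)) = (tok :: stack, rest)
  | apply (Shift, (_, [])) = raise ParseError "nothing left to shift"
  | apply (Reduce (lhs, rhs), (stack, input)) =
      if startsWith (rev rhs, stack)
      then (lhs :: List.drop (stack, length rhs), input)
      else raise ParseError ("top of stack does not match a rule for " ^ lhs)

(* Actions of the worked example, in order. *)
val stmt = Reduce ("S", ["id", "=", "num"])
val trace =
  [ Shift, Shift, Shift, Shift, stmt,
    Reduce ("L", ["S"]),
    Shift, Shift, Shift, Shift, stmt,
    Reduce ("L", ["L", ";", "S"]),
    Shift, Reduce ("S", ["(", "L", ")"]),
    Shift, Reduce ("A", ["S", "EOF"]) ]

val input = ["(", "id", "=", "num", ";", "id", "=", "num", ")", "EOF"]

(* Replay the trace from the empty stack. *)
val (finalStack, unread) = List.foldl apply ([], input) trace
```

After the replay the stack holds only the start symbol A and no input is left, which is the accept condition. In a real LR parser the table, not a hand-written list, picks each action.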

Shift-reduce algorithm (a detail)
• The parser summarizes the current “parse state” using an integer
  – the integer is actually a state in a finite automaton
  – the current parse state can be computed by running the automaton over the current parse stack
• Revised algorithm: Based on next input symbol & the parse state (as opposed to the entire stack), parser table indicates
  – shift s:
    • push next input on to top of stack and move automaton into state s
  – reduce R & goto s:
    • top of stack should match RHS of rule
    • replace top of stack with LHS of rule
    • move automaton into state s
  – error
  – accept

shift-reduce parsing
Grammar: ??
Input from lexer: ?? ?? EOF
State of parse so far: ??

Like LL parsing, shift-reduce parsing does not always work. What sort of grammar rules make shift-reduce parsing impossible?
• Shift-Reduce errors: can’t decide whether to Shift or Reduce
• Reduce-Reduce errors: can’t decide whether to Reduce by R1 or R2

shift-reduce errors

Grammar:
  A ::= S EOF
  S ::= S + S
  S ::= S * S
  S ::= id

Input from lexer: id + id * id EOF
State of parse so far: S + S    (yet to read: * id EOF)
• reduce by rule (S ::= S + S) or
• shift the * ???

notice, this is an ambiguous grammar – we are always going to need some mechanism for resolving the outstanding ambiguity before parsing

shift-reduce errors

some unambiguous grammars can’t be parsed by LR(1) parsers either

Grammar:
  A ::= S id EOF
  S ::= E ;
  E ::= E ; E
  E ::= id

Input from lexer: id ; id EOF
State of parse so far: E ;
• reduce by rule (S ::= E ;) or
• shift the id (the input might be one that makes shifting correct)

reduce-reduce errors

Grammar:
  A ::= S EOF
  S ::= ( E )
  S ::= E
  E ::= ( E )
  E ::= E + E
  E ::= id

Input from lexer: ( id ) EOF
State of parse so far: ( E )
• reduce by rule (S ::= ( E )) or
• reduce by rule (E ::= ( E ))

Summary
• Top-down Parsing
  – simple to understand and implement
  – you can code it yourself using nullable, first, follow sets
  – excellent for quick & dirty parsing jobs
• Bottom-up Parsing
  – more complex: uses stack & table
  – more powerful
  – Bonus: tools do the work for you ==> ML-Yacc
    • but you need to understand how shift-reduce & reduce-reduce errors can arise