Fall 2017-2018 Compiler Principles
Lecture 2: LL parsing
Roman Manevich, Ben-Gurion University of the Negev

Books
• Compilers: Principles, Techniques, and Tools. Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman
• Modern Compiler Implementation in Java. Andrew W. Appel
• Modern Compiler Design. D. Grune, H. Bal, C. Jacobs, K. Langendoen
• Advanced Compiler Design and Implementation. Steven Muchnick

Tentative syllabus
• Front End: Scanning, Top-down Parsing (LL), Bottom-up Parsing (LR)
• Intermediate Representation: Operational Semantics, Lowering
• Optimizations: Dataflow Analysis, Loop Optimizations
• Code Generation: Register Allocation, Instruction Selection
• mid-term exam

Context-free grammars
S → E $    (S is the start nonterminal)
E → T
E → E + T
T → id
T → ( E )
Each line is a production / rule.

Context-free languages
• Sentential forms
• Derivations (leftmost, rightmost)
  – Language = all derivable words
• Derivation tree (also called parse tree)
  – Language = all yields of derivation trees
• Ambiguous grammars

Agenda
• Understand role of syntax analysis
• Parsing strategies
• LL parsing
  – Building a predictor table via FIRST/FOLLOW/NULLABLE sets
• Handling conflicts

Role of syntax analysis
Compiler pipeline (scheme): High-level Language → Lexical Analysis → Syntax Analysis (Parsing) → AST, Symbol Table etc. → Inter. Rep. (IR) → Code Generation → Executable Code
• Recover structure from stream of tokens
  – Parse tree / abstract syntax tree
• Error reporting (recovery)
• Other possible tasks
  – Syntax directed translation (one pass compilers)
  – Create symbol table
  – Create pretty-printed version of the program, e.g., Auto Formatting function in IDE

From tokens to abstract syntax trees
• Program text: 59 + (1257 * xPosition)
• Lexical analyzer (regular expressions, finite automata): produces a token stream or reports a lexical error
  – Token stream: num + ( num * id )
• Parser (context-free grammars, push-down automata): accepts valid input or reports a syntax error
  – Grammar: E → id | num | E + E | E * E | ( E )
• Abstract syntax tree: + ( num, * ( num, x ) )

Marking "end-of-file"
• Sometimes it will be useful to transform a grammar G with start non-terminal S into a grammar G' with a new start non-terminal S' and a new production rule S' → S $
  – $ is not part of the set of tokens
  – It is a special End-Of-File (EOF) token
• To parse α with G' we change it into α $
• Simplifies parsing grammars with null productions
  – Also simplifies parsing LR grammars
• Blank space character: ˽

Another convention
• We will assume that all productions have been consecutively numbered
(1) S → E $
(2) E → T
(3) E → E + T
(4) T → id
(5) T → ( E )

Parsing strategies

Broad kinds of parsers
• Parsers for arbitrary grammars
  – Cocke-Younger-Kasami ['65] method: O(n^3)
  – Earley's method (implemented by NLTK): O(n^3), but lower for restricted classes
  – Not commonly used by compilers
• Parsers for restricted classes of grammars
  – Top-down
    • Predictive: LL parsing
    • Backtracking: recursive descent / combinators
  – Bottom-up: LR parsing

Top-down parsing
• Constructs the parse tree in a top-down manner
• Finds a leftmost derivation
• Predictive: for every nonterminal and k lookahead tokens, predict the next production (LL(k))
• Challenge: beginning with the start symbol, try to guess the productions to apply to end up at the user's program

Predictive parsing

Exercise: show leftmost derivation
(1) E → LIT
(2) E → ( E OP E )
(3) E → not E
(4) LIT → true
(5) LIT → false
(6) OP → and
(7) OP → or
(8) OP → xor
Derivation of not ( not true or false ):
E ⇒ not E ⇒ not ( E OP E ) ⇒* not ( not LIT OP E ) ⇒* not ( not true or LIT ) ⇒ not ( not true or false )
How did we decide which production of 'E' to take?
[Figure: parse tree for not ( not true or false )]

Predictive parsing
• Given a grammar G, attempt to derive a word ω
• Idea
  – Scan input from left to right
  – Apply a production to the leftmost nonterminal
  – Pick the production rule based on the next (1) input token
• Problem: more than one production may match the next token
• Solution: restrict grammars to LL(1)
  – Parser correctly predicts which production to apply
  – If the grammar is not in LL(1), the parser construction algorithm will detect it

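LL(1) parsing via pushdown automata
[Figure: a parsing program reads the input stream (a + b $), maintains a stack of symbols holding the current sentential form (X Y Z $), and consults a prediction table that maps a (nonterminal, token) pair to a production; it outputs a derivation tree or an error]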

LL(1) parsing algorithm
• Initialize the stack to S $
• while true
  – Prediction: when the top of the stack is a nonterminal N
    1. Pop N
    2. Look up Table[N, t], where t is the next input token
    3. If Table[N, t] is not empty, push the right-hand side of the production in Table[N, t] onto the stack, leftmost symbol on top; else return syntax error
  – Match: when the top of the stack is a terminal t
    • If t = next input token, pop t and advance the input index; else return syntax error
  – End: when the stack is empty
    • If the input is empty return success; else return syntax error
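The loop above can be sketched in Python as follows. This is a minimal illustration, not the course's implementation: the names `ll1_parse` and `TABLE` are made up here, the table is written by hand for the grammar S → a S b | c used in the running example, and tokens are single characters.

```python
# Hand-written prediction table for S -> a S b | c (an assumption for this sketch).
# Keys are (nonterminal, lookahead); values are the right-hand side to push.
TABLE = {
    ("S", "a"): ["a", "S", "b"],
    ("S", "c"): ["c"],
}


def ll1_parse(tokens, table, start, nonterminals):
    """Return True if the token sequence (without the end marker) is accepted."""
    tokens = list(tokens) + ["$"]      # append the end-of-file marker
    stack = ["$", start]               # top of the stack is the end of the list
    pos = 0
    while stack:
        top = stack.pop()
        lookahead = tokens[pos]
        if top in nonterminals:
            # Prediction: replace the nonterminal by the predicted right-hand side.
            rhs = table.get((top, lookahead))
            if rhs is None:
                return False           # empty table entry: syntax error
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
        else:
            # Match: the terminal on the stack must equal the next token.
            if top != lookahead:
                return False
            pos += 1
    # End: the stack is empty, so succeed only if all input was consumed.
    return pos == len(tokens)


print(ll1_parse("aacbb", TABLE, "S", {"S"}))
print(ll1_parse("abcbb", TABLE, "S", {"S"}))
```

Pushing the right-hand side in reverse keeps the leftmost symbol on top, which is what makes the parser follow a leftmost derivation.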

Example prediction table
(1) E → LIT
(2) E → ( E OP E )
(3) E → not E
(4) LIT → true
(5) LIT → false
(6) OP → and
(7) OP → or
(8) OP → xor
Table entries determine which production to take (rows: nonterminals, columns: input tokens)
Nonterminal | (  | )  | not | true | false | and | or | xor | $
E           | 2  |    | 3   | 1    | 1     |     |    |     |
LIT         |    |    |     | 4    | 5     |     |    |     |
OP          |    |    |     |      |       | 6   | 7  | 8   |

Running parser example
Input: aacbb$    Grammar: S → a S b | c
Input suffix | Stack content | Move
aacbb$       | S$            | predict(S, a) = S → a S b
aacbb$       | aSb$          | match(a, a)
acbb$        | Sb$           | predict(S, a) = S → a S b
acbb$        | aSbb$         | match(a, a)
cbb$         | Sbb$          | predict(S, c) = S → c
cbb$         | cbb$          | match(c, c)
bb$          | bb$           | match(b, b)
b$           | b$            | match(b, b)
$            | $             | match($, $) – success
Prediction table
  | a         | b | c
S | S → a S b |   | S → c

Illegal input example
Input: abcbb$    Grammar: S → a S b | c
Input suffix | Stack content | Move
abcbb$       | S$            | predict(S, a) = S → a S b
abcbb$       | aSb$          | match(a, a)
bcbb$        | Sb$           | predict(S, b) = ERROR
Prediction table
  | a         | b | c
S | S → a S b |   | S → c

Building the prediction table
• Let G be a grammar
• Compute FIRST/NULLABLE/FOLLOW
• Check for conflicts
  – No conflicts => G is an LL(1) grammar
  – Conflicts exist => G is not an LL(1) grammar
    • Attempt to transform G into an equivalent LL(1) grammar G'

FIRST sets

FIRST sets
• Definition: For a nonterminal A, FIRST(A) is the set of terminals that can appear first in a sentence derived from A
  – Formally: FIRST(A) = {t | A ⇒* t ω}
• Definition: For a sentential form α, FIRST(α) is the set of terminals that can appear first in a sentence derived from α
  – Formally: FIRST(α) = {t | α ⇒* t ω}

FIRST sets example
E → LIT | ( E OP E ) | not E
LIT → true | false
OP → and | or | xor
• FIRST(E) = FIRST(LIT) ∪ FIRST(( E OP E )) ∪ FIRST(not E)
• FIRST(LIT) = { true, false }
• FIRST(OP) = { and, or, xor }
• A set of recursive equations
• How do we solve them?

Computing FIRST sets
Assume no null productions (A → ε)
1. Initially, for all nonterminals A, set FIRST(A) = { t | A → t ω for some ω }
2. Repeat the following until no changes occur:
   for each nonterminal A
     for each production A → α1 | … | αk
       FIRST(A) := FIRST(α1) ∪ … ∪ FIRST(αk)
• This is known as a fixed-point algorithm
• We will see such iterative methods later in the course and learn to reason about them
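A minimal Python sketch of this fixed-point computation, applied to the Boolean expression grammar from the FIRST sets example. The dictionary encoding of the grammar and the name `first_sets` are assumptions made for illustration; the sketch relies on the slide's assumption that there are no null productions, so only the first symbol of each right-hand side matters.

```python
# Grammar encoding (an assumption of this sketch): nonterminal -> list of
# right-hand sides; any symbol that is not a key is treated as a terminal.
GRAMMAR = {
    "E": [["LIT"], ["(", "E", "OP", "E", ")"], ["not", "E"]],
    "LIT": [["true"], ["false"]],
    "OP": [["and"], ["or"], ["xor"]],
}


def first_sets(grammar):
    """Compute FIRST for every nonterminal of a grammar without null productions."""
    # Step 1: terminals that directly start some right-hand side.
    first = {
        a: {rhs[0] for rhs in prods if rhs[0] not in grammar}
        for a, prods in grammar.items()
    }
    # Step 2: iterate until no set changes (fixed point).
    changed = True
    while changed:
        changed = False
        for a, prods in grammar.items():
            for rhs in prods:
                head = rhs[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first


for nonterminal, terminals in first_sets(GRAMMAR).items():
    print(nonterminal, sorted(terminals))
```

The sets only ever grow, which is the property the termination argument for fixed-point algorithms relies on.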

Exercise: compute FIRST
STMT → if EXPR then STMT | while EXPR do STMT | EXPR ;
EXPR → TERM -> id | zero? TERM | not EXPR | ++ id | -- id
TERM → id | constant

Equations:
FIRST(STMT) = FIRST(if EXPR then STMT) ∪ FIRST(while EXPR do STMT) ∪ FIRST(EXPR)
FIRST(EXPR) = FIRST(TERM -> id) ∪ FIRST(zero? TERM) ∪ FIRST(not EXPR) ∪ FIRST(++ id) ∪ FIRST(-- id)
FIRST(TERM) = FIRST(id) ∪ FIRST(constant)

Simplified:
FIRST(STMT) = {if, while} ∪ FIRST(EXPR)
FIRST(EXPR) = {zero?, not, ++, --} ∪ FIRST(TERM)
FIRST(TERM) = {id, constant}

1. Initialization
STMT: if, while
EXPR: zero?, not, ++, --
TERM: id, constant

2. Iterate 1
STMT: if, while, zero?, not, ++, --
EXPR: zero?, not, ++, --, id, constant
TERM: id, constant

2. Iterate 2
STMT: if, while, zero?, not, ++, --, id, constant
EXPR: zero?, not, ++, --, id, constant
TERM: id, constant

2. Iterate 3 – fixed-point
No set changes, so the iteration stops with the sets of Iterate 2.

Reasoning about the algorithm
Assume no null productions (A → ε)
1. Initially, for all nonterminals A, set FIRST(A) = { t | A → t ω for some ω }
2. Iterate to fixpoint:
   for each nonterminal A
     for each production A → α1 | … | αk
       FIRST(A) := FIRST(α1) ∪ … ∪ FIRST(αk)
• Is the algorithm correct?
• Does it terminate? (complexity)

Reasoning about the algorithm
• Termination:
• Correctness:
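One possible way to fill in the termination part is sketched below. The notation is introduced here and is not from the slides: F_n(A) is the value of FIRST(A) after n iterations of step 2, N is the set of nonterminals, and T is the set of terminals. The argument assumes, as the slides do, that there are no null productions.

```latex
\begin{align*}
& F_n(A) \subseteq F_{n+1}(A) \subseteq T
  && \text{for every } A \in N \text{ and every } n \ge 0
     \quad \text{(the sets only grow)} \\
& \sum_{A \in N} \lvert F_n(A) \rvert \;\le\; \lvert N \rvert \cdot \lvert T \rvert
  && \text{for every } n \ge 0 \\
& \text{number of iterations} \;\le\; \lvert N \rvert \cdot \lvert T \rvert + 1
  && \text{(every iteration except the last adds at least one terminal)}
\end{align*}
```

Since each iteration inspects every production once, this gives a polynomial bound on the running time in the size of the grammar.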

LL(1) parsing of grammars without epsilon productions

Using FIRST sets
• Assume G has no epsilon productions and for every non-terminal X and every pair of productions X → α and X → β we have that FIRST(α) ∩ FIRST(β) = {}
• No intersection between FIRST sets => can always pick a single rule

Using FIRST sets
• In our Boolean expressions example
  – FIRST( LIT ) = { true, false }
  – FIRST( ( E OP E ) ) = { '(' }
  – FIRST( not E ) = { not }
• If the FIRST sets intersect, may need longer lookahead
  – LL(k) = class of grammars in which the production rule can be determined using a lookahead of k tokens
  – LL(1) is an important and useful class

Exercise: LL(1) prediction table
Terminals: id . num $
(1) S → E $
(2) E → A B
(3) E → B
(4) A → id B
(5) B → . id A
(6) B → num
FIRST(S) =
FIRST(E) =
FIRST(A) =
FIRST(B) =
Table to fill in: rows S, E, A, B; columns id, ., num, $

Extending LL(1) parsing for epsilon productions

FIRST, FOLLOW, NULLABLE sets
• For each non-terminal X
• FIRST(X) = set of terminals that can appear first in a sentence derived from X
  – FIRST(X) = {t | X ⇒* t ω}
• NULLABLE(X) if X ⇒* ε
• FOLLOW(X) = set of terminals that can follow X in some derivation
  – FOLLOW(X) = {t | S ⇒* α X t β}

Computing the NULLABLE set
• Lemma: NULLABLE(α1 … αk) = NULLABLE(α1) ∧ … ∧ NULLABLE(αk)
• If X → α1 | … | αk then we have the following equation: NULLABLE(X) = NULLABLE(α1) ∨ … ∨ NULLABLE(αk)
1. Initially NULLABLE(X) = false
2. Iterate to fixpoint:
   for each production Y → α1 … αk
     if NULLABLE(α1 … αk) then NULLABLE(Y) = true
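The two steps can be sketched in Python as below. The grammar encoding (a dictionary from nonterminal to right-hand sides, with the empty list standing for ε) and the name `nullable_set` are assumptions made for this illustration; the grammar is the one used in the NULLABLE exercise.

```python
# S -> A a b,  A -> a | eps,  B -> A B | C,  C -> b | eps
# An empty list encodes an epsilon production (an assumption of this sketch).
GRAMMAR = {
    "S": [["A", "a", "b"]],
    "A": [["a"], []],
    "B": [["A", "B"], ["C"]],
    "C": [["b"], []],
}


def nullable_set(grammar):
    """Return the set of nonterminals that can derive the empty word."""
    nullable = set()                   # step 1: nothing is nullable initially
    changed = True
    while changed:                     # step 2: iterate to a fixed point
        changed = False
        for y, prods in grammar.items():
            if y in nullable:
                continue
            # A right-hand side is nullable when all its symbols are nullable;
            # terminals are never in the set, and all() of an empty list is True.
            if any(all(s in nullable for s in rhs) for rhs in prods):
                nullable.add(y)
                changed = True
    return nullable


print(sorted(nullable_set(GRAMMAR)))
```

Here B becomes nullable only in the second pass, after C has been marked, which shows why a single pass over the productions is not enough.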

Exercise: compute NULLABLE
S → A a b
A → a | ε
B → A B | C
C → b | ε
NULLABLE(S) = NULLABLE(A) ∧ NULLABLE(a) ∧ NULLABLE(b)
NULLABLE(A) = NULLABLE(a) ∨ NULLABLE(ε)
NULLABLE(B) = (NULLABLE(A) ∧ NULLABLE(B)) ∨ NULLABLE(C)
NULLABLE(C) = NULLABLE(b) ∨ NULLABLE(ε)

FIRST with epsilon productions
• How do we compute FIRST(α1 … αk) when epsilon productions are allowed?
  – FIRST(α1 … αk) = if not NULLABLE(α1) then FIRST(α1) else FIRST(α1) ∪ FIRST(α2 … αk)
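The rule above can be sketched in Python as follows, on the grammar S → A c b, A → a | ε. The function names and the grammar encoding (empty list for ε) are assumptions of this sketch; `first_of_sequence` is an iterative form of the recursive rule.

```python
# S -> A c b,  A -> a | eps   (an empty list encodes epsilon)
GRAMMAR = {
    "S": [["A", "c", "b"]],
    "A": [["a"], []],
}


def nullable_set(grammar):
    """Nonterminals that can derive the empty word (fixed-point iteration)."""
    nullable, changed = set(), True
    while changed:
        changed = False
        for x, prods in grammar.items():
            if x not in nullable and any(
                all(s in nullable for s in rhs) for rhs in prods
            ):
                nullable.add(x)
                changed = True
    return nullable


def first_of_sequence(symbols, first, nullable):
    """FIRST of a sequence: keep going past a symbol only while it is nullable."""
    result = set()
    for s in symbols:
        if s not in first:             # terminal: it starts the word, stop here
            result.add(s)
            break
        result |= first[s]
        if s not in nullable:
            break
    return result


def first_sets(grammar, nullable):
    """FIRST for every nonterminal, allowing epsilon productions."""
    first = {x: set() for x in grammar}
    changed = True
    while changed:
        changed = False
        for x, prods in grammar.items():
            for rhs in prods:
                new = first_of_sequence(rhs, first, nullable)
                if not new <= first[x]:
                    first[x] |= new
                    changed = True
    return first


NULLABLE = nullable_set(GRAMMAR)
FIRST = first_sets(GRAMMAR, NULLABLE)
print(sorted(FIRST["S"]), sorted(FIRST["A"]))
```

Because A is nullable, the terminal c that follows it in S → A c b also ends up in FIRST(S).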

Exercise: compute FIRST
S → A c b
A → a | ε
NULLABLE(S) = NULLABLE(A) ∧ NULLABLE(c) ∧ NULLABLE(b)
NULLABLE(A) = NULLABLE(a) ∨ NULLABLE(ε)
FIRST(S) = FIRST(A) ∪ FIRST(c b)
FIRST(A) = FIRST(a) ∪ FIRST(ε)
Simplified:
FIRST(S) = FIRST(A) ∪ {c}
FIRST(A) = {a}
• What should we predict for input "acb"?
• What should we predict for input "cb"?

FOLLOW sets
• FOLLOW(X) = set of terminals that can follow X in some derivation
• FOLLOW(X) = {t | S ⇒* α X t β}

FOLLOW sets (p. 189)
• if X → α Y β then
  – FOLLOW(Y) ⊇ FIRST(β)
  – if NULLABLE(β) (or β = ε) then FOLLOW(Y) ⊇ FOLLOW(X)
• Allows predicting nullable productions: X → α where NULLABLE(α), when the lookahead token is in FOLLOW(X)
Example: S → A c b, A → a | ε
• What should we predict for input "cb"?
• What should we predict for input "acb"?
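A sketch of the two FOLLOW rules as a fixed-point computation in Python, for the grammar S → A c b, A → a | ε extended with the end-of-file production S' → S $. All function names and the grammar encoding are assumptions made for this illustration.

```python
# S' -> S $,  S -> A c b,  A -> a | eps   (an empty list encodes epsilon)
GRAMMAR = {
    "S'": [["S", "$"]],
    "S": [["A", "c", "b"]],
    "A": [["a"], []],
}


def nullable_set(grammar):
    nullable, changed = set(), True
    while changed:
        changed = False
        for x, prods in grammar.items():
            if x not in nullable and any(
                all(s in nullable for s in rhs) for rhs in prods
            ):
                nullable.add(x)
                changed = True
    return nullable


def first_of_sequence(symbols, first, nullable):
    result = set()
    for s in symbols:
        if s not in first:             # terminal
            result.add(s)
            break
        result |= first[s]
        if s not in nullable:
            break
    return result


def first_sets(grammar, nullable):
    first, changed = {x: set() for x in grammar}, True
    while changed:
        changed = False
        for x, prods in grammar.items():
            for rhs in prods:
                new = first_of_sequence(rhs, first, nullable)
                if not new <= first[x]:
                    first[x] |= new
                    changed = True
    return first


def follow_sets(grammar, first, nullable):
    follow, changed = {x: set() for x in grammar}, True
    while changed:
        changed = False
        for x, prods in grammar.items():
            for rhs in prods:
                for i, y in enumerate(rhs):
                    if y not in grammar:
                        continue       # FOLLOW is only defined for nonterminals
                    beta = rhs[i + 1:]
                    new = first_of_sequence(beta, first, nullable)  # FIRST(beta)
                    if all(s in nullable for s in beta):  # beta nullable or empty
                        new |= follow[x]
                    if not new <= follow[y]:
                        follow[y] |= new
                        changed = True
    return follow


NULLABLE = nullable_set(GRAMMAR)
FIRST = first_sets(GRAMMAR, NULLABLE)
FOLLOW = follow_sets(GRAMMAR, FIRST, NULLABLE)
print(sorted(FOLLOW["S"]), sorted(FOLLOW["A"]))
```

With FOLLOW(A) = {c}, the parser can predict A → ε when the lookahead is c, which answers the question for input "cb".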

Filling the prediction table
• Table[N, t] = N → α if
  1. t ∈ FIRST(α), or
  2. NULLABLE(α) and t ∈ FOLLOW(N)
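The filling rule can be sketched in Python as below. To keep the example short, the FIRST, NULLABLE, and FOLLOW sets are written out by hand for two small grammars from the slides rather than computed; the function names and the choice to collect conflicting cells in a list are assumptions of this sketch.

```python
def first_of_sequence(symbols, first, nullable):
    """FIRST of a sequence of symbols; symbols missing from `first` are terminals."""
    result = set()
    for s in symbols:
        if s not in first:
            result.add(s)
            break
        result |= first[s]
        if s not in nullable:
            break
    return result


def build_table(grammar, first, nullable, follow):
    """Fill Table[N, t] and report every cell that receives two productions."""
    table, conflicts = {}, []
    for n, prods in grammar.items():
        for rhs in prods:
            lookaheads = first_of_sequence(rhs, first, nullable)   # rule 1
            if all(s in nullable for s in rhs):                    # rule 2
                lookaheads |= follow[n]
            for t in lookaheads:
                if (n, t) in table:
                    conflicts.append((n, t))
                else:
                    table[(n, t)] = rhs
    return table, conflicts


# Grammar 1: S -> A c b, A -> a | eps (hand-computed sets, end marker $).
G1 = {"S": [["A", "c", "b"]], "A": [["a"], []]}
TABLE1, CONFLICTS1 = build_table(
    G1,
    first={"S": {"a", "c"}, "A": {"a"}},
    nullable={"A"},
    follow={"S": {"$"}, "A": {"c"}},
)

# Grammar 2: S -> A a b, A -> a | eps, where FOLLOW(A) = {a} overlaps FIRST(a).
G2 = {"S": [["A", "a", "b"]], "A": [["a"], []]}
TABLE2, CONFLICTS2 = build_table(
    G2,
    first={"S": {"a"}, "A": {"a"}},
    nullable={"A"},
    follow={"S": {"$"}, "A": {"a"}},
)

print(CONFLICTS1, CONFLICTS2)
```

A grammar is LL(1) exactly when the list of conflicts comes back empty, so the same routine doubles as the conflict check.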

LL(1) conflicts

Conflicts
• FIRST-FIRST conflict
  – X → α and X → β and
  – FIRST(α) ∩ FIRST(β) ≠ {}
• FIRST-FOLLOW conflict
  – X → α
  – NULLABLE(α)
  – FIRST(α) ∩ FOLLOW(X) ≠ {}

LL(1) grammars
• A grammar is in the class LL(1) when its LL(1) prediction table contains no conflicts
• A language is said to be LL(1) when it has an LL(1) grammar

LL(k) grammars

LL(k) grammars
• Generalizes LL(1) for k lookahead tokens
• Need to generalize FIRST and FOLLOW for k lookahead tokens

Agenda
• Understand role of syntax analysis
• Parsing strategies
• LL parsing
  – Building a predictor table via FIRST/FOLLOW/NULLABLE sets
• Handling conflicts

Handling conflicts

Problem 1: FIRST-FIRST conflict
term → ID | indexed_elem
indexed_elem → ID [ expr ]
• FIRST(indexed_elem) = { ID }
• How can we transform the grammar into an equivalent grammar that does not have this conflict?

Solution: left factoring
• Rewrite the grammar to be in LL(1)
Original grammar:
term → ID | indexed_elem
indexed_elem → ID [ expr ]
New grammar:
term → ID after_ID
after_ID → [ expr ] | ε
• New grammar is more complex: it has an epsilon production
• Intuition: just like factoring in algebra, x*y + x*z into x*(y+z)

Exercise: apply left factoring
S → if E then S else S | if E then S | T

Exercise: apply left factoring
Original grammar:
S → if E then S else S | if E then S | T
After left factoring:
S → if E then S S' | T
S' → else S | ε
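A single left-factoring pass can be sketched in Python as follows, using the if-then-else grammar from this exercise. This is a simplified illustration: it groups alternatives by their first symbol and factors each group once, without repeating the process on the newly created nonterminal. The function names and the naming scheme for the fresh nonterminal (`S_after_if` rather than S') are made up for this sketch.

```python
# S -> if E then S else S | if E then S | T   (an empty list encodes epsilon)
GRAMMAR = {
    "S": [
        ["if", "E", "then", "S", "else", "S"],
        ["if", "E", "then", "S"],
        ["T"],
    ],
}


def common_prefix(sequences):
    """Longest prefix shared by all the given symbol sequences."""
    prefix = []
    for column in zip(*sequences):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix


def left_factor(grammar):
    """Factor out the common prefix of alternatives that start with the same symbol."""
    result = {}
    for n, prods in grammar.items():
        groups = {}
        for rhs in prods:
            groups.setdefault(rhs[0] if rhs else None, []).append(rhs)
        result[n] = []
        for head, alternatives in groups.items():
            if head is None or len(alternatives) == 1:
                result[n].extend(alternatives)      # nothing to factor
                continue
            prefix = common_prefix(alternatives)
            fresh = f"{n}_after_{head}"             # hypothetical fresh name
            result[n].append(prefix + [fresh])
            # The remainders become the alternatives of the fresh nonterminal;
            # an empty remainder is an epsilon production.
            result[fresh] = [alt[len(prefix):] for alt in alternatives]
    return result


for nonterminal, prods in left_factor(GRAMMAR).items():
    print(nonterminal, "->", prods)
```

The shorter alternative leaves an empty remainder, which is where the epsilon production of the factored grammar comes from.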

Problem 2: FIRST-FOLLOW conflict
S → A a b
A → a | ε
• FIRST(S) = { a }    FOLLOW(S) = { }
• FIRST(A) = { a }    FOLLOW(A) = { a }
• How can we transform the grammar into an equivalent grammar that does not have this conflict?

Solution: substitution
Original grammar:
S → A a b
A → a | ε
Substitute A in S:
S → a a b | a b
Left factoring:
S → a after_A
after_A → a b | b

Problem 3: FIRST-FIRST conflict
E → E - term | term
• Left recursion cannot be handled with a bounded lookahead
• How can we transform the grammar into an equivalent grammar that does not have this conflict?

Solution: left recursion removal
G1: N → N α | β
G2: N → β N'
    N' → α N' | ε
• L(G1) = β, βα, βαα, βααα, …
• L(G2) = same
• Can be done algorithmically (p. 130)
• Problem 1: grammar becomes mangled beyond recognition
• Problem 2: grammar may not be LL(1)
For our 3rd example: E → E - term | term becomes
E → term TE | term
TE → - term TE | ε
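The G1-to-G2 scheme can be sketched in Python for immediate left recursion as below. The function name, the grammar encoding (empty list for ε), and the use of a trailing apostrophe for the fresh nonterminal are assumptions of this illustration. The sketch follows the general scheme N → β N', N' → α N' | ε literally and does not handle indirect left recursion.

```python
# E -> E - term | term
GRAMMAR = {
    "E": [["E", "-", "term"], ["term"]],
}


def remove_immediate_left_recursion(grammar):
    """Rewrite N -> N alpha | beta as N -> beta N', N' -> alpha N' | eps."""
    result = {}
    for n, prods in grammar.items():
        # alphas: what follows N in the left-recursive alternatives
        alphas = [rhs[1:] for rhs in prods if rhs and rhs[0] == n]
        # betas: the alternatives that do not start with N
        betas = [rhs for rhs in prods if not rhs or rhs[0] != n]
        if not alphas:
            result[n] = prods          # no immediate left recursion
            continue
        fresh = n + "'"                # hypothetical name for the new nonterminal
        result[n] = [beta + [fresh] for beta in betas]
        result[fresh] = [alpha + [fresh] for alpha in alphas] + [[]]
    return result


for nonterminal, prods in remove_immediate_left_recursion(GRAMMAR).items():
    print(nonterminal, "->", prods)
```

The rewritten grammar is right-recursive, so a naive parse tree built from it groups subtractions to the right; a parser that needs left-associative subtraction has to restore that grouping when it builds the AST.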

Recap
• Given a grammar
• Compute for each non-terminal
  – NULLABLE
  – FIRST using NULLABLE
  – FOLLOW using FIRST and NULLABLE
• Compute FIRST for each sentential form appearing on the right-hand side of a production
• Check for conflicts
  – If any exist, attempt to remove them by rewriting the grammar

Next lecture: bottom-up parsing