
Slides: 125


CMPUT 680 - Fall 2006
Topic 2: Parsing and Lexical Analysis
José Nelson Amaral
http://www.cs.ualberta.ca/~amaral/courses/680
CMPUT 680 - Compiler Design and Optimization

Reading List
Appel, Chapters 2, 3, 4, and 5
Aho, Sethi, Ullman, Chapters 2, 3, 4, and 5

Some Important Basic Definitions
lexical: of or relating to the morphemes of a language.
morpheme: a meaningful linguistic unit that cannot be divided into smaller meaningful parts.
lexical analysis: the task concerned with breaking an input into its smallest meaningful units, called tokens.

syntax: the way in which words are put together to form phrases, clauses, or sentences; in a programming language, the rules governing the formation of statements.
syntax analysis: the task concerned with fitting a sequence of tokens into a specified syntax.
parsing: to break a sentence down into its component parts of speech with an explanation of the form, function, and syntactical relationship of each part.

parsing = lexical analysis + syntax analysis
semantic analysis: the task concerned with calculating the program's meaning.

Regular Expressions
Symbol: a          A regular expression formed by a.
Alternation: M | N   A regular expression formed by M or N.
Concatenation: M • N   A regular expression formed by M followed by N.
Epsilon: ε         The empty string.
Repetition: M*     A regular expression formed by zero or more repetitions of M.

Building a Recognizer for a Language
General approach:
1. Build a deterministic finite automaton (DFA) from the regular expression E
2. Execute the DFA to determine whether an input string belongs to L(E)
Note: The DFA construction is done automatically by a tool such as lex.

Finite Automata
A nondeterministic finite automaton A = {S, Σ, s0, F, move} consists of:
1. A set of states S
2. A set of input symbols Σ (the input symbol alphabet)
3. A state s0 that is distinguished as the start state
4. A set of states F distinguished as the accepting states
5. A transition function move that maps state-symbol pairs into sets of states.
In a Deterministic Finite Automaton (DFA), the function move maps each state-symbol pair into a unique state.

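Finite Automata
What languages are accepted by these automata?
A Deterministic Finite Automaton (DFA):
[Diagram: states 0 to 3, start state 0, accepting state 3. State 0 loops on b; 0 goes to 1 on a; 1 goes to 2 on b; 2 goes to 3 on b.]
It accepts b*abb.
A Nondeterministic Finite Automaton (NFA):
[Diagram: states 0 to 3, start state 0, accepting state 3. State 0 loops on a and on b; 0 goes to 1 on a; 1 goes to 2 on b; 2 goes to 3 on b.]
It accepts (a|b)*abb.
(Aho, Sethi, Ullman, p. 114)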
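The two automata can be executed directly from their transition tables. The following is a minimal Python sketch, assuming the transitions are as read off the diagrams on this slide; the names DFA, NFA, dfa_accepts, and nfa_accepts are illustrative and do not come from the course material.

```python
# Transition tables transcribed from the two diagrams (an assumption of this sketch).
DFA = {(0, "b"): 0, (0, "a"): 1, (1, "b"): 2, (2, "b"): 3}
NFA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
ACCEPTING = {3}


def dfa_accepts(text):
    """Follow the single possible transition for each character."""
    state = 0
    for ch in text:
        if (state, ch) not in DFA:
            return False  # missing transition: the DFA rejects
        state = DFA[(state, ch)]
    return state in ACCEPTING


def nfa_accepts(text):
    """Track the set of all states the NFA could be in."""
    states = {0}
    for ch in text:
        states = {t for s in states for t in NFA.get((s, ch), set())}
    return bool(states & ACCEPTING)


print(dfa_accepts("bbabb"), dfa_accepts("aabb"))  # b*abb
print(nfa_accepts("aabb"), nfa_accepts("abab"))   # (a|b)*abb
```

The NFA simulation keeps a set of current states, which is the same idea the NFA-to-DFA conversion later turns into single DFA states.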

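Another NFA
[Diagram: from the start state, one ε-transition leads to a branch that reads an a and then loops on a; a second ε-transition leads to a branch that reads a b and then loops on b.]
An ε-transition is taken without consuming any character from the input.
What does the NFA above accept? aa*|bb*
(Aho, Sethi, Ullman, p. 116)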

Constructing NFA
How do we define an NFA that accepts a regular expression? It is very simple. Remember that a regular expression is formed by the use of alternation, concatenation, and repetition. Thus all we need to do is to know how to build the NFA for a single symbol, and how to compose NFAs.

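Composing NFAs with Alternation
The NFA for a symbol a is:
[Diagram: start state i goes to accepting state f on a.]
Given two NFA N(s) and N(t), the NFA N(s|t) is:
[Diagram: a new start state i has ε-transitions to the start states of N(s) and N(t); the accepting states of N(s) and N(t) have ε-transitions to a new accepting state f.]
(Aho, Sethi, Ullman, p. 122)

Composing NFAs with Concatenation
Given two NFA N(s) and N(t), the NFA N(st) is:
[Diagram: the start state i of N(s) is the start state; the accepting state of N(s) is merged with the start state of N(t); the accepting state f of N(t) is the accepting state.]
(Aho, Sethi, Ullman, p. 123)

Composing NFAs with Repetition
The NFA for N(s*) is:
[Diagram: a new start state i has ε-transitions to the start state of N(s) and to a new accepting state f; the accepting state of N(s) has ε-transitions back to the start state of N(s) and forward to f.]
(Aho, Sethi, Ullman, p. 123)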

Properties of the NFA
Following these construction rules, we obtain an NFA N(r) with these properties:
- N(r) has at most twice as many states as the number of symbols and operators in r;
- N(r) has exactly one starting and one accepting state;
- Each state of N(r) has at most one outgoing transition on a symbol of the alphabet or at most two outgoing ε-transitions.
(Aho, Sethi, Ullman, p. 124)
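The composition rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class NFA and the helpers symbol, alternation, concatenation, repetition, states, and accepts are hypothetical names, and epsilon is represented by None.

```python
import itertools

_ids = itertools.count()  # source of fresh state numbers
EPS = None                # label used for epsilon-transitions


class NFA:
    """A fragment with one start state, one accepting state, and a list of edges."""
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges


def symbol(a):
    i, f = next(_ids), next(_ids)
    return NFA(i, f, [(i, a, f)])


def alternation(s, t):
    i, f = next(_ids), next(_ids)
    extra = [(i, EPS, s.start), (i, EPS, t.start),
             (s.accept, EPS, f), (t.accept, EPS, f)]
    return NFA(i, f, s.edges + t.edges + extra)


def concatenation(s, t):
    # Merge the accepting state of s with the start state of t.
    def rename(q):
        return s.accept if q == t.start else q
    edges = s.edges + [(rename(p), a, rename(q)) for p, a, q in t.edges]
    return NFA(s.start, rename(t.accept), edges)


def repetition(s):
    i, f = next(_ids), next(_ids)
    extra = [(i, EPS, s.start), (s.accept, EPS, f),
             (s.accept, EPS, s.start), (i, EPS, f)]
    return NFA(i, f, s.edges + extra)


def states(n):
    return {n.start, n.accept} | {p for p, _, _ in n.edges} | {q for _, _, q in n.edges}


def accepts(n, text):
    def closure(current):
        seen, stack = set(current), list(current)
        while stack:
            p = stack.pop()
            for src, label, dst in n.edges:
                if src == p and label is EPS and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen
    current = closure({n.start})
    for ch in text:
        current = closure({q for p, a, q in n.edges if p in current and a == ch})
    return n.accept in current


# (a|b)*abb: 5 symbols and 5 operators, so at most 20 states are allowed.
n = repetition(alternation(symbol("a"), symbol("b")))
for ch in "abb":
    n = concatenation(n, symbol(ch))
print(len(states(n)), accepts(n, "aabb"), accepts(n, "ab"))
```

For (a|b)*abb the construction yields 11 states, well within the bound of twice the number of symbols and operators.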

How to Parse a Regular Expression?
Given a DFA, we can generate an automaton that recognizes the longest substring of an input that is a valid token.
Using the three simple rules presented, it is easy to generate an NFA to recognize a regular expression.
Given a regular expression, how do we generate an automaton to recognize tokens? Create an NFA and convert it to a DFA.

Regular expression notation: An Example
a            An ordinary character stands for itself.
ε            The empty string.
(nothing)    Another way to write the empty string.
M|N          Alternation, choosing from M or N.
MN           Concatenation, an M followed by an N.
M*           Repetition (zero or more times).
M+           Repetition (one or more times).
M?           Optional, zero or one occurrence of M.
[a-zA-Z]     Character set alternation.
.            Stands for any single character except newline.
"a.+*"       Quotation, a string in quotes stands for itself literally.
(Appel, p. 20)

Regular expressions for some tokens
if                                         {return IF;}
[a-z][a-z0-9]*                             {return ID;}
[0-9]+                                     {return NUM;}
([0-9]+"."[0-9]*)|([0-9]*"."[0-9]+)        {return REAL;}
("--"[a-z]*"\n")|(" "|"\n"|"\t")+          {/* do nothing */}
.                                          {error();}
(Appel, p. 20)
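A rough way to experiment with these rules is a small Python scanner that tries every rule at the current position, keeps the longest match, and breaks ties in favour of the rule listed first. This is a sketch using Python's re module rather than lex; the names RULES and tokenize are illustrative, and the SKIP and ERROR labels are assumptions standing in for the "do nothing" and error() actions.

```python
import re

# The token rules above, rewritten in Python regex syntax, in priority order.
RULES = [
    ("IF", r"if"),
    ("ID", r"[a-z][a-z0-9]*"),
    ("NUM", r"[0-9]+"),
    ("REAL", r"([0-9]+\.[0-9]*)|([0-9]*\.[0-9]+)"),
    ("SKIP", r"(--[a-z]*\n)|( |\n|\t)+"),
    ("ERROR", r"."),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]


def tokenize(text):
    """Return (token, lexeme) pairs using longest match, then rule priority."""
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, rx in COMPILED:
            m = rx.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches have equal length.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "SKIP":
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("if8 if 3.14 42"))
```

On the input "if8 if 3.14 42" the longest-match rule makes if8 an ID, while the bare if is an IF because the IF rule is listed before ID.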

Building Finite Automata for Lexical Tokens
if {return IF;}
The NFA for a symbol i is: start state 1 goes to state 2 on i.
The NFA for a symbol f is: start state 1 goes to state 2 on f.
The NFA for the regular expression if is: start state 1 goes to state 2 on i, and state 2 goes to state 3 (IF) on f.
(Appel, p. 21)

[a-z][a-z0-9]* {return ID;}
[Diagram: start state 1 goes to state 2 (ID) on a-z; state 2 loops on a-z and on 0-9.]
(Appel, p. 21)

[0-9]+ {return NUM;}
[Diagram: start state 1 goes to state 2 (NUM) on 0-9; state 2 loops on 0-9.]
(Appel, p. 21)

([0-9]+"."[0-9]*)|([0-9]*"."[0-9]+) {return REAL;}
[Diagram: start state 1 goes to state 2 on 0-9; state 2 loops on 0-9 and goes to state 3 (REAL) on "."; state 3 loops on 0-9. State 1 also goes to state 4 on "."; state 4 goes to state 5 (REAL) on 0-9; state 5 loops on 0-9.]
(Appel, p. 21)

("--"[a-z]*"\n")|(" "|"\n"|"\t")+ {/* do nothing */}
[Diagram: start state 1 goes to state 2 on "-"; state 2 goes to state 3 on "-"; state 3 loops on a-z and goes to state 4 on \n. State 1 also goes to state 5 on blank, \n, or \t; state 5 loops on blank, \n, or \t. States 4 and 5 are accepting (do nothing).]
(Appel, p. 21)

[Figure: the automata for IF, ID, NUM, REAL, white space, and error (any character but \n) drawn side by side.]
(Appel, p. 21)

Conversion of NFA into DFA
[Figure: the combined NFA for the tokens IF, ID, NUM, and error, with states 1 to 15. State 1 is the start state; it goes to state 2 on i, and state 2 goes to state 3 (IF) on f. State 1 also has ε-transitions to states 4, 9, and 14. The ID branch runs through states 4 to 8 on a-z and 0-9 and accepts in state 8. The NUM branch runs through states 9 to 13 on 0-9 and accepts in state 13. State 14 goes to state 15 (error) on any character.]
What states can be reached from state 1 without consuming a character?
{1, 4, 9, 14} form the ε-closure of state 1.
What are all the state closures in this NFA?
closure(1) = {1, 4, 9, 14}
closure(5) = {5, 6, 8}
closure(8) = {6, 8}
closure(7) = {6, 7, 8}
closure(10) = {10, 11, 13}
closure(13) = {11, 13}
closure(12) = {11, 12, 13}
(Appel, p. 27)

Conversion of NFA into DFA
Given a set of NFA states T, the ε-closure(T) is the set of states that are reachable through ε-transitions from any state s ∈ T.
Given a set of NFA states T, move(T, a) is the set of states that are reachable on input a from any state s ∈ T.
(Aho, Sethi, Ullman, p. 118)
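Both operations are short to write down. The sketch below assumes an edge list for the 15-state NFA used on these slides; the individual ε-edges are reconstructed from the closures and DFA states given in the slides, so treat EDGES as an assumption. The names eps_closure and move mirror the definitions above.

```python
import string

LETTERS, DIGITS = set(string.ascii_lowercase), set(string.digits)
EPS, ANY = "eps", "any"  # sentinel labels for epsilon and "any character"

# (source, label, destination); a label is a set of characters, EPS, or ANY.
EDGES = [
    (1, {"i"}, 2), (2, {"f"}, 3),
    (1, EPS, 4), (4, LETTERS, 5), (5, EPS, 6), (5, EPS, 8),
    (6, LETTERS | DIGITS, 7), (7, EPS, 8), (8, EPS, 6),
    (1, EPS, 9), (9, DIGITS, 10), (10, EPS, 11), (10, EPS, 13),
    (11, DIGITS, 12), (12, EPS, 13), (13, EPS, 11),
    (1, EPS, 14), (14, ANY, 15),
]


def eps_closure(T):
    """All states reachable from T using only epsilon-transitions."""
    closure, stack = set(T), list(T)
    while stack:
        s = stack.pop()
        for src, label, dst in EDGES:
            if src == s and label == EPS and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return closure


def move(T, ch):
    """All states reachable from T by consuming the character ch."""
    return {dst for src, label, dst in EDGES
            if src in T and label != EPS and (label == ANY or ch in label)}


start = eps_closure({1})
print(sorted(start))                          # [1, 4, 9, 14]
print(sorted(eps_closure(move(start, "a"))))  # [5, 6, 8, 15]
```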

Problem Statement for Conversion of NFA into DFA
Given an NFA find the DFA with the minimum number of states that has the same behavior as the NFA for all inputs.
If the initial state in the NFA is s0, then the set of states in the DFA, Dstates, is initialized with a state representing ε-closure(s0).
(Aho, Sethi, Ullman, p. 118)

Conversion of NFA into DFA
Dstates = {1-4-9-14}
Now we need to compute the transitions out of 1-4-9-14 for each class of input character:
move(1-4-9-14, a-h) = {5, 15}; ε-closure({5, 15}) = {5, 6, 8, 15}, which adds the DFA state 5-6-8-15.
move(1-4-9-14, i) = {2, 5, 15}; ε-closure({2, 5, 15}) = {2, 5, 6, 8, 15}, which adds the DFA state 2-5-6-8-15.
move(1-4-9-14, j-z) = {5, 15}; ε-closure({5, 15}) = {5, 6, 8, 15}, the existing state 5-6-8-15.
move(1-4-9-14, 0-9) = {10, 15}; ε-closure({10, 15}) = {10, 11, 13, 15}, which adds the DFA state 10-11-13-15.
move(1-4-9-14, other) = {15}; ε-closure({15}) = {15}, which adds the DFA state 15.
The analysis for 1-4-9-14 is complete. We mark it and pick another state in the DFA to analyse.
(Appel, p. 27)

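The corresponding DFA
Start state: 1-4-9-14
1-4-9-14: on i go to 2-5-6-8-15; on a-h and j-z go to 5-6-8-15; on 0-9 go to 10-11-13-15; on other go to 15
2-5-6-8-15 (ID): on f go to 3-6-7-8; on a-e, g-z, 0-9 go to 6-7-8
5-6-8-15 (ID): on a-z, 0-9 go to 6-7-8
3-6-7-8 (IF): on a-z, 0-9 go to 6-7-8
6-7-8 (ID): on a-z, 0-9 stay in 6-7-8
10-11-13-15 (NUM): on 0-9 go to 11-12-13
11-12-13 (NUM): on 0-9 stay in 11-12-13
15 (error): no outgoing transitions
See p. 118 of Aho-Sethi-Ullman and p. 29 of Appel.
(Appel, p. 29)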
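The whole conversion can be sketched as a worklist loop over unmarked DFA states. The code below is a self-contained illustration, not the course's implementation: the edge list is reconstructed from the closures on the slides (an assumption), "#" stands in for every "other" character, and subset_construction is a hypothetical name.

```python
import string

LETTERS, DIGITS = set(string.ascii_lowercase), set(string.digits)
EPS, ANY = "eps", "any"
EDGES = [  # assumed reconstruction of the 15-state NFA
    (1, {"i"}, 2), (2, {"f"}, 3),
    (1, EPS, 4), (4, LETTERS, 5), (5, EPS, 6), (5, EPS, 8),
    (6, LETTERS | DIGITS, 7), (7, EPS, 8), (8, EPS, 6),
    (1, EPS, 9), (9, DIGITS, 10), (10, EPS, 11), (10, EPS, 13),
    (11, DIGITS, 12), (12, EPS, 13), (13, EPS, 11),
    (1, EPS, 14), (14, ANY, 15),
]


def eps_closure(T):
    closure, stack = set(T), list(T)
    while stack:
        s = stack.pop()
        for src, label, dst in EDGES:
            if src == s and label == EPS and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return closure


def move(T, ch):
    return {dst for src, label, dst in EDGES
            if src in T and label != EPS and (label == ANY or ch in label)}


def subset_construction(start, alphabet):
    """Return (start_state, transitions) where each DFA state is a frozenset."""
    start_state = frozenset(eps_closure({start}))
    dstates, unmarked = {start_state: {}}, [start_state]
    while unmarked:
        T = unmarked.pop()  # mark T by removing it from the worklist
        for ch in alphabet:
            U = frozenset(eps_closure(move(T, ch)))
            if not U:
                continue  # no transition on ch: leave it undefined
            if U not in dstates:
                dstates[U] = {}
                unmarked.append(U)
            dstates[T][ch] = U
    return start_state, dstates


alphabet = sorted(LETTERS | DIGITS) + ["#"]
start_state, dfa = subset_construction(1, alphabet)
for state in sorted(dfa, key=sorted):
    print("-".join(map(str, sorted(state))))
```

With these assumptions the loop produces eight DFA states, the same ones listed on the slide.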

Lexical Analyzer and Parser
[Diagram: the source program feeds the lexical analyzer, which asks for characters ("get next char") and receives them ("next char"). The syntax analyzer asks the lexical analyzer for tokens ("get next token") and receives them ("next token"). Both are connected to the symbol table, which contains a record for each identifier.]
token: smallest meaningful sequence of characters of interest in the source program
(Aho, Sethi, Ullman, p. 160)

Definition of Context-Free Grammars
A context-free grammar G = (T, N, S, P) consists of:
1. T, a set of terminals (scanner tokens).
2. N, a set of nonterminals (syntactic variables generated by productions).
3. S, a designated start nonterminal.
4. P, a set of productions. Each production has the form A ::= α, where A is a nonterminal and α is a sentential form, i.e., a string of zero or more grammar symbols (terminals/nonterminals).

Syntax Analysis
Problem Statement: To find a derivation sequence in a grammar G for the input token stream (or say that none exists).

Parse trees
A parse tree is a graphical representation of a derivation sequence of a sentential form.
Tree nodes represent symbols of the grammar (nonterminals or terminals) and tree edges represent derivation steps.

Derivation
Given the following grammar:
E → E + E | E * E | ( E ) | - E | id
Is the string -(id + id) a sentence in this grammar?
Yes, because there is the following derivation:
E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id)
Where ⇒ reads "derives in one step".
(Aho, Sethi, Ullman, p. 168)

Derivation
E → E + E | E * E | ( E ) | - E | id
Let's examine this derivation:
E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id)
This is a top-down derivation because we start building the parse tree at the top.
[Figure: the parse tree grows one derivation step at a time. The root E has children - and E; that E has children (, E, and ); the inner E has children E, +, and E; each of those two E nodes has the single child id.]
(Aho, Sethi, Ullman, p. 170)

Another Derivation Example
E → E + E | E * E | ( E ) | - E | id
Find a derivation for the expression: id + id * id
[Figure: two parse trees. In the first, the root expands to E + E and the right E expands to E * E. In the second, the root expands to E * E and the left E expands to E + E.]
Which derivation tree is correct? According to the grammar, both are correct.
A grammar that produces more than one parse tree for some input sentence is said to be an ambiguous grammar.
(Aho, Sethi, Ullman, p. 171)

Left Recursion
Consider the grammar:
E → E + T | T
T → T * F | F
F → ( E ) | id
A top-down parser might loop forever when parsing an expression using this grammar.
[Figure: E is expanded to E + T, the leftmost E is expanded to E + T again, and so on, without consuming any input.]
(Aho, Sethi, Ullman, p. 176)

Left Recursion
Consider the grammar:
E → E + T | T
T → T * F | F
F → ( E ) | id
A grammar that has at least one production of the form A → Aα is a left-recursive grammar.
Top-down parsers do not work with left-recursive grammars.
Left recursion can often be eliminated by rewriting the grammar.
(Aho, Sethi, Ullman, p. 176)
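One common rewriting replaces A → Aα | β by A → βA' and A' → αA' | ε. The Python sketch below applies that rule to immediate left recursion only; it is an illustration under that assumption, and the function name and the dictionary representation of the grammar are hypothetical.

```python
def eliminate_immediate_left_recursion(grammar):
    """grammar maps each nonterminal to a list of bodies (lists of symbols).

    An empty body [] stands for the empty string (epsilon).
    """
    result = {}
    for head, bodies in grammar.items():
        # Split A -> A alpha | beta into the alpha parts and the beta parts.
        alphas = [body[1:] for body in bodies if body and body[0] == head]
        betas = [body for body in bodies if not body or body[0] != head]
        if not alphas:
            result[head] = bodies
            continue
        tail = head + "'"
        result[head] = [beta + [tail] for beta in betas]
        result[tail] = [alpha + [tail] for alpha in alphas] + [[]]
    return result


GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}

for head, bodies in eliminate_immediate_left_recursion(GRAMMAR).items():
    print(head, "->", " | ".join(" ".join(b) or "ε" for b in bodies))
```

Applied to the expression grammar, this produces the grammar with E' and T' that the predictive parser slides use.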

Predictive Parsing
Consider the grammar:
stmt → if expr then stmt else stmt
     | while expr do stmt
     | begin stmt_list end
A parser for this grammar can be written with the following simple structure:
    switch (gettoken()) {
        case if: …. break;
        case while: …. break;
        case begin: …. break;
        default: reject input;
    }
Based only on the first token, the parser knows which rule to use to derive a statement. Therefore this is called a predictive parser.
(Aho, Sethi, Ullman, p. 183)

Left Factoring
The following grammar:
stmt → if expr then stmt else stmt
     | if expr then stmt
cannot be parsed by a predictive parser that looks one element ahead.
But the grammar can be rewritten:
stmt → if expr then stmt stmt'
stmt' → else stmt | ε
Where ε is the empty string.
Rewriting a grammar to eliminate multiple productions starting with the same token is called left factoring.
(Aho, Sethi, Ullman, p. 178)

A Predictive Parser
Grammar:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
Parsing Table: [table figure not preserved]
(Aho, Sethi, Ullman, p. 188)

A Predictive Parser
INPUT: id + id * id $
The predictive parsing program consults the parsing table using the symbol on top of the stack and the current input symbol. The first steps are:
STACK: E $ ; expand E → T E' ; STACK becomes T E' $
STACK: T E' $ ; expand T → F T' ; STACK becomes F T' E' $
STACK: F T' E' $ ; expand F → id ; STACK becomes id T' E' $
STACK: id T' E' $ ; top of stack matches the input id ; STACK becomes T' E' $ and the input advances to +
STACK: T' E' $ ; expand T' → ε ; STACK becomes E' $
Action when Top(Stack) = input ≠ $: pop the stack, advance the input.
OUTPUT: the parse tree built so far, with E at the root, children T and E', T expanded to F T', and F expanded to id.
(Aho, Sethi, Ullman, pp. 186-188)

A Predictive Parser
The predictive parser proceeds in this fashion, emitting the following productions:
E' → + T E'
T → F T'
F → id
T' → * F T'
F → id
T' → ε
E' → ε
When Top(Stack) = input = $ the parser halts and accepts the input string.
[Figure: the completed parse tree for id + id * id.]
(Aho, Sethi, Ullman, p. 188)

LL(k) Parser
This parser parses from left to right, and does a leftmost derivation. It looks 1 symbol ahead to choose its next action. Therefore, it is known as an LL(1) parser.
An LL(k) parser looks k symbols ahead to decide its action.
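The parsing program itself is a short loop over a stack and a table. The following Python sketch assumes the LL(1) table that results from the FIRST and FOLLOW rules for this grammar, since the table figure itself is not preserved; TABLE, NONTERMINALS, and parse are illustrative names.

```python
# Assumed LL(1) table for the grammar E -> T E', E' -> + T E' | eps, ...
# An empty list stands for an epsilon production.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def parse(tokens):
    """Return the list of (head, body) productions used, in leftmost order."""
    tokens = tokens + ["$"]
    stack, pos, output = ["$", "E"], 0, []
    while True:
        top, look = stack[-1], tokens[pos]
        if top == "$" and look == "$":
            return output  # Top(Stack) = input = $: accept
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:
                raise SyntaxError(f"unexpected {look!r} while expanding {top}")
            stack.pop()
            stack.extend(reversed(body))  # leftmost symbol ends up on top
            output.append((top, body))
        elif top == look:
            stack.pop()  # Top(Stack) = input != $: pop and advance
            pos += 1
        else:
            raise SyntaxError(f"expected {top!r}, found {look!r}")


for head, body in parse(["id", "+", "id", "*", "id"]):
    print(head, "->", " ".join(body) or "ε")
```

The productions printed for id + id * id are the expansions of a leftmost derivation, which is the order in which the slides show the parse tree growing.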

The Parsing Table
Given this grammar:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
How is this parsing table built?
PARSING TABLE: [table figure not preserved]

FIRST and FOLLOW
We need to build a FIRST set and a FOLLOW set for each symbol in the grammar.
The elements of FIRST and FOLLOW are terminal symbols (FIRST may also contain ε, and FOLLOW may contain the end marker $).
FIRST(α) is the set of terminal symbols that can begin any string derived from α.
FOLLOW(α) is the set of terminal symbols that can follow α:
t ∈ FOLLOW(α) ⇔ there is a derivation containing αt
(Aho, Sethi, Ullman, p. 189)

Rules to Create FIRST
GRAMMAR:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
FIRST rules:
1. If X is a terminal, FIRST(X) = {X}
2. If X → ε, then ε ∈ FIRST(X)
3. If X → Y1 Y2 ••• Yk and Y1 ••• Yi-1 ⇒* ε and a ∈ FIRST(Yi), then a ∈ FIRST(X)
SETS:
FIRST(id) = {id}
FIRST(*) = {*}
FIRST(+) = {+}
FIRST(() = {(}
FIRST()) = {)}
FIRST(E') = {+, ε}
FIRST(T') = {*, ε}
FIRST(F) = {(, id}
FIRST(T) = FIRST(F) = {(, id}
FIRST(E) = FIRST(T) = {(, id}
(Aho, Sethi, Ullman, p. 189)
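The three FIRST rules can be applied repeatedly until no set changes. The Python sketch below does this for the grammar on the slide; the names GRAMMAR, EPS, and first_sets are illustrative, and an empty body is used to represent an ε production.

```python
EPS = "ε"
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}


def first_sets(grammar):
    """Fixed-point computation of FIRST for every nonterminal."""
    first = {nt: set() for nt in grammar}

    def first_of(sym):
        # Rule 1: the FIRST set of a terminal is the terminal itself.
        return first[sym] if sym in grammar else {sym}

    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                before = len(first[head])
                for sym in body:
                    f = first_of(sym)
                    first[head] |= f - {EPS}   # Rule 3
                    if EPS not in f:
                        break
                else:
                    # Rule 2, or every Yi can derive the empty string.
                    first[head].add(EPS)
                changed |= len(first[head]) != before
    return first


for nt, s in first_sets(GRAMMAR).items():
    print(f"FIRST({nt}) = {sorted(s)}")
```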

Rules to Create FOLLOW
GRAMMAR:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
FIRST SETS:
FIRST(E') = {+, ε}
FIRST(T') = {*, ε}
FIRST(F) = {(, id}
FIRST(T) = {(, id}
FIRST(E) = {(, id}
FOLLOW rules (A and B are nonterminals, α and β are strings of grammar symbols):
1. If S is the start symbol, then $ ∈ FOLLOW(S)
2. If A → αBβ, and a ∈ FIRST(β) and a ≠ ε, then a ∈ FOLLOW(B)
3. If A → αB and a ∈ FOLLOW(A), then a ∈ FOLLOW(B)
3a. If A → αBβ and β ⇒* ε and a ∈ FOLLOW(A), then a ∈ FOLLOW(B)
SETS:
FOLLOW(E) = {), $}
FOLLOW(E') = {), $}
FOLLOW(T) = {+, ), $}
FOLLOW(T') = {+, ), $}
FOLLOW(F) = {+, *, ), $}
(Aho, Sethi, Ullman, p. 189)
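The FOLLOW rules can be iterated to a fixed point in the same way. This sketch takes the FIRST sets from the slide as given; the names GRAMMAR, FIRST, first_of_string, and follow_sets are illustrative.

```python
EPS, END = "ε", "$"
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
# FIRST sets of the nonterminals, copied from the slide.
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}


def first_of_string(symbols):
    """FIRST of a string of grammar symbols (the beta in A -> alpha B beta)."""
    out = set()
    for sym in symbols:
        f = FIRST.get(sym, {sym})  # a terminal's FIRST set is itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)  # the whole string can derive the empty string
    return out


def follow_sets(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                       # Rule 1
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue                 # only nonterminals have FOLLOW
                    before = len(follow[sym])
                    rest = first_of_string(body[i + 1:])
                    follow[sym] |= rest - {EPS}  # Rule 2
                    if EPS in rest:
                        follow[sym] |= follow[head]  # Rules 3 and 3a
                    changed |= len(follow[sym]) != before
    return follow


for nt, s in follow_sets(GRAMMAR, "E").items():
    print(f"FOLLOW({nt}) = {sorted(s)}")
```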

Rules to Build Parsing Table
GRAMMAR:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
FIRST SETS:
FIRST(E') = {+, ε}
FIRST(T') = {*, ε}
FIRST(F) = {(, id}
FIRST(T) = {(, id}
FIRST(E) = {(, id}
FOLLOW SETS:
FOLLOW(E) = {), $}
FOLLOW(E') = {), $}
FOLLOW(T) = {+, ), $}
FOLLOW(T') = {+, ), $}
FOLLOW(F) = {+, *, ), $}
Rules:
1. If A → α: if a ∈ FIRST(α), add A → α to M[A, a]
2. If A → α: if ε ∈ FIRST(α), add A → α to M[A, b] for each terminal b ∈ FOLLOW(A)
3. If A → α: if ε ∈ FIRST(α) and $ ∈ FOLLOW(A), add A → α to M[A, $]
PARSING TABLE: [table figure not preserved]
(Aho, Sethi, Ullman, p. 190)
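The table-building rules translate almost line for line into code. The sketch below takes the FIRST and FOLLOW sets listed on the slide as input; build_table and first_of_string are illustrative names, and the conflict check is an addition of this sketch rather than something stated on the slide.

```python
EPS = "ε"
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}


def first_of_string(symbols):
    out = set()
    for sym in symbols:
        f = FIRST.get(sym, {sym})
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out


def build_table(grammar):
    """M[A, a] as a dict keyed by (nonterminal, terminal)."""
    table = {}
    for head, bodies in grammar.items():
        for body in bodies:
            f = first_of_string(body)
            targets = f - {EPS}                  # Rule 1
            if EPS in f:
                targets |= FOLLOW[head]          # Rules 2 and 3 ($ is in FOLLOW)
            for a in targets:
                if (head, a) in table:
                    raise ValueError(f"conflict at M[{head}, {a}]: not LL(1)")
                table[(head, a)] = body
    return table


M = build_table(GRAMMAR)
for (head, a), body in sorted(M.items()):
    print(f"M[{head}, {a}] = {head} -> {' '.join(body) or 'ε'}")
```

Because $ is stored in the FOLLOW sets like any other terminal, rules 2 and 3 collapse into a single step here.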

Bottom-Up and Top-Down Parsers
Top-down parsers: start constructing the parse tree at the top (root) of the tree and move down towards the leaves. Easy to implement by hand, but work with restricted grammars.
Example: predictive parsers
Bottom-up parsers: build the nodes on the bottom of the parse tree first. Suitable for automatic parser generation; handle a larger class of grammars.
Examples: shift-reduce parsers (or LR(k) parsers)
(Aho, Sethi, Ullman, p. 195)

Bottom-Up Parser
A bottom-up parser, or a shift-reduce parser, begins at the leaves and works up to the top of the tree.
The reduction steps trace a rightmost derivation in reverse.
Consider the grammar:
S → aABe
A → Abc | b
B → d
We want to parse the input string abbcde.
(Aho, Sethi, Ullman, p. 195)

Bottom-Up Parser Example
Productions:
S → aABe
A → Abc
A → b
B → d
The bottom-up parsing program rewrites the input step by step:
INPUT: a b b c d e $ ; reduce the first b using A → b
INPUT: a A b c d e $ ; the second b is not reduced; instead A b c is reduced using A → Abc
INPUT: a A d e $ ; reduce d using B → d
INPUT: a A B e $ ; reduce a A B e using S → aABe
INPUT: S $ ; the whole input has been reduced to the start symbol
We are not reducing the second b in this example. A parser would reduce, get stuck and then backtrack!
OUTPUT: the parse tree with root S and children a, A, B, e, where A expands to A b c, the inner A expands to b, and B expands to d.
This parser is known as an LR parser because it scans the input from Left to right, and it constructs a Rightmost derivation in reverse order.
(Aho, Sethi, Ullman, p. 195)

Bottom-Up Parser Example Scanning the productions to match handles in the input string, combined with backtracking, makes the method used in the previous example very inefficient. Can we do better?
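
To make the cost concrete, here is a minimal Python sketch (not from the slides) of the naive method: it scans every production at every position of the current sentential form, applies a reduction, and backtracks when it reaches a dead end. The names `GRAMMAR` and `reduce_to_start` are hypothetical, and the sketch assumes every grammar symbol is a single character. It searches over all possible reductions rather than tracking handles precisely, so it should be read as an illustration of the inefficiency, not as a real shift-reduce parser.

```python
# Grammar from the example, as (left-hand side, right-hand side) pairs.
# ASSUMPTION: every grammar symbol is a single character.
GRAMMAR = [("S", "aABe"), ("A", "Abc"), ("A", "b"), ("B", "d")]


def reduce_to_start(form, grammar=GRAMMAR, start="S", seen=None):
    """Return the list of sentential forms from `form` down to `start`,
    or None if no sequence of reductions reaches the start symbol."""
    if seen is None:
        seen = set()
    if form == start:
        return [form]
    if form in seen:  # this form already failed (or is being explored)
        return None
    seen.add(form)
    for lhs, rhs in grammar:
        pos = form.find(rhs)
        while pos != -1:
            # Replace one occurrence of the right-hand side by the left-hand side.
            reduced = form[:pos] + lhs + form[pos + len(rhs):]
            rest = reduce_to_start(reduced, grammar, start, seen)
            if rest is not None:
                return [form] + rest
            # Dead end: backtrack and try the next occurrence.
            pos = form.find(rhs, pos + 1)
    return None


print(reduce_to_start("abbcde"))
```

If `A → b` is listed before `A → Abc`, the search first reduces the second `b` as well, gets stuck at `aAAcde`, and has to backtrack, which is the situation the example warns about.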

LR Parser Example [Diagram: the LR Parsing Program reads the Input, maintains a Stack, consults the action and goto tables, and produces the Output.] (Aho, Sethi, Ullman, pp. 217)

LR Parser Example The following grammar can be parsed with the action and goto table given in Aho, Sethi, Ullman, pp. 219: (1) E → E + T (2) E → T (3) T → T * F (4) T → F (5) F → ( E ) (6) F → id

LR Parser Example GRAMMAR: (1) E → E + T (2) E → T (3) T → T * F (4) T → F (5) F → ( E ) (6) F → id. INPUT: id * id + id $. In each step below, STACK is listed top first and OUTPUT is the parse tree built so far, written as node(children). STACK: 0 OUTPUT: (empty) (Aho, Sethi, Ullman, pp. 220)

LR Parser Example STACK: 5 id 0 OUTPUT: F(id) (Aho, Sethi, Ullman, pp. 220)

LR Parser Example STACK: 0 OUTPUT: F(id) (Aho, Sethi, Ullman, pp. 220)

LR Parser Example STACK: 3 F 0 OUTPUT: T(F(id)) (Aho, Sethi, Ullman, pp. 220)

LR Parser Example STACK: 0 OUTPUT: T(F(id)) (Aho, Sethi, Ullman, pp. 220)

LR Parser Example STACK: 2 T 0 OUTPUT: T(F(id)) (Aho, Sethi, Ullman, pp. 220)

LR Parser Example STACK: 7 * 2 T 0 OUTPUT: T(F(id)) (Aho, Sethi, Ullman, pp. 220)

LR Parser Example STACK: 5 id 7 * 2 T 0 OUTPUT: T(F(id)), F(id) (Aho, Sethi, Ullman, pp. 220)

LR Parser Example STACK: 7 * 2 T 0 OUTPUT: T(F(id)), F(id) (Aho, Sethi, Ullman, pp. 220)

LR Parser Example STACK: 10 F 7 * 2 T 0 OUTPUT: T(T(F(id)) * F(id)) (Aho, Sethi, Ullman, pp. 220)

LR Parser Example STACK: 0 OUTPUT: T(T(F(id)) * F(id)) (Aho, Sethi, Ullman, pp. 220)

LR Parser Example STACK: 2 T 0 OUTPUT: E(T(T(F(id)) * F(id))) (Aho, Sethi, Ullman, pp. 220)

LR Parser Example STACK: 0 OUTPUT: E(T(T(F(id)) * F(id))) (Aho, Sethi, Ullman, pp. 220)

LR Parser Example STACK: 1 E 0 OUTPUT: E(T(T(F(id)) * F(id))) (Aho, Sethi, Ullman, pp. 220)

LR Parser Example STACK: 6 + 1 E 0 OUTPUT: E(T(T(F(id)) * F(id))) (Aho, Sethi, Ullman, pp. 220)

LR Parser Example STACK: 5 id 6 + 1 E 0 OUTPUT: E(T(T(F(id)) * F(id))), F(id) (Aho, Sethi, Ullman, pp. 220)

LR Parser Example STACK: 6 + 1 E 0 OUTPUT: E(T(T(F(id)) * F(id))), F(id) (Aho, Sethi, Ullman, pp. 220)

LR Parser Example STACK: 3 F 6 + 1 E 0 OUTPUT: E(T(T(F(id)) * F(id))), T(F(id)) (Aho, Sethi, Ullman, pp. 220)

LR Parser Example STACK: 9 T 6 + 1 E 0 OUTPUT: E(E(T(T(F(id)) * F(id))) + T(F(id))) (Aho, Sethi, Ullman, pp. 220)

LR Parser Example STACK: 0 OUTPUT: E(E(T(T(F(id)) * F(id))) + T(F(id))) (Aho, Sethi, Ullman, pp. 220)

LR Parser Example STACK: 1 E 0 OUTPUT: E(E(T(T(F(id)) * F(id))) + T(F(id))) (Aho, Sethi, Ullman, pp. 220)
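
The walkthrough above can be reproduced with a small table-driven sketch in Python. The action and goto table did not survive in this transcript, so the `ACTION` and `GOTO` dictionaries below are an assumption: they are the standard SLR table for this expression grammar, chosen because its state numbers (5 after id, 3 after F, 2 after T, 7 after *, 10, 1, 6, 9) agree with the stacks shown in the example. The names `PRODUCTIONS`, `ACTION`, `GOTO`, and `lr_parse` are hypothetical. Unlike the slides, the stack here holds states only, since the grammar symbols are implied by the states.

```python
# Production number -> (left-hand side, length of right-hand side).
PRODUCTIONS = {
    1: ("E", 3),  # E -> E + T
    2: ("E", 1),  # E -> T
    3: ("T", 3),  # T -> T * F
    4: ("T", 1),  # T -> F
    5: ("F", 3),  # F -> ( E )
    6: ("F", 1),  # F -> id
}

# ASSUMPTION: standard SLR table for this grammar (sN = shift and go to
# state N, rN = reduce by production N, acc = accept).
ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    4: {"E": 8, "T": 2, "F": 3},
    6: {"T": 9, "F": 3},
    7: {"F": 10},
}


def lr_parse(tokens):
    """Return the production numbers used in reductions, in order."""
    stack = [0]                       # stack of states, start state at the bottom
    remaining = list(tokens) + ["$"]  # append the end-of-input marker
    pos = 0
    reductions = []
    while True:
        action = ACTION[stack[-1]].get(remaining[pos])
        if action is None:
            raise SyntaxError(f"unexpected {remaining[pos]!r} in state {stack[-1]}")
        if action == "acc":
            return reductions
        number = int(action[1:])
        if action[0] == "s":
            stack.append(number)      # shift: push the new state, consume the token
            pos += 1
        else:
            lhs, rhs_len = PRODUCTIONS[number]
            del stack[-rhs_len:]      # reduce: pop one state per right-hand-side symbol
            stack.append(GOTO[stack[-1]][lhs])
            reductions.append(number)


print(lr_parse(["id", "*", "id", "+", "id"]))
```

Read backwards, the returned production numbers give a rightmost derivation of the input, which matches the order in which the tree nodes appear in the OUTPUT of the example.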

Constructing Parsing Tables All LR parsers use the same parsing program that we demonstrated in the previous slides. What differentiates the LR parsers are the action and goto tables: Simple LR (SLR): succeeds for the fewest grammars, but is the easiest to implement (see Aho, Sethi, Ullman, pp. 221-230). Canonical LR: succeeds for the most grammars, but is the hardest to implement. It splits states when necessary to prevent reductions that would get the parser stuck (see Aho, Sethi, Ullman, pp. 230-236). Lookahead LR (LALR): succeeds for most common syntactic constructions used in programming languages, but produces LR tables much smaller than canonical LR (see Aho, Sethi, Ullman, pp. 236-247). (Aho, Sethi, Ullman, pp. 221)

Using Lex [Diagram] Lex source program lex.l → Lex compiler → lex.yy.c; lex.yy.c → C compiler → a.out; input stream → a.out → sequence of tokens. (Aho-Sethi-Ullman, pp. 258)

Parsing Action Conflicts If the grammar specified is ambiguous, yacc will report parsing action conflicts. These conflicts can be reduce/reduce conflicts or shift/reduce conflicts. Yacc has rules to resolve such conflicts automatically (see Aho, Sethi, Ullman, pp. 262-264), but the resulting parser might not have the behavior intended by the grammar writer. Whenever you see a conflict report, rerun yacc with the -v flag, examine the y.output file, and rewrite your grammar to eliminate the conflicts. (Aho-Sethi-Ullman, pp. 262)

Three-Address Statements A popular form of intermediate code used in optimizing compilers is three-address statements (or variations, such as quadruples). Source statement: x = a + b * c + d. Three-address statements with temporaries t1 and t2: t1 = b * c; t2 = a + t1; x = t2 + d. (Aho-Sethi-Ullman, pp. 466)
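
The translation above can be sketched in Python as a post-order walk over an expression tree. This is an illustration under stated assumptions, not the method from the course: it borrows Python's own `ast` module as a stand-in parser (so the source statement must also be valid Python), handles only the four basic arithmetic operators, and the names `OPS` and `three_address` are hypothetical.

```python
import ast
import itertools

# Operators supported by this sketch, mapped to their printed form.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def three_address(source):
    """Translate one assignment, e.g. 'x = a + b * c + d', into a list of
    three-address statements, one operator per statement."""
    assign = ast.parse(source).body[0]
    if not isinstance(assign, ast.Assign) or not isinstance(assign.targets[0], ast.Name):
        raise ValueError("expected a simple assignment to a name")
    target = assign.targets[0].id
    code = []
    counter = itertools.count(1)  # numbering for temporaries t1, t2, ...

    def is_binop(node):
        return isinstance(node, ast.BinOp) and type(node.op) in OPS

    def operand(node):
        """Emit code for a subexpression and return the name holding its value."""
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if is_binop(node):
            left = operand(node.left)    # operands first (post-order)
            right = operand(node.right)
            temp = f"t{next(counter)}"
            code.append(f"{temp} = {left} {OPS[type(node.op)]} {right}")
            return temp
        raise ValueError("unsupported expression")

    value = assign.value
    if is_binop(value):
        # The outermost operator writes directly into the target variable.
        left = operand(value.left)
        right = operand(value.right)
        code.append(f"{target} = {left} {OPS[type(value.op)]} {right}")
    else:
        code.append(f"{target} = {operand(value)}")
    return code


for statement in three_address("x = a + b * c + d"):
    print(statement)
```

Letting the outermost operator assign straight to the target is a small design choice made so that the result uses only the two temporaries shown above; a simpler generator would emit a third temporary followed by a copy.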

Intermediate Code Generation Reading List: Aho-Sethi-Ullman: Sections 8.1-8.3 and Section 8.7

Front End of a Compiler [Diagram] Front End: Lexical Analyzer (Scanner) + Syntax Analyzer (Parser) + Semantic Analyzer → Abstract Syntax Tree with attributes → Intermediate-code Generator → Non-optimized Intermediate Code. The diagram also shows Error Message as an output.

Component-Based Approach to Building Compilers [Diagram] Source program in Language-1 → Language-1 Front End; Source program in Language-2 → Language-2 Front End; both front ends → Non-optimized Intermediate Code → Intermediate-code Optimizer → Optimized Intermediate Code; Optimized Intermediate Code → Target-1 Code Generator → Target-1 machine code; Optimized Intermediate Code → Target-2 Code Generator → Target-2 machine code.

Advantages of Using an Intermediate Language 1. Retargeting: build a compiler for a new machine by attaching a new code generator to an existing front end. 2. Optimization: reuse intermediate code optimizers in compilers for different languages and different machines. Note: the terms “intermediate code”, “intermediate language”, and “intermediate representation” are all used interchangeably.

The Phases of a Compiler (trees are written as node(children))
Source: position := initial + rate * 60
lexical analyzer: id1 := id2 + id3 * 60
syntax analyzer: :=(id1, +(id2, *(id3, 60)))
semantic analyzer: :=(id1, +(id2, *(id3, inttoreal(60))))
intermediate code generator:
    temp1 := inttoreal(60)
    temp2 := id3 * temp1
    temp3 := id2 + temp2
    id1 := temp3
code optimizer:
    temp1 := id3 * 60.0
    id1 := id2 + temp1
code generator:
    MOVF id3, R2
    MULF #60.0, R2
    MOVF id2, R1
    ADDF R2, R1
    MOVF R1, id1
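
The code optimizer step in this figure can be approximated by three small passes over quadruples. The Python sketch below is only an illustration and makes several assumptions that are not in the slides: statements are `(dest, op, arg1, arg2)` tuples, temporaries are named `temp...`, each temporary is used only once, and the names `CODE` and `optimize` are hypothetical. It is not a general optimizer.

```python
# The intermediate code from the figure, as (dest, op, arg1, arg2) quadruples.
CODE = [
    ("temp1", "inttoreal", "60", None),
    ("temp2", "*", "id3", "temp1"),
    ("temp3", "+", "id2", "temp2"),
    ("id1", ":=", "temp3", None),
]


def optimize(code):
    # Pass 1: convert integer constants to reals at compile time and
    # substitute the result into later statements.
    constants = {}
    folded = []
    for dest, op, a, b in code:
        a = constants.get(a, a)
        b = constants.get(b, b)
        if op == "inttoreal" and a.isdigit():
            constants[dest] = f"{a}.0"
            continue  # the conversion statement itself disappears
        folded.append((dest, op, a, b))

    # Pass 2: remove a copy "x := tempN" that directly follows the definition
    # of tempN by writing the result into x instead.
    # ASSUMPTION: tempN is not used anywhere else.
    merged = []
    for dest, op, a, b in folded:
        if op == ":=" and merged and merged[-1][0] == a and a.startswith("temp"):
            _, prev_op, prev_a, prev_b = merged.pop()
            merged.append((dest, prev_op, prev_a, prev_b))
        else:
            merged.append((dest, op, a, b))

    # Pass 3: renumber the surviving temporaries from temp1 upwards.
    names = {}

    def rename(name):
        if name is not None and name.startswith("temp"):
            return names.setdefault(name, f"temp{len(names) + 1}")
        return name

    return [(rename(d), op, rename(a), rename(b)) for d, op, a, b in merged]


for dest, op, a, b in optimize(CODE):
    print(f"{dest} := {a} {op} {b}")
```

On the four statements from the figure this yields the two statements shown for the code optimizer phase.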