Chapter 2. Chang Chi-Chung, 2008.03, rev. 1

  • Slides: 45

A Simple Syntax-Directed Translator
• This chapter contains introductory material for Chapters 3 to 8.
• Goal: create a syntax-directed translator that maps infix arithmetic expressions into postfix expressions.
• Building a simple compiler involves:
  - Defining the syntax of a programming language.
  - Developing a source-code parser; for our compiler we will use predictive parsing.
  - Implementing syntax-directed translation to generate intermediate code.

A Code Fragment To Be Translated
• The syntax-directed translator is extended to map code fragments into three-address code. See appendix A.

Source fragment:

    {
        int i; int j; float[100] a; float v; float x;
        while ( true ) {
            do i = i + 1; while ( a[i] < v );
            do j = j - 1; while ( a[j] > v );
            if ( i >= j ) break;
            x = a[i]; a[i] = a[j]; a[j] = x;
        }
    }

Three-address code:

     1: i = i + 1
     2: t1 = a [ i ]
     3: if t1 < v goto 1
     4: j = j - 1
     5: t2 = a [ j ]
     6: if t2 > v goto 4
     7: ifFalse i >= j goto 9
     8: goto 14
     9: x = a [ i ]
    10: t3 = a [ j ]
    11: a [ i ] = t3
    12: a [ j ] = x
    13: goto 1
    14:

A Model of a Compiler Front End
[Figure: source program (character stream) → Lexical analyzer → token stream → Parser → syntax tree → Intermediate code generator → three-address code. A symbol table is shared across the front end.]

Two Forms of Intermediate Code
• Abstract syntax trees
• Three-address instructions

[Figure: abstract syntax tree for a do-while statement, with a do-while root, a body subtree (assign, +, i, 1) and a comparison subtree over a[i] and v.]

The same statement as three-address instructions:

    1: i = i + 1
    2: t1 = a [ i ]
    3: if t1 < v goto 1

Syntax Definition
• Syntax is defined using a context-free grammar (CFG), written in BNF (Backus-Naur Form).
• A context-free grammar has four components:
  - A set of tokens (terminal symbols)
  - A set of nonterminals
  - A set of productions
  - A designated start symbol

Example of CFG
• G = <T, N, P, S>
  - T = { +, -, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }
  - N = { list, digit }
  - P =
      list → list + digit
      list → list - digit
      list → digit
      digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  - S = list

Derivations
• The language of a CFG is the set of all strings (sequences of tokens) that it generates by derivation:
  - Begin with the start symbol.
  - Repeatedly replace a nonterminal in the current sentential form with the right-hand side of one of the productions for that nonterminal.

Example of the Derivations

    list ⇒ list + digit
         ⇒ list - digit + digit
         ⇒ digit - digit + digit
         ⇒ 9 - digit + digit
         ⇒ 9 - 5 + digit
         ⇒ 9 - 5 + 2

• Productions:
    list → list + digit
    list → list - digit
    list → digit
    digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• Leftmost derivation: replaces the leftmost nonterminal in each step (the derivation above is leftmost).
• Rightmost derivation: replaces the rightmost nonterminal in each step.
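The leftmost derivation for the list grammar can be reproduced mechanically. The following is a minimal sketch, not part of the original slides: the class name LeftmostDerivation and the method derive are hypothetical, and the input is assumed to be a string of single digits separated by + or - with no spaces.

```java
import java.util.ArrayList;
import java.util.List;

final class LeftmostDerivation {
    /** Returns the sentential forms of the leftmost derivation of input, e.g. "9-5+2". */
    static List<String> derive(String input) {
        // Expected shape: digit (op digit)*, where op is + or -.
        if (!input.matches("\\d([+-]\\d)*")) {
            throw new IllegalArgumentException("not in the language: " + input);
        }
        List<String> form = new ArrayList<>();
        form.add("list");
        List<String> steps = new ArrayList<>();
        steps.add(String.join(" ", form));

        // list -> list op digit, once per operator, outermost (rightmost) operator first.
        for (int i = input.length() - 2; i >= 1; i -= 2) {
            form.add(1, String.valueOf(input.charAt(i)));
            form.add(2, "digit");
            steps.add(String.join(" ", form));
        }
        // list -> digit
        form.set(0, "digit");
        steps.add(String.join(" ", form));
        // digit -> 0 | ... | 9, always replacing the leftmost remaining nonterminal.
        for (int i = 0; i < form.size(); i += 2) {
            form.set(i, String.valueOf(input.charAt(i)));
            steps.add(String.join(" ", form));
        }
        return steps;
    }

    public static void main(String[] args) {
        for (String step : derive("9-5+2")) {
            System.out.println(step);
        }
    }
}
```

Because the grammar is left-recursive, the operator applied first in the derivation is the last one in the input, which is why the first loop walks the operators from right to left.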

Parse Trees
• Given a CFG, a parse tree according to the grammar is a tree with the following properties:
  - The root of the tree is labeled by the start symbol.
  - Each leaf of the tree is labeled by a terminal (= token) or by ε.
  - Each interior node is labeled by a nonterminal.
  - If A → X1 X2 … Xn is a production, then node A has immediate children X1, X2, …, Xn, where each Xi is a terminal, a nonterminal, or ε (ε denotes the empty string).
• Example: for the production A → X Y Z

        A
      / | \
     X  Y  Z

Example of the Parse Tree
• Parse tree of the string 9-5+2 using grammar G:

                 list
               /  |   \
           list   +   digit
          / | \         |
      list  -  digit    2
       |         |
     digit       5
       |
       9

• The sequence of leaves, read from left to right, is called the yield of the parse tree.

Ambiguity
• Consider the following context-free grammar:
    G = <{+, -, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, {string}, P, string>
    P = string → string + string | string - string | 0 | 1 | … | 9
• This grammar is ambiguous, because more than one parse tree represents the string 9-5+2.

Ambiguity (Cont'd)
[Figure: two parse trees for 9 - 5 + 2. One groups the string as (9 - 5) + 2, the other as 9 - (5 + 2).]
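Why the ambiguity matters can be checked with a few lines of code. This is an illustrative sketch only: the class AmbiguityDemo and its Node type are hypothetical names. It builds both parse trees for 9-5+2 by hand and shows that they have the same leaves but different values.

```java
final class AmbiguityDemo {
    /** Parse-tree node for string -> string + string | string - string | digit. */
    static final class Node {
        final char op;          // '+' or '-' for interior nodes, 0 for a digit leaf
        final int digit;        // only meaningful for leaves
        final Node left, right;

        Node(int digit) {
            this.op = 0; this.digit = digit; this.left = null; this.right = null;
        }

        Node(Node left, char op, Node right) {
            this.op = op; this.digit = 0; this.left = left; this.right = right;
        }

        /** The leaves read from left to right (the yield of the tree). */
        String leaves() {
            return op == 0 ? String.valueOf(digit) : left.leaves() + op + right.leaves();
        }

        /** The arithmetic value implied by the shape of the tree. */
        int value() {
            if (op == 0) return digit;
            return op == '+' ? left.value() + right.value() : left.value() - right.value();
        }
    }

    /** (9 - 5) + 2 */
    static Node leftGrouped() {
        return new Node(new Node(new Node(9), '-', new Node(5)), '+', new Node(2));
    }

    /** 9 - (5 + 2) */
    static Node rightGrouped() {
        return new Node(new Node(9), '-', new Node(new Node(5), '+', new Node(2)));
    }

    public static void main(String[] args) {
        System.out.println(leftGrouped().leaves() + " = " + leftGrouped().value());
        System.out.println(rightGrouped().leaves() + " = " + rightGrouped().value());
    }
}
```

Both trees yield the string 9-5+2, but the first evaluates to 6 and the second to 2, so the grammar alone does not fix the meaning of the expression.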

Associativity of Operators
• Left-associative
  - If an operand has an operator on both sides of it, it belongs to the operator to its left.
    Example: the string a+b+c has the same meaning as (a+b)+c.
  - Left-associative operators have left-recursive productions:
      left → left + term | term
• Right-associative
  - If an operand has an operator on both sides of it, it belongs to the operator to its right.
    Example: the string a=b=c has the same meaning as a=(b=c).
  - Right-associative operators have right-recursive productions:
      right → term = right | term

Associativity of Operators (cont'd)
[Figure: parse trees for a + b + c under a left-associative grammar (the tree grows down to the left) and for a = b = c under a right-associative grammar (the tree grows down to the right).]

Precedence of Operators
• The string 9+5*2 has the same meaning as 9+(5*2).
• * has higher precedence than +.
• Construct a grammar for arithmetic expressions that encodes operator precedence:
  - left-associative: + - (expr)
  - left-associative: * / (term)

    Step 1: factor → digit | ( expr )
    Step 2: term → term * factor | term / factor | factor
    Step 3: expr → expr + term | expr - term | term
    Step 4: expr → expr + term | expr - term | term
            term → term * factor | term / factor | factor
            factor → digit | ( expr )
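The effect of the layered expr / term / factor grammar can be seen in a small evaluator. This is a sketch under assumptions, not the deck's own code: the class PrecedenceParser is a hypothetical name, the left-recursive productions are written as loops (which keeps the operators left-associative), operands are single digits, and division is integer division.

```java
final class PrecedenceParser {
    private final String input;
    private int pos = 0;

    private PrecedenceParser(String input) {
        this.input = input;
    }

    /** Evaluates an expression such as "9+5*2" according to the grammar. */
    static int evaluate(String input) {
        PrecedenceParser p = new PrecedenceParser(input);
        int value = p.expr();
        if (p.pos != input.length()) {
            throw new IllegalArgumentException("unexpected input at position " + p.pos);
        }
        return value;
    }

    private char peek() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    // expr -> expr + term | expr - term | term
    private int expr() {
        int value = term();
        while (peek() == '+' || peek() == '-') {
            char op = input.charAt(pos++);
            int rhs = term();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    // term -> term * factor | term / factor | factor
    private int term() {
        int value = factor();
        while (peek() == '*' || peek() == '/') {
            char op = input.charAt(pos++);
            int rhs = factor();
            value = (op == '*') ? value * rhs : value / rhs;
        }
        return value;
    }

    // factor -> digit | ( expr )
    private int factor() {
        char c = peek();
        if (c >= '0' && c <= '9') {
            pos++;
            return c - '0';
        }
        if (c == '(') {
            pos++;
            int value = expr();
            if (peek() != ')') {
                throw new IllegalArgumentException("')' expected at position " + pos);
            }
            pos++;
            return value;
        }
        throw new IllegalArgumentException("digit or '(' expected at position " + pos);
    }
}
```

Because term sits below expr in the grammar, every multiplication is completed inside term() before expr() sees the + operator, which is exactly how the grammar gives * the higher precedence.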

An Example: Syntax of Statements
• The grammar is a subset of Java statements.
• Semicolons appear only at the end of bodies that do not end in stmt. This approach prevents the build-up of semicolons after statements such as if- and while-, which end with nested substatements.

    stmt  → id = expression ;
          | if ( expression ) stmt else stmt
          | while ( expression ) stmt
          | do stmt while ( expression ) ;
          | { stmts }
    stmts → stmts stmt
          | ε

Syntax-Directed Translation
• Syntax-directed translation is done by attaching rules or program fragments to productions in a grammar.
• In this chapter: translate infix expressions into postfix notation.
  - Infix: 9 - 5 + 2
  - Postfix: 9 5 - 2 +
• An example: expr → expr1 + term
  - The pseudocode of the translation:
      translate expr1;
      translate term;
      handle +;

Syntax-Directed Translation (Cont'd)
• Two concepts (approaches) related to syntax-directed translation:
  - Synthesized attributes
      - Syntax-directed definition
      - Build up a translation by attaching strings (semantic rules) as attributes to the nodes in the parse tree.
  - Translation schemes
      - Syntax-directed translation
      - Build up a translation with program fragments, called semantic actions, that are embedded within production bodies.

Syntax-directed definition
• A syntax-directed definition associates:
  - with each grammar symbol (terminal or nonterminal), a set of attributes;
  - with each production, a set of semantic rules for computing the values of the attributes associated with the symbols appearing in the production.
• An attribute is said to be:
  - synthesized, if its value at a parse-tree node is determined from attribute values at its children and at the node itself;
  - inherited, if its value at a parse-tree node is determined from attribute values at the node itself, its parent, and its siblings in the parse tree.

An Example: Synthesized Attributes
• An annotated parse tree shows the attribute values at each node.
  - Suppose a node N in a parse tree is labeled by grammar symbol X. Then X.a denotes the value of attribute a of X at node N.
• Annotated parse tree for 9 - 5 + 2 (children are indented under their parent):

    expr.t = "95-2+"
        expr.t = "95-"
            expr.t = "9"
                term.t = "9"
                    9
            -
            term.t = "5"
                5
        +
        term.t = "2"
            2

Semantic Rules

    Production               Semantic rule
    expr → expr1 + term      expr.t = expr1.t || term.t || '+'
    expr → expr1 - term      expr.t = expr1.t || term.t || '-'
    expr → term              expr.t = term.t
    term → 0                 term.t = '0'
    term → 1                 term.t = '1'
    …                        …
    term → 9                 term.t = '9'

• || is the operator for string concatenation in the semantic rules.
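The semantic rules for the attribute t translate directly into a recursive function over the parse tree. The sketch below is illustrative only: PostfixAttribute, Expr and example are hypothetical names, and the term child is stored simply as the digit it derives.

```java
final class PostfixAttribute {
    /** Parse-tree node for expr -> expr1 + term | expr1 - term | term, with term -> 0 | ... | 9. */
    static final class Expr {
        final Expr expr1;   // null for the production expr -> term
        final char op;      // '+' or '-'; unused when expr1 is null
        final char term;    // the digit derived by the term child

        Expr(char term) {
            this(null, '\0', term);
        }

        Expr(Expr expr1, char op, char term) {
            this.expr1 = expr1;
            this.op = op;
            this.term = term;
        }

        /** Synthesized attribute t: computed only from the attributes of the children. */
        String t() {
            String termT = String.valueOf(term);   // term.t = '0' ... '9'
            if (expr1 == null) {
                return termT;                      // expr.t = term.t
            }
            return expr1.t() + termT + op;         // expr.t = expr1.t || term.t || op
        }
    }

    /** The parse tree for 9 - 5 + 2. */
    static Expr example() {
        return new Expr(new Expr(new Expr('9'), '-', '5'), '+', '2');
    }

    public static void main(String[] args) {
        System.out.println(example().t());
    }
}
```

Each call to t() uses only the values returned by the node's children, which is what makes the attribute synthesized.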

Depth-First Traversals
• Tree traversals
  - Breadth-first
  - Depth-first
      - Preorder: N L R
      - Inorder: L N R
      - Postorder: L R N
• The depth-first traversal used here is postorder, visiting children from left to right:

    procedure visit(node N) {
        for ( each child C of N, from left to right ) {
            visit(C);
        }
        evaluate semantic rules at node N;
    }
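The three depth-first orders differ only in when the node itself is handled relative to its children. As a small sketch (Traversals, Node and example are hypothetical names, and a binary syntax tree for 9 - 5 + 2 is assumed rather than the full parse tree):

```java
final class Traversals {
    static final class Node {
        final char label;
        final Node left, right;

        Node(char label) {
            this(label, null, null);
        }

        Node(char label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }
    }

    /** Preorder: N L R */
    static String preorder(Node n) {
        return n == null ? "" : n.label + preorder(n.left) + preorder(n.right);
    }

    /** Inorder: L N R */
    static String inorder(Node n) {
        return n == null ? "" : inorder(n.left) + n.label + inorder(n.right);
    }

    /** Postorder: L R N */
    static String postorder(Node n) {
        return n == null ? "" : postorder(n.left) + postorder(n.right) + n.label;
    }

    /** Syntax tree for 9 - 5 + 2: '+' at the root, '-' as its left child. */
    static Node example() {
        return new Node('+', new Node('-', new Node('9'), new Node('5')), new Node('2'));
    }

    public static void main(String[] args) {
        Node tree = example();
        System.out.println("preorder:  " + preorder(tree));
        System.out.println("inorder:   " + inorder(tree));
        System.out.println("postorder: " + postorder(tree));
    }
}
```

On this tree the inorder walk reproduces the infix string and the postorder walk produces the postfix form, which is why a postorder traversal is the natural order for evaluating the semantic rules of the translator.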

Example: Depth-First Traversals
[Figure: the annotated parse tree for 9 - 5 + 2, traversed depth-first.]
• The attributes are evaluated in postorder:
    term.t = 9, expr.t = 9, term.t = 5, expr.t = 95-, term.t = 2, expr.t = 95-2+
• Note: all attributes are of the synthesized type.

Translation Schemes
• A translation scheme is a CFG with semantic actions embedded in the production bodies.
• Example:
    rest → + term { print('+') } rest
  Here { print('+') } is the embedded semantic action.

An Example: Translation Scheme
[Figure: parse tree for 9 - 5 + 2 with the semantic actions attached as extra leaves. A depth-first traversal executes print('9'), print('5'), print('-'), print('2'), print('+') in that order.]

    expr → expr + term   { print('+') }
    expr → expr - term   { print('-') }
    expr → term
    term → 0             { print('0') }
    term → 1             { print('1') }
    …
    term → 9             { print('9') }

Parsing
• Parsing is the process of determining whether a string of terminals (tokens) can be generated by a grammar.
• Time complexity:
  - For any CFG there is a parser that takes at most O(n³) time to parse a string of n terminals.
  - Linear-time algorithms suffice to parse essentially all languages that arise in practice.
• Two kinds of methods:
  - Top-down: constructs a parse tree from the root to the leaves.
  - Bottom-up: constructs a parse tree from the leaves to the root.

Top-Down Parsing
• Recursive-descent parsing is a top-down method of syntax analysis in which a set of recursive procedures is used to process the input.
  - One procedure is associated with each nonterminal of the grammar.
  - If a nonterminal has multiple productions, each production is implemented in a branch of a selection statement based on the input lookahead.
• Predictive parsing
  - A special form of recursive-descent parsing.
  - The lookahead symbol unambiguously determines the flow of control through the procedure body for each nonterminal.

An Example: Top-Down Parsing

    stmt    → expr ;
            | if ( expr ) stmt
            | for ( optexpr ; optexpr ) stmt
            | other
    optexpr → ε
            | expr

[Figure: parse tree with root stmt and children for ( optexpr ; optexpr ) stmt, in which the first optexpr derives ε and the inner stmt derives other.]

Pseudocode for a Predictive Parser

Grammar:

    stmt    → expr ;
            | if ( expr ) stmt
            | for ( optexpr ; optexpr ) stmt
            | other
    optexpr → ε
            | expr

Procedures:

    void stmt() {
        switch ( lookahead ) {
        case expr:
            match(expr); match(';'); break;
        case if:
            match(if); match('('); match(expr); match(')'); stmt();
            break;
        case for:
            match(for); match('(');
            optexpr(); match(';'); optexpr();
            match(')'); stmt(); break;
        case other:
            match(other); break;
        default:
            report("syntax error");
        }
    }

    // Use of the ε-production: when the lookahead is not expr, optexpr matches nothing.
    void optexpr() {
        if ( lookahead == expr ) match(expr);
    }

    void match(terminal t) {
        if ( lookahead == t ) lookahead = nextTerminal;
        else report("syntax error");
    }
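The predictive-parser pseudocode can be turned into runnable Java with little change. The following is a sketch under assumptions rather than the deck's own implementation: the class PredictiveParser, the Token enum and the accepts helper are hypothetical, the input is assumed to be already tokenized, and the for statement has two optexpr parts as in the grammar used on these slides.

```java
import java.util.Arrays;
import java.util.List;

final class PredictiveParser {
    enum Token { EXPR, IF, FOR, OTHER, LPAREN, RPAREN, SEMI, EOF }

    private final List<Token> tokens;
    private int pos = 0;

    PredictiveParser(List<Token> tokens) {
        this.tokens = tokens;
    }

    private Token lookahead() {
        return pos < tokens.size() ? tokens.get(pos) : Token.EOF;
    }

    private void match(Token t) {
        if (lookahead() == t) {
            pos++;
        } else {
            throw new IllegalStateException("syntax error: expected " + t + ", saw " + lookahead());
        }
    }

    // stmt -> expr ; | if ( expr ) stmt | for ( optexpr ; optexpr ) stmt | other
    void stmt() {
        switch (lookahead()) {
            case EXPR:
                match(Token.EXPR); match(Token.SEMI);
                break;
            case IF:
                match(Token.IF); match(Token.LPAREN); match(Token.EXPR); match(Token.RPAREN);
                stmt();
                break;
            case FOR:
                match(Token.FOR); match(Token.LPAREN);
                optexpr(); match(Token.SEMI); optexpr();
                match(Token.RPAREN);
                stmt();
                break;
            case OTHER:
                match(Token.OTHER);
                break;
            default:
                throw new IllegalStateException("syntax error at token " + pos);
        }
    }

    // optexpr -> ε | expr : the ε-production is chosen when the lookahead is not EXPR
    private void optexpr() {
        if (lookahead() == Token.EXPR) {
            match(Token.EXPR);
        }
    }

    /** Returns true if the whole token sequence is a stmt. */
    static boolean accepts(Token... input) {
        PredictiveParser parser = new PredictiveParser(Arrays.asList(input));
        try {
            parser.stmt();
            return parser.lookahead() == Token.EOF;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

For example, accepts(FOR, LPAREN, SEMI, EXPR, RPAREN, OTHER) corresponds to the input for ( ; expr ) other. Each case of the switch is selected by a single lookahead token, which is what makes the parser predictive.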

Example: Predictive Parsing
[Figure: parse tree for the LL(1) parse of the input for ( ; expr ) other. The root stmt expands to for ( optexpr ; optexpr ) stmt, and the calls match(for), match('('), optexpr(), match(';'), optexpr(), match(')'), stmt() consume the input from left to right as the lookahead advances.]

FIRST
• FIRST(α) is the set of terminals that appear as the first symbols of one or more strings generated from α.
• α is a sentential form.
• Example:
  - FIRST(stmt) = { expr, if, for, other }
  - FIRST(expr ;) = { expr }

    stmt → expr ;
         | if ( expr ) stmt
         | for ( optexpr ; optexpr ) stmt
         | other

Examples: FIRST

    type   → simple
           | ^ id
           | array [ simple ] of type
    simple → integer
           | char
           | num dotdot num

    FIRST(simple) = { integer, char, num }
    FIRST(^ id)   = { ^ }
    FIRST(type)   = { integer, char, num, ^, array }
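FIRST sets like these can be computed by iterating until nothing changes. The sketch below is a simplified illustration, not a general algorithm: the names FirstSets, compute and typeGrammar are hypothetical, and it assumes a grammar without ε-productions, so only the first symbol of each production body needs to be examined.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class FirstSets {
    /**
     * Computes FIRST for every nonterminal of a grammar given as
     * nonterminal -> list of production bodies. Assumes no ε-productions.
     */
    static Map<String, Set<String>> compute(Map<String, List<List<String>>> grammar) {
        Map<String, Set<String>> first = new LinkedHashMap<>();
        for (String nonterminal : grammar.keySet()) {
            first.put(nonterminal, new TreeSet<>());
        }
        boolean changed = true;
        while (changed) {                       // repeat until a fixed point is reached
            changed = false;
            for (Map.Entry<String, List<List<String>>> entry : grammar.entrySet()) {
                Set<String> target = first.get(entry.getKey());
                for (List<String> body : entry.getValue()) {
                    String head = body.get(0);
                    if (head.equals(entry.getKey())) {
                        continue;               // A -> A ... adds nothing new to FIRST(A)
                    }
                    if (grammar.containsKey(head)) {
                        changed |= target.addAll(first.get(head));   // nonterminal: copy its FIRST
                    } else {
                        changed |= target.add(head);                 // terminal: it is the first symbol
                    }
                }
            }
        }
        return first;
    }

    /** The type / simple grammar from the slide. */
    static Map<String, List<List<String>>> typeGrammar() {
        Map<String, List<List<String>>> g = new LinkedHashMap<>();
        g.put("type", Arrays.asList(
                Arrays.asList("simple"),
                Arrays.asList("^", "id"),
                Arrays.asList("array", "[", "simple", "]", "of", "type")));
        g.put("simple", Arrays.asList(
                Arrays.asList("integer"),
                Arrays.asList("char"),
                Arrays.asList("num", "dotdot", "num")));
        return g;
    }

    public static void main(String[] args) {
        System.out.println(compute(typeGrammar()));
    }
}
```

Handling ε-productions would additionally require looking past nullable symbols at the start of a body; that part is deliberately left out of this sketch.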

Designing a Predictive Parser
• A predictive parser is a program consisting of a procedure for every nonterminal.
• The procedure for nonterminal A:
  - Decides which A-production to use by examining the lookahead symbol. Issues to handle:
      - Left factoring
      - Left recursion
      - ε-productions
  - Mimics the body of the chosen production.
• Applying a translation scheme:
  - Construct a predictive parser, ignoring the actions.
  - Copy the actions from the translation scheme into the parser.

Left Factor
• Left factoring is needed when two or more productions for nonterminal A start with the same symbols.
• Example:
    stmt → if ( expr ) stmt
         | if ( expr ) stmt else stmt
• Use left factoring to fix it:
    stmt → if ( expr ) stmt rest
    rest → else stmt | ε

Left Recursion
• Left-recursive: a production for nonterminal A starts with a self-reference.
    A → Aα | β
• An example:
    expr → expr + term | term
• Rewrite the left-recursive productions as right-recursive ones using the following rule:
    A → βR
    R → αR | ε

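Example: Left and Right Recursive
[Figure: two parse trees that generate the string β α α … α. With the left-recursive productions A → Aα | β the tree grows down to the left; with the right-recursive productions A → βR, R → αR | ε it grows down to the right and ends in ε.]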

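Abstract and Concrete Syntax
[Figure: for the input 9 - 5 + 2, the abstract syntax tree (operators + and - as interior nodes, operands 9, 5 and 2 as leaves) shown next to the concrete syntax (parse) tree, whose interior nodes expr and term act as helper nodes.]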

Conclusion: Parsing and Translation Scheme
• Given a CFG G as below:

    expr → expr + term   { print('+') }
    expr → expr - term   { print('-') }
    expr → term
    term → 0             { print('0') }
    term → 1             { print('1') }
    …
    term → 9             { print('9') }

• The semantic actions translate expressions into postfix notation.

Conclusion: Parsing and Translation Scheme
• Step 1: eliminate left recursion.
  - Technique: rewrite
      A → Aα | Aβ | γ
    into
      A → γR
      R → αR | βR | ε
  - Use this rule to transform G.
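The rewriting rule is mechanical enough to express as a function over production bodies. This is a minimal sketch with hypothetical names (LeftRecursion, eliminate): it handles only immediate left recursion for a single nonterminal, assumes every α is non-empty, and takes the name of the new nonterminal R as a parameter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LeftRecursion {
    static final String EPSILON = "\u03b5";   // the symbol ε

    /**
     * Rewrites A -> A α | ... | γ | ... into A -> γ R | ... and R -> α R | ... | ε.
     * Only immediate left recursion is handled.
     */
    static Map<String, List<List<String>>> eliminate(String a, List<List<String>> bodies, String r) {
        List<List<String>> aBodies = new ArrayList<>();
        List<List<String>> rBodies = new ArrayList<>();
        for (List<String> body : bodies) {
            if (!body.isEmpty() && body.get(0).equals(a)) {
                // A -> A α becomes R -> α R
                List<String> tail = new ArrayList<>(body.subList(1, body.size()));
                tail.add(r);
                rBodies.add(tail);
            } else {
                // A -> γ becomes A -> γ R
                List<String> head = new ArrayList<>(body);
                head.add(r);
                aBodies.add(head);
            }
        }
        rBodies.add(Arrays.asList(EPSILON));   // R -> ε ends the repetition

        Map<String, List<List<String>>> result = new LinkedHashMap<>();
        result.put(a, aBodies);
        result.put(r, rBodies);
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> expr = Arrays.asList(
                Arrays.asList("expr", "+", "term"),
                Arrays.asList("expr", "-", "term"),
                Arrays.asList("term"));
        System.out.println(eliminate("expr", expr, "rest"));
    }
}
```

Applied to expr → expr + term | expr - term | term with R named rest, the function returns expr → term rest and rest → + term rest | - term rest | ε. A semantic action such as { print('+') } can be carried along by treating it as one more symbol at the end of α.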

Conclusion: Parsing and Translation Scheme
• After left-recursion elimination:

    expr → term rest
    rest → + term { print('+') } rest
         | - term { print('-') } rest
         | ε
    term → 0 { print('0') }
    term → 1 { print('1') }
    …
    term → 9 { print('9') }

An Example: Left-Recursion-elimination
[Figure: parse tree for 9 - 5 + 2 under the transformed scheme. expr has children term and rest; each rest expands to an operator, a term, a print action and a further rest, and the last rest derives ε. The actions run in the order print('9'), print('5'), print('-'), print('2'), print('+').]

    expr → term rest
    rest → + term { print('+') } rest
         | - term { print('-') } rest
         | ε
    term → 0 { print('0') } | 1 { print('1') } | … | 9 { print('9') }

Conclusion: Parsing and Translation Scheme
• Step 2: procedures for the nonterminals.

    void expr() {
        term(); rest();
    }

    void rest() {
        if ( lookahead == '+' ) {
            match('+'); term(); print('+'); rest();
        }
        else if ( lookahead == '-' ) {
            match('-'); term(); print('-'); rest();
        }
        else { }   // do nothing with the input
    }

    void term() {
        if ( lookahead is a digit ) {
            t = lookahead; match(lookahead); print(t);
        }
        else report("syntax error");
    }

Conclusion: Parsing and Translation Scheme
• Step 3: simplify the translator. The recursive calls at the end of rest() are tail calls, so they can be replaced by iteration.

Before (tail-recursive):

    void rest() {
        if ( lookahead == '+' ) {
            match('+'); term(); print('+'); rest();
        }
        else if ( lookahead == '-' ) {
            match('-'); term(); print('-'); rest();
        }
        else { }
    }

After (iterative):

    void rest() {
        while ( true ) {
            if ( lookahead == '+' ) {
                match('+'); term(); print('+'); continue;
            }
            else if ( lookahead == '-' ) {
                match('-'); term(); print('-'); continue;
            }
            break;
        }
    }

Conclusion: Parsing and Translation Scheme
• The complete program:

    import java.io.*;

    class Parser {
        static int lookahead;

        public Parser() throws IOException {
            lookahead = System.in.read();          // read the first input character
        }

        // expr -> term rest, with rest() inlined as a loop
        void expr() throws IOException {
            term();
            while ( true ) {
                if ( lookahead == '+' ) {
                    match('+'); term(); System.out.write('+'); continue;
                }
                else if ( lookahead == '-' ) {
                    match('-'); term(); System.out.write('-'); continue;
                }
                else return;
            }
        }

        // term -> 0 | 1 | ... | 9, printing the digit
        void term() throws IOException {
            if ( Character.isDigit((char) lookahead) ) {
                System.out.write((char) lookahead); match(lookahead);
            }
            else throw new Error("syntax error");
        }

        void match(int t) throws IOException {
            if ( lookahead == t ) lookahead = System.in.read();
            else throw new Error("syntax error");
        }

        // entry point added so that the class can be run directly
        public static void main(String[] args) throws IOException {
            Parser parse = new Parser();
            parse.expr();
            System.out.write('\n');
        }
    }
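For experimenting without standard input, the same translation can be sketched over a String. This variant is an assumption-laden illustration, not the program from the slides: the class StringPostfix and the method translate are hypothetical names, the output is collected in a StringBuilder instead of being written to System.out, and, unlike the slide's parser, trailing input is rejected.

```java
final class StringPostfix {
    private final String input;
    private int pos = 0;
    private final StringBuilder out = new StringBuilder();

    private StringPostfix(String input) {
        this.input = input;
    }

    /** Translates an infix string of digits, + and - into postfix, e.g. "9-5+2". */
    static String translate(String infix) {
        StringPostfix p = new StringPostfix(infix);
        p.expr();
        if (p.pos != infix.length()) {
            throw new IllegalArgumentException("syntax error at position " + p.pos);
        }
        return p.out.toString();
    }

    /** The current input character, or -1 at the end of the input. */
    private int lookahead() {
        return pos < input.length() ? input.charAt(pos) : -1;
    }

    // expr -> term rest, with rest written as a loop
    private void expr() {
        term();
        while (lookahead() == '+' || lookahead() == '-') {
            char op = (char) lookahead();
            pos++;              // match the operator
            term();
            out.append(op);     // the semantic action print(op), after the term
        }
    }

    // term -> 0 | 1 | ... | 9
    private void term() {
        int c = lookahead();
        if (c >= '0' && c <= '9') {
            out.append((char) c);   // the semantic action print(digit)
            pos++;
        } else {
            throw new IllegalArgumentException("syntax error at position " + pos);
        }
    }
}
```

Calling StringPostfix.translate("9-5+2") returns "95-2+", matching the attribute value computed for the annotated parse tree earlier in the chapter.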