Intermediate Code Generation

  • Slides: 28

Introduction

• Intermediate code is the interface between the front end and the back end of a compiler.
• Ideally, the details of the source language are confined to the front end and the details of the target machine to the back end.
• In this chapter we study intermediate representations, static type checking and intermediate code generation.

Parser -> Static Checker -> Intermediate Code Generator -> Back end
(the parser, static checker and intermediate code generator form the front end)

Intermediate Code Generation

• Translating the source program into an "intermediate language" that is:
  - simple,
  - CPU independent,
  - yet close in spirit to machine language.
• Translation to machine code can be done directly, but it is not always possible to generate machine code in a single pass. The compiler therefore typically first generates a simpler code that represents the source program, called the intermediate language.
• Benefits
  1. Machine-independent code optimization can be applied.

Intermediate Code Generation

• Intermediate codes are machine-independent, but they are close to machine instructions.
• The intermediate code generator converts the given program in the source language into an equivalent program in an intermediate language.
• Three types of intermediate code representation:
  - Syntax trees can be used as an intermediate language.
  - Postfix notation can be used as an intermediate language.
  - Three-address code (quadruples) can be used as an intermediate language.
    - We will use quadruples to discuss intermediate code generation.
    - Quadruples are close to machine instructions, but they are not actual machine instructions.
  - Some programming languages have well-defined intermediate languages.
    - Java: Java virtual machine
    - Prolog

Types of Intermediate Code Representations

Graphical representations: syntax trees and DAGs

• Consider the assignment x := -a*b + -a*b.

[Figure: syntax tree and DAG for x := -a*b + -a*b. In both, the root assign has children x and +, and each operand of + is the product of (uminus a) and b. The DAG uses a single shared node for the common subexpression -a*b.]

Postfix Notation

• Consider the assignment x := -a*b + -a*b.
• Steps:
  1. Draw the syntax tree.
  2. Traverse it in postorder.
• Result (uminus denotes unary minus): x a uminus b * a uminus b * + :=
• Basically it is a linearization of the syntax tree.
• This is the most natural way of representation.

Syntax-Directed Definition for Assignment Statements

An assignment has the form x = expression, i.e. id := E.

PRODUCTION        SEMANTIC RULE
S -> id := E      S.nptr = mknode('assign', mkleaf(id, id.entry), E.nptr)
E -> E1 + E2      E.nptr = mknode('+', E1.nptr, E2.nptr)
E -> E1 * E2      E.nptr = mknode('*', E1.nptr, E2.nptr)
E -> - E1         E.nptr = mknode('uminus', E1.nptr)
E -> ( E1 )       E.nptr = E1.nptr
E -> id           E.nptr = mkleaf(id, id.entry)

Three-Address Code

• At most three addresses are used to represent any statement.
• The general form of three-address code is x := y op z, where x, y and z can be names, constants or temporary variables.
• Operators: arithmetic, logical, relational.
• As a result, x := y + z * w is represented as
    t1 := z * w
    t2 := y + t1
    x := t2
• In fact, three-address code is a linearization of the tree.
• Three-address code is useful: it is close to machine language, simple and easy to optimize.

Example of 3-address code

Consider the assignment a := b * -c + b * -c.

Code for the DAG (common subexpression computed once):
    t1 := - c
    t2 := b * t1
    t5 := t2 + t2
    a := t5

Code for the syntax tree:
    t1 := - c
    t2 := b * t1
    t3 := - c
    t4 := b * t3
    t5 := t2 + t4
    a := t5

Types of Three-Address Statements

The form of three-address code is very similar to assembly language.

Language construct, intermediate code form, meaning:

1. Assignment statement
     x := y op z             op is a binary operator
     x := op z               op is a unary operator, e.g. unary minus
2. Copy statement
     x := z                  copy the value of z to x
3. Unconditional jump
     goto L                  L is a label, e.g. goto ADD
4. Conditional jump
     if x relop y goto L     relop is <, >, <= etc.
5. Stack operations
     push / pop

More advanced:

6. Procedure call
     param x1                x1, x2, ..., xn are the parameters of the procedure
     param x2
     ...
     param xn
     call p, n
     return y                return the value of y
7. Array or indexed assignments
     x := y[i]               the value at index i of array y is assigned to x
     x[i] := y               the value of y is assigned to index i of array x
8. Address and pointer assignments
     x := &y                 the value of x is the address of y
     x := *y                 y is a pointer; the value it points to is assigned to x

Syntax-Directed Translation into 3-address code

• First deal with assignments.
• Use attributes:
  - E.place: the name that will hold the value of E. An identifier is assumed to already have its place attribute defined.
  - E.code: holds the three-address statements that evaluate E (this is the "translation" attribute).
• Use the function newtemp, which returns a new temporary variable each time it is called.
• Use the function gen to generate a single three-address statement from the necessary information (variable names and operations).

Syntax-Directed Definition for 3-address code

PRODUCTION        SEMANTIC RULE
S -> id := E      S.code = E.code || gen(id.place ':=' E.place)
E -> E1 + E2      E.place = newtemp;
                  E.code = E1.code || E2.code || gen(E.place ':=' E1.place '+' E2.place)
E -> E1 * E2      E.place = newtemp;
                  E.code = E1.code || E2.code || gen(E.place ':=' E1.place '*' E2.place)
E -> - E1         E.place = newtemp;
                  E.code = E1.code || gen(E.place ':=' 'uminus' E1.place)
E -> ( E1 )       E.place = E1.place; E.code = E1.code
E -> id           E.place = id.place; E.code = ''

Here || denotes concatenation of code sequences.

Example input: a := b * -(c + d)
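The rules can be traced on the example a := b * -(c + d) with a small Python sketch. The tuple encoding of the expression and the bodies of newtemp and gen are assumptions for illustration; each function returns the pair (place, code) that the rules call E.place and E.code.

```python
import itertools

_counter = itertools.count(1)

def newtemp():
    """Return a fresh temporary name: t1, t2, ..."""
    return f"t{next(_counter)}"

def gen(*parts):
    """Build one three-address statement from its pieces."""
    return " ".join(parts)

def translate_expr(node):
    """Return (place, code) for an expression tree."""
    if isinstance(node, str):                      # E -> id
        return node, []
    if node[0] == "uminus":                        # E -> - E1
        place1, code1 = translate_expr(node[1])
        place = newtemp()
        return place, code1 + [gen(place, ":=", "uminus", place1)]
    op, left, right = node                         # E -> E1 + E2 | E1 * E2
    place1, code1 = translate_expr(left)
    place2, code2 = translate_expr(right)
    place = newtemp()
    return place, code1 + code2 + [gen(place, ":=", place1, op, place2)]

def translate_assign(target, expr):                # S -> id := E
    place, code = translate_expr(expr)
    return code + [gen(target, ":=", place)]

# a := b * -(c + d); parentheses only shape the tree (E -> ( E1 ))
code = translate_assign("a", ("*", "b", ("uminus", ("+", "c", "d"))))
print("\n".join(code))
# t1 := c + d
# t2 := uminus t1
# t3 := b * t2
# a := t3
```

In this sketch a temporary is allocated after the operands have been translated, so the numbering follows evaluation order; the rules themselves do not fix a particular numbering.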

What about things that are not assignments?

• E.g. while statements of the form "while E do S" (interpreted as: while the value of E is not 0, do S).
• Extension to the previous syntax-directed definition:

PRODUCTION: S -> while E do S1

SEMANTIC RULE:
    S.begin = newlabel;
    S.after = newlabel;
    S.code = gen(S.begin ':') ||
             E.code ||
             gen('if' E.place '=' '0' 'goto' S.after) ||
             S1.code ||
             gen('goto' S.begin) ||
             gen(S.after ':')
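As a rough sketch, the semantic rule for while can be written in Python as follows. The function translate_while and the label format L1, L2, ... are assumptions; the condition and body are passed in already translated.

```python
import itertools

_labels = itertools.count(1)

def newlabel():
    """Return a fresh label name: L1, L2, ..."""
    return f"L{next(_labels)}"

def translate_while(cond_place, cond_code, body_code):
    """Lay out the code for 'while E do S1' as the semantic rule prescribes."""
    begin, after = newlabel(), newlabel()
    return (
        [f"{begin}:"]                                # gen(S.begin ':')
        + cond_code                                  # E.code
        + [f"if {cond_place} = 0 goto {after}"]      # leave the loop when E is 0
        + body_code                                  # S1.code
        + [f"goto {begin}"]                          # re-evaluate the condition
        + [f"{after}:"]                              # gen(S.after ':')
    )

# while i do i := i - 1
loop = translate_while("i", [], ["t1 := i - 1", "i := t1"])
print("\n".join(loop))
```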

Implementations of 3-address statements

• Three-address code is an abstract form of intermediate code. In a compiler it can be implemented as records with fields for the operator and the addresses.
• Three representations are used for three-address code:
  1. Quadruples
  2. Triples
  3. Indirect triples
• A quadruple is a record with at most four fields: op, arg1, arg2 and result.

Implementations of 3-address statements: Quadruples

    t1 := - c
    t2 := b * t1
    t3 := - c
    t4 := b * t3
    t5 := t2 + t4
    a := t5

      op       arg1   arg2   result
(0)   uminus   c             t1
(1)   *        b      t1     t2
(2)   uminus   c             t3
(3)   *        b      t3     t4
(4)   +        t2     t4     t5
(5)   :=       t5            a

• Temporary names must be entered into the symbol table as they are created.
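One possible way to hold this table in Python is a list of records. The Quad type and the show helper are assumptions for illustration; an unused field is stored as None.

```python
from typing import NamedTuple, Optional

class Quad(NamedTuple):
    op: str
    arg1: Optional[str]
    arg2: Optional[str]
    result: str

# The quadruple table for a := b * -c + b * -c
quads = [
    Quad("uminus", "c", None, "t1"),
    Quad("*", "b", "t1", "t2"),
    Quad("uminus", "c", None, "t3"),
    Quad("*", "b", "t3", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad(":=", "t5", None, "a"),
]

def show(q):
    """Print a quadruple back in x := y op z form."""
    if q.op == ":=":
        return f"{q.result} := {q.arg1}"
    if q.arg2 is None:
        return f"{q.result} := {q.op} {q.arg1}"
    return f"{q.result} := {q.arg1} {q.op} {q.arg2}"

for q in quads:
    print(show(q))
```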

Implementations of 3-address statements, II: Triples

    t1 := - c
    t2 := b * t1
    t3 := - c
    t4 := b * t3
    t5 := t2 + t4
    a := t5

      op       arg1   arg2
(0)   uminus   c
(1)   *        b      (0)
(2)   uminus   c
(3)   *        b      (2)
(4)   +        (1)    (3)
(5)   assign   a      (4)

• Temporary names are not entered into the symbol table.
• Temporary variables are avoided: an operand is either a pointer into the symbol table (for a name or constant) or the number of the triple that computes the value.
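The relationship between the two tables can be sketched as a conversion from quadruples to triples. The function name quads_to_triples is hypothetical, and the sketch assumes that every temporary is assigned exactly once and that the result of every non-copy statement is a temporary. Triple references are stored as integers, names as strings.

```python
def quads_to_triples(quads):
    """Replace each temporary by the index of the triple that computes it."""
    defined_at = {}   # temporary name -> index of its defining triple
    triples = []

    def ref(arg):
        # A temporary becomes a triple number; names and None pass through.
        return defined_at.get(arg, arg)

    for op, arg1, arg2, result in quads:
        if op == ":=":
            triples.append(("assign", result, ref(arg1)))
        else:
            triples.append((op, ref(arg1), ref(arg2)))
            defined_at[result] = len(triples) - 1
    return triples

quads = [
    ("uminus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("uminus", "c", None, "t3"),
    ("*", "b", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    (":=", "t5", None, "a"),
]
triples = quads_to_triples(quads)
for number, triple in enumerate(triples):
    print(number, triple)
```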

Other types of 3-address statements

• Operations such as x[i] := y and x := y[i] require two or more entries in the triple structure.

x[i] := y
      op       arg1   arg2
(0)   []=      x      i
(1)   assign   (0)    y

x := y[i]
      op       arg1   arg2
(0)   =[]      y      i
(1)   assign   x      (0)

Implementations of 3-address statements, III: Indirect Triples

• Indirect triples: the triples are stored in their own table, and the program is a list of pointers to triples rather than a list of the triples themselves.

Statement list          Triples
(0)   (14)                     op       arg1   arg2
(1)   (15)              (14)   uminus   c
(2)   (16)              (15)   *        b      (14)
(3)   (17)              (16)   uminus   c
(4)   (18)              (17)   *        b      (16)
(5)   (19)              (18)   +        (15)   (17)
                        (19)   assign   a      (18)
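A small Python sketch of the idea, with the triple table stored as a dictionary keyed by triple number and the statement list as a list of those numbers (both layouts are assumptions). Reordering statements only permutes the statement list; the triples and the references inside them stay unchanged.

```python
# Triple table: triple number -> (op, arg1, arg2); integers refer to triples.
triples = {
    14: ("uminus", "c", None),
    15: ("*", "b", 14),
    16: ("uminus", "c", None),
    17: ("*", "b", 16),
    18: ("+", 15, 17),
    19: ("assign", "a", 18),
}

# Statement list: the order in which the triples are executed.
statements = [14, 15, 16, 17, 18, 19]

def in_execution_order(statements, triples):
    return [triples[number] for number in statements]

# Move the second 'uminus c' ahead of the first multiplication. Triple 16 does
# not depend on 15, so this order is still valid and no triple is rewritten.
statements[1], statements[2] = statements[2], statements[1]

for triple in in_execution_order(statements, triples):
    print(triple)
```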

Examples on triples and quadruples

Generation of 3-address code

Declarations

Using a global variable offset:

PRODUCTION                    SEMANTIC RULE
P -> M D                      { }
M -> ε                        { offset := 0 }
D -> id : T                   { addtype(id.entry, T.type, offset);
                                offset := offset + T.width }
T -> char                     { T.type = char; T.width = 4 }
T -> integer                  { T.type = integer; T.width = 4 }
T -> array [ num ] of T1      { T.type = array(1..num.val, T1.type);
                                T.width = num.val * T1.width }
T -> ^T1                      { T.type = pointer(T1.type); T.width = 4 }
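The offset bookkeeping can be sketched in Python as below. The tuple encoding of types and the function names width and layout are assumptions; the widths are the ones given in the rules above.

```python
WIDTHS = {"char": 4, "integer": 4}   # basic-type widths as given on the slide

def width(t):
    """Width of a type: a basic type name, ('array', num, elem) or ('pointer', elem)."""
    if isinstance(t, str):
        return WIDTHS[t]
    if t[0] == "array":
        _, num, elem = t
        return num * width(elem)     # T.width = num.val * T1.width
    if t[0] == "pointer":
        return 4                     # T.width = 4
    raise ValueError(f"unknown type: {t!r}")

def layout(decls):
    """Assign each declared name an offset, as the rule for D -> id : T does."""
    offset = 0                       # M -> epsilon { offset := 0 }
    table = {}
    for name, t in decls:
        table[name] = (t, offset)    # addtype(id.entry, T.type, offset)
        offset += width(t)           # offset := offset + T.width
    return table, offset

table, total = layout([
    ("c", "char"),
    ("a", ("array", 10, "integer")),
    ("p", ("pointer", "integer")),
])
print(table["c"][1], table["a"][1], table["p"][1], total)   # 0 4 44 48
```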

Nested Procedure Declarations

• For each procedure we should create a symbol table.
  - mktable(previous): create a new symbol table, where previous is the parent symbol table of the new table.
  - enter(symtable, name, type, offset): create a new entry for a variable in the given symbol table.
  - enterproc(symtable, name, newsymtable): create a new entry for the procedure in the symbol table of its parent.
  - addwidth(symtable, width): put the total width of all entries in the symbol table into the header of that table.
• We will have two stacks:
  - tblptr: holds the pointers to the symbol tables.
  - offset: holds the current offsets in the symbol tables on the tblptr stack.

Keeping Track of Scope Information

Consider the grammar fragment:

    P -> D
    D -> D ; D | id : T | proc id ; D ; S

• Each procedure should be allowed to use independent names.
• Nested procedures are allowed.

Dealing with Procedures

PRODUCTION: P -> procedure id ';' block ';'

SEMANTIC RULE:
    begin = newlabel;
    Enter the begin label into the symbol-table entry of the procedure name.
    P.code = gen(begin ':') || block.code || gen('pop' return_address) || gen('goto' return_address)

PRODUCTION: S -> call id

SEMANTIC RULE:
    Look up the symbol table to find the procedure name and its begin label, called proc_begin.
    return = newlabel;
    S.code = gen('push' return) || gen('goto' proc_begin) || gen(return ':')
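A minimal Python sketch of the two rules follows. The dictionary proc_labels stands in for the symbol table, and the function names and the label format are assumptions for illustration.

```python
import itertools

_labels = itertools.count(1)

def newlabel():
    return f"L{next(_labels)}"

proc_labels = {}   # stand-in for the symbol table: procedure name -> begin label

def translate_procedure(name, block_code):
    """P -> procedure id ';' block ';'"""
    begin = newlabel()
    proc_labels[name] = begin              # record the begin label for later calls
    return ([f"{begin}:"]
            + block_code
            + ["pop return_address", "goto return_address"])

def translate_call(name):
    """S -> call id"""
    proc_begin = proc_labels[name]         # look up the procedure's begin label
    ret = newlabel()
    return [f"push {ret}", f"goto {proc_begin}", f"{ret}:"]

proc_code = translate_procedure("p", ["x := 1"])
call_code = translate_call("p")
print("\n".join(proc_code + call_code))
```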

Keeping Track of Scope Information (a translation scheme)

P -> M D                  { addwidth(top(tblptr), top(offset));
                            pop(tblptr); pop(offset) }
M -> ε                    { t := mktable(null);
                            push(t, tblptr); push(0, offset) }
D -> D1 ; D2
D -> proc id ; N D1 ; S   { t := top(tblptr);
                            addwidth(t, top(offset));
                            pop(tblptr); pop(offset);
                            enterproc(top(tblptr), id.name, t) }
N -> ε                    { t := mktable(top(tblptr));
                            push(t, tblptr); push(0, offset) }
D -> id : T               { enter(top(tblptr), id.name, T.type, top(offset));
                            top(offset) := top(offset) + T.width }

Example: proc func1; D; proc func2; D; S; S
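The two stacks can be traced with a short Python sketch. The SymTable class and the driver functions begin_scope, declare, end_proc and end_program are assumptions that stand in for the parser invoking the actions above; the declared variables in the example are also invented for illustration.

```python
class SymTable:
    def __init__(self, parent=None):
        self.parent = parent      # enclosing procedure's table, or None
        self.entries = {}
        self.width = 0            # header field filled in by addwidth

def mktable(previous):
    return SymTable(previous)

def enter(table, name, type_, offset):
    table.entries[name] = ("var", type_, offset)

def enterproc(table, name, newtable):
    table.entries[name] = ("proc", newtable)

def addwidth(table, width):
    table.width = width

tblptr, offset = [], []           # the two stacks

def begin_scope():                # actions of M and N
    parent = tblptr[-1] if tblptr else None
    tblptr.append(mktable(parent))
    offset.append(0)

def declare(name, type_, width):  # action of D -> id : T
    enter(tblptr[-1], name, type_, offset[-1])
    offset[-1] += width

def end_proc(name):               # action of D -> proc id ; N D1 ; S
    t = tblptr.pop()
    addwidth(t, offset.pop())
    enterproc(tblptr[-1], name, t)

def end_program():                # action of P -> M D
    t = tblptr.pop()
    addwidth(t, offset.pop())
    return t

# func2 is nested inside func1, as in the example above
begin_scope()
declare("x", "integer", 4)
begin_scope()                     # proc func1
declare("y", "integer", 4)
declare("z", "integer", 4)
begin_scope()                     # proc func2
declare("w", "char", 4)
end_proc("func2")
end_proc("func1")
root = end_program()

func1 = root.entries["func1"][1]
func2 = func1.entries["func2"][1]
print(root.width, func1.width, func2.width)   # 4 8 4
```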