COMPILER CONSTRUCTION Principles and Practice Kenneth C Louden

8. Code Generation PART ONE


• Generate executable code for a target machine that is a faithful representation of the semantics of the source code
• Code generation depends not only on the characteristics of the source language but also on detailed information about the target architecture, the structure of the runtime environment, and the operating system running on the target machine

Contents
Part One
  8.1 Intermediate Code and Data Structures for Code Generation
  8.2 Basic Code Generation Techniques
Other Parts
  8.3 Code Generation of Data Structure References
  8.4 Code Generation of Control Statements and Logical Expressions
  8.5 Code Generation of Procedure and Function Calls
  8.6 Code Generation in Commercial Compilers: Two Case Studies
  8.7 TM: A Simple Target Machine
  8.8 A Code Generator for the TINY Language
  8.9 A Survey of Code Optimization Techniques
  8.10 Simple Optimizations for the TINY Code Generator

8.1 Intermediate Code and Data Structures for Code Generation

8.1.1 Three-Address Code

• A data structure that represents the source program during translation is called an intermediate representation, or IR for short
• An intermediate representation that resembles target code is called intermediate code
  – Intermediate code is particularly useful when the goal of the compiler is to produce extremely efficient code
  – Intermediate code can also make a compiler easier to retarget
• We study two popular forms of intermediate code: three-address code and P-code
• The most basic instruction of three-address code is designed to represent the evaluation of arithmetic expressions and has the following general form:
    x = y op z

Figure 8.1 Sample TINY program:
    { sample program in TINY language -- computes factorial }
    read x; { input an integer }
    if 0 < x then { don't compute if x <= 0 }
      fact := 1;
      repeat
        fact := fact * x;
        x := x - 1
      until x = 0;
      write fact { output factorial of x }
    end

• The three-address code for the above TINY program:
    read x
    t1 = x > 0
    if_false t1 goto L1
    fact = 1
    label L2
    t2 = fact * x
    fact = t2
    t3 = x - 1
    x = t3
    t4 = x == 0
    if_false t4 goto L2
    write fact
    label L1
    halt

8.1.2 Data Structures for the Implementation of Three-Address Code

• The most common implementation of three-address code is the quadruple, which has four fields:
  – One for the operation and three for the addresses
• A different implementation of three-address code is called a triple:
  – The instructions themselves represent the temporaries
  – This requires that each three-address instruction be referenceable, either as an index in an array or as a pointer in a linked list

• Quadruple implementation for the three-address code of the previous example:
    (rd, x, _, _)
    (gt, x, 0, t1)
    (if_f, t1, L1, _)
    (asn, 1, fact, _)
    (lab, L2, _, _)
    (mul, fact, x, t2)
    (asn, t2, fact, _)
    (sub, x, 1, t3)
    (asn, t3, x, _)
    (eq, x, 0, t4)
    (if_f, t4, L2, _)
    (wri, fact, _, _)
    (lab, L1, _, _)
    (halt, _, _, _)

• C code defining data structures for the quadruples:
    typedef enum { rd, gt, if_f, asn, lab, mul, sub, eq, wri, halt, … } OpKind;
    typedef enum { Empty, IntConst, String } AddrKind;
    typedef struct {
      AddrKind kind;
      union {
        int val;
        char *name;
      } contents;
    } Address;
    typedef struct {
      OpKind op;
      Address addr1, addr2, addr3;
    } Quad;
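As a rough illustration of how these definitions might be used, the sketch below fills in a Quad and prints it in the (op, a1, a2, a3) notation of the previous slide. The constructors empty_addr, int_addr and name_addr, the op_names table and format_quad are hypothetical helpers that do not appear in the slides, and the name field is declared const here only so that string literals can be stored safely.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { rd, gt, if_f, asn, lab, mul, sub, eq, wri, halt } OpKind;
typedef enum { Empty, IntConst, String } AddrKind;

typedef struct {
    AddrKind kind;
    union {
        int val;
        const char *name;   /* const is an addition of this sketch */
    } contents;
} Address;

typedef struct {
    OpKind op;
    Address addr1, addr2, addr3;
} Quad;

/* Hypothetical constructors; the slides do not define these. */
Address empty_addr(void)         { Address a; a.kind = Empty;    a.contents.val = 0;  return a; }
Address int_addr(int v)          { Address a; a.kind = IntConst; a.contents.val = v;  return a; }
Address name_addr(const char *n) { Address a; a.kind = String;   a.contents.name = n; return a; }

/* Printable names, listed in the same order as the OpKind enum. */
static const char *const op_names[] = {
    "rd", "gt", "if_f", "asn", "lab", "mul", "sub", "eq", "wri", "halt"
};

/* Append the printed form of one address to the string already in buf. */
static void append_addr(char *buf, size_t size, Address a)
{
    size_t used = strlen(buf);
    assert(used < size);
    switch (a.kind) {
    case Empty:    snprintf(buf + used, size - used, "_"); break;
    case IntConst: snprintf(buf + used, size - used, "%d", a.contents.val); break;
    case String:   snprintf(buf + used, size - used, "%s", a.contents.name); break;
    }
}

/* Write the quadruple into buf in the form (op,a1,a2,a3). */
void format_quad(char *buf, size_t size, Quad q)
{
    const Address addrs[3] = { q.addr1, q.addr2, q.addr3 };
    snprintf(buf, size, "(%s", op_names[q.op]);
    for (int i = 0; i < 3; i++) {
        strncat(buf, ",", size - strlen(buf) - 1);
        append_addr(buf, size, addrs[i]);
    }
    strncat(buf, ")", size - strlen(buf) - 1);
}
```

A caller would set op to gt and the three addresses to name_addr("x"), int_addr(0) and name_addr("t1"), then pass the Quad to format_quad with a buffer of, say, 64 bytes.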

• A representation of the three-address code of the previous example as triples:
    (0)  (rd, x, _)
    (1)  (gt, x, 0)
    (2)  (if_f, (1), (11))
    (3)  (asn, 1, fact)
    (4)  (mul, fact, x)
    (5)  (asn, (4), fact)
    (6)  (sub, x, 1)
    (7)  (asn, (6), x)
    (8)  (eq, x, 0)
    (9)  (if_f, (8), (4))
    (10) (wri, fact, _)
    (11) (halt, _, _)

8.1.3 P-Code

• P-code was designed to be the actual code for a hypothetical stack machine, called the P-machine, for which an interpreter was written on various actual machines
• Since P-code was designed to be directly executable, it contains an implicit description of a particular runtime environment, including data sizes, as well as a great deal of information specific to the P-machine

• The P-machine consists of a code memory, an unspecified data memory for named variables, and a stack for temporary data, together with whatever registers are needed to maintain the stack and support execution
• P-code for the expression 2*a+(b-3):
    ldc 2   ; load constant 2
    lod a   ; load value of variable a
    mpi     ; integer multiplication
    lod b   ; load value of variable b
    ldc 3   ; load constant 3
    sbi     ; integer subtraction
    adi     ; integer addition
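To make the stack behavior of these seven instructions concrete, here is a minimal sketch of an interpreter for just the five opcodes used above. It is not the actual P-machine: the PInstr and Var types, the fixed stack size and the run_pcode function are all assumptions made for illustration, and named variables are looked up in a simple array rather than in a real data memory.

```c
#include <assert.h>
#include <string.h>

typedef enum { LDC, LOD, MPI, SBI, ADI } POp;

/* One instruction: LDC uses constant, LOD uses name, the rest use neither. */
typedef struct { POp op; int constant; const char *name; } PInstr;

/* Stand-in for the P-machine's data memory for named variables. */
typedef struct { const char *name; int value; } Var;

#define STACK_MAX 32

static int lookup(const Var *vars, int nvars, const char *name)
{
    for (int i = 0; i < nvars; i++)
        if (strcmp(vars[i].name, name) == 0)
            return vars[i].value;
    assert(!"unknown variable");
    return 0;
}

/* Execute the code and return the single value left on the stack. */
int run_pcode(const PInstr *code, int n, const Var *vars, int nvars)
{
    int stack[STACK_MAX];
    int sp = 0;                       /* index of the next free slot */

    for (int pc = 0; pc < n; pc++) {
        int right;
        switch (code[pc].op) {
        case LDC:                     /* push a constant */
            assert(sp < STACK_MAX);
            stack[sp++] = code[pc].constant;
            break;
        case LOD:                     /* push the value of a variable */
            assert(sp < STACK_MAX);
            stack[sp++] = lookup(vars, nvars, code[pc].name);
            break;
        case MPI:                     /* pop two values, push the product */
            assert(sp >= 2);
            right = stack[--sp];
            stack[sp - 1] *= right;
            break;
        case SBI:                     /* pop two values, push left - right */
            assert(sp >= 2);
            right = stack[--sp];
            stack[sp - 1] -= right;
            break;
        case ADI:                     /* pop two values, push the sum */
            assert(sp >= 2);
            right = stack[--sp];
            stack[sp - 1] += right;
            break;
        }
    }
    assert(sp == 1);
    return stack[0];
}
```

Running the seven instructions with a = 5 and b = 10 leaves 2*5 + (10-3) = 17 on the stack.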

1) Comparison of P-Code to Three-Address Code
• P-code is in many respects closer to actual machine code than three-address code
• P-code instructions also require fewer addresses: the instructions shown above take one address or none
• On the other hand, P-code is less compact than three-address code in terms of number of instructions, and P-code is not “self-contained”, in that the instructions operate implicitly on a stack
2) Implementation of P-Code
• Historically, P-code has largely been generated as a text file, but the previous descriptions of internal data structure implementations for three-address code will also work, with appropriate modification, for P-code

8.2 Basic Code Generation Techniques

8.2.1 Intermediate Code or Target Code as a Synthesized Attribute

• Intermediate code generation can be viewed as an attribute computation
• The code becomes a synthesized attribute that can be defined using an attribute grammar and generated either directly during parsing or by a post-order traversal of the syntax tree
• As an example, consider a small subset of C expressions:
    exp → id = exp | aexp
    aexp → aexp + factor | factor
    factor → (exp) | num | id

1) P-Code
(Here || denotes concatenation with a space, used to build a single instruction, and ++ denotes concatenation with a newline between instructions.)
    Grammar Rule              Semantic Rules
    exp1 → id = exp2          exp1.pcode = "lda" || id.strval ++ exp2.pcode ++ "stn"
    exp → aexp                exp.pcode = aexp.pcode
    aexp1 → aexp2 + factor    aexp1.pcode = aexp2.pcode ++ factor.pcode ++ "adi"
    aexp → factor             aexp.pcode = factor.pcode
    factor → (exp)            factor.pcode = exp.pcode
    factor → num              factor.pcode = "ldc" || num.strval
    factor → id               factor.pcode = "lod" || id.strval
The expression (x=x+3)+4 has the following P-code attribute:
    lda x
    lod x
    ldc 3
    adi
    stn
    ldc 4
    adi

2) Three-Address Code
    Grammar Rule              Semantic Rules
    exp1 → id = exp2          exp1.name = exp2.name
                              exp1.tacode = exp2.tacode ++ id.strval || "=" || exp2.name
    exp → aexp                exp.name = aexp.name
                              exp.tacode = aexp.tacode
    aexp1 → aexp2 + factor    aexp1.name = newtemp()
                              aexp1.tacode = aexp2.tacode ++ factor.tacode ++ aexp1.name || "=" || aexp2.name || "+" || factor.name
    aexp → factor             aexp.name = factor.name
                              aexp.tacode = factor.tacode
    factor → (exp)            factor.name = exp.name
                              factor.tacode = exp.tacode
    factor → num              factor.name = num.strval
                              factor.tacode = ""
    factor → id               factor.name = id.strval
                              factor.tacode = ""
The expression (x=x+3)+4 has the following three-address code attribute:
    t1 = x + 3
    x = t1
    t2 = t1 + 4
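The name and tacode attributes can also be computed by a recursive walk over a syntax tree instead of by string-valued grammar attributes. The following sketch is one possible way to do that under simplifying assumptions: the Node and TacOut types, the append helper and gen_tac are hypothetical, the temporary counter plays the role of newtemp(), and the output is collected in a fixed-size buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { NodePlus, NodeAssign, NodeNum, NodeId } Kind;

typedef struct Node {
    Kind kind;
    const char *strval;           /* num/id text, or the target of an assignment */
    struct Node *lchild, *rchild;
} Node;

/* Accumulated three-address code plus the counter behind newtemp(). */
typedef struct {
    char code[256];
    int temps;
} TacOut;

static void append(TacOut *out, const char *line)
{
    assert(strlen(out->code) + strlen(line) + 2 <= sizeof out->code);
    strcat(out->code, line);
    strcat(out->code, "\n");
}

/* Emit code for t into out and store in name the name that holds its value. */
void gen_tac(const Node *t, TacOut *out, char *name, size_t size)
{
    char left[16], right[16], line[64];

    switch (t->kind) {
    case NodeNum:
    case NodeId:                  /* factor.name = strval, no code */
        snprintf(name, size, "%s", t->strval);
        break;
    case NodePlus:                /* aexp1.name = newtemp() */
        gen_tac(t->lchild, out, left, sizeof left);
        gen_tac(t->rchild, out, right, sizeof right);
        snprintf(name, size, "t%d", ++out->temps);
        snprintf(line, sizeof line, "%s=%s+%s", name, left, right);
        append(out, line);
        break;
    case NodeAssign:              /* exp1.name = exp2.name */
        gen_tac(t->lchild, out, left, sizeof left);
        snprintf(line, sizeof line, "%s=%s", t->strval, left);
        append(out, line);
        snprintf(name, size, "%s", left);
        break;
    }
}
```

For the tree of (x=x+3)+4 this walk produces the same three instructions as the attribute grammar, with t2 as the name of the whole expression.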

8.2.2 Practical Code Generation

The basic algorithm can be described as the following recursive procedure:
    procedure genCode (T: treenode);
    begin
      if T is not nil then
        generate code to prepare for code of left child of T;
        genCode(left child of T);
        generate code to prepare for code of right child of T;
        genCode(right child of T);
        generate code to implement the action of T;
    end;

C definitions for the syntax tree of the expression grammar:
    typedef enum { Plus, Assign } Optype;
    typedef enum { OpKind, ConstKind, IdKind } NodeKind;
    typedef struct streenode {
      NodeKind kind;
      Optype op;                         /* used with OpKind */
      struct streenode *lchild, *rchild;
      int val;                           /* used with ConstKind */
      char *strval;                      /* used for identifiers and numbers */
    } STreeNode;
    typedef STreeNode *SyntaxTree;

• A genCode procedure to generate P-code:
    void genCode(SyntaxTree t)
    { char codestr[CODESIZE];
      /* CODESIZE = max length of 1 line of P-code */
      if (t != NULL)
      { switch (t->kind)
        { case OpKind:
            switch (t->op)
            { case Plus:
                genCode(t->lchild);
                genCode(t->rchild);
                emitCode("adi");
                break;
              case Assign:
                sprintf(codestr, "%s %s", "lda", t->strval);
                emitCode(codestr);
                genCode(t->lchild);
                emitCode("stn");
                break;
            }
            break;
          case ConstKind:
            sprintf(codestr, "%s %s", "ldc", t->strval);
            emitCode(codestr);
            break;
          case IdKind:
            sprintf(codestr, "%s %s", "lod", t->strval);
            emitCode(codestr);
            break;
          default:
            emitCode("error");
            break;
        }
      }
    }
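The slides leave emitCode and the construction of the tree unspecified. The sketch below restates the tree types and genCode in compilable form and supplies an assumed emitCode that appends each line to a global buffer named output; that buffer, its size, the value chosen for CODESIZE and the use of snprintf in place of sprintf are choices of this sketch, not part of the original code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CODESIZE 64   /* assumed maximum length of one line of P-code */

typedef enum { Plus, Assign } Optype;
typedef enum { OpKind, ConstKind, IdKind } NodeKind;

typedef struct streenode {
    NodeKind kind;
    Optype op;                          /* used with OpKind */
    struct streenode *lchild, *rchild;
    int val;                            /* used with ConstKind */
    char *strval;                       /* used for identifiers and numbers */
} STreeNode;
typedef STreeNode *SyntaxTree;

/* Assumed emitter: collects the generated lines in one global string. */
char output[512];

void emitCode(const char *line)
{
    assert(strlen(output) + strlen(line) + 2 <= sizeof output);
    strcat(output, line);
    strcat(output, "\n");
}

void genCode(SyntaxTree t)
{
    char codestr[CODESIZE];

    if (t == NULL)
        return;
    switch (t->kind) {
    case OpKind:
        switch (t->op) {
        case Plus:                      /* operands first, then the operator */
            genCode(t->lchild);
            genCode(t->rchild);
            emitCode("adi");
            break;
        case Assign:                    /* address, then value, then store */
            snprintf(codestr, sizeof codestr, "lda %s", t->strval);
            emitCode(codestr);
            genCode(t->lchild);
            emitCode("stn");
            break;
        }
        break;
    case ConstKind:
        snprintf(codestr, sizeof codestr, "ldc %s", t->strval);
        emitCode(codestr);
        break;
    case IdKind:
        snprintf(codestr, sizeof codestr, "lod %s", t->strval);
        emitCode(codestr);
        break;
    default:
        emitCode("error");
        break;
    }
}
```

Calling genCode on a hand-built tree for (x=x+3)+4, in which the Assign node stores the target name x in strval and the right-hand side in lchild, fills output with the seven-line P-code sequence given earlier for that expression.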

Yacc specification for the generation of P-code according to the attribute grammar of Table 8.1:
    %{
    #define YYSTYPE char *
    /* make Yacc use strings as values */
    /* other inclusion code … */
    %}
    %token NUM ID
    %%
    exp    : ID
               { sprintf(codestr, "%s %s", "lda", $1);
                 emitCode(codestr); }
             '=' exp
               { emitCode("stn"); }
           | aexp
           ;
    aexp   : aexp '+' factor
               { emitCode("adi"); }
           | factor
           ;
    factor : '(' exp ')'
           | NUM
               { sprintf(codestr, "%s %s", "ldc", $1);
                 emitCode(codestr); }
           | ID
               { sprintf(codestr, "%s %s", "lod", $1);
                 emitCode(codestr); }
           ;
    %%
    /* utility functions … */

8.2.3 Generation of Target Code from Intermediate Code

• Code generation from intermediate code involves either or both of two standard techniques:
  – Macro expansion and static simulation
• Macro expansion involves replacing each kind of intermediate code instruction with an equivalent sequence of target code instructions
• Static simulation involves a straight-line simulation of the effects of the intermediate code and generating target code to match these effects

• Consider the expression (x=x+3)+4. We translate its P-code:
    lda x
    lod x
    ldc 3
    adi
    stn
    ldc 4
    adi
  into the three-address code:
    t1 = x + 3
    x = t1
    t2 = t1 + 4
• We perform a static simulation of the P-machine stack to find the three-address equivalent of the given code
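A static simulation of this kind can be sketched with a stack that holds names rather than values: loads push the operand's name, adi pops two names and pushes a new temporary, and stn emits a copy and leaves the stored value on the stack. The code below is a minimal sketch for the five opcodes in this example only; PInstr, the fixed-size name stack and pcode_to_tac are assumptions of the sketch, not a general translator.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { LDA, LOD, LDC, ADI, STN } POp;
typedef struct { POp op; const char *arg; } PInstr;   /* arg is NULL for ADI, STN */

#define STACK_MAX 16
#define NAME_LEN  16

/* Translate P-code to three-address code, one instruction per line in out. */
void pcode_to_tac(const PInstr *code, int n, char *out, size_t size)
{
    char stack[STACK_MAX][NAME_LEN];   /* simulated stack of operand names */
    char left[NAME_LEN], right[NAME_LEN];
    int sp = 0, temps = 0, len;
    size_t used = 0;

    assert(size > 0);
    out[0] = '\0';
    for (int i = 0; i < n; i++) {
        len = 0;
        switch (code[i].op) {
        case LDA:
        case LOD:
        case LDC:                      /* push the name; no code is generated */
            assert(sp < STACK_MAX && strlen(code[i].arg) < NAME_LEN);
            strcpy(stack[sp++], code[i].arg);
            break;
        case ADI:                      /* pop two names, push a new temporary */
            assert(sp >= 2);
            strcpy(right, stack[--sp]);
            strcpy(left, stack[--sp]);
            snprintf(stack[sp], NAME_LEN, "t%d", ++temps);
            len = snprintf(out + used, size - used, "%s=%s+%s\n",
                           stack[sp], left, right);
            sp++;
            break;
        case STN:                      /* store, keeping the value on the stack */
            assert(sp >= 2);
            strcpy(right, stack[--sp]);    /* value being stored */
            strcpy(left, stack[--sp]);     /* address of the target */
            len = snprintf(out + used, size - used, "%s=%s\n", left, right);
            strcpy(stack[sp++], right);
            break;
        }
        assert(len >= 0 && (size_t)len < size - used);
        used += (size_t)len;
    }
}
```

Applied to the seven P-code instructions above, the simulation emits t1=x+3, x=t1 and t2=t1+4, matching the three-address code shown.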

• We now consider the case of translating from three-address code to P-code by simple macro expansion. A three-address instruction
    a = b + c
  can always be translated into the P-code sequence
    lda a
    lod b
    lod c
    adi
    sto

• Then, the three-address code for the expression (x=x+3)+4:
    t1 = x + 3
    x = t1
    t2 = t1 + 4
  can be translated into the following P-code:
    lda t1
    lod x
    ldc 3
    adi
    sto
    lda x
    lod t1
    sto
    lda t2
    lod t1
    ldc 4
    adi
    sto
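The macro expansion itself is mechanical, as the following sketch suggests. It handles only the two instruction shapes that occur in this example (a = b + c and a = b), treats an operand that starts with a digit as a constant to be loaded with ldc, and uses the hypothetical names TacInstr, load_op and tac_to_pcode.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* target = left + right, or target = left when right is NULL. */
typedef struct {
    const char *target;
    const char *left;
    const char *right;
} TacInstr;

/* Constants are loaded with ldc, named variables and temporaries with lod. */
static const char *load_op(const char *operand)
{
    return isdigit((unsigned char)operand[0]) ? "ldc" : "lod";
}

/* Expand each three-address instruction into its fixed P-code sequence. */
void tac_to_pcode(const TacInstr *code, int n, char *out, size_t size)
{
    size_t used = 0;

    assert(size > 0);
    out[0] = '\0';
    for (int i = 0; i < n; i++) {
        int len;
        if (code[i].right != NULL)     /* a = b + c */
            len = snprintf(out + used, size - used,
                           "lda %s\n%s %s\n%s %s\nadi\nsto\n",
                           code[i].target,
                           load_op(code[i].left), code[i].left,
                           load_op(code[i].right), code[i].right);
        else                           /* a = b */
            len = snprintf(out + used, size - used,
                           "lda %s\n%s %s\nsto\n",
                           code[i].target,
                           load_op(code[i].left), code[i].left);
        assert(len >= 0 && (size_t)len < size - used);
        used += (size_t)len;
    }
}
```

Because each instruction is expanded in isolation, the result stores and reloads the temporaries t1 and t2, which is why it is longer than the seven-instruction P-code generated directly for the same expression.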

End of Part One THANKS
