Unit – 7 : Compiler
Prof. Dixita Kagathara
dixita.kagathara@darshan.ac.in
System Programming (2150708)
Darshan Institute of Engineering & Technology

Topics to be covered
• Causes of large semantic gap
• Binding and binding times
• Data structures used in compiling
• Scope rules
• Memory allocation
• Compilation of expressions
• Compilation of control structures
• Code optimization

Causes of large semantic gap
Two aspects of compilation are:
1. Generating code that implements the meaning of a source program in the execution domain (target code generation).
2. Providing diagnostics for violations of programming language semantics in a program (error reporting).
• Four issues are involved in implementing these aspects:
1. Data types
2. Data structures
3. Scope rules
4. Control structures

Binding & Binding time
• Binding is the association of an attribute of a program entity with a value.
• Binding time is the time at which a binding is actually performed.
• The following binding times arise in compilers:
1. Language definition time of a programming language L: the time at which the features of the language are specified. (Example: binding operator symbols to operations)
2. Language implementation time of a programming language L: the time at which the design of a language translator for L is finalized. (Example: binding the floating point type to a representation)
3. Compilation time of a program P. (Example: binding a variable to a type in C or Java)
4. Execution init time of a procedure proc. (Example: binding a C or C++ static variable to a memory cell)
5. Execution time of a procedure proc. (Example: binding a non-static local variable to a memory cell)

Data structures used in compilers
• Two types of data structures used in a compiler are:
1. Stack
2. Heap

Stack
• The stack is a linear data structure.
• It follows the LIFO (Last In First Out) rule.
• Each time a procedure is called, space for its local variables is pushed onto the stack, and when the procedure terminates, that space is popped off the stack.

Heap
• The heap data structure permits allocation and deallocation of entities in a random order.
• The heap is a non-linear data structure.
• When a program makes an allocation request, the heap manager allocates a memory area and returns its address.
• The program is expected to save this address in a pointer and use it to access the allotted entity.
• When a program makes a deallocation request, the heap manager deallocates the memory area.

Scope rules
• A block in a program is a function, a procedure, or simply a unit of code that may contain data declarations.
• Entities declared in a block must have unique names.
• These entities can be accessed only within the block.
• The region of the program in which an entity is visible and can be accessed is called the scope of that entity.
• A block-structured language has two kinds of variables:
1. Local variables
2. Nonlocal variables

Local and nonlocal variables
Consider the following program:
  Procedure A
  {
    int x, y, z
    Procedure B
    {
      int a, b
    }
  }
Procedure | Local variables | Nonlocal variables
A         | x, y, z         |
B         | a, b            | x, y, z
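The nesting of A and B can be mirrored with nested Python functions. This is only an illustrative sketch: the names procedure_a and procedure_b and the values assigned are assumptions, not part of the slide. Python's code objects report which names a function treats as local and which it takes from an enclosing block.

```python
def procedure_a():
    # x, y, z are local variables of procedure_a
    x, y, z = 1, 2, 3

    def procedure_b():
        # a, b are local to procedure_b; x, y, z are nonlocal here
        nonlocal x
        a, b = 10, 20
        x = a + b + y + z  # reads nonlocal y, z and rebinds nonlocal x

    procedure_b()
    code = procedure_b.__code__
    # co_varnames: locals of B; co_freevars: names taken from enclosing A
    return x, set(code.co_varnames), set(code.co_freevars)


final_x, locals_of_b, nonlocals_of_b = procedure_a()
```

After the call, final_x reflects the assignment made inside procedure_b, which shows that B really operated on A's variable.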

Memory allocation
• Memory allocation involves three important tasks:
1. Determining the amount of memory required for storing the value of a data item.
2. Using an appropriate memory allocation model.
3. Developing appropriate memory mappings for accessing values stored in a data structure.
• Types of memory allocation are:
1. Static memory allocation
2. Dynamic memory allocation

Static memory allocation
• In static memory allocation, memory is allocated to a variable before the execution of a program begins.
• No memory allocation or deallocation actions are performed during the execution of a program. Thus, variables remain permanently allocated.
Memory layout (figure): Code(A), Data(A), Code(B), Data(B), Code(C), Data(C)

Dynamic memory allocation
• In dynamic memory allocation, memory bindings are established and destroyed during the execution of a program.
Memory layout (figure): Code(A), Code(B), Code(C), Data(A), Data(C), Data(B)

Dynamic memory allocation
• Types of dynamic memory allocation are:
1. Automatic allocation: memory is allocated to the variables declared in a procedure when the procedure is entered during execution and is deallocated when the procedure is exited.
2. Program controlled allocation: a program can allocate or deallocate memory at any time during its execution.

Dynamic memory allocation and access
• Each record in the stack accommodates the variables of one activation of a block and is called an activation record.
Stack record format (figure):
  0(ARB): dynamic pointer (reserved pointer), used for memory allocation and deallocation.
  1(ARB): static pointer (reserved pointer), used for accessing nonlocal variables.
  ARB (activation record base) marks the start of the current record; TOS marks the top of the stack.
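The record format above can be simulated with a Python list standing in for the run-time stack. This is a minimal sketch under stated assumptions: the class ActivationStack, its method names, and the convention that local variables start at offset 2 from ARB are hypothetical choices for illustration. The dynamic pointer is followed on exit to restore the caller's record, and the static pointer is followed to reach a nonlocal variable.

```python
class ActivationStack:
    """Toy run-time stack; each list slot is one memory word."""

    def __init__(self):
        self.mem = []
        self.arb = None  # activation record base of the current activation

    def enter(self, static_link, n_locals):
        """Push a record: dynamic pointer, static pointer, then locals."""
        new_arb = len(self.mem)
        self.mem.append(self.arb)        # 0(ARB): dynamic pointer (caller's ARB)
        self.mem.append(static_link)     # 1(ARB): static pointer (enclosing block's ARB)
        self.mem.extend([0] * n_locals)  # local variables, zero-initialised
        self.arb = new_arb
        return new_arb

    def leave(self):
        """Pop the current record using the dynamic pointer."""
        caller_arb = self.mem[self.arb]
        del self.mem[self.arb:]          # TOS drops back to the old record
        self.arb = caller_arb

    def address(self, hops, offset):
        """Address of a variable 'hops' static levels out, at 'offset'."""
        base = self.arb
        for _ in range(hops):
            base = self.mem[base + 1]    # follow the static pointer
        return base + 2 + offset


stack = ActivationStack()
arb_a = stack.enter(static_link=None, n_locals=3)    # block A: x, y, z
arb_b = stack.enter(static_link=arb_a, n_locals=2)   # block B: a, b
stack.mem[stack.address(hops=1, offset=0)] = 42      # B assigns nonlocal x
stack.leave()                                        # B exits
x_value = stack.mem[stack.address(hops=0, offset=0)] # A reads its own x
```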

Memory allocation in block structured language
• A block is a sequence of statements, together with local data and declarations, enclosed within delimiters.
  A
  {
    Statements …
  }
• A block-structured language uses dynamic memory allocation.
• Finding the scope of a variable means checking its visibility within the block.
• The following rules determine the scope of a variable:
1. Variable X is accessible within block B1 if it can be accessed by any statement situated in block B1.
2. Variable X is accessible to any statement in block B2 if block B2 is situated inside block B1.

Compilation of expressions
• The major issues in code generation for expressions are as follows:
1. Determination of an evaluation order for the operators in an expression.
2. Selection of the instructions to be used in the target code.
3. Use of registers.

A toy code generator for expressions
• A toy code generator has to track both the registers (for availability) and the addresses (location of values) while generating code.
• Two descriptors are used for this purpose:
• The code generator uses the notion of an operand descriptor to maintain type, length and addressability information for each operand.
• It uses a register descriptor to maintain information about which operand or partial result would be contained in a CPU register during execution of the generated code.

Operand descriptors
• An operand descriptor has the following fields:
1. Attributes: contains the subfields type and length.
2. Addressability: specifies where the operand is located and how it can be accessed. It has two subfields:
   I. Addressability code: takes the values 'M' (operand is in memory) and 'R' (operand is in a register).
   II. Address: address of a CPU register or memory word.
• Example: a*b
  MOVER AREG, A
  MULT AREG, B
Descriptor            | Attribute | Addressability
Operand_descriptor[1] | (int, 1)  | M, Address(a)
Operand_descriptor[2] | (int, 1)  | M, Address(b)
Operand_descriptor[3] | (int, 1)  | R, Address(AREG)

Register descriptors
• A register descriptor has two fields:
1. Status: contains the code free or occupied to indicate the register status.
2. Operand descriptor #: if status = occupied, this field contains the number of the descriptor for the operand contained in the register.
• The register descriptor for AREG after generating code for a*b would be:
Status   | Operand descriptor #
Occupied | #3
• This indicates that register AREG contains the operand described by descriptor #3.
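The two descriptors can be sketched as small Python records, with a toy routine that emits the MOVER/MULT pair from the a*b example and updates the descriptors as it goes. The class names, field names and the function gen_multiply are assumptions made for illustration; only the instruction mnemonics and the descriptor contents come from the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OperandDescriptor:
    type_name: str  # attribute subfield: type
    length: int     # attribute subfield: length
    code: str       # addressability code: 'M' (memory) or 'R' (register)
    address: str    # symbolic address of the memory word or register


@dataclass
class RegisterDescriptor:
    status: str = "free"            # 'free' or 'occupied'
    operand: Optional[int] = None   # 1-based operand descriptor number


def gen_multiply(left, right):
    """Generate code for left*right using the single register AREG."""
    operands = [
        OperandDescriptor("int", 1, "M", left),
        OperandDescriptor("int", 1, "M", right),
    ]
    areg = RegisterDescriptor()
    code = [f"MOVER AREG, {left.upper()}", f"MULT AREG, {right.upper()}"]
    # The partial result now lives in AREG: describe it and mark AREG busy.
    operands.append(OperandDescriptor("int", 1, "R", "AREG"))
    areg.status = "occupied"
    areg.operand = len(operands)
    return code, operands, areg


code, operands, areg = gen_multiply("a", "b")
```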

Intermediate codes for expressions

Types of intermediate forms
1. Expression tree
2. Postfix notation
3. Three address code

Expression tree
• An expression tree depicts the natural hierarchical structure of a source program.
• Ex: a=b*-c+b*-c
  =
    a
    +
      *
        b
        uminus
          c
      *
        b
        uminus
          c

Postfix notation
• Postfix notation is a linearization of a syntax tree.
• In postfix notation the operands appear first, followed by the operator that applies to them.
• Example: 9 - 5 + 2 is written as 9 5 - 2 +
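The linearization is a post-order walk of the tree. The sketch below assumes a tree encoding chosen only for illustration: a leaf is a string, and an interior node is a tuple whose first element is the operator.

```python
def to_postfix(node):
    """Post-order traversal: children first, then the operator."""
    if isinstance(node, str):
        return [node]
    op, *children = node
    out = []
    for child in children:
        out.extend(to_postfix(child))
    out.append(op)
    return out


# (9 - 5) + 2, since - and + associate to the left
tree = ("+", ("-", "9", "5"), "2")
postfix = " ".join(to_postfix(tree))
```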

Three address code
• Three address code is a sequence of statements of the general form a := b op c
• Here a, b and c are operands that can be names or constants, and op stands for any operator.
• Example: a = b + c + d
  t1 = b + c
  t2 = t1 + d
  a = t2
• Here t1 and t2 are temporary names generated by the compiler.
• At most three addresses are allowed (two for the operands and one for the result). Hence, this representation is called three-address code.
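Generating such statements from an expression tree can be sketched as a recursive walk that creates one temporary per operator. The function name gen_tac and the tuple encoding of the tree are assumptions for illustration; a real compiler would also handle unary operators and types.

```python
def gen_tac(target, tree):
    """Return three-address statements that assign 'tree' to 'target'."""
    code = []

    def walk(node):
        if isinstance(node, str):        # a name or constant: no code needed
            return node
        op, left, right = node
        left_name = walk(left)
        right_name = walk(right)
        temp = f"t{len(code) + 1}"       # fresh compiler-generated temporary
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = walk(tree)
    code.append(f"{target} = {result}")
    return code


# a = b + c + d, parsed as (b + c) + d
statements = gen_tac("a", ("+", ("+", "b", "c"), "d"))
```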

Different representation of three address code
• There are three types of representation used for three address code:
1. Quadruples
2. Triples
3. Indirect triples
• Ex: x = -a*b + -a*b
Three address code:
  t1 = - a
  t2 = t1 * b
  t3 = - a
  t4 = t3 * b
  t5 = t2 + t4
  x = t5

Quadruple
• A quadruple is a structure with at most four fields: op, arg1, arg2 and result.
• The op field holds the internal code for the operator.
• The arg1 and arg2 fields represent the two operands.
• The result field stores the result of the expression.
Quadruples for x = -a*b + -a*b:
No. | Operator | Arg1 | Arg2 | Result
(0) | uminus   | a    |      | t1
(1) | *        | t1   | b    | t2
(2) | uminus   | a    |      | t3
(3) | *        | t3   | b    | t4
(4) | +        | t2   | t4   | t5
(5) | =        | t5   |      | x

Triple
• To avoid entering temporary names into the symbol table, we can refer to a temporary value by the position of the statement that computes it.
• If we do so, three address statements can be represented by records with only three fields: op, arg1 and arg2.
Triples for x = -a*b + -a*b:
No. | Operator | Arg1 | Arg2
(0) | uminus   | a    |
(1) | *        | (0)  | b
(2) | uminus   | a    |
(3) | *        | (2)  | b
(4) | +        | (1)  | (3)
(5) | =        | x    | (4)

Indirect triple
• In the indirect triple representation, the triples are listed separately, and the statement list holds pointers to the triples instead of the triples themselves.
• This implementation is called indirect triples.
Statement list:
No. | Statement
(0) | (14)
(1) | (15)
(2) | (16)
(3) | (17)
(4) | (18)
(5) | (19)
Triples:
No.  | Operator | Arg1 | Arg2
(14) | uminus   | a    |
(15) | *        | (14) | b
(16) | uminus   | a    |
(17) | *        | (16) | b
(18) | +        | (15) | (17)
(19) | =        | x    | (18)
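The relationship between the three representations can be sketched by deriving triples and indirect triples mechanically from quadruples for x = -a*b + -a*b. The function names and the tuple layout are assumptions for illustration. A reference to an earlier triple is written as a string such as "(0)" so that it cannot be confused with a variable name.

```python
QUADS = [
    ("uminus", "a", None, "t1"),
    ("*", "t1", "b", "t2"),
    ("uminus", "a", None, "t3"),
    ("*", "t3", "b", "t4"),
    ("+", "t2", "t4", "t5"),
    ("=", "t5", None, "x"),
]


def quads_to_triples(quads, base=0):
    """Replace each temporary by the number of the triple that computes it."""
    position = {}  # temporary name -> "(n)" reference
    triples = []
    for n, (op, arg1, arg2, result) in enumerate(quads, start=base):
        arg1 = position.get(arg1, arg1)
        arg2 = position.get(arg2, arg2)
        if op == "=":
            triples.append((op, result, arg1))  # assignment: target, value
        else:
            triples.append((op, arg1, arg2))
            position[result] = f"({n})"
    return triples


def indirect_triples(quads, base=14):
    """Return the statement list (pointers) and the triple table it points to."""
    triples = quads_to_triples(quads, base)
    statements = [base + i for i in range(len(triples))]
    return statements, dict(zip(statements, triples))


triples = quads_to_triples(QUADS)
statements, table = indirect_triples(QUADS)
```

One commonly cited benefit of the extra level of indirection is that an optimizer can reorder the statement list without renumbering the references inside the triples.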

Exercise
Write the quadruple, triple and indirect triple representations for the following statements:
1. -(a*b)+(c+d)
2. a*-(b+c)
3. x=(a+b*c)^(d*e)+f*g^h
4. g+a*(b-c)+(x-y)*d

Compilation of control structures

Parameter passing mechanisms
• Types of parameter passing methods are:
1. Call by value
2. Call by result
3. Call by reference
4. Call by name

Call by value
• This is the simplest method of parameter passing.
• The call by value method copies the actual value of an argument into the formal parameter of the function.
• Operations on the formal parameters do not change the values of the actual parameters.

Call by result
• It is similar to call by value, with one difference.
• At the time of return from the called function, the values of the formal parameters are copied back into the corresponding actual parameters.

Call by reference
• This method is also called call by address or call by location.
• The call by reference method copies the address of an argument into the formal parameter.
• Inside the function, the address is used to access the actual argument used in the call.
• This means that changes made to the parameter affect the passed argument.
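The difference between the first three mechanisms shows up when the actual parameter is also visible inside the procedure as a global. The following is a simulation, not how any particular compiler implements calls: the Cell class stands in for a memory cell, and body and call are hypothetical names. The procedure body increments its formal parameter p and then reports what the global g holds at that moment.

```python
class Cell:
    """A memory cell holding one value."""

    def __init__(self, value):
        self.value = value


def body(p, g):
    """Procedure body: increment formal p, then read global g."""
    p.value += 1
    return g.value


def call(mode):
    g = Cell(10)  # the actual parameter, also reachable as a global
    if mode == "value":
        seen = body(Cell(g.value), g)   # formal is a fresh copy of the value
    elif mode == "result":
        formal = Cell(g.value)
        seen = body(formal, g)
        g.value = formal.value          # copied back at the time of return
    elif mode == "reference":
        seen = body(g, g)               # formal is an alias of the actual
    else:
        raise ValueError(mode)
    return seen, g.value                # (value seen inside, value after return)


outcomes = {mode: call(mode) for mode in ("value", "result", "reference")}
```

Only call by reference lets the body observe the change while it is still running; call by result makes it visible to the caller after the return; call by value never propagates it.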

Call by name
• This is a less popular method of parameter passing.
• The procedure is treated like a macro.
• The procedure body is substituted for the call in the caller, with the actual parameters substituted for the formals.
• The local names of the called procedure and the names of the calling procedure are kept distinct.
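A common way to imitate call by name in a language without macros is to pass the actual parameter as an unevaluated function (a thunk) and evaluate it at every use of the formal. The sketch below uses that substitute technique; the helper names are hypothetical. It contrasts the result with call by value when the actual parameter is an expression with a side effect.

```python
def make_counter():
    """Return an expression-like callable that yields 1, 2, 3, ... on each evaluation."""
    state = {"n": 0}

    def next_value():
        state["n"] += 1
        return state["n"]

    return next_value


def twice_by_name(term):
    # Each use of the formal re-evaluates the actual expression.
    return term() + term()


def twice_by_value(value):
    # The actual expression was evaluated once, before the call.
    return value + value


by_name = twice_by_name(make_counter())       # evaluates the expression twice
by_value = twice_by_value(make_counter()())   # evaluates the expression once
```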

Code optimization

Optimization & types of optimization
• Code optimization aims at improving the execution efficiency of a program.
• This is achieved in two ways:
1. Redundancies in a program are eliminated.
2. Computations in a program are rearranged or rewritten to make it execute efficiently.
• There are two types of optimization:
• Local optimization: the optimizing transformations are applied over small segments of a program consisting of a few statements.
• Global optimization: the optimizing transformations are applied over a program unit, i.e. over a function or a procedure.

Compile time evaluation
• Compile time evaluation means shifting computations from run time to compile time.
Constant folding
• In this technique, variables whose values are known constants are replaced by those values, and the resulting expression is computed at compilation time.
• Example:
  pi = 3.14;
  r = 5;
  Area = pi * r;
• Here, at compilation time, pi is replaced by 3.14 and r by 5, and the computation 3.14 * 5 is then done during compilation.
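The folding step can be sketched as a recursive rewrite over an expression tree. The encoding (tuples for operators, strings for names, numbers for constants) and the function name fold are assumptions for illustration. A subexpression is evaluated only when both of its operands are known constants; otherwise it is left for run time.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node, known):
    """Replace known names by their constants and evaluate constant subtrees."""
    if isinstance(node, (int, float)):
        return node
    if isinstance(node, str):
        return known.get(node, node)   # substitute if the value is known
    op, left, right = node
    left, right = fold(left, known), fold(right, known)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)    # computed at "compile time"
    return (op, left, right)           # must stay in the generated code


area = fold(("*", "pi", "r"), {"pi": 3.14, "r": 5})
partial = fold(("*", "pi", "q"), {"pi": 3.14})
```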

Common sub expressions elimination
• A common subexpression is an expression that appears repeatedly in the program and has been computed previously.
• If the operands of this subexpression have not changed, the earlier result is used instead of re-computing it each time.
• Example:
Before optimization:
  t1 := 4 * i
  t2 := a[t1]
  t3 := 4 * j
  t4 := 4 * i
  t5 := n
  t6 := b[t4]+t5
After optimization:
  t1 := 4 * i
  t2 := a[t1]
  t3 := 4 * j
  t5 := n
  t6 := b[t1]+t5
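A local version of this transformation can be sketched over a single basic block. The sketch rests on several assumptions that are not in the slides: statements are encoded as (result, op, arg1, arg2) tuples, array indexing is modelled as an operator "[]", the block contains no stores to arrays, and an eliminated name is a compiler temporary that is not needed outside the block. The slide's last statement is split into an indexing step (t6) and an addition (t7) so that each tuple has one operator.

```python
def eliminate_cse(block):
    """Drop recomputations of expressions whose operands have not changed."""
    available = {}  # (op, arg1, arg2) -> name that already holds the value
    alias = {}      # eliminated temporary -> name to use instead
    out = []
    for result, op, arg1, arg2 in block:
        arg1 = alias.get(arg1, arg1)
        arg2 = alias.get(arg2, arg2)
        key = (op, arg1, arg2)
        # Assigning to 'result' invalidates anything built from its old value.
        available = {k: v for k, v in available.items()
                     if result not in k[1:] and v != result}
        alias = {k: v for k, v in alias.items() if k != result and v != result}
        if key in available:
            alias[result] = available[key]  # reuse the earlier computation
            continue
        out.append((result, op, arg1, arg2))
        if result not in (arg1, arg2):
            available[key] = result
    return out


block = [
    ("t1", "*", "4", "i"),
    ("t2", "[]", "a", "t1"),
    ("t3", "*", "4", "j"),
    ("t4", "*", "4", "i"),     # same as t1
    ("t5", "copy", "n", None),
    ("t6", "[]", "b", "t4"),
    ("t7", "+", "t6", "t5"),
]
optimized = eliminate_cse(block)
```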

Frequency reduction
• Optimization can be obtained by moving some amount of code outside the loop and placing it just before the loop is entered.
• This method is also called loop invariant computation.
• Example:
Before optimization:
  while(i<=max-1)
  {
    sum=sum+a[i];
  }
After optimization:
  N=max-1;
  while(i<=N)
  {
    sum=sum+a[i];
  }

Strength reduction
• The strength (execution cost) of certain operators is higher than that of others.
• For instance, the strength of * is higher than that of +.
• In this technique, higher strength operators are replaced by lower strength operators.
• Example:
Before optimization:
  for(i=1; i<=50; i++)
  {
    count = i*7;
  }
After optimization:
  temp=7;
  for(i=1; i<=50; i++)
  {
    count = temp;
    temp = temp+7;
  }
• In both versions count takes the values 7, 14, 21, and so on.
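The equivalence of the two loops can be checked directly. The sketch below transcribes both versions into Python and records every value assigned to count; the function names are illustrative only.

```python
def counts_with_multiply():
    """Original loop: one multiplication per iteration."""
    counts = []
    for i in range(1, 51):
        counts.append(i * 7)
    return counts


def counts_with_add():
    """Strength-reduced loop: the multiplication becomes a running addition."""
    counts = []
    temp = 7
    for i in range(1, 51):
        counts.append(temp)
        temp = temp + 7
    return counts


same_values = counts_with_multiply() == counts_with_add()
```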

Dead code elimination
• A variable is said to be dead at a point in a program if the value it contains is never used afterwards.
• Code that only computes such a value is dead code.
• Example:
  i=0;
  if(i==1)
  {
    a=x+5;   (dead code)
  }
• The if statement is dead code because its condition can never be satisfied; hence the statement can be eliminated.
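The first definition above (a value that is never used afterwards) leads to a simple backward pass over a basic block. This is a sketch under assumptions: each statement is reduced to a (target, operands) pair, statements have no side effects, and live_out names the variables still needed after the block. It does not cover the unreachable-branch case from the example, which needs the condition to be evaluated at compile time.

```python
def remove_dead_assignments(block, live_out):
    """Keep only assignments whose target is used later (is live)."""
    live = set(live_out)
    kept = []
    for target, operands in reversed(block):
        if target in live:
            kept.append((target, operands))
            live.discard(target)   # this assignment satisfies the later use
            live.update(operands)  # its operands are needed before this point
        # otherwise the assigned value is never used: the statement is dead
    kept.reverse()
    return kept


block = [
    ("t1", ["x"]),
    ("a", ["x"]),          # a is never used afterwards
    ("t2", ["t1", "y"]),
    ("b", ["t2"]),
]
cleaned = remove_dead_assignments(block, live_out={"b"})
```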

Control & Dataflow Analysis

Control flow analysis
• Control flow analysis analyses a program to collect information concerning its structure.
• The control flow concepts of interest are:
1. Predecessors & Successors
2. Paths
3. Ancestors & Descendants
4. Dominators

Predecessors & Successors
Flow graph (figure): B1 (y=2) → B2 (y=y+2)
• Basic block B1 is a predecessor of B2.
• Basic block B2 is a successor of B1.

Path & Ancestors, Descendants
Flow graph (figure): B1 (y=2) → B2 (y=y+2)
• A path is a sequence of edges such that the destination node of one edge is the source node of the following edge.
• If a path exists from B1 to B2, then B1 is an ancestor of B2 and B2 is a descendant of B1.

Dominators
• In a flow graph, a node d dominates a node n if every path from the initial node to n goes through d. This is denoted 'd dom n'.
• The initial node dominates all the remaining nodes in the flow graph.
• Every node dominates itself.
Flow graph (figure): nodes 1, 2, 3, 4, 5
• Node 1 is the initial node, so it dominates every node.
• Node 2 dominates 3, 4 and 5.
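Dominator sets can be computed with the standard iterative scheme: start from "every node dominates every node" and repeatedly intersect the sets of each node's predecessors. The edges of the slide's figure did not survive, so the graph below (1→2, 2→3, 2→4, 3→5, 4→5) is an assumed one that is merely consistent with the two statements made about nodes 1 and 2; the function name is also illustrative.

```python
def dominators(succ, entry):
    """Map each node to the set of nodes that dominate it."""
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            pred[t].add(n)

    dom = {n: set(nodes) for n in nodes}  # initial over-approximation
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in pred[n]]
            common = set.intersection(*incoming) if incoming else set()
            new = {n} | common            # a node always dominates itself
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


# Assumed flow graph; every node must appear as a key.
graph = {1: [2], 2: [3, 4], 3: [5], 4: [5], 5: []}
dom = dominators(graph, entry=1)
```

In this assumed graph neither 3 nor 4 dominates 5, because node 5 can be reached through either of them.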

Dataflow analysis
• A dataflow property represents information about the usefulness of data items for the purpose of optimization.
• The dataflow properties are:
1. Available expression
2. Live variable

Available expression
  B1: t1 = 4*i
  B2: t2 = c+d[t1]
  B3: t3 = 4*i
  B4: t4 = a[t3]

Live variable

End of Unit - 7