Compiler Construction: Run-time Environments

Run-Time Environments (Chapter 7), Continued: Access to Non-local Names

Non-locals
Assume we have stack allocation of activation records (ARs). The SCOPE RULES of the source language determine how we handle non-local references. Most languages use LEXICAL (also called STATIC) scoping.
− Lexical scoping means it is possible to determine the declaration corresponding to a reference just by examining the program text.
− Pascal, C, Ada, etc. use static scoping.
Languages with DYNAMIC scoping require examination of the stack, at run time, to find the right declaration.

Block structure
C and many other languages have BLOCKs:
    stmt -> block | …
    block -> { decls stmts }
The scope of a declaration in a block follows the MOST-CLOSELY-NESTED rule:
1. The scope of a declaration in block B includes B.
2. If "x" is referred to but not declared in B, then "x" is in the scope of a declaration in an enclosing block B' such that
   a. B' has a declaration of "x", and
   b. B' is more closely nested around B than any other block with a declaration of "x".
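The most-closely-nested rule can be illustrated with a small C sketch. This is an illustrative program, not necessarily the one shown on the slides; the function name nested_scope_demo and the seen array are assumptions made so the visible values can be inspected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical demo: records the values of a and b that are visible
   at four points in a function with nested blocks. */
void nested_scope_demo(int seen[4][2])
{
    assert(seen != NULL);
    int a = 0;                                /* outermost block, B0 */
    int b = 0;
    {
        int b = 1;                            /* B1: hides B0's b */
        {
            int a = 2;                        /* B2: hides B0's a */
            seen[0][0] = a; seen[0][1] = b;   /* a from B2, b from B1 */
        }
        {
            int b = 3;                        /* B3: hides B1's b */
            seen[1][0] = a; seen[1][1] = b;   /* a from B0, b from B3 */
        }
        seen[2][0] = a; seen[2][1] = b;       /* a from B0, b from B1 */
    }
    seen[3][0] = a; seen[3][1] = b;           /* a from B0, b from B0 */
}
```

Each reference resolves to the declaration in the innermost enclosing block that declares the name, which is why the inner b = 3 is invisible once its block has been left.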

C program with blocks
[Figure: a C program with nested blocks and a table giving the scope of each declaration (int a = 0; int b = 1; int a = 2; int b = 3), with scopes written in terms of block names, e.g. B0 - B2. What is the output?]

Stack allocation of declarations in blocks
Declarations in each block can be allocated on the stack. Entering a block is similar to a procedure call with no parameters:
− Space is allocated on the stack when we enter the block.
− Space is deallocated from the stack when we exit the block.

Lexical scope without nested procedures
C and related languages do NOT allow nested procedures. A program is a series of declarations and functions. All non-local references inside functions must refer to declarations at file (global) scope.

Example: lexical scope
Consider the C code:
    int a[11];
    void readarray( void ) { … a … }
    int partition( int y, int z ) { … a … }
    void quicksort( int m, int n ) { … }
    int main( void ) { … a … }
The references to a are always to the array declared on the first line.

Lexical scope
Without nested procedures:
− Locals use stack-dynamic allocation.
− All non-local data is allocated in the static data area.
− At compile time, if a name is not declared in the current procedure, we look it up among the global declarations and use its static address.
− Otherwise, the reference is local and is accessed relative to the top-of-stack pointer.
Passing procedures as parameters is also simple if there is no nesting, because all non-locals have static addresses.

Lexical scope with nested procedures
 1) program sort( input, output );
 2)   var a : array[0..10] of integer;
 3)       x : integer;
 4)   procedure readarray;
 5)     var i : integer;
 6)     begin … a … end { readarray };
 7)   procedure exchange( i, j : integer );
 8)     begin
 9)       x := a[i]; a[i] := a[j]; a[j] := x
10)     end { exchange };
11)   procedure quicksort( m, n: integer );
12)     var k, v : integer;
13)     function partition( y, z: integer ): integer;
14)       var i, j : integer;
15)       begin … a …
16)         … v …
17)         … exchange( i, j ); …
18)       end { partition };
19)     begin … end { quicksort };
20) begin … end { sort }.

Nesting depth
The reference to a on line 15:
− The reference is inside partition(), which is inside quicksort().
− The most closely nested declaration is on line 2, at program (global) scope.
The reference to exchange on line 17:
− The reference is in partition(), which is nested in quicksort().
− The most closely nested declaration is on line 7.
The compiler needs to keep track of the NESTING DEPTH of each declaration:
− sort() is at depth 1
− quicksort() is at depth 2
− partition() is at depth 3
− i of partition(): depth 4

Access Links
We need some way to get from one AR to another when searching for the declaration corresponding to a reference. A new pointer, the ACCESS LINK, is added to the AR. If procedure P is nested immediately inside procedure Q in the program text, then the access link in P's AR should point to the access link in the most recent AR of Q.

Access links
How do we find a non-local reference using access links? Suppose procedure P at nesting depth np refers to a non-local "a" with nesting depth na <= np. We find the storage for a as follows:
1. When control is in P, there must be an AR for P on top of the stack. Starting from it, we follow np - na access links.
2. After following np - na access links, we have the correct AR. The storage for a is at some fixed offset relative to the beginning of that AR.
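The two lookup steps can be sketched in C. This is a minimal sketch under stated assumptions: the Frame layout, MAX_SLOTS, and resolve_nonlocal are hypothetical names, and a real AR would also hold saved registers, a control link, and so on.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 4   /* assumed size of each AR's local area */

/* Hypothetical activation record: an access link plus local slots. */
typedef struct Frame {
    struct Frame *access_link;   /* AR of the lexically enclosing procedure */
    int slots[MAX_SLOTS];        /* locals, addressed by a fixed offset */
} Frame;

/* Resolve the compile-time pair (hops, offset), where hops = np - na. */
int *resolve_nonlocal(Frame *current, int hops, int offset)
{
    assert(current != NULL);
    assert(hops >= 0 && offset >= 0 && offset < MAX_SLOTS);
    Frame *f = current;
    while (hops-- > 0) {
        assert(f->access_link != NULL);   /* the chain must be long enough */
        f = f->access_link;
    }
    return &f->slots[offset];
}
```

With three frames linked partition -> quicksort -> sort, a reference from partition to a variable declared in sort uses hops = 2, while a reference to one of partition's own locals uses hops = 0.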

Setting up access links
At compile time, non-local references are represented by the pair (np - na, offset). We need to set up the access links at procedure call time. Suppose procedure P at depth np calls procedure X at depth nx. The resulting code depends on whether the called procedure is nested within the caller or not.
1. Case np < nx: X is nested more deeply than P, so X's access link just needs to point to P's AR.
2. Case np >= nx: X is at the same level as P or in an outer scope. We have to find the AR of the procedure that encloses both P and X. It is reached by following np - nx + 1 access links from P's AR.
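The two cases can be sketched as a helper that picks the access link for the callee's new AR. This is a sketch under stated assumptions: the Frame type with a depth field and the name access_link_for_call are hypothetical, and depths follow the convention that the main program is at depth 1 and each level of nesting adds 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical AR header: access link plus the procedure's nesting depth. */
typedef struct Frame {
    struct Frame *access_link;
    int depth;
} Frame;

/* Choose the access link for a new AR of callee X (depth nx),
   called from the procedure whose AR is `caller` (depth np). */
Frame *access_link_for_call(Frame *caller, int nx)
{
    assert(caller != NULL);
    int np = caller->depth;
    if (np < nx) {
        /* X is declared directly inside the caller, so nx == np + 1. */
        assert(nx == np + 1);
        return caller;
    }
    /* Otherwise follow np - nx + 1 links to the AR of X's encloser. */
    Frame *f = caller;
    for (int hops = np - nx + 1; hops > 0; hops--) {
        assert(f != NULL);
        f = f->access_link;
    }
    return f;
}
```

For the sort program, partition (depth 3) calling exchange (depth 2) follows 3 - 2 + 1 = 2 links and lands on the AR of sort, which is the procedure that declares exchange.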

Parameter Passing

Parameter Passing
Parameters are the most common way for a calling procedure to communicate with the callee. Different languages have different parameter semantics. Mostly, the differences lie in whether the l-value, the r-value, or the text of the actual parameter is passed. We consider four protocols:
− Call by value
− Call by reference
− Copy-restore
− Call by name

Call by value
This is the simplest parameter passing method.
− The caller computes r-values for the actuals.
− The caller places the resulting values on the stack, in the AR of the callee.
− The callee may change the parameters, but this has no effect on the caller.
This is the default protocol in Pascal, and the ONLY protocol in C.

Parameter passing example
    program reference( input, output );
    var a, b: integer;
    procedure swap( var x, y: integer );   { "var" specifies call-by-reference }
    var temp : integer;
    begin
      temp := x;
      x := y;
      y := temp;
    end;
    begin
      a := 1; b := 2;
      swap( a, b );
      writeln( 'a = ', a );
      writeln( 'b = ', b )
    end.

Call by reference
The caller passes the called procedure a POINTER to the storage address of the actual parameter.
− If the actual has an l-value, that l-value is passed.
− If the actual is an expression with no l-value, we place the result of the expression in a temporary and pass a pointer to the temporary.
Pascal uses call by reference if the "var" keyword is used. C++ uses call by reference if the formal parameter is declared as a reference with "&".
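The difference between the two protocols shows up in a swap routine. Since C only has call by value, call by reference is simulated here by passing pointers explicitly; the function names swap_by_value and swap_by_reference are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Call by value: x and y are copies that live in the callee's AR. */
void swap_by_value(int x, int y)
{
    int temp = x;
    x = y;
    y = temp;          /* only the copies change */
}

/* Call by reference, simulated by passing the actuals' addresses. */
void swap_by_reference(int *x, int *y)
{
    assert(x != NULL && y != NULL);
    int temp = *x;
    *x = *y;
    *y = temp;         /* the caller's variables change */
}
```

Calling swap_by_value(a, b) leaves a and b untouched in the caller, while swap_by_reference(&a, &b) exchanges them, which is the behaviour the Pascal "var" parameters give.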

Copy-restore
This is a hybrid between call by value and call by reference.
− Before the callee is activated, we evaluate the actuals and put their r-values in the AR for the callee. We also compute and save the l-values of the actuals.
− In the return sequence, we copy the updated r-values from the callee's AR back to the locations given by the saved l-values.
Some FORTRAN implementations use this approach.
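Copy-restore and call by reference can give different answers when the callee also touches the actual through another name. The sketch below simulates both protocols by hand in C; the global a and the function names are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

int a;  /* a global that the callee also writes directly */

/* Body shared by both protocols: writes the formal, then the global. */
static void unsafe_body(int *x)
{
    assert(x != NULL);
    *x = 2;
    a = 0;
}

/* Call by reference: the formal is an alias for the actual a. */
int call_with_reference(void)
{
    a = 1;
    unsafe_body(&a);
    return a;
}

/* Copy-restore: copy the r-value in, run the body on the copy,
   then copy the result back through the saved l-value. */
int call_with_copy_restore(void)
{
    a = 1;
    int *saved_lvalue = &a;       /* l-value computed before the call */
    int formal = *saved_lvalue;   /* copy in */
    unsafe_body(&formal);
    *saved_lvalue = formal;       /* copy out in the return sequence */
    return a;
}
```

Under call by reference the final assignment a = 0 wins, so the result is 0. Under copy-restore the copy-out overwrites that assignment, so the result is 2.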

Call by name (macro expansion)
In this method, we just substitute the body of the procedure for the procedure call. In the copied body, the formal parameters are replaced by the text of the actuals. #define macros in C/C++ use this technique.
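A C macro makes the textual substitution visible: the actual is re-evaluated every time the formal appears in the body. In this sketch the names next_value, TWICE_MACRO, and twice_function are hypothetical helpers introduced only for the comparison.

```c
#include <assert.h>

static int calls = 0;

/* Hypothetical helper with a visible side effect. */
int next_value(void)
{
    return ++calls;
}

/* Textual substitution: the actual's text appears twice in the body. */
#define TWICE_MACRO(x) ((x) + (x))

/* Call by value: the actual is evaluated once, before the call. */
int twice_function(int x)
{
    return x + x;
}

/* Returns how many times the actual was evaluated via the macro. */
int evaluations_with_macro(void)
{
    calls = 0;
    int r = TWICE_MACRO(next_value());   /* expands to two calls: 1 + 2 */
    assert(r == 3);
    return calls;
}

/* Returns how many times the actual was evaluated via the function. */
int evaluations_with_function(void)
{
    calls = 0;
    int r = twice_function(next_value());   /* one call: 1 + 1 */
    assert(r == 2);
    return calls;
}
```

The macro version evaluates next_value() twice and the function version once, which is the practical consequence of passing text rather than a value.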

Symbol Tables

Symbol table implementation
The symbol table (ST) stores many kinds of information about names:
− The NAME itself
− STORAGE information
− SCOPE information
So a symbol table entry is typically a record data type. The table itself could be a simple linear array, or a more complex data structure (hash table, etc.).

The NAME entry
Most languages put some bound on the length of identifier names. If the limit is small, we can place the name in the ST entry itself:
    typedef struct {
        char name[MAX_LENGTH+1];
        …
    } tSymbolTableEntry;
Otherwise, we should use the heap to store the names and simply point to them:
    typedef struct {
        char *name;
        …
    } tSymbolTableEntry;

Storage information
The code generator needs to know about the storage required for declared names.
− Statically allocated variables just have an offset relative to the beginning of the static data area. Each definition needs to reserve space in the static data area and advance a pointer to the next available location.
− For stack-dynamic variables, we need to store the offset of the variable relative to the activation record for the procedure.
− Heap-dynamic variable storage requirements are not known until run time.

Linear list representations
We add new ST entries to the end of an array. The array has to be reallocated when it fills up. A search for an item begins at the end and goes backwards, to ensure we get the most recent declaration of a name. Checking for existence takes n/2 comparisons on average. For n insertions and e lookups, we have O(n(n+e)) time. Usually e >> n, so we can write O(ne). This running time is generally too large for big programs.
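A minimal sketch of the linear-list table, assuming a growable array and entries that hold only a name and a scope number. The type and function names (Entry, LinearTable, lt_insert, lt_lookup) are hypothetical, and names are not copied, so the caller must keep the strings alive.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *name;   /* not copied: the caller keeps the string alive */
    int scope;          /* stand-in for the rest of the entry */
} Entry;

typedef struct {
    Entry *items;
    size_t count, capacity;
} LinearTable;

/* Append at the end, growing the array when it is full.
   Returns 0 on success, -1 if memory runs out. */
int lt_insert(LinearTable *t, const char *name, int scope)
{
    assert(t != NULL && name != NULL);
    if (t->count == t->capacity) {
        size_t cap = t->capacity ? 2 * t->capacity : 8;
        Entry *grown = realloc(t->items, cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        t->items = grown;
        t->capacity = cap;
    }
    t->items[t->count].name = name;
    t->items[t->count].scope = scope;
    t->count++;
    return 0;
}

/* Search backwards so the most recent declaration wins. */
const Entry *lt_lookup(const LinearTable *t, const char *name)
{
    assert(t != NULL && name != NULL);
    for (size_t i = t->count; i-- > 0; )
        if (strcmp(t->items[i].name, name) == 0)
            return &t->items[i];
    return NULL;
}
```

A table starts out zero-initialized (LinearTable t = {0};), and the caller releases it with free(t.items). Inserting "a" at scope 1 and again at scope 2 makes a lookup of "a" return the scope 2 entry.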

Hash table representations of the ST
We try to reduce the time needed to insert into and search the ST by using a hash table. OPEN HASHING gives us a running time of O(n(n+e)/m) for any m we desire. The table is an array of m BUCKETS, each holding a linked list of entries. To determine if s is in the table, we apply a HASH FUNCTION h() to s, such that
    0 <= h(s) < m
Then we search the linked list in bucket h(s).

Hash table representations of the ST
Complexity: the average list length is n/m, so as long as m is within a constant factor of n, the search takes nearly constant time. For h(s), the simplest method is to add up the ASCII values of the characters in s, divide by m, and take the remainder. There are MANY other techniques. Most modern languages have library support for hash tables (see hcreate()/hsearch()/hdestroy() if you are a C lover).
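Open hashing with the simple additive hash can be sketched as follows. The bucket count NUM_BUCKETS and the names Node, HashTable, hash_name, ht_insert, and ht_lookup are assumptions for this sketch; it does not use the hcreate() family.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NUM_BUCKETS 211   /* m; an arbitrary choice for this sketch */

typedef struct Node {
    const char *name;     /* not copied: the caller keeps it alive */
    int info;             /* stand-in for the rest of the entry */
    struct Node *next;
} Node;

typedef struct {
    Node *buckets[NUM_BUCKETS];
} HashTable;

/* Simplest hash: sum the character codes, take the remainder mod m. */
unsigned hash_name(const char *s)
{
    unsigned sum = 0;
    while (*s)
        sum += (unsigned char)*s++;
    return sum % NUM_BUCKETS;
}

/* Push the entry on the front of its bucket's list.
   Returns 0 on success, -1 if memory runs out. */
int ht_insert(HashTable *t, const char *name, int info)
{
    assert(t != NULL && name != NULL);
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    unsigned h = hash_name(name);
    n->name = name;
    n->info = info;
    n->next = t->buckets[h];
    t->buckets[h] = n;
    return 0;
}

/* Walk only the list in bucket h(name). */
Node *ht_lookup(HashTable *t, const char *name)
{
    assert(t != NULL && name != NULL);
    for (Node *n = t->buckets[hash_name(name)]; n != NULL; n = n->next)
        if (strcmp(n->name, name) == 0)
            return n;
    return NULL;
}
```

The additive hash sends anagrams such as "ab" and "ba" to the same bucket, so the list search still has to compare names. That weakness is one reason the other hashing techniques exist.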

Scope and the ST
Each entry in an ST corresponds to a declaration of a name. When we look up a name in the ST, we want the entry for the declaration at the correct scope to be returned. The simplest approach is to have a separate hash table for every scope. Another way is to give each procedure a unique number and append that number to each name, guaranteeing uniqueness.

Dynamic Storage Allocation

Explicit vs. implicit alloc/dealloc
Most languages support dynamic allocation of memory.
− Pascal supports new(p) and dispose(p) for pointer types.
− C provides malloc() and free() in the standard library.
− C++ provides the new and delete operators.
These are all examples of EXPLICIT allocation. Other languages, like Python and Lisp, have IMPLICIT allocation.

Garbage
In languages with explicit deallocation, the programmer must be careful to free every dynamically allocated variable, or GARBAGE will accumulate. Garbage is dynamically allocated memory that is no longer accessible because no pointers are pointing to it. In some languages with implicit deallocation, GARBAGE COLLECTION is occasionally necessary. Other languages with implicit deallocation carefully track references to allocated memory and automatically free memory when nothing refers to it any longer.

Dynamic storage allocation
We assume the heap is an initially empty block of memory. As memory is allocated and deallocated, fragmentation occurs.
− For allocation, we must find a HOLE large enough to hold the requested memory.
− For deallocation, we must merge adjacent holes to prevent further fragmentation.
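One way to sketch hole management is a first-fit allocator over a sorted list of holes. First fit is an assumption here, since the text does not say which hole to pick. The sketch tracks offsets only rather than real memory, and the names Hole, Heap, heap_init, heap_alloc, and heap_free are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HOLES 16   /* assumed upper bound on simultaneous holes */

typedef struct { size_t start, size; } Hole;

typedef struct {
    Hole holes[MAX_HOLES];   /* kept sorted by start offset */
    size_t count;
} Heap;

/* The heap starts as one hole covering all of it. */
void heap_init(Heap *h, size_t total)
{
    h->holes[0].start = 0;
    h->holes[0].size = total;
    h->count = 1;
}

/* First fit: carve the request out of the first hole that is big enough.
   Returns the start offset, or (size_t)-1 if no hole fits. */
size_t heap_alloc(Heap *h, size_t size)
{
    assert(size > 0);
    for (size_t i = 0; i < h->count; i++) {
        if (h->holes[i].size >= size) {
            size_t start = h->holes[i].start;
            h->holes[i].start += size;
            h->holes[i].size -= size;
            if (h->holes[i].size == 0) {          /* hole used up: remove it */
                for (size_t j = i + 1; j < h->count; j++)
                    h->holes[j - 1] = h->holes[j];
                h->count--;
            }
            return start;
        }
    }
    return (size_t)-1;
}

/* Return a block to the heap and merge it with adjacent holes. */
void heap_free(Heap *h, size_t start, size_t size)
{
    size_t i = 0;                                 /* first hole after the block */
    while (i < h->count && h->holes[i].start < start)
        i++;
    int merge_prev = i > 0 &&
        h->holes[i - 1].start + h->holes[i - 1].size == start;
    int merge_next = i < h->count && start + size == h->holes[i].start;
    if (merge_prev && merge_next) {               /* block bridges two holes */
        h->holes[i - 1].size += size + h->holes[i].size;
        for (size_t j = i + 1; j < h->count; j++)
            h->holes[j - 1] = h->holes[j];
        h->count--;
    } else if (merge_prev) {
        h->holes[i - 1].size += size;
    } else if (merge_next) {
        h->holes[i].start = start;
        h->holes[i].size += size;
    } else {                                      /* isolated: insert a new hole */
        assert(h->count < MAX_HOLES);
        for (size_t j = h->count; j > i; j--)
            h->holes[j] = h->holes[j - 1];
        h->holes[i].start = start;
        h->holes[i].size = size;
        h->count++;
    }
}
```

Freeing the middle block last in a three-block heap exercises the bridging case: the two outer holes and the freed block collapse back into a single hole covering the whole heap.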