Chapter 5 Names, Bindings, Type Checking, and Scopes
ISBN 0-321-33025-0

Chapter 5 Topics
• Introduction
• Names
• Variables
• The Concept of Binding
• Type Checking
• Strong Typing
• Type Compatibility
• Scope and Lifetime
• Referencing Environments
• Named Constants

5.1 Introduction
• Imperative languages are abstractions of the von Neumann architecture
  – Memory
  – Processor
• Variables are characterized by attributes
  – Type is the most important attribute
  – To design a type, one must consider scope, lifetime, type checking, initialization, and type compatibility

5.2 Names
• Names are associated with labels, subprograms, formal parameters, etc.
• Design issues for names:
  – Maximum length?
  – Are connector characters allowed?
  – Are names case sensitive?
  – Are special words reserved words or keywords?

Names (continued)
• Length
  – If names are too short, they cannot be connotative
  – Language examples:
    • FORTRAN I: maximum 6
    • COBOL: maximum 30
    • FORTRAN 90 and ANSI C: maximum 31
    • Ada and Java: no limit, and all characters are significant
    • C++: no limit, but implementers often impose one

Names (continued)
• Forms
  – A letter followed by a string consisting of letters, digits, and underscore characters (_).
  – In C-based languages, underscores are often replaced by the "camel" notation (e.g., "myStack").

Names (continued)
• Case sensitivity
  – C, C++, and Java names are case sensitive. The names in some other languages are not.
    E.g., in C++, rose, ROSE, and Rose are three different names
  – Disadvantage: readability (names that look alike are different)
    • Worse in C++ and Java because predefined names are mixed case (e.g., IndexOutOfBoundsException)

Names (continued)
• Special words
  – An aid to readability
• A keyword is a word that is special only in certain contexts, e.g., in Fortran
  – Real Apple (Real is followed by a name, so here Real is a keyword that names a data type)
  – Real = 3.4 (here Real is a variable)
• A reserved word is a special word that cannot be used as a user-defined name
  – E.g., in Java: int, public, void, class, …
• Predefined names are names defined in other program units, such as Java packages and C++ libraries. They become visible to a program only if explicitly imported. Once imported, they cannot be redefined.

5.3 Variables
• A variable is an abstraction of a memory cell
• Variables can be characterized as a sextuple of attributes:
  – Name
  – Address
  – Value
  – Type
  – Lifetime
  – Scope

Variable Attributes
• Name: not all variables have one, e.g., an integer that is reachable only through a pointer
• Address: the memory address with which the variable is associated
  – Variables with the same name may have different addresses at different places or times during execution, e.g., the two variables named count below
        void sub() {
          int count;
          …
          while (…) {
            int count;
            count++;
          }
        }

Variable Attributes (continued)
• If two variable names can be used to access the same memory location, they are called aliases
  – E.g., x = max(a, b); … int max(int m, int n) { … } (if parameters are passed by reference, m and a name the same cell during the call)
  – Aliases are created via pointers, reference variables, and C and C++ unions
  – Two pointer variables are aliases when they point to the same memory location
  – Aliasing allows a variable to have its value changed by an assignment to a different variable. It is harmful to readability, because program readers must remember all of the aliases
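The effect of aliasing can be sketched in C++ with one reference variable and one pointer. The function name aliasDemo and the variable names are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>

// Returns the final value of `total` after it is modified through two aliases.
int aliasDemo() {
    int total = 10;
    int& ref = total;   // reference variable: a second name for the same cell
    int* ptr = &total;  // pointer: another access path to the same cell

    ref += 5;           // total is now 15
    *ptr *= 2;          // total is now 30
    return total;       // total itself was never assigned directly
}
```

A reader who looks only at assignments to total would miss both changes, which is the readability problem the slide describes.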

Variable Attributes (continued)
• Type: determines the range of values of variables and the set of operations that are defined for values of that type
  – E.g., Java int: -2147483648 to 2147483647
• Value: the contents of the memory cell or cells with which the variable is associated
  – "Memory cell" here means an abstract memory cell
  – E.g., although a floating-point value may occupy 4 physical bytes, we think of it as occupying a single abstract memory cell
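The range of a Java int is that of a 32-bit two's-complement integer. As a small check, assuming C++'s std::int32_t as a stand-in for that type, the bounds can be read from std::numeric_limits. The constant names kMin32 and kMax32 are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Smallest and largest values of a 32-bit two's-complement signed integer.
constexpr std::int32_t kMin32 = std::numeric_limits<std::int32_t>::min();
constexpr std::int32_t kMax32 = std::numeric_limits<std::int32_t>::max();

// Checked at compile time: the type fixes the range of values.
static_assert(kMax32 == 2147483647, "largest 32-bit signed value");
static_assert(kMin32 == -2147483647 - 1, "smallest 32-bit signed value");
```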

5.4 The Concept of Binding
• The l-value of a variable is its address, because that is what is required when a variable appears on the left side of an assignment statement
• The r-value of a variable is its value
• To access the r-value, the l-value must be determined first
• A binding is an association, such as between an attribute and an entity, or between an operation and a symbol
• Binding time is the time at which a binding takes place

Possible Binding Times
• Language design time: bind operator symbols to operations, e.g., * means multiplication
• Language implementation time: bind the floating-point type to a representation
• Compile time: bind a variable to a type in C or Java
• Load time: bind a FORTRAN 77 variable (or a C static variable) to a memory cell
• Run time: bind a nonstatic local variable to a memory cell

Possible Binding Times
• count = count + 5;
  – The type of count is bound at compile time
  – The set of possible values of count is bound at compiler design time
  – The meaning of the operator symbol + is bound at compile time, when the types of its operands have been determined
  – The internal representation of the literal 5 is bound at compiler design time
  – The value of count is bound at execution time with this statement

Static and Dynamic Binding
• A binding is static if it first occurs before run time and remains unchanged throughout program execution
• A binding is dynamic if it first occurs during execution or can change during execution of the program

Type Binding
• Before a variable can be referenced, it must be bound to a data type
  – How is a type specified?
  – When does the binding take place?

Variable Declarations
• Explicit declaration: a program statement that lists variable names and specifies that they are of a particular type
• Implicit declaration: a default mechanism for specifying the types of variables (triggered by the first appearance of the variable in the program)
• Most languages require explicit declarations
• FORTRAN, PL/I, BASIC, JavaScript, VBScript, and Perl provide implicit declarations

Variable Declarations
• E.g., VBScript
        Dim a, b
        a = 10
        b = "Hello!"
        …
        Function m()
          …
          m = result
        End Function
  – Advantage: writability (convenient in programming)
  – Disadvantage: reliability, e.g., variables that are accidentally left undeclared by the programmer are given default types and unexpected attributes

Variable Declarations
• Implicit declaration is less of a problem in Perl, because the first character of a name identifies its kind
  – $: a scalar, which can store either a string or a numeric value
  – @: an array
  – %: a hash structure

Dynamic Type Binding
• A variable is bound to a type when it is assigned a value in an assignment statement (JavaScript, VBScript, and PHP)
• E.g., JavaScript
        list = [2, 4.33, 6, 8];
        list = 17.3;
  – Advantage: flexibility (generic program units)
  – Disadvantages:
    • High cost (dynamic type checking and interpretation)
    • Type error detection by the compiler is difficult
    • Usually has to be implemented using pure interpreters

Type Inference
• Rather than by assignment statement, types are determined from the context of the reference (ML, Miranda, and Haskell)
• E.g.,
        fun circumf(r) = 3.14159 * r * r;
  – r and the result are inferred to be real (floating point), from the literal 3.14159
        fun times10(x) = 10 * x;
  – x and the result are inferred to be int, from the literal 10
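C++ offers a limited form of inference through auto. It is not ML's algorithm (ML infers the parameter types from the body, whereas C++ takes them from each call and then coerces), so the sketch below is only an analogy. It reuses the names circumf and times10 from the slide and assumes a C++14 compiler for generic lambdas.

```cpp
#include <cassert>
#include <type_traits>

// No parameter or return types are written; the compiler works them out.
auto circumf = [](auto r) { return 3.14159 * r * r; };  // double literal makes the result double
auto times10 = [](auto x) { return 10 * x; };           // int literal keeps int when x is int

// The inferred result types can be checked at compile time.
static_assert(std::is_same<decltype(circumf(2.0)), double>::value, "inferred double");
static_assert(std::is_same<decltype(times10(3)), int>::value, "inferred int");
```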

Storage Bindings & Lifetime
• Allocation: getting a cell from some pool of available cells
• De-allocation: putting a cell back into the pool
• The lifetime of a variable is the time during which it is bound to a particular memory cell
• Scalar (unstructured) variables can be separated into four categories: static, stack-dynamic, explicit heap-dynamic, and implicit heap-dynamic
• Storage allocation: http://lambda.uta.edu/cse5317/notes/node33.html

Categories of Variables by Lifetimes
• Static: bound to memory cells before execution begins and remaining bound to the same memory cells throughout execution, e.g., all FORTRAN 77 variables, C static variables, global variables
  – Advantages:
    • Efficiency (direct addressing; see the supplementary material on memory addressing, http://www.cs.helsinki.fi/u/kerola/tito/koksi_doc/memaddr.html)
    • History-sensitive subprogram support
  – Disadvantages: lack of flexibility (no recursion); storage cannot be shared among variables

Categories of Variables by Lifetimes
  – C and C++ static variables (see the supplementary material "Using the Static Keyword", http://www.cprogramming.com/tutorial/statickeyword.html)
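History sensitivity can be illustrated with a C++ static local variable, contrasted with an ordinary local. The function names below are hypothetical; this is a minimal sketch, not a recommended design.

```cpp
#include <cassert>

// Static local: bound to one memory cell for the whole run, so it
// remembers its value between calls (history-sensitive).
int callsWithStatic() {
    static int calls = 0;  // initialized once
    return ++calls;
}

// Ordinary local: a fresh cell on every call, so nothing is remembered.
int callsWithLocal() {
    int calls = 0;         // re-created and re-initialized each time
    return ++calls;
}
```

Successive calls to callsWithStatic yield 1, 2, 3, and so on, while callsWithLocal always yields 1.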

Categories of Variables by Lifetimes
• Stack-dynamic
  – Storage bindings are created for variables when their declaration statements are elaborated (that is, when execution reaches the declaration at run time)
  – Types are statically bound
  – It does not matter where the declarations occur
  – Allocated from the run-time stack
  – E.g., local variables
  – In Java, C++, and C#, variables defined in methods are stack-dynamic by default
  – Advantages: allows recursion; conserves storage
  – Disadvantages:
    • Overhead of allocation and de-allocation
    • Subprograms cannot be history-sensitive
    • Inefficient references (indirect addressing)
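Recursion depends on stack-dynamic storage: every activation needs its own copy of the locals. A minimal C++ sketch follows, in which the local name partial is hypothetical.

```cpp
#include <cassert>

// Each activation of factorial gets its own `n` and `partial`
// on the run-time stack, so nested calls do not overwrite each other.
unsigned long factorial(unsigned n) {
    if (n <= 1) return 1;
    unsigned long partial = factorial(n - 1);  // inner activations have their own `partial`
    return n * partial;
}
```

If partial were static, all activations would share one cell, which is why languages with only static variables cannot support recursion.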

Categories of Variables by Lifetimes
• Explicit heap-dynamic
  – Allocated and de-allocated by explicit directives, specified by the programmer, which take effect during execution
  – The heap is a collection of storage cells whose organization is highly disorganized because of the unpredictability of its use
  – Referenced only through pointers or reference variables, e.g., dynamic objects in C++ (via new and delete), all objects in Java
  – Two variables are involved: a pointer or reference variable through which the heap-dynamic variable can be accessed, and the heap-dynamic variable itself, e.g.,
        int *intnode;        // create a pointer that can point to an integer
        intnode = new int;   // create the heap-dynamic variable
        delete intnode;      // de-allocate the heap-dynamic variable

Categories of Variables by Lifetimes
  – Java: all data except the primitive scalars are objects. Java objects are explicit heap-dynamic and are accessed through reference variables. Implicit garbage collection is used
  – C#: has both explicit heap-dynamic and stack-dynamic objects, all of which are implicitly de-allocated. Also supports C++-style pointers to interoperate with C and C++ components
  – Advantage: provides for dynamic storage management
  – Disadvantages: inefficient (cost of references and storage management) and unreliable (pointers and references are hard to use correctly)

Categories of Variables by Lifetimes
• Implicit heap-dynamic
  – Bound to heap storage only when they are assigned values
  – All variables in APL; all strings and arrays in Perl and JavaScript
  – E.g.,
        list = [3.5, 4.1]
        list = 47
  – Advantage: flexibility
  – Disadvantages:
    • Inefficient, because all attributes are dynamic
    • Loss of error detection

5.5 Type Checking
• Type checking is the activity of ensuring that the operands of an operator are of compatible types
• The concept of operands and operators is generalized to include subprograms and assignments
  – E.g., actual parameters checked against formal parameters:
        int a, b, x;
        a = 4; b = 5;
        x = max(4, 5);
        …
        float max(float m, float n) { … }
  – E.g., the right side of an assignment checked against the left side:
        string result;
        int a, b;
        a = 4; b = 5;
        result = a + b;

Type Checking
• A compatible type is one that is either legal for the operator, or is allowed under language rules to be implicitly converted, by compiler-generated code, to a legal type
  – This automatic conversion is called a coercion, e.g.,
        int a = 2;
        float b = 1.5;
        b = a + b;
        a = a + b;
• A type error is the application of an operator to an operand of an inappropriate type
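The coercions in the two assignments can be traced in C++, where both are legal. This sketch wraps the slide's statements in a hypothetical function coerce that returns both results; the struct CoercionResult is likewise invented for the illustration.

```cpp
#include <cassert>

struct CoercionResult {
    int a;
    float b;
};

CoercionResult coerce() {
    int a = 2;
    float b = 1.5f;
    b = a + b;  // a is widened to float: 2.0f + 1.5f = 3.5f
    a = a + b;  // the sum 5.5f is narrowed to int on assignment: fraction dropped, a = 5
    return {a, b};
}
```

The second assignment silently loses the fractional part, which is the kind of reliability cost that heavy coercion brings.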

Type Checking
• Static type checking: consistency can be checked before the program is run. All variables must be given a type
• Dynamic type checking: type checking at run time; required when type binding is dynamic
• If all type bindings are static, nearly all type checking can be static
• If type bindings are dynamic, type checking must be dynamic (e.g., JavaScript)
• Advantage of static type checking: potential problems can be identified earlier
• Advantage of dynamic type checking: flexibility

5.6 Strong Typing
• A programming language is strongly typed if type errors are always detected, either at compile time or at run time
• Fortran 95: not strongly typed. The use of EQUIVALENCE allows a variable of one type to refer to a value of a different type
• Ada: nearly strongly typed. Allows programmers to suspend type checking for a particular type conversion
• C, C++: not strongly typed. Union types are not type checked
• Java, C#: strongly typed
• Languages with a great deal of coercion, like Fortran, C, and C++, are significantly less reliable than those with little coercion, such as Ada, Java, and C#

5.7 Type Compatibility
• Name type compatibility means that two variables have compatible types if they are in either the same declaration or in declarations that use the same type name, e.g.,
        int a, b;
  and
        int a;
        int b;

Type Compatibility
• Name type compatibility is easy to implement but highly restrictive:
  – Subranges of integer types are not compatible with integer types, e.g.,
        type Indextype is 1..100;
        count : Integer;
        index : Indextype;
  – Formal parameters must be the same type as their corresponding actual parameters. If a structured type is passed among subprograms through parameters, such a type must be defined only once, globally. A subprogram cannot state the type of such formal parameters in local terms

Type Compatibility
• Structure type compatibility means that two variables have compatible types if their types have identical structures
  – E.g., if types Apple and Pear both have the structure { string color; float weight; string type; }, then Apple a = new Apple(); and Pear p = new Pear(); declare variables of compatible types
• More flexible, but harder to implement. The entire structures of the two types must be compared, and the comparison is not always simple

Type Compatibility
• Consider the problem of two structured types:
  – Are two record types compatible if they are structurally the same but use different field names?
  – Are two array types compatible if they are the same except that the subscripts are different (e.g., [1..10] and [0..9])?
  – Are two enumeration types compatible if their components are spelled differently?
  – With structure type compatibility, you cannot differentiate between types of the same structure, e.g.,
        type celsius is new Float;
        type fahrenheit is new Float;
    These are compatible in structure, but they should not be treated as compatible

Type Compatibility
• Ada uses name type compatibility, but provides two type constructs: subtypes and derived types
  – A derived type is a new type that is based on some previously defined type with which it is incompatible, although it may have identical structure, e.g.,
        type celsius is new Float;
        type fahrenheit is new Float;
    Both are derived from Float; they are incompatible with each other and with Float
  – A subtype is a possibly range-constrained version of an existing type. It is compatible with its parent type, e.g.,
        subtype Small_type is Integer range 0..99;
    Variables of Small_type are compatible with Integer

Type Compatibility
• C uses structure type compatibility for all types except structures and unions
• C++ uses name type compatibility
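Name type compatibility for C++ structs can be checked with a type trait. In this sketch the struct names Celsius and Fahrenheit echo the Ada example, while the field name degrees, the constant kMixable, and the helper toFahrenheit are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Structurally identical, but the distinct names make them distinct types.
struct Celsius    { float degrees; };
struct Fahrenheit { float degrees; };

// False: a Fahrenheit value cannot be assigned to a Celsius variable.
constexpr bool kMixable = std::is_assignable<Celsius&, Fahrenheit>::value;

// Crossing between the two types requires an explicit conversion.
Fahrenheit toFahrenheit(Celsius c) {
    return Fahrenheit{c.degrees * 9.0f / 5.0f + 32.0f};
}
```

This gives roughly the protection that Ada's derived types provide: mixing the two temperature scales by accident is a compile-time error.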

5.8 Scope
• The scope of a variable is the range of statements over which it is visible
• A variable is visible in a statement if it can be referenced in that statement
• The nonlocal variables of a program unit are those that are visible but not declared there

Scope
• Static scope
  – A method of binding names to nonlocal variables
  – The scope of a variable can be determined prior to execution
  – Two categories of static-scoped languages:
    • Subprograms can be nested (Ada, JavaScript, PHP)
    • Subprograms cannot be nested (C-based languages)

Scope
• To decide the scope, the reader must find the declaration of the variable
• Search process: search declarations, first locally, then in increasingly larger enclosing scopes, until one is found for the given name
• E.g., find the scope of X in Sub1:
        procedure Big is
          X : Integer;
          procedure Sub1 is
          begin
            … X …
          end;
          procedure Sub2 is
            X : Integer;
          begin
            …
          end

Scope
• Enclosing static scopes (of a specific scope) are called its static ancestors; the nearest static ancestor is called its static parent
• Variables can be hidden from a unit by having a "closer" variable with the same name, e.g.,
        void sub() {
          int count;
          while (…) {
            int count;
            count++;
          }
        }
  Legal in C and C++; illegal in Java and C#

Scope
• In Ada, hidden variables from ancestor scopes can be accessed with selective references, e.g., Big.X
• C-based languages do not allow subprograms to be nested inside other subprogram definitions, but they have global variables. These variables are declared outside any subprogram definition. Local variables can hide these globals. In C++, such hidden variables can be accessed using the scope operator ::, e.g., ::variableName for a global (or class_name::variableName for a class member)
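Hiding and the scope operator can be seen together in a short C++ sketch. The names tally and hideAndSeek are hypothetical.

```cpp
#include <cassert>

int tally = 100;  // global variable

int hideAndSeek() {
    int tally = 1;      // local: hides the global inside this function
    {
        int tally = 7;  // block-local: hides the function's local (legal in C++)
        tally++;        // changes only the innermost tally (now 8)
    }
    ::tally += tally;   // ::tally is the global; plain tally is the local, still 1
    return ::tally;     // 101 on the first call
}
```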

Blocks
• A block is a section of code that has its own local variables, whose scope is limited to that section
• These variables are stack-dynamic, so their storage is allocated when the section is entered and de-allocated when the section is exited
• C-based languages allow any compound statement (a statement sequence surrounded by matched braces) to have declarations and thus define a new scope
• The scopes created by blocks are treated exactly like those created by subprograms. References to variables in a block that are not declared there are connected to declarations by searching enclosing scopes

Blocks
• Examples:
  – C and C++:
        for (...) {
          int index;
          ...
        }
  – Ada:
        declare
          LCL : FLOAT;
        begin
          ...
        end
• C++ allows variable definitions to appear anywhere in functions. When a definition appears at a position other than the beginning of a function, but not within a block, that variable's scope is from its definition to the end of the function
• In C, all data declarations in a function, other than those in blocks within the function, must appear at the beginning of the function
• In classes of C++, Java, and C#, the scope of an instance variable is the whole class. The scope of a variable defined in a method starts at its definition

Evaluation of Static Scoping
• Static scoping provides nonlocal access, but also has problems
• Assume MAIN calls A and B, A calls C and D, and B calls A and E
[Figure: nesting of the program units MAIN, A, B, C, D, and E]

Static Scope Example
[Figure: two call graphs over MAIN, A, B, C, D, and E, one showing the potential calls and one showing the desired calls]

Static Scope (continued)
• Suppose the spec is changed so that E must now access some data in D
• Solutions:
  – Put E in D (but then E can no longer access B)
  – Move the data that E needs from D to MAIN (but then all procedures can access them, which creates the possibility of incorrect access)
• Overall: static scoping often encourages many global variables

Dynamic Scope
• Based on the calling sequences of program units, not their textual layout
• Determined at run time
• References to variables are connected to declarations by searching back through the chain of subprogram calls that forced execution to this point

Scope Example
        MAIN
          - declaration of x
          SUB1
            - declaration of x
            ...
            call SUB2
            ...
          SUB2
            ...
            - reference to x
            ...
          ...
          call SUB1
          …
• MAIN calls SUB1, SUB1 calls SUB2, and SUB2 uses x

Scope Example
• Static scoping
  – The reference to x is to MAIN's x
• Dynamic scoping
  – The reference to x is to SUB1's x
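C++ is statically scoped, so the dynamic result has to be simulated. The sketch below mirrors the MAIN/SUB1/SUB2 example: the first half uses ordinary C++ scoping, and the second half models dynamic scoping with an explicit stack of bindings. The bindings vector and the lookup helper are hypothetical devices for this illustration, not a language feature.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// --- Static scoping: what C++ itself does ---
int x = 1;                      // plays the role of MAIN's x
int sub2Static() { return x; }  // textually, the only visible x is the global one
int sub1Static() {
    int x = 2;                  // SUB1's x: invisible inside sub2Static
    (void)x;
    return sub2Static();
}

// --- Dynamic scoping, simulated with a stack of (name, value) bindings ---
std::vector<std::pair<std::string, int>> bindings;  // most recent binding is last

// Search back through the active bindings, most recent first.
int lookup(const std::string& name) {
    for (auto it = bindings.rbegin(); it != bindings.rend(); ++it)
        if (it->first == name) return it->second;
    return -1;  // unbound name (kept simple for the sketch)
}

int sub2Dynamic() { return lookup("x"); }

int sub1Dynamic() {
    bindings.push_back({"x", 2});  // SUB1 declares x
    int result = sub2Dynamic();
    bindings.pop_back();           // SUB1's x disappears when SUB1 returns
    return result;
}

int mainDynamic() {
    bindings.push_back({"x", 1});  // MAIN declares x
    int result = sub1Dynamic();
    bindings.pop_back();
    return result;
}
```

With the same call chain, sub1Static() yields MAIN's value 1, while mainDynamic() yields SUB1's value 2.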

Evaluation of Dynamic Scoping
• Problems
  – From the time a subprogram begins until its execution ends, its local variables are visible to every other executing subprogram. There is no way to protect local variables, so dynamic scoping is less reliable than static scoping
  – Inability to statically type check references to nonlocals
  – Poor readability, since resolving a reference depends on the calling sequence
  – Longer access time
• Merit
  – No need to pass parameters

Scope and Lifetime
• Scope and lifetime are sometimes closely related, but are different concepts
• Consider a static variable declared within a C or C++ function
  – Scope: the function
  – Lifetime: the entire execution of the program
• Another example:
        void printheader() { … }
        void compute() {
          int sum;
          …
          printheader();
        }
  – Scope of sum: compute()
  – Lifetime of sum: compute() plus printheader(), since sum stays allocated while printheader() executes

Referencing Environments
• The referencing environment of a statement is the collection of all names that are visible in the statement
• In a static-scoped language, it is the local variables plus all of the visible variables in all of the enclosing scopes

Referencing Environments
        procedure Example is
          A, B : Integer;
          …
          procedure Sub1 is
            X, Y : Integer;
          begin
            …   -- point 1
          end;
          procedure Sub2 is
            X : Integer;
            …
            procedure Sub3 is
              X : Integer;
            begin
              …   -- point 2
            end;
          begin
            …   -- point 3
          end;
        begin
          …
        end;
• Referencing environments
  – Point 1: X and Y of Sub1, A and B of Example
  – Point 2: X of Sub3 (X of Sub2 is hidden), A and B of Example
  – Point 3: X of Sub2, A and B of Example

Referencing Environments
• A subprogram is active if its execution has begun but has not yet terminated
• In a dynamic-scoped language, the referencing environment is the local variables plus all visible variables in all active subprograms

Referencing Environments
• main calls sub2, which calls sub1
        void sub1() {
          int a, b;
          …   <------ point 1
        }
        void sub2() {
          int b, c;
          …   <------ point 2
          sub1();
        }
        void main() {
          int c, d;
          …   <------ point 3
          sub2();
        }
• Referencing environments
  – Point 1: a and b of sub1, c of sub2, d of main (c of main and b of sub2 are hidden)
  – Point 2: b and c of sub2, d of main (c of main is hidden)
  – Point 3: c and d of main

Named Constants
• A named constant is a variable that is bound to a value only once, when it is bound to storage
• Named constants vs. static variables
  – Static variable: initialized once, value can change
  – Named constant: initialized once, value cannot change
• Advantages: readability and modifiability
  – E.g., use of PI instead of the literal 3.14159
  – Used to parameterize programs

Named Constants
• Without a named constant:
        void example() {
          int[] intList = new int[100];
          String[] strList = new String[100];
          …
          for (index = 0; index < 100; index++) { … }
          …
          average = sum / 100;
          …
        }
• With a named constant:
        void example() {
          final int len = 100;
          int[] intList = new int[len];
          String[] strList = new String[len];
          …
          for (index = 0; index < len; index++) { … }
          …
          average = sum / len;
          …
        }

Named Constants
• The binding of values to named constants can be either static (such constants are called manifest constants) or dynamic
• FORTRAN 90: only constant-valued expressions are allowed (manifest constants)
• Ada, C++, and Java: expressions of any kind are allowed
  – E.g., const int result = 2 * width + 1;
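Both kinds of binding can be written in C++. In this sketch, constexpr marks a manifest constant and a const local shows dynamic binding; the names kLen, buffer, and oddSpan are hypothetical, and width is taken from the slide's example.

```cpp
#include <cassert>

// Manifest constant: the value is bound at compile time.
constexpr int kLen = 100;
static_assert(kLen == 100, "usable in compile-time checks");

// An array bound must be a compile-time constant, so kLen qualifies.
int buffer[kLen];

// Dynamically bound constant: the value is computed at run time,
// when `result` is bound to storage, and cannot change afterwards.
int oddSpan(int width) {
    const int result = 2 * width + 1;
    // result = 0;  // would not compile: result is a constant
    return result;
}
```

Each call to oddSpan binds a fresh result, so the "constant" may hold a different value on every call, yet within one activation it can never be reassigned.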

Variable Initialization
• The binding of a variable to a value at the time it is bound to storage is called initialization
• Initialization is often done in the declaration statement, e.g., in Java
        int sum = 0;