Data Types and the Type System
James Brucker

Important Topics
- What is a "type system"?
- What are common data types?
- How are numeric types stored and operated on?
- Compound types: arrays, structs, records
- Enumerated types
- Character and string types
- Strong type checking versus not-so-strong
  - enumerations, type compatibility, and type safety
  - advantages and disadvantages of compile-time checking

Important Topics (continued)
- Type compatibility
  - What are the compatibility rules in C, C++, and Java?
  - When are user-defined types compatible?
- Type conversion
  - What conversions are automatic in C, C++, Java?
  - What conversions are allowed using a cast?

Importance of knowing data types
- Need to know the valid range of data values.
    for(int k=0; k<9999999; k++) /* do something */;
    int k = Integer.MAX_VALUE;   // = 2,147,483,647
- Need to know rules for operations.
    int k = Integer.MAX_VALUE;   // = 2,147,483,647
    k = k + 1;                   // overflow?
    int m = 7, n = 4;
    float x = m / n;             // 1.75, 2.0 or 1.0 ?
- Need to know what assignments are valid and how the compiler will convert from one type to another.
- Need to know what variables represent: the value of data or a reference to a storage location.

Data Type
A data type is a set of possible values and the operations on those values.
Example: int
- set of values: -2,147,483,648, ..., 0, 1, ..., 2,147,483,647 (-2^31 to 2^31 - 1)
- operations: +, *, etc., each producing an int
- internal representation: 32-bit 2's complement

Data Types define meaning
- To the computer, a stored value is just bits.
- The data type assigns meaning to those bits.
- Example C function:
    /* "An int is an unsigned is a float" --the cpu */
    void rawdata( ) {
        union {
            int i;
            unsigned int u;
            float f;
        } x;
        while( 1 ) {
            printf("Input a value: ");
            scanf("%d", &x.i);   // read as an int
            printf("int %d is unsigned %u is float %g\n", x.i, x.u, x.f);
        }
    }

Type System
The type system is
- the collection of all data types
- the rules for type equivalence, type compatibility, and type conversion between data types

Type System
Example: int n = 0.5 * 99;
- cannot directly multiply a floating-point value by an int
- type conversion rule: "int" can be automatically converted to a floating-point type
- type system says float * float is float (49.5); strictly, the literal 0.5 is a double in C too, so the product is a double
- in C, the assignment compatibility rule says that you can convert a floating-point value back to int by truncation
- in Java or C#, the result is double, which is not assignment compatible with int (assignment error)
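The C half of this example can be checked directly. The sketch below is only an illustration; the function names `truncated_product` and `to_int` are made up for this note and are not part of the slides.

```c
#include <stdio.h>

/* Returns the value that `int n = 0.5 * 99;` stores in C.
 * 99 is converted to floating point, the product is 49.5,
 * and the assignment discards the fractional part. */
int truncated_product(void)
{
    int n = 0.5 * 99;
    return n;
}

/* Explicit form of the same conversion: a cast truncates toward zero. */
int to_int(double value)
{
    return (int)value;
}

/* Prints the two results so the truncation is visible. */
void show_truncation(void)
{
    printf("0.5 * 99 stored in an int: %d\n", truncated_product());
    printf("(int)-49.5: %d\n", to_int(-49.5));
}
```

Calling `show_truncation()` from `main` prints both values. In Java the same initializer is rejected at compile time unless a cast is written.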

Memory Concepts
- We will cover memory management later, but first...
- When the OS runs a program it allocates at least 2 memory segments:
  - text segment for program instructions (Read Only)
  - data segment for data (variables, constants, ...)
- The data segment is divided into 3 parts:
  - static area: static data
  - stack area: stack oriented data
  - heap: dynamic, non-stack data

Memory Concepts
    int count = 0;
    const int MAXSIZE = 4000;
    int *getarray(int n) {
        int *a = (int *)malloc( n*sizeof(int) );
        return a;
    }
    int main( ) {
        int size;
        scanf("%d", &size);
        int *a = getarray(size);
        scanf("%d", a);
    }
Where each item lives:
- Heap Area: a[0], a[1], ...
- (unused space between heap and stack)
- Stack Area: stack frame for getarray; stack frame for main, holding size and a (pointer only)
- Static Area: count, MAXSIZE

Virtual Memory
- Most OS use virtual memory.
- The actual location of memory pages varies.
- Accessing memory efficiently affects program speed.
[Figure: the memory manager maps the program's virtual pages (page n, page n+1) to pages in real memory]

Integer Data Types
C/C++ support both “unsigned” and “signed” integer types.
    Type            # Bytes  Range of values
    short int       2        -32,768 (-2^15) to 32,767 (2^15 - 1)
    unsigned short  2        0 to 65,535 (2^16 - 1)
    int             4        -2,147,483,648 (-2^31) to 2,147,483,647 (2^31 - 1)
    unsigned int    4        0 to 4,294,967,295 (2^32 - 1)
    long int        same as "int" on Pentium and Athlon CPU
C permits the “char” type for integer values, too:
    char            1        -128 to 127
    unsigned char   1        0 to 255
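The sizes in this table are typical rather than guaranteed, since C leaves them to the implementation. A minimal sketch for checking them on a given machine follows; the function names are hypothetical, and only the standard `<limits.h>` macros are used.

```c
#include <limits.h>
#include <stdio.h>

/* Width of an int in bits on this platform (32 on the machines in the table). */
int int_bits(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}

/* Unsigned arithmetic wraps modulo 2^N, so UINT_MAX + 1 is 0. */
unsigned int wrap_after_max(void)
{
    unsigned int u = UINT_MAX;
    return u + 1u;
}

/* Prints the size and range of the common integer types. */
void print_integer_ranges(void)
{
    printf("short:        %zu bytes, %d to %d\n", sizeof(short), SHRT_MIN, SHRT_MAX);
    printf("unsigned int: %zu bytes, 0 to %u\n", sizeof(unsigned int), UINT_MAX);
    printf("int:          %zu bytes, %d to %d\n", sizeof(int), INT_MIN, INT_MAX);
    printf("long:         %zu bytes, %ld to %ld\n", sizeof(long), LONG_MIN, LONG_MAX);
    printf("char:         %zu byte,  %d to %d\n", sizeof(char), CHAR_MIN, CHAR_MAX);
}
```

Note that signed overflow (for example `INT_MAX + 1`) is undefined behavior in C, which is why the wraparound demonstration uses an unsigned type.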

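Example use of unsigned int
- To display the address of a variable in C:
    printf("address of %s is %u\n", "x", (unsigned int)&x);
- This assumes a pointer fits in an unsigned int (true on 32-bit systems). The portable form is printf("%p\n", (void *)&x);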

IEEE 754 Floating Point Standard
- Problem: some numerical algorithms would run on one computer, but fail with an arithmetic overflow/underflow error on another.
- Worse problem: results from different computers could differ greatly!
  - This reduced trust in the answer from the computer.
  - In fact, when numerical results differ greatly it usually indicates a problem in the algorithm!
- Solution: IEEE 754 (1985) defines a standard for computer storage of floating point numbers.

IEEE Floating Point Data Types
Example: -1.011100 x 2^11 stored as a float
    Sign bit   Biased exponent        Mantissa
    1          10001010 (= 11 + 127)  01110000...
Field sizes:
             Sign bit  Biased exponent        Mantissa  Range              Precision
    Float:   1 bit     8 bits, bias = 127     23 bits   10^-38 - 10^+38    24 bits =~ 7 dec. digits
    Double:  1 bit     11 bits, bias = 1023   52 bits   10^-308 - 10^+308  53 bits =~ 15 dec. digits
Stored exponent = actual exponent + bias

Implicit Leading "1"
- Floating point numbers are stored in normalized form:
    13.3125 = 1101.0101  = 1.1010101 x 2^3
    3/16    = 0.00110000 = 1.1000000 x 2^-3
- Normalized form: the leading digit is always one. So, IEEE 754 doesn't store it.
- Rule: if the actual exponent is between 1-bias and +bias, the floating point value is stored in normalized format:
    1011.01110 = 1.011011100 x 2^3
    mantissa: 011011100...
    exponent: 3+bias = 130 (single prec)
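The stored fields can be inspected from C. This is a minimal sketch that assumes `float` is a 32-bit IEEE 754 single-precision value, which holds on common platforms but is not required by the C standard. The type `float_fields` and the function `decode_float` are names invented for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* The three stored fields of a single-precision value. */
typedef struct {
    unsigned sign;       /* 1 bit */
    unsigned exponent;   /* 8 bits, biased by 127 */
    uint32_t mantissa;   /* 23 bits, implicit leading 1 not included */
} float_fields;

/* Splits a float into its stored fields.
 * Assumes float is 32-bit IEEE 754 (an assumption, not a C guarantee). */
float_fields decode_float(float f)
{
    uint32_t bits;
    float_fields r;
    memcpy(&bits, &f, sizeof bits);      /* copy the raw bits safely */
    r.sign = bits >> 31;
    r.exponent = (bits >> 23) & 0xFFu;
    r.mantissa = bits & 0x7FFFFFu;
    return r;
}
```

For the example 1011.0111 (decimal 11.4375), `decode_float(11.4375f)` yields exponent 130 and a mantissa whose leading bits are 0110111, matching the worked example: the leading 1 is absent from the stored mantissa.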

Gradual underflow
- To extend the precision for small numbers, very small numbers are not stored in normalized form.
- In this case the leading "1" is also stored and the biased exponent has value 0 (smallest exponent).
    Value                  Mantissa            Biased Exp.
    1.01101110 x 2^-126    011011100000000     -126+bias = 1
    1.01101110 x 2^-127    1011011100000000    -127+bias = 0
    1.01101110 x 2^-128    0101101110000000    -127+bias = 0
    1.01101110 x 2^-129    00101101110000000   -127+bias = 0
    1.01101110 x 2^-130    00010110111000000   -127+bias = 0
    ... as the number gets smaller, leading significant digits shift right
    1.01101110 x 2^-147    0000000000101       -127+bias = 0
    1.01101110 x 2^-148    0000000000010       -127+bias = 0
    1.01101110 x 2^-149    000000000001        -127+bias = 0
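Gradual underflow can be observed by repeatedly halving the smallest normalized float. The sketch below assumes 32-bit IEEE 754 floats with the default rounding mode and no flush-to-zero setting; `is_denormalized` and `halvings_until_zero` are hypothetical helper names.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* True when f is stored in denormalized form:
 * biased exponent field 0 and a non-zero mantissa. */
int is_denormalized(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return ((bits >> 23) & 0xFFu) == 0 && (bits & 0x7FFFFFu) != 0;
}

/* Counts how many times FLT_MIN (2^-126) can be halved before the
 * result becomes zero. volatile forces each result to be rounded
 * to float precision before the next step. */
int halvings_until_zero(void)
{
    volatile float f = FLT_MIN;
    int count = 0;
    while (f != 0.0f) {
        f = f / 2.0f;
        count++;
    }
    return count;
}
```

Without gradual underflow the first halving would already give zero. With it, the value keeps shrinking through the denormalized range down to 2^-149, losing one bit of precision at each step.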

IEEE 754 Floating Point Values
The standard defines special values:
- +/-Infinity: 1/0 = +Infinity, -3/0 = -Infinity, exp(5000) = +Infinity, Infinity+Infinity = Infinity
- NaN (Not-a-Number): 0/0 = NaN, Infinity*0 = NaN, ...
    Value            Sign Bit  Exponent     Mantissa
    Normalized f.p.  0, 1      1 to 2*bias  any
    Denormalized     0, 1      0000         any non-0
    Zero             0, 1      0000         0
    +Infinity        0         11111111     0
    -Infinity        1         11111111     0
    NaN              0, 1      11111111     any non-0
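A short C sketch of these special values follows. It assumes the implementation uses IEEE 754 arithmetic for `double` (otherwise floating-point division by zero is undefined in C). The function names are made up for this note, and `volatile` is used only to keep the compiler from folding the divisions at compile time.

```c
#include <float.h>

/* 1.0 / 0.0 evaluated at run time: +Infinity under IEEE 754. */
double positive_infinity(void)
{
    volatile double zero = 0.0;
    return 1.0 / zero;
}

/* 0.0 / 0.0 evaluated at run time: NaN under IEEE 754. */
double not_a_number(void)
{
    volatile double zero = 0.0;
    return zero / zero;
}

/* NaN is the only value that is not equal to itself. */
int is_nan(double x)
{
    return x != x;
}
```

The self-comparison test works because every comparison involving NaN is false, so `x != x` is true only for NaN.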

Floating Point Questions
Question: How do you store 2.50 as a "float"?
    2.50 = 1.25 x 2 = 1.0100000 x 2^1
    Implicit leading 1 rule: mantissa = 010000000
    Exponent: 1 + bias = 128
    Stored value: 0 10000000 010000000000
Question: What is the decimal value of:
    1 100000000000000 0 1111 00000000000

Floating Point Questions (cont'd)
Question: How do you store 0.1 as a "float"?
    0.1 = 0.000110011001100... (binary) = 1.100110011001100... x 2^-4
    Normalized mantissa (leading 1 not stored) = 100110011001100...
    Exponent: -4 + bias = 123
    Stored val: 0 01111011 100110011001100...
- 0.1 has no exact representation in binary!
Question: what decimal values have an exact binary representation (no truncation error)?

Consequence of inexact conversion
- 0.1 does not have an exact binary representation. Therefore, we may have:
    10 * 0.1 != 1.0
    0.1 + 0.1 + ... + 0.1 (ten terms) != 1.0
- Don't use "==" as test criteria in loops with floats. This loop never terminates:
    double x = 0.1;
    while( x != 1.0 ) {   // better: ( x <= 1.0 )
        System.out.println( x );
        x = x + 0.1;
    }
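One common alternative to `==` is to compare within a tolerance. The sketch below is written in C with IEEE 754 doubles assumed; the helper names and the tolerance value are illustrative choices, not something the slides prescribe.

```c
/* Adds 0.1 to itself `count` times, accumulating rounding error. */
double sum_of_tenths(int count)
{
    double x = 0.0;
    for (int i = 0; i < count; i++)
        x += 0.1;
    return x;
}

/* True when a and b differ by at most `tolerance`. */
int nearly_equal(double a, double b, double tolerance)
{
    double diff = a - b;
    if (diff < 0.0)
        diff = -diff;
    return diff <= tolerance;
}
```

Here `sum_of_tenths(10) == 1.0` is false on typical hardware, while `nearly_equal(sum_of_tenths(10), 1.0, 1e-9)` is true. Where the number of steps is known, driving the loop with an integer counter avoids the problem entirely.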

Type Compatibility for built-in types
- Operations in most languages will automatically convert ("promote") some data types:
    2 * 1.75    convert 2 (int) to floating point
- Assignment compatibility: what automatic type conversions are allowed on assignment?
    int n = 1234567890;
    float x = n;   // OK in C or Java
    n = x;         // allowed in C? Java?
- Automatic conversions:
    char -> short -> int -> long -> double
    short -> int -> float -> double
    What about long -> float ?
- Rules for C/C++ are not the same as for Java.

C/C++ Arithmetic Type Conversion
- For +, -, *, /, both operands must be the same type.
- The C/C++ compiler "promotes" mixed type operands to make all operands the same, using the following rules:
    Operand Types     Promote            Result
    short op int      short => int       int
    long op int       int => long        long
    int op float      int => float       float
    int op double     int => double      double
    float op double   float => double    double
    etc...
"op" is any arithmetic operation: + - * /

Assignment Type Conversion is not Arithmetic Type Conversion (1)
- What is the result of this calculation?
    int m = 15;
    int n = 16;
    double x = m / n;

Forcing Type Conversion
- Since the arguments are integers, integer division is used:
    double x = 15 / 16;   // = 0 !
- You must coerce the "int" values to floating point. There are two ways:
    int m = 15;
    int n = 16;
    /** Efficient way: cast as a double */
    double x = (double)m / (double)n;
    /** Clumsy way: multiply by a floating-point constant (ala Fortran) */
    double x = 1.0*m / n;
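The difference between the two divisions is easy to confirm. In this small C sketch the function names `int_quotient` and `real_quotient` are hypothetical.

```c
/* Divides as integers first; only the (already truncated) result
 * is converted to double on return. */
double int_quotient(int m, int n)
{
    return m / n;
}

/* Converts one operand to double before dividing, so the other
 * operand is promoted and floating-point division is used. */
double real_quotient(int m, int n)
{
    return (double)m / n;
}
```

With m = 15 and n = 16, `int_quotient` returns 0.0 and `real_quotient` returns 0.9375. Casting a single operand is enough because the arithmetic conversion rules promote the other one.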

Assignment Type Conversion is not Arithmetic Type Conversion (2)
- Many students wrote this in the Fraction program:
    public class Fraction {
        int numerator;     // numerator of the fraction
        int denominator;   // denominator of the fraction
        ...etc...
        /** compare this fraction to another. */
        public int compareTo( Fraction frac ) {
            double r1 = this.numerator / this.denominator;
            double r2 = frac.numerator / frac.denominator;
            if ( r1 > r2 ) return 1;
            else if ( r1 == r2 ) return 0;
            else return -1;
        }
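In the code above both quotients use integer division, so 1/2 and 1/3 both become 0 and compare as equal. One way to avoid division altogether is to cross-multiply. The sketch below is written in C rather than Java, assumes positive denominators, and uses invented names (`fraction`, `make_fraction`, `fraction_compare`); it is one possible fix, not the course's reference solution.

```c
/* A fraction with integer numerator and denominator. */
typedef struct {
    int numerator;
    int denominator;   /* assumed positive in this sketch */
} fraction;

/* Convenience constructor. */
fraction make_fraction(int numerator, int denominator)
{
    fraction f = { numerator, denominator };
    return f;
}

/* Returns 1, 0, or -1 as a is greater than, equal to, or less than b.
 * a/b ? c/d is decided by comparing a*d with c*b, which is valid when
 * both denominators are positive. long long reduces overflow risk. */
int fraction_compare(fraction a, fraction b)
{
    long long left = (long long)a.numerator * b.denominator;
    long long right = (long long)b.numerator * a.denominator;
    return (left > right) - (left < right);
}
```

Casting one operand to double, as in `(double)numerator / denominator`, also fixes the truncation, but cross-multiplication keeps the comparison exact.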

Arrays
An array is a series of elements of the same type, with an index, which occupy consecutive memory locations.
    float x[10];                // C: array of 10 “float” vars
    char [] c = new char[40];   // Java: array of 40 "char"
Array x[ ] in memory: x[0] x[1] x[2] ... x[9]   (each element is 4 bytes = sizeof(float))
Array c[ ] in memory: c[0] c[1] ... c[39]

Array "dope vector"
- In C or Fortran an array is just a set of contiguous elements. No type or length information is stored.
- Some languages store a "dope vector" (aka array descriptor) describing the array.
    /* C language */
    double x[10];
    x holds only the address 01E4820, which points to x[0] x[1] x[2] x[3] ...
    /* Language with dope vector */
    double x[10];
    x holds the descriptor (element type double, lower bound 0, length 10, address 01E4820), which points to x[0] x[1] x[2] x[3] ... x[9]

Array as Object
- In Java, arrays are objects:
    double [ ] x = new double[10];
- x is an Object; each element x[0] ... x[9] is a double (primitive type).
[Figure: x refers to a double[ ] object with field length = 10, followed by elements x[0] x[1] x[2] ...]
- x.getClass( ).getName( ) returns "[D"

1-Dimensional Arrays
- Element of a 1-D array is computed as an offset from the start:
    float f[20];
    address of f[n] = address(f) + n*sizeof(float)
- Some languages permit arbitrary index bounds:
  - Pascal: var a: array [ 2..5 ] of real;
  - FORTRAN:
      REAL X(100)     array is X(1) ... X(100)
      REAL Y(2:5)     array is Y(2) ... Y(5)
- In any case, the array element can be computed as an offset:
    address of a[n] = address(a) + (n-start)*sizeof(real)
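The offset formula can be written as a one-line function and checked against the compiler's own indexing. This is a minimal sketch; `element_offset` and `offset_matches_c_indexing` are hypothetical names.

```c
#include <stddef.h>

/* Byte offset of element `index` in an array whose first valid index
 * is `start` and whose elements are `element_size` bytes each. */
size_t element_offset(long index, long start, size_t element_size)
{
    return (size_t)(index - start) * element_size;
}

/* Checks the formula against C's indexing for float f[20] (start = 0). */
int offset_matches_c_indexing(void)
{
    float f[20];
    return (char *)&f[7] == (char *)f + element_offset(7, 0, sizeof(float));
}
```

For a Pascal-style array indexed 2..5 with 4-byte elements, `element_offset(5, 2, 4)` gives 12: the last element sits three elements past the start.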

2-Dimensional Arrays
There are different organizations of 2-D arrays:
- Rectangular array in row major order:
    float r[4, 3];
    In memory (row major order):
    r[0,0] r[0,1] r[0,2] r[1,0] r[1,1] r[1,2] r[2,0] ...
- Rectangular array in column major order (Fortran):
    real(4, 3) x
    In memory (column major order):
    x(1,1) x(2,1) x(3,1) x(4,1) x(1,2) x(2,2) x(3,2) x(4,2) x(1,3) ...

2-Dimensional Arrays
- Computing the address of array elements
- Rectangular arrays in row major order:
    float x[ROWS][COLS];
    address of x[j][k] = address(x) + (j*COLS + k) * sizeof(float)
- Three dimensional array:
    float y[J][K][L];
    address of y[j][k][l] = address(y) + (j*K*L + k*L + l) * sizeof(float)
- 2-D and 3-D arrays require more time to access due to this calculation.
- The compiler can optimize when you access consecutive items:
    for(k = 0; k<COLS; k++) sum += x[j][k];
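The row-major formula can be compared with what a C compiler actually does. In this sketch the dimensions 4 and 3 and the function names are illustrative assumptions.

```c
#include <stddef.h>

enum { ROWS = 4, COLS = 3 };

/* Byte offset of element [j][k] in a row-major array with `cols`
 * columns and elements of `element_size` bytes. */
size_t row_major_offset(size_t j, size_t k, size_t cols, size_t element_size)
{
    return (j * cols + k) * element_size;
}

/* Checks the formula against the compiler's layout of float x[ROWS][COLS]. */
int matches_compiler_layout(size_t j, size_t k)
{
    static float x[ROWS][COLS];
    return (char *)&x[j][k]
        == (char *)x + row_major_offset(j, k, COLS, sizeof(float));
}
```

For example, element [2][1] of a 3-column array of 4-byte floats lies (2*3 + 1) * 4 = 28 bytes from the start.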

Arrays of Pointers: ragged arrays
- Each element of a vector is a pointer to a vector
    char *days[7] = { "Sunday", "Monday", "Tuesday", "Wednesday",
                      "Thursday", "Friday", "Saturday" };
[Figure: days[0] through days[6] each point to a separate null-terminated string of a different length]
Vector of pointers: 7 x 4 bytes = 28 bytes
Array of characters: = 57 bytes

Arrays of Pointers: ragged arrays (2)
- Compare the previous slide with a 2-D array:
    char days[ ][10] = { "Sunday", "Monday", "Tuesday", "Wednesday",
                         "Thursday", "Friday", "Saturday" };
[Figure: seven rows of 10 chars each; every day name is followed by null padding to fill its row]
2-D array = 7 x 10 bytes = 70 bytes

What is sizeof( ) for 2-D arrays?
- Rectangular array in C:
    char days[7][10] = { "Sunday", "Monday", ... };
    int m = sizeof( days );
    int n = sizeof( days[0] );
- Array of pointers in C:
    char *days[7] = { "Sunday", "Monday", ... };
    int m = sizeof( days );
    int n = sizeof( days[0] );
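The two cases can be tried directly. In the sketch below the array names `rect_days` and `ragged_days` and the wrapper functions are invented for illustration. The pointer size depends on the platform, so it is expressed as `sizeof(char *)` rather than a fixed number.

```c
#include <stddef.h>

/* Rectangular: 7 rows of exactly 10 chars. */
char rect_days[7][10] = { "Sunday", "Monday", "Tuesday", "Wednesday",
                          "Thursday", "Friday", "Saturday" };

/* Ragged: 7 pointers to separately stored strings. */
const char *ragged_days[7] = { "Sunday", "Monday", "Tuesday", "Wednesday",
                               "Thursday", "Friday", "Saturday" };

size_t sizeof_rect(void)       { return sizeof(rect_days); }       /* 7 * 10 chars */
size_t sizeof_rect_row(void)   { return sizeof(rect_days[0]); }    /* one row of 10 chars */
size_t sizeof_ragged(void)     { return sizeof(ragged_days); }     /* 7 pointers */
size_t sizeof_ragged_row(void) { return sizeof(ragged_days[0]); }  /* one pointer */
```

For the rectangular array, `sizeof` reports the whole 70-byte block and a 10-byte row. For the array of pointers it reports only the pointers and says nothing about the lengths of the strings they point to.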

Java: always uses array of pointers
- 2-D arrays in Java are always treated as an array of pointers
    final int N = 10;
    double [][] a;
    a = new double[N][ ];          // create row pointers
    for(int k=0; k<N; k++)
        a[k] = new double[k+1];    // create columns
    // array dimensions determined by initial values
    int [][] m = { { 1, 2, 3, 4}, { 5, 6}, { 8, 9, 10}, { 11 } };
What are the sizes of each row of m ?

C#: rectangular and ragged arrays
- A rectangular array in C# (one set of brackets)
    const int N1 = 10, N2 = 20, N3 = 25;
    // 2-dimensional array
    double [,] a = new double[N1, N2];
    // 3-dimensional array
    double [,,] a3 = new double[N1, N2, N3];
- A ragged array in C# or Java uses multiple brackets:
    // create array of row pointers
    double [][] b = new double[N1][ ];
    // allocate space for each row (can differ)
    for (int k=0; k<N1; k++) b[k] = new double[N2];
- In Java (but not C#) you can write:
    b = new double[N1][N2];

Accessing Array Elements
- Ragged arrays require multiple levels of dereferencing
    result = b[i][j];
    1. get address of b.
    2. get b[i]: _addr = valueat( address(b) + i*sizeof( b[i] ) )
    3. result = valueat( _addr + j*sizeof( b[i][j] ) )
- A rectangular array computes the address as an offset:
    double [,] b = new double[N1, N2];
    result = b[i, j];
    1. get address of b.
    2. result = valueat( address(b) + (i*N2 + j)*sizeof(double) )
- In Java and C#, arrays are objects, so the address is not this simple.

Efficiency and multi-dimensional array
- Multi-dimensional array access can be slower than 1-D array access because of the extra index arithmetic.
- For an array stored in row major order, access in row order is more efficient and can minimize paging.
    // search a[ROWS, COLS] in row major order
    for(int r=0; r<ROWS; r++)
        for (int c=0; c<COLS; c++)
            if ( a[r, c] > max ) max = a[r, c];
    // visits memory consecutively: a[0,0] a[0,1] ... a[0,COLS-1] a[1,0] a[1,1] ...

    // search a[ROWS, COLS] in column major order
    for(int c=0; c<COLS; c++)
        for (int r=0; r<ROWS; r++)
            if ( a[r, c] > max ) max = a[r, c];
    // jumps COLS elements at each step: a[0,0] a[1,0] ... a[ROWS-1,0] a[0,1] a[1,1] ...

Type Checking
Verifying that the actual value of an expression is valid for the type to which it is assigned.
- A strongly typed language is one in which all type errors are detected at compile time or run time.
Example:
- Java is strongly typed: most type errors are detected by the compiler. Others, like casts, are checked at runtime and generate exceptions:
    Object obj = new Double(2.5);
    String s = obj;            // compile time error
    String s = (String) obj;   // run-time ClassCastException

Type Compatibility for user types
- In C, "typedef" defines an alias for a type; it doesn't create a new type.
    typedef int type_a;
    typedef int type_b;
    int main( ) {
        type_a a;
        type_b b;
        b = 5;   // assign integer to "type_b" variable OK?
        a = b;   // assign "type_b" to "type_a" variable OK?
    }

Type Compatibility for user types (2)
    struct A { float x; char c; };
    struct B { float x; char c; };
    struct C { float z; char c; };
    int main( ) {
        struct A a;
        struct B b;
        struct C c;
        a.x = 0.5;
        a.c = 'a';
        b = a;          // Error in C: struct A and struct B are distinct types, even with identical members
        c = a;          // Error
        if (b == a)     // OK? (C defines no == operator for structs)
    }

Type Compatibility for classes (3)
    public class A { public float x; public char c; }
    public class B { public float x; public char c; }
    public class C { public float z; public char c; }
    public static void main(...) {
        A a = new A();
        B b;
        C c;
        a.x = 0.5f;   // the "f" suffix is required: 0.5 alone is a double
        a.c = 'a';
        b = a;        // Error
        c = a;        // Error
    }

Type Conversion and Polymorphism
    int max( int a, int b) {
        if ( a > b ) return a;
        else return b;
    }
    float max( float a, float b) {
        if ( a > b ) return a;
        else return b;
    }
    int main( ) {
        int m, n;
        float x, y, z;
        x = 5.5;
        m = x;           // OK to convert float to int
        z = max(x, y);   // EASY! call max(float, float)
        n = max(m, x);   // which max function?
        y = max(x, m);   // which max function?
    }

Explicit Polymorphism in C++
    /* This template generates "max" functions of
     * any parameter type that the program needs.
     */
    template <typename T>
    T max( T a, T b ) {
        if ( a > b ) return a;
        else return b;
    }
    int main( ) {
        int m = 4, n = 9;
        float x = 0.5, y = 2.7;
        n = max(m, n);   // generate max(int, int)
        y = max(x, y);   // generate max(float, float)
    }