ICS 313: Programming Language Theory Module 06: Data Types

Objectives
• To understand basic issues in the design and implementation of typical data types.

Central Goal of Typed Data
To model the real-world problem space as closely and efficiently as possible.
Evolution:
• Fortran I: numeric and array types; floating point modeled with ints
• PL/I: everything for everyone
• Algol: a few basic types plus user definitions
• Simula, Java: entities modeled with classes
Evolutionary progression:
• Association of data with functions.
• Abstractions for maintaining/assessing interdependencies automatically.

Primitive and Structured Data Types
Primitive data types:
• Data types not defined in terms of other types.
  - Reflect hardware support directly or with minor software support.
  - Examples: integers, floats, strings, etc.
Structured data types:
• Primitive types + “type constructors”

Built-In Primitives
(These are the options: not all of them are built in to every language.)

Numeric Types: Integer
The only primitive data type found in early languages (except Lisp).
Integer:
• Different sizes possible: 1-8 bytes.
• Arbitrarily large in Lisp.
• Representation: a string of bits.
  - The leftmost bit can represent the sign.
  - Two's complement is better for computer arithmetic.
• Direct support in hardware.
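To see why two's complement suits hardware arithmetic, here is a minimal Python sketch. Python ints, like Lisp's, are arbitrarily large, so the sketch assumes a fixed field width (8 bits by default) and simulates it by masking. The helper names are illustrative only.

```python
def to_twos_complement(value: int, bits: int = 8) -> str:
    """Return the two's-complement bit string of value in a field of `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise OverflowError(f"{value} does not fit in {bits} bits")
    # Masking with 2**bits - 1 maps a negative value v to 2**bits + v.
    return format(value & ((1 << bits) - 1), f"0{bits}b")

def from_twos_complement(pattern: str) -> int:
    """Interpret a bit string as a signed two's-complement integer."""
    raw = int(pattern, 2)
    # If the leftmost (sign) bit is set, subtract 2**bits.
    return raw - (1 << len(pattern)) if pattern[0] == "1" else raw

print(to_twos_complement(5))    # 00000101
print(to_twos_complement(-5))   # 11111011
# Adding the bit patterns of 5 and -5 and dropping the carry gives 0,
# so the same adder circuit works for signed and unsigned values.
print((0b00000101 + 0b11111011) & 0xFF)  # 0
```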

Numeric Types: Floating Point
• Approximations for real numbers.
• Typically stored in binary (base 2).
  - This means 0.1 cannot be represented exactly!
• Two levels of accuracy:
  - Real (typically four bytes: 1 sign bit, 8 exponent bits, 23 fraction bits)
  - Double (typically eight bytes: 1 sign bit, 11 exponent bits, 52 fraction bits)
• Representation (IEEE standard):
  [ Sign: 1 bit | Exponent: 8 or 11 bits | Fraction: 23 or 52 bits ]
  The exponent field determines the range; the fraction field determines the precision.
Try this in Python: x = 3.4, then evaluate x.
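The IEEE layout can be inspected directly. This is a small sketch using Python's standard struct module: it reinterprets a double's 8 bytes as a 64-bit integer and splits out the three fields. The function name is illustrative.

```python
import struct

def decompose_double(x: float) -> tuple[int, int, int]:
    """Split an IEEE 754 double into its (sign, exponent, fraction) bit fields."""
    # '>d' packs x as a big-endian 8-byte double; '>Q' reads the same bytes as an unsigned 64-bit int.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF        # 11 bits, stored with a bias of 1023
    fraction = bits & ((1 << 52) - 1)      # 52 bits
    return sign, exponent, fraction

print(decompose_double(1.0))    # (0, 1023, 0)
print(decompose_double(-2.0))   # (1, 1024, 0)
# 0.1 has no finite binary expansion, so the stored value is only close to 0.1:
print(f"{0.1:.20f}")            # 0.10000000000000000555
print(0.1 + 0.2 == 0.3)         # False
```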

Numeric Types: Decimal
Decimal types:
• Store a fixed number of decimal digits, with the decimal point in a fixed position.
• Essential for business data processing.
• Business-oriented machines (mainframes) have hardware support.
• Other machines implement decimal arithmetic using integers and software.
Representation:
• 1-2 digits encoded into each byte.
• Example: 9352.14 in three bytes (two digits per byte; the decimal point is implied, not stored):
  1001 0011 0101 0010 0001 0100
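A minimal sketch of the two-digits-per-byte (packed BCD) encoding that reproduces the 9352.14 example. It assumes the decimal point's position is fixed by the declaration, so only the digits are packed. The helper name is hypothetical. The last line shows Python's standard decimal module, which does decimal arithmetic in software.

```python
from decimal import Decimal

def to_packed_bcd(digits: str) -> bytes:
    """Pack decimal digits two per byte (packed BCD), high nibble first.
    The decimal point is not stored; its position is fixed by the declaration."""
    if len(digits) % 2:
        digits = "0" + digits          # pad to an even number of digits
    return bytes((int(digits[i]) << 4) | int(digits[i + 1])
                 for i in range(0, len(digits), 2))

encoded = to_packed_bcd("935214")      # 9352.14 with 2 implied fraction digits
print(" ".join(f"{b >> 4:04b} {b & 0xF:04b}" for b in encoded))
# 1001 0011 0101 0010 0001 0100

# Software decimal arithmetic avoids binary rounding error:
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```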

Numeric Types: Boolean
Simplest type: two values (true and false).
• Requires only one bit to implement.
• Typically implemented as a byte.
• Included in most languages since Algol.
Some languages have no Boolean type, or let other values act as truth values:
• C (before C99 added _Bool) and C++ conditions:
  - 0 is false; all other numeric values are true.
• Lisp:
  - nil is false; everything else is true.
• Python extends these conventions to many primitives:
  - 0, '', (), [], {} are all false.
• Scheme uses a mixture of Boolean and other values:
  - #f is false; #t and everything else are true.
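Python's truthiness rules can be checked in a few lines. This sketch shows the empty and zero values that count as false. It also shows that Python's bool is a subclass of int, which echoes the C convention.

```python
falsy = [0, 0.0, "", (), [], {}, set(), None, False]
truthy = [1, -3, "0", (0,), [[]], {"k": 0}]
print(all(not bool(v) for v in falsy))     # True
print(all(bool(v) for v in truthy))        # True
# bool is a subclass of int, so True behaves like 1 in arithmetic:
print(True + True, isinstance(True, int))  # 2 True
```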

Characters and Strings
Character type:
• Stored as numeric encodings.
  - ASCII: popular, but limited to 128 characters (codes 0-127).
  - Unicode: used in Java; a superset of ASCII that covers the characters of most natural languages.
Character strings:
• Design issues:
  - Are strings a primitive or a structured data type (i.e., an array of chars)?
  - Are strings fixed or variable length?
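Python exposes a character's numeric encoding through ord and str.encode. This short sketch uses UTF-8, which is only one of several Unicode encodings. Java's char type, for example, holds UTF-16 code units instead. Characters beyond ASCII need more than one byte in UTF-8.

```python
for ch in ["A", "z", "é", "λ"]:
    code = ord(ch)                        # the character's Unicode code point
    print(ch, code, code <= 127, ch.encode("utf-8"))
```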

String Operations
Required operations:
• Equality, concatenation, <, >, substring, etc.
Pascal, Ada:
• Strings are predefined as arrays of chars.
• Built-in string operations.
• Index slices (like immutable strings in Python: s[3:7]).
C, C++:
• Strings are implemented as arrays of chars terminated by a special null character.
• A library package provides string operations.
Scheme: strings are primitive constants.
Java: String is immutable; StringBuffer is mutable.
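The required operations map directly onto Python's immutable str type. This sketch shows slicing, equality after concatenation, lexicographic comparison, and the error raised when code tries to modify a string in place.

```python
s = "Programming"
print(s[3:7])                   # gram
print(s == "Program" + "ming")  # True: equality after concatenation
print("apple" < "banana")       # True: lexicographic comparison
try:
    s[0] = "p"                  # strings are immutable
except TypeError as err:
    print("TypeError:", err)
# A "modified" string is really a new object:
t = "p" + s[1:]
print(t, s)                     # programming Programming
```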

Pattern Matching
Built in (Perl, SNOBOL4, Icon) versus library (Python, …)
Examples (Perl, Python): what do these represent?
• /[A-Za-z][A-Za-z\d]*/
• /\d+\.?\d*|\.\d+/
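Readers can check their answers with Python's re module. This sketch uses re.fullmatch, which requires the whole string to match; the bare /.../ patterns alone would also match a substring. The variable names hint at one reading of each pattern.

```python
import re

identifier = re.compile(r"[A-Za-z][A-Za-z\d]*")
number = re.compile(r"\d+\.?\d*|\.\d+")

for text in ["x1", "Total", "9lives"]:
    print(text, bool(identifier.fullmatch(text)))
for text in ["42", "3.14", "7.", ".5", "."]:
    print(text, bool(number.fullmatch(text)))
```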

String Length Choices
Static:
• Length specified in the declaration of the string.
• Example: Fortran (blank fill).
Limited dynamic:
• Maximum length specified in the declaration.
• Example: C, C++ (a null character ends the string).
Dynamic:
• No length specified; the string shrinks or grows as needed.
• Example: Common Lisp, Java.

String Implementation (descriptors)
Static (descriptor used only at compile time):
  [ Length | Address ] → “The String”
Limited dynamic (descriptor used at run time, except in C, which uses the null terminator and doesn't check):
  [ Current length | Max length | Address ] → “The String”
Dynamic (descriptor used at run time):
  [ Current length | Address ] → “The String”
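To make the limited-dynamic descriptor concrete, here is a hypothetical Python model, not any real language's implementation. Storage is allocated once at the maximum length. The run-time descriptor tracks the current and maximum lengths, and assignment performs the length check that C omits.

```python
class LimitedDynamicString:
    """Hypothetical model of a limited-dynamic string: a run-time descriptor
    (current length, maximum length) plus a fixed-size character buffer."""

    def __init__(self, max_len: int):
        self.max_len = max_len
        self.cur_len = 0
        self.buffer = bytearray(max_len)   # storage allocated once, at max size

    def assign(self, text: str) -> None:
        data = text.encode("ascii")
        if len(data) > self.max_len:       # the run-time check that C omits
            raise ValueError(f"length {len(data)} exceeds maximum {self.max_len}")
        self.buffer[:len(data)] = data     # same-length slice assignment: buffer size unchanged
        self.cur_len = len(data)

    def value(self) -> str:
        return self.buffer[:self.cur_len].decode("ascii")

s = LimitedDynamicString(10)
s.assign("The String")
s.assign("short")
print(s.value(), s.cur_len, s.max_len)
```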

String in Memory
How is “The String” allocated?
Linked list:
• Faster allocation/deallocation as the size changes.
• Slower operations.
• More memory.
Contiguous storage:
• Slower reallocation during growth.
• Faster operations.

User-Defined Primitives

Enumeration Types
All possible (symbolic) values are explicitly listed in the type declaration:
• type WEEKEND = (Sat, Sun);
• type DAY = (Mon, Tue, Wed, Thu, Fri, Sat, Sun);
• Of what type is Sat? (overloaded literal)
  - Pascal and C don't allow it.
  - Ada allows it.
Advantages of enumerated types over “numeric encoding” (i.e., Sat = 1, Sun = 2):
• Greatly increased readability.
• Prevents the use of inappropriate operations or values.
Implemented with integers plus range checks.
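Python's standard enum module illustrates the readability and safety advantages. The example assumes integer codes 1-7 for DAY and 1-2 for WEEKEND. Python sidesteps the overloaded-literal question by requiring qualified names (Day.SAT versus Weekend.SAT). Members are not interchangeable with their integer encodings.

```python
from enum import Enum

class Day(Enum):
    MON = 1
    TUE = 2
    WED = 3
    THU = 4
    FRI = 5
    SAT = 6
    SUN = 7

class Weekend(Enum):
    SAT = 1
    SUN = 2

print(Day.SAT, Weekend.SAT)     # qualified names, so no ambiguity
print(Day.SAT == Weekend.SAT)   # False
print(Day.SAT == 6)             # False: not equal to its integer code
try:
    Day.SAT + 1                 # arithmetic is not defined on plain Enum members
except TypeError:
    print("TypeError")
print(Day(6))                   # the integer encoding is still used underneath
```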

Subrange Types
A contiguous subsequence of an ordinal type, e.g.:
• Pascal: index = 1..100
• Python: for x in range(10)
Subtype:
• Restricted range of a type.
• Compatible with its parent type.
Derived type:
• Also a restricted range, but not compatible with its parent type.
Good for readability and reliability.
Implemented like the parent type, with range checks.
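Python has no subrange types, so this is a hypothetical emulation of the Pascal index = 1..100 example. A subclass of int checks the range at construction time. Because it is an int, it stays compatible with its parent, much like a subtype. Arithmetic results fall back to plain int and are not re-checked, a simplification a real compiler would not make.

```python
class Subrange(int):
    """Hypothetical Pascal-style subrange: an int whose value must lie in LOW..HIGH.
    Subclasses set LOW and HIGH; instances remain usable wherever an int is."""
    LOW = 0
    HIGH = 0

    def __new__(cls, value: int):
        if not cls.LOW <= value <= cls.HIGH:   # the run-time range check
            raise ValueError(f"{value} not in {cls.LOW}..{cls.HIGH}")
        return super().__new__(cls, value)

class Index(Subrange):
    LOW, HIGH = 1, 100

i = Index(42)
print(i + 1)      # 43: arithmetic falls back to the parent type (int)
try:
    Index(101)
except ValueError as err:
    print(err)    # 101 not in 1..100
```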

Structured Types
Most of these are built-in types, although in the case of records (structures) and pointers, the programmer then uses them to define specialized types.

Arrays
A homogeneous aggregate of typed data elements, with elements identified by position.
Issues:
• Syntax: A(i), A[i]
• Subscript types: allow any ordinal type?
  - Definition: A[DAY]
  - Use: A[Mon]
• Range checked?

Array Categories
Static arrays:
• Subscript ranges (and data element types) are statically bound, and storage allocation is done at compile time.
• FORTRAN up to FORTRAN 77.
• Most time-efficient, but can waste memory.
Fixed stack-dynamic arrays:
• Subscript ranges and element types are statically bound, but allocation is done at run time.
• Supports reuse of large array spaces.
• Pascal, C.

Array Categories (cont.)
Stack-dynamic arrays:
• Subscript ranges are bound and storage is allocated at run time, but both stay constant for the lifetime of the variable.
Heap-dynamic arrays:
• Subscript ranges are bound and storage is allocated at run time, and both can change.
• Allows the greatest flexibility (the array can grow or shrink).
• Java Vector
• Least efficient.

Array Operations
Operate on an array as a unit. Some languages provide no array operations.
Examples of operations:
• Assignment
• Concatenation
• Relational operations
• Pair-wise +, -, *, /
• Operations on slices (FORTRAN 90, Python)
APL is the most radical programming language for array processing:
• Array reversal, transposition, inversion
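Python's built-in lists support several whole-array operations directly, as this sketch shows. Plain lists have no pair-wise arithmetic operator, so the sketch uses a comprehension over zip. The third-party NumPy package provides pair-wise operators on its arrays.

```python
a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
c = a + b                                     # concatenation: a new 8-element list
print(a < b)                                  # True: lexicographic relational comparison
pairwise_sum = [x + y for x, y in zip(a, b)]  # pair-wise +: [11, 22, 33, 44]
a[1:3] = [0, 0]                               # assignment to a slice: a becomes [1, 0, 0, 4]
print(c[::-1])                                # reversal via a slice with step -1
```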

Array Implementation
For a 1-based, 1-dimensional array:
• address(A[k]) = (address(A[1]) - element_size) + (k * element_size)
Issue: when is an array element's address computed?
• Static arrays:
  - element_size and address(A[1]) are computed at compile time.
  - Run-time computation: address(A[k]) = constant + (k * element_size), where constant = address(A[1]) - element_size.
• Other array types require a lookup of address(A[1]) at run time.
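The formula can be written as a short function. The base address 1000 and element size 8 below are made-up values for illustration. For a static array, `constant` is what a compiler could fold ahead of time, leaving one multiply and one add at run time.

```python
def element_address(address_of_first: int, element_size: int, k: int) -> int:
    """address(A[k]) for a 1-based, 1-dimensional array, given address(A[1])."""
    constant = address_of_first - element_size   # foldable at compile time for static arrays
    return constant + k * element_size

# Hypothetical array starting at address 1000 with 8-byte elements:
print([element_address(1000, 8, k) for k in range(1, 5)])   # [1000, 1008, 1016, 1024]
```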

Multidimensional Arrays
Map to linear memory. For the array
  a b c
  d e f
  g h i
• Row-major storage (most languages):
  - lowest value of the first subscript stored first
  - a b c d e f g h i
• Column-major storage (FORTRAN):
  - lowest value of the last subscript stored first
  - a d g b e h c f i
For a 1-based, 2-dimensional array with n columns, stored in row-major order:
• address(A[i, j]) = address(A[1, 1]) + (((i - 1) * n) + (j - 1)) * element_size
Why should a programmer care?
• Large arrays may cross page boundaries in virtual memory.
• Access cells in the wrong order and you create a lot of swapping.
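This sketch places the 3x3 example into linear storage both ways. It reproduces the two orderings above. Each offset is the element's position in memory, so address(A[i, j]) = address(A[1, 1]) + offset * element_size.

```python
def row_major_offset(i: int, j: int, n_cols: int) -> int:
    """0-based position of A[i, j] (1-based subscripts) in row-major storage."""
    return (i - 1) * n_cols + (j - 1)

def column_major_offset(i: int, j: int, n_rows: int) -> int:
    """0-based position of A[i, j] (1-based subscripts) in column-major storage."""
    return (j - 1) * n_rows + (i - 1)

A = [["a", "b", "c"],
     ["d", "e", "f"],
     ["g", "h", "i"]]
rows, cols = 3, 3
row_major = [None] * (rows * cols)
col_major = [None] * (rows * cols)
for i in range(1, rows + 1):
    for j in range(1, cols + 1):
        row_major[row_major_offset(i, j, cols)] = A[i - 1][j - 1]
        col_major[column_major_offset(i, j, rows)] = A[i - 1][j - 1]
print(" ".join(row_major))  # a b c d e f g h i
print(" ".join(col_major))  # a d g b e h c f i
```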

Associative Arrays
Often implemented as hash tables.
• Indexed by a key (part of the data) rather than by position.
• Store both the key and the value (take more space).
• Best when access is by data rather than by index.
Examples:
• Lisp alist:
  - ((key1 . data1) (key2 . data2) (key3 . data3))
• Python dictionary:
  - {key1: data1, key2: data2, key3: data3}
• Java:
  - java.util.Hashtable
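Here are both examples from the slide in Python. The dictionary is a hash table keyed by the data itself. The alist is modeled as a list of (key, value) pairs searched linearly, roughly as Lisp's assoc does. The `assoc` helper is a hypothetical stand-in, not a Python built-in.

```python
# Python dictionary: a hash table keyed by the data itself
ages = {"alice": 31, "bob": 27}
ages["carol"] = 45
print(ages["bob"])              # 27

# A Lisp-style association list as (key, value) pairs; lookup is a linear scan.
alist = [("alice", 31), ("bob", 27), ("carol", 45)]

def assoc(key, pairs):
    """Return the first (key, value) pair whose key matches, else None."""
    for pair in pairs:
        if pair[0] == key:
            return pair
    return None

print(assoc("bob", alist))      # ('bob', 27)
```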

Sets
Useful for shortening Boolean expressions:
• if x in set …
Implemented as a primitive type in Pascal:
• Stored as a bit string in one word.
• Implementation-dependent limit on size.
• Efficient intersection, union, equality.
Some languages supply set operations applied to lists (Common Lisp, Prolog).
Java provides the interface java.util.Set; Python has a built-in set type.
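This sketch emulates the Pascal-style representation in Python, with a set of days stored as a bit string in one integer. Bit k stands for the k-th day. Intersection, union and membership each become a single bitwise operation. The DAYS list and helper names are illustrative assumptions.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def to_bits(members) -> int:
    """Encode a set of day names as a bit string: bit k is 1 iff DAYS[k] is a member."""
    bits = 0
    for m in members:
        bits |= 1 << DAYS.index(m)
    return bits

def from_bits(bits: int) -> list:
    """Decode a bit string back into day names, in DAYS order."""
    return [d for k, d in enumerate(DAYS) if (bits >> k) & 1]

weekend = to_bits(["Sat", "Sun"])
errands = to_bits(["Tue", "Sat"])
print(from_bits(weekend & errands))             # intersection = bitwise AND: ['Sat']
print(from_bits(weekend | errands))             # union = bitwise OR: ['Tue', 'Sat', 'Sun']
print(bool((weekend >> DAYS.index("Sun")) & 1)) # membership test: True
```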

Record Types
A heterogeneous aggregate of typed data elements, with elements identified by name.
Operations:
• Assignment
• Equality
• Assign corresponding fields
Implementation:
• Simple and efficient, because field name references are literals bound at compile time.
• Offsets determine field addresses.
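Python's standard ctypes module lays out a Structure the way a C compiler lays out a struct. It also reports each field's fixed offset, which is how a field name becomes an address at compile time. The field names here are hypothetical. Offsets depend on the platform's alignment rules; on typical 64-bit platforms they are 0, 8 and 16.

```python
import ctypes

class Employee(ctypes.Structure):
    # A C-compatible record: each field lives at a fixed offset from the record's start.
    _fields_ = [("id", ctypes.c_int),
                ("salary", ctypes.c_double),
                ("dept", ctypes.c_char * 8)]

for name, _ in Employee._fields_:
    field = getattr(Employee, name)
    print(name, "offset", field.offset, "size", field.size)

e = Employee(7, 52000.0, b"ICS")
print(e.salary, e.dept)
```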

Record Types (cont.)
Examples:
• COBOL records:
  - NAME OF EMPLOYEE
  - MOVE CORRESPONDING EMPLOYEE TO REPORT
• Pascal records:
  - employee.name
  - with employee do … name
• Ada also has records; uses dot notation.
• Common Lisp “structures”:
  - (employee-name …)
• C also has structures; uses dot notation.
• Java: use classes instead.

Union Types
Allow different types of values to be stored at different times during execution.
Often used in records (e.g., Pascal record variants).
Example:
• A table of symbols and values.
• Each value may be an int, a real, or a string.
• Which layout would you prefer?
  [ symbol | int_value | real_value | string_value ]
  [ symbol | value (max size) ]
Implementation:
• Allocate space for the largest variant.
• Discriminated unions include a tag field to indicate the current type.
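The symbol-table value can be sketched with ctypes.Union, whose members share storage. The union is therefore as large as its largest variant. Wrapping it in a Structure with a tag field gives a discriminated union. The tag constants and the 16-byte string size are hypothetical choices.

```python
import ctypes

class Value(ctypes.Union):
    # All fields share the same storage; the size is that of the largest variant.
    _fields_ = [("int_value", ctypes.c_int),
                ("real_value", ctypes.c_double),
                ("string_value", ctypes.c_char * 16)]

INT, REAL, STRING = 0, 1, 2   # hypothetical tag values

class Entry(ctypes.Structure):
    # A discriminated union: the tag records which variant is currently stored.
    _fields_ = [("tag", ctypes.c_int), ("value", Value)]

print(ctypes.sizeof(Value))       # 16: the char[16] variant is the largest

e = Entry(REAL)
e.value.real_value = 2.5
if e.tag == REAL:                 # run-time check made possible by the tag
    print(e.value.real_value)     # 2.5
# Nothing stops code that ignores the tag from misreading the same bytes as an int:
print(e.value.int_value)
```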

Union Type Evaluation
Advantages:
• Union types provide storage efficiency.
• Get around an overly restrictive type system.
• Allow pointer arithmetic in a language that does not support it directly (access a pointer as if it were an int).
Disadvantages:
• More difficult to type check.
• May require run-time type checking.
• May lead to a lack of any type checking.
Unnecessary in object-oriented languages like Java (why?) and in functional languages (like ML) that support polymorphism and compile-time type checking.

Pointer Types
A pointer variable's values are memory addresses or one distinguished value (nil).
Pointers provide two capabilities:
• Support for indirect addressing.
• Dynamic memory management.
(An example will be given: binary trees in FORTRAN and with pointers.)
Note: heap-dynamic variables have no name and must be referenced by pointer variables.
[Diagram: pointer variable foo_ptr holds FF 03, the address of “The String”.]

Fundamental Pointer Operations
Assignment:
• Sets a pointer variable to the address of an object.
  - Direct addressing: assignment done implicitly during variable initialization.
  - Indirect addressing: requires an operator that takes an object and returns its address (ptr = &object in C).
Reference:
• An occurrence of a pointer variable denotes its own value (an address), just as with other variables (ptr).
Dereference:
• An occurrence of a dereferenced pointer variable denotes the object whose address is the value of the pointer variable (*ptr in C).

Pointer Examples
Let's diagram this C:
• int *ptr;
• int i, j;
• i = 3;
• ptr = &i;
• *ptr = 4;
• j = *ptr; // compare to j = ptr
Pointer arithmetic in C:
• double a[10];
• int index = 3;
• double *dptr = a; // assigns address(a[0])
• dptr = dptr + index; // advances by index elements (index * sizeof(double) bytes), so dptr now points to a[3]
Pointers to records:
• (*ptr).name is the same as ptr->name in C
• ptr^.name in Pascal
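Python has no pointer type, but its standard ctypes module exposes C-style pointers. The C fragment can therefore be mirrored almost line for line, though this is a Python sketch rather than C itself. The last part computes a[3]'s address by hand from the base address and sizeof(double). This is the scaling that C's dptr + index performs automatically.

```python
import ctypes

i = ctypes.c_int(3)
ptr = ctypes.pointer(i)     # ptr = &i;
ptr[0] = 4                  # *ptr = 4;
j = ptr[0]                  # j = *ptr;
print(i.value, j)           # 4 4

a = (ctypes.c_double * 10)()
index = 3
a[index] = 9.5
p = ctypes.cast(a, ctypes.POINTER(ctypes.c_double))   # p = a; points at a[0]
print(p[index])                                       # 9.5: p[3] reads a[3]
# Explicit address arithmetic: address(a[0]) + index * sizeof(double)
byte_offset = index * ctypes.sizeof(ctypes.c_double)
q = ctypes.cast(ctypes.addressof(a) + byte_offset, ctypes.POINTER(ctypes.c_double))
print(q[0])                                           # 9.5: the same element
```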

Pointer Problems
Type checking:
• If a pointer is allowed to point to more than one type of object, static type checking is no longer possible (as in C, Lisp).

Dealing with Type Checking
Solution:
• Force all pointers to be typed (in terms of the type of object they refer to when dereferenced).
• Example: FORTRAN 90
This limits a prime use of pointers:
• Polymorphism (void * in C)

Pointer Problems (cont.)
Dangling pointers:
• A pointer points to an object, but the object has been deallocated.
Can occur when:
• The object goes out of scope but the pointer does not. A contrived example:
  ptr1 = &ptr2
  call foo(ptr1), in which *ptr1 = localObject
  after the return, try *ptr2
• The object is explicitly deallocated:
  ptr1 = new Object();
  ptr2 = ptr1;
  destroy(ptr1);
  *ptr2 …

Dealing with Dangling Pointers
Four strategies:
• Disallow (in the language) explicit deallocation.
• Ignore (in the compiler) explicit deallocation.
  - Then pointers will never point to nothing (but space will never be reclaimed).
• Allow deallocation, and reset other pointers.
  - Incurs run-time overhead.
  - Tombstones
  - Locks and keys
• Allow deallocation and trust the programmer.
  - Efficient, but allows dangling pointers.
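This is a hypothetical Python model of the tombstone strategy, not any particular language's implementation. Every pointer refers to a tombstone rather than directly to the heap object. Deallocation empties the tombstone, so a later dereference through any copy of the pointer is caught instead of reading freed memory. The cost is one extra check per dereference. The tombstones themselves are never reclaimed.

```python
class Tombstone:
    """Hypothetical tombstone: pointers refer to this cell, never directly to the object."""
    def __init__(self, obj):
        self.obj = obj
        self.alive = True

class DanglingPointerError(Exception):
    pass

def deref(ptr: Tombstone):
    if not ptr.alive:                  # the extra run-time check on every dereference
        raise DanglingPointerError("object has been deallocated")
    return ptr.obj

def destroy(ptr: Tombstone) -> None:
    ptr.obj = None                     # release the object
    ptr.alive = False                  # the tombstone itself stays allocated

ptr1 = Tombstone({"name": "The String"})   # ptr1 = new Object();
ptr2 = ptr1                                # ptr2 = ptr1; both share one tombstone
destroy(ptr1)                              # destroy(ptr1);
try:
    deref(ptr2)                            # *ptr2 is detected, not silently wrong
except DanglingPointerError as err:
    print("caught:", err)
```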

Pointer Problems (cont.)
Lost objects (garbage):
• All pointers to a dynamic variable are removed, so the variable's value can no longer be referenced, but its space is still allocated.
  ptr1 = new Object();
  …
  ptr1 = new Object();
• Common when beginners think that every declaration needs a value.
• Results in “memory leaks” (memory fills up).

Dealing with Lost Objects
The lost object problem can be solved if the language implements automatic storage management (Java and Lisp).
Two approaches:
Reference counting (“eager” approach):
• Each object maintains a count of how many pointers reference it; when the count is decremented to zero, the object is deallocated.
• Reference counting incurs significant overhead on each pointer assignment, but the overhead is distributed throughout the session.
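This is a toy Python model of reference counting, assuming a hypothetical heap where objects are integer ids and every pointer assignment goes through `assign`. It replays the lost-object example, where overwriting ptr1 drops the old object's count to zero and the object is reclaimed at once. For brevity, objects here hold no pointers of their own, so freeing never cascades.

```python
class Heap:
    """Toy reference-counted heap (hypothetical): objects are integer ids,
    and every pointer assignment goes through assign(), which adjusts counts."""

    def __init__(self):
        self.counts = {}
        self.next_id = 0
        self.freed = []

    def new(self) -> int:
        self.next_id += 1
        self.counts[self.next_id] = 0
        return self.next_id

    def assign(self, pointers: dict, name: str, target) -> None:
        old = pointers.get(name)
        if target is not None:
            self.counts[target] += 1      # increment first, so p = p is safe
        pointers[name] = target
        if old is not None:
            self.counts[old] -= 1
            if self.counts[old] == 0:     # last reference gone: reclaim immediately
                del self.counts[old]
                self.freed.append(old)

heap, ptrs = Heap(), {}
first = heap.new()
heap.assign(ptrs, "ptr1", first)          # ptr1 = new Object();
heap.assign(ptrs, "ptr1", heap.new())     # ptr1 = new Object(); the first object is freed
print(heap.freed)
```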

Dealing with Lost Objects (cont.)
Garbage collection (“lazy” approach):
• Wait until all storage is allocated, then collect the garbage.
• Mark-and-sweep GC:
  - Mark all objects in the heap as garbage.
  - Follow all pointers through the heap, starting from the program's variables, and reset the mark on every object encountered.
  - Deallocate all objects that are still marked.
Problems with mark-and-sweep GC:
• Causes the system to “halt” during GC.
• Most time-consuming exactly when memory is needed most.
“Ephemeral” GC overcomes these problems:
• Runs before you need it.
• Groups objects into generations by age (so only part of memory is searched).
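A toy Python version of the three mark-and-sweep steps follows. It assumes a hypothetical heap model: a dict from object id to the ids that object points to, plus a list of root ids held by program variables. The example includes a two-object cycle, which this collector reclaims but plain reference counting cannot, because each object in the cycle keeps the other's count above zero.

```python
def mark_and_sweep(heap: dict, roots: list) -> set:
    """Toy mark-and-sweep collector (hypothetical heap model).
    heap maps object id -> list of ids it points to; roots are ids held by variables.
    Returns the set of ids deallocated; heap is modified in place."""
    garbage = set(heap)              # step 1: mark everything as garbage
    stack = list(roots)
    while stack:                     # step 2: trace pointers, clearing marks
        obj = stack.pop()
        if obj in garbage:
            garbage.discard(obj)
            stack.extend(heap[obj])
    for obj in garbage:              # step 3: sweep the still-marked objects
        del heap[obj]
    return garbage

heap = {1: [2], 2: [], 3: [4], 4: [3]}   # objects 3 and 4 form an unreachable cycle
print(mark_and_sweep(heap, roots=[1]))   # {3, 4}
print(sorted(heap))                      # [1, 2]
```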

Pointer Commentary
“Their introduction into high-level languages has been a step backward from which we may never recover.” (C. Hoare, 1973)
“Pointers are thought by many to be essential in imperative languages.” (R. Sebesta, 1996)
“Java has no pointer data type.” (P. Johnson, 1999)
“… it remains to be seen …” (R. Sebesta, 2002)
Java references:
• Assignment to (heap-dynamic) objects (class instances).
• No explicit dereferencing or deallocation, so no dangling pointers.
• The run-time system manages memory, so no lost objects.
• No pointer arithmetic: it would be meaningless.

End of module 06