Department of Computer Science CS 3304 Comparative Languages

Department of Computer Science
CS 3304 Comparative Languages
Lecture 10: Simple Data Types
16 February 2012
© 2012 Denis Gracanin

Introduction
• Most programming languages include a notion of type for expressions and/or objects.
• We all have developed an intuitive notion of what types are; what's behind the intuition?

Why Data Types?
• Implicit context for many operations:
  • The programmer does not have to specify the context explicitly.
  • Example: in C, the expression a+b uses integer addition if a and b are integers, and floating-point addition if a and b are floating-point values.
• Limit the set of operations that may be performed in a semantically valid program:
  • Example: prevent adding a character and a record.
  • Type checking cannot prevent all meaningless operations.
  • It catches enough of them to be useful.
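
The implicit-context point can be sketched in C as follows. The function names are hypothetical and exist only for illustration; the idea is that the same source-level operator compiles to a different machine operation depending on the operand types. Division is included alongside addition because it makes the difference visible in the result.

```c
/* Hypothetical helpers (not from the lecture): the operator is the same in
   the source, but the operand types select the operation performed. */

int add_ints(int a, int b) { return a + b; }             /* integer addition */
double add_doubles(double a, double b) { return a + b; } /* floating-point addition */

int div_ints(int a, int b) { return a / b; }             /* integer division: fraction discarded */
double div_doubles(double a, double b) { return a / b; } /* floating-point division */
```

With these definitions, div_ints(7, 2) yields 3 while div_doubles(7.0, 2.0) yields 3.5, even though both bodies are written as a / b.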

Type Systems
• A type system consists of:
  1. A mechanism to define types and associate them with certain language constructs.
  2. A set of rules for type equivalence, type compatibility, and type inference:
    • Type equivalence: when are the types of two values the same?
    • Type compatibility: when can a value of type A be used in a context that expects type B?
    • Type inference: what is the type of an expression, given the types of the operands?
• Compatibility is the more useful concept, because it tells you what you can do.
• Polymorphism results when the compiler finds that it doesn't need to know certain things.
• Subroutines need to have types if they are first- or second-class values.

Type Checking
• Type checking is the process of ensuring that a program obeys the language’s type compatibility rules.
• Type clash: a violation of these rules.
• Strong typing means that the language prevents you from applying an operation to data on which it is not appropriate.
• Static typing: the compiler can perform all of the checking at compile time.
• Examples:
  • Common Lisp is strongly typed, but not statically typed.
  • Ada is statically typed; Pascal is almost statically typed.
  • C is less strongly typed than Pascal, e.g., unions and subroutines with variable numbers of parameters.
  • Java is strongly typed, with a non-trivial mix of things that can be checked statically and things that have to be checked dynamically.
  • Scripting languages are generally dynamically typed.

Polymorphism
• Polymorphism allows a single body of code to work with objects of multiple types:
  • May or may not imply the need for run-time type checking.
• Fully dynamic typing: arbitrary operations on arbitrary objects.
  • The check is performed only at run time.
  • Types of objects are implied: implicit parametric polymorphism.
  • Significant run-time cost and delayed error reporting.
• Type inference: infers a type for every object and expression (e.g., ML).
• Subtype polymorphism: allows a variable X of type T to refer to an object of any type derived from T.
• Explicit parametric polymorphism (generics): define classes with type parameters.
• Dynamic versus static typing.

Meaning of “Type”
• Collection of values from a “domain” (the denotational approach).
• Internal structure of a bunch of data, described down to the level of a small set of fundamental types (the structural approach).
• Equivalence class of objects (the implementor's approach).
• Collection of well-defined operations that can be applied to objects of that type (the abstraction approach).

Classification of Types
• Discrete (ordinal) types – countable: integer, boolean, char, enumeration, and subrange.
• Scalar (simple) types – one-dimensional: discrete, rational, real, and complex.
• Composite types:
  • Records (structures).
  • Variant records (unions).
  • Arrays; strings are arrays of characters.
  • Sets: the mathematical powerset of their base types.
  • Pointers: l-values.
  • Lists: no notion of mapping or indexing.
  • Files.

Orthogonality
• Orthogonality is a useful goal in the design of a language, particularly its type system:
  • A collection of features is orthogonal if there are no restrictions on the ways in which the features can be combined (analogy to vectors).
• For example:
  • Pascal is more orthogonal than Fortran (because it allows arrays of anything, for instance), but it does not permit variant records as arbitrary fields of other records.
• Orthogonality is nice primarily because it makes a language easy to understand, easy to use, and easy to reason about.

Type Checking
• In most statically typed languages every definition of an object must specify the object’s type.
• Many of the contexts in which an object might appear are also typed.
• Type equivalence, type compatibility, and type inference.
• Type compatibility is the most critical.
• Objects and contexts are often compatible even when their types are different.

Type Equivalence
• Two major approaches – structural and name equivalence:
  • Name equivalence is based on declarations.
  • Structural equivalence is based on some notion of meaning behind those declarations. The exact definition varies.
• Name equivalence is more fashionable these days.
• There are at least two common variants on name equivalence:
  • The differences between all these approaches boil down to where you draw the line between important and unimportant differences between type descriptions.
• In all three schemes described in the book, we begin by putting every type description in a standard form that takes care of “obviously unimportant” distinctions like those on the next slide.

Type Equivalence Example
• Certainly format does not matter:
    struct { int a, b; }
• Is the same as:
    struct {
      int a, b;
    }
• We certainly want them to be the same as:
    struct {
      int a;
      int b;
    }
• How about this?
    struct {
      int b;
      int a;
    }

Name Equivalence
• How about type aliasing? Program dependent?
    TYPE new_type = old_type;
• Example where the aliases should not be treated as the same type:
    TYPE celsius_temp = REAL;
         fahrenheit_temp = REAL;
    VAR c : celsius_temp;
        f : fahrenheit_temp;
    ...
    f := c;
• Two variants of name equivalence:
  • Strict name equivalence: aliased types are considered distinct.
  • Loose name equivalence: aliased types are considered equivalent.
• Tricky to implement in the presence of separate compilation.
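
A rough C analogue of the temperature example, offered as a sketch rather than as the lecture's own code. C's typedef only introduces an alias, so it behaves like loose name equivalence, while each struct declaration introduces a distinct type even when the members match. All type and function names here are hypothetical.

```c
/* typedef creates an alias, not a new type: both names denote double. */
typedef double celsius_alias;
typedef double fahrenheit_alias;

/* Each struct declaration introduces a distinct type, even though the
   two declarations have identical members. */
struct celsius { double degrees; };
struct fahrenheit { double degrees; };

double alias_mix(void) {
    celsius_alias c = 100.0;
    fahrenheit_alias f = c;   /* accepted: the aliases are interchangeable */
    return f;
}

struct fahrenheit to_fahrenheit(struct celsius c) {
    /* "struct fahrenheit f = c;" would be rejected as incompatible types,
       so the conversion has to be written out explicitly. */
    struct fahrenheit f = { c.degrees * 9.0 / 5.0 + 32.0 };
    return f;
}
```

The alias version compiles the meaningless assignment without complaint; the struct version forces the programmer to state the conversion.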

Type Conversion and Casts
• When the expected and provided types differ, an explicit type conversion (type cast) is needed. Three cases:
  • The types are structurally equivalent but the language uses name equivalence: the conversion is a purely conceptual operation.
  • The types have different sets of values but the intersecting values are represented in the same way: a run-time check may be needed.
  • The types have different low-level representations but some sort of correspondence among their values: machine instructions effect the conversion.
• In C, a type conversion is specified by using the name of the desired type, in parentheses, as a prefix operator. There are no run-time checks for arithmetic overflow.
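
A minimal sketch of C casts where the representations differ, assuming an 8-bit unsigned char. The helper names are hypothetical. Neither conversion performs a run-time check.

```c
/* Floating point to integer: the representation changes and the
   fractional part is discarded (truncation toward zero). The result is
   undefined if the value does not fit in an int; nothing checks this. */
int real_to_int(double x) { return (int)x; }

/* Integer to a narrower unsigned type: the value is reduced modulo
   2^8 (assuming an 8-bit unsigned char), silently and without a check. */
unsigned char int_to_byte(int n) { return (unsigned char)n; }
```

For example, real_to_int(3.9) gives 3, and int_to_byte(300) gives 44 under the 8-bit assumption, with no diagnostic in either case.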

Nonconverting Type Casts
• Interpreting the bits of a value of one type as if they were another type:
  • Memory allocation example.
  • Reinterpreting a floating-point number as an integer or record.
• Nonconverting type cast (type pun): a change of type that does not alter the underlying bits:
  • Ada: a built-in generic subroutine unchecked_conversion.
  • C++: in addition to C casting, a family of alternatives:
    • Type conversion: static_cast.
    • Nonconverting type cast: reinterpret_cast.
    • Manipulating pointers of polymorphic type: dynamic_cast.
    • Removing read-only qualification: const_cast.
• A nonconverting type cast constitutes a dangerous subversion of the language’s type system.
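
One way to sketch a type pun in C is to copy the bytes of a float into an integer of the same size. This assumes a 4-byte float in IEEE 754 single-precision format; float_bits is a hypothetical name. Copying with memcpy is used here instead of a pointer cast because it avoids C's aliasing restrictions.

```c
#include <stdint.h>
#include <string.h>

/* Reinterprets the bits of a float as a 32-bit unsigned integer.
   Assumption: sizeof(float) == 4 and IEEE 754 single precision. */
uint32_t float_bits(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* bit pattern copied unchanged */
    return bits;
}

/* For contrast, an ordinary converting cast changes the representation. */
uint32_t float_value(float x) { return (uint32_t)x; }
```

Under the stated assumptions, float_bits(1.0f) is 0x3F800000 while float_value(1.0f) is 1: the first keeps the bits and changes the meaning, the second keeps the meaning and changes the bits.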

Type Compatibility
• Instead of type equivalence, most languages require that a value’s type be compatible with that of the context in which it appears:
  • Assignment statement: the right-hand side type must be compatible with that of the left-hand side.
  • Arithmetic operator: the operand types must be compatible with some common type that supports the arithmetic operation.
• The definition of type compatibility varies:
  • Ada: type S is compatible with an expected type T if and only if:
    • S and T are equivalent.
    • One is a subtype of the other, or both are subtypes of the same base type.
    • Both are arrays, with the same numbers and types of elements in each dimension.

Coercion I
• When an expression of one type is used in a context where a different type is expected, one normally gets a type error.
• But what about:
    var a : integer;
        b, c : real;
    ...
    c := a + b;
• Many languages allow things like this, and coerce an expression to be of the proper type.
• Coercion can be based just on types of operands, or can take into account expected type from surrounding context as well.
• Fortran has lots of coercion, all based on operand type.
• C has lots of coercion, too, but with simpler rules:
  • In traditional (pre-ANSI) C, all floats in expressions become doubles.
  • short int and char become int in expressions.
  • If necessary, precision is removed when assigning into the left-hand side.
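
The C coercion rules listed above can be observed with a small sketch. The function names are hypothetical; the behavior shown (conversion of the int operand, integer promotion of char, and truncation on assignment) follows standard C rules.

```c
#include <stddef.h>

/* Mixed-mode arithmetic: a is implicitly converted to double before the add. */
double mixed_sum(int a, double b) { return a + b; }

/* char operands are promoted to int, so the expression x + y has type int.
   sizeof does not evaluate its operand; it only inspects the type. */
size_t promoted_size(void) {
    char x = 'a', y = 'b';
    return sizeof(x + y);
}

/* Assigning a double to an int implicitly discards the fractional part. */
int narrowed(double d) {
    int i = d;
    return i;
}
```

None of these functions contains a cast; every conversion is inserted by the compiler.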

Coercion II
• In effect, coercion rules are a relaxation of type checking:
  • Recent thought is that this is probably a bad idea.
  • Languages such as Modula-2 and Ada do not permit coercions.
  • C++, however, goes hog-wild with them.
  • They're one of the hardest parts of the language to understand.
• Make sure you understand the difference between:
  • Type conversions (explicit).
  • Type coercions (implicit).
• Sometimes the word ‘cast’ is used for conversions (C is guilty here).

Overloading and Coercion
• Overloading and coercion are sometimes used to similar effect.
• An overloaded name can refer to more than one object; the ambiguity is resolved by the context.
• Example: addition of numeric quantities:
  • Without coercion: both operands must be of the same type.
  • With coercion: if either operand is real, a floating-point addition is performed.
  • If the operator is not overloaded, the conversion from integer is always required: overhead.
• In most languages literal constants or the null pointer can be intermixed in expressions with values of many types.
• More commonly, constants are treated as a special case in the language’s type-checking rules.

Universal Reference Types
• Used for systems programming or for general-purpose container objects:
  • C, C++: void *.
  • Clu: any.
  • Java: Object.
• Arbitrary l-values can be assigned into an object of universal reference type with no concern about type safety, since the compiler will not allow any operation to be performed on the referenced object.
• Problem with assigning back to a more specific reference type:
  • Objects must be self-descriptive.
  • If not, there is no way to identify their type at run time.
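
Since C's void * carries no type information, one possible way to make objects self-descriptive is to pair the reference with a tag. The struct, enum, and function below are hypothetical illustrations, not a standard API.

```c
#include <stddef.h>

/* Hypothetical run-time type descriptor. */
enum type_tag { TAG_INT, TAG_DOUBLE };

/* A universal reference paired with a tag that describes the referent. */
struct boxed {
    enum type_tag tag;
    void *ref;
};

/* Checked conversion back to int. Returns 1 and stores the value in *out
   if the box really refers to an int; returns 0 and leaves *out untouched
   otherwise. */
int unbox_int(struct boxed b, int *out) {
    if (b.tag != TAG_INT || b.ref == NULL) {
        return 0;
    }
    *out = *(int *)b.ref;
    return 1;
}
```

Without the tag, the cast in unbox_int would have to be trusted blindly, which is exactly the case where the type cannot be identified at run time.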

Type Inference
• What determines the type of an overall expression?
  • Arithmetic operator: the result has the same type as the operands.
  • Comparison: usually Boolean.
  • Function call: type declared in the function’s header.
  • Assignment: the same type as the left-hand side.
• Sometimes the answer is not obvious:
  • Subranges.
  • Composite objects.

Subranges
• What is the result type when one or more operands have subrange types?
• Pascal: the result of any arithmetic operation on a subrange has the subrange’s base type.
• If the result of an arithmetic operation is assigned into a variable of a subrange type, a dynamic semantic check may be required.
• In languages like Ada, the type of an arithmetic expression has special significance in the header of a for loop.
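
C has no subrange types, so the following is only an emulation of the dynamic semantic check described above, assuming a subrange of 1..10. The constants and function names are hypothetical.

```c
/* Assumed bounds of an emulated Pascal-style subrange 1..10. */
enum { SUB_LOW = 1, SUB_HIGH = 10 };

/* The dynamic semantic check: stores value in *target and returns 1 if it
   lies within the subrange; otherwise leaves *target unchanged and returns 0. */
int assign_subrange(int *target, int value) {
    if (value < SUB_LOW || value > SUB_HIGH) {
        return 0;
    }
    *target = value;
    return 1;
}

/* The arithmetic is done in the base type (int); the range check happens
   only when the result is assigned into the subrange variable. */
int add_then_assign(int *target, int a, int b) {
    int sum = a + b;
    return assign_subrange(target, sum);
}
```

With both operands inside 1..10, a sum such as 4 + 5 passes the check while 7 + 8 does not, which is why the check cannot in general be done at compile time.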

Composite Types
• Some operators can be applied to values of composite types.
• Examples:
  • Character strings (Pascal, Ada).
  • Sets (Pascal, Modula).
• ML type system:
  • Programmers have the option of declaring the types in these languages (more like a traditional statically typed language).
  • Programmers may choose not to declare certain types.
  • ML-style type inference.

Summary
• General issues of type systems and type checking.
• A type system consists of a set of built-in types, a mechanism to define new types, and rules for type equivalence, type compatibility, and type inference.
• Denotational, constructive, and abstraction-based points of view, which regard types in terms of their values, their substructure, and the operations they support (respectively).