Abstract Data Type
C and Data Structures
Baojian Hua
bjhua@ustc.edu.cn

Data Types
A data type consists of:
- a collection of data elements (a type)
- a set of operations on these data elements
Data types in languages:
- predefined:
  - any language defines a group of predefined data types
  - C e.g.: int, char, float, double, …
- user-defined:
  - allow programmers to define their own (new) data types
  - C e.g.: struct, union, …

Data Type Examples
Predefined:
- type: int
- elements: …, -2, -1, 0, 1, 2, …
- operations: +, -, *, /, %, …
User-defined:
- type: complex
- elements: 1+3i, -5+8i, …
- operations: new, add, sub, distance, …

Concrete Data Types (CDT)
A concrete data type:
- both the concrete representation and the operations on it are available
Almost all C predefined types are CDTs
- For instance, “int” is a 32-bit double-word on typical platforms, with operations +, -, …
- Knowing this, one can do dirty hacks
- See demo…

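The slide's demo is not reproduced in this transcript. As a stand-in, here is a minimal sketch of the kind of "dirty hack" that becomes possible once the representation of int is known. The function names Int_byteAt and Int_isNegative are hypothetical, and the sign test assumes a two's-complement int without padding bits, which holds on mainstream platforms but is not something the slides state.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Returns the i-th byte of n's in-memory representation.
   Which byte comes first depends on the platform's endianness. */
unsigned char Int_byteAt(int n, size_t i)
{
    unsigned char bytes[sizeof n];
    assert(i < sizeof n);
    memcpy(bytes, &n, sizeof n);   /* copy the raw representation */
    return bytes[i];
}

/* "Dirty hack": test the sign by inspecting the top bit directly
   instead of writing n < 0. Assumes two's complement, no padding. */
int Int_isNegative(int n)
{
    unsigned int u = (unsigned int)n;
    return (int)((u >> (sizeof n * CHAR_BIT - 1)) & 1u);
}
```

Code like this works only because int is a CDT: the client relies on the bit layout rather than on the operations the type offers.
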
Abstract Data Types (ADT)
An abstract data type:
- separates data type declaration from representation
- separates function declarations (prototypes) from implementations (definitions)
A language must provide some form of mechanism to support ADTs:
- interfaces in Java
- signatures in ML
- (roughly) header files & typedef in C

Case Study
Suppose we’d design a new data type to represent a complex number c:
- a data type “complex”
- elements: 3+4i, -5-8i, …
- operations: new, add, sub, distance, …
How to represent this data type in C (CDT, ADT or …)?

Complex Number
    // Recall the definition of a complex number c:
    //   c = x + yi, where x, y in R, and i = sqrt(-1);
    // Some typical operations:
    complex Complex_new (double x, double y);
    complex Complex_add (complex c1, complex c2);
    complex Complex_sub (complex c1, complex c2);
    complex Complex_mult (complex c1, complex c2);
    complex Complex_divide (complex c1, complex c2);
    // Next, we’d discuss several variants of rep’s:
    // CDT, ADT.

CDT of Complex: Interface (Types)
    // In file "complex.h":
    #ifndef COMPLEX_H
    #define COMPLEX_H

    struct Complex_t
    {
      double x;
      double y;
    };
    typedef struct Complex_t Complex_t;

    Complex_t Complex_new (double x, double y);
    // other function prototypes are similar
    …
    #endif

Client Code
    // With this interface, we can write client code
    // that manipulates complex numbers. File "main.c":
    #include "complex.h"

    int main ()
    {
      Complex_t c1, c2, c3;
      c1 = Complex_new (3.0, 4.0);
      c2 = Complex_new (7.0, 6.0);
      c3 = Complex_add (c1, c2);
      Complex_output (c3);
      return 0;
    }
Do we know c1, c2, c3’s concrete representation? How?

CDT Complex: Implementation
    // In a file "complex.c":
    #include "complex.h"

    Complex_t Complex_new (double x, double y)
    {
      Complex_t c = {.x = x, .y = y};
      return c;
    }
    // other functions are similar. See Lab 1

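The "other functions" are left to Lab 1 in the slides. As an illustration only, the sketch below fills in two of the operations named earlier (Complex_add and Complex_divide) for the CDT representation, collapsed into a single listing. The bodies are an assumption about what the lab intends, not the course's reference solution.

```c
#include <assert.h>

/* CDT: the representation is visible to every client. */
struct Complex_t {
    double x;   /* real part */
    double y;   /* imaginary part */
};
typedef struct Complex_t Complex_t;

Complex_t Complex_new(double x, double y)
{
    Complex_t c = {.x = x, .y = y};
    return c;
}

/* (a + bi) + (c + di) = (a + c) + (b + d)i */
Complex_t Complex_add(Complex_t c1, Complex_t c2)
{
    return Complex_new(c1.x + c2.x, c1.y + c2.y);
}

/* (a + bi) / (c + di) = ((ac + bd) + (bc - ad)i) / (c^2 + d^2) */
Complex_t Complex_divide(Complex_t c1, Complex_t c2)
{
    double d = c2.x * c2.x + c2.y * c2.y;
    assert(d != 0.0);   /* sketch only: division by zero is a caller error */
    return Complex_new((c1.x * c2.x + c1.y * c2.y) / d,
                       (c1.y * c2.x - c1.x * c2.y) / d);
}
```

Because the struct is passed and returned by value, no heap allocation is involved, but every client that includes the header also sees the fields x and y.
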
Problem #1
    int main ()
    {
      Complex_t c;
      c = Complex_new (3.0, 4.0);
      // Want to do this: c = c + (5+6i);
      // Ooooops, this is legal:
      c.x += 5;
      c.y += 6;
      return 0;
    }

Problem #2
    #ifndef COMPLEX_H
    #define COMPLEX_H

    struct Complex_t
    {
      // change to a more fancy one? Angers "main"…
      double a[2];
    };
    typedef struct Complex_t Complex_t;

    Complex_t Complex_new (double x, double y);
    // other function prototypes are similar
    …
    #endif

Problems with CDT?
Operations are transparent.
- user code has no idea of the algorithm
- Good!
Dependence on the data representation
- Problem #1: client code can access the data directly
  - kicks away the interface
  - safe?
- Problem #2: makes code rigid
  - easy to change or evolve?

ADT of Complex: Interface (Types)
    // In file "complex.h":
    #ifndef COMPLEX_H
    #define COMPLEX_H

    // note that the definition of "struct Complex_t" is not given
    typedef struct Complex_t *Complex_t;

    Complex_t Complex_new (double x, double y);
    // other function prototypes are similar
    …
    #endif

Client Code
    // With this interface, we can write client code
    // that manipulates complex numbers. File "main.c":
    #include "complex.h"

    int main ()
    {
      Complex_t c1, c2, c3;
      c1 = Complex_new (3.0, 4.0);
      c2 = Complex_new (7.0, 6.0);
      c3 = Complex_add (c1, c2);
      Complex_output (c3);
      return 0;
    }
Can we still know c1, c2, c3’s concrete representation? Why?

ADT Complex: Implementation #1 (Types)
    // In a file "complex.c":
    #include "complex.h"

    // We may choose to define the complex type as:
    struct Complex_t
    {
      double x;
      double y;
    };
    // which is hidden in the implementation.

ADT Complex: Implementation Continued
    // In a file "complex.c":
    #include <stdlib.h>   // for malloc
    #include "complex.h"

    Complex_t Complex_new (double x, double y)
    {
      Complex_t c;
      c = malloc (sizeof (*c));
      c->x = x;
      c->y = y;
      return c;
    }
    // other functions are similar. See Lab 1

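To see the pieces of the ADT version together, the sketch below collapses "complex.h" and "complex.c" into one listing. In a real project the two halves would sit in separate files, so that clients only ever see the incomplete type. The accessors Complex_real and Complex_imag and the destructor Complex_free are hypothetical additions for illustration; the slides do not define them.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- would live in complex.h: only an incomplete type is exposed ---- */
typedef struct Complex_t *Complex_t;
Complex_t Complex_new(double x, double y);
Complex_t Complex_add(Complex_t c1, Complex_t c2);
double Complex_real(Complex_t c);   /* hypothetical accessor */
double Complex_imag(Complex_t c);   /* hypothetical accessor */
void Complex_free(Complex_t c);     /* hypothetical destructor */

/* ---- would live in complex.c: the hidden representation ---- */
struct Complex_t {
    double x;
    double y;
};

Complex_t Complex_new(double x, double y)
{
    Complex_t c = malloc(sizeof(*c));
    assert(c != NULL);   /* sketch only: real code should report failure */
    c->x = x;
    c->y = y;
    return c;
}

Complex_t Complex_add(Complex_t c1, Complex_t c2)
{
    return Complex_new(c1->x + c2->x, c1->y + c2->y);
}

double Complex_real(Complex_t c) { return c->x; }
double Complex_imag(Complex_t c) { return c->y; }
void Complex_free(Complex_t c) { free(c); }
```

A client compiled against the header alone cannot write c->x, because struct Complex_t is incomplete there; it has to go through the accessors, which is what allows the representation to change later without touching client code.
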
ADT Summary
Yes, that’s ADT!
- Algorithm is hidden
- Data representation is hidden
  - client code can NOT access it
  - thus, client code is independent of the impl’
Interface and implementation
Do Lab 1

Polymorphism
To explain polymorphism, we start with a new data type “tuple”
A tuple is of the form: (x, y)
- x ∈ A, y ∈ B (aka: A*B)
- A, B may be unknown in advance and may be different
E.g.:
- A=int, B=int:
  - (2, 3), (4, 6), (9, 7), …
- A=char *, B=double:
  - (“Bob”, 145.8), (“Alice”, 90.5), …

Polymorphism
From the data type point of view:
- two types: A, B
- operations:
    new (x, y);       // create a new tuple with x and y
    equals (t1, t2);  // equality testing
    first (t);        // get the first element of t
    second (t);       // get the second element of t
    …
How to represent this type in computers (using C)?

Monomorphic Version
We start by studying a monomorphic tuple type called “intTuple”:
- both the first and second components are of “int” type
- (2, 3), (8, 9), …
The intTuple ADT:
- type: intTuple
- elements: (2, 3), (8, 9), …
- operations:
    tuple new (int x, int y);
    int first (tuple t);
    int second (tuple t);
    int equals (tuple t1, tuple t2);
    …

“IntTuple” CDT
    // in a file "int-tuple.h"
    #ifndef INT_TUPLE_H
    #define INT_TUPLE_H

    struct IntTuple_t
    {
      int x;
      int y;
    };
    typedef struct IntTuple_t IntTuple_t;

    IntTuple_t IntTuple_new (int n1, int n2);
    int IntTuple_first (IntTuple_t t);
    …
    #endif

Or the “IntTuple” ADT
    // in a file "int-tuple.h"
    #ifndef INT_TUPLE_H
    #define INT_TUPLE_H

    typedef struct IntTuple_t *IntTuple_t;

    IntTuple_t IntTuple_new (int n1, int n2);
    int IntTuple_first (IntTuple_t t);
    int IntTuple_equals (IntTuple_t t1, IntTuple_t t2);
    …
    #endif
    // We only discuss "IntTuple_equals ()". All other
    // functions are left to you.

Equality Testing
    // in a file "int-tuple.c"
    int IntTuple_equals (IntTuple_t t1, IntTuple_t t2)
    {
      return ((t1->x == t2->x) && (t1->y == t2->y));
    }
(Figure: t1 and t2 each point to a record with fields x and y.)

Problems?
It’s ok if we only design “IntTuple”
But what if we need to design these tuples:
- (int, double), (int, char *), (double, double), …
The same code exists everywhere, with no means to maintain and evolve it
- Nightmares for programmers
- Remember: never duplicate code!

Polymorphism
Now, we consider a polymorphic tuple type called “tuple”:
- “poly”: may take various forms
- Every element of the type “tuple” may be of different types
- (2, 3.14), (“8”, ‘a’), (‘ ’, 99), …
The “tuple” ADT:
- type: tuple
- elements: (2, 3.14), (“8”, ‘a’), (‘ ’, 99), …

The Tuple ADT
What about operations?
    tuple new (??? x, ??? y);
    ??? first (tuple t);
    ??? second (tuple t);
    int equals (tuple t1, tuple t2);
    …

Polymorphic Type
To resolve this, C dedicates a special polymorphic type “void *”
- “void *” is a pointer which can point to “any” concrete type (i.e., it’s compatible with any pointer-to-data type)
  - very poly…
  - long history of practice, initially “char *”
  - can not be dereferenced directly, needs an ugly cast first
  - similar to constructs in other languages, such as “Object”

The Tuple ADT
What about operations?
    tuple newTuple (void *x, void *y);
    void *first (tuple t);
    void *second (tuple t);
    int equals (tuple t1, tuple t2);
    …

“tuple” Interface
    // in a file "tuple.h"
    #ifndef TUPLE_H
    #define TUPLE_H

    typedef void *poly;
    typedef struct Tuple_t *Tuple_t;

    Tuple_t Tuple_new (poly x, poly y);
    poly Tuple_first (Tuple_t t);
    poly Tuple_second (Tuple_t t);
    int Tuple_equals (Tuple_t t1, Tuple_t t2);

    #endif /* TUPLE_H */

Client Code
    // file "main.c"
    #include "tuple.h"

    int main ()
    {
      int i = 8;
      Tuple_t t1 = Tuple_new (&i, "hello");
      return 0;
    }

“tuple” ADT Implementation
    // in a file "tuple.c"
    #include <stdlib.h>
    #include "tuple.h"

    struct Tuple_t
    {
      poly x;
      poly y;
    };

    Tuple_t Tuple_new (poly x, poly y)
    {
      Tuple_t t = malloc (sizeof (*t));
      t->x = x;
      t->y = y;
      return t;
    }
(Figure: t points to a record with fields x and y.)

“tuple” ADT Implementation
    // in a file "tuple.c"
    #include <stdlib.h>
    #include "tuple.h"

    struct Tuple_t
    {
      poly x;
      poly y;
    };

    poly Tuple_first (Tuple_t t)
    {
      return t->x;
    }
(Figure: t points to a record with fields x and y.)

Client Code
    #include "complex.h"
    #include "tuple.h"   // ADT version

    int main ()
    {
      int i = 8;
      Tuple_t t1 = Tuple_new (&i, "hello");
      // type cast
      int *p = (int *)Tuple_first (t1);
      return 0;
    }

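Putting the interface, the implementation and the client-side cast together, a minimal single-listing sketch of the polymorphic tuple might look like this. Tuple_second is written by analogy with Tuple_first, and the malloc check is an addition for the sketch; neither body appears on the slides.

```c
#include <assert.h>
#include <stdlib.h>

typedef void *poly;
typedef struct Tuple_t *Tuple_t;

/* Hidden representation: two untyped pointers. The tuple stores
   the pointers only; it does not copy or own the data behind them. */
struct Tuple_t {
    poly x;
    poly y;
};

Tuple_t Tuple_new(poly x, poly y)
{
    Tuple_t t = malloc(sizeof(*t));
    assert(t != NULL);   /* sketch only: real code should report failure */
    t->x = x;
    t->y = y;
    return t;
}

poly Tuple_first(Tuple_t t) { return t->x; }
poly Tuple_second(Tuple_t t) { return t->y; }
```

Since only pointers are stored, the pointed-to data must outlive the tuple, and the client is the only party who knows which type to cast each component back to.
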
Equality Testing
    struct Tuple_t
    {
      poly x;
      poly y;
    };

    // The #1 try:
    int Tuple_equals (Tuple_t t1, Tuple_t t2)
    {
      return ((t1->x == t2->x) && (t1->y == t2->y));
      // Wrong!! This compares the addresses, not the data they point to.
    }
(Figure: t points to a record with fields x and y.)

Equality Testing
    struct Tuple_t
    {
      poly x;
      poly y;
    };

    // The #2 try:
    int Tuple_equals (Tuple_t t1, Tuple_t t2)
    {
      return (*(t1->x) == *(t2->x) && *(t1->y) == *(t2->y));
      // Problem?
    }
(Figure: t points to a record with fields x and y.)

Equality Testing
    struct Tuple_t
    {
      poly x;
      poly y;
    };

    // The #3 try:
    int Tuple_equals (Tuple_t t1, Tuple_t t2)
    {
      return (equalsXXX (t1->x, t2->x) && equalsYYY (t1->y, t2->y));
      // but what are "equalsXXX" and "equalsYYY"?
    }
(Figure: t points to a record with fields x and y.)

Function as Arguments
    // So in the body of the "equals" function, instead
    // of guessing the types of t->x and t->y, we
    // require the callers of "equals" to supply the
    // necessary equality testing functions.
    // The #4 try:
    typedef int (*tf)(poly, poly);

    int Tuple_equals (Tuple_t t1, Tuple_t t2, tf eqx, tf eqy)
    {
      return (eqx (t1->x, t2->x) && eqy (t1->y, t2->y));
    }

Change to “tuple” Interface
    // in file "tuple.h"
    #ifndef TUPLE_H
    #define TUPLE_H

    typedef void *poly;
    typedef int (*tf)(poly, poly);
    typedef struct Tuple_t *Tuple_t;

    Tuple_t Tuple_new (poly x, poly y);
    poly Tuple_first (Tuple_t t);
    poly Tuple_second (Tuple_t t);
    int Tuple_equals (Tuple_t t1, Tuple_t t2, tf eqx, tf eqy);

    #endif /* TUPLE_H */

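Client Code
    // in file "main.c"
    #include "tuple.h"

    // Int_equals: an equality function of type tf on pointers
    // to int, supplied by the client (not shown here).
    int main ()
    {
      int i=8, j=8, k=7, m=7;
      Tuple_t t1 = Tuple_new (&i, &k);
      Tuple_t t2 = Tuple_new (&j, &m);
      Tuple_equals (t1, t2, Int_equals, Int_equals);
      return 0;
    }
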
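The client code above relies on an Int_equals function that the slides never define. The sketch below supplies one possible definition, together with a hypothetical String_equals, and repeats the tuple pieces so the listing is self-contained. Only the Tuple_equals body is taken from the "#4 try"; the rest is an assumption for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef void *poly;
typedef int (*tf)(poly, poly);
typedef struct Tuple_t *Tuple_t;

struct Tuple_t {
    poly x;
    poly y;
};

Tuple_t Tuple_new(poly x, poly y)
{
    Tuple_t t = malloc(sizeof(*t));
    assert(t != NULL);   /* sketch only: real code should report failure */
    t->x = x;
    t->y = y;
    return t;
}

/* The caller decides how each component is compared. */
int Tuple_equals(Tuple_t t1, Tuple_t t2, tf eqx, tf eqy)
{
    return eqx(t1->x, t2->x) && eqy(t1->y, t2->y);
}

/* Client-side equality on pointers to int (assumed definition). */
int Int_equals(poly a, poly b)
{
    return *(int *)a == *(int *)b;
}

/* Client-side equality on C strings (hypothetical helper). */
int String_equals(poly a, poly b)
{
    return strcmp((char *)a, (char *)b) == 0;
}
```

With these definitions, two tuples built from different variables holding equal values compare equal, which is exactly the case the pointer comparison of the "#1 try" gets wrong. The price is that nothing stops a caller from passing String_equals for an int component.
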
Moral
void * serves as the polymorphic type in C
- masks all pointer types (think of the Object type in Java)
Pros:
- code reuse: write once, use in arbitrary contexts
- we’d see more examples later in this course
Cons:
- Polymorphism doesn’t come for free
  - boxed data: data heap-allocated (to cope with void *)
  - no static or runtime checking (at least in C)
  - clumsy code: extra function pointer arguments

Function-Carrying Data
Why can we NOT make use of data, such as data passed as function arguments, when it’s of type “void *”?
Better idea:
- Let data carry the functions themselves, instead of passing function pointers
- such kind of data is called objects

Function Pointer in Data
    int Tuple_equals (Tuple_t t1, Tuple_t t2)
    {
      // note that if t1->x or t1->y has carried the
      // equality testing functions, then the code
      // could just be written as:
      return (t1->x->equals (t1->x, t2->x)
              && t1->y->equals (t1->y, t2->y));
    }
(Figure: t1 has fields x and y; each points to a datum whose “equals” slot refers to equals_x and equals_y respectively.)

Function Pointer in Data
    // To cope with this, we should modify other
    // modules. For instance, the "complex" ADT:
    struct Complex_t
    {
      int (*equals) (poly, poly);
      double a[2];
    };

    Complex_t Complex_new (double x, double y)
    {
      Complex_t c = malloc (sizeof (*c));
      c->equals = Complex_equals;
      …;
      return c;
    }
(Figure: a complex datum with an “equals” slot followed by the fields x and y.)

Function Call
    int Tuple_equals (Tuple_t t1, Tuple_t t2)
    {
      return (t1->x->equals (t1->x, t2->x)
              && t1->y->equals (t1->y, t2->y));
    }
(Figure: t1 and t2 each have fields x and y pointing to complex data; each complex datum holds an “equals” slot followed by a[0] and a[1].)

Client Code
    // in file "main.c"
    #include "complex.h"
    #include "tuple.h"

    int main ()
    {
      Complex_t c1 = Complex_new (1.0, 2.0);
      Complex_t c2 = Complex_new (1.0, 2.0);
      Tuple_t t1 = Tuple_new (c1, c2);
      Tuple_t t2 = Tuple_new (c1, c2);
      Tuple_equals (t1, t2);   // dirty simple! :-P
      return 0;
    }

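As written on the slides, t1->x->equals does not type-check, because t1->x has type "void *". One way to make the idea compile, sketched below, is to adopt the convention that every object stores its equals pointer as its first member and to read that member through a cast. The helper Poly_equals and this layout convention are assumptions made for the sketch, not something the slides spell out.

```c
#include <assert.h>
#include <stdlib.h>

typedef void *poly;
typedef int (*tf)(poly, poly);

/* Assumed convention: every object begins with its equals pointer. */
struct Complex_t {
    tf equals;
    double a[2];
};
typedef struct Complex_t *Complex_t;

static int Complex_equals(poly p, poly q)
{
    Complex_t c1 = p, c2 = q;
    return c1->a[0] == c2->a[0] && c1->a[1] == c2->a[1];
}

Complex_t Complex_new(double x, double y)
{
    Complex_t c = malloc(sizeof(*c));
    assert(c != NULL);   /* sketch only: real code should report failure */
    c->equals = Complex_equals;
    c->a[0] = x;
    c->a[1] = y;
    return c;
}

/* Hypothetical helper: fetch the equals pointer carried by the data.
   A pointer to a struct may be converted to a pointer to its first
   member, so reading a tf through the cast is well defined here. */
static int Poly_equals(poly a, poly b)
{
    tf eq = *(tf *)a;
    return eq(a, b);
}

struct Tuple_t {
    poly x;
    poly y;
};
typedef struct Tuple_t *Tuple_t;

Tuple_t Tuple_new(poly x, poly y)
{
    Tuple_t t = malloc(sizeof(*t));
    assert(t != NULL);
    t->x = x;
    t->y = y;
    return t;
}

/* No function-pointer arguments any more: the data carries them. */
int Tuple_equals(Tuple_t t1, Tuple_t t2)
{
    return Poly_equals(t1->x, t2->x) && Poly_equals(t1->y, t2->y);
}
```

The tuple now works only for components that follow the layout convention; a plain int * stored in it would be misread as a function pointer, which is the unchecked-cast risk the "Moral" slide warns about.
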
Object
Data elements with function pointers are the simplest form of objects
- object = virtual functions + private data
With such facilities, we can in principle model object-oriented programming
- In fact, early C++ compilers compiled to C
- That’s partly why I don’t love object-oriented languages

Summary
Abstract data types enable modular programming
- clear separation between interface and implementation
- interface and implementation should be designed and evolved together
Polymorphism enables code reuse
Object = data + function pointers