
DATA STRUCTURE & FILE PROCESSING

OBJECTIVES
- To identify the definition of data structure
- To identify data structure types

OVERVIEW
- In computer science, a data structure is a way of storing data in a computer so that it can be used efficiently.
- It is the organization of data according to mathematical and logical concepts.
- A suitable data structure allows the most efficient algorithm to be used.
- A well-designed data structure allows a variety of critical operations to be performed with as little execution time and memory space as possible.
- Programming languages implement different kinds of data structures as data types, together with the references and operations they provide.

WHAT DATA STRUCTURE TO USE?
Data structures let the input and output be represented in a way that can be handled efficiently and effectively. Common data structures include:
- Array
- Linked list
- Tree
- Queue
- Stack

WHAT IS A DATA STRUCTURE?
The way in which data is organized affects the performance of a program for different tasks.
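To make this concrete, here is a minimal Python sketch (an illustration, not taken from the slides) that counts how many elements are examined when searching the same 1000 values in two ways: a plain scan in order, and binary search, which becomes possible only because the values are kept sorted. Binary search is a technique these slides do not cover, and the function names are hypothetical.

```python
def linear_search_probes(items, target):
    """Scan items in order; return (index, number of elements examined)."""
    for i, value in enumerate(items):
        if value == target:
            return i, i + 1
    return -1, len(items)


def binary_search_probes(sorted_items, target):
    """Binary search on a sorted list; return (index, number of elements examined)."""
    lo, hi = 0, len(sorted_items) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_items[mid] == target:
            return mid, probes
        if sorted_items[mid] < target:
            lo = mid + 1  # target can only be in the upper half
        else:
            hi = mid - 1  # target can only be in the lower half
    return -1, probes


data = list(range(1000))  # sorted, so both searches apply
print(linear_search_probes(data, 999))  # (999, 1000)
print(binary_search_probes(data, 999))  # (999, 10)
```

Same values, same answer, but organizing them in sorted order cuts the work from 1000 examinations to 10 for this target.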

LISTS OF DATA STRUCTURES
- Linear data structures
  - Array: dynamic, sparse, matrix, etc.
  - Linked list: unrolled, doubly linked, VList, etc.
  - Associative array: hash table, etc.
- Non-linear data structures
  - Graph: scene graph, binary tree, heap, etc.
  - Tree: search tree, syntax tree, etc.

BASIC TERMINOLOGY
- Data: values or sets of values.
- Data item: a single unit of values.
- Group items: data items that are divided into sub-items.
- Elementary items: data items that are not divided into sub-items.
Example 1: an employee's name may be divided into 3 sub-items (first name, middle initial and last name), so it is a group item.
Example 2: an IC number would normally be treated as a single item, so it is an elementary item.

BASIC TERMINOLOGY (cont…)
- Data are organized into a hierarchy of fields, records and files.
- Entity: something that has certain attributes or properties which may be assigned values.
- Information: a term sometimes used for data with given attributes, or in other words, meaningful or processed data.
- Field: a single elementary unit of information representing an attribute of an entity.
- Record: the collection of field values of a given entity.
- File: the collection of records of the entities in a given entity set.

BASIC TERMINOLOGY (cont…)
Attributes:  Name        Age   Sex   IC Number
Values:      Mary, Jane  34    F     760101115643
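The entity/field/record/file hierarchy can be sketched in Python with a dataclass, assuming one field per attribute from the example above. The class name `PersonRecord` and the variable `person_file` are hypothetical illustrations, not part of the course material.

```python
from dataclasses import dataclass


@dataclass
class PersonRecord:
    """A record: the field values of one entity."""
    name: str        # field for the Name attribute
    age: int         # field for the Age attribute
    sex: str         # field for the Sex attribute
    ic_number: str   # IC Number kept whole as one elementary item (a string, not a number)


# A file: the collection of records for the entities in an entity set.
person_file = [
    PersonRecord(name="Mary, Jane", age=34, sex="F", ic_number="760101115643"),
]

for record in person_file:
    print(record.name, record.age, record.sex, record.ic_number)
```

Storing the IC number as a string treats it as an identifier rather than a quantity, so it is never split or used in arithmetic.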

ARRAYS
- The array is the simplest type of data structure.
- By a linear array, we mean a list of a finite number n of similar data elements referenced respectively by a set of n consecutive numbers.
- If we choose the name A for the array, then the elements of A are denoted by the subscript notation
  a_1, a_2, a_3, ..., a_n
  or by the parenthesis notation
  A(1), A(2), A(3), ..., A(N)
  or by the bracket notation
  A[1], A[2], A[3], ..., A[N]

ARRAYS (cont…)
- A linear array STUDENT consisting of the names of six students is shown below.

  STUDENT
  1  Ahmad
  2  Juliana
  3  Irfan
  4  Ismail
  5  Fatimah
  6  Mohamad

- Linear arrays are called one-dimensional arrays because each element in such an array is referenced by one subscript.
- A two-dimensional array is a collection of similar data elements where each element is referenced by two subscripts.
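Python lists are indexed from 0, while the slides number elements from 1. A minimal sketch, assuming a hypothetical helper `subscript` that translates the slides' A[K] notation into a Python index and rejects subscripts outside 1..N:

```python
STUDENT = ["Ahmad", "Juliana", "Irfan", "Ismail", "Fatimah", "Mohamad"]


def subscript(array, k):
    """Return A[k] using 1-based subscripts, as in the slides."""
    if not 1 <= k <= len(array):
        raise IndexError(f"subscript {k} is outside 1..{len(array)}")
    return array[k - 1]  # Python position is one less than the subscript


print(subscript(STUDENT, 1))  # Ahmad
print(subscript(STUDENT, 6))  # Mohamad
```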

ARRAYS (example…)
- A chain of 28 stores, each store having 4 departments, may list its weekly sales as in the table below.
- Such data can be stored in the computer using a two-dimensional array in which the first subscript denotes the store and the second subscript the department.
- The size of this array is denoted by 28 x 4.

Store \ Dept.    1      2      3      4
1             2872    805   3211   1560
2             2196   1223   2525   1744
3             3257   1017   3686   1951
...            ...    ...    ...    ...
28            2618    931   2333    982
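A two-dimensional array can be sketched as a list of rows. This minimal version holds only stores 1 to 3 from the table (the remaining rows are not listed in the slides) and uses hypothetical helpers `sales` and `store_total` that take 1-based subscripts: the first for the store, the second for the department.

```python
# Rows are stores, columns are departments (only stores 1-3 are shown).
SALES = [
    [2872, 805, 3211, 1560],   # store 1
    [2196, 1223, 2525, 1744],  # store 2
    [3257, 1017, 3686, 1951],  # store 3
]


def sales(store, dept):
    """Return SALES[store, dept] using 1-based subscripts."""
    return SALES[store - 1][dept - 1]


def store_total(store):
    """Weekly sales of one store summed over all its departments."""
    return sum(SALES[store - 1])


print(sales(2, 3))      # 2525
print(store_total(1))   # 8448
```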

LINKED LISTS
- A linked list consists of a sequence of nodes, each containing arbitrary data fields and one or two references ("links") pointing to the next and/or previous nodes.
- A benefit of a linked list over a conventional array is that the order of the linked items may differ from the order in which the data items are stored in memory or on disk, allowing the list of items to be traversed in a different order.
- A linked list is a self-referential data type because each node contains a pointer or link to another datum of the same type.
- Linked lists permit insertion and removal of nodes at any point in the list in constant time, once the node at that point has been located, but they do not allow random access.
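A minimal singly linked list in Python, assuming one `next` link per node. The names `Node`, `insert_after`, `remove_after` and `traverse` are hypothetical. Insertion and removal only rewire links next to a node we already hold, which is why they take constant time; reaching the k-th node still means following links from the head.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Node:
    data: str
    next: Optional[Node] = None  # link to the next node; None marks the end


def insert_after(node: Node, data: str) -> Node:
    """Insert a new node right after `node` in constant time."""
    new_node = Node(data, node.next)
    node.next = new_node
    return new_node


def remove_after(node: Node) -> None:
    """Unlink the node that follows `node`, if any, in constant time."""
    if node.next is not None:
        node.next = node.next.next


def traverse(head: Optional[Node]) -> Iterator[str]:
    """Follow the links from the head, visiting each node once."""
    current = head
    while current is not None:
        yield current.data
        current = current.next


head = Node("A")
insert_after(head, "C")
insert_after(head, "B")      # list is now A -> B -> C
print(list(traverse(head)))  # ['A', 'B', 'C']
remove_after(head)           # unlink B
print(list(traverse(head)))  # ['A', 'C']
```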

LINKED LISTS (example…)
- Suppose a brokerage firm maintains a file where each record contains a customer's name and salesperson. The file contains the data shown in the table below.
- Another way of storing this data is to have a separate array for the salespeople and an entry (called a pointer) in the customer file that gives the location of each customer's salesperson.
- An integer used as a pointer requires less space than a name; hence this representation saves space.

   Customer   Salesperson
1  Adams      Smith
2  Brown      Ray
3  Clark      Jones
4  Drew       Ray
5  Evans      Smith
6  Farmer     Jones
7  Geller     Ray
8  Hill       Smith
9  Infeld     Ray

LINKED LISTS (example…)
Each salesperson's name is stored only once, in a separate array, and each customer's salesperson entry refers to a position in that array.

   Salesperson
1  Jones
2  Ray
3  Smith

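LINKED LISTS (example…)
To list the customers of a given salesperson without searching the whole customer file, the pointers can point the other way: each salesperson has a set of pointers giving the positions of his or her customers.

   Salesperson   Pointer
1  Jones         3, 6
2  Ray           2, 4, 7, 9
3  Smith         1, 5, 8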

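LINKED LISTS (example…)
In the linked representation, each salesperson's Pointer gives the position of his or her first customer, each customer's Link gives the position of the next customer with the same salesperson, and 0 marks the end of a list.

   Customer   Link
1  Adams      5
2  Brown      4
3  Clark      6
4  Drew       7
5  Evans      8
6  Farmer     0
7  Geller     9
8  Hill       0
9  Infeld     0

   Salesperson   Pointer
1  Jones         3
2  Ray           2
3  Smith         1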
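The linked representation of the brokerage file can be sketched with parallel Python lists holding the Customer, Link, Salesperson and Pointer columns. As an assumption for readability, position 0 of each list is left unused so that subscripts match the 1-based tables, and a link value of 0 ends a list. The function name `customers_of` is hypothetical.

```python
# Position 0 is unused so list indexes match the table's row numbers.
CUSTOMER = [None, "Adams", "Brown", "Clark", "Drew", "Evans",
            "Farmer", "Geller", "Hill", "Infeld"]
LINK = [None, 5, 4, 6, 7, 8, 0, 9, 0, 0]   # next customer of same salesperson
SALESPERSON = [None, "Jones", "Ray", "Smith"]
POINTER = [None, 3, 2, 1]                  # first customer of each salesperson


def customers_of(name):
    """Follow Pointer, then Link, until the end-of-list value 0."""
    position = POINTER[SALESPERSON.index(name)]
    result = []
    while position != 0:
        result.append(CUSTOMER[position])
        position = LINK[position]
    return result


print(customers_of("Ray"))  # ['Brown', 'Drew', 'Geller', 'Infeld']
```

Only the customers of the requested salesperson are visited, instead of every record in the file.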

TREES
- Data frequently contain a hierarchical relationship between various elements.
- The data structure which reflects this relationship is called a rooted tree graph or, simply, a tree.
- Examples of trees: record structures and algebraic expressions.

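Record Structure
- Consider the Employee records that contain the following items: IC number, Name, Address, Age, Salary, Dependents.
- Some of these items, such as Name and Address, are group items that are divided further into sub-items.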

Record Structure
Employee
    IC
    Name
        First
        Last
    Address
        Street
        Area
            City
            State
            Postcode
    Age
    Salary
    Dependents

Record Structure
Another way of picturing such a tree structure is in terms of levels:
01 Employee
   02 IC Number
   02 Name
      03 First
      03 Last
   02 Address
      03 Street
      03 Area
         04 City
         04 State
         04 Postcode
   02 Age
   02 Salary
   02 Dependents
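The level numbering follows directly from the tree: a node's level is one more than its parent's. A minimal sketch, assuming the Employee record is stored as nested `(name, children)` tuples; the names `EMPLOYEE`, `level_lines` and `elementary_items` are hypothetical. Leaves (nodes without children) are the elementary items; every other node is a group item.

```python
EMPLOYEE = ("Employee", [
    ("IC Number", []),
    ("Name", [("First", []), ("Last", [])]),
    ("Address", [
        ("Street", []),
        ("Area", [("City", []), ("State", []), ("Postcode", [])]),
    ]),
    ("Age", []),
    ("Salary", []),
    ("Dependents", []),
])


def level_lines(node, level=1):
    """Preorder walk producing lines such as '02 Name'."""
    name, children = node
    lines = [f"{level:02d} {name}"]
    for child in children:
        lines.extend(level_lines(child, level + 1))
    return lines


def elementary_items(node):
    """Collect the leaves, i.e. items not divided into sub-items."""
    name, children = node
    if not children:
        return [name]
    items = []
    for child in children:
        items.extend(elementary_items(child))
    return items


for line in level_lines(EMPLOYEE):
    print(line)
```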

Algebraic Expression
- Consider the following algebraic expression:
  (2x + y)(a - 7b)³
- It can be represented as a tree, using a vertical arrow (↑) for exponentiation and an asterisk (*) for multiplication.

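Algebraic Expression
Tree for (2x + y) * (a - 7 * b) ↑ 3 (each node's children are listed left to right, one level of indentation below it):
*
    +
        *
            2
            x
        y
    ↑
        -
            a
            *
                7
                b
        3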
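An expression tree can be evaluated by evaluating both subtrees first and then applying the operator at the root. A minimal sketch, assuming internal nodes are `(operator, left, right)` tuples, integers are constants and strings are variable names; `TREE` and `evaluate` are hypothetical names.

```python
# (2x + y)(a - 7b)↑3 as nested (operator, left, right) tuples.
TREE = ("*",
        ("+", ("*", 2, "x"), "y"),
        ("↑", ("-", "a", ("*", 7, "b")), 3))


def evaluate(node, env):
    """Postorder evaluation: children first, then the operator."""
    if isinstance(node, int):
        return node          # constant leaf
    if isinstance(node, str):
        return env[node]     # variable leaf, looked up in env
    op, left, right = node
    lhs, rhs = evaluate(left, env), evaluate(right, env)
    if op == "+":
        return lhs + rhs
    if op == "-":
        return lhs - rhs
    if op == "*":
        return lhs * rhs
    if op == "↑":
        return lhs ** rhs    # exponentiation
    raise ValueError(f"unknown operator {op!r}")


print(evaluate(TREE, {"x": 1, "y": 2, "a": 10, "b": 1}))  # (2 + 2) * 3**3 = 108
```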

STACK
- A stack, also called a last-in first-out (LIFO) system, is a linear list in which insertions and deletions can take place only at one end, called the top.
- Stacks are used extensively at every level of a modern computer system. For example, a modern PC uses stacks at the architecture level, and they are part of the basic design of an operating system for interrupt handling and operating system function calls.
- Among other uses, the Java Virtual Machine is stack-based, and the Java language itself provides a class called "Stack" that programmers can use.
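A minimal stack sketch in Python, assuming a plain list whose end serves as the top, so that both push and pop happen at that one end. The class and method names are illustrative, not a standard API.

```python
class Stack:
    """LIFO list: insertions and deletions only at the top."""

    def __init__(self):
        self._items = []  # the end of the list is the top

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack (underflow)")
        return self._items.pop()

    def peek(self):
        if not self._items:
            raise IndexError("peek at empty stack")
        return self._items[-1]

    def is_empty(self):
        return not self._items


s = Stack()
for item in ("A", "B", "C"):
    s.push(item)
print(s.pop())  # C, the last item pushed comes out first
```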

QUEUE
- A queue, also called a first-in first-out (FIFO) system, is a linear list in which deletions can take place only at one end of the list, the front, and insertions can take place only at the other end, the rear.
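A minimal queue sketch using Python's `collections.deque`, with `append` inserting at the rear and `popleft` deleting from the front. A deque is chosen here because removing the first element of a plain list with `list.pop(0)` shifts every remaining element, while `popleft` does not.

```python
from collections import deque

queue = deque()
queue.append("first")    # insert at the rear
queue.append("second")
queue.append("third")

front = queue.popleft()  # delete from the front
print(front)             # first
print(list(queue))       # ['second', 'third']
```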

GRAPH
- Data sometimes contain a relationship between pairs of elements which is not necessarily hierarchical in nature.
- For example, suppose an airline flies only between certain pairs of cities, drawn as points connected by lines.
- The data structure which reflects this type of relationship is called a graph.
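The airline example can be sketched as an adjacency list: each city maps to the set of cities it has direct flights to. The city names "A" to "E" and the routes are hypothetical, since the slide's map is not reproduced. The `can_reach` helper uses breadth-first search, a standard graph traversal not covered in these slides, to check whether one city can be reached from another.

```python
from collections import deque


def add_route(graph, a, b):
    """Record a direct flight in both directions."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def can_reach(graph, start, goal):
    """Breadth-first search over the routes."""
    seen = {start}
    frontier = deque([start])
    while frontier:
        city = frontier.popleft()
        if city == goal:
            return True
        for neighbour in graph.get(city, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append(neighbour)
    return False


routes = {}
add_route(routes, "A", "B")
add_route(routes, "B", "C")
add_route(routes, "D", "E")
print(can_reach(routes, "A", "C"))  # True, via B
print(can_reach(routes, "A", "E"))  # False, no connecting flights
```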

DATA STRUCTURE OPERATIONS
- Traversing: accessing each record exactly once so that certain items in the record may be processed.
- Searching: finding the location of the record with a given key value, or finding the locations of all records which satisfy one or more conditions.
- Inserting: adding a new record to the structure.
- Deleting: removing a record from the structure.
- Sorting: arranging the records in some logical order.
- Merging: combining the records in two different sorted files into a single sorted file.
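The six operations can be tried on a small sorted Python list of customer names, as a sketch under the assumption that a record is just its key. It uses the standard `bisect` module for searching and ordered insertion and `heapq.merge` for merging two sorted sequences; the helper name `search` is hypothetical.

```python
import bisect
import heapq

records = ["Drew", "Adams", "Clark"]

records.sort()                   # Sorting: ['Adams', 'Clark', 'Drew']
bisect.insort(records, "Brown")  # Inserting, keeping the order
records.remove("Clark")          # Deleting: ['Adams', 'Brown', 'Drew']


def search(sorted_records, key):
    """Searching: return the position of key, or -1 if absent."""
    i = bisect.bisect_left(sorted_records, key)
    if i < len(sorted_records) and sorted_records[i] == key:
        return i
    return -1


for position, name in enumerate(records, start=1):  # Traversing
    print(position, name)

# Merging: two sorted files become one sorted file.
merged = list(heapq.merge(records, ["Evans", "Farmer"]))
print(merged)  # ['Adams', 'Brown', 'Drew', 'Evans', 'Farmer']
```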

ALGORITHMS
- An algorithm is a well-defined list of steps for solving a particular problem.
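As one example of such a list of steps, here is a common textbook algorithm (not taken from these slides) for finding the largest element of a nonempty array and its 1-based location, with the numbered steps written as comments. The function name `find_largest` is hypothetical.

```python
def find_largest(data):
    """Return (LOC, MAX): the 1-based location and value of the largest element."""
    if not data:
        raise ValueError("array must not be empty")
    # Step 1: start with the first element as the largest seen so far.
    loc, maximum = 1, data[0]
    # Steps 2-3: examine elements 2..N in turn; stop after the last one.
    for k in range(2, len(data) + 1):
        # Step 4: if DATA[K] is larger, remember it and its location.
        if data[k - 1] > maximum:
            loc, maximum = k, data[k - 1]
    # Step 5: report the result.
    return loc, maximum


print(find_largest([2196, 1223, 2525, 1744]))  # (3, 2525)
```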