
Data Structures Fundamental Data Storage

Data Structures • For sizeable programs, one problem that can quickly arise is that of data storage. – What is the most efficient or effective way to organize and utilize information within a program? – Quick answer – it depends on the task.

Data Structures • For some tasks, it is helpful (at minimum) and possibly necessary to have sorted data. • For other tasks, it is not necessary to note where any given piece of data is stored within a storage data structure.

Data Structures • Note: while we have seen these in passing and as examples earlier in the course, we will now examine these a little more closely.

Arrays • Possibly the most basic non-trivial data storage structure is that of the array. – We’ve already seen the notion of a “vector” that dynamically resizes. (Figure: an array with slots indexed 0 through 9.)

Beyond Arrays • Note that the main structure being implemented by an array is effectively that of an ordered list. – Just like with an array, each element being stored has a specific location, which implies an ordering. (Figure: an array with slots indexed 0 through 9.)

Beyond Arrays • In Java, there is an ArrayList class in the java.util package. – This class internally uses an array and resizes it when necessary as new items are added to the conceptual underlying list. – This resizing is handled internally and automatically by the class.

Beyond Arrays • In C++, there is a vector class as part of the std namespace. – Likewise, this class internally uses an array and resizes it when necessary as new items are added to the conceptual underlying list. – This resizing is also handled internally and automatically by the class.

Beyond Arrays • However, arrays are not the only way to model a list. – Another such model is that of the linked list. (See the graphic below.)

Linked Lists • The linked list stores each data element separately and individually, allocating space for new elements as they are added into the list.

Linked Lists • Adding data to the end of a linked list is trivial, as it (usually) also is for an array.

Linked Lists • Adding data in the middle of the list, or at its beginning, is (relatively) very time-consuming for an array. • For a linked list, however, it is often a much simpler operation.

Adding Elements • Remember that for an array, elements are in fixed locations. • To insert an element into the middle of an array requires moving all elements at and after the point of insertion, e.g., insert 7 at index 3. • Before (indices 0 through 8): 3 8 1 2 4 13 42 9 5

Adding Elements • Before (indices 0 through 8): 3 8 1 2 4 13 42 9 5 • After inserting 7 at index 3 (indices 0 through 9): 3 8 1 7 2 4 13 42 9 5

Adding Elements • For a linked list, however, each element’s storage space is distinct and separate from the others. • New storage may be placed directly in the middle of the chain.

Adding Elements

Linked Lists • Naturally, there is the question of what these “links of the chain” actually are, or more properly, how to represent them.

Linked Lists • In their most basic and simple form… template <typename T> class Node { public: T value; Node<T>* next; };

Linked Lists template <typename T> class Node { public: T value; Node<T>* next; }; (Figure: a node drawn as two boxes, value and next.)

Linked Lists Remember – next is a pointer, so the class Node<T> doesn’t actually contain another Node<T>, just a pointer to the next one in line.

Linked Lists The end of the “linked list chain” is denoted by a null pointer (nullptr) in the last node. The “ground” symbol at the end denotes this.

Lists • Note that we now have two different ways of storing data, each of which has its own pros and cons. – Arrays • Good for adding items to the end of lists and for random access to items within the list. • Bad for cases with many additions and removals at various places within the list.

Lists • Note that we now have two different ways of storing data, each of which has its own pros and cons. – Linked Lists • Better for adding and removing items at random locations within the list. • Bad at randomly accessing items from the list. – Note that to use a random item within the list, we must traverse the chain to find it.

Lists • Note that both of these objects fulfill the same end goal – to represent a group of objects with some implied ordering upon them. • While they meet this goal differently, their primary purpose is identical.

Templates • Templates are integral to generic programming in C++ – Template is like a blueprint – Blueprint is used to instantiate function when it is actually used in code – “Actual” types are substituted in for the “formal” types of the template

Why Templates? What is the difference between the following two functions? int compare(const string &v1, const string &v2) { if (v1 < v2) return -1; if (v2 < v1) return 1; return 0; } int compare(const double &v1, const double &v2) { if (v1 < v2) return -1; if (v2 < v1) return 1; return 0; } Only the types!

Why Templates? What if we could write the function once for any type and have the compiler just use the right types? template <typename T> int compare(const T &v1, const T &v2) { if (v1 < v2) return -1; if (v2 < v1) return 1; return 0; } Requires type T to have < operator

Exercise 1 • Implement the generic compare function • Implement a main() that compares two doubles, two ints, two chars, and two strings using the compare function. • Compile and see that it is good!

What is Going On? • Compiler sees structure when template is defined, blueprint when generic function is coded (in header) • When call to function is seen, compiler substitutes types used in invocation into blueprint and generates required code • Can’t catch many errors until invocation is seen

Abstracting Beyond Lists • We have this notion of a “list” structure, which maps its stored objects to indices. – What if we don’t actually need to have a lookup position for our stored objects? • But wait! How could we possibly iterate over the objects in a for loop?

The Iterator • Many programming languages provide objects called iterators for enumerating objects contained within data structures – C++ and Java are no exceptions – C++’s versions are defined in the <iterator> header file – (see 3.4 – 3.5)

The Iterator • This iterator may be used to get each contained object in order, one at a time, in a controllable manner. – It’s especially designed to work well with for loops.

The Iterator • Example code:
vector<int> numbers;
// omitted code initializing numbers.
vector<int>::iterator iter;
for (iter = numbers.begin(); iter != numbers.end(); iter++) {
    cout << *iter << ' ';
}

The Iterator • In C++, iterators are designed to look like and act something like pointers. – The * and -> operators are overloaded to give pointer-like semantics, allowing users of the iterator object to “dereference” the object currently “referenced” by the iterator.

The Iterator • In C++, iterators are designed to look like and act something like pointers. – Note the use of operator ++ to increment the iterator to the next item • This is another way we can interact with pointers; it’s useful for iterating across an array while using pointer semantics… but keep a copy of the original around!

The Iterator
vector<int> numbers;
// omitted code initializing numbers.
vector<int>::iterator iter;
for (iter = numbers.begin(); iter != numbers.end(); iter++) {
    cout << *iter << ' ';
}

The Iterator • C++11 and later standards also provide an alternate version of the for-loop, which is designed to work with iterable structures and iterators • Looks like “foreach” in other languages
vector<Person> structure;
for (Person &p : structure) {
    // Code.
}

The Iterator • Both the std::vector and std::list classes of C++ implement iterators. – begin() returns an iterator to the list’s first element – end() is a special iterator “just after” the final element of the list, useful for checking when we’re done with iteration – Use != to check for termination

Exercise 2 • Include <iterator> header • Use iterator to walk through an array you define and print out its contents • Compile and run • See that it is good

Abstracting Beyond Lists • There are many, many other techniques for storing data than the model of a list. – Such other data structures have different techniques for accessing stored data. – You have seen one in your lab exercises

Other Data Structures • Let’s move on from this idea of a “list” structure. • In particular, note how lists map their stored objects to indices (or can map an index to the stored object) – What if we don’t actually need to have a lookup position for our stored objects? – In particular, does it really need to be an integer?

Other Data Structures • There are many, many other techniques for storing data than the model of a list. – Such other data structures have different techniques for accessing and handling stored data. – These “different techniques” are often designed with a focus on different usage patterns.

Other Data Structures • A first example: arrays index their contained objects by integers. – Should integers be the only thing by which we can index an item within a collection-oriented data structure? – Think up some examples with neighbors (Figure: sample keys and values such as apple, bear, A, 113, 42, cake, blue, red, …)

Maps • The interface built on this idea within Java is the Map. • TreeMap and HashMap are the two prominent implementations. – The value is the object being stored within the map. – The key is the data element used as an index into the map for that value (i.e., how you “look up” the value). – A key is like a key in a database, sometimes called a “tag” in associative memory.

Maps • The classes built on this idea within C++ are map and unordered_map. • Sidenote – these are also not polymorphically related. – Map stores items in order of keys – Unordered map does not require keys to have order relation at all!

Maps • How would such a map work? – We could just use matching arrays for the keys and values. – However, this wouldn’t be the most efficient idea – better techniques are known.

Hash Maps • Hash maps work by converting the key to a unique integer, where possible, through a hashing function. – C++: hash maps are represented by unordered_map. – The selection of such a function is not a simple operation. • As such, unordered_map accepts a hashing function (as a template parameter and, optionally, a constructor argument) that maps each key to a nearly-unique integer.

Hash Maps • This “hash code” is then mapped into an array for storage. – Problem: the “hash code” can easily be larger than the storage array’s size. – Solution: modular arithmetic. Divide by the array’s size and use the remainder.

Hash Maps New input: (“Football”, “Will”) • hash(“Football”) = -2070369658 • -2070369658 mod 7 = 0 • Table (index: key, value) – 0: “Football”, “Will”

Hash Maps New input: (“Basketball”, “Billy”) • hash(“Basketball”) = -2127646392 • -2127646392 mod 7 = -4 => 3 • Table (index: key, value) – 0: “Football”, “Will” – 3: “Basketball”, “Billy”

Hash Maps New input: (“Gymnastics”, “Rhonda”) • hash(“Gymnastics”) = 2068792 • 2068792 mod 7 = 5 • Table (index: key, value) – 0: “Football”, “Will” – 3: “Basketball”, “Billy” – 5: “Gymnastics”, “Rhonda”

Hash Maps New input: (“Soccer”, “Becky”) • hash(“Soccer”) = -2026118662 • -2026118662 mod 7 = -1 => 6 • Table (index: key, value) – 0: “Football”, “Will” – 3: “Basketball”, “Billy” – 5: “Gymnastics”, “Rhonda” – 6: “Soccer”, “Becky”

Hash Maps • Pros: – direct lookup of values, in constant time on average, regardless of the key’s type. • Cons: – does not support sorting – requires a hashing function for keys that creates a distinct int for as many keys as possible; keys that collide must be resolved, which slows lookups down.


Map Example
#include <iostream>
#include <map>
#include <string>
using namespace std;

int main() {
    map<string, size_t> word_count;
    string word;
    while (cin >> word) {
        ++word_count[word];  // use map to look up value
    }
    for (const auto &w : word_count) {  // iterator
        cout << w.first << " occurs " << w.second
             << ((w.second > 1) ? " times" : " time") << endl;
    }
    return 0;
}

Exercise 3 • Include <map> header • Use unordered map – to store >= four <key, value> pairs – your choice – Look up values based on keys and print – Or code up previous example • Compile and run • See that it is good

Maps • What if we want to have the entries sorted by their keys? – It is possible to build structures that efficiently keep their data permanently sorted by key!

Binary Tree • The binary tree is an example of one structure that can accomplish this. – Think of it as a linked list, but with two links per node instead of one.

Binary Tree • The corresponding Java structure is the TreeMap class. – It implements the SortedMap interface.

Binary Tree • The corresponding C++ structure, on the other hand, is the std::map class.

Binary Tree • The “first” node of the tree is called the root. – Any key smaller than the root’s key is in the left branch. – Any key larger than the root’s key is in the right branch.

Binary Tree (Figure: a tree with root 13; its left child 7 has children 2 and 9; its right child 25 has children 17 and 42.)

Binary Tree • Binary trees require the ability to compare the keys – C++ assumes that operator< has been overloaded for custom data types

Binary Tree • Of particular note with binary trees – operations on them tend to be highly recursive due to their structure. – You’ve done this in lab – twice now!

Binary Tree • Pros: – the items are always in an established, sorted order! (By key) • Pro/Con: – accesses are slower than an unordered_map, but generally faster than a list.

Questions? • You have already implemented trees

Input/Output Modeling • Certain other structures exist to model specialized, restricted input and output behavior. – Consider the usual interaction someone might have with a stack of papers. – Another possibility: the usual behavior of a group of people waiting in line… in a queue waiting to be served.

Stacks • The data structure known as a stack is a “Last In, First Out” (LIFO) structure. – That is, the last input to the structure is the first output obtained from it. – Consider a stack of papers – when searching through it, one typically starts at the top and searches downward, from newest to oldest.

Stacks (Figure: successive stack states as the items a, b, c, and d are pushed and popped.)

Stacks • Stacks are a very good model for function calls. – When function A calls function B, B must complete before A resumes operation. • Similarly, if B calls C, C completes before B. – A may then call other methods before completing.

Stacks (Figure: successive stack states as the items a, b, c, and d are pushed and popped.)

Stacks • Stacks are a very good model for function calls. – In fact, this is one reason why we’re examining it now. Stacks are the model of how recursion mechanically works. – In turn, recursion is necessary for operating upon many data structures.

Stacks • When debugging, the stack trace (or call stack) of a program at a given point of execution is exactly this – a description of the order of active method calls within the program. • The area of memory where function data lives is literally called the stack space.

Stacks + Math • Stacks have often been used in mathematical operations. – Some graphing calculators use what is called “Reverse Polish Notation” (RPN), which is based upon postfix operators. – Combined with a stack, this notation is much easier to program for than infix operations.

Stacks + Math • Let’s consider the following mathematical expression: 2 + 5 * 7 - 6 / 3 • In what order do we perform the operations? – Consider trying to code something that would be able to interpret this!

Stacks + Math • Using the standard order of operations, this becomes: 2 + (5 * 7) - (6 / 3) • The postfix notation for this: 2 5 7 * + 6 3 / -, which groups as ((2 (5 7 *) +) (6 3 /) -)

Stacks + Math 2 + (5 * 7) - (6 / 3) = 2 + (35) - (2) = 37 - 2 = 35

Stacks + Math 2 5 7 * + 6 3 / - • Let’s see how this facilitates getting the right answer.

Stacks + Math 2 5 7 * + 6 3 / - • Stack (bottom to top) after each token: 2 → [2]; 5 → [2, 5]; 7 → [2, 5, 7]; * → [2, 35]; + → [37]; 6 → [37, 6]

Stacks + Math Infix progress so far: 2 + (5 * 7) - (6 / 3) = 2 + (35) - (6 / 3) = 37 - (6 / 3)

Stacks + Math 2 5 7 * + 6 3 / - reduces to 37 6 3 / - • Stack (bottom to top) after each step: [37]; 6 → [37, 6]; 3 → [37, 6, 3]; / → [37, 2]; - → [35]

Stacks + Math 2 + (5 * 7) - (6 / 3) = 2 + (35) - (6 / 3) = 37 - 2 = 35

Stacks + Math • Math done in “standard” notation (i.e., infix notation) is typically first converted to postfix notation for actual computation. – This “conversion” is known as the Shunting-yard algorithm. It’s up on Wikipedia, so feel free to take a look.

Stacks • C++ provides the std::stack class. – This implementation is something of a “wrapper class” that uses a vector, list, or deque internally, limiting it to stack-like behavior. • We’ll see deques in a moment. • stack itself exposes push(), pop(), and top(); it implements them with the underlying container’s push_back(), pop_back(), and back().

Questions? • Home exercise – implement and use a stack

Queues • The data structure known as a queue is a “First In, First Out” (FIFO) structure. – That is, the first input to the structure is the first output obtained from it. – Consider a line of people – the person in front has priority to whatever the line is waiting on… like buying tickets at the movies or gaining access to a sports event.

Queues • Queues are significantly like lists, except that we have additional restrictions placed on them. – Additions may only happen at the list’s end. – Removals may only happen at the list’s beginning. • As a result, standard array-based behavior may not be optimal.

Queues (Figure: successive queue states as the items a, b, and c are added at the back and removed from the front.)

Queues • In C++, the std::queue class is provided. – This implementation is also something of a “wrapper class” that uses a list or deque internally, limiting it to queue-like behavior. • list works well as a queue, as linked lists can easily be altered from both ends.

Stacks + Queues • The “deque”, or double-ended queue, combines the behaviors of stacks and queues into a single structure. – Items may be added or removed at either end of the structure. – This allows for either LIFO or FIFO behavior – it’s all in how you use the structure. • Mixed behavior is also possible, so beware!

Deques • C++ defines the deque class for such uses. – This is a full-fledged object in its own right, and is array-based. • It may use multiple arrays and modular arithmetic, to allow efficient additions at the front for example. – It is the default object used internally by both stack and queue.

Questions? • Home exercise – implement and use a queue and a deque