HASH MAPS
AN INTRO TO BIG-O NOTATION
• A way to classify how efficient an algorithm is, without worrying about CPU architecture
• Generally indicates what “class” of function caps the time for the algorithm to run, as a function of n: the number of inputs.
• For example, if your algorithm behaves like this:
  [Plot: computation time (s) vs. n, staying below the line 0.5 n seconds]
• It would be considered O(n) because the running time stays below 0.5*n, a constant multiple of n.
AN INTRO TO BIG-O NOTATION
• But if your algorithm behaves like this:
  [Plot: computation time (s) vs. n, staying below the curve 4*log(n) seconds]
• This one would be considered O(log n) since it’s bounded by a logarithmic function
• Note that the actual time taken is irrelevant: the growth as a function of n is what’s important
• Eliminates the Moore’s-Law effect (faster hardware changes the constant, not the class)
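To make the difference in growth concrete, here is a small sketch that counts steps instead of seconds. The class and method names (GrowthDemo, linearSteps, binarySteps) are made up for this illustration and are not part of the lab.

```java
// Hypothetical demo class: counts loop iterations rather than measuring time.
class GrowthDemo {
    // Linear search: the worst case touches every element, so steps grow like n.
    static int linearSteps(int[] sorted, int target) {
        int steps = 0;
        for (int value : sorted) {
            steps++;
            if (value == target) break;
        }
        return steps;
    }

    // Binary search: halves the remaining range each step, so steps grow like log2(n).
    static int binarySteps(int[] sorted, int target) {
        int lo = 0, hi = sorted.length - 1, steps = 0;
        while (lo <= hi) {
            steps++;
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] == target) break;
            if (sorted[mid] < target) lo = mid + 1; else hi = mid - 1;
        }
        return steps;
    }

    // Builds the sorted array 0, 1, ..., n-1.
    static int[] range(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i;
        return a;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 1_000_000}) {
            int[] data = range(n);
            int target = n - 1; // last element: worst case for the linear search
            System.out.println(n + ": linear=" + linearSteps(data, target)
                    + " binary=" + binarySteps(data, target));
        }
    }
}
```

Growing n by a factor of 1000 grows the linear count by the same factor, while the binary count only roughly doubles. That growth pattern, not the raw time, is what the big-O class describes.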
COMMON BIG-O CLASSES
• Here are some common big-O classes (best to worst)
  • O(1)
    # Note: the “constant factor” could be high
    # (e.g. the alg could take 1 mil hours, but if it does
    # that regardless of input, it’s O(1))
  • O(log n)
  • O(n^2)
  • O(n^3)
  • O(2^n)
  • O(n!)
    # O(2^n) and O(n!) algorithms are considered computationally
    # infeasible for any but small values of n
• Good reference: http://bigocheatsheet.com/
BIG-O ANALYSIS FOR SOME ALGORITHMS WE’VE SEEN
• Finding an element in an ArrayList: O(______)
• Adding an element to the head of a LList: O(______)
• Finding the size of a LList:
  • If we store and update a size attribute: O(______)
  • If we have to calculate it at run-time: O(______)
• Bubble-Sort: O(______)
PRELUDE: HASH CODES
• A mapping: hash(object) => integer
• Requirements:
  • Must be deterministic (same input = same output)
• Goals (not strictly necessary):
  • Limited clustering
  • Limited collisions
  • Works with any data type
• In this lab, we’ll be lazy
  • Every object in Java (everything but primitives) has a .hashCode() method
  • Meets: deterministic, and works with any data type
  • Does a very poor job of limiting clustering and collisions
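A quick way to see these properties with Java's built-in hash codes is sketched below. HashCodeDemo and sameHash are illustrative names only. The strings "Aa" and "BB" are a well-known colliding pair for String.hashCode().

```java
// Hypothetical demo of Java's built-in hash codes.
class HashCodeDemo {
    // True when two objects produce the same hash code.
    static boolean sameHash(Object a, Object b) {
        return a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // Deterministic: equal strings always produce the same hash code.
        System.out.println(sameHash("bob", "bob")); // true

        // Primitives have no methods; hash them through their wrapper class.
        // An Integer hashes to its own value, so nearby ints land in nearby slots (clustering).
        System.out.println(Integer.hashCode(42)); // 42

        // Collision: two different strings with the same hash code.
        System.out.println("Aa".hashCode() + " " + "BB".hashCode()); // 2112 2112
    }
}
```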
A TASTE OF “REAL” HASH CODES
• (not part of this lab, but maybe the quiz…)
• MD5 algorithm (https://en.wikipedia.org/wiki/MD5)
  • Originally for cryptography; it’s broken for that purpose
  • Still used to determine integrity of a downloaded file.
• CRC32 algorithm
  • Used in MPEG-2 (as an error check)
  • Used in “string-tables” in games
• …
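Both algorithms ship with the Java standard library, so they are easy to try. This is a sketch only: ChecksumDemo, crc32 and md5Hex are made-up names, while java.util.zip.CRC32 and java.security.MessageDigest are the real library classes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.CRC32;

// Hypothetical helper class wrapping the standard-library implementations.
class ChecksumDemo {
    // CRC32 of the UTF-8 bytes of the text, as an unsigned 32-bit value in a long.
    static long crc32(String text) {
        CRC32 crc = new CRC32();
        crc.update(text.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // MD5 digest of the UTF-8 bytes of the text, as 32 lowercase hex digits.
    static String md5Hex(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on standard Java platforms.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "123456789" is the conventional check input for CRC algorithms.
        System.out.println(Long.toHexString(crc32("123456789"))); // cbf43926
        System.out.println(md5Hex("")); // d41d8cd98f00b204e9800998ecf8427e
    }
}
```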
HASH MAPS
• Like python dictionaries
• Can have an arbitrary type for both key and value
  • Note: This will require a double-generic in Java.

    d = {"bob": 5, "jane": 3}
    print(d["bob"])      # 5
    d["sue"] = 9         # New item
    d["bob"] = 55        # Item-replacement
    e = {}
    e[(4, 3)] = pygame.image.load("beep.bmp")
    e[(2, 9)] = pygame.image.load("potato.jpg")
    screen.blit(e[(4, 3)], …)

• Also known as:
  • associative arrays
  • maps (here and in C++)
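For comparison, the first dictionary above looks roughly like this in Java. The sketch uses the library's java.util.HashMap as a stand-in for the class built in the lab. DictionaryDemo and build are illustrative names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo: the Python dictionary example, using the library map.
class DictionaryDemo {
    static Map<String, Integer> build() {
        // The "double generic": one type parameter for keys, one for values.
        Map<String, Integer> d = new HashMap<>();
        d.put("bob", 5);
        d.put("jane", 3);
        d.put("sue", 9);   // new item
        d.put("bob", 55);  // item replacement: same key, new value
        return d;
    }

    public static void main(String[] args) {
        Map<String, Integer> d = build();
        System.out.println(d.get("bob")); // 55
        System.out.println(d.size());     // 3
    }
}
```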
INITIAL PLANNING
• Discuss double-generics
  • <K, V>: K = key-type, V = value-type
• It might make sense to create a protected Pair class, similar to:

    public class HashMap<K, V> {
        // One key/value entry in the table.
        protected class Pair {
            protected K mKey;
            protected V mValue;
        }
    }
HASH CODE => MAPPING
• Upon creation, we allocate a table with many slots
  • How many?
  • Load Factor: slots used / slots available
  • IMO ~70% is the maximum you want; beyond that we need to expand.
• These “slots” are the hash table
  • We need a table that allows fast random access and that doesn’t grow automatically.
    • normal arrays
    • java.util.Vector (a little like ArrayList; note that it does grow if you add() past its capacity, so stick to get/set on a fixed size)
• To map an item…
  • index = hashCode(item) % tableSize
• Used when:
  • adding an item
  • checking to see if an item is stored in the array
  • removing an item
• See the catch?
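One practical detail when writing the mapping in Java: hashCode() can be negative, and Java's % keeps the sign of the left operand. A minimal sketch that sidesteps this with Math.floorMod is below. SlotMath, indexFor and loadFactor are names invented for this example.

```java
// Hypothetical helpers for the index and load-factor arithmetic.
class SlotMath {
    // Maps any key to a slot in [0, tableSize), even when hashCode() is negative.
    static int indexFor(Object key, int tableSize) {
        return Math.floorMod(key.hashCode(), tableSize);
    }

    // Fraction of slots in use.
    static double loadFactor(int slotsUsed, int tableSize) {
        return (double) slotsUsed / tableSize;
    }

    public static void main(String[] args) {
        Integer key = -7;                          // Integer hashes to its own value
        System.out.println(key.hashCode() % 10);   // -7: not a valid array index
        System.out.println(indexFor(key, 10));     // 3
        System.out.println(loadFactor(7, 10) > 0.7); // false: exactly at the ~70% limit
    }
}
```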
THE CATCH
• If you had a perfect hash function, handling collisions wouldn’t be necessary…
• …but there is no perfect hash function
• Collisions: two different keys map to the same index
• How do we handle them?
  • Many ways…we’ll explore two
    • Circular linear probing
    • Buckets
LINEAR CIRCULAR PROBING VS. BUCKETS
• Internal representation
  • Probing

    public class HashMap<K, V> {
        // Pair class here, as discussed.
        protected Pair[] mTable;
        //…
    }

  • Buckets

    public class HashMap<K, V> {
        // Pair class here, as discussed.
        protected LinkedList<Pair>[] mTable;
        //…
    }
WHEN A COLLISION HAPPENS…
• In probing
  • Look at successively higher indices, wrapping around.
  • If we find a null-spot, the item isn’t there.
• In buckets
  • Just iterate through the linked list.
• So which is better?
  • Probing uses a little less memory
  • The iterating logic is simpler with Probing
  • The collision logic is a little harder with Probing.
  • Remove method is a bit harder with Probing.
  • As long as we don’t have a ton of collisions, big-O time is the same.
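The probing lookup described above can be sketched as follows. To stay short it stores only String keys in a plain array. ProbingLookup and find are hypothetical names, not the lab's interface.

```java
// Hypothetical sketch of a circular linear-probing lookup over a key-only table.
class ProbingLookup {
    // Returns the index holding key, or -1 once a null slot shows it is absent.
    static int find(String[] table, String key) {
        int start = Math.floorMod(key.hashCode(), table.length);
        for (int step = 0; step < table.length; step++) {
            int i = (start + step) % table.length; // wrap around past the last slot
            if (table[i] == null) return -1;       // null spot: the key is not stored
            if (table[i].equals(key)) return i;
        }
        return -1; // scanned a completely full table without finding the key
    }

    public static void main(String[] args) {
        // "Aa" and "BB" share hash code 2112, so both want slot 2 of a 5-slot table.
        String[] table = new String[5];
        table[2] = "Aa";
        table[3] = "BB"; // pushed one slot along by probing
        System.out.println(find(table, "BB")); // 3
        System.out.println(find(table, "Cc")); // -1
    }
}
```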
ADD / FIND METHODS
• Super-simple.
• Hash to an index.
• If the key is already there (in bucket or by probing),
  • Return the value (get)
  • Replace the value (add)
• If not,
  • Return null / error (get)
  • Add a new key and value (add)
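A minimal sketch of add and get for the bucket representation, assuming a Pair class like the one from the planning slide. BucketMap is an illustrative name, and get returns null for a missing key (one of the two options listed above).

```java
import java.util.LinkedList;

// Hypothetical bucket-based map; not the lab's required interface.
class BucketMap<K, V> {
    private class Pair {
        K mKey;
        V mValue;
        Pair(K key, V value) { mKey = key; mValue = value; }
    }

    private final LinkedList<Pair>[] mTable;

    @SuppressWarnings("unchecked")
    BucketMap(int capacity) {
        // Java cannot create a generic array directly, hence the raw array and cast.
        mTable = (LinkedList<Pair>[]) new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) mTable[i] = new LinkedList<>();
    }

    // Hash to an index and return the bucket stored there.
    private LinkedList<Pair> bucketFor(K key) {
        return mTable[Math.floorMod(key.hashCode(), mTable.length)];
    }

    // Replaces the value if the key is already there; otherwise adds a new pair.
    void add(K key, V value) {
        LinkedList<Pair> bucket = bucketFor(key);
        for (Pair p : bucket) {
            if (p.mKey.equals(key)) { p.mValue = value; return; }
        }
        bucket.add(new Pair(key, value));
    }

    // Returns the stored value, or null if the key is absent.
    V get(K key) {
        for (Pair p : bucketFor(key)) {
            if (p.mKey.equals(key)) return p.mValue;
        }
        return null;
    }
}
```

Both methods share the same search loop; only the action taken on a hit or a miss differs.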
REMOVE METHOD
• Simple for buckets.
• Just a *little* harder for probing.
• Example:
  • Items “Bob”, “Sue”, and “Jane” all hashed to index 15
  • We ended up placing “Sue” and “Jane” at index 16, 17.
  • The user removes “Bob”, we have a null there now.
  • The user searches for “Sue”: the hash still takes us to 15.
  • Nothing there, so we wrongly report “Sue” isn’t in the table.
• Fix:
  • After removing an item, remove and re-add all items in the slots that follow it (until we find a null)
  • Seems a little wasteful, but if our hash function is good and the table isn’t too full, we should only have to look at a handful.
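The fix can be sketched on a key-only probing table. ProbingSet is a made-up name, and the sketch assumes the table is never completely full (which the load factor is meant to guarantee). In a 7-slot table the keys "a", "h" and "o" all hash to index 6, so they play the roles of Bob, Sue and Jane.

```java
// Hypothetical probing table holding only String keys, to show removal.
class ProbingSet {
    private final String[] mTable;

    ProbingSet(int capacity) { mTable = new String[capacity]; }

    // The slot the hash code points at before any probing.
    private int home(String key) {
        return Math.floorMod(key.hashCode(), mTable.length);
    }

    // Assumes at least one slot is always empty.
    void add(String key) {
        int i = home(key);
        while (mTable[i] != null && !mTable[i].equals(key)) i = (i + 1) % mTable.length;
        mTable[i] = key;
    }

    boolean contains(String key) {
        int i = home(key);
        while (mTable[i] != null) {
            if (mTable[i].equals(key)) return true;
            i = (i + 1) % mTable.length;
        }
        return false;
    }

    void remove(String key) {
        int i = home(key);
        while (mTable[i] != null && !mTable[i].equals(key)) i = (i + 1) % mTable.length;
        if (mTable[i] == null) return; // key was not in the table
        mTable[i] = null;              // this is the hole that would break later searches

        // Remove and re-add every item in the run that follows, up to the next null.
        i = (i + 1) % mTable.length;
        while (mTable[i] != null) {
            String moved = mTable[i];
            mTable[i] = null;
            add(moved);
            i = (i + 1) % mTable.length;
        }
    }
}
```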
GROWING
• If a HashTable exceeds its load factor, we should grow.
  • Otherwise, we’ll get more collisions
• It is *very* important that you manually do this.
  • If the internal table grows automatically, bad things will happen
  • [Do you see what?]
• After the table does grow, you *must* re-hash *everything*.
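A sketch of why the re-hash is needed, again on a key-only probing table. GrowDemo, insert and grow are illustrative names, and doubling the size is an assumption made for this example. The index depends on the table size, so a key's slot can change when the table grows.

```java
// Hypothetical sketch of growing a probing table and re-hashing its keys.
class GrowDemo {
    static int home(String key, int tableSize) {
        return Math.floorMod(key.hashCode(), tableSize);
    }

    // Assumes the key is not already stored and the table has a free slot.
    static void insert(String[] table, String key) {
        int i = home(key, table.length);
        while (table[i] != null) i = (i + 1) % table.length;
        table[i] = key;
    }

    // Allocates a table twice as large and re-hashes every stored key into it.
    static String[] grow(String[] old) {
        String[] bigger = new String[old.length * 2];
        for (String key : old) {
            if (key != null) insert(bigger, key);
        }
        return bigger;
    }

    public static void main(String[] args) {
        String[] table = new String[4];
        insert(table, "a"); // hash 97: slot 1 of 4
        insert(table, "e"); // hash 101: also slot 1 of 4, probed to slot 2
        String[] grown = grow(table);
        System.out.println(home("e", grown.length)); // 5: "e" has a new home slot
        System.out.println(grown[5]);                // e
    }
}
```

Copying slot-for-slot would have left "e" at index 2, where a lookup starting at its new home slot 5 would never find it.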
BIG-O OPERATIONS OF HASH-TABLE
• Has these operations and big-O running time (as long as we don’t have a ton of collisions):
  • add: O(1)
  • find: O(1)
  • remove: O(1)
  • iteration: O(m)
  • m is the capacity of the hash
  • n is the number of things in the hash. n << m
• So is it the data structure to rule all other data structures?
  • …No
  • It is very useful (e.g. graphs)
  • Trades large amounts of memory for speed increase
• Side-note: Lua (the programming language) uses associative arrays as the only data structure (no lists, classes, etc.)
STARTING LAB 6
• [Write an interface class together]
• You implement this as a Bucket or LList version (or both)
RELATED D.S.: SETS
• Goal: an unordered set, no duplicates
• Already done!
  • HashMap<String, Boolean> my_set;
  • Could use anything for second type; we don’t care!
  • my_set.put("Bob", true);
• We could probably wrap this:
  • Using “has-a” or “is-a”
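A “has-a” wrapper might look like the sketch below. SimpleSet is a hypothetical name, and the library's java.util.HashMap stands in for the lab's own class.

```java
import java.util.HashMap;

// Hypothetical set built by wrapping a map ("has-a"); the Boolean values are never read.
class SimpleSet<T> {
    private final HashMap<T, Boolean> mItems = new HashMap<>();

    // Adding an item twice just overwrites the same key, so there are no duplicates.
    void add(T item) { mItems.put(item, true); }

    boolean contains(T item) { return mItems.containsKey(item); }

    void remove(T item) { mItems.remove(item); }

    int size() { return mItems.size(); }
}
```

Wrapping keeps map-only operations such as put(key, value) out of the set's public interface, which an “is-a” subclass would expose.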