Compsci 201 Hashing Jeff Forbes February 7 2018
Compsci 201, Spring 2018, Hashing (2/7/18)
G is for …
• Garbage Collection
  • Nice to call new and not call delete!
• Git
  • Version control that's so au courant
• GPL
  • First open source license
• Google
  • How to find Stack Overflow
Policy Reminder
• Discussion and classwork
  • Generous allowance for missed work
  • Don’t STINF
• Assignments
  • Late penalty
  • Submit extension form for excused absence
  • Not accepted after 1 week
• APTs
  • Do extra and you can do fewer in the future
Hashing: Log(10^100) is big
• Comparison-based searches are too slow for lots of data
  • How many comparisons are needed for a billion elements?
  • What if one billion web pages are indexed?
• Hashing is a search method: average case O(1) search
  • Worst case is very bad, but in practice hashing is good
• Associate a number with every key, use the number to store the key
  • Like a catalog in a library: given the book title, find the book
• A hash function generates the number from the key
  • Goal: Efficient to calculate
  • Goal: Distributes keys evenly in the hash table
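To put a number on "how many comparisons for a billion elements", here is a minimal sketch that counts how many times a sorted range can be halved before one element is left, which is roughly the worst case for binary search. The class and method names are made up for this illustration.

```java
class ComparisonCount {
    // Number of halvings needed to shrink a range of n elements down to 1.
    // This approximates the worst-case comparisons made by binary search.
    static int halvings(long n) {
        int steps = 0;
        while (n > 1) {
            n = (n + 1) / 2; // keep the larger half, i.e. ceil(n / 2)
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println("1 billion elements: " + halvings(1_000_000_000L) + " halvings");
    }
}
```

A few dozen comparisons per lookup is small, but it still grows with the data, while a hash lookup aims for a constant number of steps on average.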
Hashing
• Hash table
  • array of fixed size
  • with a key to each location
  • each key is mapped to an index in the table
Table (indices 0-9): 1 → joe (31), 3 → mary (43), 4 → sam (14); all other slots empty
Hashing
• hashCode
  • Every object has a hashCode
  • integer value
• In our made-up example
  • Object: joe
  • hashCode: 31
• Could two different objects have the same hashCode?
Table (indices 0-9): 1 → joe (31), 3 → mary (43), 4 → sam (14); all other slots empty
Hashing
• Hash function
  • simple to compute
  • example: hashCode % (mod) 10
• Use the hash function to calculate a key's index in the hash table
• Add key ali with hashCode 73
  • What happens?
Table (indices 0-9): 1 → joe (31), 3 → mary (43), 4 → sam (14); all other slots empty
Collision: ali (73) maps to index 3, which mary (43) already occupies
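The mod-10 example can be sketched in a few lines. The hash codes are the made-up values from the slide, and `ModIndexDemo` and `bucketIndex` are names invented for this sketch. `Math.floorMod` is used instead of `%` so that a negative hashCode still gives a valid index.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ModIndexDemo {
    // Map a hash code to a slot in a table of the given size.
    static int bucketIndex(int hashCode, int tableSize) {
        return Math.floorMod(hashCode, tableSize);
    }

    public static void main(String[] args) {
        // Made-up hash codes from the slide, not real String hash codes.
        Map<String, Integer> madeUpCodes = new LinkedHashMap<>();
        madeUpCodes.put("joe", 31);
        madeUpCodes.put("mary", 43);
        madeUpCodes.put("sam", 14);
        madeUpCodes.put("ali", 73);

        String[] table = new String[10];
        for (Map.Entry<String, Integer> entry : madeUpCodes.entrySet()) {
            int index = bucketIndex(entry.getValue(), table.length);
            if (table[index] == null) {
                table[index] = entry.getKey();
                System.out.println(entry.getKey() + " stored at index " + index);
            } else {
                System.out.println(entry.getKey() + " collides with "
                        + table[index] + " at index " + index);
            }
        }
    }
}
```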
Hashing details
• There will be collisions, two keys will hash to the same value
  • We must handle collisions, still have efficient search
  • What about birthday “paradox”: using birthday as hash function, will there be collisions in a room of 25 people?
• Several ways to handle collisions, in general array/vector used
• Linear probing, look in next spot if not found
  • Hash to index hashCode(key) = h, try h+1, h+2, …, wrap at end
  • Clustering problems, deletion problems, growing problems
• Quadratic probing
  • Hash to index h, try h+1^2, h+2^2, h+3^2, …, wrap at end
  • Fewer clustering problems
• Double hashing
  • Hash to index h, with another hash function to j
  • Try h, h+j, h+2j, …
[Figure: array with slots 0, 1, 2, 3, …, n-1]
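Linear probing as described above can be sketched as a small fixed-size set of strings. This is an illustration under simplifying assumptions, not a production table: `LinearProbingSet` is a made-up name, there is no deletion and no growing (two of the problems the slide lists), and a full table simply throws.

```java
class LinearProbingSet {
    private final String[] slots;
    private int size;

    LinearProbingSet(int capacity) {
        slots = new String[capacity];
    }

    // Home index h for a key; floorMod keeps it non-negative.
    private int home(String key) {
        return Math.floorMod(key.hashCode(), slots.length);
    }

    // Try h, h+1, h+2, ... wrapping at the end, until an empty slot is found.
    boolean add(String key) {
        int h = home(key);
        for (int i = 0; i < slots.length; i++) {
            int index = (h + i) % slots.length;
            if (slots[index] == null) {
                slots[index] = key;
                size++;
                return true;
            }
            if (slots[index].equals(key)) {
                return false; // already present
            }
        }
        throw new IllegalStateException("table is full");
    }

    // Follow the same probe sequence; an empty slot means the key is absent.
    boolean contains(String key) {
        int h = home(key);
        for (int i = 0; i < slots.length; i++) {
            int index = (h + i) % slots.length;
            if (slots[index] == null) {
                return false;
            }
            if (slots[index].equals(key)) {
                return true;
            }
        }
        return false;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        LinearProbingSet set = new LinearProbingSet(11);
        set.add("ayay");
        set.add("bZbZ"); // same hashCode as "ayay", so it lands in the next slot
        System.out.println(set.contains("bZbZ") + " " + set.size());
    }
}
```

Quadratic probing and double hashing change only the probe step: `h + i*i` or `h + i*j` in place of `h + i`.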
Chaining with hashing
• With n buckets each bucket stores a structure (e.g., List)
  • Compute hash value h, look up key in table[h]
• How to store?
  • Low-level linked lists (before Java 8)
  • Linked lists that convert to low-level balanced search trees when a bucket grows long (Java 8+)
• Hopefully linked data structures are short, searching is fast
  • Unsuccessful searches often faster than successful
  • Empty linked lists searched more quickly than non-empty
  • Potential problems?
• Hash table details
  • Size of hash table should be a prime number
  • Keep load factor small: # keys/size of table
  • On average, with reasonable load factor, search is O(1)
  • What if load factor gets too high? Rehash or other method
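A minimal sketch of chaining with rehashing, assuming a maximum load factor of 0.75 and growth to 2n+1 buckets. Both choices are assumptions made for this example, and 2n+1 is not guaranteed to be prime. `ChainedHashSet` is a made-up class, not how `java.util.HashSet` is implemented.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedHashSet {
    private static final double MAX_LOAD = 0.75; // assumed threshold
    private List<LinkedList<String>> table;
    private int size;

    ChainedHashSet(int buckets) {
        table = newTable(buckets);
    }

    private static List<LinkedList<String>> newTable(int buckets) {
        List<LinkedList<String>> fresh = new ArrayList<>(buckets);
        for (int i = 0; i < buckets; i++) {
            fresh.add(new LinkedList<>());
        }
        return fresh;
    }

    // Compute hash value h and return the chain stored at table[h].
    private LinkedList<String> bucketFor(String key) {
        return table.get(Math.floorMod(key.hashCode(), table.size()));
    }

    boolean contains(String key) {
        return bucketFor(key).contains(key);
    }

    boolean add(String key) {
        if (contains(key)) {
            return false;
        }
        bucketFor(key).add(key);
        size++;
        if ((double) size / table.size() > MAX_LOAD) {
            rehash();
        }
        return true;
    }

    // Load factor too high: move every key into a larger table.
    private void rehash() {
        List<LinkedList<String>> old = table;
        table = newTable(old.size() * 2 + 1);
        for (LinkedList<String> chain : old) {
            for (String key : chain) {
                bucketFor(key).add(key);
            }
        }
    }

    int size() {
        return size;
    }

    int bucketCount() {
        return table.size();
    }

    public static void main(String[] args) {
        ChainedHashSet set = new ChainedHashSet(3);
        for (int i = 0; i < 10; i++) {
            set.add("key" + i);
        }
        System.out.println(set.size() + " keys in " + set.bucketCount() + " buckets");
    }
}
```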
Hashing
• Two equal objects should hash to the same place (have the same hash code and key)
Table (indices 0-9): 1 → joe (31), 3 → mary (43), 4 → sam (14), 6 → jill (26), 8 → sarah (58); all other slots empty
Lookup: mary (43) → index 3
String and Object.hashCode
• Every object has a .hashCode method
  • Default version? Why does it work for Objects?
  • http://bit.ly/201fall17-object
• If x.equals(y), then x.hashCode() == y.hashCode()
• Why do most classes override both .equals and .hashCode?
  • Correctness and Performance
http://bit.ly/javastring
• What are instance variables? Initialized?
• Index used so hash of “cat” != “act”
    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;
            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }
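The loop above can be tried outside of String. This sketch re-implements the same polynomial without the cached `hash` field and compares it with the library's result. `PolyHash` and `polyHash` are names made up for the example.

```java
class PolyHash {
    // Same recurrence as the String.hashCode loop: h = 31 * h + next char.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // Same letters, different positions, different hash values.
        System.out.println("cat: " + polyHash("cat"));
        System.out.println("act: " + polyHash("act"));
        System.out.println("matches String.hashCode: "
                + (polyHash("cat") == "cat".hashCode()));
    }
}
```

Because each character is multiplied by a power of 31 that depends on its index, rearranging the letters usually changes the result.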
WordGram equals
• WordGram equals method
  • What should it do? When should it return true or false?
  • What to do next?
    public boolean equals(Object other) {
        if (this == other)                  // point to the same Object
            return true;
        if (other == null ||                // Nothing is equal to null
            !(other instanceof WordGram))   // Different kinds of objects
            return false;
        WordGram wg = (WordGram) other;
        // Check if all the words are equal
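One possible way to finish the "check if all the words are equal" step is sketched below on a stand-in class. `WordGramSketch` and its field are assumptions for illustration, not the assignment's WordGram, and `Arrays.equals` and `Arrays.hashCode` are only one reasonable choice.

```java
import java.util.Arrays;

class WordGramSketch {
    private final String[] myWords;

    // Copy `size` words starting at `index`, mirroring the slide's constructor signature.
    WordGramSketch(String[] words, int index, int size) {
        myWords = Arrays.copyOfRange(words, index, index + size);
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true; // same object
        }
        if (!(other instanceof WordGramSketch)) {
            return false; // instanceof is false for null, so this covers null too
        }
        WordGramSketch wg = (WordGramSketch) other;
        // Equal only if lengths match and every word matches in order.
        return Arrays.equals(myWords, wg.myWords);
    }

    @Override
    public int hashCode() {
        // Equal word arrays give equal hash codes, as the contract requires.
        return Arrays.hashCode(myWords);
    }

    public static void main(String[] args) {
        String[] words = {"a", "b", "c", "a", "b"};
        WordGramSketch first = new WordGramSketch(words, 0, 2);
        WordGramSketch second = new WordGramSketch(words, 3, 2);
        System.out.println(first.equals(second));
    }
}
```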
WordGram hashCode
• The given hashCode works?
• Why store myHash?
• EfficientWordMarkov performs very poorly. Why?
    public WordGram(String[] words, int index, int size) {
        // complete this constructor
        myHash = 17;
    }
    public int hashCode() {
        // TODO return a better hash value
        return myHash;
    }
WordGram hashCode v2
• Compute hash by adding the individual values of the Strings in myWords
• Why not ideal?
  • Consider hashCode values for the 4-gram "jump big dog jump" and "jump big dog"
    public WordGram(String[] words, int index, int size) {
        // ... Code omitted to set up myWords
        myHash = 0;
        for (String word : myWords)
            myHash += word.hashCode();
    }
http://bit.ly/201-s18-0207-1
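A small experiment shows one weakness of adding word hash codes: the sum ignores word order. The sketch below compares the sum with a position-weighted variant that borrows the multiply-by-31 idea from String.hashCode. Both helper methods and the class name are made up for this illustration; the weighted version is one option, not the required answer.

```java
class OrderSensitiveHash {
    // v2 from the slide: add up the hash codes of the words.
    static int sumHash(String[] words) {
        int h = 0;
        for (String word : words) {
            h += word.hashCode();
        }
        return h;
    }

    // Alternative sketch: weight by position so that order matters.
    static int weightedHash(String[] words) {
        int h = 0;
        for (String word : words) {
            h = 31 * h + word.hashCode();
        }
        return h;
    }

    public static void main(String[] args) {
        String[] first = {"big", "dog"};
        String[] second = {"dog", "big"};
        System.out.println("sum hashes equal:      " + (sumHash(first) == sumHash(second)));
        System.out.println("weighted hashes equal: " + (weightedHash(first) == weightedHash(second)));
    }
}
```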
http://bit.ly/javastring
• How do we tell if two strings are equal?
  • What about a String and an Object?
• Examine characters at index k in s1 and s2
  • If not equal, done, return false
  • How many chars to examine?
• How do we make code easier to read than what’s given in String.java?
Is this code the same?
• See while loop in http://bit.ly/javastring
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof String)) return false;
        String other = (String) o;   // cast needed: Object has no value field
        if (value.length != other.value.length) return false;
        for (int k = 0; k < value.length; k++) {
            if (value[k] != other.value[k]) return false;
        }
        return true;
    }
When Strings Collide
• Generate strings that will collide
• Find such strings in the wild
    String   hashCode     String          hashCode
    ayay     3009136      buzzards        -931102253
    aybZ     3009136      righto          -931102253
    bZay     3009136      snitz           109586548
    bZbZ     3009136      unprecludible   109586548
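The generated column of the table follows from one fact: "ay" and "bZ" have the same hashCode (97*31 + 121 = 98*31 + 90 = 3128), so any strings built from the same number of those two blocks collide. A minimal sketch, with a made-up class name:

```java
import java.util.ArrayList;
import java.util.List;

class CollidingStrings {
    // Build every string made of `blocks` pieces, each piece "ay" or "bZ".
    // All 2^blocks results share one hashCode.
    static List<String> generate(int blocks) {
        List<String> result = new ArrayList<>();
        result.add("");
        for (int i = 0; i < blocks; i++) {
            List<String> next = new ArrayList<>();
            for (String prefix : result) {
                next.add(prefix + "ay");
                next.add(prefix + "bZ");
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        for (String s : generate(2)) {
            System.out.println(s + " " + s.hashCode());
        }
    }
}
```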
WOTO http://bit.ly/201-s18-0207-2
Comparing and Sorting
• Arrays.sort, Collections.sort
  • {“ant”, “bat”, “cat”, “dog”}
  • What algorithm is used in sorting?
  • How to change to sort-in-reverse or other order
• Strings are Comparable
  • Lexicographic order
  • “zebra” > “aardvark” but “Zebra” < “aardvark”
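Changing the sort order comes down to passing a different Comparator. The sketch below sorts the same words three ways using standard library comparators; the `SortOrders` class and `sorted` helper are made up for the example.

```java
import java.util.Arrays;
import java.util.Comparator;

class SortOrders {
    // Return a sorted copy so the caller's array is left unchanged.
    static String[] sorted(String[] words, Comparator<String> order) {
        String[] copy = words.clone();
        Arrays.sort(copy, order);
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {"dog", "Zebra", "ant", "cat"};
        // Natural (lexicographic) order: uppercase letters sort before lowercase.
        System.out.println(Arrays.toString(sorted(words, Comparator.naturalOrder())));
        // Reverse of the natural order.
        System.out.println(Arrays.toString(sorted(words, Comparator.reverseOrder())));
        // Ignore case when comparing.
        System.out.println(Arrays.toString(sorted(words, String.CASE_INSENSITIVE_ORDER)));
    }
}
```

The uppercase 'Z' has a smaller character code than lowercase 'a', which is why “Zebra” sorts before “aardvark” in natural order.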
Not Everything is Comparable 9/20/17 Compsci 201, Fall 2017, Compare+Analysis 21
(x, y) < (z, w)
• Can we compare Point objects?
  • http://stackoverflow.com/questions/5178092/sorting-a-list-of-points-with-java
  • https://stackoverflow.com/questions/6886836/can-i-use-the-operator-to-compare-pointobjects-in-java
• To be “comparable”, implement the Comparable interface, supply .compareTo(..) method
compareTo
• takes another Object (of the same class) as an argument
• Returns
  • a negative value if the current object is less than the argument,
  • zero if the argument is equal, and
  • a positive value if the current object is greater than the argument
• x ≤ y is equivalent to x.compareTo(y) <= 0
Build on What You Know
• How does .equals work?
  • Make sure you have the correct type
  • Cast, compare
    public boolean equals(Object o) {
        if (o == null || !(o instanceof Point)) {
            return false;
        }
        Point p = (Point) o;
        return p.x == x && p.y == y;
    }
Extend what you know
• This is a method in the Point class
    Point implements Comparable<Point>
    public int compareTo(Point p) {
        if (this.x < p.x) return -1;
        if (this.x > p.x) return 1;
        if (this.y < p.y) return -1;
        if (this.y > p.y) return 1;
        return 0;
    }
• How do we extend to WordGram?
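One way the Point pattern might carry over to a sequence of words: compare the first pair of words that differ, and fall back to length when one sequence is a prefix of the other. `ComparableGram` is a hypothetical stand-in, not the course's WordGram, and equals/hashCode are left out to keep the sketch short.

```java
import java.util.Arrays;

class ComparableGram implements Comparable<ComparableGram> {
    private final String[] words;

    ComparableGram(String... words) {
        this.words = words.clone();
    }

    @Override
    public int compareTo(ComparableGram other) {
        int shared = Math.min(words.length, other.words.length);
        for (int k = 0; k < shared; k++) {
            // The first differing word decides, like x before y in Point.
            int diff = words[k].compareTo(other.words[k]);
            if (diff != 0) {
                return diff;
            }
        }
        // All shared words match: the shorter sequence comes first.
        return Integer.compare(words.length, other.words.length);
    }

    @Override
    public String toString() {
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        ComparableGram[] grams = {
            new ComparableGram("big", "dog"),
            new ComparableGram("big"),
            new ComparableGram("big", "cat")
        };
        Arrays.sort(grams); // uses compareTo
        System.out.println(Arrays.toString(grams));
    }
}
```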
What is a Java Interface?
• An enforceable abstraction: methods required
  • Set and Map interfaces
  • Comparable interface
• If Set<String> is parameter then can pass …
  • HashSet<String> or TreeSet<String>
• Can sort String or anything that’s Comparable
  • Call .compareTo(..) method
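A short sketch of the Set<String> point: a method written against the interface accepts either implementation. The class and method names are made up for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SetParameterDemo {
    // The parameter type is the interface, so any Set<String> can be passed in.
    static int countStartingWith(Set<String> words, char first) {
        int count = 0;
        for (String word : words) {
            if (!word.isEmpty() && word.charAt(0) == first) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("ant", "bat", "cat", "ape");
        Set<String> hashed = new HashSet<>(data);
        Set<String> sorted = new TreeSet<>(data);
        System.out.println(countStartingWith(hashed, 'a'));
        System.out.println(countStartingWith(sorted, 'a'));
    }
}
```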
What does Object-Oriented mean?
• Very common method of organizing code
  • Design classes, which encapsulate state and behavior
  • Some classes can be similar to, but different from their parent class: inheritance
    • Super class, subclass
  • Inherit behavior, use as is or modify and use or both
• Hard to design a hierarchy of classes, but important
  • More of this in Compsci 308 or on-the-job training
  • We solve simple problems, don't design re-usable libraries
  • Simple doesn't mean it's not hard/difficult
• OO in Markov?
Shafi Goldwasser
• 2012 Turing Award Winner
• RSA professor of computer science at MIT
• Twice Gödel Prize winner
• Grace Murray Hopper Award
• National Academy
• Co-inventor of zero-knowledge proof protocols
"Work on what you like, what feels right, I know of no other way to end up doing creative work"
Why use an interface?
Why use an Interface?
• Work with frameworks, e.g., java.util.Collection
  • Iterable, Serializable, and more: use with Java
• ArrayList, LinkedList, TreeSet, HashSet all …
  • .clear(), .contains(o), .addAll(..), .size(), … .toArray()
• https://docs.oracle.com/javase/9/docs/api/java/util/Collection.html