By the end of this lesson, you should know...
 How do you select a good hash algorithm?
 How do you override the hashCode() method for a class?
 What is the efficiency of the Hash Table operations?
 How does the size of the Hash Table affect its efficiency?
 When is it necessary to resize a hash table and how would you write a simple resizing method?
1. Improving Our Hash Algorithm
 The previous hash algorithm, which mapped a key to an index according to the length of the key, was not a good hashing algorithm.
 What was wrong with this algorithm?
 Too many collisions: we were unable to get an even distribution of values across the table
 Clustering of values around buckets 2 through 10
 As a related issue, some indices were unusable
 index 0 can never hold a value as there are no names of length 0
 Both of these problems contribute to a loss of efficiency.
 Perhaps we can find a better hash algorithm...
 Let's try the following:
 Sum the Unicode values of each character in the key
 Scale the result by applying % table size.
private int hash(String key, int TABLE_SIZE) {
    int sum = 0;
    for (int i = 0; i < key.length(); i++) {
        sum += (int) key.charAt(i); // summing the Unicode values for each char
    }
    return sum % TABLE_SIZE; // dividing the summed Unicode values by table size
                             // and returning the remainder as the index
}
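 As a quick sanity check (a hypothetical usage; it assumes the method above is callable from test code), hashing two names with a table size of 37 gives:

hash("Mark", 37); // returns 25
hash("Matt", 37); // returns 36

 These are the same values used in the "Mark"/"Matt" example in section 3 below.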
2. Overriding hashCode() for a Class
 Our new hash algorithm will call the hashCode() method of the element's class, and then take the returned integer value modulo the table size:
private int hash(T t) {
    int code = t.hashCode();
    return code % table.size(); // scale the hash code to a valid index
}
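 One caveat worth noting (an addition, not from the slides): in Java, hashCode() may return a negative value, and a negative code % table.size() is itself negative. A defensive variant, sketched below with the same assumed table field, folds the result back into the valid range:

private int hash(T t) {
    int code = t.hashCode();
    int k = table.size();
    return ((code % k) + k) % k; // always in the range 0 .. k-1, even when
                                 // hashCode() returns a negative value
}

 The Unicode-summing hashCode() methods below never return a negative value, so the simpler version works for them.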
 Then, for each class interacting with the hash table class, we will need to ensure that it overrides hashCode(). For example, given a Book class, we might override hashCode like this:
public class Book {
    private String title, author; // fields that form this class's key
    @Override
    public int hashCode() {
        String key = title + author; // define the key for this class
        int sum = 0;
        for (int i = 0; i < key.length(); i++) {
            sum += (int) key.charAt(i);
        }
        return sum;
    }
}
 Or, given a class Butterfly, we might override hashCode as follows:
public class Butterfly {
    private String genus, species; // fields that form this class's key
    @Override
    public int hashCode() {
        String key = genus + species; // define the key for this class
        int sum = 0;
        for (int i = 0; i < key.length(); i++) {
            sum += (int) key.charAt(i);
        }
        return sum;
    }
}
 Or, given a class Student, we might override hashCode as follows:
public class Student {
    private int studentID; // field that forms this class's key
    @Override
    public int hashCode() {
        String key = "" + studentID; // define the key for this class
        int sum = 0;
        for (int i = 0; i < key.length(); i++) {
            sum += (int) key.charAt(i);
        }
        return sum;
    }
}
 Note that we can use the same algorithm, which sums Unicode values, and simply place it inside the hashCode() method of each class we write.
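 To see how the pieces fit together, here is a hypothetical usage (the Book constructor and the TABLE_SIZE value are assumptions for illustration):

Book b = new Book("The Hobbit", "J.R.R. Tolkien"); // assumed constructor
int index = b.hashCode() % TABLE_SIZE; // the bucket where b would be stored

 The hash table class never needs to know how each class defines its key; it simply calls hashCode() and scales the result.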
3. Guidelines for Choosing a Good Hash Algorithm
 But, how do we know that our new algorithm is a good one?
 Here are some guidelines for writing good hashing methods:
 Use all of the information provided by a key.
 Our algorithm uses all characters in the name to create the key.
 Make sure the elements to insert can be spread evenly across the entire table.
 By applying the % operator, we make sure our summed Unicode values are better distributed throughout the table.
 Verify that your algorithm can map similar keys to different buckets.
 With the new algorithm, will values like "Mark" and "Matt" be placed in the same bucket?
 Applying the hashing algorithm with a table size of 37:
 Mark: 77 + 97 + 114 + 107 = 395, and 395 % 37 = 25, so "Mark" maps to index 25
 Matt: 77 + 97 + 116 + 116 = 406, and 406 % 37 = 36, so "Matt" maps to index 36
 Make sure it uses only simple operations (such as the basic arithmetic operators) to minimize runtime.
 When you write a hash table, you will call your hashing method often (insert, search, delete).
 Make certain your hashing method is not computationally intensive.
 While the hashing method is an important aspect in helping to increase the effectiveness of a hash table, other aspects also need to be considered.
4. Big-O Runtime of a Hash Table
 For hash table operations such as search, insert, and remove, we get the following Big-O runtimes:
 Best case: O(1)
 The element is alone at a collision-free index.
 Worst case: O(n)
 Every element hashed to the same index, so the operation degenerates into a linear search, which is O(n).
 This holds for both linear probing and separate chaining.
[Figure: worst case for separate chaining, with every element chained in a single bucket]
 Average case: O(1)? Yes!
 The average-case runtime of hash table operations takes more care to pin down. Assuming a perfect hash method, these operations are always O(1).
 However, without a perfect hash method, average runtime will be dependent upon how big the table is and how many values it is currently storing (i.e. on how full the table is).
 As more and more values are inserted into the table, the likelihood of collision increases, resulting in longer chains.
 The speed of hash table operations is therefore dependent on something called the load factor, defined as n/k, where n is the number of keys and k is the table size (number of buckets).
load factor = n / k
where,
n = the number of keys (elements)
k = the table size (number of buckets)
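 For example, a table with k = 100 buckets holding n = 50 keys has a load factor of 50/100 = 0.5; after 100 more insertions the load factor would be 150/100 = 1.5, and some buckets would necessarily hold multiple keys.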
 The larger the value for the load factor, the slower the hash table operations.
 Therefore, average case for a hash table is sometimes considered to be O(n/k).
 You might be thinking: But... O(n/k) = O(n), as we do not include lower-order terms and constant coefficients in our Big-O runtimes.
 So, is a hash table about as efficient as a linked list then?
 In practice, O(n/k) is still a practical improvement over O(n).
 Moreover, if your table size is large enough, average runtime approaches constant time.
 Rule of thumb: let your table size be twice to three times the number of anticipated keys.
 If I set k = 2n (table size is twice the number of keys)
 O(n/2n) = O(1/2) = O(1)
5. Guidelines for Selecting a Table Size
 In order to effectively minimize collisions (thereby maintaining efficiency), a good rule of thumb is to make your table size twice the number of entries you plan to insert.
 However, more often than not, the number of entries to be inserted will not be known in advance.
 Therefore, it may be necessary to resize your hash table:
 Remember the load factor: n / k (number_of_keys divided by table_size)
 As the load factor increases, collisions will become more common, resulting in decreased efficiency.
 How can we decrease the load factor? We increase the table size!
 When should we resize the table?
 Resizing also requires computation time, so only resize when necessary to maintain lookup efficiency.
 Linear probing: array indices are used up quickly. Resize when the load factor reaches 1/2.
 Separate chaining: opinions vary; common thresholds fall between 2/3 and 3/4.
 Simplest resizing method (a sketch follows this list):
 Create new hash table with an array sized twice as large as the current hash table.
 Linearly traverse the old hash table.
 Copy each element and insert into new hash table.
 Elements will be rehashed when inserted into new hash table.
 Delete old hash table.
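 Here is a minimal sketch of that method, assuming a separate-chaining table stored in a field named table (a java.util.ArrayList of java.util.LinkedList buckets) and an existing insert method that applies the hash function; none of these names come from the slides:

private void resize() {
    ArrayList<LinkedList<T>> oldTable = table;
    // 1. Create a new table with an array twice as large
    table = new ArrayList<>();
    for (int i = 0; i < oldTable.size() * 2; i++) {
        table.add(new LinkedList<>());
    }
    // 2-4. Linearly traverse the old table and re-insert each element;
    //      insert() calls hash(), so every element is rehashed for the new size
    for (LinkedList<T> bucket : oldTable) {
        for (T element : bucket) {
            insert(element);
        }
    }
    // 5. The old table is no longer referenced, so Java's garbage collector
    //    reclaims it; no explicit delete is needed
}

 A natural place to call resize() is at the top of insert(), whenever the load factor reaches the chosen threshold.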
Wrap Up
 Answer the review questions on Canvas for this lesson
 Nice 7 minute summary of Hash Tables: Harvard CS 50 Video
Upcoming Assignments
 Lab due Monday
 Study for your Quiz
~Enjoy Your Day!~
