Welcome to Lesson 13!

Learning Objectives
By the end of this class, you should know...
• What is a Hash Table?
• What is a hashing algorithm?
• What are some solutions to the problem of collisions?
• What is the difference between linear probing and separate chaining?

1. Hash Table Pre-Assessment: Remember This?
• Unicode Review: Write a method which prints out the Unicode value of each character in a String:
public static void printUnicode(String str)
{
//write method body here
}

• A Hash Table is a structure where values are mapped to an array index according to the rules of an algorithm.
•  This algorithm is called the hashing algorithm.
•  The hashing algorithm takes an input (called a key) and then returns the index of an array.
•  This array index (also called a bucket) is the location in the array where the value can be stored.
•  The key can be the data itself or some portion of the data, as in the example below where the hash table is storing employee objects.
• As we are going to see, hash table operations, such as search, insert, and delete can be performed in just O(1) time!

2.1. A Hash Table Example

• Suppose that I wish to store the names of all the students in my class.
final int CLASS_SIZE = 42; //# students enrolled in the class
String[CLASS_SIZE] names = new String[];
• The underlying structure of a hash table is an array.
• However to insert elements into our hash table array, we utilize a hashing algorithm.
• In general, a hashing algorithm is used to produce a value, usually a number, that is consistent for a given input.
• For a hash table, the hashing algorithm is used to determine at which array index (or bucket) to place a value in the table.
• We will also need to use the hashing algorithm to search for a value (at which index was the value placed in the table) and to remove a value.
• Let's assume a simple hashing algorithm to determine at which array index to store each name on the roster.
• When searching for a student by name, I would call the hashing algorithm, passing the name as input.
• The hashing algorithm would return the index with which the name is associated.

• Assume the hashing algorithm is implemented as the following method:
private int hash(String student_name) //student_name is the key
{
int index = student_name.length();
return index;
}
...
//applying the hash algorithm to the first name in my CIS 22C roster
int index = hash("Anthony"); //Should return 7
names[index] = "Anthony"; //storing the value at index 7
• In the above example, the value "Anthony" is the key.
• The key is the value that is used by the hash method to determine the index of the name within the table.
• Now, when I add students to the roster, they will be stored at the index provided by the hash method:
names = "";
names = "";
names = "";
names = "Kun";
names = "Doma";
names = "Howon";
names = "Maryna";
names = "Anthony";
...
names = "";
• Later, when I search for a name in my table, I can simply call the hash method:
hash("Anthony") => index at which Anthony is stored if he is in the table

2.2. Collisions • As you probably noticed, there is a problem with the above example.
• How many people in this class have a first name that is 3 letters long? 4 letters? 5 letters? 6 letters? 7 letters? 8 letters?
• This problem is known as a collision.
• A collision occurs when more than one value is mapped to the same index in a hash table.
• In my above example, most names would be mapped to the indices 2 - 10.
• Trivia: Average name length is 5.9 letters.
• Collisions can be mitigated with better hashing algorithms.
• However, regardless of how good a hashing algorithm is, collisions will occur.
• perfect hash method (no collisions) can only be achieved if all values are known ahead of time; very unlikely.
• There are various approaches to resolving collisions. We will discuss two principle approaches below.

2.3. Linear Probing
• One simple approach to handling collisions within a hash table is called linear probing.
• In linear probing, if an element is hashed to an index that is already storing another value, the new element is inserted at the next open index.
• For example, imagine our names hash table contains the following values:
names = "";
names = "";
names = "";
names = "Kun";
names = "Doma";
names = "Howon";
names = "Maryna";
names = "Anthony";
names = "";
names = "";
...
names = "";
• Now, further suppose that I want to insert the name "Brandon" into my names table.
• Woops! There is a collision when I apply my hash algorithm.
• Index 7 is already occupied by "Anthony"
• I can use linear probing to resolve this collision.
• Using linear probing, I insert the name "Brandon" in the next available index after the hashed index of 7.
• Where would I place "Brandon" in the table?
• names = "Brandon";
• Now, my table looks like the following:
names = "";
names = "";
names = "";
names = "Kun";
names = "Doma";
names = "Howon";
names = "Maryna";
names = "Anthony";
names = "Brandon"; //placed at next available bucket
names = "";
...
names = "";
• Next, suppose I want to insert the name "Cameron". Where would his name go as index 7 already contains the name "Anthony" and index 8 already contains the name "Brandon"?
• In this case, I would need to place "Cameron" two indices away from his hashed index of 7, at index 9:
names = "";
names = "";
names = "";
names = "Kun";
names = "Doma";
names = "Howon";
names = "Maryna";
names = "Anthony";
names = "Brandon";
names = "Cameron"; //placed at next available bucket
...
names = "";
• Now, suppose that I want to search for the name "Cameron."
• Applying my hash method, I would look for his name at index 7.
• When I do not locate him there, I know that I need to keep searching farther down the table.
• As you can see, the more collisions, the worse the efficiency of my hash table when performing the insert and search operations.
• Given sufficient collisions in my table, search approaches a Big-O run-time of O(n), no improvement over a simple array.
• Is there another approach to linear probing?

2.4. Separate Chaining
• Another approach to collision resolution is called separate chaining.
• Rather than limiting ourselves to storing only a single value at each index of the table, with separate chaining we store a series of 0 or more linked elements at each index.
• In essence, we make our hash table an array of linked lists, as depicted below: • Now, if a name is inserted, and there is a collision, we simply push the new name onto the back of our list at that particular index.
• Therefore, we could re-envision our previous names hash table like so:
names = "";
names = "";
names = "";
names = "Kun";
names = "Doma" --> "Sean";
names = "Howon" --> "Qilin";
names = "Maryna";
names = "Anthony" --> "Brandon" --> "Cameron";
...
names = "";
• Now when I want to search for the name "Cameron," if his name is in the table, it must be in the list at index 7.
• To check, simply perform a linear search through the list at names.
• For lab 7, we will look at how to implement a version of a hash table that uses separate chaining.