Welcome to Lesson 14!

Learning Objectives

By the end of today's lesson, you should know:

- What is the Big-O runtime of the hash table operations (insert, delete, search)?
- What is the load factor?
- What are some guidelines for selecting a good hashing algorithm?

Announcements

- Quiz 5 after the break
- Collect Lab 6
- Return Lab 5
- Lab 7 assigned - due one week from today
- Midterm postponed by one class day
  - New date: Tuesday, November 22
- No online office hours tomorrow due to the Veterans Day holiday
  - However, I will check my email a couple of times tomorrow to answer questions
- Discuss Project Report 3, due in one week

Review Activity

With a partner, answer the following questions:

- Trace BST insert on the following values: 10, 12, 15, 4, 7, 13, 9, 11
- What would be the output when calling pre-, post-, and in-order print on the above BST?
- What is the Big-O runtime of the BST operations (search, insert, delete)? How do these compare to the Big-O runtimes for a linked list?
- How is data organized in a hash table?
- What is a hashing algorithm?
- What is a collision?
- What are two strategies that are used to handle collisions? What is the difference between them?

Hash Tables Continued

Separate Chaining

Another approach to collision resolution is called separate chaining. Rather than limiting ourselves to storing only a single value at each index of the table, with separate chaining we store a series of zero or more linked elements at each index. In essence, we make our hash table an array of linked lists, as depicted below.

    [Figure: an array in which each slot points to a linked list of colliding entries. Image source.]

Now, each time a name is inserted and there is a collision, we simply push the new name onto the back of the list at that particular index. Therefore, we could re-envision our previous names hash table (which mapped each name to an index according to its length) like so:

    names[0] = "";
    names[1] = "";
    names[2] = "";
    names[3] = "Kun";
    names[4] = "Doma" --> "Sean";
    names[5] = "Howon" --> "Qilin";
    names[6] = "Maryna";
    names[7] = "Anthony" --> "Brandon" --> "Cameron";
    ...
    names[table_size - 1] = "";

Now, when I want to search for the name "Cameron," if his name is in the table, it must be in the list at index 7. To check, we simply perform a linear search through the list at names[7]. Later in the lesson, we will look at how to implement a version of a hash table that uses separate chaining.

Big-O Run-Time of a Hash Table

For hash table operations such as search, insert, and delete, we get the following Big-O run-times:

- Best case: O(1)
  - The element is alone at a collision-free index.
- Worst case: O(n)
  - The worst case degenerates into a linear search, which is O(n).
  - This holds for both linear probing and separate chaining.
- Average case: O(1)?
  - The average-case run-time of hash table operations is less clear-cut. Assuming a perfect hash function, these operations are always O(1). Without a perfect hash function, however, the average run-time depends on how big the table is and how many values it is currently storing (i.e., on how full the table is).

As more and more values are inserted into the table, the likelihood of collision increases, resulting in longer chains. The speed of hash table operations is therefore dependent on something called the load factor, which is defined to be n/k, where k is the table size (number of buckets) and n is the number of keys:

    load factor = n / k

The larger the load factor, the slower the hash table operations. Therefore, the average case for a hash table is sometimes considered to be O(n/k). You might be thinking: But... O(n/k) = O(n), since we do not include lower-order terms and coefficients in our Big-O run-time.
So, is a hash table about as efficient as a linked list, then? In practice, O(n/k) is still a real improvement over O(n). Moreover, if your table size is large enough, the average run-time approaches constant time. If I set k = 2n (the table size is twice the number of keys):

    O(n/2n) -> O(1/2) -> O(1)

Improving Our Hashing Algorithm

The above hashing algorithm, which mapped a key to an index according to the length of the key, was not a good hashing algorithm. What was wrong with it?

- Too many collisions - we were unable to get an even distribution of values across the table.
  - Values clustered around buckets 2-10.
- As a related issue, some indices were unusable.
  - Index 0 can never hold a value, as there are no names of length 0.

Both of these problems contribute to a loss of efficiency. Perhaps we can find a better hashing algorithm. Let's try the following:

- Sum the ASCII values of each character in the key.
- Scale the result by applying % table size.

```cpp
int hash(string key, const int TABLE_SIZE)
{
    int index, sum = 0;
    for (int i = 0; i < key.length(); i++)
        sum += (int) key[i];   // sum the ASCII values of each character in the string
    index = sum % TABLE_SIZE;  // mod the sum by the table size and store the remainder as the index
    return index;
}
```

Guidelines for Choosing a Good Hash Function

But how do we know that our new algorithm is a good one? Here are some guidelines for writing good hashing functions:

- Use all of the information provided by a key.
  - Our algorithm uses every character in the name to compute the index.
- Make sure the elements to insert can be spread across the entire table.
  - By applying the % operator, we make sure our summed ASCII values will be better distributed throughout the table.
- Verify that your algorithm can map similar keys to different buckets.
  - Will values like "Mark" and "Matt" be placed in the same bucket? Applying the hashing algorithm with a table size of 37:
    - Mark: mapped to index 25
    - Matt: mapped to index 36
- Make sure it uses only simple operations (such as the basic arithmetic operators) to minimize run-time.
  - When you write a hash table, you will call your hashing function often (insert, search, delete). Make certain your hashing function is not computationally intensive.

While the hashing function is an important aspect in helping to increase the effectiveness of a hash table, other aspects also need to be considered.

Guidelines for Selecting a Table Size

To effectively minimize collisions (and thereby maintain efficiency), a good rule of thumb is to make your table size twice the number of entries you are planning to insert. More often than not, however, the number of entries to be inserted will not be known in advance. Therefore, it may be necessary to re-size your hash table:

- Remember the load factor: n/k (the number of keys divided by the table size).
- As the load factor increases, collisions become more common, resulting in decreased efficiency.
- How can we decrease the load factor? We increase the table size!

When should we re-size the table?

- Linear probing: array indices are used up quickly. Re-size when the load factor >= 1/2.
- Separate chaining: opinions vary, from 2/3 to 3/4.

Simplest re-sizing method:

- Create a new hash table with an array twice as large as the current hash table's.
- Linearly traverse the old hash table.
- Copy each element and insert it into the new hash table.
  - Elements will be re-hashed when inserted into the new hash table.
- Delete the old hash table.

Work on Lab 7

Wrap Up

- With your partner, answer the questions from today's learning objectives.

Upcoming Assignments

- Lab 7 due Thursday

~Have a Great Weekend!~