Welcome to Lesson 10!

Learning Objectives
By the end of this class, you should know...
  • What is a tree and how does it differ from a linked list?
  • What is a Binary Tree ADT?
  • What are the properties of Binary Trees?
  • How do you determine if a Binary Tree is balanced?
  • How do you define a Binary Search Tree ADT?
  • What is the difference between a Binary Tree and a Binary Search Tree?
  • Why is the BST a recursive structure? 
  • What are the BST operations?
  • What is the Big-O run-time of these operations on a BST?
  • How do you implement insert for a BST
Announcements
  • Excellent midterm results!
    • As: 29
    • Bs: 5
    • Cs: 7
    • Ds: 2
    • Fs: 2
  • De Anza Women in CS Club meeting tomorrow at 12:30 in the ATC 203 main lab!
    • Hand out flyers
  • No quiz until next week
  • Project Report 2 due on Thursday

Wrap Up Algorithm Efficiency

Tree ADTs

Introducing Trees
  • Consider the ADTs we have studied so far:
    • Array
    • Linked List
    • Queue 
    • Stack
  • These structures all store data in a linear fashion, as in the example of a linked list show below:
  • However, storing data linearly is not the only option.
  • In many cases, our data might not have a linear relationship.
  • Consider, for example, a company's org. chart:

  • Or, consider this depiction of the relationship among the countries in the world and their cities and states:
Image source.

  • Or, consider this data which is arranged into what we call a decision tree:

Image source.

  • Finally, we can represent the song "Hey Jude" as a decision tree:

http://strawp.net/files/hey_jude.txt.png

  • Given the nature of certain types of data, sometimes it can make more sense to organize the data in hierarchical structure.
  • Also, certain operations, such as search and insertion or deletion can be more efficient when we organize data hierarchically rather than linearly.
  • In computer science, we can use a tree ADT to organize data hierarchically.

The Structure of the Tree ADT
  • Like linked lists, trees are comprised of nodes.
  • These nodes store data and pointers to other nodes in the tree.
  • Like the head and tail of a linked list, trees have specially-named nodes that serve important functions in the tree.
  • The most important node is known as the root.
  • Like the head of a linked list, a root is what allows us to access the first node in the tree.
  • You can visualize a tree ADT with its root node like this:


  • Trees in computer science are usually depicted with the root node on top.
  • Therefore, you have to visualize the tree upside down.


  • The branches of the tree are called links or edges.
  • Unless the tree is empty or has size==1, each Node in the tree is connected to other nodes by these pointers (links).
  • Nodes in a tree have parent-child relationships.
  • Each node can have only one parent node. But a node can have 0 or more child nodes.
  • The root node in the image below has four child nodes.
  • The root is considered the parent node of those four child nodes.
  • For example, A is considered the child node of the root and the root is considered the parent node of A.

  • In other words, the terminology we use to describe trees and their nodes, is similar to that of a family tree.


  • Nodes in a tree can have parents, grandparents, cousins, siblings, children and grandchildren.
  • They can also have ancestors and descendants.
  • For example, in the tree below, Nodes 1,4,7, and 10 and their parents are all descendants of the the root node 5.
  • The root node 5 is thus their ancestor.

Tree graph example

  • Another important node in a tree is the leaf node.
  • The leaf node is any node in the tree with no children.
  • In other words, the links of a leaf node point to Null.
  • Any node that is not the root or a leaf is called an interior node as depicted above.
  • The root and leaves of the tree are exterior nodes.

Properties of Trees
  • Consider a tree with n nodes. How many edges will it contain.
    • Exactly n-1. Why is this the case?
    • The reason is that all nodes, except the root, have exactly one incoming edge.
  • The depth of a node X in a tree is defined as the number of edges from the root to the node X.
    • What is the depth of the root node, given this definition?
  • The height of a node X is defined as the number of edges in the longest path from the node X down to a leaf node.
    • What is the height of any leaf node given this definition?
  • Trees themselves can also have heights. The height of any tree is the same as the height of its root.




Activity Labeling a Tree
  • Given the tree below, answer the following questions:


  • Step 1: Label the root and leaf nodes
  • Step 2: List at least one of each of the following relationships:
    • Parent-child
    • Siblings
    • Ancestor
    • Descendant
  • Step 3: identify 3 interior nodes
  • Step 4: label the depth of the following nodes:
    • html
    • body
    • li

  • Step 5: label the height of the same nodes.


The Binary Tree ADT
  • A binary tree is a tree where each node can have at most 2 children.
  • Below is an example of a binary tree. Note that each node has 0, 1 or 2 children.

Image source
  • These two children are called the left child and the right child.
  • For example, in the BT above, 4 is the left child of its parent node 6 and 7 is the right child of its parent node 6.
  • 14 only has a left child, but does not have a right child.
  • We can consider binary trees to be recursive structures.
    • The root node links to root nodes of its left and right subtrees.
    • These roots of the subtrees in turn link to the root nodes of their left and right subtrees.
    • We will discuss this concept more when we get to Binary Search Trees.

    

Image source.

  • Therefore, each node in the tree will contain some data as well as two pointers, one to the left child and one to the right child as depicted below:
Binary Tree
  • If a node has no right or left child, these values are set to Null.
  • As you might have predicted, when we go to implement the binary search tree, we will be representing its node as a private inner struct like this:
struct Node
{
    int data;
    Node* left;
    Node* right;
};
  • Therefore, a Node has three fields:
    • a field for the data
    • a field for the address of the left child (another Node).
    • a field for the address of the right child (another Node).
  • A binary tree is made up of Nodes, each with pointers to a left and right child node as shown below:

  • The examples of binary trees that we have seen so far are balanced trees.
  • Here is another example of a valid binary tree. Why does this tree meet the definition of a binary tree?

  • This is also a valid binary tree. Why? Also, what does it resemble?



 
Image source.




Special Types of Binary Trees
  • A binary tree is considered a proper or strict binary tree if each node has either 0 or 2 children. Below is an example of a strict binary tree.
  • complete binary tree is one in which all levels are filled (except the last) and all nodes are as far left as possible. Below is an example of a complete binary tree.

  • A perfect or full binary tree is one in which all levels are completely filled.

Balanced Binary Tree
  • But what does balanced mean?
  • A balanced binary tree is one in which the difference between the height of the left and right subtrees is not more than 1.
| heightleft - heightright| <= 1 
  • Let's give this a try.
  • The below tree is balanced even though it might not appear so. Why?

An example binary tree


  • Consider the height of the right and left subtree of the root node A.
  • hleft = 1
  • hright = 2
  • |1 - 2| = 1 <--balanced!
  • Let's check another node, B.
  • hleft = -1 (height of an empty tree is -1)
  • hright = 0
  • |-1 - 0| = 1 <--balanced!



Partner Activity
  • With a partner answer the following question: Is the binary tree depicted below balanced? Why or why not?

Image source.


The Binary Search Tree ADT
  • A Binary Search Tree (BST) is a binary tree for which the following is true: for any node in the BST, the value of all nodes in its left subtree are less than or equal to the value of the node, and the value at all nodes in the right subtree are greater than the value of the node.
  • The following tree is a BST. Verify that the above property is true for this tree.

BST: A Recursive Structure
  • Remember that the root node is linked to the root nodes of its left and right subtrees.
  • These root nodes are in turned linked to the root nodes of other subtrees.
  • BST are recursive because the BST property must hold true for all nodes in the tree and their subtrees.

Image source.


  • Examine the tree above. Do all subtrees meet the requirements for a BST?
  • Due to the recursive definition of BSTs, we are able to implement most BST operations recursively, as we shall see in a moment.


BST Operations


Insertion

  • As always, the best functions to write first are your default constructor, an insertion function and a print function.
  • The functions you will be writing for the BST will take advantage of the recursive structure of the BST.
  • To get an idea of how to write the code for this function, let's look at an example
  • Imagine that we are trying to insert 4 into our BST as in the image below.
  • At each root node of a subtree, we need to make a decision. Do we go left? Do we go right? Or if the root node is null, we should insert where we are.

image

Image source.


  • Thus, our process is recursive as we are making the same decision at each node.
  • Our BST users do not have direct access to any of the nodes in the BST or their data or links.
  • Therefore, we need to call a private helper function inside of insert and pass the root node to this function.
  • The private helper function is the recursive function that is going to do most of the work here.
  • The private helper function takes the root and a value as parameters.
  • The public wrapper function simply takes the value to insert as a parameter
  • Here is the code for the public wrapper function:
template <class bstdata>
void BST<bstdata>::insert(bstdata data)
{
    if (root == NULL)
    {
        root = new Node(data); //if the tree is empty insert the value at the root
    }
    else
    {
        insertNode(root, data); //otherwise call the helper function, passing it the root
    }
}
  • Here is the pseudocode for the private helper function:

Pseudocode for insertNode(root, value)

1. Base Case: the value to insert is equal to the data stored at the root

If this is true, return //no duplicates

2. Check if the value to insert is smaller than the data stored at the root.

2a. If this is true

2aa. Check if root's leftchild is NULL

2aaa. If this is true, insert the value at root's leftchild

2b. If the root's leftchild is not NULL

2bb.recursively call the insertNode function passing it root's leftchild and value

3. Otherwise

3a. Check if the root's rightchild is NULL

3aa. If this is true, insert the value at root's rightchild

3b. If the root's rightchild is not NULL

3bb. recursively call the insertNode function passing it root's rightchild and value


Introduction to Run-time Analysis on Binary Search Tree Operations

  • The Big-O runtime of the binary tree operations is closely allied to the height of the tree.
  • The closer the tree is to a perfect binary tree, the better the run-time.
  • Why is this?
  • Consider a binary tree like this one:



  • This binary tree is no different than a linked list.
  • Is search efficient on a linked list? We should have seen in Lab 4 that searching a linked list is not efficient.
  • Because linked lists do not offer random access of their elements, binary search is not an efficient algorithm on a linked list.
  • Did you notice this inefficiency while completing your Lab 4?
    • Thus, linear search is a better choice for linked lists.
    • What is the Big-O run-time of linear search on a linked list: O(n).
  • For binary trees organized like the one above, search also takes O(n) time, as does insertion and removal.
  • We are going to see with balanced Binary Search Trees that search is possible in O (log n) time, just like binary search on an array.
  • However, to keep search (and other operations) efficient, we want to keep our trees as balanced as possible.
  • This idea should make intuitive sense to you.

Comparison of Arrays, Linked Lists and Binary Search Trees
  • Let's compare the average case Big-O run-times of Arrays and Linked Lists to those of the BST when performing 3 common ADT operations.
Array            Linked List            Binary Search Tree            
Search                O(log n)                O(n)                    O(log n)
Insertion             O(n)                     O(n)                    O(log n)
Deletion              O(n)                     O(n)                    O(log n)

  • Why are these operations O(log n) for a BST?
  • Consider search. In the image below, we are searching for the value 4 to determine if it is in the tree:

image

Image source.

  • As in binary search, we continually divide the search space in half until it is only one element:
n -> n / 2 -> n / 4 -> n/ 8... -> n/ 2k = 1
  • Multiply both sides by 2k
n = 2k
  • Then take the log of both sides:
log (n) = k
  • This leaves us with O(log n).
  • Let's walk through an example to verify this works on a balanced BST. What happens if we search for the value 12?
  • Insertion and removal have the same run-time on average as search. Let's walk through an example of inserting the value 9 in the BST above.
  • But, what about when the BST is unbalanced?
  • The Big-O run-time in this case will be equal to that of a linked list: O(n)
  • Why?


Animated BST Efficiency compared to a Linked List (although it says array)




Group Activity:
  • With a partner, insert the following values into a BST in the given order: 5 3 6 9 0 2 7 12 4
  • How does the tree change if you swap the order of 5 and 9?  9 3 6 5 0 2 7 12 4
  •  What about 5 and 12? 12 3 6 9 0 2 7 5 4



Wrap Up
  • With your partner, answer the learning objective questions from the beginning of the lesson.


Upcoming Assignments.

  • Lab 6 due Tuesday, February 20