Intro to Binary Search Tree

A binary search tree is very efficient because it combines the strengths of an array and a linked list.
Arrays and linked lists are great data structures, but each has a weakness.
An array is fast to access but slow to resize.
A linked list is fast to resize but slow to access.
A binary search tree, however, is fast to access and still very flexible to resize.

What is Binary Tree?

Before diving into the binary search tree, we need to understand the binary tree first.
A binary tree is similar to a singly linked list, except that each node has two pointers to other nodes (its children) instead of one.
The pointers do not point to the previous and next nodes but to the left and right children.
Let’s first take a look at the picture of an example binary tree.

[Figure: an example binary tree]

As you can see above, the topmost node is the root of the tree.
The root has two child nodes, left and right. (Please note that you don't have to call them left and right.)
The right child has no children of its own, and thus it is called a leaf node of the tree.

How is binary search tree different from binary tree?

The only difference between a binary search tree and a plain binary tree is where nodes are placed on insertion.
A binary search tree always enforces the rules below.

  • Any value less than the value of the current node goes into the left subtree.
  • Any value greater than the value of the current node goes into the right subtree.
  • If the corresponding child already exists (is non-leaf), move to that child node and repeat.

Let’s take a look at the example below for insertion.
I am going to insert 5 nodes and let’s see how the insertion is done.

I first inserted a node with value 20 which became the root of the tree since the tree was empty.
Then I inserted another node with value 10 which is the left child of the root node because 10 is less than 20.

[Figure: binary search tree example]

If I have another node with value 15 where will it be placed?
15 is less than 20 so it goes to the left child of the root node.
But the left child is already there so we compare 10 and 15.
15 is greater than 10 and the node with value 10 doesn’t have any right child.
Therefore, a node with value 15 will be the right child of the node with value 10. And it is a leaf node.
Please note that the node is always inserted as the leaf node of the tree.

[Figure: binary search tree example]

I have two more nodes with values 8 and 21 respectively.
The node with value 8 will be the left child of the node with value 10.
The node with value 21 will be the right child of the node with value 20.

Performance of Binary Search Tree

As you can see in the above example, a binary search tree always maintains the order of nodes after insertion.
This is great since it will be very fast to find and access a certain node thanks to its property.

Let’s take a look at how it will find a node with an example.
Let’s say I want to find the node with value 15.
First, it takes a look at the root node and finds out that 15 is less than its current value 20.
Therefore the node must exist somewhere on the left side of the root node.
It checks the node with value 10 and finds out that 15 is greater than 10.
Therefore the node must exist somewhere on the right side of the node with value 10.
Then, finally, the node with value 15 is found in the right child of the node with value 10.
Did you notice that each comparison discards an entire subtree, much like binary search discards half of the remaining range?
In fact, insertion, search, and deletion all follow the same comparison process as binary search, and on a balanced tree their performance is also the same – O(log n).

Insertion – O(log n)
Search – O(log n)
Deletion – O(log n)

Please note that the above performance holds only if the tree is balanced, which depends on the insertion order unless the tree rebalances itself after every insertion.
In the worst-case scenario, where the tree is extremely unbalanced and essentially degenerates into a linked list, every operation costs O(n).
I won’t discuss how to balance the tree as it will be a lot of information for just one post.

Code Example

Here is a code example of a binary search tree in C++.

#include <iostream>
using namespace std;

struct Tree
{
    // This could be any data type
    int val;

    Tree *leftChild;
    Tree *rightChild;
};

Tree *searchTree(Tree *elt, int searchVal)
{
    // value not found in the tree
    if (!elt)
    {
        return nullptr;
    }
    
    // found value
    if (searchVal == elt->val)
    {
        return elt;
    }
    // search value must be in one of left children nodes
    else if (searchVal < elt->val)
    {
        return searchTree(elt->leftChild, searchVal);
    }
    // search value must be in one of right children nodes
    else
    {
        return searchTree(elt->rightChild, searchVal);
    }
}

void insertTree(Tree **elt, int newVal)
{
    // found correct place to insert
    if (!*elt)
    {
        Tree *newNode = new Tree;
        newNode->val = newVal;
        newNode->leftChild = nullptr;
        newNode->rightChild = nullptr;
        *elt = newNode;
        return;
    }
    
    Tree *node = *elt;
    // new node must be inserted in one of left children
    if (newVal <= node->val)
    {
        insertTree(&node->leftChild, newVal);
    }
    // new node must be inserted in one of right children
    else
    {
        insertTree(&node->rightChild, newVal);
    }
}

// Example of inorder traversal; here I am just printing each value.
// For the example tree above, this prints:
// 8 10 15 20 21
void printInOrder(Tree *head)
{
    if (!head)
    {
        return;
    }
    
    printInOrder(head->leftChild);
    // you can do any operation here
    cout << head->val << " ";
    printInOrder(head->rightChild);
}

Conclusion

We have taken a look at the binary search tree and how it maintains the order of nodes, which enables fast operations such as insertion, search, and deletion.
It is a powerful data structure that has the strengths of an array – fast search – and of a linked list – flexible size adjustment.
However, the tree needs to be balanced in order to be efficient.
There are multiple tree-balancing algorithms, and I will try to explain them in other posts.

Intro to Heap data structure

The heap is one of the most efficient data structures for handling data for certain purposes.
To get a better understanding of the heap, you can imagine a priority queue.

For example, let’s say there are multiple jobs that you need to do for today. They could be chores, work, and any activities.
However, not all of them have the same priority and you probably want to finish jobs in order of importance.

A priority queue is a queue that takes tasks and handles them in order of specified priority.
Therefore, unlike a normal queue which is FIFO – first in first out, a priority queue always guarantees to provide an element in the order of priority.
The order could be the minimum or maximum of the priority depending on the user’s choice.

Why do I talk about the priority queue in a post about the heap data structure?
It's because the heap is the most crucial part of a priority queue implementation.
The heap data structure is what actually maintains the elements in the specified order.
Let’s take a look at how the heap is structured and how it handles data.

How is heap structured?

Typically many efficient data structures maintain elements in sorted order to access data fast.
Binary search is a good example that requires all the elements to be sorted.
Binary search tree also guarantees all the tree nodes are placed in sorted order.

However, although the heap uses a binary tree, it is not as restrictive as the binary search tree.
The only requirement of the heap is that every child of a tree node must be smaller than its parent (max heap).
If a user wants a min heap, then every child of a tree node must be bigger than its parent.
Max heap guarantees that the root of the tree will always be the largest element of all the tree nodes.
Min heap guarantees that the root of the tree will always be the smallest element of all the tree nodes.

Let’s take a look at the picture of max heap as a picture is worth a thousand words.

[Figure: max heap example]

As you can see above, 98 is the largest value of all the nodes.
Node 98 has two children, 85 and 90; node 85 has two children, 81 and 82; node 90 has two children, 84 and 88.
Please note that this tree is not strictly sorted like a binary search tree.
You may have noticed that node 88, a child of 90, is greater than node 85 even though 88 sits at a leaf.
That is fine as long as every parent node is greater than its own children.

As you can see heap condition is maintained in the above example.

  1. Node 98: 98 > 85 && 98 > 90
  2. Node 85: 85 > 81 && 85 > 82
  3. Node 90: 90 > 84 && 90 > 88

Why do we only care about the parent-child relationship?
It’s because we just want to maintain everything based on priority!
What heap guarantees is that the root will be the largest value for max-heap and smallest for min-heap.

What do you use a heap for?

Now you might wonder why the heap is useful when it only makes a guarantee about the root element.
It is important because, as you keep inserting elements, the heap always keeps them ordered by priority.

Let's go back to the very first example I gave at the beginning of the post – to-dos for a day.
As you finish your jobs for the day, more jobs might be created (i.e., your family asked you to pick up pizza).
The important thing is that the priority order needs to be maintained as new jobs are requested.

Therefore, heap is really the core of priority queue and priority queue works best if you want to process each element based on specified priority.

How do you maintain the heap?

The critical point of the heap is maintaining its order.
Maintenance is really simple, as there are only two cases – insertion and popping of an element.

Insertion

  1. Insert the element at a leaf of the tree (I will explain which leaf in the code example)
  2. The heap order may now be broken, so check and swap if the new element is greater (max heap) / smaller (min heap) than its parent
  3. Repeat the check upward until the heap condition is satisfied

This is fixing the heap upward.

Pop/Deletion

What does pop mean? It means dequeuing the root element from the heap.
After popping the root, you also need to restore the heap condition.

  1. Save the root element
  2. Place last element (leaf) to the root position
  3. Check and swap parent and children node until heap condition is satisfied

This is fixing the heap downward.

Array can be used to represent binary tree for the heap!

For efficient access you can use array to represent binary tree.

[Figure: heap implemented based on an array]

Although I could use an explicit binary tree, it would be less efficient: two child pointers would be required per node, whereas with an array you can use index arithmetic to find the location of the children.

Now, you might ask if an array could also be used for a binary search tree.
That's unlikely to work well, since an array is not as flexible as a pointer-based tree and it would be quite hard to maintain fully sorted order under constant insertion and deletion.

Code Example

This code example implements a max-heap class.
I used an array (a vector) to represent the binary tree.

#include <iostream>
#include <stdexcept>
#include <vector>
using namespace std;

class MaxHeap
{
    int lastIdx;
    int queueSize;
    vector<int> queue;
    
public:
    
    MaxHeap(int _queueSize) 
        : lastIdx(-1), 
          queueSize(_queueSize), 
          queue(_queueSize) {}
    
    bool insert(int val)
    {
        // reject the insertion if the queue is already full
        if (lastIdx + 1 >= queueSize)
        {
            return false;
        }
        
        // insert the element to the last leaf node of queue 
        // and fix the heap upward
        ++lastIdx;
        queue[lastIdx] = val;
        
        fixHeapUpward(lastIdx);
        
        return true;
    }
    
    int max()
    {
        if (lastIdx == -1)
        {
            throw std::runtime_error("Extracting from empty queue not allowed");
        }
        
        return queue[0];
    }
    
    int pop()
    {
        if (lastIdx == -1)
        {
            throw std::runtime_error("Popping from empty queue not allowed");
        }
        
        // get max value of the heap
        int result = queue[0];
        
        // the root is now free: move the last element there and fix the heap downward
        queue[0] = queue[lastIdx--];
        fixHeapDownward(0);
        
        return result;
    }
    
    void print()
    {
        // only the first lastIdx + 1 slots hold live elements
        for (int i = 0; i <= lastIdx; ++i)
        {
            cout << queue[i] << " ";
        }
        cout << endl;
    }
    
private:
    int findParentIdx(int idx)
    {
        if (!idx)
        {
            // this is root index
            return 0;
        }

        if (idx % 2)
        {
            // left child: idx = 2 * parent + 1
            return idx / 2;
        }
        else
        {
            // right child: idx = 2 * parent + 2
            return (idx - 2) / 2;
        }
    }
    
    void fixHeapUpward(int idx)
    {
        int parentIdx = findParentIdx(idx);
        if (queue[parentIdx] < queue[idx])
        {
            swap(queue[parentIdx], queue[idx]);
            fixHeapUpward(parentIdx);
        }
    }
    
    void fixHeapDownward(int idx)
    {
        int maxIdx = idx;
        for (int childIdx = idx * 2 + 1; 
             childIdx <= idx * 2 + 2 && childIdx <= lastIdx; 
             ++childIdx)
        {
            if (queue[childIdx] > queue[maxIdx])
            {
                maxIdx = childIdx;
            }    
        }
        
        if (maxIdx != idx)
        {
            swap(queue[maxIdx], queue[idx]);
            fixHeapDownward(maxIdx);
        }
    }
};

int main() 
{
    // heap size 10
    MaxHeap heap(10);

    heap.insert(81);
    heap.insert(85);
    heap.insert(82);
    heap.insert(98);
    heap.insert(84);
    heap.insert(90);
    heap.insert(88);

    cout << heap.max() << endl;
    heap.pop();
    cout << heap.max() << endl;
    heap.pop();
    cout << heap.max() << endl;
    heap.pop();
    cout << heap.max() << endl;
    heap.pop();
    cout << heap.max() << endl;
    heap.pop();
    cout << heap.max() << endl;
    heap.pop();
    cout << heap.max() << endl;
    heap.pop();

    return 0;
}

Performance

Insertion: O(log n)
Pop: O(log n)
Making a heap: O(n log n) by repeated insertion (a bottom-up build can do it in O(n))

Conclusion

We have taken a look at the heap data structure.
Although it might be slightly hard to fully grasp at first, the heap is not a really complicated data structure once you understand its purpose and how it fulfills it.
In fact, the heap can be used to solve quite a few real-world problems rather nicely.
Please let me know if you have any questions or suggestions and good luck!