Intro to Heap data structure

Heap is one of the most efficient data structure that can handle data for certain purposes.
To have a better understanding of heap you can imagine a priority queue.

For example, let’s say there are multiple jobs that you need to do for today. They could be chores, work, and any activities.
However, not all of them have the same priority and you probably want to finish jobs in order of importance.

A priority queue is a queue that takes tasks and handles them in order of specified priority.
Therefore, unlike a normal queue which is FIFO – first in first out, a priority queue always guarantees to provide an element in the order of priority.
The order could be the minimum or maximum of the priority depending on the user’s choice.

Why do I talk about the priority queue in the heap data structure post?
It’s because the heap is the most crucial part of priority queue implementation.
The heap data structure actually maintains elements based on the order specified.
Let’s take a look at how the heap is structured and how it handles data.

How is heap structured?

Typically many efficient data structures maintain elements in sorted order to access data fast.
Binary search is a good example that requires all the elements to be sorted.
Binary search tree also guarantees all the tree nodes are placed in sorted order.

However, although heap uses binary tree it is not as restrictive as the binary search tree.
The only requirement of heap is that all the children of the tree node need to be smaller than the parent. (max heap)
If a user wants min-heap then all the children of the tree node need to be bigger than the parent.
Max heap guarantees that the root of the tree will always be the largest element of all the tree nodes.
Min heap guarantees that the root of the tree will always be the smallest element of all the tree nodes.

Let’s take a look at the picture of max heap as a picture is worth a thousand words.

heap data structure

As you see the above, 98 is the largest value of all the nodes.
Then node 98 has two children, 85 and 90. Node 85 has two children 81 and 82. Node 90 has two children 84 and 88.
Please note that this tree is not strictly sorted like binary search tree.
You may have noticed that node 88, the child of 90, is greater than node 85 but it’s located at the leaf node.
Basically it is fine as long as the parent node is greater than its children.

As you can see heap condition is maintained in the above example.

  1. Node 98: 98 > 85 && 98 > 90
  2. Node 85: 85 > 81 && 85 > 82
  3. Node 90: 90 > 84 && 90 > 88

Why do we only care about parent and children relationship?
It’s because we just want to maintain everything based on priority!
What heap guarantees is that the root will be the largest value for max-heap and smallest for min-heap.

Where do you use heap for?

Now you might wonder why heap is useful when it only guarantees about the root element.
This is important because as you keep inserting elements the heap will always maintain the elements based on the priority.

Let’s go back to the very first example I gave in the beginning of the post – to dos for a day.
As you are finishing your job for the day there might be more jobs created (i.e. your family asked to pick up pizza)
The important thing is that the priority order needs to be maintained as new jobs are requested.

Therefore, heap is really the core of priority queue and priority queue works best if you want to process each element based on specified priority.

How do you maintain the heap?

The critical point of the heap is maintaining the order.
Maintenance is really simple as there are only two cases – insertion and pop of the element.

Insertion

  1. Insert the element to the leaf of the tree (I will explain which leaf in code example)
  2. Now, it is possible that heap order is broken so check and swap if the new element is greater/smaller than the parent.
  3. Recursively check until heap invariant is satisfied

This is fixing the heap upward.

Pop/Deletion

What does pop mean? You are dequeuing the root element from the heap.
After popping the root you also need to maintain the heap.

  1. Save the root element
  2. Place last element (leaf) to the root position
  3. Check and swap parent and children node until heap condition is satisfied

This is fixing the heap downward.

Array can be used to represent binary tree for the heap!

For efficient access you can use array to represent binary tree.

heap implemented based on array

Although I could use binary tree it will be less efficient since 2 children pointers are required which is less efficient since you can use the index of an array to find out the location of children.

Now, you might ask if array could be used for binary search tree.
That’s very unlikely since array is not as flexible as binary tree and it will be quite hard to maintain completely sorted order when there is constant insertion and deletion.

Code Example

This code example implemented max heap class.
I used array to represent binary tree.

#include <iostream>
#include <vector>
using namespace std;

class MaxHeap
{
    int lastIdx;
    int queueSize;
    vector<int> queue;
    
public:
    
    MaxHeap(int _queueSize) 
        : lastIdx(-1), 
          queueSize(_queueSize), 
          queue(_queueSize) {}
    
    bool insert(int val)
    {
        if (lastIdx >= queueSize)
        {
            return false;    
        }
        
        // insert the element to the last leaf node of queue 
        // and fix the heap upward
        ++lastIdx;
        queue[lastIdx] = val;
        
        fixHeapUpward(lastIdx);
        
        return true;
    }
    
    int max()
    {
        if (lastIdx == -1)
        {
            throw std::runtime_error("Extracting from empty queue not allowed");
        }
        
        return queue[0];
    }
    
    void pop()
    {
        if (lastIdx == -1)
        {
            return;
        }
        
        // get max value of the heap
        int result = queue[0];
        
        // now root is empty and get last element and fix heap downward
        queue[0] = queue[lastIdx--];
        fixHeapDownward(0);
        
        return result;
    }
    
    void print()
    {
        for (unsigned int i = 0; i < queue.size(); ++i)
        {
            cout << queue[i] << " ";
        }
        cout << endl;
    }
    
private:
    int findParentIdx(int idx)
    {
        if (!idx)
        {
            // this is root index
            return 0;
        }

        if (idx % 2)
        {
            // this must be left, 2 * h + 1
            return idx / 2;
        }
        else
        {
            // this must be right, 2 * h + 2
            return (idx - 2) / 2;
        }
    }
    
    void fixHeapUpward(int idx)
    {
        int parentIdx = findParentIdx(idx);
        if (queue[parentIdx] < queue[idx])
        {
            swap(queue[parentIdx], queue[idx]);
            fixHeapUpward(parentIdx);
        }
    }
    
    void fixHeapDownward(int idx)
    {
        int maxIdx = idx;
        for (int childIdx = idx * 2 + 1; 
             childIdx <= idx * 2 + 2 && childIdx <= lastIdx; 
             ++childIdx)
        {
            if (queue[childIdx] > queue[maxIdx])
            {
                maxIdx = childIdx;
            }    
        }
        
        if (maxIdx != idx)
        {
            swap(queue[maxIdx], queue[idx]);
            fixHeapDownward(maxIdx);
        }
    }
};

int main() 
{
    // heap size 10	
	MaxHeap heap(10);
	
    heap.insert(81);
    heap.insert(85);
    heap.insert(82);
    heap.insert(98);
    heap.insert(84);
    heap.insert(90);
    heap.insert(88);
	
	cout << heap.max() << endl;
	heap.pop();
	cout << heap.max() << endl;
	heap.pop();
	cout << heap.max() << endl;
	heap.pop();
	cout << heap.max() << endl;
	heap.pop();
	cout << heap.max() << endl;
	heap.pop();
	cout << heap.max() << endl;
	heap.pop();
	cout << heap.max() << endl;
	heap.pop();

	return 0;
}

Performance

Insertion: O(log n)
Pop: O(log n)
Making heap: O(n log n)

Conclusion

We have taken a look at heap data structure.
Although it might be slightly hard to fully understand heap it is not really complicated data structure once you understand the purpose and how it fulfills it.
In fact, heap could be used to solve quite a bit of real-world problems pretty nicely.
Please let me know if you have any questions or suggestions and good luck!

Leave a Reply

Your email address will not be published.