Intro to Heap data structure

Heap is one of the most efficient data structure that can handle data for certain purposes.
To have a better understanding of heap you can imagine a priority queue.

For example, let’s say there are multiple jobs that you need to do for today. They could be chores, work, and any activities.
However, not all of them have the same priority and you probably want to finish jobs in order of importance.

A priority queue is a queue that takes tasks and handles them in order of specified priority.
Therefore, unlike a normal queue which is FIFO – first in first out, a priority queue always guarantees to provide an element in the order of priority.
The order could be the minimum or maximum of the priority depending on the user’s choice.

Why do I talk about the priority queue in the heap data structure post?
It’s because the heap is the most crucial part of priority queue implementation.
The heap data structure actually maintains elements based on the order specified.
Let’s take a look at how the heap is structured and how it handles data.

How is heap structured?

Typically many efficient data structures maintain elements in sorted order to access data fast.
Binary search is a good example that requires all the elements to be sorted.
Binary search tree also guarantees all the tree nodes are placed in sorted order.

However, although heap uses binary tree it is not as restrictive as the binary search tree.
The only requirement of heap is that all the children of the tree node need to be smaller than the parent. (max heap)
If a user wants min-heap then all the children of the tree node need to be bigger than the parent.
Max heap guarantees that the root of the tree will always be the largest element of all the tree nodes.
Min heap guarantees that the root of the tree will always be the smallest element of all the tree nodes.

Let’s take a look at the picture of max heap as a picture is worth a thousand words.

heap data structure

As you see the above, 98 is the largest value of all the nodes.
Then node 98 has two children, 85 and 90. Node 85 has two children 81 and 82. Node 90 has two children 84 and 88.
Please note that this tree is not strictly sorted like binary search tree.
You may have noticed that node 88, the child of 90, is greater than node 85 but it’s located at the leaf node.
Basically it is fine as long as the parent node is greater than its children.

As you can see heap condition is maintained in the above example.

  1. Node 98: 98 > 85 && 98 > 90
  2. Node 85: 85 > 81 && 85 > 82
  3. Node 90: 90 > 84 && 90 > 88

Why do we only care about parent and children relationship?
It’s because we just want to maintain everything based on priority!
What heap guarantees is that the root will be the largest value for max-heap and smallest for min-heap.

Where do you use heap for?

Now you might wonder why heap is useful when it only guarantees about the root element.
This is important because as you keep inserting elements the heap will always maintain the elements based on the priority.

Let’s go back to the very first example I gave in the beginning of the post – to dos for a day.
As you are finishing your job for the day there might be more jobs created (i.e. your family asked to pick up pizza)
The important thing is that the priority order needs to be maintained as new jobs are requested.

Therefore, heap is really the core of priority queue and priority queue works best if you want to process each element based on specified priority.

How do you maintain the heap?

The critical point of the heap is maintaining the order.
Maintenance is really simple as there are only two cases – insertion and pop of the element.

Insertion

  1. Insert the element to the leaf of the tree (I will explain which leaf in code example)
  2. Now, it is possible that heap order is broken so check and swap if the new element is greater/smaller than the parent.
  3. Recursively check until heap invariant is satisfied

This is fixing the heap upward.

Pop/Deletion

What does pop mean? You are dequeuing the root element from the heap.
After popping the root you also need to maintain the heap.

  1. Save the root element
  2. Place last element (leaf) to the root position
  3. Check and swap parent and children node until heap condition is satisfied

This is fixing the heap downward.

Array can be used to represent binary tree for the heap!

For efficient access you can use array to represent binary tree.

heap implemented based on array

Although I could use binary tree it will be less efficient since 2 children pointers are required which is less efficient since you can use the index of an array to find out the location of children.

Now, you might ask if array could be used for binary search tree.
That’s very unlikely since array is not as flexible as binary tree and it will be quite hard to maintain completely sorted order when there is constant insertion and deletion.

Code Example

This code example implemented max heap class.
I used array to represent binary tree.

#include <iostream>
#include <vector>
using namespace std;

class MaxHeap
{
    int lastIdx;
    int queueSize;
    vector<int> queue;
    
public:
    
    MaxHeap(int _queueSize) 
        : lastIdx(-1), 
          queueSize(_queueSize), 
          queue(_queueSize) {}
    
    bool insert(int val)
    {
        if (lastIdx >= queueSize)
        {
            return false;    
        }
        
        // insert the element to the last leaf node of queue 
        // and fix the heap upward
        ++lastIdx;
        queue[lastIdx] = val;
        
        fixHeapUpward(lastIdx);
        
        return true;
    }
    
    int max()
    {
        if (lastIdx == -1)
        {
            throw std::runtime_error("Extracting from empty queue not allowed");
        }
        
        return queue[0];
    }
    
    void pop()
    {
        if (lastIdx == -1)
        {
            return;
        }
        
        // get max value of the heap
        int result = queue[0];
        
        // now root is empty and get last element and fix heap downward
        queue[0] = queue[lastIdx--];
        fixHeapDownward(0);
        
        return result;
    }
    
    void print()
    {
        for (unsigned int i = 0; i < queue.size(); ++i)
        {
            cout << queue[i] << " ";
        }
        cout << endl;
    }
    
private:
    int findParentIdx(int idx)
    {
        if (!idx)
        {
            // this is root index
            return 0;
        }

        if (idx % 2)
        {
            // this must be left, 2 * h + 1
            return idx / 2;
        }
        else
        {
            // this must be right, 2 * h + 2
            return (idx - 2) / 2;
        }
    }
    
    void fixHeapUpward(int idx)
    {
        int parentIdx = findParentIdx(idx);
        if (queue[parentIdx] < queue[idx])
        {
            swap(queue[parentIdx], queue[idx]);
            fixHeapUpward(parentIdx);
        }
    }
    
    void fixHeapDownward(int idx)
    {
        int maxIdx = idx;
        for (int childIdx = idx * 2 + 1; 
             childIdx <= idx * 2 + 2 && childIdx <= lastIdx; 
             ++childIdx)
        {
            if (queue[childIdx] > queue[maxIdx])
            {
                maxIdx = childIdx;
            }    
        }
        
        if (maxIdx != idx)
        {
            swap(queue[maxIdx], queue[idx]);
            fixHeapDownward(maxIdx);
        }
    }
};

int main() 
{
    // heap size 10	
	MaxHeap heap(10);
	
    heap.insert(81);
    heap.insert(85);
    heap.insert(82);
    heap.insert(98);
    heap.insert(84);
    heap.insert(90);
    heap.insert(88);
	
	cout << heap.max() << endl;
	heap.pop();
	cout << heap.max() << endl;
	heap.pop();
	cout << heap.max() << endl;
	heap.pop();
	cout << heap.max() << endl;
	heap.pop();
	cout << heap.max() << endl;
	heap.pop();
	cout << heap.max() << endl;
	heap.pop();
	cout << heap.max() << endl;
	heap.pop();

	return 0;
}

Performance

Insertion: O(log n)
Pop: O(log n)
Making heap: O(n log n)

Conclusion

We have taken a look at heap data structure.
Although it might be slightly hard to fully understand heap it is not really complicated data structure once you understand the purpose and how it fulfills it.
In fact, heap could be used to solve quite a bit of real-world problems pretty nicely.
Please let me know if you have any questions or suggestions and good luck!

Intro to Linked List data structure

Today, we will go over a popular data structure – linked list.
Linked list may not make much sense for some languages such as python and javascript since there is already a good one such as list.
However, since linked list is quite useful for languages like C++ it is still worthwhile to go over.
In this post I will go over some important characteristics about linked list and strengths and weaknesses.
Please note that all the explanations will be based on C++.

What is Linked List?

Linked list is a data structure that each node is connected like a chain via pointer.
It is one of fundamental data structure in C++ and widely used in C++ standard library (STL).

I believe a picture is worth a thousand words and let’s take a look at how a linked list looks like before any explanations.

As you see above there are three nodes in the linked list.
Each node is based on Node struct as below which contains a value and a pointer to the next Node.

struct Node
{
    // value that this Node contains. 
    // It could be any data type you want to contain
    int   val;
    
    // pointer to next node
    Node *next;
    
    // pointer to previous node.
    // this is optional
    // Node *prev;
};

The example code has only one pointer ‘next’ but you can also have another Node * to point to the previous Node.
If you have only one pointer (likely it’s a pointer to the next Node) it’s called a singly linked list.
If you have two pointers one pointing to the next and the other to the previous then it’s called a doubly linked list.

Linked List Operations – insert, search, delete

As you can imagine there are three basic operations for a linked list.
You need to be able to search a node in the list, insert and delete.
Let’s go over each operation.

Search

How should the search be?
Given you have a value to search you need to start from the head of the list and check if each node in a chain is the same as the search value.
Return a pointer to the node if you find it or null pointer otherwise.
Let’s take a look at code example for search.

/**
 * @brief  search a node based on search value
 * 
 * @param  head       head of linked list
 * @param  searchVal  value to search in the list
 * 
 * @return pointer to found node. null otherwise
 */
Node *searchNode(Node *head, int searchVal)
{
    // end of the list. value not found
    if (!head)
    {
        return 0;
    }

    if (head->val == serachVal)
    {
        return head;
    }
    else
    {
        return searchNode(head->next, searchVal);
    }
}

I implemented the operation recursively so I don’t need to write a loop. (I thought the recursive solution was easier than loop but it’s your choice)
I actually also used loop based implementation for insertion operation for your reference.

Insertion

Inserting a node could be slightly trickier depending on how you want to maintain the list.
There are three possible cases for insertion.

1. Insert the node at the head of the list
This one is the easiest as you just need to create a node and insert the node.
Here is the code example for head insertion.

/**
 * @brief  node insertion at the beginning of the list
 * 
 * @param  head    double pointer to head since you need to update 'head' after insertion
 * @param  newVal  new value to insertNode
 */
void insertNode(Node **head, int newVal)
{
    Node *newNode = new Node;
    newNode->val = newVal;
    newNode->next = *head;

    // update head
    *head = newNode;
}

2. Insert the node at the tail of the list
This one is slightly harder than head insertion but still pretty simple.
What you need to do is to traverse to the end of the list and insert the node there.

/**
 * @brief  node insertion at the end of the list
 * 
 * @param  head    double pointer to head since you still might need to update 'head'
 *                 if the list is empty
 * @param  newVal  new value to insert
 */
void insertNode(Node **head, int newVal)
{
    Node *newNode = new Node;
    newNode->val = newVal;
    newNode->next = 0;

    Node *tailNode = *head;
    while (tailNode && tailNode->next)
    {
        tailNode = tailNode->next;
    }

    if (!tailNode)
    {
        *head = newNode;
    }
    else
    {
        tailNode->next = newNode;
    }
}

3. Insert the node based on the sorted order of the list.
Imagine you want to maintain the list in sorted order (decreasing or increasing).
Then you will need to find a proper location to insert.

For example, let’s say you have a list like below and you would like to insert value 4.
1-> 3-> 5

Then you need to traverse the list and find a node that is not less than 4 which is 5, the last one, and create a node and insert.
However, there is a couple of other edge cases such as the inserting position happens to be at the head or tail of the list.
Although this one is not terribly difficult it does require little more thoughts than the others.

void insertNode(Node **head, int newVal)
{
    // create new Node
    Node *newNode = new Node;
    newNode->val = newVal;
    newNode->next = 0;
    
    // if empty list then just insert it here
    if (!*head)
    {
        *head = newNode;
        return;
    }
    // check if it needs to be inserted at head.
    else if (newNode->val < (*head)->val)
    {
        newNode->next = *head;
        *head = newNode;
        return;
    }
    
    Node *iter = *head;
    
    // loop until you find first Node that is not less than new value
    while (iter && iter->next)
    {
        if (iter->next->val >= newNode->val)
        {
            break;
        }
        
        iter = iter->next;
    }
    
    if (iter->next)
    {
        newNode->next = iter->next;
    }
    
    iter->next = newNode;
}

The tricky part of insertion is that you have to make sure you update the next pointers properly.
1. If insertion is happening at the head, make sure head pointer is updated and the next pointer of new node point to last head
2. If insertion is happening at the tail, make sure the last node of the list is pointing to the new node properly
3. If insertion is happening in the middle of the list, make sure the previous node’s next points to new node and new node points to the proper next one.

deletion

The deletion of a node is a very similar process as insertion.
First, you need to find which node to delete if there is any and properly updates the next pointers.
Just like insertion deleting node could be the head, the tail or in the middle of the list.
Let’s take a look at the code example.

/**
 * @brief  delete a node from the list if there is one
 *         for simplicity I don't delete all the duplicate values here
 * 
 * @param  head       double pointer to head in case you have to delete head node
 * @param  deleteVal  value to delete
 */
void deleteNode(Node **head, int deleteVal)
{
    Node *iter = *head;
    
    // nothing to delete
    if (!*head)
    {
        return;
    }
    // deleting head. need to update head
    else if ((*head)->val == deleteVal)
    {
        *head = (*head)->next;
        delete iter;
        return;
    }

    while (iter && iter->next)
    {
        if (iter->next->val == deleteVal)
        {
            break;
        }

        iter = iter->next;
    }

    if (iter->next)
    {
        Node *delNode = iter->next;
        iter->next = iter->next->next;
        delete delNode;
    }
}

Pros

  • Unlike an array size of a linked list is very flexible as long as the memory permits
  • Insertion and deletion are much simpler than an array. For an array insertion/deletion you have to move many other elements after the operation
  • It is a perfect data structure to implement more complex data structures like Stack and Queue since you only need to maintain head(or tail) of the list for insertion, search, deletion

Cons

  • It requires extra memory (next, prev pointers)
  • It does not allow random access like an array and therefore search could be slower than an array

Performance

Search – O(n)
Insertion – O(1) if always insert at the head (for cases like Stack)
O(n) otherwise
Deletion – O(1) if always delete head (for cases like Stack)
O(n) otherwise

Conclusion

We have taken a look at some basics of linked list.
Linked list is a great data structure for some special purposes due to its flexibility of insertion and deletion.
However, there are some weaknesses about it so it requires some discretion to wisely use it.

Thank you for reading my post and please let me know if you have any questions or suggestions and good luck!