Intro to Union-Find Data Structure

Union-Find is a data structure that mimics tree data structure except that nodes point to their parent instead of children.

The union-find is very useful for set operations such as the following.
1. find – find the root of the current node
2. same component – check if two trees have the same root node. (belongs to the same set)
3. union (merge) – merge two trees (sets) into one

Such operations are very useful for problems like connected components – each node belongs to only one connected component – and vertex coloring.

The union-find is useful for algorithms such as minimum-spanning tree (Kruskal’s algorithm) – a minimum subset of edges forming a tree connecting all vertices.

How is the union-find structured?

The union-find data structure represents each subset as a reverse or backward tree that a node points to the parent instead of children.
In this example, I used an array to keep track of parent nodes efficiently and for simplicity.
But you could use hashmap to deal with a more complicated node structure.

Let’s take a look at the example below.

union find data structure example

There are two sets. (Two trees with two nodes)
The first set has 4 nodes – 2, 1, 0, 3 – where node 2 is the root node.
The other set has 1 node – 4.

As you can see from the above I have an array to keep track of the parent of each node.
For example, the parent node of node 0 is 1 and the parent node of node 1 is 2.
You can see that the value of indices 0 and 1 is the value of their parent nodes.

Algorithm

Let’s take a look at the algorithms for each union-find operation.

Find

Again, this operation checks the root node of the provided node.
For the find operation, you just need to recursively check the parent node until it reaches the root.
How do you know when it reaches the root?
In the above structure, you will see the index and the value are equal.
If you are using another data structure, which will likely be the one using a key and value, you just need to check if a key and value are the same.

Same Component

This operation checks if two nodes belong to the same tree, have the same root node.
This one is very easy once ‘Find’ operation is implemented as you just need to find the root node of both nodes and check if they are the same.

Merge

The operation merges two trees into one.
However, merging without much thought may cause an unbalanced tree and hurt overall performance.
How should we merge the tree?
You can always link the tree that has a lower height to the higher one.
For example, if you are to merge two trees in the above picture, there are only two cases.
You can merge root node 2 to 4 which will increase the height by 1.
Or you can merge root node 4 to 2 of which the height will remain the same.
Therefore you should always merge the lower tree to the higher one.

Code Example

#include <vector>

using namespace std;

class UnionFind
{
    int totalNodes;
    
    // indicates parent nodes
    vector<int> parents;
    
    // indicates number of elements in a subtree
    vector<int> numElts;
    
public:
    UnionFind(int _totalNodes) : totalNodes(_totalNodes), parents(totalNodes), numElts(totalNodes)
    {
        for (int i = 0; i < totalNodes; ++i)
        {
            parents[i] = i;
            numElts[i] = 1;
        }
    }
    
    int find(int nodeId)
    {
        if (parents[nodeId] == nodeId)
        {
            return nodeId;
        }
        else
        {
            return find(parents[nodeId]);
        }
    }
    
    bool sameComponent(int node1, int node2)
    {
        return find(node1) == find(node2);
    }
    
    void merge(int node1, int node2)
    {
        int root1 = find(node1);
        int root2 = find(node2);
        
        // already in the same component. noop
        if (root1 == root2)
        {
            return;
        }
        
        
        if (numElts[root1] >= numElts[root2])
        {
            parent[root2] = root1;
            numElts[root1] += numElts[root2];
        }
        else
        {
            parent[root1] = root2;
            numElts[root2] += numElts[root1];
        }
    }
};


Performance

Find – O(log n)
It just needs to climb the tree until it reaches the root

Same Component – O(log n)
It only needs to complete two find operations.

Merge – O(log n)
It only needs to complete two find operations.

Conclusion

The union-find is a very powerful data structure that handles set operations in a really fast time.
And the data structure itself is very simple too!
The data structure is used for many algorithms such as minimum spanning tree, Kruskal’s algorithm.

Leave a Reply

Your email address will not be published.