Minimum Spanning Tree – Kruskal’s Algorithm

This post explains what minimum spnning tree is and how to construct minimum spanning tree using Kruskal’s algorithm

What is minimum spanning tree

Given you have a list of nodes in a graph and weighted edges connecting the nodes. What is the minimum cost to connect the nodes that all the nodes are connected? For example, looking at the graph below, all the nodes are connected. However, some edges can be removed so that total weight of the graph can be minimized and still connecting all the nodes. Minimum spanning tree is a graph that all the nodes are connected with the minimum cost of edges.

There are some characteristics of minimum spanning tree.
1. Any subset of solution cannot have cycle. Cycle means there is a redundant connection that can be dropped
2. Need to span all nodes
3. connected

How to construct minimum spanning tree?

Given a list nodes and weighted edges, how to construct minimum spanning tree? In this section, we will use Kruskal’s algorithm – greedy approach. Here is the algorithm.

  1. Start with lists of nodes and edges
  2. At each step, add the cheapest edge to the tree
  3. Skip the edge if it creates the cycle

The algorithm itself is surprisingly simple and you might wonder if this actually results in a correct minimum spanning tree. Since greedy algorithm is typically wrong for many problems, it requires proof that the algorithm is correct. I will explain the proof later in the post.

Now, let’s look at the code. The key part of the algorithm is
1. Need to be able to select the cheapest edges, which requires the edges to be sorted based on the cost
2. Need to be able to detect if the given edge creates any cycle

#1 is fairly easy since you just need to sort the algorithm. But what about detecting a cycle? Would you you typical BFS or DFS for this? Running BFS or DFS every time you add the edge could be quite expensive as it will always result in quadratic time complexity. Union Find will be used in this algorithm since time complexity of the union find is practically O(n). Please refer to this post if you are unfamiliar with it. Pseudo code is used for simplicity

// n is a list of nodes from id 0 to n-1
n = [0,1,2,...n-1]

// edges is a list of edges. u is source, v is dest, w is weight
// assume any n log n sorting algorithm is used to sort the edges in ascending order of w
edges = [(u,v,w)...]

// parents are sizes are used for union find. Refer to the union find post above
parents = [0,1,2,...,n-1]
sizes = [1,1,1,...1]

total_cost = 0
num_components = n
mst = []

for u,v,w in edges:
    rootU = find(u)
    rootV = find(v)
    // if rootU and rootV are same, adding this edge will create a cycle
    if rootU != rootV:
        if sizes[rootU] < sizes[rootV]:
            parents[rootU] = rootV
            sizes[rootV] += sizes[rootU]
            parents[rootV] = rootU
            sizes[rootU] += sizes[rootV]

        // decrease every time connecting different components. This will check if all the nodes are connected
        // keeping track of the total cost
        total_cost += w
        // edges used in minimum spanning tree

// not all nodes are connected 
if num_components != 1:
    return -1
// found minimum spanning tree
print (total_cost, mst)

Following things are happening in the above code.

  1. Iterating all the edges which is sorted in ascending order of weights
  2. Using union find, check if adding the edge will create any cycle
  3. If creating a cycle, skip
  4. else add the edge to the graph
  5. If all the nodes are connected (checked by num_components), minimum spanning tree is found

Time complexity of the algorithm:
1. O(n log n) for sorting the edges. Assume the edges are sparse
2. O(n) running the edges using union find
Total Time complexity is O(n log n)

Proof of Kruskal’s algorithm

I will prove the algorithm using proof by contradiction.

Suppose I constructed minimum spanning tree using Kruskal’s algorithm (greedy approach) – I will call this K. And suppose there is another minimum spanning tree that has less total cost – I will call this B. Now, looking at all the edges used in both K and B, it will look like this
K1, K2, K3, K(n-1) edges for graph K, sorted in ascending order
B1, B2, B3, B(n-1) edges for graph B, sorted in ascending order

Since we are claiming graph B is better than graph K, there will be some i-th edge that is different between K and B. Let’s call that K(i) and B(i). It means all the edges until i-1 are the same. There are two cases possible.

Case 1. K(i) > B(i)
At K(i), let’s say the edge value is 12 and edge value of B(i) is 9. Is this actually possible? Up to this point, all the edges are same. Now given I add edge in increasing order, the fact K(i) edge value 12 and B(i) edge value 9 means I didn’t add edge 9 to K. What would be the case I didn’t add it? The only case I skip the edge is when it creates a cycle. If you added all the edges up to this point and add edge 9, it creates a cycle which means the edge is redundant. Therefore, this case is not possible.

Case 2. K(i) < B(i)
Now let’s say edge value of K(i) is 9 and B(i) is 12 instead. Given all the edges up to this point exactly matching, adding edge 9 to K is the best choice because it doesn’t create a cycle. Since the edges are sorted in ascending order of weights, there isn’t any way graph B will be better than graph K. So adding edge 9 to K is the best choice indeed.

As you see in both cases, Kruskal’s algorithm actually works. The implementation is simple yet works perfectly using union find.

Union Find

There are two methods in order to traverse a graph – BFS, DFS. Let’s say there is a graph problem to count the number of connected components. Here are the steps you can take.
1. Build the graph
2. Perform BFS or DFS
3. Outer loop to check each unvisited node
4. Number of calls to BFS or DFS is equivalent to the number of connected components because BFS or DFS will mark the connected component as visited

Time complexity would be O(V+E) where V is the node and E is the edge. It is linear complexity.

This is a simple problem with a simple solution: BFS or DFS works perfectly.

Now, let’s tweak the problem a little bit. Let’s say there is a stream of edges instead of the entire graph from the beginning. And you are asked to provide the number of connected components as the new edges are coming in. You have to perform BFS or DFS every time new edges come in because the new edge may or may not decrease the number of connected components. Let’s say there is an edge connecting node between u and v. Two possibilities of this edge is
1. if u, v belong to the same connected component, the number of connected components is unchanged
2. if u, v belong to the different connected components, the number of connected components will decrease by one because the component which u belongs will be connected to component which v belongs (or vice versa)

Time complexity of naive approach would be O(V+E) * O(E) = O(VE + E^2). It would be quadratic regardless this is dense or sparse graph.

The question we ask is can there be a better way to solve the problem?

Naive approach of union find

Let’s have an array to keep track of its component ids. The size of the array will be the size of the nodes each node will be assigned a component id it belongs. With the setting, when new edges come in, we look up the table and perform the following actions.
1. if they (two nodes in the edge) belong to the same component (have the same component id), then skip.
2. if they belong to different component, then update the component id.

Here is the psuedo code for the updating component id

componentid = array(size V), initialized to node id (node id will be component id)

for each edge(u,v):
    if componentid[u] != componentid[v]: // find operation
      cid = componentid[u]
      for each i in 0 to n-1: // n is number of nodes. line 6-8 is merge (or union) operation
          if componentid[i] == cid:
              componentid[i] = componentid[v]

For each edge, number of union operation is n-1. The total cost of union operation would be O(VE) which is O(V^2) assuming this is sparse graph. For each edge, number of find operation is E and the total cost of find operation would be O(E) since find operation itself is O(1). All together, the total cost would be O(V^2) + O(E) = O(V^2). The code is simpler yet the complexity is still the same because union operation is expensive. Is there any way to improve union operation?

Optimized union operation

If you think about it, we actually don’t need to iterate every single node in a component and update the id. Let’s say there is a root node in a component, then we only need to update the root node component id. Let’s take a look at the example below.

In the above graph, there will be an edge between u and v nodes. Now, when updating connected component, you only need to update component id of root node of u. Each find operation will look up the component id of root node then update. How do you find the root node of component? Just keep looking at parents where node id == parent[node id]. Find operation will not be O(1) but union operation will be O(1) now because you always update 1 node for the operation. Here is the optimized version of pseudo code.

for each edge(u,v) in edges:
    rootU = find(u)
    rootV = find(v)
    if rootU != rootV:
        componentid[rootV] = componentid[rootU]
function find(x):
    while componentid[x] != x:
        x = componentid[x]
    return x

Let’s look at the complexity of the operation.
The cost of each union operation is O(1) and there are O(E) edges. Total cost of union operation is O(E).
The cost of find operation is O(V) in worst case (where the tree is like a linked list, single chain) and there are O(E) edges. The total cost of find operation in worst case would be O(VE) which is O(V^2) assuming sparse graph.

We are heading to the correct direction but the time complexity didn’t improve which means there needs further optimization.

Union find with merging rules

If you blindly merge one tree to another, you might end up a single chain, linked list. The best case of tree structure would be a balanced tree where the height is O(log n). How can we achieve almost balanced tree to make find operation O(log n)?

Here is the claim of the rule:
When merging together two components, you will achieve O(log n) if you make the root of the smaller component point to the root of the larger component.

Let’s prove if the claim is correct. Let’s say there are two connected components (trees) – u and v

With the above setting, we need to prove: h <= log(N(u) + N(v)) where h is height of the new component (tree)
This will be a proof by induction.

Base case:
1. All nodes are disconnected: since all nodes are disconnected there is only one node per component. height of the component is 0 and it is true that 0 <= log 1 (only 1 node in the tree)
2. There is an edge between two nodes from step 1. In this case, h=1 and n=2. Claim is still true after first union operation.

After the above steps, there are two possible cases
1. two disconnected nodes are joined. Same as 1st step above
2. 2 nodes and 1 node join which becomes n=3, h=1 and 1 <= log3 according to the rule.

Assuming the claim is true after i union operations, we need to prove: h <= log n after i+1 union operation. Let’s note height of V tree as H(v) and height of U tree as H(u). There are two possible cases of the trees (u and v) where H(v) > H(u) or H(v) < H(u). With that said, the height of newly joined tree will be like this: h = max(H(v), H(u)+1). Note that number of nodes in V are greater than u here.

Case 1. H(v) > H(u). In this case, joining u tree to v doesn’t change the height and new height of the result tree is still H(v). We know that H(v) <= log N(v) (induction hypothesis) which also makes the following statement true H(v) <= log (N(v) + N(u)). New height h is H(v) so finally this guarantees h <= log(N(v) + N(u)).

Case 2. H(v) < H(u). h (new height) = H(u) + 1. based on induction hypothesis, we know H(u) <= log N(u) is true. Let’s add some tweak to the hypothesis: H(u) + 1 <= log N(u) + 1. H(u) + 1 is new height h. log N(u) + 1 would be log 2N(u). Then, it leads to this equality:
h <= log 2N(u). Given N(u) < N(v), this is also true: h <= log (N(u) + N(v)), which guarantees new height will be lower than log (N(u) + N(v)).

Based on the above proof, we know that the height will be always balanced if two trees are merged according to the rule.

With that said, the total complexity will be
(n-1) * cost of union op + E * cost of find op = (n-1) * O(1) + E * O(log n) = O (n log n)

Here is the pseudo code based on all the optimization

parents = a 1D array of size n
sizes = a 1D array of size n

for i in 0 to n-1:
    parents[i] = i
    sizes[i] = 1

for (u, v) in edges:
    rootU = find(u)
    rootV = find(v)
    if rootU != rootV:
        if sizes[rootU] < sizes[rootV]:
            parents[rootU] = rootV
            sizes[rootV] += sizes[rootU]
            parents[rootV] = rootU
            sizes[rootU] += sizes[rootV]
function find(x):
    while parents[x] != x:
        x = parents[x]
    return x

Time complexity is O(n log n), slightly slower than DFS or BFS but you don’t need to build graph for traversal. In addition to that, when graph changes dynamically, union find is better than DFS or BFS

Final optimization – Path Compression

There is one final optimization. You might have noticed that we are traversing to find the parent node every time we merge. However, what if we update the parent node id to the root node after traversing? It is slightly more work in the beginning but will save time for next root node search. In other words, we can compress the path (flattening the tree). The updated code will use recursion for simplicity.

function find(x):
    // base case
    if x == parents[x]:
        return x
    // recursive case
    rootX = find(parent[x])
    parent[x] = rootX
    return rootX

Time complexity of the find is still O(log n). However, with many traversals, this practically makes it O(1)


This is it for the union find. DFS or BFS is typically used for many graph problems. However, there are cases when DFS or BFS doesn’t work perfectly and Union Find can actually improve for the better performance with simple code. I hope this helped you understand union find better. Thank you for reading.

Dijkstra’s Algorithm Shortest Path

Dijkstra’s algorithm is the most popular algorithm to find the shortest paths from a certain vertex in a weighted graph.
In fact, the algorithm will find the shortest paths to every vertex from the start vertex.
The algorithm takes both greedy and dynamic programming approach that it will try to find local optimum paths first and use them to find the shortest paths.
Let’s take a look at how it works!

How does Dijkstra’s algorithm work?

Let’s say we need to find the shortest path from node A to node E.

dijkstra's algorithm shortest path

There are five nodes and each edge is weighted with the cost to traverse from a node to a node.
For example, it costs 5 to traverse from node A to B.

What would be the most efficient way to find out the shortest path?
In order to find the shortest path from node A to E, we need to find out what is the shortest path to nodes that are right before E – A, B, D.
Can you see where we are going?
In order to find out the shortest path to nodes B, D (I skipped A since it’s the start node), we need to find the shortest path to nodes right before – A, C.

Let’s take a look at each step.

dijkstra's algorithm shortest path

From node A, we can reach to nodes B, C, D, and E.
But let’s start with the nodes B and C first.
What is the shortest path from A to B? There is only one path and it costs 5.
What is the shortest path from A to C? There is only one path and it costs 1.
The nodes B and C are very simple but it’s very important because this will be stepping stone to find the shortest paths to further nodes.

dijkstra's algorithm shortest path

Now, based on the shortest paths to the node C, we can find out the shortest path from the node A to the node D.
There are two paths – A to D directly and A to C to D.
As we can see the shortest path from A to D is A -> C -> D with the total cost 3.

Finally, we have found all the shortest paths from A to B, C, and D which are the nodes right before E.
Based on the shortest paths we found so far, we just need to find which is the cheapest way to reach to E.

dijkstra's algorithm shortest path

There are a total of four possible routes from A to E.

  • A -> E : 10
  • A -> B -> E : 11
  • A -> D -> E : 9
  • A -> C -> D -> E : 8

The shortest path from A to E is A -> C -> D -> E with the cost 8.

The graph itself is pretty simple.
However, the steps we took to find the shortest path is like this.
It really finds the shortest paths to adjacent nodes of A then builds the path based on previous best paths each round.

  • Start from the start node A
  • Just like breadth-first search, find shortest paths to adjacent nodes which are B, C, D, and E.
  • Make sure you keep track of those records.
  • From nodes B, C, and D, find the shortest path to the node E.

Code Example

This code example has the implementation of Dijkstra’s algorithm which uses the graph in the above example.

#include <iostream>
#include <vector>
#include <limits>
#include <algorithm>

using namespace std;

struct Edge
    int destNodeId;
    int weight;
    Edge(int _destNodeId, int _weight) : 
        destNodeId(_destNodeId), weight(_weight)

struct Node
    // node id which is index of vector in this case
    int nodeId;
    // set of edges belong to this node
    vector<Edge> edges;

    // indicates if the node is processed - the shortest path found.    
    bool processed;
    Node(int _nodeId, vector<Edge> _edges) : 
        nodeId(_nodeId), edges(_edges), processed(false)

struct Graph
    // list of nodes in this graph
    vector<Node> nodes;
    // indicates the shortest paths up to this node so far.
    vector<int> shortestDistSoFar;
    // indicates parent node id for each node id, which is represented by index
    vector<int> parentNodes;

void initializeGraph(Graph &graph)
    // initialize node A
    vector<Edge> edges;
    edges.emplace_back(1, 5);
    edges.emplace_back(2, 1);
    edges.emplace_back(3, 4);
    edges.emplace_back(4, 10);
    graph.nodes.emplace_back(0, edges);
    // initialize node B
    edges.emplace_back(4, 6);
    graph.nodes.emplace_back(1, edges);
    // initialize node C
    edges.emplace_back(3, 2);
    graph.nodes.emplace_back(2, edges);
    // initialize node D
    edges.emplace_back(4, 5);
    graph.nodes.emplace_back(3, edges);
    // initialize node E
    graph.nodes.emplace_back(4, edges);
    graph.shortestDistSoFar.resize(5, numeric_limits<int>::max());
    graph.parentNodes.resize(5, -1);

vector<Node> dijkstra(Graph &graph, int startNodeId)
    int currNodeId = startNodeId;
    // distance from A to A is 0
    graph.shortestDistSoFar[currNodeId] = 0;
    // A doesn't have the parent node
    graph.parentNodes[currNodeId] = 0;

    while (!graph.nodes[currNodeId].processed)
        graph.nodes[currNodeId].processed = true;
        // take a look at adjacent nodes
            [&graph, currNodeId](const Edge &destNodeEdge)
                // an adjacent node from current node being processed
                int destNodeId = destNodeEdge.destNodeId;
                // weight of the edge
                int weight = destNodeEdge.weight;
                // if this is shorter than the record, update it
                if (graph.shortestDistSoFar[currNodeId] + weight < graph.shortestDistSoFar[destNodeId])
                    graph.parentNodes[destNodeId] = currNodeId;
                    graph.shortestDistSoFar[destNodeId] = graph.shortestDistSoFar[currNodeId] + weight;
        // find next node to process
        // need to process shortest distance first
        int minDistSoFar = numeric_limits<int>::max();
            [&currNodeId, &graph, &minDistSoFar](const Node &node)
                if (!node.processed && graph.shortestDistSoFar[node.nodeId] < minDistSoFar)
                    minDistSoFar = graph.shortestDistSoFar[node.nodeId];
                    currNodeId = node.nodeId;
    vector<Node> shortestPath;
    // need to find shortest path by recursively tracking parent node
    currNodeId = 4;
    while (graph.parentNodes[currNodeId] != currNodeId)
        currNodeId = graph.parentNodes[currNodeId];
    // make sure A is in the path
    reverse(shortestPath.begin(), shortestPath.end());
    return shortestPath;

int main()
    Graph graph;
    // dijkstra from A of which node id is 0
    vector<Node> shortestPath = dijkstra(graph, 0);
        [](const Node &node)
            switch (node.nodeId)
            case 0:
                cout << "A -> ";
            case 1:
                cout << "B -> ";
            case 2:
                cout << "C -> ";
            case 3:
                cout << "D -> ";
            case 4:
                cout << "E ";
    cout << endl;
    return 0;


Time complexity: O(n^2)
It’s because the algorithm needs to visit each node to find the shortest path.
(You can simply find out there is nested loop)


The algorithm works correctly as long as there aren’t any negative weight edges.
The reason is that the huge negative edge may change the shortest path record to certain edge which was already processed.
But I assume there won’t be many cases that graph will have negative edges.
Thank you for reading the post and please let me know if you have any questions/suggestions.

Intro to Topological Sort

Topological sort is an important algorithm in DAG – directed acyclic graphs.

It orders each vertex in line from left to right such that left most vertex could be considered as the highest level or root or entrance of the graph and rightmost as the lowest of the graph.

In the end, topological sort tells you the order of vertices you need to go through from one node to the other without missing any middle vertices.

There could be multiple topological sorts in the same graph.

Let’s take a look at an example for a better understanding

Course Prerequisite Example

topological sort

The course prerequisite is a great example of this kind of problem.

Let’s say there are 4 courses A, B, C, D and here are some prerequisites for each course.

  1. Course C requires A, B, D
  2. Course B requires A
  3. CourseD requires A, B
  4. A is a fundamental course you need to complete before any courses

With that in mind, in order to reach from A to C, we need to follow these steps.

  1. We see a path to node C directly from A but we also see B and D
  2. Visit B
  3. From B we see a path to C and D
  4. Visit D
  5. Now the order of node visit is A, B, D, C which is the correct order for course completion

However, the above steps are not complete yet for topological sort algorithm since some intuition plays a role there.
Here is the correct algorithm.

Topological Sort Algorithm

  1. Visit a node and mark it visited (but not processed yet)
  2. Search any nodes connected to the node from #1 and visit only if it’s not visited yet
  3. If you see any visited but not processed node then there is a cycle and you cannot find topological sort. (It will be an infinite loop!)
  4. Mark the node processed and push it into stack
  5. After all the nodes are processed pop each value from stack and pop order is the correct order for topological sort

Code Example

enum Status

struct Node
    Status      status;
    vector<int> destNodeIds;

// return false if there is a cycle
bool findOrderHelper(int currNodeId, unordered_map<int, Node> &nodes, vector<int> &order)
    auto it = nodes.find(currNodeId);
    it->second.status = DISCOVERED;

    bool canPublish = true;

        [&nodes, &canPublish, &order](int destNodeId)
            auto it = nodes.find(destNodeId);
            if (it->second.status == UNDISCOVERED)
                canPublish &= findOrderHelper(destNodeId, nodes, order);
            else if (it->second.status == DISCOVERED)
                canPublish = false;

    it->second.status = PROCESSED;
    return canPublish;

vector<int> findOrder(int numCourses, vector<vector<int>>& prerequisites) 
    vector<int> order;

    // nodes is a graph constructed from prerequisites vector
    unordered_map<int, Node> nodes;
        [&nodes](const vector<int> &prerequisite)
            auto it = nodes.insert(make_pair(prerequisite.back(), Node()));

            nodes.insert(make_pair(prerequisite.front(), Node()));

    // now iterate nodes (graph) to find topological sort
    bool canPublish = true;
        [&nodes, &order, &canPublish](const pair<int, Node> &nodeInfo)
            if (nodeInfo.second.status == UNDISCOVERED)
                canPublish &= findOrderHelper(nodeInfo.first, nodes, order);

    if (!canPublish)
    else if (order.size() != numCourses)
        // courses don't have many prerequisites
        for (int i = 0; i < numCourses; ++i)
            auto it = find(order.begin(), order.end(), i);
            if (it == order.end())

    reverse(order.begin(), order.end());

    return order;

Intro to Graphs

A graph is an abstract representation of a complex system such as networks, human relationships, and any organization.
A graph is very important because it can represent and simplify any complex relationships in a visual format.
It is pretty amazing that messy applied problems often could be described with simple and classical graph properties.

In this post, I am going to discuss some popular graphs and brief examples.

Graph Examples

A graph G can be represented as a set of vertices and edges which can be said as G = (V, E)
Now, let’s take a look at various graphs and their properties.

Undirected vs. Directed Graph

undirected vs directed graph

A graph G = (V, E) is undirected if an edge between two vertices is equally applicable.
In other words, if you can travel from one vertex to another vertex in both directions with one edge it’s an undirected graph.
If not, it is a directed graph.
In an undirected graph, you can visit A from B, C, D whereas there isn’t any vertex that can visit A in a directed graph.

Weighted vs. Unweighted Graph

weighted vs unweighted graph

If each edge in a graph is assigned with numerical value then it is considered as a weighted graph.
Each edge may have different weights and therefore the costs of visiting node may be different even if the number of edges are the same.
If each edge in a graph is not assigned with numerical value then it is an unweighted graph. (All edges are assumed to be of equal weight)

As you can see above graph, the shortest path from A to C in the unweighted graph is edge A, C.
However, the story is different in weighted graph.
Since it costs 10 to use direct path it is cheaper to visit B and then C which is 5 (4 + 1)

Cyclic vs. Acyclic Graph

An acyclic graph does not have any cycles.
There is a cycle in a graph if you can visit a vertex and come back to the same vertex while traversing.

In the above cyclic graph example you can see that starting from vertex A you can visit C, B, and can come back to A which is a cycle.
But there isn’t any cycle in the acyclic graph.

Code Example

Now, let’s discuss what kind of data structure we can use in order to represent a graph.
First, we need information about all the vertices and edges associated with each vertex.
In that sense, it would be reasonable to define a user-defined type for each edge.
And each vertex will know which edges are associated with it.

Here is a sample code representing data structure for a graph.
There could be many other ways as this is just one way to represent it.

struct Edge
    // destination vertex id for this edge
    // it could be any other data structure to represent destination
    int vertexId;
    // weight of the edge if application. 0 means unweighted.
    int weight;

struct Vertex
    // all the edges associated with this vertex
    vector<Edge> edges;    

struct Graph
    // here I use index of each vertex as vertex id for simplicity
    // but you can use your own way to represent it differently.
    vector<Vertex> vertices;
    // is this directed graph?
    bool directed;


There are many other types such as simple vs non-simple, sparse vs dense and etc which are not discussed here.
I just wanted to show very basic concepts of important graphs and their properties.