Intro to Depth First Search (DFS)

Depth first search is one of the popular graph traversal algorithms that work well for certain purposes.
As you can infer from the name DFS is quite different from BFS – breadth first search.
Therefore the applications of DFS are different from BFS.
In this post, I am going to discuss how DFS works with examples.
Please refer to my other post of BFS if you are interested in.

How does DFS work?

I am going to start by explaining DFS by comparing it with BFS.
Please note that BFS searches all the neighbors at the same level or degree before you move to the next level.
You can imagine the water ripple starts from the center and move on to the next circle gradually.
BFS always visits all of its neighbor nodes first before any others.

DFS, on the other hand, uses a different algorithm to traverse.
DFS always visits available new nodes if there is any and therefore it quickly moves away from the starting point.
Basically, DFS always tries to move on to the first available neighbor node before processing the others.
And it winds back to the parent node (start node) after processing children nodes (neighbor nodes).

Let’s take a look DFS example.

Example

In this example, I am going to use the following notation to mark the nodes to avoid duplicate visits.

UNDISCOVERED – node is not visited in the past
DISCOVERED – node is visited but not completely processed
PROCESSED – node is processed (you can do any custom process of the node if necessary)

I am going to use DFS to traverse the below graph starting from node 1.

depth first search example

From node 1, found node 2 and move on to the node.
Node 1 is marked as DISCOVERED but not PROCESSED yet.

depth first search example

We are at node 2 at the moment. Look for neighbors of the node.
Node 2 has two neighbors – nodes 1 and 4.
And only node 4 is marked as UNDISCOVERED at the moment.
Move onto node 4 after marking node 2 DISCOVERED.

depth first search example

From node 4 we see only one unvisited neighbor – node 5.
Mark node 4 DISCOVERED and move onto node 5.

Node 5 doesn’t have any unvisited neighbors.
Process node 5 (mark as PROCESSED) and wind back to the parent node – node 4.

process node 4 (mark as PROCESSED) and wind back to the parent node – node 2.

process node 2 (mark as PROCESSED) and wind back to the parent node – node 1.

Now, we are at the start node – node 1 – but there is still unvisited neighbor node – node 3.
Move onto node 3 and process the node since node 3 doesn’t have any unvisited neighbors.

Wind back to the parent node and now you can mark node 1 PROCESSED.

Internal Data Structure for DFS

DFS uses stack which is LIFO (Last in First Out) and that’s why it always processes the newest found node first.
When implemented recursively you don’t actually need to use stack explicitly because recursive calls can behave like stack.

Where do you use DFS for?

As you see DFS actually tries to search nodes in depth.
Thanks to its nature, DFS is good for problems like finding cycle, topological sort and etc.
I will discuss finding cycle and topological sort in a different post.

Code Example

#include <iostream>
#include <queue>
#include <vector>
#include <algorithm>
using namespace std;

enum NodeStatus
{
    UNDISCOVERED,
    DISCOVERED,
    PROCESSED
};

struct Node
{
    // default status is UNDISCOVERED
    NodeStatus status;
    
    // node id which is index of vector in this case
    int nodeId;
    
    // parent node id
    int parentNodeId;
    
    // set of edges belong to this node
    // destination node id for this edge
    vector<int> edges;
};

vector<Node> initializeGraph()
{
    vector<Node> nodes(5);
    for (int i = 0; i < 5; ++i)
    {
        nodes[i].status = UNDISCOVERED;
        nodes[i].nodeId = i;
    }
    
    // edges for node 0
    nodes[0].edges.push_back(1);
    nodes[0].edges.push_back(2);
    
    // edges for node 1
    nodes[1].edges.push_back(0);
    nodes[1].edges.push_back(3);
    
    // edges for node 2
    nodes[2].edges.push_back(0);
    
    // edges for node 3
    nodes[3].edges.push_back(1);
    nodes[3].edges.push_back(4);
    
    // edges for node 4
    nodes[4].edges.push_back(3);
    
    return nodes;
}

void dfs(vector<Node> &graph, int nodeId)
{
    cout << "current node id[" << nodeId << "] is discovered" 
         << endl;
    graph[nodeId].status = DISCOVERED;
    
    for_each(
        graph[nodeId].edges.begin(),
        graph[nodeId].edges.end(),
        [&graph, nodeId](int destNodeId)
        {
            if (graph[destNodeId].status == UNDISCOVERED)
            {
                cout << "Found next node[" << destNodeId << "] to visit" 
                     << endl;
                graph[destNodeId].parentNodeId = nodeId;     
                dfs(graph, destNodeId);
            }
            else if (graph[destNodeId].status != PROCESSED && 
                     graph[destNodeId].parentNodeId != nodeId)
            {
                // node is discovered but not processed yet.
                // this likely indicates cycle because this is not
                // windback case
            }
            
        });
        
    graph[nodeId].status = PROCESSED;
}

int main() 
{	
	vector<Node> graph = initializeGraph();

    // 0 is start point
	dfs(graph, 0);
	return 0;
}

Intro to Breadth First Search (BFS)

When dealing with a graph it might be necessary to traverse the graph.
Graph traversal is a systemic method to traverse the graph.
Breadth first search is one of the most popular graph traversal algorithms that work very well for certain purposes.
Today, I am going to introduce breadth first search in this post.

How does breadth first search traverse?

BFS is a graph traversal algorithm to visit vertices from one starting point in a certain way.
How does it visit other nodes? It first visits its nearest neighbors and processes them before reaching any further nodes.
And it marks nodes as it visits so the same node won’t be visited more than once.
How it marks really depends on each implementation but I will use the following notation in this post.

UNDISCOVERED – node is not visited in the past
DISCOVERED – node is visited but not completely processed
PROCESSED – node is processed (you can do any custom process of the node if necessary)

BFS Example

Now, let’s take a look at the below example.
Let’s say we have a graph to traverse from node 1 using BFS.

Here are the algorithm steps for BFS

  1. Start from node 1 (or any node you would like to start), look at neighbors of node 1 and save the node 2 and 3 (you typically use a queue for this purpose. Enqueue for save and dequeue for pop) for next processing. Mark node 1 processed, node 2 and 3 as discovered so they are not enqueued more than once.
  2. Dequeue node 2 and enqueue neighbors of it which are node 1 and 4 but only node 4 will be enqueued since node 1 is already processed. Mark node 2 processed and node 4 discovered.
  3. Dequeue node 3 and enqueue neighbors which are node 1 and node 4 but nothing will be enqueued since node 1 is already processed and node 4 is discovered – already enqueued. Mark node 3 processed
  4. Dequeue node 4 and enqueue neighbors which are the node 2, 3 and 5 but only node 5 will be enqueued since 2 and 3 are already processed. Mark node 4 processed
  5. Dequeue node 5 and mark it processed (nothing to enqueue – node 4 is already processed)

Since it might be hard to fully understand with the above algorithm let’s take a look at each step.

Starting from node 1 take a look at unvisited neighbors.
By the way, an unvisited node means its status is UNDISCOVERED.
Enqueue unvisited neighbor nodes and mark them DISCOVERED.
Now node 1 should be marked as PROCESSED.

breadth first search example

Now dequeue node 2 (it could be node 3 depending on the enqueue order but it is node 2 here because I enqueued it first) and check unvisited neighbors of the node.
Enqueue node 4 and mark it DISCOVERED.
Although node 1 is the neighbor of node 2 we don’t enqueue it because it’s already marked PROCESSED.
Node 2 should be marked as PROCESSED.

breadth first search example

Dequeue node 3 and look for any unvisited neighbor node.
It looks like node 3 has two neighbors – nodes 1 and 4.
However, we don’t enqueue any node because node 1 is marked PROCESSED and node 4 is marked DISCOVERED.
Mark node 3 as PROCESSED.

breadth first search example

Dequeue node 4 and look for any unvisited neighbor node which is node 5 only.
Enqueue node 5 and mark it DISCOVERED.
Node 4 is also processed.

breadth first search example

Dequeue node 5 and there isn’t any unvisited or processed nodes at the moment.
Mark node 5 as PROCESSED and breadth first search is complete.

breadth first search example

Internal Data Structure for BFS

BFS uses queue which is FIFO (First in First Out). As you can already imagine it processes all of the neighbor nodes of the current node before moving on.
Therefore it radiates out from the starting node slowly.

Where do you use BFS for?

Now you might wonder where breadth first search algorithm is used for?
There are many BFS applications but I will just list two common ones here.

  1. Shortest path of an unweighted graph. Due to its nature BFS guarantees to find the shortest path of the unweighted graph. All you need to do is keep track of the parent node.
  2. The number of connected components in a graph. All the nodes may be connected for some graphs but there may be a graph that may have multiple components that are not connected. BFS could be very useful to find the number of connected components in a graph

Code Example

Here I provided a code example, finding the shortest path from the node.

#include <iostream>
#include <queue>
#include <vector>
#include <algorithm>
using namespace std;

enum NodeStatus
{
    UNDISCOVERED,
    DISCOVERED,
    PROCESSED
};

struct Node
{
    // default status is UNDISCOVERED
    NodeStatus status;
    
    // node id which is index of vector in this case
    int nodeId;
    
    // parent node id
    int parentNodeId;
    
    // set of edges belong to this node
    // destination node id for this edge
    vector<int> edges;
};

// this is really for testing purpose.
// I initialized graph as the above example.
vector<Node> initializeGraph()
{
    vector<Node> nodes(5);
    for (int i = 0; i < 5; ++i)
    {
        nodes[i].status = UNDISCOVERED;
        nodes[i].nodeId = i;
    }
    
    // edges for node 0
    nodes[0].edges.push_back(1);
    nodes[0].edges.push_back(2);
    
    // edges for node 1
    nodes[1].edges.push_back(0);
    nodes[1].edges.push_back(3);
    
    // edges for node 2
    nodes[2].edges.push_back(0);
    nodes[2].edges.push_back(3);
    
    // edges for node 3
    nodes[3].edges.push_back(1);
    nodes[3].edges.push_back(2);
    nodes[3].edges.push_back(4);
    
    // edges for node 4
    nodes[4].edges.push_back(3);
    
    return nodes;
}

// print each node as traversing
void bfs(vector<Node> &graph)
{
    queue<int> q;
    q.push(graph[0].nodeId);
    
    while (!q.empty())
    {
        int currentNodeId = q.front();
        q.pop();
        
        // mark it processed first to prevent enqueuing itself
        // for self-loop case
        graph[currentNodeId].status = PROCESSED;
        
        for_each(
            graph[currentNodeId].edges.begin(), 
            graph[currentNodeId].edges.end(),
            [&graph, &q, currentNodeId](int destNodeId)
            {
                if (graph[destNodeId].status == UNDISCOVERED)
                {
                    q.push(destNodeId);
                    graph[destNodeId].status = DISCOVERED;
                    graph[destNodeId].parentNodeId = currentNodeId;
                }
                
                // you can do any edge processing you want here
            });
            
        //cout << "processed node[" << currentNodeId << "]" << endl;
    }
}

void printShortestPath(int currentNodeId, const vector<Node> &graph)
{
    if (currentNodeId == 0)
    {
        cout << "root node[" << currentNodeId << "]" << endl;
    }
    else
    {
        cout << "node[" << currentNodeId << "]" << endl;
        printShortestPath(graph[currentNodeId].parentNodeId, graph);
    }
}

int main() 
{	
	vector<Node> graph = initializeGraph();

	bfs(graph);

	// print shortest path from node 0 -> node 4
	printShortestPath(graph.size() - 1, graph);
	return 0;
}

Conclusion

Breadth first search is a popular algorithm that is very good for certain applications such as finding the shortest path of an unweighted graph.
There is another graph traversal algorithm called depth first search which I will discuss in another post.