Intro to Depth First Search (DFS)

Depth first search is one of the popular graph traversal algorithms that work well for certain purposes.
As you can infer from the name DFS is quite different from BFS – breadth first search.
Therefore the applications of DFS are different from BFS.
In this post, I am going to discuss how DFS works with examples.
Please refer to my other post of BFS if you are interested in.

How does DFS work?

I am going to start by explaining DFS by comparing it with BFS.
Please note that BFS searches all the neighbors at the same level or degree before you move to the next level.
You can imagine the water ripple starts from the center and move on to the next circle gradually.
BFS always visits all of its neighbor nodes first before any others.

DFS, on the other hand, uses a different algorithm to traverse.
DFS always visits available new nodes if there is any and therefore it quickly moves away from the starting point.
Basically, DFS always tries to move on to the first available neighbor node before processing the others.
And it winds back to the parent node (start node) after processing children nodes (neighbor nodes).

Let’s take a look DFS example.

Example

In this example, I am going to use the following notation to mark the nodes to avoid duplicate visits.

UNDISCOVERED – node is not visited in the past
DISCOVERED – node is visited but not completely processed
PROCESSED – node is processed (you can do any custom process of the node if necessary)

I am going to use DFS to traverse the below graph starting from node 1.

depth first search example

From node 1, found node 2 and move on to the node.
Node 1 is marked as DISCOVERED but not PROCESSED yet.

depth first search example

We are at node 2 at the moment. Look for neighbors of the node.
Node 2 has two neighbors – nodes 1 and 4.
And only node 4 is marked as UNDISCOVERED at the moment.
Move onto node 4 after marking node 2 DISCOVERED.

depth first search example

From node 4 we see only one unvisited neighbor – node 5.
Mark node 4 DISCOVERED and move onto node 5.

Node 5 doesn’t have any unvisited neighbors.
Process node 5 (mark as PROCESSED) and wind back to the parent node – node 4.

process node 4 (mark as PROCESSED) and wind back to the parent node – node 2.

process node 2 (mark as PROCESSED) and wind back to the parent node – node 1.

Now, we are at the start node – node 1 – but there is still unvisited neighbor node – node 3.
Move onto node 3 and process the node since node 3 doesn’t have any unvisited neighbors.

Wind back to the parent node and now you can mark node 1 PROCESSED.

Internal Data Structure for DFS

DFS uses stack which is LIFO (Last in First Out) and that’s why it always processes the newest found node first.
When implemented recursively you don’t actually need to use stack explicitly because recursive calls can behave like stack.

Where do you use DFS for?

As you see DFS actually tries to search nodes in depth.
Thanks to its nature, DFS is good for problems like finding cycle, topological sort and etc.
I will discuss finding cycle and topological sort in a different post.

Code Example

#include <iostream>
#include <queue>
#include <vector>
#include <algorithm>
using namespace std;

enum NodeStatus
{
    UNDISCOVERED,
    DISCOVERED,
    PROCESSED
};

struct Node
{
    // default status is UNDISCOVERED
    NodeStatus status;
    
    // node id which is index of vector in this case
    int nodeId;
    
    // parent node id
    int parentNodeId;
    
    // set of edges belong to this node
    // destination node id for this edge
    vector<int> edges;
};

vector<Node> initializeGraph()
{
    vector<Node> nodes(5);
    for (int i = 0; i < 5; ++i)
    {
        nodes[i].status = UNDISCOVERED;
        nodes[i].nodeId = i;
    }
    
    // edges for node 0
    nodes[0].edges.push_back(1);
    nodes[0].edges.push_back(2);
    
    // edges for node 1
    nodes[1].edges.push_back(0);
    nodes[1].edges.push_back(3);
    
    // edges for node 2
    nodes[2].edges.push_back(0);
    
    // edges for node 3
    nodes[3].edges.push_back(1);
    nodes[3].edges.push_back(4);
    
    // edges for node 4
    nodes[4].edges.push_back(3);
    
    return nodes;
}

void dfs(vector<Node> &graph, int nodeId)
{
    cout << "current node id[" << nodeId << "] is discovered" 
         << endl;
    graph[nodeId].status = DISCOVERED;
    
    for_each(
        graph[nodeId].edges.begin(),
        graph[nodeId].edges.end(),
        [&graph, nodeId](int destNodeId)
        {
            if (graph[destNodeId].status == UNDISCOVERED)
            {
                cout << "Found next node[" << destNodeId << "] to visit" 
                     << endl;
                graph[destNodeId].parentNodeId = nodeId;     
                dfs(graph, destNodeId);
            }
            else if (graph[destNodeId].status != PROCESSED && 
                     graph[destNodeId].parentNodeId != nodeId)
            {
                // node is discovered but not processed yet.
                // this likely indicates cycle because this is not
                // windback case
            }
            
        });
        
    graph[nodeId].status = PROCESSED;
}

int main() 
{	
	vector<Node> graph = initializeGraph();

    // 0 is start point
	dfs(graph, 0);
	return 0;
}