A rigorous explanation that anyone can follow
A directed graph (or digraph) has two things: a set of vertices and a set of edges, where each edge points one way. For example, a four-vertex graph with edges:
A → B, A → C, B → D, C → D
Two important terms we'll need: a path (a sequence of edges followed tip-to-tail) and a cycle (a path that returns to its starting vertex).
Imagine you're exploring a dark cave with a flashlight and a ball of string. Your strategy: walk down a passage as far as it goes; when you hit a dead end, follow the string back to the last junction and try the next unexplored passage.
That's DFS: go as deep as possible first, then backtrack.
DFS has two parts: a main loop and a recursive visit function.
Let's break down what happens when we call DFS-Visit(u): we stamp u's discovery time and paint it GRAY, recursively visit each undiscovered neighbor, then stamp u's finish time and paint it BLACK.
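The two parts — the main loop and DFS-Visit — can be sketched in Python. This is a minimal version (graph as an adjacency-list dict; the main loop visits vertices in insertion order, which is an assumption, not part of DFS itself):

```python
WHITE, GRAY, BLACK = "white", "gray", "black"

def dfs(graph):
    """Run DFS over every vertex; return discovery and finish times."""
    color = {v: WHITE for v in graph}
    d, f = {}, {}
    time = 0  # global clock, ticked once per discovery and once per finish

    def visit(u):
        nonlocal time
        time += 1
        d[u] = time          # discover u: paint it GRAY ("open the door")
        color[u] = GRAY
        for v in graph[u]:   # look at each outgoing edge u -> v
            if color[v] == WHITE:
                visit(v)     # go as deep as possible before trying siblings
        time += 1
        f[u] = time          # finish u: paint it BLACK ("lock the door")
        color[u] = BLACK

    for u in graph:          # main loop: restart at any still-WHITE vertex
        if color[u] == WHITE:
            visit(u)
    return d, f

# The example graph: A -> B, A -> C, B -> D, C -> D
g = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
d, f = dfs(g)
```

Note how the recursion itself is the "ball of string": returning from `visit(v)` is the backtracking step.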
Every vertex is in exactly one of three states at any moment during DFS: WHITE (not yet discovered), GRAY (discovered but not yet finished — we're still exploring it), or BLACK (finished — everything reachable from it has been explored). Think of coloring rooms in a house as you explore: a room is WHITE before you've entered it, GRAY while you're inside it (or somewhere deeper), and BLACK once you've left it for good.
At any moment during DFS, the GRAY vertices form a path from the starting vertex down to the vertex currently being explored. This is exactly the recursion stack — the trail of rooms you've entered but haven't finished yet.
If you ever encounter a GRAY vertex while exploring, you've found a cycle (a back edge), because you've walked in a circle back to a room you're still inside.
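That observation gives a cycle detector almost for free. A sketch (the helper name `has_cycle` is mine, not from the text; same three-color scheme, stripped down to what the test needs):

```python
WHITE, GRAY, BLACK = 0, 1, 2

def has_cycle(graph):
    """Return True iff the directed graph contains a cycle."""
    color = {v: WHITE for v in graph}

    def visit(u):
        color[u] = GRAY              # u is on the current recursion path
        for v in graph[u]:
            if color[v] == GRAY:     # back edge: a room we're still inside
                return True
            if color[v] == WHITE and visit(v):
                return True
        color[u] = BLACK             # everything below u is cycle-free
        return False

    return any(color[u] == WHITE and visit(u) for u in graph)

dag = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}  # no cycle
cyc = {"1": ["2"], "2": ["3"], "3": ["1"]}                # 1 -> 2 -> 3 -> 1
```

BLACK neighbors are deliberately skipped: an edge into a finished vertex can't close a cycle through the current path.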
We keep a global clock (a counter that starts at 0 and goes up by 1 each tick). We tick the clock at two moments for each vertex:
- d[u], the discovery time: the clock value when we first visit u (paint it GRAY). This is when we "open the door" to room u.
- f[u], the finish time: the clock value when we finish u (paint it BLACK). This is when we've explored everything reachable from u and "lock the door" behind us.
Since the clock ticks once for each discovery and once for each finish, and there are n vertices, the timestamps run from 1 up to 2n.
For any two vertices u and v, their discovery/finish intervals [d[u], f[u]] and [d[v], f[v]] are either completely disjoint, or one is nested entirely inside the other.
They never partially overlap. Think of it like matching parentheses: ( ( ) ) is OK, ( ) ( ) is OK, but a pattern like ( [ ) ] — one interval opening inside another yet closing outside it — is impossible.
Why? If we discover u first and then discover v before finishing u, that means we called DFS-Visit(v) from within the recursive call chain starting at u. So v must finish before u finishes. The interval for v is nested inside u's interval.
If there is an edge u → v in the graph, what can we say about finish times?
When we're exploring u (u is GRAY) and we look at the edge u → v, there are three cases: v is WHITE (we recurse into v, so v finishes before u does — f[v] < f[u]); v is GRAY (a back edge — the graph has a cycle); or v is BLACK (v already finished while u is still open, so again f[v] < f[u]).
Bottom line: In a DAG (no cycles), for every edge u → v, we always have f[v] < f[u].
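We can check this claim empirically on the earlier DAG. A bare-bones DFS that records only finish times (vertex order here is an assumption; the property holds regardless):

```python
def finish_times(graph):
    """Return each vertex's finish time from a DFS over the whole graph."""
    f, seen, time = {}, set(), 0

    def visit(u):
        nonlocal time
        seen.add(u)
        for v in graph[u]:
            if v not in seen:
                visit(v)
        time += 1
        f[u] = time  # u finishes only after all its descendants have

    for u in graph:
        if u not in seen:
            visit(u)
    return f

dag = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
f = finish_times(dag)
# For every edge u -> v in a DAG, v must finish strictly before u.
ok = all(f[v] < f[u] for u in dag for v in dag[u])
```

This is exactly why sorting a DAG's vertices by decreasing finish time yields a valid topological order.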
Let's trace DFS on this graph step by step, assuming the main loop and each adjacency list are scanned in alphabetical order. Watch how the timestamps nest!
Graph: A → B, A → C, B → D, C → D, D → E
| Vertex | A | B | C | D | E |
|---|---|---|---|---|---|
| d[v] | 1 | 2 | 8 | 3 | 4 |
| f[v] | 10 | 7 | 9 | 6 | 5 |
Imagine neighborhoods in a city with one-way streets. A strongly connected component is a neighborhood where, starting from any house, you can drive to any other house in that same neighborhood and drive back, following the one-way streets.
A Strongly Connected Component (SCC) of a directed graph G is a maximal set of vertices C ⊆ V such that for every pair of vertices u, v ∈ C, there is a directed path from u to v and a directed path from v to u.
Maximal means you can't add any more vertices and still have the property hold.
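The definition can be tested directly: u and v belong to the same SCC iff each can reach the other. A reachability-based sketch (quadratic in the worst case — fine for tiny graphs; Kosaraju's algorithm, covered later, does the whole job in linear time; helper names are mine):

```python
from collections import deque

def reachable(graph, s):
    """Set of vertices reachable from s, via BFS."""
    seen = {s}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                q.append(v)
    return seen

def same_scc(graph, u, v):
    """True iff u and v can each reach the other."""
    return v in reachable(graph, u) and u in reachable(graph, v)

# The example graph: 1->2, 2->3, 3->1, 2->4, 4->5, 5->4
g = {1: [2], 2: [3, 4], 3: [1], 4: [5], 5: [4]}
```

Here 2 can reach 4 (via 2→4) but 4 cannot get back, so they land in different SCCs.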
Let's see an example:
Edges: 1→2, 2→3, 3→1, 2→4, 4→5, 5→4. The SCCs are {1, 2, 3} and {4, 5}; the edge 2→4 is the only cross-SCC edge.
If you shrink each SCC into a single "super-vertex", the resulting graph (called the Component Graph) is always a DAG — it has no cycles.
Why? If there were a cycle among super-vertices, say SCC_A → SCC_B → ... → SCC_A, then every vertex in all those SCCs could reach every other vertex, so they should all be in the same SCC. That contradicts them being in separate SCCs.
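As a quick sanity check, we can build the component graph for the example above. The SCC labels are hardcoded from the example (two components, which I'll call "A" and "B"); only cross-SCC edges survive the shrinking:

```python
def condensation(edges, scc_of):
    """Shrink each SCC to one super-vertex; keep distinct cross-SCC edges."""
    super_edges = set()
    for u, v in edges:
        if scc_of[u] != scc_of[v]:           # edges inside an SCC vanish
            super_edges.add((scc_of[u], scc_of[v]))
    return super_edges

edges = [(1, 2), (2, 3), (3, 1), (2, 4), (4, 5), (5, 4)]
scc_of = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B"}  # labels from the example
comp = condensation(edges, scc_of)
```

The component graph here is a single edge A → B — and, as the argument above guarantees, no cycle.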
Kosaraju's algorithm finds all SCCs in a directed graph in O(V + E) time. Here are the three steps:
1. Run DFS on G, recording each vertex's finish time.
2. Compute the transpose GT by reversing every edge.
3. Run DFS on GT, starting new trees in decreasing order of the Step 1 finish times. Each DFS tree is exactly one SCC.
In city terms:
Step 1: Explore the whole city, noting what time you finish exploring each neighborhood. The neighborhoods you finish last are the ones you started from or that are "upstream" — they have roads leading out to other places.
Step 2: Magically reverse every one-way street in the city.
Step 3: Now explore again, but start from the neighborhood that finished last in Step 1. With the roads reversed, you can only reach vertices in the same SCC — the reversed roads that used to go OUT now go IN, so you can't "escape" to other SCCs. Each connected chunk you find is exactly one SCC.
GT has the same vertices as G, but every edge is reversed.
If G has edge u → v, then GT has edge v → u.
Crucial property: G and GT have exactly the same SCCs. If you can go from u to v and back in G, you can also go from u to v and back in GT (just using the reversed path).
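Computing the transpose is one pass over the edges (a sketch, using the same adjacency-list dicts as before):

```python
def transpose(graph):
    """Reverse every edge: u -> v in G becomes v -> u in GT."""
    gt = {u: [] for u in graph}   # keep the same vertex set
    for u in graph:
        for v in graph[u]:
            gt[v].append(u)       # flip the edge's direction
    return gt

g = {"A": ["B"], "B": ["C"], "C": ["A"]}  # a 3-cycle
gt = transpose(g)
```

A 3-cycle stays a 3-cycle after transposing — the same SCC, traversed the other way around.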
This is the most important chapter. We'll build the proof step by step, like stacking bricks.
We need to prove that in Step 3, each DFS tree in the forest is exactly one SCC. This means two things: every vertex of that SCC ends up in the tree (we miss no one), and no vertex from a different SCC sneaks in (we grab no extras).
For an SCC C, define f(C) = max { f[u] : u ∈ C }. That is, the finish time of the SCC is the largest finish time among all its vertices.
Claim: If there is an edge from a vertex in SCC C to a vertex in SCC C' (where C ≠ C'), then f(C) > f(C').
Why? Consider the DFS in Step 1. There are two sub-cases depending on which SCC gets discovered first:
Sub-case A: Some vertex in C is discovered before any vertex in C'. Then from C, DFS will follow the cross-SCC edge into C' and explore all of C' before coming back to finish C. So every vertex in C' finishes before the last vertex in C finishes. Therefore f(C) > f(C').
Sub-case B: Some vertex in C' is discovered first. But there's no edge from C' to C (if there were, combined with the edge from C to C', they'd be in the same SCC!). So DFS explores all of C' and finishes it entirely. Only later does DFS discover C. So again, f(C) > f(C').
In short: in the component graph (the DAG of SCCs), "upstream" SCCs have larger finish times.
In Step 3, we process vertices in decreasing order of their Step 1 finish times. By Lemma 1, this means we encounter the SCCs from "upstream" to "downstream" in G's component DAG.
We start Step 3's DFS from a vertex in the SCC with the highest finish time. By Lemma 1, this is an SCC that has no incoming edges from other SCCs in the component graph — it's a source in the component DAG.
Remember, Step 3 runs on the transpose graph GT. When we reverse all edges, every cross-SCC edge flips direction, so a source SCC in G's component DAG becomes a sink SCC in GT's component DAG. Precisely:
When we start a DFS from a vertex u in this "sink SCC" of GT, the DFS can reach all other vertices in u's SCC (since SCCs are the same in G and GT), but it cannot escape to any other SCC (because there are no outgoing cross-SCC edges from this SCC in GT).
Therefore, the DFS tree rooted at u contains exactly the vertices of u's SCC. Nothing more, nothing less.
Once we've identified the first SCC and colored all its vertices BLACK, the next unvisited vertex we try (the one with the next-highest finish time) belongs to the SCC with the next-highest f(C). By the same argument, that SCC is a sink among the SCCs still remaining in GT, so the DFS from it cannot escape and captures exactly its vertices.
This keeps repeating until all vertices are processed. Each DFS tree is exactly one SCC. ∎
"Why can't we just run DFS on G (without transposing) and call each tree an SCC?"
Because DFS on the original graph can escape from one SCC to another! If SCC C has an edge to SCC C', DFS starting in C will wander into C'. The transpose prevents this escape by flipping the cross-SCC edges.
"Why not just run DFS on GT in any order?"
Because if you start from a vertex in a non-sink SCC of GT, DFS could escape to other SCCs (following the transposed cross-SCC edges that now point outward). The decreasing-finish-time order ensures you always start from a sink of the remaining transposed component DAG.
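Putting the three steps together gives a compact implementation. This is a sketch in Python (recursive, so it assumes the graph fits within the recursion limit; the "stack of finish times" is just a list appended to at finish time):

```python
def kosaraju(graph):
    """Return the SCCs of a directed graph, each as a list of vertices."""
    seen = set()

    def visit(u, g, out):
        seen.add(u)
        for v in g[u]:
            if v not in seen:
                visit(v, g, out)
        out.append(u)  # appended exactly at u's finish time

    # Step 1: DFS on G, collecting vertices in increasing finish order.
    order = []
    for u in graph:
        if u not in seen:
            visit(u, graph, order)

    # Step 2: reverse every edge.
    gt = {u: [] for u in graph}
    for u in graph:
        for v in graph[u]:
            gt[v].append(u)

    # Step 3: DFS on GT in decreasing finish order; each tree is one SCC.
    seen.clear()
    sccs = []
    for u in reversed(order):
        if u not in seen:
            component = []
            visit(u, gt, component)
            sccs.append(component)
    return sccs

# The trace graph below: 1->2, 2->3, 3->1, 3->4, 4->5, 5->4
g = {1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [4]}
sccs = kosaraju(g)
```

Both phases reuse the same `visit` helper — only the graph and the output list change, which mirrors how little machinery the algorithm actually needs.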
Let's trace Kosaraju's algorithm on the graph 1→2, 2→3, 3→1, 3→4, 4→5, 5→4, assuming the main loop and each adjacency list are scanned in numeric order.
| Vertex | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| d[v] (Step 1) | 1 | 2 | 3 | 4 | 5 |
| f[v] (Step 1) | 10 | 9 | 8 | 7 | 6 |
| SCC | {1,2,3} | {1,2,3} | {1,2,3} | {4,5} | {4,5} |
In Step 3, the highest finisher is vertex 1; DFS from 1 on GT reaches exactly {1, 2, 3}. The next unvisited vertex is 4 (f = 7); DFS from 4 reaches exactly {4, 5}.
In Step 1, SCCs that can reach other SCCs get higher finish times (Lemma 1). In Step 3, we process high-finish-time vertices first on the reversed graph. On the reversed graph, the SCC with the highest finish time is a sink — it has no outgoing cross-SCC edges. So DFS from there can only reach vertices in that same SCC. After removing them, the next SCC becomes a sink, and the process repeats. Each DFS tree captures exactly one SCC.
The algorithm is just two DFS calls and one edge reversal. The magic lies in the ordering: Step 1's finish times tell us which SCC to peel off first, and the transpose ensures DFS can't wander across SCC boundaries. Together, they give us all SCCs in linear time.