[Algorithm Notes] Solving the Lowest Common Ancestor (LCA) Problem – Doubling Algorithm

Translation Notice
This article was machine-translated using DeepSeek-R1.

  • Original Version: Authored in Chinese by myself
  • Accuracy Advisory: Potential discrepancies may exist between translations
  • Precedence: The Chinese text shall prevail in case of ambiguity
  • Feedback: Technical suggestions regarding translation quality are welcomed

Preface

The Lowest Common Ancestor, abbreviated as LCA, is the deepest node that is a common ancestor of two given nodes in a tree.

This algorithm has wide applications, such as efficiently solving the shortest path problem on trees.

For convenience, we denote the LCA of a node set $S=\{v_1,v_2,\ldots,v_n\}$ as $\text{LCA}(v_1,v_2,\ldots,v_n)$ or $\text{LCA}(S)$.

Partial content references OI Wiki. All algorithms in this article are implemented in C++.

Example Problem: Luogu P3379 【Template】Lowest Common Ancestor (LCA)

Properties

  1. $\text{LCA}(\{u\})=u$;
  2. $u$ is an ancestor of $v$ if and only if $\text{LCA}(u,v)=u$;
  3. If neither $u$ is an ancestor of $v$ nor $v$ is an ancestor of $u$, then $u$ and $v$ reside in two different subtrees of $\text{LCA}(u,v)$;
  4. In pre-order traversal, $\text{LCA}(S)$ appears before all elements in $S$, while in post-order traversal, it appears after all elements in $S$;
  5. The LCA of the union of two node sets is the LCA of their individual LCAs: $\text{LCA}(A\cup B)=\text{LCA}(\text{LCA}(A), \text{LCA}(B))$;
  6. The LCA of two nodes always lies on the shortest path between them;
  7. $d(u,v)=h(u)+h(v)-2h(\text{LCA}(u,v))$, where $d$ is the distance between two nodes on the tree, and $h$ represents the distance from a node to the root.

Solving Algorithms

Prerequisite 1: Adjacency List Storage of Trees

In simple terms, adjacency list storage means for each node, store all nodes directly reachable via one directed or undirected edge.

Traditional implementations use linked lists (or simulated linked lists), which can be cumbersome and error-prone. For better readability, we use vector from the STL.

1
2
3
4
#include <vector> // Requires STL vector
#define maxn 100005 // Maximum number of nodes

std::vector<int> G[maxn];

To add an undirected edge $u\leftrightarrow v$:

1
2
G[u].push_back(v);
G[v].push_back(u);

To add a directed edge $u\to v$:

1
G[u].push_back(v);

Traverse all nodes directly reachable from $v$:

1
2
for(int u: G[v])
    cout << u << endl;

Prerequisite 2: DFS Traversal & Node Depth Calculation

Both algorithms require preprocessing node depths.
The depth of a node is defined as its distance from the root.

Preprocessing all node depths is straightforward using a tree DP approach. Let $h_u$ denote the depth of node $u$:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
#include <cstdio>
#include <vector>
#define maxn 100005
using namespace std;

vector<int> G[maxn]; // Adjacency list
int depth[maxn]; // Depth of each node

void dfs(int v, int par) // dfs(current node, parent node)
{
    int d = depth[v] + 1; // Child depth = current depth + 1
    for(int u: G[v])
        if(u != par) // Without this check, infinite recursion occurs
        {
            depth[u] = d; // Update child depth
            dfs(u, v); // Continue DFS
        }
}

int main()
{
    // Build the graph
    // ...
    // Assume graph is stored in G:
    int root = 0; // Default root is node 0; adjust as needed
    dfs(root, -1); // Root has no parent (-1)
    return 0;
}

Naive Algorithm

Let $u,v$ be the nodes whose LCA is sought. Preprocess each node’s parent (denoted as $f_v$).

Steps:

  1. Equalize depths of $u$ and $v$: Move the deeper node upward until both depths match.
  2. While $u \ne v$: Set $u \gets f_u$, $v \gets f_v$.
  3. When $u = v$, this value is the LCA.

Time Complexity:

  • Preprocessing: $\mathcal O(N)$ via DFS.
  • Per Query: Worst-case $\mathcal O(N)$, average $\mathcal O(\log N)$ (for random trees with height $\lceil\log N\rceil$).

Reference Code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
#include <cstdio>
#include <vector>
#include <algorithm>
#define maxn 500005
using namespace std;

vector<int> G[maxn];
int depth[maxn], par[maxn];

void dfs(int v)
{
    int d = depth[v] + 1;
    for(int u: G[v])
        if(u != par[v])
        {
            par[u] = v, depth[u] = d;
            dfs(u);
        }
}

int lca(int u, int v)
{
    if(depth[u] < depth[v])
        swap(u, v);
    while(depth[u] > depth[v])
        u = par[u];
    while(u != v)
        u = par[u], v = par[v];
    return u;
}

int main()
{
    int n, q, root;
    scanf("%d%d%d", &n, &q, &root);
    for(int i=1; i<n; i++)
    {
        int u, v;
        scanf("%d%d", &u, &v);
        G[u].push_back(v);
        G[v].push_back(u);
    }
    par[root] = -1, depth[root] = 0;
    dfs(root);
    while(q--)
    {
        int u, v;
        scanf("%d%d", &u, &v);
        printf("%d\n", lca(u, v));
    }
    return 0;
}

This code results in TLE on the last four test cases due to worst-case $\mathcal O(NQ)$ complexity when $N,Q\le 5\times 10^5$.

TLE

Doubling Algorithm

The doubling algorithm improves the naive approach and is the classic LCA solution.

Preprocessing:

  • Let $\text{fa}_{x,i}$ denote the $2^i$-th ancestor of node $x$.
  • During DFS, preprocess $\text{fa}_{x,i}$:
    • For $i=0$, $\text{fa}_{x,0}$ is the parent of $x$.
    • For $i>0$, $\text{fa}_{x,i} = \text{fa}_{\text{fa}_{x,i-1}, i-1}$.

Query Steps:

  1. Equalize depths using binary lifting.
  2. If $u = v$ after equalizing, return $u$.
  3. Simultaneously move $u$ and $v$ upward until their ancestors converge. The LCA is $\text{fa}_{u,0}$.

Time Complexity:

  • Preprocessing: $\mathcal O(N \log N)$.
  • Per Query: $\mathcal O(\log N)$.

Reference Code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
#include <cstdio>
#include <vector>
#include <cmath>
#define maxn 500005
using namespace std;

vector<int> G[maxn];
int fa[maxn][19]; // 2^19=524288
int depth[maxn];

void dfs(int v, int par)
{
    fa[v][0] = par;
    int d = depth[v] + 1;
    for(int i=1; (1<<i)<d; i++)
        fa[v][i] = fa[fa[v][i - 1]][i - 1];
    for(int u: G[v])
        if(u != par)
            depth[u] = d, dfs(u, v);
}

inline int lca(int u, int v)
{
    if(depth[u] < depth[v])
        u ^= v ^= u ^= v;
    int m = depth[u] - depth[v];
    for(int i=0; m; i++, m>>=1)
        if(m & 1)
            u = fa[u][i];
    if(u == v) return u; // This line is essential
    for(int i=log2(depth[u]); i>=0; i--)
        if(fa[u][i] != fa[v][i])
            u = fa[u][i], v = fa[v][i];
    return fa[u][0];
}

int main()
{
    int n, q, root;
    scanf("%d%d%d", &n, &q, &root);
    for(int i=1; i<n; i++)
    {
        int x, y;
        scanf("%d%d", &x, &y);
        G[--x].push_back(--y);
        G[y].push_back(x);
    }
    depth[--root] = 0;
    dfs(root, -1);
    while(q--)
    {
        int u, v;
        scanf("%d%d", &u, &v);
        printf("%d\n", lca(--u, --v) + 1);
    }
    return 0;
}

AC

Exercises

Summary

This article details LCA problem-solving using two algorithms. Comparison:

Algorithm Preprocessing Time Per Query Time1 Space Passes Example2?
Naive $\mathcal O(N)$ $\mathcal O(N)$ $\mathcal O(N)$
Doubling $\mathcal O(N \log N)$ $\mathcal O(\log N)$ $\mathcal O(N \log N)$ ✔️

Your support through likes and shares is appreciated!

Built with Hugo
Theme Stack designed by Jimmy