【Algorithm Notes】Kruskal/Prim Algorithms — Solving Minimum Spanning Tree Problems

Translation Notice
This article was machine-translated using DeepSeek-R1.

  • Original Version: Authored in Chinese by myself
  • Accuracy Advisory: Potential discrepancies may exist between translations
  • Precedence: The Chinese text shall prevail in case of ambiguity
  • Feedback: Technical suggestions regarding translation quality are welcomed

Preface

We often encounter problems like this in real life:

Highway Construction
There are several cities, and highways need to be built among them. Any pair of cities can be connected by a bidirectional road, and each such road has a specific construction cost. How can we build roads with minimal total cost while ensuring that all cities are connected, directly or indirectly?

We can abstract this problem by representing cities as vertices and roads as weighted undirected edges, forming a weighted undirected graph. Since an optimal road network is connected and contains no cycles (removing any edge of a cycle would keep the cities connected while lowering the cost), the constructed roads must form a spanning tree, transforming the problem into:

Given a weighted undirected graph $G$, find its spanning tree where the sum of all edge weights is minimized.

This is the well-known Minimum Spanning Tree (MST) problem.
Consider the following graph:

MST-Graph

The green edges form its minimum spanning tree.

Mathematicians have studied this problem extensively. However, algorithms that were merely correct, without also being simple, practical, and efficient, gradually fell into obsolescence. Two stood the test of time: Kruskal and Prim.

Template Problem: Luogu P3366 【Template】Minimum Spanning Tree
Data Constraints: $N\le5000,M\le2\times10^5,w\le 10^4$.

Kruskal

The Kruskal algorithm, proposed by Joseph Kruskal in 1956, has a time complexity of $\mathcal O(m\log m)$. Let’s examine its workflow.

Kruskal Algorithm Workflow

  1. Sort all edges in ascending order of weight and iterate through them.
  2. For each edge, if adding it to the current subgraph doesn’t form a cycle, include it as part of the MST.
  3. The algorithm terminates after selecting $N-1$ edges (if the edges run out first, the graph is disconnected and no spanning tree exists).

Disjoint-Set Union (DSU) — Accelerating the Algorithm

Before implementing Kruskal, we need DSU. Using DFS for cycle detection would result in $\mathcal O(nm + m\log m)$ time complexity, which is inefficient. DSU reduces this to $\mathcal O(m\log m)$ (dominated by sorting). DSU Template:

class dsu
{
private:
    const int n;
    int* fa;
public:
    inline dsu(int count): n(count) // Initialize DSU with size n
    {
        fa = new int[n]; // Allocate memory
        for(int i=0; i<n; i++)
            fa[i] = i; // Initialize fa[i]=i
    }
    inline ~dsu() { delete[] fa; }  // Prevent memory leaks
    inline int size() { return n; } // Return DSU size
    int find(int x) { return fa[x] == x? x: fa[x] = find(fa[x]); } // Find with path compression
    inline bool same(int x, int y) { return find(x) == find(y); } // Check connectivity
    inline void merge(int x, int y) { fa[find(x)] = find(y); } // Merge two components
    inline bool connected() // Check if the entire graph is connected
    {
        int p = find(0);
        for(int i=0; i<n; i++)
            if(find(i) != p)
                return false;
        return true;
    }
};

Using DSU, the algorithm’s time complexity becomes $\mathcal O(m\log m)$. Now, let’s see the code.

Reference Code

For readers unfamiliar with DSU, feel free to copy the template first and study it later.
For a single Kruskal run, a priority_queue can beat sort, since the loop may stop as soon as $n-1$ edges are selected without ordering the remaining edges. For multiple runs, pre-sort the edges once.

#include <cstdio>
#include <queue>
using namespace std;

// Represents an edge for sorting
struct Edge
{
    int from, to, weight;
    inline bool operator <(const Edge& e2) const
    {
        return weight > e2.weight; // Note: Reverse order for priority_queue
    }
    inline void read()
    {
        scanf("%d%d%d", &from, &to, &weight);
        from --, to --;
    }
};

// DSU Template
class dsu
{
private:
    const int n;
    int* fa;
public:
    inline dsu(int count): n(count)
    {
        fa = new int[n];
        for(int i=0; i<n; i++)
            fa[i] = i;
    }
    inline ~dsu() { delete[] fa; }
    inline int size() { return n; }
    int find(int x) { return fa[x] == x? x: fa[x] = find(fa[x]); }
    inline bool same(int x, int y) { return find(x) == find(y); }
    inline void merge(int x, int y) { fa[find(x)] = find(y); }
    inline bool connected()
    {
        int p = find(0);
        for(int i=0; i<n; i++)
            if(find(i) != p)
                return false;
        return true;
    }
};

int main()
{
    int n, m;
    scanf("%d%d", &n, &m); // Read vertices and edges
    priority_queue<Edge> q; // Priority queue for sorting
    while(m--)
    {
        Edge e;
        e.read();  // Read edge
        q.push(e); // Enqueue
    }
    int ans = 0, // Total weight
        cnt = 0; // Edge count
    dsu d(n);    // Initialize DSU
    while(!q.empty() && cnt < n - 1) // Process edges until n-1 selected
    {
        auto [u, v, w] = q.top(); q.pop(); // Extract minimum edge
        if(!d.same(u, v))  // No cycle formed
        {
            d.merge(u, v);    // Add edge
            ans += w, cnt ++; // Update
        }
    }
    if(cnt == n - 1) printf("%d\n", ans); // Output if MST exists
    else puts("orz"); // Otherwise...
    return 0;
}

Alternative ending (slower but concise):

int ans = 0;
dsu d(n);
while(!q.empty())
{
    auto [u, v, w] = q.top(); q.pop();
    if(!d.same(u, v))
    {
        d.merge(u, v);
        ans += w;
    }
}
if(d.connected()) printf("%d\n", ans);
else puts("orz");

Prim

The Prim algorithm was first discovered by Czech mathematician Vojtěch Jarník in 1930, rediscovered independently by Robert C. Prim in 1957, and again by Edsger W. Dijkstra in 1959. Hence it is also known as the Jarník, Prim–Jarník, or DJP algorithm.

Prim Algorithm Workflow

Similar to Dijkstra, vertices are divided into sets $S$ and $T$:

  1. Initially, all vertices are in $S$ and $T$ is empty.
  2. Move any vertex from $S$ to $T$.
  3. Repeat until all vertices are in $T$:
    • Select edge $(u,v,w)$ where $u\in S$, $v\in T$, with minimal $w$.
    • Add this edge to MST and move $u$ to $T$.

We present optimized versions using priority_queue and set.

Priority Queue Optimization

Runtime: $328\mathrm{ms}$
Time Complexity: $\mathcal O(m\log m)$ (every edge may enter the queue once; stale entries are skipped lazily)

#include <cstdio>
#include <queue>
#include <vector>
#define maxn 5005
#define INF 2147483647
using namespace std;

using pii = pair<int, int>;
vector<pii> G[maxn];
int dis[maxn];

int main()
{
    int n, m;
    scanf("%d%d", &n, &m);
    while(m--)
    {
        int u, v, w;
        scanf("%d%d%d", &u, &v, &w);
        G[--u].emplace_back(--v, w);
        G[v].emplace_back(u, w);
    }
    for(int i=1; i<n; i++)
        dis[i] = INF;
    priority_queue<pii, vector<pii>, greater<pii>> q;
    q.emplace(0, 0);
    int ans = 0, left = n;
    while(!q.empty() && left > 0)
    {
        auto [d, v] = q.top(); q.pop();
        if(d != dis[v]) continue;
        dis[v] = -INF, left --, ans += d;
        for(auto [u, w]: G[v])
            if(w < dis[u])
                q.emplace(dis[u] = w, u);
    }
    if(left) puts("orz");
    else printf("%d\n", ans);
    return 0;
}

Set Optimization

Runtime: $351\mathrm{ms}$
Time Complexity: $\mathcal O(m\log n)$ (the set holds at most $n$ entries, but each of the $m$ relaxations costs $\mathcal O(\log n)$)

#include <cstdio>
#include <vector>
#include <set>
#define maxn 5005
#define INF 2147483647
using namespace std;

using pii = pair<int, int>;
vector<pii> G[maxn];
int dis[maxn];

int main()
{
    int n, m;
    scanf("%d%d", &n, &m);
    while(m--)
    {
        int u, v, w;
        scanf("%d%d%d", &u, &v, &w);
        G[--u].emplace_back(--v, w);
        G[v].emplace_back(u, w);
    }
    for(int i=1; i<n; i++)
        dis[i] = INF;
    set<pii> s;
    s.emplace(0, 0);
    int ans = 0, left = n;
    while(!s.empty() && left > 0)
    {
        auto it = s.begin();
        auto [d, v] = *it; s.erase(it);
        dis[v] = -INF, left --, ans += d;
        for(auto [u, w]: G[v])
            if(w < dis[u])
            {
                if(dis[u] != INF)
                    s.erase(pii(dis[u], u));
                s.emplace(dis[u] = w, u);
            }
    }
    if(left) puts("orz");
    else printf("%d\n", ans);
    return 0;
}

Exercises

Summary

Comparison of Kruskal and Prim:

| Metric | Kruskal | Prim |
| --- | --- | --- |
| Time Complexity | $\mathcal O(m\log m)$ | $\mathcal O(m\log n)$[1] |
| Runtime[2] | $255\mathrm{ms}$ | $328\mathrm{ms}$ |
| Code Complexity | Low | Medium |
| Applicability | Sparse graphs | Dense graphs |

In most cases, Kruskal is preferred. Use Prim for specific requirements.
If you find this article helpful, please like/share/bookmark. Thanks for your support!


  1. For the priority_queue optimization this is $\mathcal O(m\log m)$, which equals $\mathcal O(m\log n)$ since $m\le n^2$; the set optimization is likewise $\mathcal O(m\log n)$. ↩︎

  2. Results from Luogu P3366. Kruskal uses DSU+priority_queue; Prim uses priority_queue. ↩︎
