Proof of O(log*n) time complexity of union–find


In computer science, Union Find is an algorithm for doing certain operations on sets. This page is about proof of O amortized time of Union Find
Statement: If m operations, either Union or Find, are applied to n elements, the total run time is O, where log* is the iterated logarithm.

Proof

Lemma 1: As the find function follows the path along to the root, the rank of node it encounters is increasing.
Lemma 2: A node u which is root of a subtree with rank r has at least [|2]r nodes.
Lemma 3: The maximum number of nodes of rank r is at most n/2r.
For convenience, we define "bucket" here: a bucket is a set that contains vertices with particular ranks.
We create some buckets and put vertices into the buckets according to their ranks inductively. That is, vertices with rank 0 go into the zeroth bucket, vertices with rank 1 go into the first bucket, vertices with ranks 2 and 3 go into the second bucket. If the Bth bucket contains vertices with ranks from interval = then the st bucket will contain vertices with ranks from interval .
We can make two observations about the buckets.
  1. The total number of buckets is at most log*n
  2. :Proof: When we go from one bucket to the next, we add one more two to the power, that is, the next bucket to will be
  3. The maximum number of elements in bucket is at most 2n/2B
  4. :Proof: The maximum number of elements in bucket is at most n/2B + n/2B+1 + n/2B+2 + … + n/22B – [|1] ≤ 2n/2B
Let F represent the list of "find" operations performed, and let
Then the total cost of m finds is T = T1 + T2 + T3
Since each find operation makes exactly one traversal that leads to a root, we have T1 = O.
Also, from the bound above on the number of buckets, we have T2 = O.
For T3, suppose we are traversing an edge from u to v, where u and v have rank in the bucket and v is not the root.
Fix u and consider the sequence v1,v2,...,vk that take the role of v in different find operations.
Because of path compression and not accounting for the edge to a root, this sequence contains only different nodes and because of [|Lemma 1] we know that the ranks of the nodes in this sequence are strictly increasing.
By both of the nodes being in the bucket we can conclude that the length k of the sequence is at most the number of ranks in the buckets B, i.e. at most 2B − 1 − B < 2B.
Therefore,
From Observations 1 and 2, we can conclude that
Therefore, T = T1 + T2 + T3 = O.