Amortized analysis of the Set-Union data structure

We prove: A sequence of [Graphics:Images/index_gr_1.gif] MAKE-SET, LINK, FIND-SET operations, [Graphics:Images/index_gr_2.gif] of which are MAKE-SET operations, can be performed on a disjoint-set forest with union by rank and path compression in worst-case time [Graphics:Images/index_gr_3.gif].

In general, we use the variable names from the course book unless otherwise defined. Note that [Graphics:Images/index_gr_4.gif] - we use different braces for the ease of reading.

Assign
    [Graphics:Images/index_gr_5.gif]# operations in total
    [Graphics:Images/index_gr_6.gif]# MAKE-SET operations (hence [Graphics:Images/index_gr_7.gif])
    [Graphics:Images/index_gr_8.gif]
    [Graphics:Images/index_gr_9.gif]Set of the nodes in the forest
    [Graphics:Images/index_gr_10.gif]is not a root and [Graphics:Images/index_gr_11.gif]
    [Graphics:Images/index_gr_12.gif]arbitrary parts of trees (we use [Graphics:Images/index_gr_13.gif] in diagrams)

Using the potential method of amortized analysis, our goal is to show the following inequality chain
    [Graphics:Images/index_gr_14.gif]
which especially holds, if
    [Graphics:Images/index_gr_15.gif]
    [Graphics:Images/index_gr_16.gif]
    [Graphics:Images/index_gr_17.gif]

We can arbitrarily set [Graphics:Images/index_gr_18.gif] := 0  ([Graphics:Images/index_gr_19.gif] will follow from the definition of [Graphics:Images/index_gr_20.gif]). Hence our goal is to show that any of the operations (MAKE-SET, LINK, FIND-SET) at any position [Graphics:Images/index_gr_21.gif] in the sequence of operations has amortized cost [Graphics:Images/index_gr_22.gif]
    [Graphics:Images/index_gr_23.gif].

Important properties of the function [Graphics:Images/index_gr_24.gif] are
    [Graphics:Images/index_gr_25.gif] for a fixed node [Graphics:Images/index_gr_26.gif] can only increase during execution
    [Graphics:Images/index_gr_27.gif]    if [Graphics:Images/index_gr_28.gif]
    [Graphics:Images/index_gr_29.gif]    if [Graphics:Images/index_gr_30.gif]
    [Graphics:Images/index_gr_31.gif]
Let us define:
    [Graphics:Images/index_gr_32.gif]
    [Graphics:Images/index_gr_33.gif]
We refer to the table on the last page of the handout where we give some values for the two functions where [Graphics:Images/index_gr_34.gif] and [Graphics:Images/index_gr_35.gif].
More than once in the proof we will need the properties
    [Graphics:Images/index_gr_36.gif]
    [Graphics:Images/index_gr_37.gif] for a fixed node [Graphics:Images/index_gr_38.gif] can only increase during execution
    [Graphics:Images/index_gr_39.gif]
    when level(x) does not change, [Graphics:Images/index_gr_40.gif], for a fixed node [Graphics:Images/index_gr_41.gif], can only increase.
Now we are set to define the potential function:
    [Graphics:Images/index_gr_42.gif]
    [Graphics:Images/index_gr_43.gif]

It is imporant to observe, that [Graphics:Images/index_gr_44.gif] only depends on [Graphics:Images/index_gr_45.gif] and [Graphics:Images/index_gr_46.gif], to be more specific, [Graphics:Images/index_gr_47.gif] only depends on [Graphics:Images/index_gr_48.gif] and [Graphics:Images/index_gr_49.gif]. To avoid the overuse of indicees, we do not assign [Graphics:Images/index_gr_50.gif] to the rank, level and iter functions, although they may vary over the sequence of operations!

Lemma 21.9a: Let [Graphics:Images/index_gr_51.gif]. Then after the [Graphics:Images/index_gr_52.gif]-th operation the potential of [Graphics:Images/index_gr_53.gif] does not change if [Graphics:Images/index_gr_54.gif] and [Graphics:Images/index_gr_55.gif] do not change, otherwise [Graphics:Images/index_gr_56.gif] drops at least by one.
Proof: Follows from the properties of the function level and iter and their image sets. For example in the case that [Graphics:Images/index_gr_57.gif] increases by one, [Graphics:Images/index_gr_58.gif] can only drop by [Graphics:Images/index_gr_59.gif].
Lemma 21.9b: Let [Graphics:Images/index_gr_60.gif] but [Graphics:Images/index_gr_61.gif] is not a root then after the [Graphics:Images/index_gr_62.gif]-th operation the potential of [Graphics:Images/index_gr_63.gif] does not change. Proof follows from definition of [Graphics:Images/index_gr_64.gif], since [Graphics:Images/index_gr_65.gif] does not increase.

Amortized cost of MAKE-SET: Assume that the [Graphics:Images/index_gr_66.gif]-th operation is a MAKE-SET([Graphics:Images/index_gr_67.gif]) operation, where [Graphics:Images/index_gr_68.gif] denotes the new node. The actual cost of this operation is [Graphics:Images/index_gr_69.gif] = 1. Let us diagram the evolution of the datastructure:

    [Graphics:Images/index_gr_70.gif]    [Graphics:Images/index_gr_71.gif]    [Graphics:Images/index_gr_72.gif]

Since we assign [Graphics:Images/index_gr_73.gif] to the new node the potential of [Graphics:Images/index_gr_74.gif] is [Graphics:Images/index_gr_75.gif]. From the definition of MAKE-SET we get that the rank values of the existing nodes do not change. Hence their potential does not change and the computation of the amortized cost yields
    [Graphics:Images/index_gr_76.gif]

Amortized cost of LINK: Assume that the [Graphics:Images/index_gr_77.gif]-th operation is a LINK([Graphics:Images/index_gr_78.gif]) operation, where [Graphics:Images/index_gr_79.gif] and [Graphics:Images/index_gr_80.gif] are roots of two trees. W.l.o.g. [Graphics:Images/index_gr_81.gif] shall become a node of the remaining root [Graphics:Images/index_gr_82.gif]. The actual cost of this operation is [Graphics:Images/index_gr_83.gif] = 1. Let us diagram the evolution of the datastructure:

    [Graphics:Images/index_gr_84.gif]             [Graphics:Images/index_gr_85.gif]            [Graphics:Images/index_gr_86.gif]    [Graphics:Images/index_gr_87.gif]




The amortized cost can then be written as
    [Graphics:Images/index_gr_88.gif]
The sums over [Graphics:Images/index_gr_89.gif] cancel immediately. The potential of [Graphics:Images/index_gr_90.gif] decreases considering the image sets of the level and the iter function:
    [Graphics:Images/index_gr_91.gif]
Either [Graphics:Images/index_gr_92.gif] is left alone or it is increased. In either case we have:
    [Graphics:Images/index_gr_93.gif]
So the potential of [Graphics:Images/index_gr_94.gif] increases at most by [Graphics:Images/index_gr_95.gif].
From Lemma 21.9a and Lemma 21.9b we conclude that
    [Graphics:Images/index_gr_96.gif].
All together, it follows that [Graphics:Images/index_gr_97.gif].

Amortized cost of FIND-SET: Assume that the [Graphics:Images/index_gr_98.gif]-th operation is a FIND-SET([Graphics:Images/index_gr_99.gif]) operation. The actual cost of this operation is [Graphics:Images/index_gr_100.gif] where [Graphics:Images/index_gr_101.gif] is the path length from [Graphics:Images/index_gr_102.gif] to the root the specific tree. Let us diagram the evolution of the datastructure:

    [Graphics:Images/index_gr_103.gif]                [Graphics:Images/index_gr_104.gif]    [Graphics:Images/index_gr_105.gif]









The Find-path is illustrated as a thick line. We regard [Graphics:Images/index_gr_106.gif] as the first node on the Find-path. Allow us the remark that in the following lies the very magic of the function [Graphics:Images/index_gr_107.gif] and the only justification for its use. We define
    [Graphics:Images/index_gr_108.gif]
    [Graphics:Images/index_gr_109.gif][Graphics:Images/index_gr_110.gif] lies on the Find-path[Graphics:Images/index_gr_111.gif]
    [Graphics:Images/index_gr_112.gif]    for [Graphics:Images/index_gr_113.gif]
The amortized cost can then be written as
    [Graphics:Images/index_gr_114.gif]
Again, the sums over the nodes of [Graphics:Images/index_gr_115.gif] cancel immediately. Note that if [Graphics:Images/index_gr_116.gif] for each set [Graphics:Images/index_gr_117.gif] there is a node, (listen to more Bach music) which is has no successor on the find path in [Graphics:Images/index_gr_118.gif]. Hence, in total there are at most [Graphics:Images/index_gr_119.gif] such nodes. Let us define the set
    [Graphics:Images/index_gr_120.gif]
Note that
    [Graphics:Images/index_gr_121.gif]
By Lemma 21.9a the potential of nodes from [Graphics:Images/index_gr_122.gif] does not increase. If we can show that for all nodes [Graphics:Images/index_gr_123.gif] the potential [Graphics:Images/index_gr_124.gif], e.g. decreases at least by one, then the amortized cost computation resolves in
    [Graphics:Images/index_gr_125.gif]
To see for all nodes [Graphics:Images/index_gr_126.gif] the potential [Graphics:Images/index_gr_127.gif], choose [Graphics:Images/index_gr_128.gif] where [Graphics:Images/index_gr_129.gif] succeeds [Graphics:Images/index_gr_130.gif] in the find path. Note that [Graphics:Images/index_gr_131.gif]. Then prior to path compression the following inequalities hold:
    [Graphics:Images/index_gr_132.gif]
    [Graphics:Images/index_gr_133.gif]
    [Graphics:Images/index_gr_134.gif]
Putting these together yields
    [Graphics:Images/index_gr_135.gif].
So [Graphics:Images/index_gr_136.gif] increases and by Lemma 21.9a we have [Graphics:Images/index_gr_137.gif] for all nodes [Graphics:Images/index_gr_138.gif].

We want to conclude with the remark, that we did not say it is going to be easy, and that the [Graphics:Images/index_gr_139.gif] are actually [Graphics:Images/index_gr_140.gif] and [Graphics:Images/index_gr_141.gif], but we can overcome the constant hidden behind the actual costs by scaling up the [Graphics:Images/index_gr_142.gif]'s.

References:
1. CLRS (mostly green cover).
2. Galil et al. Data Structures and Algorithms for Disjoint Set Union Problems. ACM Computing Surveys, Vol. 23, No. 3, September 1991


Converted by Mathematica      January 4, 2005