The Fowlkes–Mallows index is an external evaluationmethod that is used to determine the similarity between two clusterings, and also a metric to measure confusion matrices. This measure of similarity could be either between two hierarchical clusterings or a clustering and a benchmark classification. A higher value for the Fowlkes–Mallows index indicates a greater similarity between the clusters and the benchmark classifications.
Preliminaries
The Fowlkes–Mallows index, when results of two clustering algorithms are used to evaluate the results, is defined as
Definition
Consider two hierarchical clusterings of objects labeled and. The trees and can be cut to produce clusters for each tree. For each value of, the following table can then be created where is of objects common between the th cluster of and th cluster of. The Fowlkes–Mallows index for the specific value of is then defined as where can then be calculated for every value of and the similarity between the two clusterings can be shown by plotting versus. For each we have. Fowlkes–Mallows index can also be defined based on the number of points that are common or uncommon in the two hierarchical clusterings. If we define It can be shown that the four counts have the following property and that the Fowlkes–Mallows index for two clusterings can be defined as
Discussion
Since the index is directly proportional to the number of true positives, a higher index means greater similarity between the two clusterings used to determine the index. One basic way to test the validity of this index is to compare two clusterings that are unrelated to each other. Fowlkes and Mallows showed that on using two unrelated clusterings, the value of this index approaches zero as the number of total data pointschosen for clustering increase; whereas the value for the Rand index for the same data quickly approaches making Fowlkes–Mallows index a much more accurate representation for unrelated data. This index also performs well if noise is added to an existingdataset and their similarity compared. Fowlkes and Mallows showed that the value of the index decreases as the component of the noise increases. The index also showed similarity even when the noisy dataset had a different number of clusters than the clusters of the original dataset. Thus making it a reliable tool for measuring similarity between two clusters.