Pointwise mutual information (PMI), or point mutual information, is a measure of association used in information theory and statistics. In contrast to mutual information (MI), which builds upon PMI, it refers to single events, whereas MI refers to the average over all possible events.
Definition
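The PMI of a pair of outcomes x and y belonging to discrete random variables X and Y quantifies the discrepancy between two probabilities: the probability of their coincidence given their joint distribution, and the probability of their coincidence given only their individual distributions, assuming independence. Mathematically:

$$\operatorname{pmi}(x;y) \equiv \log\frac{p(x,y)}{p(x)\,p(y)} = \log\frac{p(x\mid y)}{p(x)} = \log\frac{p(y\mid x)}{p(y)}$$

The last two expressions equal the first because $p(x,y) = p(x\mid y)\,p(y) = p(y\mid x)\,p(x)$.

The mutual information of the random variables X and Y is the expected value of the PMI over all outcomes:

$$I(X;Y) = \sum_{x,y} p(x,y)\,\operatorname{pmi}(x;y)$$

The measure is symmetric ($\operatorname{pmi}(x;y) = \operatorname{pmi}(y;x)$). It can take positive or negative values, but is zero if X and Y are independent. Note that even though PMI may be negative or positive, its expected value over all joint events (the mutual information) is non-negative. PMI is maximized when X and Y are perfectly associated (i.e. $p(x\mid y) = 1$ or $p(y\mid x) = 1$), yielding the following bounds:

$$-\infty \le \operatorname{pmi}(x;y) \le \min\left[-\log p(x),\, -\log p(y)\right]$$

Finally, $\operatorname{pmi}(x;y)$ will increase if $p(x\mid y)$ is fixed but $p(x)$ decreases. Here is an example to illustrate: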
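The definition and its bounds can be checked numerically. The following is a minimal Python sketch, assuming base-2 logarithms and probabilities passed in directly as floats; the function names `pmi` and `pmi_upper_bound` are illustrative and not taken from any library.

```python
import math


def pmi(p_xy: float, p_x: float, p_y: float) -> float:
    """Pointwise mutual information log2(p(x,y) / (p(x) p(y))), in bits.

    Returns -inf when the outcomes never co-occur (p_xy == 0),
    which is the lower bound of the measure.
    """
    if p_xy == 0.0:
        return -math.inf
    return math.log2(p_xy / (p_x * p_y))


def pmi_upper_bound(p_x: float, p_y: float) -> float:
    """Upper bound min(-log2 p(x), -log2 p(y)), reached under perfect association."""
    return min(-math.log2(p_x), -math.log2(p_y))


# Independent outcomes: p(x,y) = p(x) p(y), so PMI is zero.
print(pmi(0.2 * 0.5, 0.2, 0.5))  # 0.0

# Perfect association: x occurs exactly when y does, so p(x,y) = p(x) = p(y).
print(pmi(0.25, 0.25, 0.25), pmi_upper_bound(0.25, 0.25))  # 2.0 2.0
```

Because the argument of the logarithm is symmetric in x and y, swapping `p_x` and `p_y` leaves the result unchanged, matching the symmetry noted above.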
x   y   p(x, y)
0   0   0.10
0   1   0.70
1   0   0.15
1   1   0.05
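Using this table we can marginalize to get the following additional table for the individual distributions:

value   p(x)   p(y)
0       0.8    0.25
1       0.2    0.75

With this example, we can compute four values for $\operatorname{pmi}(x;y)$. Using base-2 logarithms:

$$\operatorname{pmi}(x=0;y=0) = \log_2\frac{0.1}{0.8 \times 0.25} = -1$$
$$\operatorname{pmi}(x=0;y=1) = \log_2\frac{0.7}{0.8 \times 0.75} \approx 0.222392$$
$$\operatorname{pmi}(x=1;y=0) = \log_2\frac{0.15}{0.2 \times 0.25} \approx 1.584963$$
$$\operatorname{pmi}(x=1;y=1) = \log_2\frac{0.05}{0.2 \times 0.75} \approx -1.584963$$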
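The marginalization and the four PMI values of this example can be reproduced with a short Python sketch. The dictionary layout is an illustrative choice. As a by-product, the script takes the expected value of PMI under p(x, y), which gives the mutual information of the example, about 0.214 bits.

```python
import math

# Joint distribution p(x, y) from the example table.
joint = {(0, 0): 0.10, (0, 1): 0.70, (1, 0): 0.15, (1, 1): 0.05}

# Marginalize: p(x) = sum over y of p(x, y), and p(y) = sum over x of p(x, y).
p_x: dict[int, float] = {}
p_y: dict[int, float] = {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

# PMI of each joint outcome, in bits (base-2 logarithms).
pmi_values = {
    (x, y): math.log2(p / (p_x[x] * p_y[y]))
    for (x, y), p in joint.items()
}

# Mutual information is the expected value of PMI under p(x, y).
mi = sum(p * pmi_values[xy] for xy, p in joint.items())

for (x, y), value in pmi_values.items():
    print(f"pmi(x={x};y={y}) = {value:.6f}")
print(f"I(X;Y) = {mi:.6f}")
```

The marginals come out as 0.8 and 0.2 for x and 0.25 and 0.75 for y, up to floating-point rounding.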
Similarities to mutual information
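Pointwise mutual information has many of the same relationships as mutual information. In particular,

$$\begin{aligned}
\operatorname{pmi}(x;y) &= h(x) + h(y) - h(x,y) \\
&= h(x) - h(x\mid y) \\
&= h(y) - h(y\mid x)
\end{aligned}$$

where $h(x)$ is the self-information, $h(x) = -\log p(x)$.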
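These identities can be verified on a single outcome. The sketch below uses the outcome x = 1, y = 0 from the example table in the Definition section (p(x, y) = 0.15, p(x) = 0.2, p(y) = 0.25), base-2 logarithms, and an illustrative helper `h` for self-information.

```python
import math


def h(p: float) -> float:
    """Self-information -log2 p, in bits."""
    return -math.log2(p)


# Outcome x = 1, y = 0 from the example table.
p_xy, p_x, p_y = 0.15, 0.2, 0.25
p_x_given_y = p_xy / p_y  # p(x | y) = 0.6
p_y_given_x = p_xy / p_x  # p(y | x) = 0.75

pmi = math.log2(p_xy / (p_x * p_y))
forms = [
    h(p_x) + h(p_y) - h(p_xy),  # h(x) + h(y) - h(x, y)
    h(p_x) - h(p_x_given_y),    # h(x) - h(x | y)
    h(p_y) - h(p_y_given_x),    # h(y) - h(y | x)
]
print(pmi, forms)  # all four agree up to rounding (log2 3, about 1.585)
```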
Normalized pointwise mutual information (npmi)
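Pointwise mutual information can be normalized to the interval [-1, +1], resulting in -1 (in the limit) for never occurring together, 0 for independence, and +1 for complete co-occurrence:

$$\operatorname{npmi}(x;y) = \frac{\operatorname{pmi}(x;y)}{h(x,y)}$$

where $h(x,y)$ is the joint self-information, which is estimated as $-\log p(x,y)$.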
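A minimal sketch of the normalization in Python, assuming base-2 logarithms and 0 ≤ p(x, y) < 1. When p(x, y) = 0 the ratio is of the form -∞/∞, so this sketch returns the limiting value -1 by convention; p(x, y) = 1 is excluded because h(x, y) is then zero. The function name `npmi` is illustrative.

```python
import math


def npmi(p_xy: float, p_x: float, p_y: float) -> float:
    """Normalized PMI: pmi(x;y) / h(x,y), with h(x,y) = -log2 p(x,y).

    Assumes 0 <= p_xy < 1. For p_xy == 0 the limiting value -1 is returned.
    """
    if p_xy == 0.0:
        return -1.0
    pmi = math.log2(p_xy / (p_x * p_y))
    return pmi / -math.log2(p_xy)


print(npmi(0.0, 0.3, 0.4))     # never together: -1.0
print(npmi(0.125, 0.5, 0.25))  # independent, p(x,y) = p(x) p(y): 0.0
print(npmi(0.25, 0.25, 0.25))  # complete co-occurrence: 1.0
```

Unlike raw PMI, whose upper bound depends on p(x) and p(y), the normalized score puts pairs with very different marginal probabilities on a common scale.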
PMI variants
In addition to the above-mentioned npmi, PMI has many other interesting variants. A comparative study of these variants can be found in
Chain-rule for pmi
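Like mutual information, pointwise mutual information follows the chain rule, that is,

$$\operatorname{pmi}(x;yz) = \operatorname{pmi}(x;y) + \operatorname{pmi}(x;z\mid y)$$

This is easily proven by:

$$\begin{aligned}
\operatorname{pmi}(x;y) + \operatorname{pmi}(x;z\mid y)
&= \log\frac{p(x,y)}{p(x)\,p(y)} + \log\frac{p(x,z\mid y)}{p(x\mid y)\,p(z\mid y)} \\
&= \log\left[\frac{p(x,y)}{p(x)\,p(y)} \cdot \frac{p(x,z\mid y)}{p(x\mid y)\,p(z\mid y)}\right] \\
&= \log\frac{p(x\mid y)\,p(y)\,p(x,z\mid y)}{p(x)\,p(y)\,p(x\mid y)\,p(z\mid y)} \\
&= \log\frac{p(x,z\mid y)\,p(y)}{p(x)\,p(z\mid y)\,p(y)} \\
&= \log\frac{p(x,y,z)}{p(x)\,p(y,z)} \\
&= \operatorname{pmi}(x;yz)
\end{aligned}$$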
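The chain rule can also be checked numerically. The sketch below builds a hypothetical, strictly positive joint distribution over three binary variables (the weights are arbitrary), computes the needed marginals, and compares both sides of the identity for every outcome. Base-2 logarithms and the helper name `marginal` are assumptions of this sketch.

```python
import math
from itertools import product

# Hypothetical strictly positive joint distribution p(x, y, z) over three
# binary variables; the weights are arbitrary and normalized to sum to 1.
weights = [1, 2, 3, 4, 5, 6, 7, 8]
total = sum(weights)
joint = {xyz: w / total for xyz, w in zip(product((0, 1), repeat=3), weights)}


def marginal(keep: tuple[int, ...]) -> dict[tuple[int, ...], float]:
    """Marginal over the variable positions in `keep` (0 = x, 1 = y, 2 = z)."""
    out: dict[tuple[int, ...], float] = {}
    for xyz, p in joint.items():
        key = tuple(xyz[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


p_x, p_y = marginal((0,)), marginal((1,))
p_xy, p_yz = marginal((0, 1)), marginal((1, 2))

gaps = []
for (x, y, z), p_xyz in joint.items():
    # Left-hand side: pmi(x; yz), treating the pair (y, z) as one outcome.
    lhs = math.log2(p_xyz / (p_x[(x,)] * p_yz[(y, z)]))
    # pmi(x; y)
    pmi_xy = math.log2(p_xy[(x, y)] / (p_x[(x,)] * p_y[(y,)]))
    # pmi(x; z | y): every probability is conditioned on y.
    p_xz_given_y = p_xyz / p_y[(y,)]
    p_x_given_y = p_xy[(x, y)] / p_y[(y,)]
    p_z_given_y = p_yz[(y, z)] / p_y[(y,)]
    pmi_xz_given_y = math.log2(p_xz_given_y / (p_x_given_y * p_z_given_y))
    gaps.append(abs(lhs - (pmi_xy + pmi_xz_given_y)))

print(f"largest |lhs - rhs| over {len(joint)} outcomes: {max(gaps):.1e}")
```

The remaining differences are floating-point rounding only; any other positive table would do.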
Applications
In computational linguistics, PMI has been used for finding collocations and associations between words. For instance, counts of occurrences and co-occurrences of words in a text corpus can be used to approximate the probabilities $p(x)$ and $p(x,y)$, respectively. The following table shows counts of pairs of words getting the most and the least PMI scores in the first 50 million words of Wikipedia, keeping only pairs with 1,000 or more co-occurrences. The frequency of each count can be obtained by dividing its value by 50,000,952.
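Estimating PMI from corpus counts can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure used for the Wikipedia table: co-occurrence is taken to mean adjacency (a bigram), tokens are whitespace-separated, and both $p(x)$ and $p(x,y)$ are estimated with the same denominator N (the number of tokens), so $\operatorname{pmi}(x;y) = \log_2 \frac{c(x,y)\,N}{c(x)\,c(y)}$. The function name `pmi_from_counts`, its `min_count` parameter (which mirrors the minimum co-occurrence filter described above), and the toy corpus are all hypothetical.

```python
import math
from collections import Counter


def pmi_from_counts(tokens: list[str], min_count: int = 1) -> dict[tuple[str, str], float]:
    """PMI (bits) of adjacent word pairs, estimated from raw counts.

    Assumes co-occurrence = adjacency and a shared denominator N = len(tokens),
    so pmi = log2(c(x, y) * N / (c(x) * c(y))). Pairs seen fewer than
    `min_count` times are dropped.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {
        (x, y): math.log2(c_xy * n / (unigrams[x] * unigrams[y]))
        for (x, y), c_xy in bigrams.items()
        if c_xy >= min_count
    }


# Hypothetical toy corpus.
tokens = "new york is big and new york is old and paris is old".split()
scores = pmi_from_counts(tokens, min_count=2)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

Here "new york" scores highest (log2 6.5, about 2.700), because "new" and "york" occur almost only together, while "york is" and "is old" score lower (log2(13/3), about 2.115) since "is" also appears elsewhere.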
Good collocation pairs have high PMI because the probability of co-occurrence is only slightly lower than the probability of occurrence of each word: $p(x\mid y) = p(x,y)/p(y)$ is then close to 1, so $\operatorname{pmi}(x;y) = \log \frac{p(x\mid y)}{p(x)}$ approaches its upper bound $-\log p(x)$. Conversely, a pair of words whose probabilities of occurrence are considerably higher than their probability of co-occurrence gets a small PMI score.