Chauvenet's criterion


In statistical theory, Chauvenet's criterion is a means of assessing whether one piece of experimental data from a set of observations, an outlier, is likely to be spurious.

Derivation

The idea behind Chauvenet's criterion is to find a probability band, centered on the mean of a normal distribution, that should reasonably contain all n samples of a data set. Any data points that lie outside this band can then be considered outliers and removed from the data set, and a new mean and standard deviation can be calculated from the remaining values and the new sample size. Outliers are identified by finding the number of standard deviations that corresponds to the bounds of the probability band around the mean, and comparing that value to the absolute difference between each suspected outlier and the mean, divided by the sample standard deviation.
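$$D_{\max} \geq \frac{|x - \bar{x}|}{s_x}$$ (Eq.1)
where $D_{\max}$ is the maximum allowable deviation, $|\cdot|$ denotes the absolute value, $x$ is the value of the suspected outlier, $\bar{x}$ is the sample mean, and $s_x$ is the sample standard deviation. A point that violates this inequality lies outside the probability band.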
To be considered as including all $n$ observations in the sample, the probability band (centered on the mean) need only account for $n - \tfrac{1}{2}$ samples (for $n = 3$, only 2.5 of the samples). In reality we cannot have partial samples, so $n - \tfrac{1}{2}$ is approximately $n$. Anything less than $n - \tfrac{1}{2}$ is approximately $n - 1$ and is not valid, because we want to find the probability band that contains $n$ observations, not $n - 1$ samples. In short, we are looking for the probability, $P$, that is equal to $n - \tfrac{1}{2}$ out of $n$ samples.
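$$P = 1 - \frac{1}{2n}$$ (Eq.2)
where $P$ is the probability band centered on the sample mean and $n$ is the sample size.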
The quantity $\frac{1}{2n}$ corresponds to the combined probability represented by the two tails of the normal distribution that fall outside the probability band $P$. Because the normal distribution is symmetric, only the probability of one of the tails needs to be analyzed to find the standard deviation level associated with $P$.
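$$P_z = \frac{1}{4n}$$ (Eq.3)
where $P_z$ is the probability represented by one tail of the normal distribution and $n$ is the sample size.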
Eq.1 is analogous to the $Z$-score equation.
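$$Z = \frac{x - \mu}{\sigma}$$ (Eq.4)
where $Z$ is the z-score, $x$ is the sample value, $\mu = 0$ is the mean of the standard normal distribution, and $\sigma = 1$ is the standard deviation of the standard normal distribution.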
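Based on Eq.4, to find $D_{\max}$ (Eq.1), look up the z-score corresponding to $P_z$ in a $Z$-score table. $D_{\max}$ is equal to the absolute value of the z-score for $P_z$. Using this method, $D_{\max}$ can be determined for any sample size. In Excel, $D_{\max}$ can be found with the following formula, with n replaced by the sample size: =ABS(NORM.S.INV(1/(4n))).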
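The same lookup can be sketched in Python. This is a minimal illustration rather than a reference implementation: it assumes the one-tail probability 1/(4n) from Eq.3, and the function name chauvenet_d_max is hypothetical. It relies only on the standard library's statistics.NormalDist.

```python
from statistics import NormalDist


def chauvenet_d_max(n: int) -> float:
    """Maximum allowable deviation, in standard deviations, for sample size n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # One-tail probability outside the band, assumed to be 1/(4n) as in Eq.3.
    p_tail = 1.0 / (4.0 * n)
    # inv_cdf returns the (negative) z-score of the lower tail,
    # so the absolute value gives the half-width of the band.
    return abs(NormalDist().inv_cdf(p_tail))


# Print the threshold for a few sample sizes.
for n in (3, 6, 10, 100):
    print(n, round(chauvenet_d_max(n), 3))
```

The threshold grows slowly with the sample size, so larger data sets tolerate larger deviations before a point is flagged.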

Calculation

To apply Chauvenet's criterion, first calculate the mean and standard deviation of the observed data. Based on how much the suspect datum differs from the mean, use the normal distribution function to determine the probability that a given data point will deviate from the mean by at least as much as the suspect data point. Multiply this probability by the number of data points taken. If the result is less than 0.5, the suspicious data point may be discarded, i.e., a reading may be rejected if the probability of obtaining the particular deviation from the mean is less than $\frac{1}{2n}$.
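The procedure above can be sketched as a short Python function. The name chauvenet_rejects is hypothetical, and two assumptions are made that the text does not spell out: the sample standard deviation (n - 1 denominator) is used, and the probability is taken as two-tailed.

```python
from statistics import NormalDist, mean, stdev


def chauvenet_rejects(data, suspect):
    """Return True if Chauvenet's criterion would allow discarding `suspect`."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    m = mean(data)
    s = stdev(data)  # sample standard deviation (assumption)
    if s == 0:
        return False  # all values identical, so nothing deviates
    # Deviation of the suspect point in units of standard deviations.
    z = abs(suspect - m) / s
    # Two-tailed probability of a deviation at least this large (assumption).
    p = 2.0 * (1.0 - NormalDist().cdf(z))
    # Reject when the expected number of such readings is below one half.
    return n * p < 0.5


readings = [9, 10, 10, 10, 11, 50]
print(chauvenet_rejects(readings, 50))
print(chauvenet_rejects(readings, 10))
```

If a point is discarded, the mean and standard deviation would be recomputed from the remaining values before drawing any further conclusions.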

Example

For instance, suppose a value is measured experimentally in several trials as 9, 10, 10, 10, 11, and 50. The mean is 16.7 and the sample standard deviation is 16.34. The value 50 differs from 16.7 by 33.3, slightly more than two standard deviations. The probability of taking data more than two standard deviations from the mean is roughly 0.05. Six measurements were taken, so the expected number of such readings is 0.05×6 = 0.3. Because 0.3 < 0.5, according to Chauvenet's criterion, the measured value of 50 should be discarded.
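The arithmetic in this example can be checked with a few lines of Python. This sketch assumes the sample standard deviation and a two-tailed probability. It evaluates the normal tail exactly instead of using the rounded figure of 0.05, so the product comes out somewhat below 0.3, but the conclusion is unchanged.

```python
from statistics import NormalDist, mean, stdev

data = [9, 10, 10, 10, 11, 50]
suspect = 50

n = len(data)
m = mean(data)   # sample mean
s = stdev(data)  # sample standard deviation (n - 1 denominator)

# Deviation of the suspect value in standard deviations.
z = abs(suspect - m) / s
# Two-tailed probability of a deviation at least this large.
p_two_tailed = 2.0 * (1.0 - NormalDist().cdf(z))
# Expected number of readings this extreme among n measurements.
expected_count = n * p_two_tailed
reject = expected_count < 0.5

print(f"mean = {m:.1f}, sample std = {s:.2f}, z = {z:.2f}")
print(f"two-tailed p = {p_two_tailed:.3f}, n * p = {expected_count:.2f}")
print("discard 50:", reject)
```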

Peirce's criterion

Another method for eliminating spurious data is called Peirce's criterion. It was developed a few years before Chauvenet's criterion was published, and it is a more rigorous approach to the rational deletion of outlier data. Other methods such as Grubbs's test for outliers are mentioned under the listing for Outlier.

Criticism

Deletion of outlier data is a controversial practice frowned on by many scientists and science instructors; while Chauvenet's criterion provides an objective and quantitative method for data rejection, it does not make the practice more scientifically or methodologically sound, especially in small sets or where a normal distribution cannot be assumed. Rejection of outliers is more acceptable in areas of practice where the underlying model of the process being measured and the usual distribution of measurement error are confidently known.