Robust measures of scale

In statistics, a robust measure of scale is a robust statistic that quantifies the statistical dispersion in a set of numerical data. The most common such statistics are the interquartile range and the median absolute deviation. These are contrasted with conventional measures of scale, such as sample variance or sample standard deviation, which are non-robust, meaning greatly influenced by outliers.
These robust statistics are particularly used as estimators of a scale parameter, and have the advantages of both robustness and superior efficiency on contaminated data, at the cost of inferior efficiency on clean data from distributions such as the normal distribution. To illustrate robustness, the standard deviation can be made arbitrarily large by increasing exactly one observation, a defect that is not shared by robust statistics.

IQR and MAD

One of the most common robust measures of scale is the interquartile range, the difference between the 75th percentile and the 25th percentile of a sample; this is the 25% trimmed range, an example of an L-estimator. Other trimmed ranges, such as the interdecile range can also be used.
Another familiar robust measure of scale is the median absolute deviation, the median of the absolute values of the differences between the data values and the overall median of the data set; for a Gaussian distribution, MAD is related to as .

Estimation

Robust measures of scale can be used as estimators of properties of the population, either for parameter estimation or as estimators of their own expected value.
For example, robust estimators of scale are used to estimate the population variance or population standard deviation, generally by multiplying by a scale factor to make it an unbiased consistent estimator; see scale parameter: estimation. For example, dividing the IQR by 2 erf⁻¹, makes it an unbiased, consistent estimator for the population standard deviation if the data follow a normal distribution.
In other situations, it makes more sense to think of a robust measure of scale as an estimator of its own expected value, interpreted as an alternative to the population variance or standard deviation as a measure of scale. For example, the MAD of a sample from a standard Cauchy distribution is an estimator of the population MAD, which in this case is 1, whereas the population variance does not exist.

Efficiency

These robust estimators typically have inferior statistical efficiency compared to conventional estimators for data drawn from a distribution without outliers, but have superior efficiency for data drawn from a mixture distribution or from a heavy-tailed distribution, for which non-robust measures such as the standard deviation should not be used.
For example, for data drawn from the normal distribution, the MAD is 37% as efficient as the sample standard deviation, while the Rousseeuw–Croux estimator Q_n is 88% as efficient as the sample standard deviation.

Absolute pairwise differences

Rousseeuw and Croux propose alternatives to the MAD, motivated by two weaknesses of it:

It is inefficient at Gaussian distributions.
it computes a symmetric statistic about a location estimate, thus not dealing with skewness.

They propose two alternative statistics based on pairwise differences: S_n and Q_n, defined as:
where is a constant depending on.
These can be computed in O time and O space.
Neither of these requires location estimation, as they are based only on differences between values. They are both more efficient than the MAD under a Gaussian distribution: S_n is 58% efficient, while Q_n is 82% efficient.
For a sample from a normal distribution, S_n is approximately unbiased for the population standard deviation even down to very modest sample sizes. For a large sample from a normal distribution, 2.219144465985075864722Q_n is approximately unbiased for the population standard deviation. For small or moderate samples, the expected value of Q_n under a normal distribution depends markedly on the sample size, so finite-sample correction factors are used to calibrate the scale of Q_n.

The biweight midvariance

Like S_n and Q_n, the biweight midvariance aims to be robust without sacrificing too much efficiency. It is defined as
where I is the indicator function, Q is the sample median of the X_i, and
Its square root is a robust estimator of scale, since data points are downweighted as their distance from the median increases, with points more than 9 MAD units from the median having no influence at all.

Extensions

propose a robust depth-based estimator for location and scale simultaneously.

Popular movies

The Hunger Games (film) - 2012 American dystopian action thriller science fiction-adventure film directed by Gary Ross and based on Suzanne Collins’s 2008 novel of the same name. It is the first insta...
untitled Captain Marvel sequel - part of Marvel Cinematic Universe....
Killers of the Flower Moon (film project) - Killers of the Flower Moon - film project in United States of America. It was presented as drama, detective fiction, thriller. The film project starred Leonardo Dicaprio, Robert De Niro. Director of...
Five Nights at Freddy's (film) - Five Nights at Freddy's - film published in 2017 in United States of America. Scenarist of the film - Scott Cawthon....

Popular books

Book of Revelation - The Book of Revelation is the final book of the New Testament, and consequently is also the final book of the Christian Bible. Its title is derived from the first word of the Koine Greek text: apok...
Book of Genesis - account of the creation of the world, the early history of humanity, Israel's ancestors and the origins...
Gospel of Matthew - The Gospel According to Matthew is the first book of the New Testament and one of the three synoptic gospels. It tells how Israel's Messiah, rejected and executed in Israel, pronounces judgement on ...
Michelin Guide - Michelin Guides are a series of guide books published by the French tyre company Michelin for more than a century. The term normally refers to the annually published Michelin Red Guide , the oldest...
Psalms - The Book of Psalms , commonly referred to simply as Psalms , the Psalter or "the Psalms", is the first book of the Ketuvim , the third section of the Hebrew Bible, and thus a book of th...
Ecclesiastes - Ecclesiastes is one of 24 books of the Tanakh , where it is classified as one of the Ketuvim . Originally written c. 450–200 BCE, it is also among the canonical Wisdom literature of the Old Tes...
The 48 Laws of Power - non-fiction book by American author Robert Greene. The book...

Popular television series

The Crown (TV series) - historical drama web television series about the reign of Queen Elizabeth II, created and principally written by Peter Morgan, and produced by Left Bank Pictures and Sony Pictures Tel...
Friends - American sitcom television series, created by David Crane and Marta Kauffman, which aired on NBC from September 22, 1994, to May 6, 2004, lasting ten seasons. With an ensemble cast sta...
Young Sheldon - spin-off prequel to The Big Bang Theory and begins with the character Sheldon...
Modern Family - American television mockumentary family sitcom created by Christopher Lloyd and Steven Levitan for the American Broadcasting Company. It ran for eleven seasons, from September 23...
Loki (TV series) - upcoming American web television miniseries created for Disney+ by Michael Waldron, based on the Marvel Comics character of the same name. It is set in the Marvel Cinematic Universe, shar...
Game of Thrones - American fantasy drama television series created by David Benioff and D. B. Weiss for HBO. It...
Shameless (American TV series) - American comedy-drama television series developed by John Wells which debuted on Showtime on January 9, 2011. It...