Sample mean and covariance

The sample mean or empirical mean and the sample covariance are statistics computed from a collection of data on one or more random variables.
The sample mean and sample covariance are estimators of the population mean and population covariance, where the term population refers to the set from which the sample was taken.
The sample mean is a vector each of whose elements is the sample mean of one of the random variablesthat is, each of whose elements is the arithmetic average of the observed values of one of the variables. The sample covariance matrix is a square matrix whose i, j element is the sample covariance between the sets of observed values of two of the variables and whose i, i element is the sample variance of the observed values of one of the variables. If only one variable has had values observed, then the sample mean is a single number and the sample covariance matrix is also simply a single value.
Due to their ease of calculation and other desirable characteristics, the sample mean and sample covariance are widely used in statistics and applications to numerically represent the location and dispersion, respectively, of a distribution.

Sample mean

Let be the i^th independently drawn observation on the j^th random variable. These observations can be arranged into N
column vectors, each with K entries, with the K ×1 column vector giving the i^th observations of all variables being denoted .
The sample mean vector is a column vector whose j^th element is the average value of the N observations of the j^th variable:
Thus, the sample mean vector contains the average of the observations for each variable, and is written

Sample covariance

The sample covariance matrix is a K-by-K matrix with entries
where is an estimate of the covariance between the ^th
variable and the ^th variable of the population underlying the data.
In terms of the observation vectors, the sample covariance is
Alternatively, arranging the observation vectors as the columns of a matrix, so that
which is a matrix of K rows and N columns.
Here, the sample covariance matrix can be computed as
where is an N by vector of ones.
If the observations are arranged as rows instead of columns, so is now a 1×K row vector and is an N×K matrix whose column j is the vector of N observations on variable j, then applying transposes
in the appropriate places yields
Like covariance matrices for random vector, sample covariance matrices are positive semi-definite. To prove it, note that for any matrix the matrix is positive semi-definite. Furthermore, a covariance matrix is positive definite if and only if the rank of the vectors is K.

Unbiasedness

The sample mean and the sample covariance matrix are unbiased estimates of the mean and the covariance matrix of the random vector, a row vector whose j^th element is one of the random variables. The sample covariance matrix has in the denominator rather than due to a variant of Bessel's correction: In short, the sample covariance relies on the difference between each observation and the sample mean, but the sample mean is slightly correlated with each observation since it is defined in terms of all observations. If the population mean is known, the analogous unbiased estimate
using the population mean, has in the denominator. This is an example of why in probability and statistics it is essential to distinguish between random variables and realizations of the random variables.
The maximum likelihood estimate of the covariance
for the Gaussian distribution case has N in the denominator as well. The ratio of 1/N to 1/ approaches 1 for large N, so the maximum likelihood estimate approximately equals the unbiased estimate when the sample is large.

Variance of the sampling distribution of the sample mean

For each random variable, the sample mean is a good estimator of the population mean, where a "good" estimator is defined as being efficient and unbiased. Of course the estimator will likely not be the true value of the population mean since different samples drawn from the same distribution will give different sample means and hence different estimates of the true mean. Thus the sample mean is a random variable, not a constant, and consequently has its own distribution. For a random sample of N observations on the j^th random variable, the sample mean's distribution itself has mean equal to the population mean and variance equal to , where is the population variance.

Weighted samples

In a weighted sample, each vector is assigned a weight. Without loss of generality, assume that the weights are normalized:
.
Then the weighted mean vector is given by
and the elements of the weighted covariance matrix are
If all weights are the same,, the weighted mean and covariance reduce to the sample mean and covariance mentioned above.

Criticism

The sample mean and sample covariance are not robust statistics, meaning that they are sensitive to outliers. As robustness is often a desired trait, particularly in real-world applications, robust alternatives may prove desirable, notably quantile-based statistics such as the sample median for location, and interquartile range for dispersion. Other alternatives include trimming and Winsorising, as in the trimmed mean and the Winsorized mean.

Popular movies

The Hunger Games (film) - 2012 American dystopian action thriller science fiction-adventure film directed by Gary Ross and based on Suzanne Collins’s 2008 novel of the same name. It is the first insta...
untitled Captain Marvel sequel - part of Marvel Cinematic Universe....
Killers of the Flower Moon (film project) - Killers of the Flower Moon - film project in United States of America. It was presented as drama, detective fiction, thriller. The film project starred Leonardo Dicaprio, Robert De Niro. Director of...
Five Nights at Freddy's (film) - Five Nights at Freddy's - film published in 2017 in United States of America. Scenarist of the film - Scott Cawthon....

Popular books

Book of Revelation - The Book of Revelation is the final book of the New Testament, and consequently is also the final book of the Christian Bible. Its title is derived from the first word of the Koine Greek text: apok...
Book of Genesis - account of the creation of the world, the early history of humanity, Israel's ancestors and the origins...
Gospel of Matthew - The Gospel According to Matthew is the first book of the New Testament and one of the three synoptic gospels. It tells how Israel's Messiah, rejected and executed in Israel, pronounces judgement on ...
Michelin Guide - Michelin Guides are a series of guide books published by the French tyre company Michelin for more than a century. The term normally refers to the annually published Michelin Red Guide , the oldest...
Psalms - The Book of Psalms , commonly referred to simply as Psalms , the Psalter or "the Psalms", is the first book of the Ketuvim , the third section of the Hebrew Bible, and thus a book of th...
Ecclesiastes - Ecclesiastes is one of 24 books of the Tanakh , where it is classified as one of the Ketuvim . Originally written c. 450–200 BCE, it is also among the canonical Wisdom literature of the Old Tes...
The 48 Laws of Power - non-fiction book by American author Robert Greene. The book...

Popular television series

The Crown (TV series) - historical drama web television series about the reign of Queen Elizabeth II, created and principally written by Peter Morgan, and produced by Left Bank Pictures and Sony Pictures Tel...
Friends - American sitcom television series, created by David Crane and Marta Kauffman, which aired on NBC from September 22, 1994, to May 6, 2004, lasting ten seasons. With an ensemble cast sta...
Young Sheldon - spin-off prequel to The Big Bang Theory and begins with the character Sheldon...
Modern Family - American television mockumentary family sitcom created by Christopher Lloyd and Steven Levitan for the American Broadcasting Company. It ran for eleven seasons, from September 23...
Loki (TV series) - upcoming American web television miniseries created for Disney+ by Michael Waldron, based on the Marvel Comics character of the same name. It is set in the Marvel Cinematic Universe, shar...
Game of Thrones - American fantasy drama television series created by David Benioff and D. B. Weiss for HBO. It...
Shameless (American TV series) - American comedy-drama television series developed by John Wells which debuted on Showtime on January 9, 2011. It...