In statistical theory, U-statistics are a class of statistics that is especially important in estimation theory; the letter "U" stands for unbiased. In elementary statistics, U-statistics arise naturally in producing minimum-variance unbiased estimators. The theory of U-statistics allows a minimum-variance unbiased estimator to be derived from each unbiased estimator of an estimable parameter for large classes of probability distributions. An estimable parameter is a measurable function of the population's cumulative probability distribution: for example, for every probability distribution, the population median is an estimable parameter.
The theory of U-statistics applies to general classes of probability distributions. Many statistics originally derived for particular parametric families have been recognized as U-statistics for general distributions. In non-parametric statistics, the theory of U-statistics is used to establish, for statistical procedures and estimators, results relating to asymptotic normality and to the variance of such quantities. The theory has also been used to study more general statistics as well as stochastic processes, such as random graphs.
Suppose that a problem involves independent and identically distributed random variables and that estimation of a certain parameter is required. Suppose that a simple unbiased estimate can be constructed from only a few observations: this defines the basic estimator based on a given number of observations. For example, a single observation is itself an unbiased estimate of the mean, and a pair of observations can be used to derive an unbiased estimate of the variance. The U-statistic based on this estimator is defined as the average, across all sub-samples of the given size drawn from the full set of observations, of the basic estimator applied to the sub-samples.
Sen provides a review of the paper by Wassily Hoeffding that introduced U-statistics and set out the theory relating to them, and in doing so outlines the importance U-statistics have in statistical theory. Sen says "The impact of Hoeffding is overwhelming at the present time and is very likely to continue in the years to come". The theory of U-statistics is not limited to the case of independent and identically distributed random variables or to scalar random variables.
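The remark above that a pair of observations gives an unbiased estimate of the variance can be checked directly. The short derivation below is a sketch under stated assumptions: $X_1$ and $X_2$ are independent with common mean $\mu$ and finite variance $\sigma^2$ (these symbols are introduced here for illustration), and the basic estimator is taken to be half the squared difference.

```latex
\begin{aligned}
\operatorname{E}\!\left[\tfrac{1}{2}(X_1 - X_2)^2\right]
  &= \tfrac{1}{2}\left(\operatorname{E}[X_1^2] - 2\operatorname{E}[X_1]\operatorname{E}[X_2] + \operatorname{E}[X_2^2]\right) \\
  &= \tfrac{1}{2}\left(2(\sigma^2 + \mu^2) - 2\mu^2\right) \\
  &= \sigma^2 .
\end{aligned}
```

The second line uses independence to factor $\operatorname{E}[X_1 X_2]$ and the identity $\operatorname{E}[X_i^2] = \sigma^2 + \mu^2$.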
Definition
The term U-statistic, due to Hoeffding, is defined as follows. Let $f$ be a real-valued or complex-valued function of $r$ variables. For each $n \ge r$ the associated U-statistic $f_n$ is equal to the average over ordered samples $\varphi(1), \ldots, \varphi(r)$ of size $r$ of the sample values $f(x_\varphi)$. In other words, $f_n(x_1, \ldots, x_n) = \operatorname{ave} f(x_{\varphi(1)}, \ldots, x_{\varphi(r)})$, the average being taken over distinct ordered samples of size $r$ taken from $\{1, \ldots, n\}$. Each U-statistic $f_n(x_1, \ldots, x_n)$ is necessarily a symmetric function.
U-statistics are very natural in statistical work, particularly in Hoeffding's context of independent and identically distributed random variables, or more generally for exchangeable sequences, such as in simple random sampling from a finite population, where the defining property is termed 'inheritance on the average'. Fisher's k-statistics and Tukey's polykays are examples of homogeneous polynomial U-statistics. For a simple random sample $\varphi$ of size $n$ taken from a population of size $N$, the U-statistic has the property that the average over sample values $f_n(x_\varphi)$ is exactly equal to the population value $f_N(x)$.
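The definition and the 'inheritance on the average' property can be illustrated with a small computation. The following is a minimal sketch, not a reference implementation: the function name `u_statistic`, the kernel `half_squared_difference`, the kernel arity argument `r`, and the six-element population are all assumptions chosen for illustration. Exact rational arithmetic is used so that the equality can be checked without rounding error.

```python
from fractions import Fraction
from itertools import combinations, permutations


def u_statistic(kernel, xs, r):
    """Average `kernel` over all ordered r-tuples of distinct indices of xs."""
    n = len(xs)
    if n < r:
        raise ValueError("need at least r observations")
    total = Fraction(0)
    count = 0
    for idx in permutations(range(n), r):
        total += kernel(*(xs[i] for i in idx))
        count += 1
    return total / count


def half_squared_difference(a, b):
    """Illustrative kernel of two variables (unbiased for the variance)."""
    return Fraction((a - b) ** 2, 2)


# Hypothetical finite population of size N = 6.
population = [2, 3, 5, 7, 11, 13]
f_N = u_statistic(half_squared_difference, population, 2)

# U-statistic on every simple random sample of size n = 4, then averaged.
n = 4
sample_values = [
    u_statistic(half_squared_difference, list(sample), 2)
    for sample in combinations(population, n)
]
average_over_samples = sum(sample_values) / len(sample_values)

print(f_N, average_over_samples)
```

In this sketch the two printed values coincide, because every pair of population members appears in the same number of samples of size 4. Enumerating ordered tuples follows the definition literally; for a symmetric kernel, averaging over unordered subsets gives the same value with less work.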
Examples
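Some examples:
If $f(x) = x$, the U-statistic $f_n(x) = \bar{x}_n = (x_1 + \cdots + x_n)/n$ is the sample mean.
If $f(x_1, x_2) = |x_1 - x_2|$, the U-statistic is the mean pairwise deviation $f_n(x_1, \ldots, x_n) = \frac{2}{n(n-1)} \sum_{i > j} |x_i - x_j|$, defined for $n \ge 2$.
If $f(x_1, x_2) = (x_1 - x_2)^2 / 2$, the U-statistic is the sample variance $f_n(x) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x}_n)^2$ with divisor $n - 1$, defined for $n \ge 2$.
The third k-statistic $k_{3,n}(x) = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} (x_i - \bar{x}_n)^3$, the sample skewness defined for $n \ge 3$, is a U-statistic.
The following case highlights an important point. If $f(x_1, x_2, x_3)$ is the median of three values, $f_n(x_1, \ldots, x_n)$ is not the median of $n$ values. However, it is a minimum-variance unbiased estimate of the expected value of the median of three values, not of the median of the population. Similar estimates play a central role where the parameters of a family of probability distributions are being estimated by probability weighted moments or L-moments.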
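Several of these examples can be checked numerically. The sketch below is illustrative only: the helper `u_statistic`, the variable names, and the five data values are assumptions, not part of any standard library. Because each kernel used here is symmetric, the average is taken over unordered subsets, which gives the same result as averaging over ordered samples.

```python
from itertools import combinations
from math import comb
from statistics import mean, median, variance


def u_statistic(kernel, xs, r):
    """Average a symmetric kernel over all r-element subsets of xs."""
    if len(xs) < r:
        raise ValueError("need at least r observations")
    total = sum(kernel(*subset) for subset in combinations(xs, r))
    return total / comb(len(xs), r)


# Arbitrary illustrative data.
data = [1.0, 2.0, 4.0, 8.0, 16.0]

# Kernel of one variable: recovers the sample mean.
u_mean = u_statistic(lambda x: x, data, 1)

# Kernel |x1 - x2|: the mean pairwise deviation.
u_pairwise = u_statistic(lambda a, b: abs(a - b), data, 2)

# Kernel (x1 - x2)^2 / 2: the sample variance with divisor n - 1.
u_var = u_statistic(lambda a, b: (a - b) ** 2 / 2, data, 2)

# Kernel median(x1, x2, x3): generally differs from the sample median.
u_median3 = u_statistic(lambda a, b, c: median([a, b, c]), data, 3)

print("mean:", u_mean, "vs", mean(data))
print("variance:", u_var, "vs", variance(data))
print("mean pairwise deviation:", u_pairwise)
print("median-of-three U-statistic:", u_median3, "vs median", median(data))
```

For this data set the first two comparisons agree up to floating-point rounding, while the median-of-three U-statistic differs from the sample median, in line with the last example above. The third k-statistic is omitted from the sketch because its kernel is more involved.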