Family-wise error rate


In statistics, the family-wise error rate (FWER) is the probability of making one or more false discoveries, or type I errors, when performing multiple hypothesis tests.

History

The terms "experimentwise error rate" and "error rate per-experiment" were coined to indicate error rates that a researcher could use as a control level in a multiple-hypothesis experiment.

Background

Within the statistical framework, the term "family" has been defined in several ways. A set of inferences is typically treated as a family for one of two purposes:
  1. To take into account the selection effect due to data dredging
  2. To ensure simultaneous correctness of a set of inferences, so as to guarantee a correct overall decision
To summarize, a family is best defined by the potential selective inference being faced: a family is the smallest set of items of inference in an analysis, interchangeable in their meaning for the goal of the research, from which results could be selected for action, presentation or highlighting.

Classification of multiple hypothesis tests

Definition

The FWER is the probability of making at least one type I error in the family,
$$\mathrm{FWER} = \Pr(V \geq 1),$$
or equivalently,
$$\mathrm{FWER} = 1 - \Pr(V = 0),$$
where $V$ is the number of type I errors (false rejections) in the family.
Thus, by assuring $\mathrm{FWER} \leq \alpha$, the probability of making one or more type I errors in the family is controlled at level $\alpha$.
A procedure controls the FWER in the weak sense if FWER control at level $\alpha$ is guaranteed only when all null hypotheses are true.
A procedure controls the FWER in the strong sense if FWER control at level $\alpha$ is guaranteed for any configuration of true and false null hypotheses.
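To see why control is needed, consider a simple special case that the definition does not itself assume: m independent tests, all null hypotheses true, each tested at the unadjusted level α. The FWER is then 1 − (1 − α)^m. The Python sketch below evaluates this expression; the function name is hypothetical.

```python
def fwer_independent(alpha: float, m: int) -> float:
    """FWER for m independent tests of true nulls, each run at level alpha.

    Assumes independence and that every null hypothesis is true.
    """
    return 1.0 - (1.0 - alpha) ** m


if __name__ == "__main__":
    for m in (1, 5, 10, 20):
        print(f"m={m:2d}  FWER={fwer_independent(0.05, m):.3f}")
```

Under these assumptions, with α = 0.05 and 20 tests the FWER is already about 0.64.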

Controlling procedures

Several classical procedures ensure strong FWER control at level $\alpha$, and some newer procedures also exist.

The Bonferroni procedure

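The Bonferroni procedure rejects each null hypothesis whose p-value is at most $\alpha/m$, where $m$ is the number of hypotheses in the family.

Holm's step-down procedure

Holm's step-down procedure is uniformly more powerful than the Bonferroni procedure. It controls the family-wise error rate for all $m$ hypotheses at level $\alpha$ in the strong sense because it is a closed testing procedure in which each intersection hypothesis is tested using the simple Bonferroni test.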
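The two rules discussed in this part of the article can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `bonferroni` rejects every hypothesis with a p-value of at most α/m, and `holm` implements Holm's step-down procedure, the closed testing refinement of Bonferroni. The function names and example p-values are hypothetical.

```python
from typing import List


def bonferroni(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Reject H_i when p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Holm's step-down procedure; returns one rejection flag per hypothesis."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):  # rank 0 is the smallest p-value
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first hypothesis that is not rejected
    return reject


if __name__ == "__main__":
    p = [0.01, 0.04, 0.02, 0.005]
    print(bonferroni(p))  # [True, False, False, True]
    print(holm(p))        # [True, True, True, True]
```

In this example Holm rejects every hypothesis that Bonferroni rejects and two more, which is what "uniformly more powerful" means in practice.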

Hochberg's step-up procedure

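Hochberg's step-up procedure is performed using the following steps:
  1. Order the p-values from lowest to highest, $P_{(1)} \leq \dots \leq P_{(m)}$, and let the associated hypotheses be $H_{(1)}, \dots, H_{(m)}$
  2. For a given $\alpha$, let $R$ be the largest $k$ such that $P_{(k)} \leq \frac{\alpha}{m-k+1}$
  3. Reject the null hypotheses $H_{(1)}, \dots, H_{(R)}$
Hochberg's procedure is more powerful than Holm's. Nevertheless, while Holm's is a closed testing procedure, Hochberg's is based on the Simes test, so its FWER control holds only under non-negative dependence.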
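A minimal Python sketch of the step-up rule follows. It sorts the p-values, finds the largest k with P(k) ≤ α/(m − k + 1), and rejects the k hypotheses with the smallest p-values. The function name and example values are hypothetical, and the sketch does not check the non-negative dependence condition.

```python
from typing import List


def hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Hochberg's step-up procedure; returns one rejection flag per hypothesis."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    cutoff = 0
    # Scan from the largest p-value down; the first hit is the largest valid k.
    for k in range(m, 0, -1):
        if p_values[order[k - 1]] <= alpha / (m - k + 1):
            cutoff = k
            break
    reject = [False] * m
    for i in order[:cutoff]:
        reject[i] = True
    return reject


if __name__ == "__main__":
    print(hochberg([0.03, 0.04]))  # [True, True]
    print(hochberg([0.02, 0.06]))  # [True, False]
```

For the p-values 0.03 and 0.04 at α = 0.05, Holm's procedure rejects nothing because 0.03 exceeds 0.025, whereas the step-up rule rejects both because 0.04 ≤ 0.05.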

Dunnett's correction

Dunnett described an alternative alpha error adjustment for the case in which k groups are compared to the same control group. Now known as Dunnett's test, this method is less conservative than the Bonferroni adjustment.

Scheffé's method

Resampling procedures

The procedures of Bonferroni and Holm control the FWER under any dependence structure of the p-values. Essentially, this is achieved by accommodating a "worst-case" dependence structure. But such an approach is conservative if the dependence is actually positive. To give an extreme example, under perfect positive dependence there is effectively only one test, and thus the FWER is uninflated.
Accounting for the dependence structure of the p-values produces more powerful procedures. This can be achieved by applying resampling methods, such as bootstrapping and permutation methods. The procedure of Westfall and Young requires a certain condition that does not always hold in practice. The procedures of Romano and Wolf dispense with this condition and are thus more generally valid.
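One way to see how resampling uses the dependence structure is a single-step max-statistic permutation adjustment, sketched below in the spirit of the permutation methods mentioned above. It is an illustration under stated assumptions, not the exact procedure of any of the cited authors. It assumes two groups, exchangeable group labels under the null hypotheses, and an unstudentized statistic (the absolute difference in group means). All names are hypothetical. Because whole rows are permuted together, the correlation between outcomes is preserved in every resample.

```python
import random
from typing import List, Sequence


def mean_diffs(rows: Sequence[Sequence[float]], labels: Sequence[int]) -> List[float]:
    """Absolute difference in group means (label 1 vs label 0) per column."""
    g1 = [r for r, g in zip(rows, labels) if g == 1]
    g0 = [r for r, g in zip(rows, labels) if g == 0]
    n_cols = len(rows[0])
    return [
        abs(sum(r[j] for r in g1) / len(g1) - sum(r[j] for r in g0) / len(g0))
        for j in range(n_cols)
    ]


def max_t_adjusted_p(rows, labels, n_perm: int = 999, seed: int = 0) -> List[float]:
    """Single-step max-statistic adjusted p-values from label permutations."""
    rng = random.Random(seed)
    observed = mean_diffs(rows, labels)
    exceed = [0] * len(observed)
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)                      # permute labels, keep rows intact
        max_stat = max(mean_diffs(rows, perm)) # largest statistic in this resample
        for j, t in enumerate(observed):
            if max_stat >= t:
                exceed[j] += 1
    # Add one to numerator and denominator to count the observed labelling.
    return [(1 + c) / (n_perm + 1) for c in exceed]


if __name__ == "__main__":
    rows = [[10.0 + i, float(i % 3)] for i in range(5)] + \
           [[float(i), float(i % 3)] for i in range(5)]
    labels = [1] * 5 + [0] * 5
    print(max_t_adjusted_p(rows, labels))
```

Each observed statistic is compared with the distribution of the maximum over all outcomes, so the adjustment adapts to how strongly the outcomes move together instead of assuming the worst case.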

Harmonic mean p-value procedure

The harmonic mean p-value (HMP) procedure provides a multilevel test that improves on the power of the Bonferroni correction by assessing the significance of groups of hypotheses while controlling the strong-sense family-wise error rate. The significance of any subset $\mathcal{R}$ of the $m$ tests is assessed by calculating the HMP for the subset,
$$\overset{\circ}{p}_{\mathcal{R}} = \frac{\sum_{i \in \mathcal{R}} w_i}{\sum_{i \in \mathcal{R}} w_i / p_i},$$
where $w_1, \dots, w_m$ are weights that sum to one. An approximate procedure that controls the strong-sense family-wise error rate at level approximately $\alpha$ rejects the null hypothesis that none of the p-values in subset $\mathcal{R}$ are significant when $\overset{\circ}{p}_{\mathcal{R}} \leq \alpha\, w_{\mathcal{R}}$, where $w_{\mathcal{R}} = \sum_{i \in \mathcal{R}} w_i$. This approximation is reasonable for small $\alpha$ and becomes arbitrarily good as $\alpha$ approaches zero. An asymptotically exact test is also available.
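The approximate test can be sketched as follows. The sketch assumes the usual weighted form of the HMP for a subset R, namely the sum of the weights in R divided by the sum of w_i/p_i over R, and the approximate threshold α times the total weight of R. It does not implement the asymptotically exact test. Function names and example values are hypothetical.

```python
from typing import Sequence, Tuple


def harmonic_mean_p(
    p_values: Sequence[float], weights: Sequence[float], subset: Sequence[int]
) -> Tuple[float, float]:
    """Return (HMP of the subset, total weight of the subset)."""
    w_r = sum(weights[i] for i in subset)
    hmp = w_r / sum(weights[i] / p_values[i] for i in subset)
    return hmp, w_r


def hmp_rejects(
    p_values: Sequence[float],
    weights: Sequence[float],
    subset: Sequence[int],
    alpha: float = 0.05,
) -> bool:
    """Approximate test: reject when HMP <= alpha * (weight of the subset)."""
    hmp, w_r = harmonic_mean_p(p_values, weights, subset)
    return hmp <= alpha * w_r


if __name__ == "__main__":
    p = [0.01, 0.02, 0.5, 0.8]
    w = [0.25, 0.25, 0.25, 0.25]  # weights over all m tests sum to one
    print(hmp_rejects(p, w, [0, 1]))  # True
    print(hmp_rejects(p, w, [2, 3]))  # False
```

Because the threshold scales with the weight of the subset, the same function can be applied to the full family and to any group within it, which is what makes the test multilevel.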

Alternative approaches

FWER control is more stringent than false discovery rate (FDR) control. FWER control limits the probability of at least one false discovery, whereas FDR control limits the expected proportion of false discoveries among the rejected hypotheses. Thus, FDR procedures have greater power at the cost of increased rates of type I errors, i.e., rejecting null hypotheses that are actually true.
On the other hand, FWER control is less stringent than per-family error rate control, which limits the expected number of errors per family. Because FWER control is concerned only with whether at least one false discovery occurs, it does not, unlike per-family error rate control, treat multiple simultaneous false discoveries as any worse than a single one. The Bonferroni correction is often regarded as merely controlling the FWER, but it in fact also controls the per-family error rate.
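The last claim can be checked with a short derivation. The notation is an assumption for this sketch: $V$ is the number of false rejections, $m$ the number of hypotheses, $I_0$ the index set of the $m_0$ true null hypotheses, PFER the per-family error rate, and each true-null p-value satisfies $\Pr(p_i \leq t) \leq t$. With Bonferroni, each hypothesis is rejected when $p_i \leq \alpha/m$, so

```latex
\begin{aligned}
\mathrm{PFER} = \mathbb{E}[V]
  &= \sum_{i \in I_0} \Pr\!\left(p_i \leq \frac{\alpha}{m}\right)
   \leq m_0 \, \frac{\alpha}{m} \leq \alpha, \\
\mathrm{FWER} = \Pr(V \geq 1)
  &\leq \mathbb{E}[V] \leq \alpha .
\end{aligned}
```

The second line follows from Markov's inequality, so any procedure that controls the per-family error rate at level $\alpha$ also controls the FWER at level $\alpha$, while the converse need not hold.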