Separation (statistics)


In statistics, separation is a phenomenon associated with models for dichotomous or categorical outcomes, including logistic and probit regression. Separation occurs if a predictor (or a linear combination of predictors) is associated with only one outcome value whenever it is greater than some constant.
For example, suppose the predictor X is continuous and the outcome is y = 1 for all observed x > 2. If the outcome values are perfectly determined by the predictor (e.g., y = 0 for all observed x ≤ 2), then "complete separation" is said to occur. If instead there is some overlap at the boundary (e.g., y = 0 when x < 2, but both 0 and 1 are observed when x = 2), then "quasi-complete separation" occurs. A 2 × 2 table with an empty cell is an example of quasi-complete separation.
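The distinction can be checked mechanically for a single predictor. The following is a minimal sketch, not a general diagnostic: the function name classify_separation and the toy data are assumptions made for illustration, the outcome is assumed to be coded 0/1, and only one predictor is examined (separation by a linear combination of several predictors is not detected).

```python
def classify_separation(x, y):
    """Classify separation for one predictor x and a 0/1 outcome y.

    Returns "complete", "quasi-complete", or "none".
    """
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        raise ValueError("both outcome values must be observed")
    if min(x) == max(x):
        return "none"  # a constant predictor cannot separate the outcomes

    # Try both directions: y = 1 on the high side, then y = 1 on the low side.
    for low, high in ((x0, x1), (x1, x0)):
        if max(low) < min(high):
            return "complete"        # a gap: some cut-off splits the outcomes
        if max(low) == min(high):
            return "quasi-complete"  # outcomes mix only at the boundary value
    return "none"


# Hypothetical data: y = 1 exactly when x > 2.
print(classify_separation([1.0, 2.0, 2.5, 3.0], [0, 0, 1, 1]))
# Both outcomes observed at the boundary x = 2.
print(classify_separation([1.0, 2.0, 2.0, 3.0], [0, 0, 1, 1]))
# 2 x 2 table with an empty cell: no observation has x = 1 and y = 0.
print(classify_separation([0, 0, 1, 1, 1], [0, 1, 1, 1, 1]))
# Outcomes interleaved along x.
print(classify_separation([1.0, 2.0, 3.0, 4.0], [0, 1, 0, 1]))
```

The third call encodes the 2 × 2 table as one row per observation, and the empty cell makes the two outcome groups touch only at x = 0.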
This observed form of the data is important because it causes problems with the estimation of regression coefficients. Loosely, a parameter in the model "wants" to be infinite if complete separation is observed: the likelihood keeps increasing as the parameter grows, so no finite maximum likelihood estimate exists. Under quasi-complete separation the maximum likelihood estimate is likewise not finite, although the likelihood then levels off below a perfect fit because of the observations at the boundary. Computer programs will often output an arbitrarily large parameter estimate with a very large standard error. Methods to fit these models include exact logistic regression and Firth logistic regression, a bias-reduction method based on a penalized likelihood.
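The behaviour of the likelihood can be illustrated numerically. The sketch below makes several simplifying assumptions: the toy data are made up, the model has a single slope and no intercept (logit P(y = 1) = beta * x), and the Firth-type penalty is written in its Jeffreys-prior form, adding half the log of the Fisher information to the log-likelihood. It evaluates both functions on a grid rather than running a fitting algorithm.

```python
import math

# Toy data (assumed for illustration): y = 1 exactly when x > 0,
# so the predictor separates the outcomes completely.
X = [-2.0, -1.0, 1.0, 2.0]
Y = [0, 0, 1, 1]


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_likelihood(beta):
    """Log-likelihood of the no-intercept model logit P(y=1) = beta * x."""
    total = 0.0
    for x, y in zip(X, Y):
        z = beta * x
        # log p when y == 1; log(1 - p) = log_sigmoid(-z) when y == 0
        total += log_sigmoid(z) if y == 1 else log_sigmoid(-z)
    return total


def fisher_information(beta):
    """Fisher information sum(x^2 * p * (1 - p)) for the single parameter."""
    info = 0.0
    for x in X:
        e = math.exp(-abs(beta * x))
        info += x * x * e / (1.0 + e) ** 2  # p * (1 - p) in a stable form
    return info


def penalized_log_likelihood(beta):
    """Firth-type penalty: add half the log of the Fisher information."""
    return log_likelihood(beta) + 0.5 * math.log(fisher_information(beta))


betas = [0.5 * k for k in range(21)]  # grid 0.0, 0.5, ..., 10.0
plain = [log_likelihood(b) for b in betas]
penalized = [penalized_log_likelihood(b) for b in betas]
best_penalized_beta = betas[penalized.index(max(penalized))]

for b, l, lp in zip(betas[::4], plain[::4], penalized[::4]):
    print(f"beta={b:5.1f}  loglik={l:9.5f}  penalized={lp:9.5f}")
print("grid maximizer of penalized log-likelihood:", best_penalized_beta)
```

On this grid the unpenalized log-likelihood increases at every step toward 0, whereas the penalized version peaks at a moderate value of beta, because the Fisher information shrinks toward zero as the fitted probabilities approach 0 and 1.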