In statistics, the variance inflation factor (VIF) is the ratio of the variance of a coefficient estimate in a model with multiple terms to its variance in a model with that term alone. It quantifies the severity of multicollinearity in an ordinary least squares regression analysis, providing an index that measures how much the variance of an estimated regression coefficient is increased because of collinearity. Cuthbert Daniel claims to have invented the concept behind the variance inflation factor, but did not come up with the name.
Definition
Consider the following linear model with k independent variables:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon.$$

The standard error of the estimate of $\beta_j$ is the square root of the $(j+1)$-th diagonal element of $s^2 (X^\top X)^{-1}$, where $s$ is the root mean squared error and $X$ is the regression design matrix: a matrix such that $X_{i,\,j+1}$ is the value of the $j$th independent variable for the $i$th case or observation, and such that $X_{i,\,1}$, the predictor vector associated with the intercept term, equals 1 for all $i$. It turns out that the square of this standard error, the estimated variance of the estimate of $\beta_j$, can be equivalently expressed as

$$\widehat{\operatorname{var}}(\hat{\beta}_j) = \frac{s^2}{(n-1)\,\widehat{\operatorname{var}}(X_j)} \cdot \frac{1}{1 - R_j^2},$$

where $R_j^2$ is the multiple $R^2$ for the regression of $X_j$ on the other covariates (a regression that does not involve the response variable $Y$) and $\widehat{\operatorname{var}}(X_j)$ is the sample variance of $X_j$. This identity separates the influences of several distinct factors on the variance of the coefficient estimate:
$s^2$: greater scatter in the data around the regression surface leads to proportionately more variance in the coefficient estimates
$n$: greater sample size results in proportionately less variance in the coefficient estimates
$\widehat{\operatorname{var}}(X_j)$: greater variability in a particular covariate leads to proportionately less variance in the corresponding coefficient estimate
The remaining term, $1/(1 - R_j^2)$, is the VIF. It reflects all other factors that influence the uncertainty in the coefficient estimates. The VIF equals 1 when the vector $X_j$ is orthogonal to each column of the design matrix for the regression of $X_j$ on the other covariates. By contrast, the VIF is greater than 1 when the vector $X_j$ is not orthogonal to all columns of that design matrix. Finally, note that the VIF is invariant to the scaling of the variables.

Now let $c = (X^\top X)^{-1}$ and, without loss of generality, reorder the columns of $X$ to set the first column to be $X_j$, so that $X = [X_j, X_{-j}]$. By using the Schur complement, the element in the first row and first column of $c$ is

$$c_{11} = \left( X_j^\top X_j - X_j^\top X_{-j} \left( X_{-j}^\top X_{-j} \right)^{-1} X_{-j}^\top X_j \right)^{-1}.$$

Then we have

$$\widehat{\operatorname{var}}(\hat{\beta}_j) = s^2 c_{11} = \frac{s^2}{X_j^\top X_j - X_j^\top X_{-j} \left( X_{-j}^\top X_{-j} \right)^{-1} X_{-j}^\top X_j} = \frac{s^2}{\mathrm{RSS}_j}.$$

Here $\hat{\beta}_{*j} = \left( X_{-j}^\top X_{-j} \right)^{-1} X_{-j}^\top X_j$ is the coefficient vector from the regression with $X_j$ as dependent variable over the covariates $X_{-j}$, and $\mathrm{RSS}_j = X_j^\top X_j - X_j^\top X_{-j} \hat{\beta}_{*j}$ is the corresponding residual sum of squares; since $\mathrm{RSS}_j = (n-1)\,\widehat{\operatorname{var}}(X_j)\,(1 - R_j^2)$, this recovers the expression above.
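As a concrete check on this identity, the sketch below (a minimal numerical illustration in Python with numpy; the simulated data are invented for the example) computes the estimated variance of one coefficient both directly from $s^2 (X^\top X)^{-1}$ and via the factored expression:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated data; all names and numbers here are invented for illustration.
X = rng.normal(size=(n, 3))
X[:, 1] += 0.8 * X[:, 0]                       # induce collinearity between two covariates
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

# Full design matrix with an intercept column of ones.
D = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(D, y, rcond=None)
s2 = ((y - D @ beta_hat) ** 2).sum() / (n - D.shape[1])   # s^2 = RSS / (n - k - 1)

# Estimated variance of beta_1 read off the diagonal of s^2 (X'X)^{-1}.
var_direct = s2 * np.linalg.inv(D.T @ D)[1, 1]

# The same quantity via the identity s^2 / ((n-1) var(X_j)) * 1 / (1 - R_j^2).
j = 0                                          # X_1 is column 0 of X
others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
resid = X[:, j] - others @ coef
r2_j = 1 - (resid @ resid) / ((X[:, j] - X[:, j].mean()) ** 2).sum()
var_identity = s2 / ((n - 1) * X[:, j].var(ddof=1)) / (1 - r2_j)

print(var_direct, var_identity)                # the two numbers agree
```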
Calculation and analysis
We can calculate k different VIFs (one for each $X_i$) in three steps:
Step one
First we run an ordinary least squares regression that has $X_i$ as a function of all the other explanatory variables in the first equation. If i = 1, for example, the equation would be

$$X_1 = \alpha_0 + \alpha_2 X_2 + \alpha_3 X_3 + \cdots + \alpha_k X_k + e,$$

where $\alpha_0$ is a constant and $e$ is the error term.
Step two
Then, calculate the VIF factor for $\hat{\beta}_i$ with the following formula:

$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2},$$

where $R_i^2$ is the coefficient of determination of the regression equation in step one.
Step three
Analyze the magnitude of multicollinearity by considering the size of $\mathrm{VIF}_i$. A rule of thumb is that if $\mathrm{VIF}_i > 10$ then multicollinearity is high (a cutoff of 5 is also commonly used). Some software instead calculates the tolerance, which is just the reciprocal of the VIF. The choice of which to use is a matter of personal preference.
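The three steps fit together in a short script. The following is a minimal sketch assuming numpy; the function name vifs and the simulated data are for illustration only:

```python
import numpy as np

def vifs(X):
    """One VIF per column of X (raw covariates, no intercept column)."""
    n, k = X.shape
    out = []
    for i in range(k):
        # Step one: regress X_i on all remaining covariates plus a constant.
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, i], rcond=None)
        resid = X[:, i] - others @ coef
        r2 = 1 - (resid @ resid) / ((X[:, i] - X[:, i].mean()) ** 2).sum()
        # Step two: VIF_i = 1 / (1 - R_i^2).
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Step three: inspect the magnitudes (toy data, invented for the example).
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
X[:, 2] += 0.9 * X[:, 0]            # columns 0 and 2 are collinear by construction
print(vifs(X))                      # the collinear columns show clearly inflated VIFs
```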
Interpretation
The square root of the variance inflation factor indicates how much larger the standard error is, compared with what it would be if that variable were uncorrelated with the other predictor variables in the model.
Example
If the variance inflation factor of a predictor variable were 5.27, this means that the standard error for the coefficient of that predictor variable is 2.3 times as large as it would be if that predictor variable were uncorrelated with the other predictor variables.
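Written out, the arithmetic is just the square-root relationship above:

$$\sqrt{\mathrm{VIF}} = \sqrt{5.27} \approx 2.30.$$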
Implementation
vif function in the car R package
ols_vif_tol function in the olsrr R package
PROC REG in SAS
variance_inflation_factor function in the statsmodels Python package (see the usage sketch below)
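For the statsmodels entry above, a minimal usage sketch (the toy DataFrame and column names are invented for the example; variance_inflation_factor expects the full design matrix, including the constant, and the index of the column to assess):

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

# Toy data, invented for the example.
rng = np.random.default_rng(2)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["x1", "x2", "x3"])
df["x2"] += 0.7 * df["x1"]                    # induce some collinearity

# Build the design matrix with the constant column included.
exog = add_constant(df)
for i, name in enumerate(exog.columns[1:], start=1):   # skip the constant itself
    print(name, variance_inflation_factor(exog.values, i))
```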