Variance Inflation Factor (VIF) filter

This component implements a backward elimination process to remove collinear variables from the predictors of a regression. It computes the Variance Inflation Factor (VIF) of every numeric column and, as long as any value exceeds a given threshold, removes the column with the highest VIF, repeating until all remaining columns are at or below the threshold.
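The elimination loop described above can be sketched in Python. This is only an illustrative sketch, not the component's actual implementation: the function names `vif_scores` and `vif_filter` are hypothetical, the input is assumed to be a numeric matrix of non-constant predictor columns with the target already set aside, and each VIF is obtained from an ordinary least-squares fit of one column on all the others.

```python
import numpy as np

def vif_scores(X):
    """VIF of each column of X (rows = observations, columns = predictors).

    Assumes every column is numeric and non-constant.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    scores = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress column j on all other columns plus an intercept.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        # VIF = 1 / (1 - R^2) = ss_tot / ss_res; infinite for an exact fit.
        scores[j] = ss_tot / ss_res if ss_res > 0 else np.inf
    return scores

def vif_filter(X, names, threshold=5.0):
    """Drop the highest-VIF column until no VIF exceeds the threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    removed = []
    while X.shape[1] > 1:
        scores = vif_scores(X)
        worst = int(np.argmax(scores))
        if scores[worst] <= threshold:
            break
        removed.append(names.pop(worst))
        X = np.delete(X, worst, axis=1)
    return names, removed

# Hypothetical data: column "c" is almost exactly a + b.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + b + rng.normal(scale=0.01, size=500)
kept, removed = vif_filter(np.column_stack([a, b, c]), ["a", "b", "c"])
print("kept:", kept, "removed:", removed)
```

Recomputing all VIF values after each removal matters: dropping one column changes the VIF of every remaining column, so several columns that start above the threshold may all fall below it once a single redundant column is gone.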

Multicollinearity occurs when two or more columns are correlated with each other and provide redundant information when jointly considered as predictors of a model. VIF is used to diagnose the extent of multicollinearity among the predictors of a model. For instance, a VIF of 3 for a column means that the variance of its estimated regression coefficient is 3 times larger than it would be if that column were fully uncorrelated with all other predictors.
As a rule of thumb, a VIF higher than 5 indicates a problematic amount of collinearity, and columns above that value are candidates for removal from the predictors of a model, which also reduces dimensionality (James et al., 2014).
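For reference, the standard definition of the VIF of predictor j can be written as follows, where R_j^2 (notation introduced here, not taken from the component) is the coefficient of determination obtained by regressing predictor j on all other predictors:

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```

Under this definition a predictor that is uncorrelated with the others has R_j^2 = 0 and a VIF of 1, while the threshold of 5 corresponds to R_j^2 = 0.8, that is, 80% of the predictor's variance being explained by the other predictors.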

The interactive view of the component shows the latest VIF value for each numeric variable, flagging the ones that have been filtered out during the backward elimination process.

References:
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2014. An Introduction to Statistical Learning: With Applications in R. Springer Publishing Company, Incorporated.

This component is free to use and modify.
Author: Andrea De Mauro, aboutbigdata.net

Options

Columns to be included
Select columns to be kept. Only numeric columns will be considered for VIF calculation and filtering.
Target Variable
Select the target variable. It is excluded from the VIF calculation and always kept in the output.
VIF Threshold (default: 5)
Maximum Variance Inflation Factor (VIF) for columns to be kept.

Input Ports

Input table

Output Ports

Input table with filtered columns
