
## Relative importance of predictors

A common question is which predictors are most important, although 'importance' can mean different things to different people. Some of the methods used to assess the importance of predictors are:

Coefficients of regression using scaled predictors: For this, continuous or numeric variables should be scaled so that their units of measurement do not affect the coefficients.

Scaling: To scale a variable, the mean is subtracted from each value and the result is divided by the standard deviation, so the mean and SD of the entire series become 0 and 1, respectively. The values obtained are also called z-scores. Each z-score indicates how many standard deviations the value lies from the mean.
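
As a minimal sketch of the scaling step, the following computes z-scores by hand and compares them with R's built-in scale() function. The vector x is a made-up example, not part of the birthwt data.

```r
# Hypothetical example values (not from the birthwt data)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# z-score by hand: subtract the mean, divide by the standard deviation
z_manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix; convert it to a plain vector
z_scale <- as.numeric(scale(x))

all.equal(z_manual, z_scale)   # both approaches should agree
mean(z_manual)                 # approximately 0
sd(z_manual)                   # 1
```

Both sd() and scale() use the sample standard deviation (n - 1 in the denominator), which is why the two results match.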

If all numeric variables are scaled, their coefficients become comparable to one another. For example, the following is the regression output when all variables (including the response) are scaled:

Code:

> library(MASS) # provides the birthwt data
> mod_scaled = lm(bwt~., data=data.frame(scale(birthwt[-1]))) #'low' is removed
> summary(mod_scaled)

Call:
lm(formula = bwt ~ ., data = data.frame(scale(birthwt[-1])))

Residuals:
Min       1Q   Median       3Q      Max
-2.49104 -0.58528  0.02234  0.67479  2.26820

Coefficients:
Estimate             Std. Error t value Pr(>|t|)
(Intercept) -0.0000000000000001216  0.0655292090228068308   0.000  1.00000
age         -0.0019314510221109923  0.0697181015050541281  -0.028  0.97793
lwt          0.1440511878086048747  0.0712847440204966709   2.021  0.04478 *
race        -0.2373758768039808398  0.0727076693963745746  -3.265  0.00131 **
smoke       -0.2405662228364832678  0.0721568953892822995  -3.334  0.00104 **
ptl         -0.0346067011967166049  0.0696837035395908994  -0.497  0.62006
ht          -0.2013869038725481231  0.0685136586364621242  -2.939  0.00372 **
ui          -0.2497246074479218259  0.0685204480986914971  -3.645  0.00035 ***
ftv         -0.0225679283419899825  0.0681802240163703610  -0.331  0.74103
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9009 on 180 degrees of freedom
Multiple R-squared:  0.223,     Adjusted R-squared:  0.1884
F-statistic: 6.456 on 8 and 180 DF,  p-value: 0.0000002232

Note that the coefficients are now on a common scale and of similar magnitude.

Coefficients of regression without scaling: However, comparing coefficients of numeric variables with those of factor variables can be tricky. Moreover, most people find z-scores harder to interpret than raw values. Coefficients obtained from unscaled data have the advantage of being easier to understand.

Code:

> mod = lm(bwt~., data=birthwt[-1])
> summary(mod)

Call:
lm(formula = bwt ~ ., data = birthwt[-1])

Residuals:
Min       1Q   Median       3Q      Max
-1816.51  -426.79    16.29   492.06  1654.01

Coefficients:
Estimate Std. Error t value             Pr(>|t|)
(Intercept) 3129.4594   344.2424   9.091 < 0.0000000000000002 ***
age           -0.2658     9.5947  -0.028              0.97793
lwt            3.4351     1.6999   2.021              0.04478 *
race        -188.4895    57.7339  -3.265              0.00131 **
smoke       -358.4552   107.5172  -3.334              0.00104 **
ptl          -51.1526   103.0003  -0.497              0.62006
ht          -600.6465   204.3454  -2.939              0.00372 **
ui          -511.2513   140.2792  -3.645              0.00035 ***
ftv          -15.5358    46.9354  -0.331              0.74103
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 656.9 on 180 degrees of freedom
Multiple R-squared:  0.223,     Adjusted R-squared:  0.1884
F-statistic: 6.456 on 8 and 180 DF,  p-value: 0.0000002232

Here the coefficients of the different predictors vary widely. In particular, the coefficients for factor columns are very large, since a unit change there means moving to a different group, whereas a unit change in age is a very small change. The coefficients are not comparable with one another (though each is easy to interpret).
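
The scaled and unscaled coefficients are linked by a simple rescaling: when both the response and the predictors are standardized, each coefficient equals the raw coefficient multiplied by sd(x)/sd(y). The sketch below checks this on the birthwt data; the object names (dat, mod_raw, mod_std, converted) are chosen here for illustration only.

```r
library(MASS)   # provides the birthwt data

dat <- birthwt[-1]                                      # drop 'low', as above
mod_raw <- lm(bwt ~ ., data = dat)                      # unscaled model
mod_std <- lm(bwt ~ ., data = data.frame(scale(dat)))   # all columns scaled

sds  <- sapply(dat, sd)               # standard deviation of every column
pred <- setdiff(names(dat), "bwt")    # predictor names

# raw coefficient * sd(x) / sd(y) should reproduce the scaled coefficient
converted <- coef(mod_raw)[pred] * sds[pred] / sds["bwt"]

all.equal(unname(converted), unname(coef(mod_std)[pred]))
round(cbind(raw = coef(mod_raw)[pred], scaled = converted), 4)
```

This also explains why the t values and p-values are identical in the two outputs: scaling changes the units of the coefficients, not the fit.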

relaimpo package: This dedicated package calculates the relative importance of predictors using several metrics:

Code:

> bwdf = mybwdf(F)
> mod = lm(bwt~., data=bwdf)
> library(relaimpo)
> res_relimp = calc.relimp(mod, type=c('lmg','last','first'), rela=TRUE)
> plot(res_relimp)

Output graph:
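
To make two of these metrics concrete, here is a rough base-R sketch of 'first' (the R-squared of a model containing only that predictor) and 'last' (the increase in R-squared when the predictor is added after all the others), each normalized to sum to 1 as rela=TRUE does. It uses birthwt[-1] from the earlier examples rather than bwdf, so the numbers need not match the relaimpo output, and the helper r2() is a hypothetical name introduced here. The 'lmg' metric, which averages contributions over all orderings of the predictors, is not reproduced.

```r
library(MASS)   # provides the birthwt data

dat   <- birthwt[-1]
preds <- setdiff(names(dat), "bwt")

# Hypothetical helper: R-squared of a model of bwt on the given predictors
r2 <- function(vars) {
  summary(lm(reformulate(vars, response = "bwt"), data = dat))$r.squared
}

r2_full <- r2(preds)

# 'first': each predictor on its own
first <- sapply(preds, function(p) r2(p))

# 'last': gain in R-squared when the predictor is added to all the others
last <- sapply(preds, function(p) r2_full - r2(setdiff(preds, p)))

# Normalize so that each metric sums to 1
round(cbind(first = first / sum(first), last = last / sum(last)), 4)
```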

Eta-squared:

Partial eta-squared is also called partial R-squared. The values can be obtained easily using the heplots package in R:

Code:

> library(heplots)
> eta_values = etasq(lm(bwt~., bwdf))[-length(bwdf),] # drop the last row (Residuals)
> names(eta_values) = names(bwdf)[-9] # drop the name of the response (bwt, column 9)
> round(eta_values,4)
age    lwt   race  smoke    ptl     ht     ui    ftv
0.0008 0.0340 0.0800 0.0576 0.0013 0.0458 0.0716 0.0005

> barplot(eta_values)

Output graph:
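
For intuition, partial eta-squared for a term is SS_term / (SS_term + SS_residual). The sketch below computes it in base R with drop1(), assuming every term has one degree of freedom and there are no interactions, so that dropping a term from the full model gives its sum of squares. It uses birthwt[-1] rather than bwdf, so the values need not match those printed above. Under the same assumption the result equals t^2 / (t^2 + residual df), which the last lines check.

```r
library(MASS)   # provides the birthwt data

dat <- birthwt[-1]
mod <- lm(bwt ~ ., data = dat)

d1 <- drop1(mod)                    # first row is "<none>" (the full model)
rss_full <- d1$RSS[1]               # residual sum of squares of the full model
ss_term  <- d1$`Sum of Sq`[-1]      # sum of squares lost when a term is dropped

partial_eta2 <- ss_term / (ss_term + rss_full)
names(partial_eta2) <- rownames(d1)[-1]
round(partial_eta2, 4)

# Cross-check against the t statistics of the fitted model
tvals <- summary(mod)$coefficients[-1, "t value"]
all.equal(unname(partial_eta2),
          unname(tvals^2 / (tvals^2 + df.residual(mod))))
```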

randomForest package:

Importance estimates can also be obtained using the randomForest package:

Code:

> bwdf = mybwdf(F)
> library(randomForest)
> fit = randomForest(bwt~., data=bwdf, importance=TRUE)
> importance(fit)
%IncMSE IncNodePurity
age    0.9660325      14684397
lwt   10.5645799      16485245
race   6.6502585       6260713
smoke 10.9118832       4582938
ptl    5.1527217       5100336
ht    -1.4076003       2945358
ui    15.0868715       7420857
ftv    0.2590884       4759060

> varImpPlot(fit)

Output graph:
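
The %IncMSE column is based on permutation: the values of one predictor are shuffled and the resulting increase in prediction error is recorded. The sketch below illustrates that idea only. It swaps in a plain linear model on birthwt[-1] and measures error on the training data, whereas randomForest works tree by tree on out-of-bag observations and by default scales the result, so these numbers are not comparable with the table above. The names mse() and perm_importance, and the choice of 50 repetitions, are arbitrary choices made here.

```r
library(MASS)   # provides the birthwt data
set.seed(1)     # the permutations are random

dat   <- birthwt[-1]
mod   <- lm(bwt ~ ., data = dat)    # linear model used in place of a forest
preds <- setdiff(names(dat), "bwt")

# Mean squared error of the fitted model on a (possibly shuffled) data frame
mse <- function(d) mean((dat$bwt - predict(mod, newdata = d))^2)
base_mse <- mse(dat)

# Average percentage increase in MSE over 50 shuffles of each predictor
perm_importance <- sapply(preds, function(p) {
  increases <- replicate(50, {
    shuffled <- dat
    shuffled[[p]] <- sample(shuffled[[p]])   # break the link with bwt
    mse(shuffled) - base_mse
  })
  100 * mean(increases) / base_mse
})

round(sort(perm_importance, decreasing = TRUE), 2)
```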

References:
Ulrike Grömping (2006). Relative Importance for Linear Regression in R: The Package relaimpo. Journal of Statistical Software, 17(1), 1–27.

A. Liaw and M. Wiener (2002). Classification and Regression by randomForest. R News 2(3), 18–22.