
Ridge, Elastic Net and Lasso Regression

All of these can be performed with the glmnet package. They are especially useful when multicollinearity undermines the usefulness of ordinary linear regression.

Code:

> myglmnet = function(mydf, ynum, alpha=1, fam='gaussian', legendpos='bottomleft'){
    # Note: all variables of mydf should be numeric; column ynum is the outcome.
    library(glmnet)
    x = as.matrix(mydf[-ynum])
    y = mydf[, ynum]
    fit = glmnet(x, y, family=fam, alpha=alpha)
    fitcv = cv.glmnet(x, y, family=fam, alpha=alpha)
    thetitle = ifelse(alpha==0, 'Ridge Regression', ifelse(alpha==1, 'Lasso Regression', 'Elastic Net Regression'))
    plot(fit, xvar='lambda', col=1:dim(coef(fit))[1], main=thetitle)
    legend(legendpos, legend=names(mydf[-ynum]), col=1:length(mydf[-ynum]), lty=1)
    abline(v=log(fitcv$lambda.min), col='blue')   # lambda giving minimum cross-validated error
    abline(v=log(fitcv$lambda.1se), col='red')    # largest lambda within 1 SE of that minimum
    # refit at lambda.min and return the coefficients
    fit = glmnet(x, y, lambda=fitcv$lambda.min, family=fam, alpha=alpha)
    coef(fit)
}

> library(MASS)             # birthwt data
> myglmnet(birthwt[-1], 9)  # after dropping 'low', the 9th column (bwt) is the outcome
9 x 1 sparse Matrix of class "dgCMatrix"
                      s0
(Intercept) 3128.8603167
age           -0.2455524
lwt            3.4322992
race        -188.3474808
smoke       -358.2180263
ptl          -51.1025809
ht          -600.1910589
ui          -511.0521062
ftv          -15.4442818
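
The function above only returns coefficients. For predictions, one can call cv.glmnet directly on the same data and use predict() at the chosen lambda; a minimal sketch (the names x, y and cvfit are illustrative, not part of the function above):

library(MASS); library(glmnet)
x = as.matrix(birthwt[-1][-9])   # predictors: drop 'low', then drop the outcome 'bwt'
y = birthwt[-1][, 9]             # outcome: birth weight in grams
cvfit = cv.glmnet(x, y, family='gaussian', alpha=1)   # lasso with cross-validation
head(predict(cvfit, newx=x, s='lambda.min'))          # fitted birth weights at lambda.min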


The plot shows the coefficient paths of the predictors as the penalty (lambda) grows. Variables whose coefficients remain non-zero under the heaviest penalty are the most important. The blue and red vertical lines mark lambda.min and lambda.1se, the two values of lambda at which the coefficients are commonly assessed. The coefficients printed by the function above correspond to the blue line (lambda.min).
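With a cv.glmnet fit, the coefficients at either line can be extracted directly by passing s to coef(); a short sketch on the same data (cvfit is an illustrative name, with MASS and glmnet loaded as above):

cvfit = cv.glmnet(as.matrix(birthwt[-1][-9]), birthwt[-1][, 9])
coef(cvfit, s='lambda.min')   # blue line: lambda with minimum cross-validated error
coef(cvfit, s='lambda.1se')   # red line: sparser model within one standard error of the minimum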

The alpha argument of the function above selects ridge regression (alpha=0), elastic net (alpha strictly between 0 and 1, e.g. 0.5) or lasso regression (alpha=1); all three are closely related and are fitted by the same glmnet machinery.
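For example, the helper can be rerun on the same birthwt data with different alpha values:

myglmnet(birthwt[-1], 9, alpha=0)     # ridge: coefficients shrink but none reach exactly zero
myglmnet(birthwt[-1], 9, alpha=0.5)   # elastic net: a compromise between ridge and lasso
myglmnet(birthwt[-1], 9, alpha=1)     # lasso (the default): some coefficients shrink to exactly zero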

Output graph: coefficient paths plotted against log(lambda), with the blue (lambda.min) and red (lambda.1se) vertical lines described above.

The corresponding plots for ridge regression and elastic net regression look very similar, although with ridge (alpha=0) the coefficients shrink towards zero without being set exactly to zero.


References:
glmnet package: Jerome Friedman, Trevor Hastie, Robert Tibshirani (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22. URL http://www.jstatsoft.org/v33/i01/.