Multiple regression for binary outcome

Logistic (or logit) regression is the technique used when the dependent or outcome variable is binary. For example, we can estimate the effect of different predictors on the 'low' variable of the bwdf dataset using the glm function as follows:

Code:

> bwdf = mybwdf()           
> str(bwdf)
'data.frame':   189 obs. of  9 variables:
 $ low  : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ age  : int  19 33 20 21 18 21 22 17 29 26 ...
 $ lwt  : int  182 155 105 108 107 124 118 103 123 113 ...
 $ race : Factor w/ 3 levels "1","2","3": 2 3 1 1 1 3 1 3 1 1 ...
 $ smoke: Factor w/ 2 levels "0","1": 1 1 2 2 2 1 1 1 2 2 ...
 $ ptl  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ ht   : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ ui   : Factor w/ 2 levels "0","1": 2 1 1 2 2 1 1 1 1 1 ...
 $ ftv  : int  0 3 1 2 0 0 1 1 1 0 ...

> res = glm(low~., data=bwdf, family=binomial)
> summary(res)

Call:
glm(formula = low ~ ., family = binomial, data = bwdf)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.8946  -0.8212  -0.5316   0.9818   2.2125  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)   
(Intercept)  0.480623   1.196888   0.402  0.68801   
age         -0.029549   0.037031  -0.798  0.42489   
lwt         -0.015424   0.006919  -2.229  0.02580 * 
race2        1.272260   0.527357   2.413  0.01584 * 
race3        0.880496   0.440778   1.998  0.04576 * 
smoke1       0.938846   0.402147   2.335  0.01957 * 
ptl          0.543337   0.345403   1.573  0.11571   
ht1          1.863303   0.697533   2.671  0.00756 **
ui1          0.767648   0.459318   1.671  0.09467 . 
ftv          0.065302   0.172394   0.379  0.70484   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 234.67  on 188  degrees of freedom
Residual deviance: 201.28  on 179  degrees of freedom
AIC: 221.28

Number of Fisher Scoring iterations: 4 
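The mybwdf() helper is not shown here; assuming it simply prepares MASS::birthwt (whose columns match the str() output above), the fit can be reproduced and the coefficients converted to odds ratios by exponentiating. A rough sketch:

```r
# Sketch assuming bwdf corresponds to MASS::birthwt with the binary/categorical
# columns converted to factors and the continuous outcome bwt dropped.
library(MASS)

bwdf = birthwt
bwdf$bwt = NULL                        # 'low' is derived from bwt, so bwt is excluded
for (v in c('low', 'race', 'smoke', 'ht', 'ui')) bwdf[[v]] = factor(bwdf[[v]])

res = glm(low ~ ., data = bwdf, family = binomial)

# Coefficients are log odds ratios; exponentiate to get odds ratios,
# e.g. smoke1: exp(0.938846) is about 2.56
round(exp(coef(res)), 2)
```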

The significant predictors are similar to those in the linear regression performed above (since low is the categorical variable derived from the continuous variable bwt). For logistic regression, odds ratios can be calculated from the coefficient estimates, since each estimate is the log of an odds ratio. These odds ratios, with their confidence intervals, can be plotted using the following function:

Code:

rnglmcoefplot = function(mod, stitle='Odds ratios from Multiple regression'){
    library(broom)
    tt = tidy(mod)                     # coefficient table: term, estimate, std.error, statistic, p.value
    tt = tt[-1,]                       # drop the intercept row
    tt$estimate = exp(tt$estimate)     # log odds ratios -> odds ratios

    cc = exp(confint(mod, level=0.95)) # 95% confidence intervals on the odds ratio scale
    cc = data.frame(cc)
    cc = cc[-1,]                       # drop the intercept row here too
    cc = na.omit(cc)
    print(cc)
    cat('----------------------------------------\n')

    tt$p.value = round(tt$p.value, 9)
    tt$upper = cc[,2]                  # 97.5% bound
    tt$lower = cc[,1]                  # 2.5% bound

    print(tt)
    cat('----------------------------------------------------\n')
    # rnpstr() is a helper that formats p values for display
    pstr = paste0('OR=', round(tt$estimate,2), ' (', rnpstr(tt$p.value), ')')

    dd = data.frame(var=tt$term, value=tt$estimate, lower=tt$lower, upper=tt$upper, text=pstr)
    print(dd)
    cat('====================================================\n')
    # rnggforest() is a helper that draws the forest plot; OR=1 marks no effect
    rnggforest(dd, stitle, vertical_at=1, logscale=TRUE)
}
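Note that rnpstr() and rnggforest() are the author's own helpers (p-value formatting and forest plotting), so the function above is not runnable on its own. The same odds-ratio table can be built in a self-contained way with broom's tidy(), which can exponentiate estimates and compute confidence intervals itself. This sketch again assumes bwdf mirrors MASS::birthwt:

```r
library(MASS)
library(broom)

# Refit the model (assuming bwdf mirrors MASS::birthwt, prepared as above)
bwdf = birthwt
bwdf$bwt = NULL
for (v in c('low', 'race', 'smoke', 'ht', 'ui')) bwdf[[v]] = factor(bwdf[[v]])
res = glm(low ~ ., data = bwdf, family = binomial)

# conf.int = TRUE runs confint() internally; exponentiate = TRUE converts
# the log odds ratios (and the interval bounds) to the odds ratio scale
or_tab = tidy(res, exponentiate = TRUE, conf.int = TRUE)
or_tab = or_tab[-1, ]                  # drop the intercept, as rnglmcoefplot does
or_tab[, c('term', 'estimate', 'conf.low', 'conf.high', 'p.value')]
```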

Output graph:

[Forest plot of the odds ratios with 95% confidence intervals, drawn on a log scale with a vertical reference line at OR = 1]

An odds ratio of 1 indicates no effect; an odds ratio less than 1 indicates lower risk, while one greater than 1 indicates increased risk.
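As a worked example, the ht1 row of the summary above can be converted by hand. The Wald-style interval exp(estimate ± 1.96 × std.error) used here is an approximation for illustration; the plotting function above uses profile-likelihood confint() instead:

```r
# ht1 (history of hypertension) from the summary: estimate 1.863303, std. error 0.697533
log_or = 1.863303
or = exp(log_or)                       # about 6.4 times the odds of low birth weight
round(or, 2)

# Approximate 95% Wald interval on the odds ratio scale
ci = exp(log_or + c(-1.96, 1.96) * 0.697533)
round(ci, 2)                           # entirely above 1, consistent with p = 0.00756
```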


