Regression diagnostics

After fitting a regression model, check whether its assumptions are met. Several tests and plots are available for this:

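1. Check the distribution of the residuals using the Shapiro-Wilk test (shapiro.test {stats}). The null hypothesis is that the residuals are normally distributed, so a small p-value indicates a departure from normality.

Code:

    > shapiro.test(glm_data$residuals)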
    Shapiro-Wilk normality test

    data:  glm_data$residuals
    W = 0.2666, p-value < 2.2e-16
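
In the output above, the very small p-value rejects the hypothesis of normally distributed residuals. The following is a minimal sketch of how the test result can be read programmatically. It fits an illustrative model to the built-in `mtcars` data, because the model behind `glm_data` is not shown here; the formula and the 0.05 threshold are assumptions.

```r
# Illustrative model on a built-in dataset; this is NOT the glm_data model from the text.
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

# shapiro.test() returns an "htest" object with $statistic (W) and $p.value
sw <- shapiro.test(residuals(fit))
print(sw)

alpha <- 0.05  # assumed significance level
if (sw$p.value < alpha) {
  message("Reject normality of the residuals at alpha = ", alpha)
} else {
  message("No evidence against normality at alpha = ", alpha)
}
```

Using `residuals(fit)` instead of `fit$residuals` is a stylistic choice; for a model fitted with `lm()` both give the same values.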

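2. Check for autocorrelation of the errors using the Durbin-Watson test (durbinWatsonTest {car}). This tests whether successive residuals are correlated with each other (lag 1 by default). The null hypothesis is no autocorrelation, and a D-W statistic close to 2 is consistent with it.

Code:

    > durbinWatsonTest(glm_data)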

    lag Autocorrelation D-W Statistic p-value
    1       0.1091337      1.780399   0.066
    Alternative hypothesis: rho != 0
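
To show what the D-W statistic measures, here is a hand-computed sketch in base R. It is not the `car` implementation and it does not produce a p-value. It uses an illustrative `mtcars` model (an assumption), and since the rows of `mtcars` have no time order the numbers only demonstrate the arithmetic.

```r
# Illustrative model; NOT the glm_data model from the text.
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)
e <- residuals(fit)
n <- length(e)

# Durbin-Watson statistic: squared successive differences over squared residuals
dw <- sum(diff(e)^2) / sum(e^2)

# Lag-1 autocorrelation of the residuals
r1 <- sum(e[-1] * e[-n]) / sum(e^2)

cat("D-W statistic:       ", dw, "\n")
cat("Lag-1 autocorrelation:", r1, "\n")
cat("2 * (1 - r1):         ", 2 * (1 - r1), "\n")
```

The statistic is approximately 2 * (1 - r1), which matches the output above: 2 * (1 - 0.109) is about 1.78.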

3. Check for heteroscedasticity using the Breusch-Pagan test (bptest {lmtest}). The null hypothesis is that the error variance is constant (homoscedasticity).

Code:

    > bptest(glm_data)

    studentized Breusch-Pagan test

    data:  glm_data
    BP = 3.8858, df = 3, p-value = 0.2741
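
In the output above, p = 0.2741 gives no evidence of heteroscedasticity. The idea behind the studentized test can be sketched in base R: regress the squared residuals on the predictors and compare n times the R-squared of that auxiliary regression with a chi-squared distribution. This is a hand-rolled illustration on an assumed `mtcars` model, not the `lmtest` code, and its agreement with `bptest()` is not checked here.

```r
# Illustrative model; NOT the glm_data model from the text.
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Auxiliary regression of squared residuals on the same predictors
e2 <- residuals(fit)^2
aux <- lm(e2 ~ wt + hp + disp, data = mtcars)

n <- nobs(fit)
bp <- n * summary(aux)$r.squared   # test statistic: n * R^2
df <- length(coef(aux)) - 1        # number of predictors, excluding the intercept
p_value <- pchisq(bp, df = df, lower.tail = FALSE)

cat("BP =", bp, " df =", df, " p-value =", p_value, "\n")
```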

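4. Check for multicollinearity among the independent variables using variance inflation factors (vif {car}). As a common rule of thumb, values above roughly 5 to 10 suggest that a predictor is strongly correlated with the others.

Code:

    > vif(glm_data)

         IV1      IV2      IV3 
    6.078988 1.607718 5.236179 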
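
For numeric predictors, a variance inflation factor is 1 / (1 - R^2), where R^2 comes from regressing that predictor on all the others. The sketch below computes this by hand in base R for three `mtcars` columns, which are an assumption standing in for IV1 to IV3. It is not the `car` implementation, which also handles factor terms.

```r
# Illustrative predictors; NOT the IV1..IV3 variables from the text.
X <- mtcars[, c("wt", "hp", "disp")]

# VIF of each predictor: regress it on the remaining predictors
vif_manual <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})

# Equivalent shortcut: diagonal of the inverse correlation matrix
vif_inverse <- diag(solve(cor(X)))

print(round(vif_manual, 3))
print(round(vif_inverse, 3))
```

The response variable plays no part in the calculation, so the values depend only on how the predictors relate to each other.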

5. Graphical analysis:

Code:

    > # mybwdf() is a user-defined helper that is not shown in this section
    > bwdf <- mybwdf(FALSE)
    > str(bwdf)
    'data.frame':   189 obs. of  9 variables:
     $ age  : int  19 33 20 21 18 21 22 17 29 26 ...
     $ lwt  : int  182 155 105 108 107 124 118 103 123 113 ...
     $ race : Factor w/ 3 levels "1","2","3": 2 3 1 1 1 3 1 3 1 1 ...
     $ smoke: Factor w/ 2 levels "0","1": 1 1 2 2 2 1 1 1 2 2 ...
     $ ptl  : int  0 0 0 0 0 0 0 0 0 0 ...
     $ ht   : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
     $ ui   : Factor w/ 2 levels "0","1": 2 1 1 2 2 1 1 1 1 1 ...
     $ ftv  : int  0 3 1 2 0 0 1 1 1 0 ...
     $ bwt  : int  2523 2551 2557 2594 2600 2622 2637 2637 2663 2665 ...

    > lm.out <- lm(bwt ~ ., data = bwdf)
    > par(mfrow = c(2, 2))  # arrange the diagnostic plots in a 2 x 2 grid
    > plot(lm.out)

Output graph: [figure not reproduced: 2 x 2 grid of diagnostic plots for lm.out]

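If the assumptions hold, the residuals should scatter randomly around zero with no visible pattern, and the points in the Q-Q plot should lie close to a straight line, indicating that the residuals are approximately normally distributed. The row names of the most extreme observations are labelled on these plots.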
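
The helper `mybwdf()` is not defined in this section. The `str()` output matches the `birthwt` data from the MASS package with the `low` column dropped and four columns converted to factors, so the sketch below uses a hypothetical stand-in, `make_bwdf()`, built on that assumption. The real helper may differ. The sketch writes the plots to a temporary PDF and lists rows with large standardized residuals as a numeric companion to the plots; the cutoff of 2 is an assumed rule of thumb.

```r
library(MASS)  # provides the birthwt data; ships with standard R installations

# Hypothetical stand-in for mybwdf(); the original helper is not shown.
make_bwdf <- function() {
  df <- birthwt[, setdiff(names(birthwt), "low")]  # drop the binary outcome 'low'
  for (v in c("race", "smoke", "ht", "ui")) {
    df[[v]] <- factor(df[[v]])                     # treat coded variables as factors
  }
  df
}

bwdf <- make_bwdf()
lm.out <- lm(bwt ~ ., data = bwdf)

# Draw the default diagnostic plots into a temporary PDF file
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
par(mfrow = c(2, 2))
plot(lm.out)
invisible(dev.off())

# Row names of observations with large standardized residuals
std_res <- rstandard(lm.out)
flagged <- names(std_res)[abs(std_res) > 2]  # assumed cutoff
print(flagged)
```

In an interactive session the `pdf()` and `dev.off()` calls can be left out so that the plots appear in the active graphics window.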

