## Regression diagnostics

It is important to check whether the assumptions of the regression are met. Several tests are available for this:

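1. Check the distribution of the residuals with the Shapiro-Wilk normality test (shapiro.test {stats}). A small p-value indicates that the residuals deviate from a normal distribution.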

Code:

> shapiro.test(glm_data$residuals)

Shapiro-Wilk normality test

data:  glm_data$residuals
W = 0.2666, p-value < 2.2e-16

2. Check for autocorrelation of the errors using the Durbin-Watson test (durbinWatsonTest {car}). This tests whether the residuals are serially correlated, that is, whether successive observations are independent of each other.

Code:

> durbinWatsonTest(glm_data)

 lag Autocorrelation D-W Statistic p-value
   1       0.1091337      1.780399   0.066
 Alternative hypothesis: rho != 0
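
To make the statistic less of a black box, here is a minimal sketch of how the Durbin-Watson value can be computed by hand in base R. The model fitted on the built-in `mtcars` data is an assumption for illustration only, since `glm_data` is not available here, and the sketch reproduces only the statistic, not the p-value that durbinWatsonTest reports.

```r
# Illustrative model on a built-in data set (an assumption; not the original glm_data).
fit <- lm(mpg ~ wt + hp, data = mtcars)
e <- unname(residuals(fit))
n <- length(e)

# Durbin-Watson statistic: sum of squared successive differences
# divided by the residual sum of squares.
dw <- sum(diff(e)^2) / sum(e^2)

# A simple lag-1 autocorrelation estimate of the residuals.
r1 <- sum(e[-1] * e[-n]) / sum(e^2)

# dw is close to 2 * (1 - r1); the gap is an end-point correction.
print(c(DW = dw, lag1 = r1, approx = 2 * (1 - r1)))
```

Values near 2 suggest little lag-1 autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation. The output above is consistent with this: 2 * (1 - 0.109) is about 1.78.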

3. Check for heteroscedasticity using the Breusch-Pagan test (bptest {lmtest}). The null hypothesis is that the residual variance is constant.

Code:

> bptest(glm_data)

studentized Breusch-Pagan test

data:  glm_data
BP = 3.8858, df = 3, p-value = 0.2741
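
The studentized Breusch-Pagan statistic can also be sketched by hand: regress the squared residuals on the model's predictors and multiply the R-squared of that auxiliary regression by the number of observations. The example below is a minimal sketch in base R; the `mtcars` model and the names `aux_data` and `aux` are assumptions for illustration, not part of the original analysis.

```r
# Illustrative model on a built-in data set (an assumption; not the original glm_data).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Auxiliary regression: squared residuals on the same predictors.
aux_data <- data.frame(e2 = residuals(fit)^2, mtcars[, c("wt", "hp")])
aux <- lm(e2 ~ wt + hp, data = aux_data)

n  <- nobs(fit)
bp <- n * summary(aux)$r.squared          # studentized (Koenker) BP statistic
df <- length(coef(aux)) - 1               # number of predictors
p  <- pchisq(bp, df = df, lower.tail = FALSE)

print(c(BP = bp, df = df, p.value = p))
```

A large p-value, such as the 0.2741 reported above, gives no evidence against constant variance.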

4. Check for multicollinearity among the independent variables using variance inflation factors (vif {car}):

Code:

> vif(glm_data)

     IV1      IV2      IV3
6.078988 1.607718 5.236179
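
For numeric predictors, each variance inflation factor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others. The following is a minimal sketch of that calculation in base R; the three `mtcars` columns are stand-ins chosen for illustration, not the IV1 to IV3 of the original model.

```r
# Stand-in predictors from a built-in data set (an assumption for illustration).
predictors <- c("wt", "hp", "disp")

vif_manual <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  # Regress one predictor on the remaining predictors.
  aux <- lm(reformulate(others, response = v), data = mtcars)
  1 / (1 - summary(aux)$r.squared)
})

print(vif_manual)
```

A value of 1 means a predictor is uncorrelated with the others. A common rule of thumb treats values above roughly 5 to 10 as a sign of problematic collinearity, which would flag IV1 and IV3 in the output above.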

5. Graphical analysis:

Code:

> bwdf = mybwdf(F)
> str(bwdf)
'data.frame':   189 obs. of  9 variables:
 $ age  : int  19 33 20 21 18 21 22 17 29 26 ...
 $ lwt  : int  182 155 105 108 107 124 118 103 123 113 ...
 $ race : Factor w/ 3 levels "1","2","3": 2 3 1 1 1 3 1 3 1 1 ...
 $ smoke: Factor w/ 2 levels "0","1": 1 1 2 2 2 1 1 1 2 2 ...
 $ ptl  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ ht   : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ ui   : Factor w/ 2 levels "0","1": 2 1 1 2 2 1 1 1 1 1 ...
 $ ftv  : int  0 3 1 2 0 0 1 1 1 0 ...
 $ bwt  : int  2523 2551 2557 2594 2600 2622 2637 2637 2663 2665 ...

> lm.out = lm(bwt~., data=bwdf)
> par(mfrow=c(2,2))
> plot(lm.out)

Output graph:

These plots should show residuals scattered randomly around zero, and points falling along a straight line in the Q-Q plot, which indicates that the residuals are normally distributed. The row names of the most extreme observations are labelled on the plots.
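
The helper `mybwdf()` is not defined in this section, so the example above cannot be run as written. The sketch below assumes that `bwdf` is the `birthwt` data set from the MASS package with the `low` column dropped and `race`, `smoke`, `ht` and `ui` converted to factors, which matches the `str()` output shown. That reconstruction is a guess, and the meaning of the `F` argument is unknown. The sketch then lists the row names of the three largest absolute standardized residuals, which is roughly what the Q-Q panel labels by default.

```r
library(MASS)  # recommended package shipped with standard R installations

# Hypothetical reconstruction of bwdf (an assumption; mybwdf() itself is not shown).
bwdf <- birthwt[, names(birthwt) != "low"]
for (v in c("race", "smoke", "ht", "ui")) {
  bwdf[[v]] <- factor(bwdf[[v]])
}

lm.out <- lm(bwt ~ ., data = bwdf)

# Standardized residuals, as used in the Q-Q and scale-location panels.
std_res <- rstandard(lm.out)

# Row names of the three most extreme observations.
extreme <- names(sort(abs(std_res), decreasing = TRUE))[1:3]
print(extreme)
```

Inspecting these rows directly, for example with `bwdf[extreme, ]`, is a simple way to decide whether the labelled points are data errors or genuine but unusual cases.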