
## Stepwise regression

Stepwise regression is often used to select a small set of the most useful predictors from a large number of candidates:

Code:

> bwdf = mybwdf(FALSE)
> str(bwdf)
'data.frame':   189 obs. of  9 variables:
 $ age  : int  19 33 20 21 18 21 22 17 29 26 ...
 $ lwt  : int  182 155 105 108 107 124 118 103 123 113 ...
 $ race : Factor w/ 3 levels "1","2","3": 2 3 1 1 1 3 1 3 1 1 ...
 $ smoke: Factor w/ 2 levels "0","1": 1 1 2 2 2 1 1 1 2 2 ...
 $ ptl  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ ht   : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ ui   : Factor w/ 2 levels "0","1": 2 1 1 2 2 1 1 1 1 1 ...
 $ ftv  : int  0 3 1 2 0 0 1 1 1 0 ...
 $ bwt  : int  2523 2551 2557 2594 2600 2622 2637 2637 2663 2665 ...

> mod = lm(bwt~., data=bwdf)
> summary(step(mod, trace=0))

Call:
lm(formula = bwt ~ lwt + race + smoke + ht + ui, data = bwdf)

Residuals:
     Min       1Q   Median       3Q      Max
-1842.14  -433.19    67.09   459.21  1631.03

Coefficients:
            Estimate Std. Error t value             Pr(>|t|)
(Intercept) 2837.264    243.676  11.644 < 0.0000000000000002 ***
lwt            4.242      1.675   2.532             0.012198 *
race2       -475.058    145.603  -3.263             0.001318 **
race3       -348.150    112.361  -3.099             0.002254 **
smoke1      -356.321    103.444  -3.445             0.000710 ***
ht1         -585.193    199.644  -2.931             0.003810 **
ui1         -525.524    134.675  -3.902             0.000134 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 645.9 on 182 degrees of freedom
Multiple R-squared:  0.2404,    Adjusted R-squared:  0.2154
F-statistic:   9.6 on 6 and 182 DF,  p-value: 0.000000003601

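Note that every predictor retained in the final model is significant at the 5% level; age, ptl and ftv were dropped. Because step() selects terms by AIC rather than by p-value, this outcome is not guaranteed in general.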
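
To see which terms step() removes and in what order, the selection path can be inspected directly. The following is only a minimal sketch: since the mybwdf helper is not shown, it uses the built-in mtcars data as a stand-in, and the names full and reduced are chosen here for illustration.

```r
# Sketch: backward stepwise selection by AIC on the built-in mtcars data
# (mtcars stands in for bwdf, whose construction is not shown above).
full <- lm(mpg ~ ., data = mtcars)

# With no scope argument, step() performs backward elimination:
# at each step it drops the term whose removal lowers AIC the most,
# and stops when no removal lowers AIC further.
reduced <- step(full, trace = 0)

# The anova component records the path: one row per step,
# naming the term that was dropped and the resulting AIC.
print(reduced$anova)

# Compare the two fits; the reduced model has the lower (or equal) AIC.
print(AIC(full, reduced))

# Terms kept in the final model.
kept <- attr(terms(reduced), "term.labels")
print(kept)
```

Setting trace = 1 instead prints the candidate models considered at every step, which helps when a dropped predictor is one you expected to keep.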