R language

Bootstrap

Bootstrapping is a resampling technique that became practical with the availability of fast computers. It is based on repeated analysis (often many thousands of times) of samples drawn from a dataset with replacement, each sample having as many rows as the original. In each sample, some rows are duplicated while others are left out, so a different dataset is analysed each time. The mean and variance of the statistic of interest (and hence its confidence intervals) are estimated from the results of these analyses.
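To make the "duplicated and left out" point concrete, here is a minimal sketch of a single resampling step. The row count n and the seed are arbitrary assumptions chosen only for illustration.

```r
set.seed(42)                      # arbitrary seed, only for reproducibility
n <- 10                           # hypothetical number of rows in the dataset
idx <- sample(n, replace = TRUE)  # n row indices drawn with replacement

duplicated_rows <- unique(idx[duplicated(idx)])  # rows drawn more than once
omitted_rows    <- setdiff(seq_len(n), idx)      # rows never drawn

print(sort(idx))
print(duplicated_rows)
print(omitted_rows)
```

For large datasets, roughly 63% of the distinct rows appear in any one bootstrap sample, because the chance that a given row is never drawn is (1 - 1/n)^n, which approaches 1/e.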

In its simplest form, the general code looks like this:

Code:

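# mydf and myfunction are placeholders for your own data frame and analysis
mylist = vector("list", 1000)     # preallocate a list for the results
for(i in 1:1000) {
         # draw nrow(mydf) row indices with replacement
         tempdf = mydf[sample(nrow(mydf), replace = TRUE), , drop = FALSE]
         mylist[[i]] = myfunction(tempdf)    # analyse the resampled data
}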

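The list mylist now holds 1000 myfunction results, each computed from a resample of the mydf dataset. These can be used to estimate the sampling variability of the myfunction result: their standard deviation is the bootstrap standard error, and their quantiles give a confidence interval.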
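As a rough sketch of that last step, the results can be collapsed into a numeric vector and summarised. Here mtcars stands in for mydf and the median of mpg stands in for myfunction; both are assumptions made only so the example runs, and it also assumes myfunction returns a single number.

```r
set.seed(1)                                  # arbitrary seed for reproducibility
mydf <- mtcars                               # stand-in for the reader's data frame
myfunction <- function(d) median(d$mpg)      # hypothetical statistic of interest

# Same loop as above, with the result list preallocated
mylist <- vector("list", 1000)
for (i in seq_along(mylist)) {
  tempdf <- mydf[sample(nrow(mydf), replace = TRUE), ]
  mylist[[i]] <- myfunction(tempdf)
}

boot_est  <- unlist(mylist)                       # 1000 bootstrap replicates
boot_mean <- mean(boot_est)                       # mean of the replicates
boot_se   <- sd(boot_est)                         # bootstrap standard error
boot_ci   <- quantile(boot_est, c(0.025, 0.975))  # percentile 95% interval

print(c(mean = boot_mean, se = boot_se))
print(boot_ci)
```

The percentile interval is the simplest choice; other interval types (such as the bca type shown in the confint output later in this section) adjust for bias and skewness.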

For example, confidence intervals are generally not reported by principal component analysis. However, the bootstrap method can be used to determine such intervals (see the mypcaboot function in the section on principal component analysis).
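The following is not the mypcaboot function, only a minimal sketch of the same idea under stated assumptions: it uses the built-in iris measurements as example data and bootstraps the proportion of variance explained by the first principal component, computed with prcomp. The helper name pc1_share is hypothetical.

```r
set.seed(123)                    # arbitrary seed for reproducibility
dat <- iris[, 1:4]               # example data: four numeric columns

# Proportion of total variance explained by the first principal component
pc1_share <- function(d) {
  v <- prcomp(d, scale. = TRUE)$sdev^2   # variances of the components
  v[1] / sum(v)
}

# Recompute the statistic on 1000 resamples of the rows
boot_share <- replicate(1000, {
  idx <- sample(nrow(dat), replace = TRUE)
  pc1_share(dat[idx, ])
})

estimate <- pc1_share(dat)                      # value on the original data
ci <- quantile(boot_share, c(0.025, 0.975))     # percentile 95% interval

print(estimate)
print(ci)
```

The variance share was chosen because it does not depend on the sign of the component. Bootstrapping the loadings themselves would additionally require aligning signs across resamples, since the direction of a principal component is arbitrary.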

The boot package is also available for performing bootstrapping with many options. The car package has a 'Boot' function that makes the boot package even easier to use with fitted regression models:

Code:

> library(boot)
> library(car)
> mod = lm(low ~ ., bwdf)
> mod.boot = Boot(mod, R=1000) 
There were 50 or more warnings (use warnings() to see the first 50)
> summary(mod.boot)
               R   original     bootBias    bootSE    bootMed
(Intercept) 1000  1.5106943 -0.013401478 0.2071224  1.4924100
age         1000 -0.0036980  0.000082902 0.0059805 -0.0034782
lwt         1000 -0.0025496  0.000096212 0.0011884 -0.0024901
race2       1000  0.2215816 -0.006814764 0.1080091  0.2177325
race3       1000  0.1443237 -0.001985767 0.0745454  0.1415686
smoke1      1000  0.1598754 -0.002031205 0.0697808  0.1559668
ptl         1000  0.1158016  0.012812215 0.0942253  0.1193426
ht1         1000  0.3663673  0.003270341 0.1467292  0.3651986
ui1         1000  0.1565582  0.003981608 0.1115804  0.1634127
ftv         1000  0.0063247 -0.001549833 0.0325056  0.0049427

> confint(mod.boot)
Bootstrap quantiles, type =  bca 

                    2.5 %        97.5 %
(Intercept)  1.1386882788  1.9373171990
age         -0.0160641588  0.0075915188
lwt         -0.0049298027 -0.0001912868
race2       -0.0009300709  0.4460068226
race3        0.0023235135  0.2995383027
smoke1       0.0373356539  0.2974037232
ptl         -0.0584370222  0.3008824600
ht1          0.0812545047  0.6364267984
ui1         -0.0767000625  0.3758219808
ftv         -0.0554541961  0.0716990599

> hist(mod.boot)

Output graph: histograms of the bootstrap replicates for each coefficient (image not reproduced here).
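Boot is a front end to the boot function, which can also be called directly. The sketch below is an illustration under assumptions, not a reproduction of the analysis above: it uses the built-in mtcars data and a hypothetical statistic function slope_fun that returns the slope of mpg on wt.

```r
library(boot)   # recommended package shipped with standard R distributions

set.seed(2024)  # arbitrary seed for reproducibility

# boot() calls the statistic with the data and a vector of resampled row indices
slope_fun <- function(data, idx) {
  fit <- lm(mpg ~ wt, data = data[idx, ])
  coef(fit)[["wt"]]
}

b  <- boot(data = mtcars, statistic = slope_fun, R = 1000)
ci <- boot.ci(b, type = "perc")   # percentile confidence interval

print(b)    # original estimate, bootstrap bias and standard error
print(ci)
```

Writing the statistic as a function of the data and an index vector is what lets boot handle the resampling itself; the original estimate is stored in b$t0 and the replicates in b$t.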

          

References:
boot package: Angelo Canty and Brian Ripley (2015). boot: Bootstrap R (S-Plus) Functions. R package version 1.3-16.

car package: John Fox and Sanford Weisberg (2011). An R Companion to Applied Regression, Second Edition. Thousand Oaks, CA: Sage. URL: http://socserv.socsci.mcmaster.ca/jfox/Books/Companion
 

