Pipe-friendly bootstrapping with list-variables in #rstats

A few days ago, my package sjstats was updated on CRAN. Most functions of this package are convenient functions for common statistical computations, especially for (mixed) regression models. This latest update introduces some pipe-friendly bootstrapping-methods, namely bootstrap(), boot_ci(), boot_se() and boot_p(). In this post, I just wanted to give a quick example of these functions, used within a pipeline-workflow.

First, load the required libraries:

library(dplyr)
library(sjstats)

Now, init the sample data and fit a regular model. The model estimates how the dependency (e42dep) of an older person is related to the burden of care (neg_c_7) of a person who provides care to the frail older people:

data(efc)
fit <- lm(neg_c_7 ~ e42dep + c161sex, data = efc)

I demonstrate the boot_ci()-function, so the confidence intervals are of interest here:

confint(fit)

>                  2.5 %    97.5 %
> (Intercept)  5.3374378 7.7794162
> e42dep       1.2929451 1.7964296
> c161sex     -0.1193198 0.9871336

Now let’s see to obtain bootstrapped confidence intervals for this model. First, the bootstrap()-function generates bootstrap replicates and returns a data-frame with just one column, $strap, which is a list-variable with bootstrap samples:

bootstrap(efc, 1000)

This is how the list-variable looks like:

# A tibble: 1,000 x 1
                     strap
                    <list>
1  <data.frame [908 x 26]>
2  <data.frame [908 x 26]>
3  <data.frame [908 x 26]>
4  <data.frame [908 x 26]>
5  <data.frame [908 x 26]>
6  <data.frame [908 x 26]>
7  <data.frame [908 x 26]>
8  <data.frame [908 x 26]>
9  <data.frame [908 x 26]>
10 <data.frame [908 x 26]>
# ... with 990 more rows

Since all data frames are saved in a list, you can use lapply() to easily run the same linear model (used above) over all bootstrap samples and save these fitted model objects in another list-variable (named models in the example below). Then, using lapply() again, we can extract the coefficient of interest (here, the second coefficient, which is the estimated e42dep) for each „bootstrap“ model and save these coefficients in another variable (named dependency in the example below). Finally, we use the boot_ci()-function to calculate confidence intervals of the bootstrapped coefficients.

The complete code looks like this:

efc %>% 
  # generate bootstrape replicates, saved in
  # the list-variable 'strap'
  bootstrap(1000) %>% 
  # run linear model on all bootstrap samples
  mutate(models = lapply(.$strap, function(x) {
    lm(neg_c_7 ~ e42dep + c161sex, data = x)
  })) %>%
  # extract coefficient for "e42dep" (dependency) variable
  mutate(dependency = unlist(lapply(.$models, function(x) coef(x)[2]))) %>%
  # compute boostrapped confidence intervals
  boot_ci(dependency)

And the result (depending on your seed()) is:

conf.low conf.high 
1.303847  1.790724

The complete code, put together:

library(dplyr)
library(sjstats)
data(efc)

fit <- lm(neg_c_7 ~ e42dep + c161sex, data = efc)
confint(fit)

efc %>% 
  bootstrap(1000) %>% 
  mutate(models = lapply(.$strap, function(x) {
    lm(neg_c_7 ~ e42dep + c161sex, data = x)
  })) %>%
  mutate(dependency = unlist(lapply(.$models, function(x) coef(x)[2]))) %>%
  boot_ci(dependency)

7 Kommentare zu „Pipe-friendly bootstrapping with list-variables in #rstats

  1. Hello Daniel!

    Could you please explain one simple thing I just didn’t got the trick:
    How did you make the dataset „efc“ include not only column names, but some descriptions as well? I mean first variable for example has a column name „c12hour“, and additional description „average number of hours of care per week“ – how’s this description created and maintained?
    I don’t see such descriptions when exporting the data into .csv

    Thanks a lot!

    1. Found the answer! Something new for me

      > attributes(efc$c12hour)
      $label
      [1] „average number of hours of care per week“

  2. It’s even more readable using the purrr package.

    efc %>%
    bootstrap(1000) %>% .$strap %>% map(~lm(neg_c_7 ~ e42dep + c161sex, .)) %>%
    map(~coef(.)) %>% map(~boot_ci(.))

Kommentare sind geschlossen.