Beautiful tables for linear model summaries #rstats

Beautiful HTML tables of linear models

In this blog post I’d like to show some (old and) new features of the sjt.lm function from my sjPlot-package. These functions are currently only implemented in the development snapshot on GitHub. A package update is planned to be submitted soon to CRAN.

There are two new major features I added to this function: Comparing models with different predictors (e.g. stepwise regression) and automatic grouping of categorical predictors. There are examples below that demonstrate these features.

The sjt.lm function prints results and summaries of linear models as HTML-table. These tables can be viewed in the RStudio Viewer pane, web browser or easily exported to office applications. See also my former posts on the table printing functions of my package here and here.

Please note: The following tables may look a bit cluttered – this is because I just pasted the HTML-code created by knitr into this blog post, so style sheets may interfere. The original online-manual for this function can be found here.

All following tables can be reproduced with the sjPlot package and the sample data set from this package.

Linear model summaries as HTML table

The sjt.lm function prints summaries of linear models (fitted with the lm function) as nicely formatted html-tables.

Before starting, sample data is loaded and sample models are fitted:

# sample data
data(efc)
# set variable labels
efc <- set_var_labels(efc, get_var_labels(efc))
# fit first model
fit1 <- lm(barthtot ~ c160age + c12hour + c161sex + c172code, data=efc)
# fit second model
fit2 <- lm(neg_c_7 ~ c160age + c12hour + c161sex + c172code, data=efc)
# Note that both models share the same predictors and only differ 
# in their dependent variable. See examples of stepwise models 
# later...

The simplest way of producing the table output is by passing the fitted models as parameter. By default, estimates (B), confidence intervals (CI) and p-values (p) are reported. The models are named Model 1 and Model 2.

sjt.lm(fit1, fit2)
Model 1 Model 2
  B CI p B CI p
(Intercept) 90.06 77.95 – 102.18 < 0.001 8.46 6.67 – 10.24 < 0.001
carer’ age -0.22 -0.36 – -0.08 0.002 0.01 -0.01 – 0.03 0.206
average number of hours of care for the elder in a week -0.28 -0.31 – -0.24 < 0.001 0.02 0.01 – 0.02 < 0.001
carer’s gender -0.26 -4.36 – 3.83 0.900 0.57 -0.03 – 1.17 0.061
carer’s level of education: recoding of variable c172edu1 -0.76 -3.55 – 2.02 0.592 0.44 0.03 – 0.86 0.034
Observations 821 832
R2 / adj. R2 0.270 / 0.266 0.079 / 0.075

Custom labels

You can specify the ‘model’ label via labelDependentVariables parameter:

sjt.lm(fit1,
       fit2,
       labelDependentVariables = c("Barthel-Index",
                                   "Negative Impact"))
Barthel-Index Negative Impact
  B CI p B CI p
(Intercept) 90.06 77.95 – 102.18 < 0.001 8.46 6.67 – 10.24 < 0.001
carer’ age -0.22 -0.36 – -0.08 0.002 0.01 -0.01 – 0.03 0.206
average number of hours of care for the elder in a week -0.28 -0.31 – -0.24 < 0.001 0.02 0.01 – 0.02 < 0.001
carer’s gender -0.26 -4.36 – 3.83 0.900 0.57 -0.03 – 1.17 0.061
carer’s level of education: recoding of variable c172edu1 -0.76 -3.55 – 2.02 0.592 0.44 0.03 – 0.86 0.034
Observations 821 832
R2 / adj. R2 0.270 / 0.266 0.079 / 0.075

More custom labels

Here is an example how to change the other labels. Note that showHeaderStrings makes the two labels on top and top left corner appear in the table.

sjt.lm(fit1,
       fit2,
       showHeaderStrings = TRUE,
       stringB = "Estimate",
       stringCI = "Conf. Int.",
       stringP = "p-value",
       stringDependentVariables = "Response",
       stringPredictors = "Coefficients",
       stringIntercept = "Konstante",
       labelDependentVariables = c("Barthel-Index",
                                   "Negative Impact"))
Coefficients Response
Barthel-Index Negative Impact
  Estimate Conf. Int. p-value Estimate Conf. Int. p-value
Konstante 90.06 77.95 – 102.18 < 0.001 8.46 6.67 – 10.24 < 0.001
carer’ age -0.22 -0.36 – -0.08 0.002 0.01 -0.01 – 0.03 0.206
average number of hours of care for the elder in a week -0.28 -0.31 – -0.24 < 0.001 0.02 0.01 – 0.02 < 0.001
carer’s gender -0.26 -4.36 – 3.83 0.900 0.57 -0.03 – 1.17 0.061
carer’s level of education: recoding of variable c172edu1 -0.76 -3.55 – 2.02 0.592 0.44 0.03 – 0.86 0.034
Observations 821 832
R2 / adj. R2 0.270 / 0.266 0.079 / 0.075

Changing summary style and content

You can change the table style with specific parameters, e.g. to include CI into the same table cell as the estimates, print asterisks instead of numeric p-values etc.

sjt.lm(fit1, fit2,
       separateConfColumn = FALSE, # ci in same cell as estimates
       showStdBeta = TRUE,         # also show standardized beta values
       pvaluesAsNumbers = FALSE)   # "*" instead of numeric values
Model 1 Model 2
  B (CI) std. Beta (CI) B (CI) std. Beta (CI)
(Intercept) 90.06
(77.95 – 102.18) ***
8.46
(6.67 – 10.24) ***
carer’ age -0.22
(-0.36 – -0.08) **
-0.10
(-0.16 – -0.04)
0.01
(-0.01 – 0.03)
0.05
(-0.03 – 0.12)
average number of hours of care for the elder in a week -0.28
(-0.31 – -0.24) ***
-0.48
(-0.54 – -0.41)
0.02
(0.01 – 0.02) ***
0.25
(0.18 – 0.32)
carer’s gender -0.26
(-4.36 – 3.83)
-0.00
(-0.06 – 0.06)
0.57
(-0.03 – 1.17)
0.06
(-0.00 – 0.13)
carer’s level of education: recoding of variable c172edu1 -0.76
(-3.55 – 2.02)
-0.02
(-0.08 – 0.04)
0.44
(0.03 – 0.86) *
0.07
(0.01 – 0.14)
Observations 821 832
R2 / adj. R2 0.270 / 0.266 0.079 / 0.075
Notes * p<0.05   ** p<0.01   *** p<0.001

Custom variable labels

In the above example, the original variable labels are long and not much pretty. You can change variable labels either with set_var_labels (see this page for more detaila), which will affect all future plots and tables, or pass own labels via labelPredictors.

sjt.lm(fit1, fit2,
       labelPredictors = c("Carer's Age",
                           "Hours of Care",
                           "Carer's Sex",
                           "Educational Status"))
Model 1 Model 2
  B CI p B CI p
(Intercept) 90.06 77.95 – 102.18 < 0.001 8.46 6.67 – 10.24 < 0.001
Carer’s Age -0.22 -0.36 – -0.08 0.002 0.01 -0.01 – 0.03 0.206
Hours of Care -0.28 -0.31 – -0.24 < 0.001 0.02 0.01 – 0.02 < 0.001
Carer’s Sex -0.26 -4.36 – 3.83 0.900 0.57 -0.03 – 1.17 0.061
Educational Status -0.76 -3.55 – 2.02 0.592 0.44 0.03 – 0.86 0.034
Observations 821 832
R2 / adj. R2 0.270 / 0.266 0.079 / 0.075

Compare models with different predictors

In some cases, for instance stepwise regressions, you have different predictors on the same response. The proper grouping of predictors, resp. rows, is done automatically.

First, let’s fit some example models.

# fit first model
fit1 <- lm(neg_c_7 ~ c160age + c172code + c161sex, data=efc)
# fit second model
fit2 <- lm(neg_c_7 ~ c160age + c172code + c161sex + c12hour, data=efc)
# fit second model
fit3 <- lm(neg_c_7 ~ c160age + c172code + e42dep + tot_sc_e, data=efc)

Note that printing tables with fitted models, which have different predictors do not automatically detect variable labels (maybe this will be implemented in a future package version).

sjt.lm(fit1, fit2, fit3, 
       separateConfColumn = FALSE,
       showAIC = TRUE,
       showFStat = TRUE)
Model 1 Model 2 Model 3
  B (CI) p B (CI) p B (CI) p
(Intercept) 7.82
(6.00 – 9.65)
< 0.001 8.46
(6.67 – 10.24)
< 0.001 6.23
(4.76 – 7.69)
< 0.001
c160age 0.04
(0.02 – 0.06)
< 0.001 0.01
(-0.01 – 0.03)
0.206 0.01
(-0.01 – 0.03)
0.271
c172code 0.39
(-0.03 – 0.81)
0.071 0.44
(0.03 – 0.86)
0.034 0.24
(-0.15 – 0.64)
0.230
c161sex 0.69
(0.07 – 1.31)
0.028 0.57
(-0.03 – 1.17)
0.061
c12hour 0.02
(0.01 – 0.02)
< 0.001
e42dep 1.50
(1.23 – 1.77)
< 0.001
tot_sc_e 0.21
(0.01 – 0.41)
0.038
Observations 832 832 833
R2 / adj. R2 0.025 / 0.022 0.079 / 0.075 0.153 / 0.148
F-statistics 7.107*** 17.730*** 37.250***
AIC 4611.921 4566.622 4502.333

Automatic grouping of categorical predictors

In case you have categorical variables with more than two factor levels, the sjt.lm function automatically groups the category levels to give a better overview of predictors in the table.

By default, automatic grouping is activated. To disable this feature, use group.pred = FALSE as parameter.

To demonstrate this feature, we first convert two predictors to factors (what they actually are, indeed). To do this, we use the to_fac function, which converts numerical variables into factors, however, does not remove the variable and value label attributes.

# make education categorical
efc$c172code <- to_fac(efc$c172code)
# make dependency categorical
efc$e42dep <- to_fac(efc$e42dep)
# fit first model again (with c172code as factor)
fit1 <- lm(barthtot ~ c160age + c12hour + c172code + c161sex + e42dep, data=efc)
# fit second model again (with c172code as factor)
fit2 <- lm(neg_c_7 ~ c160age + c12hour + c172code + c161sex + e42dep, data=efc)

Now we can print the table.

sjt.lm(fit1, fit2)
Model 1 Model 2
  B CI p B CI p
(Intercept) 97.17 88.37 – 105.97 < 0.001 7.76 5.97 – 9.55 < 0.001
carer’ age -0.06 -0.16 – 0.03 0.203 0.00 -0.02 – 0.02 0.683
average number of hours of care for the elder in a week -0.07 -0.10 – -0.04 < 0.001 0.01 0.00 – 0.01 0.015
carer’s level of education: recoding of variable c172edu1
intermediate level of education 1.50 -1.60 – 4.60 0.343 0.13 -0.50 – 0.76 0.689
high level of education 0.66 -3.20 – 4.52 0.738 0.72 -0.07 – 1.51 0.074
carer’s gender 0.09 -2.74 – 2.93 0.949 0.56 -0.02 – 1.13 0.058
how dependent is the elder? – subjective perception of carer
slightly dependent -7.85 -12.86 – -2.83 0.002 1.11 0.09 – 2.13 0.033
moderately dependent -19.49 -24.42 – -14.57 < 0.001 2.37 1.37 – 3.37 < 0.001
severely dependent -56.87 -62.12 – -51.63 < 0.001 3.92 2.86 – 4.99 < 0.001
Observations 821 832
R2 / adj. R2 0.653 / 0.650 0.160 / 0.152

Removing estimates from the output

With remove.estmates, specific estimates can be removed from the table output. This may make sense in case you have stepwise regression models and only want to compare the varying predictors but not the controls. remove.estmates either accepts the row indices of the rows of the table output that should be removed, or the coefficient’s names.

When using numeric indices, the estimates’ index number relates to the same order as coef(fit). Note that currently the intercept cannot be removed from the model output!

data(efc)
# attach variable labels to each variable of the data
# frame - useful for automatic label detection
efc <- set_var_labels(efc, get_var_labels(efc))
# make education categorical
efc$c172code <- to_fac(efc$c172code)
# make education categorical
efc$e42dep <- to_fac(efc$e42dep)
# make prettier variable labels
efc$c172code <- set_var_labels(efc$c172code, "Education")
efc$e42dep <- set_var_labels(efc$e42dep, "Dependency")
# fit first model
fit1 <- lm(neg_c_7 ~ c160age + c172code + c161sex, data=efc)
# fit second model
fit2 <- lm(neg_c_7 ~ c160age + c172code + c161sex + c12hour, data=efc)
# fit third model
fit3 <- lm(neg_c_7 ~ c160age + c172code + e42dep + tot_sc_e, data=efc)

Example 1: Complete table output

Here you have the complete table output. This helps you identify the row index numbers. Especially when you have multiple models with different predictors, the estimate’s position in the last model may differ from this estimate’s position in the table output.

sjt.lm(fit1, fit2, fit3)
Model 1 Model 2 Model 3
  B CI p B CI p B CI p
(Intercept) 8.40 6.72 – 10.08 < 0.001 9.18 7.53 – 10.83 < 0.001 8.48 6.99 – 9.97 < 0.001
c160age 0.04 0.02 – 0.06 < 0.001 0.01 -0.01 – 0.03 0.306 0.01 -0.01 – 0.03 0.384
Education
c172code2 0.16 -0.52 – 0.83 0.652 0.12 -0.54 – 0.78 0.728 0.08 -0.56 – 0.72 0.806
c172code3 0.79 -0.05 – 1.64 0.066 0.91 0.09 – 1.74 0.030 0.52 -0.28 – 1.32 0.203
c161sex 0.70 0.09 – 1.32 0.025 0.59 -0.01 – 1.19 0.053
c12hour 0.02 0.01 – 0.02 < 0.001
Dependency
e42dep2 1.18 0.16 – 2.20 0.024
e42dep3 2.53 1.53 – 3.52 < 0.001
e42dep4 4.32 3.31 – 5.33 < 0.001
tot_sc_e 0.21 0.01 – 0.41 0.042
Observations 832 832 833
R2 / adj. R2 0.026 / 0.021 0.081 / 0.075 0.154 / 0.147

Example 2: Remove first coefficient (after intercept)

sjt.lm(fit1, fit2, fit3,
       remove.estimates = 2)
Model 1 Model 2 Model 3
  B CI p B CI p B CI p
(Intercept) 8.40 6.72 – 10.08 < 0.001 9.18 7.53 – 10.83 < 0.001 8.48 6.99 – 9.97 < 0.001
Education
c172code2 0.16 -0.52 – 0.83 0.652 0.12 -0.54 – 0.78 0.728 0.08 -0.56 – 0.72 0.806
c172code3 0.79 -0.05 – 1.64 0.066 0.91 0.09 – 1.74 0.030 0.52 -0.28 – 1.32 0.203
c161sex 0.70 0.09 – 1.32 0.025 0.59 -0.01 – 1.19 0.053
c12hour 0.02 0.01 – 0.02 < 0.001
Dependency
e42dep2 1.18 0.16 – 2.20 0.024
e42dep3 2.53 1.53 – 3.52 < 0.001
e42dep4 4.32 3.31 – 5.33 < 0.001
tot_sc_e 0.21 0.01 – 0.41 0.042
Observations 832 832 833
R2 / adj. R2 0.026 / 0.021 0.081 / 0.075 0.154 / 0.147

Example 3: Remove age and sex

sjt.lm(fit1, fit2, fit3,
       remove.estimates = c("c160age", "c161sex"))
Model 1 Model 2 Model 3
  B CI p B CI p B CI p
(Intercept) 8.40 6.72 – 10.08 < 0.001 9.18 7.53 – 10.83 < 0.001 8.48 6.99 – 9.97 < 0.001
Education
c172code2 0.16 -0.52 – 0.83 0.652 0.12 -0.54 – 0.78 0.728 0.08 -0.56 – 0.72 0.806
c172code3 0.79 -0.05 – 1.64 0.066 0.91 0.09 – 1.74 0.030 0.52 -0.28 – 1.32 0.203
c12hour 0.02 0.01 – 0.02 < 0.001
Dependency
e42dep2 1.18 0.16 – 2.20 0.024
e42dep3 2.53 1.53 – 3.52 < 0.001
e42dep4 4.32 3.31 – 5.33 < 0.001
tot_sc_e 0.21 0.01 – 0.41 0.042
Observations 832 832 833
R2 / adj. R2 0.026 / 0.021 0.081 / 0.075 0.154 / 0.147

Example 4: Remove many esimates

sjt.lm(fit1, fit2, fit3,
       remove.estimates = c(2,5,6,10))
Model 1 Model 2 Model 3
  B CI p B CI p B CI p
(Intercept) 8.40 6.72 – 10.08 < 0.001 9.18 7.53 – 10.83 < 0.001 8.48 6.99 – 9.97 < 0.001
Education
c172code2 0.16 -0.52 – 0.83 0.652 0.12 -0.54 – 0.78 0.728 0.08 -0.56 – 0.72 0.806
c172code3 0.79 -0.05 – 1.64 0.066 0.91 0.09 – 1.74 0.030 0.52 -0.28 – 1.32 0.203
Dependency
e42dep2 1.18 0.16 – 2.20 0.024
e42dep3 2.53 1.53 – 3.52 < 0.001
e42dep4 4.32 3.31 – 5.33 < 0.001
Observations 832 832 833
R2 / adj. R2 0.026 / 0.021 0.081 / 0.075 0.154 / 0.147

Example 5: Custom predictor labels

In most cases you need to define your own labels when removing estimates, especially when you have grouped categorical predictors, because automatic label detection in quite tricky in such situations. If you provide own labels, please note that grouped predictors’ headings (the variable name of the grouped, categorical variable) are still automatically set by the sjt.lm function (variable labels are used, so use set_var_labels for those categorical predictors). All data rows in the table, i.e. for each coefficient appearing in the model, you need to specify a label string.

In the next example, we have seven table rows with data (excluding intercept): mid and hi education (categories of the variable Education), Hours of Care, slight, moderate and severe dependency (categories of the variable Dependency) and Service Usage. These ‘rows’ need to be labelled.

sjt.lm(fit1, fit2, fit3,
       labelPredictors = c("mid education", 
                           "hi education", 
                           "Hours of Care", 
                           "slight dependency", 
                           "moderate dependency", 
                           "severe dependency", 
                           "Service Usage"),
       remove.estimates = c("c160age", "c161sex"))
Model 1 Model 2 Model 3
  B CI p B CI p B CI p
(Intercept) 8.40 6.72 – 10.08 < 0.001 9.18 7.53 – 10.83 < 0.001 8.48 6.99 – 9.97 < 0.001
Education
mid education 0.16 -0.52 – 0.83 0.652 0.12 -0.54 – 0.78 0.728 0.08 -0.56 – 0.72 0.806
hi education 0.79 -0.05 – 1.64 0.066 0.91 0.09 – 1.74 0.030 0.52 -0.28 – 1.32 0.203
Hours of Care 0.02 0.01 – 0.02 < 0.001
Dependency
slight dependency 1.18 0.16 – 2.20 0.024
moderate dependency 2.53 1.53 – 3.52 < 0.001
severe dependency 4.32 3.31 – 5.33 < 0.001
Service Usage 0.21 0.01 – 0.41 0.042
Observations 832 832 833
R2 / adj. R2 0.026 / 0.021 0.081 / 0.075 0.154 / 0.147
Advertisements

16 Kommentare zu „Beautiful tables for linear model summaries #rstats

  1. Moin, Daniel! Viele Grüße von nebenan aus W37..!

    Sehr schön Funktionen! 🙂

    Als „feature request“ vielleicht noch folgendes (habe mir aber Deinen Code nicht auf github angesehen, also weiß ich nicht, wieviel Aufwand das wäre):

    Da Du ja sowieso schon die p-Werte formatierst (also als < 0.001), wäre da APA-style-konform, die führende Null wegzumachen, also sowas wie

    paste0(gsub('0\\.', '.', round(p.num, dig = 3)))

    und als Option vielleicht noch "trailing zeros" für die anderen Zahlen um die gleiche Länge zu haben, also sowas wie:

    sprintf("%.3f", round(zahl, dig = 3))

    Viele Grüße, Ulf

    1. Moin Ulf, vielen Dank für den Hinweis mit der führenden Null. Da sprintf meines Wissens die Null immer setzt, müsste ich die Strings anschließend noch mal mit gsub o.ä. bearbeiten, um die Null zu entfernen.
      Die Anzahl der Kommastellen kann bereits via Parameter definiert werden. Übrigens würde ich nicht mit round arbeiten, da 0.100 dann in 0.1 umgewandelt wird, also nicht einheitlich. Du kannst direkt im sprintf die Kommastellen für das Runden angeben: sprintf("%.*f", kommastellen, pwert).
      Viele Grüße
      Daniel

    1. To display the code use following chunk in the document:

      ```{r eval=FALSE}
      sjt.lm(fit1, fit2, fit3)
      ```

      To display the output, you need an inline-chunk:
      `r sjt.lm(fit1, fit2, fit3, no.output=TRUE)$knitr`

      Note that you have to use no.output=TRUE in your function call, so the table is not printed. Instead, you paste the HTML code returned by the function ($knitr). See comprehensive manuals here: http://www.strengejacke.de/sjPlot/
      And for table outputs and knitr-integration see: http://www.strengejacke.de/sjPlot/sjtbasics/

  2. I also have a question. How can i align a table to the center? Like using
    width: auto;
    margin-left: auto;
    margin-right: auto;

    1. You can use the CSS parameter. See online-manual. With cat(sjt.lm(fit, no.output = TRUE)$page.style) you can print the style sheet to the console, so you know, which CSS attributes you can change. By default, the table-style is table { border-collapse:collapse; border:none; }. So you can use sjt.lm(fit, CSS = list(css.table = "+width: auto; margin-left: auto; margin-right: auto;") to add these attributes to the table style (or you leave out the + sign, but see online manual for details).
      However, I’m not sure if this is supported by knitr and / or pandoc, you have to test it.

  3. Thanks for sharing.

    I installed the sjPlot package thus:

    library(devtools)
    devtools::install_github(„sjPlot/devel“)

    I get:

    library(sjPlot)
    data(efc)
    efc <- set_var_labels(efc, get_var_labels(efc))
    fit1 <- lm(barthtot ~ c160age + c12hour + c161sex + c172code, data=efc)
    fit2 <- lm(neg_c_7 ~ c160age + c12hour + c161sex + c172code, data=efc)
    sjt.lm(fit1, fit2)

    Error: 'full_join' is not an exported object from 'namespace:dplyr'

  4. Hi Daniel, in `sjt.lm()`, the output table no longer groups my categorical variable (the predictor has 5 categories and is defined as factor). Everything was working fine until last week, but now even when I define ‚group.pred = TRUE‘ the predictors are not grouped. I’m using R3.2.4, sjmisc 1.6, sjPlot 1.9.4, dplyr 0.4.3. Any idea why this could be happening?

Kommentar verfassen

Trage deine Daten unten ein oder klicke ein Icon um dich einzuloggen:

WordPress.com-Logo

Du kommentierst mit Deinem WordPress.com-Konto. Abmelden / Ändern )

Twitter-Bild

Du kommentierst mit Deinem Twitter-Konto. Abmelden / Ändern )

Facebook-Foto

Du kommentierst mit Deinem Facebook-Konto. Abmelden / Ändern )

Google+ Foto

Du kommentierst mit Deinem Google+-Konto. Abmelden / Ändern )

Verbinde mit %s