CRAN download statistics of any packages #rstats

Hadley Wickham announced at Twitter that RStudio now provides CRAN package download logs. I was wondering about the download numbers of my package and wrote some code to extract that information from the logs…

The first code snippet is taken from the log website itself:

# Here's an easy way to get all the URLs in R
start <- as.Date('2013-11-28')
today <- as.Date('2015-03-04')

all_days <- seq(start, today, by = 'day')

year <- as.POSIXlt(all_days)$year + 1900
urls <- paste0('http://cran-logs.rstudio.com/', year, '/', all_days, '.csv.gz')

Then I downloaded all files into a folder:

for (i in 1:length(urls)) {
  download.file(urls[i], sprintf("~/Desktop/rstats/temp%i.csv.gz", i))
}

Unzipping did not work with unzip, so I just “opened” all files with the OS X unarchiver, which was quite convenient.

Than I read all csv-files and extracted the information for my package, sjPlot, from each csv-file and merged everything into one data frame:

sjPlot.df <- data.frame()
library(dplyr)
pb <- txtProgressBar(min=0, max=length(urls), style=3)

for (i in 1:length(urls)) {
  df.csv <- read.csv(sprintf("~/Desktop/rstats/temp%i.csv", i))
  pack <- tolower(as.character(df.csv$package))
  my.package <- which(pack == "sjplot")
  if (length(my.package) > 0 ) {
    dummy.df <- df.csv %>% dplyr::slice(my.package) %>% dplyr::select(date, package, version, country)
    sjPlot.df <- dplyr::bind_rows(sjPlot.df, dummy.df)
  }
  setTxtProgressBar(pb, i)
}
close(pb)
sjPlot.df$date.short <- strftime(sjPlot.df$date, format="%Y-%m")

Finally, the download-stats as plot:

library(sjPlot)
library(ggplot2)

mydf <- sjPlot.df %>% dplyr::count(date.short)

sjp.setTheme(theme = "539", axis.angle.x = 90)
ggplot(mydf, aes(x = date.short, y = n)) +
  geom_bar(stat = "identity", width = .5, alpha = .5, fill = "#3399cc") +
  scale_y_continuous(expand = c(0, 0), breaks = seq(250, 1500, 250)) +
  labs(x = sprintf("Monthly CRAN-downloads of sjPlot package since first release until 4th March (total download: %i)", sum(mydf$n)), y = NULL)

sjPlot-downloads

By the way, there’s already a shiny app for this…

CRAN download statistics of any packages #rstats

Beautiful tables for linear model summaries #rstats

Beautiful HTML tables of linear models

In this blog post I’d like to show some (old and) new features of the sjt.lm function from my sjPlot-package. These functions are currently only implemented in the development snapshot on GitHub. A package update is planned to be submitted soon to CRAN.

There are two new major features I added to this function: Comparing models with different predictors (e.g. stepwise regression) and automatic grouping of categorical predictors. There are examples below that demonstrate these features.

The sjt.lm function prints results and summaries of linear models as HTML-table. These tables can be viewed in the RStudio Viewer pane, web browser or easily exported to office applications. See also my former posts on the table printing functions of my package here and here.

Please note: The following tables may look a bit cluttered – this is because I just pasted the HTML-code created by knitr into this blog post, so style sheets may interfere. The original online-manual for this function can be found here.

All following tables can be reproduced with the sjPlot package and the sample data set from this package.

Linear model summaries as HTML table

The sjt.lm function prints summaries of linear models (fitted with the lm function) as nicely formatted html-tables.

Before starting, sample data is loaded and sample models are fitted:

# sample data
data(efc)
# set variable labels
efc <- set_var_labels(efc, get_var_labels(efc))
# fit first model
fit1 <- lm(barthtot ~ c160age + c12hour + c161sex + c172code, data=efc)
# fit second model
fit2 <- lm(neg_c_7 ~ c160age + c12hour + c161sex + c172code, data=efc)
# Note that both models share the same predictors and only differ 
# in their dependent variable. See examples of stepwise models 
# later...

The simplest way of producing the table output is by passing the fitted models as parameter. By default, estimates (B), confidence intervals (CI) and p-values (p) are reported. The models are named Model 1 and Model 2.

sjt.lm(fit1, fit2)
Model 1 Model 2
  B CI p B CI p
(Intercept) 90.06 77.95 – 102.18 < 0.001 8.46 6.67 – 10.24 < 0.001
carer’ age -0.22 -0.36 – -0.08 0.002 0.01 -0.01 – 0.03 0.206
average number of hours of care for the elder in a week -0.28 -0.31 – -0.24 < 0.001 0.02 0.01 – 0.02 < 0.001
carer’s gender -0.26 -4.36 – 3.83 0.900 0.57 -0.03 – 1.17 0.061
carer’s level of education: recoding of variable c172edu1 -0.76 -3.55 – 2.02 0.592 0.44 0.03 – 0.86 0.034
Observations 821 832
R2 / adj. R2 0.270 / 0.266 0.079 / 0.075

Custom labels

You can specify the ‘model’ label via labelDependentVariables parameter:

sjt.lm(fit1,
       fit2,
       labelDependentVariables = c("Barthel-Index",
                                   "Negative Impact"))
Barthel-Index Negative Impact
  B CI p B CI p
(Intercept) 90.06 77.95 – 102.18 < 0.001 8.46 6.67 – 10.24 < 0.001
carer’ age -0.22 -0.36 – -0.08 0.002 0.01 -0.01 – 0.03 0.206
average number of hours of care for the elder in a week -0.28 -0.31 – -0.24 < 0.001 0.02 0.01 – 0.02 < 0.001
carer’s gender -0.26 -4.36 – 3.83 0.900 0.57 -0.03 – 1.17 0.061
carer’s level of education: recoding of variable c172edu1 -0.76 -3.55 – 2.02 0.592 0.44 0.03 – 0.86 0.034
Observations 821 832
R2 / adj. R2 0.270 / 0.266 0.079 / 0.075

More custom labels

Here is an example how to change the other labels. Note that showHeaderStrings makes the two labels on top and top left corner appear in the table.

sjt.lm(fit1,
       fit2,
       showHeaderStrings = TRUE,
       stringB = "Estimate",
       stringCI = "Conf. Int.",
       stringP = "p-value",
       stringDependentVariables = "Response",
       stringPredictors = "Coefficients",
       stringIntercept = "Konstante",
       labelDependentVariables = c("Barthel-Index",
                                   "Negative Impact"))
Coefficients Response
Barthel-Index Negative Impact
  Estimate Conf. Int. p-value Estimate Conf. Int. p-value
Konstante 90.06 77.95 – 102.18 < 0.001 8.46 6.67 – 10.24 < 0.001
carer’ age -0.22 -0.36 – -0.08 0.002 0.01 -0.01 – 0.03 0.206
average number of hours of care for the elder in a week -0.28 -0.31 – -0.24 < 0.001 0.02 0.01 – 0.02 < 0.001
carer’s gender -0.26 -4.36 – 3.83 0.900 0.57 -0.03 – 1.17 0.061
carer’s level of education: recoding of variable c172edu1 -0.76 -3.55 – 2.02 0.592 0.44 0.03 – 0.86 0.034
Observations 821 832
R2 / adj. R2 0.270 / 0.266 0.079 / 0.075

Changing summary style and content

You can change the table style with specific parameters, e.g. to include CI into the same table cell as the estimates, print asterisks instead of numeric p-values etc.

sjt.lm(fit1, fit2,
       separateConfColumn = FALSE, # ci in same cell as estimates
       showStdBeta = TRUE,         # also show standardized beta values
       pvaluesAsNumbers = FALSE)   # "*" instead of numeric values
Model 1 Model 2
  B (CI) std. Beta (CI) B (CI) std. Beta (CI)
(Intercept) 90.06
(77.95 – 102.18) ***
8.46
(6.67 – 10.24) ***
carer’ age -0.22
(-0.36 – -0.08) **
-0.10
(-0.16 – -0.04)
0.01
(-0.01 – 0.03)
0.05
(-0.03 – 0.12)
average number of hours of care for the elder in a week -0.28
(-0.31 – -0.24) ***
-0.48
(-0.54 – -0.41)
0.02
(0.01 – 0.02) ***
0.25
(0.18 – 0.32)
carer’s gender -0.26
(-4.36 – 3.83)
-0.00
(-0.06 – 0.06)
0.57
(-0.03 – 1.17)
0.06
(-0.00 – 0.13)
carer’s level of education: recoding of variable c172edu1 -0.76
(-3.55 – 2.02)
-0.02
(-0.08 – 0.04)
0.44
(0.03 – 0.86) *
0.07
(0.01 – 0.14)
Observations 821 832
R2 / adj. R2 0.270 / 0.266 0.079 / 0.075
Notes * p<0.05   ** p<0.01   *** p<0.001

Custom variable labels

In the above example, the original variable labels are long and not much pretty. You can change variable labels either with set_var_labels (see this page for more detaila), which will affect all future plots and tables, or pass own labels via labelPredictors.

sjt.lm(fit1, fit2,
       labelPredictors = c("Carer's Age",
                           "Hours of Care",
                           "Carer's Sex",
                           "Educational Status"))
Model 1 Model 2
  B CI p B CI p
(Intercept) 90.06 77.95 – 102.18 < 0.001 8.46 6.67 – 10.24 < 0.001
Carer’s Age -0.22 -0.36 – -0.08 0.002 0.01 -0.01 – 0.03 0.206
Hours of Care -0.28 -0.31 – -0.24 < 0.001 0.02 0.01 – 0.02 < 0.001
Carer’s Sex -0.26 -4.36 – 3.83 0.900 0.57 -0.03 – 1.17 0.061
Educational Status -0.76 -3.55 – 2.02 0.592 0.44 0.03 – 0.86 0.034
Observations 821 832
R2 / adj. R2 0.270 / 0.266 0.079 / 0.075

Compare models with different predictors

In some cases, for instance stepwise regressions, you have different predictors on the same response. The proper grouping of predictors, resp. rows, is done automatically.

First, let’s fit some example models.

# fit first model
fit1 <- lm(neg_c_7 ~ c160age + c172code + c161sex, data=efc)
# fit second model
fit2 <- lm(neg_c_7 ~ c160age + c172code + c161sex + c12hour, data=efc)
# fit second model
fit3 <- lm(neg_c_7 ~ c160age + c172code + e42dep + tot_sc_e, data=efc)

Note that printing tables with fitted models, which have different predictors do not automatically detect variable labels (maybe this will be implemented in a future package version).

sjt.lm(fit1, fit2, fit3, 
       separateConfColumn = FALSE,
       showAIC = TRUE,
       showFStat = TRUE)
Model 1 Model 2 Model 3
  B (CI) p B (CI) p B (CI) p
(Intercept) 7.82
(6.00 – 9.65)
< 0.001 8.46
(6.67 – 10.24)
< 0.001 6.23
(4.76 – 7.69)
< 0.001
c160age 0.04
(0.02 – 0.06)
< 0.001 0.01
(-0.01 – 0.03)
0.206 0.01
(-0.01 – 0.03)
0.271
c172code 0.39
(-0.03 – 0.81)
0.071 0.44
(0.03 – 0.86)
0.034 0.24
(-0.15 – 0.64)
0.230
c161sex 0.69
(0.07 – 1.31)
0.028 0.57
(-0.03 – 1.17)
0.061
c12hour 0.02
(0.01 – 0.02)
< 0.001
e42dep 1.50
(1.23 – 1.77)
< 0.001
tot_sc_e 0.21
(0.01 – 0.41)
0.038
Observations 832 832 833
R2 / adj. R2 0.025 / 0.022 0.079 / 0.075 0.153 / 0.148
F-statistics 7.107*** 17.730*** 37.250***
AIC 4611.921 4566.622 4502.333

Automatic grouping of categorical predictors

In case you have categorical variables with more than two factor levels, the sjt.lm function automatically groups the category levels to give a better overview of predictors in the table.

By default, automatic grouping is activated. To disable this feature, use group.pred = FALSE as parameter.

To demonstrate this feature, we first convert two predictors to factors (what they actually are, indeed). To do this, we use the to_fac function, which converts numerical variables into factors, however, does not remove the variable and value label attributes.

# make education categorical
efc$c172code <- to_fac(efc$c172code)
# make dependency categorical
efc$e42dep <- to_fac(efc$e42dep)
# fit first model again (with c172code as factor)
fit1 <- lm(barthtot ~ c160age + c12hour + c172code + c161sex + e42dep, data=efc)
# fit second model again (with c172code as factor)
fit2 <- lm(neg_c_7 ~ c160age + c12hour + c172code + c161sex + e42dep, data=efc)

Now we can print the table.

sjt.lm(fit1, fit2)
Model 1 Model 2
  B CI p B CI p
(Intercept) 97.17 88.37 – 105.97 < 0.001 7.76 5.97 – 9.55 < 0.001
carer’ age -0.06 -0.16 – 0.03 0.203 0.00 -0.02 – 0.02 0.683
average number of hours of care for the elder in a week -0.07 -0.10 – -0.04 < 0.001 0.01 0.00 – 0.01 0.015
carer’s level of education: recoding of variable c172edu1
intermediate level of education 1.50 -1.60 – 4.60 0.343 0.13 -0.50 – 0.76 0.689
high level of education 0.66 -3.20 – 4.52 0.738 0.72 -0.07 – 1.51 0.074
carer’s gender 0.09 -2.74 – 2.93 0.949 0.56 -0.02 – 1.13 0.058
how dependent is the elder? – subjective perception of carer
slightly dependent -7.85 -12.86 – -2.83 0.002 1.11 0.09 – 2.13 0.033
moderately dependent -19.49 -24.42 – -14.57 < 0.001 2.37 1.37 – 3.37 < 0.001
severely dependent -56.87 -62.12 – -51.63 < 0.001 3.92 2.86 – 4.99 < 0.001
Observations 821 832
R2 / adj. R2 0.653 / 0.650 0.160 / 0.152

Removing estimates from the output

With remove.estmates, specific estimates can be removed from the table output. This may make sense in case you have stepwise regression models and only want to compare the varying predictors but not the controls. remove.estmates either accepts the row indices of the rows of the table output that should be removed, or the coefficient’s names.

When using numeric indices, the estimates’ index number relates to the same order as coef(fit). Note that currently the intercept cannot be removed from the model output!

data(efc)
# attach variable labels to each variable of the data
# frame - useful for automatic label detection
efc <- set_var_labels(efc, get_var_labels(efc))
# make education categorical
efc$c172code <- to_fac(efc$c172code)
# make education categorical
efc$e42dep <- to_fac(efc$e42dep)
# make prettier variable labels
efc$c172code <- set_var_labels(efc$c172code, "Education")
efc$e42dep <- set_var_labels(efc$e42dep, "Dependency")
# fit first model
fit1 <- lm(neg_c_7 ~ c160age + c172code + c161sex, data=efc)
# fit second model
fit2 <- lm(neg_c_7 ~ c160age + c172code + c161sex + c12hour, data=efc)
# fit third model
fit3 <- lm(neg_c_7 ~ c160age + c172code + e42dep + tot_sc_e, data=efc)

Example 1: Complete table output

Here you have the complete table output. This helps you identify the row index numbers. Especially when you have multiple models with different predictors, the estimate’s position in the last model may differ from this estimate’s position in the table output.

sjt.lm(fit1, fit2, fit3)
Model 1 Model 2 Model 3
  B CI p B CI p B CI p
(Intercept) 8.40 6.72 – 10.08 < 0.001 9.18 7.53 – 10.83 < 0.001 8.48 6.99 – 9.97 < 0.001
c160age 0.04 0.02 – 0.06 < 0.001 0.01 -0.01 – 0.03 0.306 0.01 -0.01 – 0.03 0.384
Education
c172code2 0.16 -0.52 – 0.83 0.652 0.12 -0.54 – 0.78 0.728 0.08 -0.56 – 0.72 0.806
c172code3 0.79 -0.05 – 1.64 0.066 0.91 0.09 – 1.74 0.030 0.52 -0.28 – 1.32 0.203
c161sex 0.70 0.09 – 1.32 0.025 0.59 -0.01 – 1.19 0.053
c12hour 0.02 0.01 – 0.02 < 0.001
Dependency
e42dep2 1.18 0.16 – 2.20 0.024
e42dep3 2.53 1.53 – 3.52 < 0.001
e42dep4 4.32 3.31 – 5.33 < 0.001
tot_sc_e 0.21 0.01 – 0.41 0.042
Observations 832 832 833
R2 / adj. R2 0.026 / 0.021 0.081 / 0.075 0.154 / 0.147

Example 2: Remove first coefficient (after intercept)

sjt.lm(fit1, fit2, fit3,
       remove.estimates = 2)
Model 1 Model 2 Model 3
  B CI p B CI p B CI p
(Intercept) 8.40 6.72 – 10.08 < 0.001 9.18 7.53 – 10.83 < 0.001 8.48 6.99 – 9.97 < 0.001
Education
c172code2 0.16 -0.52 – 0.83 0.652 0.12 -0.54 – 0.78 0.728 0.08 -0.56 – 0.72 0.806
c172code3 0.79 -0.05 – 1.64 0.066 0.91 0.09 – 1.74 0.030 0.52 -0.28 – 1.32 0.203
c161sex 0.70 0.09 – 1.32 0.025 0.59 -0.01 – 1.19 0.053
c12hour 0.02 0.01 – 0.02 < 0.001
Dependency
e42dep2 1.18 0.16 – 2.20 0.024
e42dep3 2.53 1.53 – 3.52 < 0.001
e42dep4 4.32 3.31 – 5.33 < 0.001
tot_sc_e 0.21 0.01 – 0.41 0.042
Observations 832 832 833
R2 / adj. R2 0.026 / 0.021 0.081 / 0.075 0.154 / 0.147

Example 3: Remove age and sex

sjt.lm(fit1, fit2, fit3,
       remove.estimates = c("c160age", "c161sex"))
Model 1 Model 2 Model 3
  B CI p B CI p B CI p
(Intercept) 8.40 6.72 – 10.08 < 0.001 9.18 7.53 – 10.83 < 0.001 8.48 6.99 – 9.97 < 0.001
Education
c172code2 0.16 -0.52 – 0.83 0.652 0.12 -0.54 – 0.78 0.728 0.08 -0.56 – 0.72 0.806
c172code3 0.79 -0.05 – 1.64 0.066 0.91 0.09 – 1.74 0.030 0.52 -0.28 – 1.32 0.203
c12hour 0.02 0.01 – 0.02 < 0.001
Dependency
e42dep2 1.18 0.16 – 2.20 0.024
e42dep3 2.53 1.53 – 3.52 < 0.001
e42dep4 4.32 3.31 – 5.33 < 0.001
tot_sc_e 0.21 0.01 – 0.41 0.042
Observations 832 832 833
R2 / adj. R2 0.026 / 0.021 0.081 / 0.075 0.154 / 0.147

Example 4: Remove many esimates

sjt.lm(fit1, fit2, fit3,
       remove.estimates = c(2,5,6,10))
Model 1 Model 2 Model 3
  B CI p B CI p B CI p
(Intercept) 8.40 6.72 – 10.08 < 0.001 9.18 7.53 – 10.83 < 0.001 8.48 6.99 – 9.97 < 0.001
Education
c172code2 0.16 -0.52 – 0.83 0.652 0.12 -0.54 – 0.78 0.728 0.08 -0.56 – 0.72 0.806
c172code3 0.79 -0.05 – 1.64 0.066 0.91 0.09 – 1.74 0.030 0.52 -0.28 – 1.32 0.203
Dependency
e42dep2 1.18 0.16 – 2.20 0.024
e42dep3 2.53 1.53 – 3.52 < 0.001
e42dep4 4.32 3.31 – 5.33 < 0.001
Observations 832 832 833
R2 / adj. R2 0.026 / 0.021 0.081 / 0.075 0.154 / 0.147

Example 5: Custom predictor labels

In most cases you need to define your own labels when removing estimates, especially when you have grouped categorical predictors, because automatic label detection in quite tricky in such situations. If you provide own labels, please note that grouped predictors’ headings (the variable name of the grouped, categorical variable) are still automatically set by the sjt.lm function (variable labels are used, so use set_var_labels for those categorical predictors). All data rows in the table, i.e. for each coefficient appearing in the model, you need to specify a label string.

In the next example, we have seven table rows with data (excluding intercept): mid and hi education (categories of the variable Education), Hours of Care, slight, moderate and severe dependency (categories of the variable Dependency) and Service Usage. These ‘rows’ need to be labelled.

sjt.lm(fit1, fit2, fit3,
       labelPredictors = c("mid education", 
                           "hi education", 
                           "Hours of Care", 
                           "slight dependency", 
                           "moderate dependency", 
                           "severe dependency", 
                           "Service Usage"),
       remove.estimates = c("c160age", "c161sex"))
Model 1 Model 2 Model 3
  B CI p B CI p B CI p
(Intercept) 8.40 6.72 – 10.08 < 0.001 9.18 7.53 – 10.83 < 0.001 8.48 6.99 – 9.97 < 0.001
Education
mid education 0.16 -0.52 – 0.83 0.652 0.12 -0.54 – 0.78 0.728 0.08 -0.56 – 0.72 0.806
hi education 0.79 -0.05 – 1.64 0.066 0.91 0.09 – 1.74 0.030 0.52 -0.28 – 1.32 0.203
Hours of Care 0.02 0.01 – 0.02 < 0.001
Dependency
slight dependency 1.18 0.16 – 2.20 0.024
moderate dependency 2.53 1.53 – 3.52 < 0.001
severe dependency 4.32 3.31 – 5.33 < 0.001
Service Usage 0.21 0.01 – 0.41 0.042
Observations 832 832 833
R2 / adj. R2 0.026 / 0.021 0.081 / 0.075 0.154 / 0.147
Beautiful tables for linear model summaries #rstats

sjPlot package and related online manuals updated #rstats # ggplot

My sjPlot package for data visualization has just been updated on CRAN. I’ve added some features to existing function, which I want to introduce here.

Plotting linear models

So far, plotting model assumptions of linear models or plotting slopes for each estimate of linear models were spread over several functions. Now, these plot types have been integrated into the sjp.lm function, where you can select the plot type with the type parameter. Furthermore, plotting standardized coefficients now also plot the related confidence intervals.

Detailed examples can be found here:
www.strengejacke.de/sjPlot/sjp.lm

Plotting generalized linear models

Beside odds ratios, you now can also plot the predicted probabilities of the outcome for each predictor of generalized linear models. In case you have continuous variables, these kind of plots may be more intuitive than an odds ratio value.

Detailed examples can be found here:
www.strengejacke.de/sjPlot/sjp.glm

Plotting (generalized) linear mixed effects models

The plotting function for creating plots of (generalized) linear mixed effects models (sjp.lmer and sjp.glmer) also got new plot types over the course of the last weeks.

For sjp.lmer, we have

  • re (default) for estimates of random effects
  • fe for estimates of fixed effects
  • fe.std for standardized estimates of fixed effects
  • fe.cor for correlation matrix of fixed effects
  • re.qq for a QQ-plot of random effects (random effects quantiles against standard normal quantiles)
  • fe.ri for fixed effects slopes depending on the random intercept.

and for sjp.glmer, we have

  • re (default) for odds ratios of random effects
  • fe for odds ratios of fixed effects
  • fe.cor for correlation matrix of fixed effects
  • re.qq for a QQ-plot of random effects (random effects quantiles against standard normal quantiles)
  • fe.pc or fe.prob to plot probability curves (predicted probabilities) of all fixed effects coefficients. Use facet.grid to decide whether to plot each coefficient as separate plot or as integrated faceted plot.
  • ri.pc or ri.prob to plot probability curves (predicted probabilities) of random intercept variances for all fixed effects coefficients. Use facet.grid to decide whether to plot each coefficient as separate plot or as integrated faceted plot.

Detailed examples can be found here:
www.strengejacke.de/sjPlot/sjp.lmer and www.strengejacke.de/sjPlot/sjp.glmer

Plotting interaction terms of (generalized) linear (mixed effects) models

Another function, where new features were added, is sjp.int (formerly known as sjp.lm.int). This function is now kind of generic and can plot interactions of

  • linar models (lm)
  • generalized linar models (glm)
  • linar mixed effects models (lme4::lmer)
  • generalized linar mixed effects models (lme4::glmer)

For linear models (both normal and mixed effects), slopes of interaction terms are plotted. For generalized linear models, the predicted probabilities of the outcome towards the interaction terms is plotted.

Detailed examples can be found here:
www.strengejacke.de/sjPlot/sjp.int

Plotting Likert scales

Finally, a comprehensive documentation for the sjp.likert function is finsihed, which can be found here:
www.strengejacke.de/sjPlot/sjp.likert

sjPlot package and related online manuals updated #rstats # ggplot

Visualizing (generalized) linear mixed effects models, part 2 #rstats #lme4

In the first part on visualizing (generalized) linear mixed effects models, I showed examples of the new functions in the sjPlot package to visualize fixed and random effects (estimates and odds ratios) of (g)lmer results. Meanwhile, I added further features to the functions, which I like to introduce here. This posting is based on the online manual of the sjPlot package.

In this posting, I’d like to give examples for diagnostic and probability plots of odds ratios. The latter examples, of course, only refer to the sjp.glmer function (generalized mixed models). To reproduce these examples, you need the version 1.59 (or higher) of the package, which can be found at GitHub. A submission to CRAN is planned for the next days…

Fitting example models

The following examples are based on two fitted mixed models:

# fit model
library(lme4)
# create binary response
sleepstudy$Reaction.dicho <- sju.dicho(sleepstudy$Reaction, 
                                       dichBy = "md")
# fit first model
fit <- glmer(Reaction.dicho ~ Days + (Days | Subject),
             sleepstudy,
             family = binomial("logit"))

data(efc)
# create binary response
efc$hi_qol <- sju.dicho(efc$quol_5)
# prepare group variable
efc$grp = as.factor(efc$e15relat)
levels(x = efc$grp) <- sji.getValueLabels(efc$e15relat)
# data frame for 2nd fitted model
mydf <- na.omit(data.frame(hi_qol = as.factor(efc$hi_qol),
                           sex = as.factor(efc$c161sex),
                           c12hour = as.numeric(efc$c12hour),
                           neg_c_7 = as.numeric(efc$neg_c_7),
                           grp = efc$grp))
# fit 2nd model
fit2 <- glmer(hi_qol ~ sex + c12hour + neg_c_7 + (1|grp),
              data = mydf,
              family = binomial("logit"))

Summary fit1

Formula: Reaction.dicho ~ Days + (Days | Subject)
   Data: sleepstudy

     AIC      BIC   logLik deviance df.resid 
   158.7    174.7    -74.4    148.7      175 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-4.2406 -0.2726 -0.0198  0.2766  2.9705 

Random effects:
 Groups  Name        Variance Std.Dev. Corr 
 Subject (Intercept) 8.0287   2.8335        
         Days        0.2397   0.4896   -0.19
Number of obs: 180, groups:  Subject, 18

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -3.8159     1.1728  -3.254 0.001139 ** 
Days          0.8908     0.2347   3.796 0.000147 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
     (Intr)
Days -0.694

Summary fit2

Formula: hi_qol ~ sex + c12hour + neg_c_7 + (1 | grp)
   Data: mydf

     AIC      BIC   logLik deviance df.resid 
  1065.3   1089.2   -527.6   1055.3      881 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.7460 -0.8139 -0.2688  0.7706  6.6464 

Random effects:
 Groups Name        Variance Std.Dev.
 grp    (Intercept) 0.08676  0.2945  
Number of obs: 886, groups:  grp, 8

Fixed effects:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  3.179036   0.333940   9.520  < 2e-16 ***
sex2        -0.545282   0.178974  -3.047  0.00231 ** 
c12hour     -0.005148   0.001720  -2.992  0.00277 ** 
neg_c_7     -0.219586   0.024108  -9.109  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
        (Intr) sex2   c12hor
sex2    -0.410              
c12hour -0.057 -0.048       
neg_c_7 -0.765 -0.009 -0.116

Diagnostic plots

Two new functions are added to both sjp.lmer and sjp.glmer, hence they apply to linear and generalized linear mixed models, fitted with the lme4 package. The examples only refer to the sjp.glmer function.

Currently, there are two type options to plot diagnostic plots: type = "fe.cor" to plot a correlation matrix between fixed effects and type = "re.qq" to plot a qq-plot of random effects.

Correlation matrix of fixed effects

To plot a correlation matrix of the fixed effects, use type = "fe.cor".

# plot fixed effects correlation matrix
sjp.glmer(fit2, type = "fe.cor")

unnamed-chunk-11-1

qq-plot of random effects

Another diagnostic plot is the qq-plot for random effects. Use type = "re.qq" to plot random against standard quantiles. The dots should be plotted along the line.

# plot qq-plot of random effects
sjp.glmer(fit, type = "re.qq")

unnamed-chunk-13-1

Probability curves of odds ratios

These plotting functions have been implemented to easier interprete odds ratios, especially for continuous covariates, by plotting the probabilities of predictors.

Probabilities of fixed effects

With type = "fe.pc" (or type = "fe.prob"), probability plots for each covariate can be plotted. These probabilties are based on the fixed effects intercept. One plot per covariate is plotted.

The model fit2 has one binary and two continuous covariates:

# plot probability curve of fixed effects
sjp.glmer(fit2, type = "fe.pc")

unnamed-chunk-9-1

Probabilities of fixed effects depending on grouping level (random intercept)

With type = "ri.pc" (or type = "ri.prob"), probability plots for each covariate can be plotted, depending on the grouping level from the random intercept. Thus, for each covariate a plot for each grouping levels is plotted. Furthermore, with the show.se the standard error of probabilities can be shown. In this example, only the plot for one covariate is shown, not for all.

# plot probability curves for each covariate
# grouped by random intercepts
sjp.glmer(fit2,
          type = "ri.pc",
          show.se = TRUE)

unnamed-chunk-8-2

Instead of faceting plots, all grouping levels can be shown in one plot:

# plot probability curves for each covariate
# grouped by random intercepts
sjp.glmer(fit2,
          type = "ri.pc",
          facet.grid = FALSE)

unnamed-chunk-10-2

Outlook

These will be the new features for the next package update. For later updates, I’m also planning to plot interaction terms of (generalized) linear mixed models, similar to the existing function for visualizing interaction terms in linear models.

Visualizing (generalized) linear mixed effects models, part 2 #rstats #lme4

Visualizing (generalized) linear mixed effects models with ggplot #rstats #lme4

In the past week, colleagues of mine and me started using the lme4-package to compute multi level models. This inspired me doing two new functions for visualizing random effects (as retrieved by ranef()) and fixed effects (as retrieved by fixef()) of (generalized) linear mixed effect models.

The upcoming version of my sjPlot package will contain two new functions to plot fitted lmer and glmer models from the lme4 package: sjp.lmer and sjp.glmer (not that surprising function names). Since I’m new to mixed effects models, I would appreciate any suggestions on how to improve the functions, which results are important to report (plot) and so on. Furthermore, I’m not sure whether my approach of computing confident intervals for random effects is the best?

I have used following code to compute confident intervals for the estimates returned by the lme4::ranef() function (bases on this stackoverflow answer):

coev <- as.matrix(lme4::vcov.merMod(fit))
tmp <- as.data.frame(cbind(OR = exp(mydf.ef[,i]),
                     lower.CI = exp(mydf.ef[,i] - (1.96 * sqrt(diag(coev))[i])),
                     upper.CI = exp(mydf.ef[,i] + (1.96 * sqrt(diag(coev))[i]))))

The update to version 1.6 of sjPlot is still in development (feature-freeze, mostly fixes now), however, you can download the latest snapshot from GitHub (see also this post for further information). Now to some examples. First, an example model is fitted and the random effects (default) for each predictor are plotted as “forest plot”:

# fit model
library(lme4)
fit <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
# simple plot
sjp.lmer(fit)

lmer1

Sorting a predictor (i.e. estimates of a facet) is done by specifying the predictor’s name as sort parameter.

sjp.lmer(fit, sort = "Days")

lmer2

Each facet plot can also be plotted as single plot, when facet.grid is set to FALSE. In this case, it is possible to sort the estimates for each plots. See following example from the sjp.glmer function:

library(lme4)
# create binary response
sleepstudy$Reaction.dicho <- sju.dicho(sleepstudy$Reaction, 
                                       dichBy = "md")
# fit model
fit <- glmer(Reaction.dicho ~ Days + (Days | Subject),
             sleepstudy,
             family = binomial("logit"))
sjp.setTheme(theme = "forest")
sjp.glmer(fit, 
          facet.grid = FALSE, 
          sort = "sort.all")

glmer1 glmer2

Plotting the fixed effects is not much spectacular, because we only have one estimate beside intercept here.

sjp.glmer(fit, 
          type = "fe", 
          sort = TRUE)

glmer3

To summarize, you can plot random and fixed effects in the way as shown above. Are there any other or better plot options for visualizing mixed effects models?

Any suggestions are welcome…

Disclaimer: all misspellings belong to Safari’s autocorrect feature!

Visualizing (generalized) linear mixed effects models with ggplot #rstats #lme4

sjPlot 1.6 – major revisions, anyone for beta testing? #rstats

In the last couple of weeks I have rewritten some core parts of my sjPlot-package and also revised the package- and online documentation.

Most notably are the changes that affect theming and appearance of plots and figures. There’s a new function called sjp.setTheme which now sets theme-options for all sjp-functions, which means

  1. you only need to specify theme / appearance option once and no longer need to repeat these parameter for each sjp-function call you make
  2. due to this change, all sjp-functions have much less parameters, making the functions and documentation clearer

Furthermore, due to some problems with connecting / updating to the RPubs server, I decided to upload my online documentation for the package to my own site. You will now find the latest, comprehensive documentation and examples for various functions of the sjPlot package at www.strengejacke.de/sjPlot/. For instance, take a look at customizing plot appearance and see how the new theming feature of the package allows both easier customization of plots as well as better integration of theming packages like ggthemr or ggthemes.

Updating the sjPlot package to CRAN is planned soon, however, I kindly ask you to test the current development snapshot, which is hosted on GitHub. You can easily install the package from there using the devtools-package (devtools::install_github("devel", "sjPlot")). The current snapshot is (very) stable and I appreciate any feedbacks or bug reports (if possible, use the issue tracker from GitHub).

The current change log with all new function, changes and bug fixes can also be found on GitHub.

sjPlot 1.6 – major revisions, anyone for beta testing? #rstats

Visualize pre-post comparison of intervention #rstats

My sjPlot-package was just updated on CRAN, introducing a new function called sjp.emm.int to plot estimated marginal means (least-squares means) of linear models with interaction terms. Or: plotting adjusted means of an ANCOVA.

The idea to this function came up when we wanted to analyze the effect of an intervention (an educational programme on knowledge about mental disorders and associated stigma) between two groups: a “treatmeant” group (city) where a campaign on mental disorders was conducted and another city without this campaign. People from both cities were asked about their attitudes and knowledge about specific mental disorders at t0 before the campaign started in the one city. Some month later (t1), again people from both cities were asked the same questions. The intention was to see a) whether there were differences in knowledge and pro-social attidutes of people towards mental disorders and b) if the compaign successfully reduces stigma and increases knowledge.

To analyse these questions, we used an ANCOVA with knowledge and stigma score as dependent variables, “city” and “time” (t0 versus t1) as predictors and adjusted for covariates like age, sex, education etc. The estimated marginal means (or least-squares means) show you the differences of the dependent variable.

Here’s an example plot, quickly done with the sjp.emm.int function:
sjpemmint

Since the data is not publicly available, I’ve set an an documentation with reproducable examples (though those example do not fit very well…).

The latest development snapshot of my package is available on GitHub.

BTW: You may have noticed that this function is quite similar to the sjp.lm.int function for visually interpreting interaction terms in linear models…

Visualize pre-post comparison of intervention #rstats

sjPlot: New options for creating beautiful tables and documentation #rstats

A new update of my sjPlot package was just released on CRAN. This release focused on improving existing functions and bug fixes again. Especially the table output functions (see my previous blog posts on table output functions here and here) improved a lot. Tables now have more and better possibilities for style customization and knitr integration. A basic introduction into the new features is given in this document.

To make it easier to understand all features, I started to setup comprehensive documentations for all sjPlot functions on strengejacke.de.

sjPlot: New options for creating beautiful tables and documentation #rstats

Beautiful table outputs in R, part 2 #rstats #sjPlot

First of all, I’d like to thank my readers for the lots of feedback on my last post on beautiful outputs in R. I tried to consider all suggestions, updated the existing table-output-functions and added some new ones, which will be described in this post. The updated package is already available on CRAN.

This posting is divided in two major parts:

  1. the new functions are described, and
  2. the new features of all table-output-functions are introduced (including knitr-integration and office-import)

Read on …

Beautiful table outputs in R, part 2 #rstats #sjPlot

No need for SPSS – beautiful output in R #rstats

Note: There’s a second part of this series here.

About one year ago, I seriously started migrating from SPSS to R. Though I’m still using SPSS (because I have to in some situations), I’m quite comfortable and happy with R now and learnt a lot in the past months. But since SPSS is still very wide spread in social sciences, I get asked every now and then, whether I really needed to learn R, because SPSS meets all my needs…

Well, learning R had at least two major benefits for me: 1.) I could improve my statistical knowledge a lot, simply by using formulas, asking why certain R commands do not automatically give the same results like SPSS, reading R resources and papers etc. and 2.) the possibilities of data visualization are way better in R than in SPSS (though SPSS can do well as well…). Of course, there are even many more reasons to use R.

Still, one thing I often miss in R is a beautiful output of simple statistics or maybe even advanced statistics. Not always as plot or graph, but neither as “cryptic” console output. I’d like to have a simple table view, just like the SPSS output window (though the SPSS output is not “beautiful”). That’s why I started writing functions that put the results of certain statistics in HTML tables. These tables can be saved to disk or, even better for quick inspection, shown in a web browser or viewer pane (like in RStudio viewer pane).

All of the following functions are available in my sjPlot-package on CRAN.

Read on …

No need for SPSS – beautiful output in R #rstats