Visualizing (generalized) linear mixed effects models, part 2 #rstats #lme4

In the first part on visualizing (generalized) linear mixed effects models, I showed examples of the new functions in the sjPlot package to visualize fixed and random effects (estimates and odds ratios) of (g)lmer results. Meanwhile, I added further features to the functions, which I like to introduce here. This posting is based on the online manual of the sjPlot package.

In this posting, I’d like to give examples for diagnostic and probability plots of odds ratios. The latter examples, of course, only refer to the sjp.glmer function (generalized mixed models). To reproduce these examples, you need the version 1.59 (or higher) of the package, which can be found at GitHub. A submission to CRAN is planned for the next days…

Fitting example models

The following examples are based on two fitted mixed models:

# fit model
library(lme4)
# create binary response
sleepstudy$Reaction.dicho <- sju.dicho(sleepstudy$Reaction, 
                                       dichBy = "md")
# fit first model
fit <- glmer(Reaction.dicho ~ Days + (Days | Subject),
             sleepstudy,
             family = binomial("logit"))

data(efc)
# create binary response
efc$hi_qol <- sju.dicho(efc$quol_5)
# prepare group variable
efc$grp = as.factor(efc$e15relat)
levels(x = efc$grp) <- sji.getValueLabels(efc$e15relat)
# data frame for 2nd fitted model
mydf <- na.omit(data.frame(hi_qol = as.factor(efc$hi_qol),
                           sex = as.factor(efc$c161sex),
                           c12hour = as.numeric(efc$c12hour),
                           neg_c_7 = as.numeric(efc$neg_c_7),
                           grp = efc$grp))
# fit 2nd model
fit2 <- glmer(hi_qol ~ sex + c12hour + neg_c_7 + (1|grp),
              data = mydf,
              family = binomial("logit"))

Summary fit1

Formula: Reaction.dicho ~ Days + (Days | Subject)
   Data: sleepstudy

     AIC      BIC   logLik deviance df.resid 
   158.7    174.7    -74.4    148.7      175 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-4.2406 -0.2726 -0.0198  0.2766  2.9705 

Random effects:
 Groups  Name        Variance Std.Dev. Corr 
 Subject (Intercept) 8.0287   2.8335        
         Days        0.2397   0.4896   -0.19
Number of obs: 180, groups:  Subject, 18

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -3.8159     1.1728  -3.254 0.001139 ** 
Days          0.8908     0.2347   3.796 0.000147 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
     (Intr)
Days -0.694

Summary fit2

Formula: hi_qol ~ sex + c12hour + neg_c_7 + (1 | grp)
   Data: mydf

     AIC      BIC   logLik deviance df.resid 
  1065.3   1089.2   -527.6   1055.3      881 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.7460 -0.8139 -0.2688  0.7706  6.6464 

Random effects:
 Groups Name        Variance Std.Dev.
 grp    (Intercept) 0.08676  0.2945  
Number of obs: 886, groups:  grp, 8

Fixed effects:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  3.179036   0.333940   9.520  < 2e-16 ***
sex2        -0.545282   0.178974  -3.047  0.00231 ** 
c12hour     -0.005148   0.001720  -2.992  0.00277 ** 
neg_c_7     -0.219586   0.024108  -9.109  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
        (Intr) sex2   c12hor
sex2    -0.410              
c12hour -0.057 -0.048       
neg_c_7 -0.765 -0.009 -0.116

Diagnostic plots

Two new functions are added to both sjp.lmer and sjp.glmer, hence they apply to linear and generalized linear mixed models, fitted with the lme4 package. The examples only refer to the sjp.glmer function.

Currently, there are two type options to plot diagnostic plots: type = "fe.cor" to plot a correlation matrix between fixed effects and type = "re.qq" to plot a qq-plot of random effects.

Correlation matrix of fixed effects

To plot a correlation matrix of the fixed effects, use type = "fe.cor".

# plot fixed effects correlation matrix
sjp.glmer(fit2, type = "fe.cor")

unnamed-chunk-11-1

qq-plot of random effects

Another diagnostic plot is the qq-plot for random effects. Use type = "re.qq" to plot random against standard quantiles. The dots should be plotted along the line.

# plot qq-plot of random effects
sjp.glmer(fit, type = "re.qq")

unnamed-chunk-13-1

Probability curves of odds ratios

These plotting functions have been implemented to easier interprete odds ratios, especially for continuous covariates, by plotting the probabilities of predictors.

Probabilities of fixed effects

With type = "fe.pc" (or type = "fe.prob"), probability plots for each covariate can be plotted. These probabilties are based on the fixed effects intercept. One plot per covariate is plotted.

The model fit2 has one binary and two continuous covariates:

# plot probability curve of fixed effects
sjp.glmer(fit2, type = "fe.pc")

unnamed-chunk-9-1

Probabilities of fixed effects depending on grouping level (random intercept)

With type = "ri.pc" (or type = "ri.prob"), probability plots for each covariate can be plotted, depending on the grouping level from the random intercept. Thus, for each covariate a plot for each grouping levels is plotted. Furthermore, with the show.se the standard error of probabilities can be shown. In this example, only the plot for one covariate is shown, not for all.

# plot probability curves for each covariate
# grouped by random intercepts
sjp.glmer(fit2,
          type = "ri.pc",
          show.se = TRUE)

unnamed-chunk-8-2

Instead of faceting plots, all grouping levels can be shown in one plot:

# plot probability curves for each covariate
# grouped by random intercepts
sjp.glmer(fit2,
          type = "ri.pc",
          facet.grid = FALSE)

unnamed-chunk-10-2

Outlook

These will be the new features for the next package update. For later updates, I’m also planning to plot interaction terms of (generalized) linear mixed models, similar to the existing function for visualizing interaction terms in linear models.

Visualizing (generalized) linear mixed effects models with ggplot #rstats #lme4

In the past week, colleagues of mine and me started using the lme4-package to compute multi level models. This inspired me doing two new functions for visualizing random effects (as retrieved by ranef()) and fixed effects (as retrieved by fixef()) of (generalized) linear mixed effect models.

The upcoming version of my sjPlot package will contain two new functions to plot fitted lmer and glmer models from the lme4 package: sjp.lmer and sjp.glmer (not that surprising function names). Since I’m new to mixed effects models, I would appreciate any suggestions on how to improve the functions, which results are important to report (plot) and so on. Furthermore, I’m not sure whether my approach of computing confident intervals for random effects is the best?

I have used following code to compute confident intervals for the estimates returned by the lme4::ranef() function (bases on this stackoverflow answer):

coev <- as.matrix(lme4::vcov.merMod(fit))
tmp <- as.data.frame(cbind(OR = exp(mydf.ef[,i]),
                     lower.CI = exp(mydf.ef[,i] - (1.96 * sqrt(diag(coev))[i])),
                     upper.CI = exp(mydf.ef[,i] + (1.96 * sqrt(diag(coev))[i]))))

The update to version 1.6 of sjPlot is still in development (feature-freeze, mostly fixes now), however, you can download the latest snapshot from GitHub (see also this post for further information). Now to some examples. First, an example model is fitted and the random effects (default) for each predictor are plotted as “forest plot”:

# fit model
library(lme4)
fit <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
# simple plot
sjp.lmer(fit)

lmer1

Sorting a predictor (i.e. estimates of a facet) is done by specifying the predictor’s name as sort parameter.

sjp.lmer(fit, sort = "Days")

lmer2

Each facet plot can also be plotted as single plot, when facet.grid is set to FALSE. In this case, it is possible to sort the estimates for each plots. See following example from the sjp.glmer function:

library(lme4)
# create binary response
sleepstudy$Reaction.dicho <- sju.dicho(sleepstudy$Reaction, 
                                       dichBy = "md")
# fit model
fit <- glmer(Reaction.dicho ~ Days + (Days | Subject),
             sleepstudy,
             family = binomial("logit"))
sjp.setTheme(theme = "forest")
sjp.glmer(fit, 
          facet.grid = FALSE, 
          sort = "sort.all")

glmer1 glmer2

Plotting the fixed effects is not much spectacular, because we only have one estimate beside intercept here.

sjp.glmer(fit, 
          type = "fe", 
          sort = TRUE)

glmer3

To summarize, you can plot random and fixed effects in the way as shown above. Are there any other or better plot options for visualizing mixed effects models?

Any suggestions are welcome…

Disclaimer: all misspellings belong to Safari’s autocorrect feature!

sjPlot 1.6 – major revisions, anyone for beta testing? #rstats

In the last couple of weeks I have rewritten some core parts of my sjPlot-package and also revised the package- and online documentation.

Most notably are the changes that affect theming and appearance of plots and figures. There’s a new function called sjp.setTheme which now sets theme-options for all sjp-functions, which means

  1. you only need to specify theme / appearance option once and no longer need to repeat these parameter for each sjp-function call you make
  2. due to this change, all sjp-functions have much less parameters, making the functions and documentation clearer

Furthermore, due to some problems with connecting / updating to the RPubs server, I decided to upload my online documentation for the package to my own site. You will now find the latest, comprehensive documentation and examples for various functions of the sjPlot package at www.strengejacke.de/sjPlot/. For instance, take a look at customizing plot appearance and see how the new theming feature of the package allows both easier customization of plots as well as better integration of theming packages like ggthemr or ggthemes.

Updating the sjPlot package to CRAN is planned soon, however, I kindly ask you to test the current development snapshot, which is hosted on GitHub. You can easily install the package from there using the devtools-package (devtools::install_github("devel", "sjPlot")). The current snapshot is (very) stable and I appreciate any feedbacks or bug reports (if possible, use the issue tracker from GitHub).

The current change log with all new function, changes and bug fixes can also be found on GitHub.

Visualize pre-post comparison of intervention #rstats

My sjPlot-package was just updated on CRAN, introducing a new function called sjp.emm.int to plot estimated marginal means (least-squares means) of linear models with interaction terms. Or: plotting adjusted means of an ANCOVA.

The idea to this function came up when we wanted to analyze the effect of an intervention (an educational programme on knowledge about mental disorders and associated stigma) between two groups: a “treatmeant” group (city) where a campaign on mental disorders was conducted and another city without this campaign. People from both cities were asked about their attitudes and knowledge about specific mental disorders at t0 before the campaign started in the one city. Some month later (t1), again people from both cities were asked the same questions. The intention was to see a) whether there were differences in knowledge and pro-social attidutes of people towards mental disorders and b) if the compaign successfully reduces stigma and increases knowledge.

To analyse these questions, we used an ANCOVA with knowledge and stigma score as dependent variables, “city” and “time” (t0 versus t1) as predictors and adjusted for covariates like age, sex, education etc. The estimated marginal means (or least-squares means) show you the differences of the dependent variable.

Here’s an example plot, quickly done with the sjp.emm.int function:
sjpemmint

Since the data is not publicly available, I’ve set an an documentation with reproducable examples (though those example do not fit very well…).

The latest development snapshot of my package is available on GitHub.

BTW: You may have noticed that this function is quite similar to the sjp.lm.int function for visually interpreting interaction terms in linear models…

sjPlot: New options for creating beautiful tables and documentation #rstats

A new update of my sjPlot package was just released on CRAN. This release focused on improving existing functions and bug fixes again. Especially the table output functions (see my previous blog posts on table output functions here and here) improved a lot. Tables now have more and better possibilities for style customization and knitr integration. A basic introduction into the new features is given in this document.

To make it easier to understand all features, I started to setup comprehensive documentations for all sjPlot functions on strengejacke.de.

sjPlot 1.3 available #rstats #sjPlot

I just submitted my package update (version 1.3) to CRAN. The download is already available (currently source, binaries follow). While the last two updates included new functions for table outputs (see here and here for details on these functions), the current update mostly provides small helper functions. The focus of this update was to improve existing functions and make their handling easier and more comfortable.

Automatic label detection

One major feature is that many functions now automatically detect variables and value labels, if possible. For instance, if you have imported an SPSS dataset (e.g. with the function sji.SPSS), value labels are automatically attached to all variables of the data frame. With the autoAttachVarLabels parameter set to TRUE, even variable labels will be attached to the data frame after importing the SPSS data. These labels are automatically detected by most functions of the package now. But this does not only apply to importet SPSS-data. If you have factors with specified factor levels, these will also automatically be used as value labels. Furthermore, you can manually attach value and variable labels using the new function sji.setVariableLabels and sji.setValueLabels.

But what are the exactly the benefits of this new feature? Let me give an example. To plot a proportional table with axis and legend labels, prior to sjPlot 1.3 you needed following code:

data(efc)
efc.val <- sji.getValueLabels(efc)
efc.var <- sji.getVariableLabels(efc)
sjp.xtab(efc$e16sex,
         efc$e42dep,
         axisLabels.x=efc.val[['e16sex']],
         legendTitle=efc.var['e42dep'],
         legendLabels=efc.val[['e42dep']])

Since version 1.3, you only need to write:

data(efc)
sjp.xtab(efc$e16sex, efc$e42dep)

Reliability check for index scores

One new table output function included in this update is sjt.itemanalysis, which helps performing an item analysis on a scale or data frame if you want to develop index scores.

Let’s say you have several items and you want to compute a principal component analysis in order to identify different components that can be composed to an index score. In such cases, you might want to perform reliability and item discrimination tests. This is shown in the following example, which performs a PCA on the COPE-Index-scale, followed by a reliability and item analysis of each extracted “score”:

data(efc)
# recveive first item of COPE-index scale
start <- which(colnames(efc)=="c82cop1")
# recveive last item of COPE-index scale
end <- which(colnames(efc)=="c90cop9")
# create data frame of cope-index-items
df <- as.data.frame(efc[,c(start:end)])
colnames(df) <- sji.getVariableLabels(efc)[c(start:end)]
# compute PCA on cope index and return
# "group classifications" of factors
factor.groups <- sjt.pca(df, no.output=TRUE)$factor.index
# perform item analysis
sjt.itemanalysis(df, factor.groups)

The result is following table, where two components have been extracted via the PCA, and the variables belonging each component are treated as one “index score” (note that you don’t need to define groups, you can also treat a data frame as one single “index”):
relia

The output of the computed PCA was suppressed by no.output=TRUE. To better understand the above figure, take a look at the PCA results, where two components have been extracted:
pca_item_reli

Beside that, many functions – especially the table output functions – got new parameters to change the appearance of the output (amount of digits, including NA’s, additional information in tables etc.). Refer to the package news to get a complete overview of what was changed since the last version.

The latest developer build can be found on github.

Developer snapshots of #sjPlot-package now on #Github #rstats

Finally, I managed to setup a GitHub repository. From now on, the latest developer snapshot of my sjPlot-package will be published right here: https://github.com/sjPlot/devel.

Please post issues there, download the latest developer build for testing purposes or help developing the wiki-page with examples for package usage etc.

Btw, if somebody knows, why I can’t get GitHub running with RStudio, let me know… I always get this issue, which was already reported by other users. Currently, I’m using the GitHub.app to commit changes.

Beautiful table outputs in R, part 2 #rstats #sjPlot

First of all, I’d like to thank my readers for the lots of feedback on my last post on beautiful outputs in R. I tried to consider all suggestions, updated the existing table-output-functions and added some new ones, which will be described in this post. The updated package is already available on CRAN.

This posting is divided in two major parts:

  1. the new functions are described, and
  2. the new features of all table-output-functions are introduced (including knitr-integration and office-import)

Read on …

Simply creating various scatter plots with ggplot #rstats

Inspired by these two postings, I thought about including a function in my package for simply creating scatter plots.

In my package, there’s a function called sjp.scatter for creating scatter plots. To reproduce these examples, first load the package and then attach the sample data set:

data(efc)

The simplest function call is by just providing two variables, one for the x- and one for the y-axis:

sjp.scatter(efc$c160age, efc$e17age)

which plots following graph:
sct_01

If you have continuous variables with a larger scale, you shouldn’t have problems with overplotting or overlaying dots. However, this problem usually occurs, if you have variables with just a few categories (factor levels). The function automatically estimates the amount of overlaying dots and then automatically jitters them, like in following example, which also includes a marginal rug-plot:

sjp.scatter(efc$e16sex,efc$neg_c_7, efc$c172code, showRug=TRUE)

sct_02

The same plot, when auto-jittering is turned off, would look like this:

sjp.scatter(efc$e16sex,efc$neg_c_7, efc$c172code,
            showRug=TRUE, autojitter=FALSE)

sct_03

You can also add a grouping variable. The scatter plot is then “divided” into as many groups as indicated by the grouping variable. In the next example, two variables (elder’s and carer’s age) are grouped by different dependency levels of the elderly. Additionally, a fitted line for each group is plotted:

sjp.scatter(efc$c160age,efc$e17age, efc$e42dep, title="Scatter Plot",
            legendTitle=sji.getVariableLabels(efc)['e42dep'],
            legendLabels=sji.getValueLabels(efc)[['e42dep']],
            axisTitle.x=sji.getVariableLabels(efc)['c160age'],
            axisTitle.y=sji.getVariableLabels(efc)['e17age'],
            showGroupFitLine=TRUE)

sct_04

If the groups are difficult to distinguish in a single plot area, the graph can be faceted by groups. This is shown in the last example, where the same scatter plot as above is plotted with facets for each group:

sjp.scatter(efc$c160age,efc$e17age, efc$e42dep, title="Scatter Plot",
            legendTitle=sji.getVariableLabels(efc)['e42dep'],
            legendLabels=sji.getValueLabels(efc)[['e42dep']],
            axisTitle.x=sji.getVariableLabels(efc)['c160age'],
            axisTitle.y=sji.getVariableLabels(efc)['e17age'],
            showGroupFitLine=TRUE, useFacetGrid=TRUE, showSE=TRUE)

sct_05

Find a complete overview of the various function options in the package-help or at inside-r.

No need for SPSS – beautiful output in R #rstats

Note: There’s a second part of this series here.

About one year ago, I seriously started migrating from SPSS to R. Though I’m still using SPSS (because I have to in some situations), I’m quite comfortable and happy with R now and learnt a lot in the past months. But since SPSS is still very wide spread in social sciences, I get asked every now and then, whether I really needed to learn R, because SPSS meets all my needs…

Well, learning R had at least two major benefits for me: 1.) I could improve my statistical knowledge a lot, simply by using formulas, asking why certain R commands do not automatically give the same results like SPSS, reading R resources and papers etc. and 2.) the possibilities of data visualization are way better in R than in SPSS (though SPSS can do well as well…). Of course, there are even many more reasons to use R.

Still, one thing I often miss in R is a beautiful output of simple statistics or maybe even advanced statistics. Not always as plot or graph, but neither as “cryptic” console output. I’d like to have a simple table view, just like the SPSS output window (though the SPSS output is not “beautiful”). That’s why I started writing functions that put the results of certain statistics in HTML tables. These tables can be saved to disk or, even better for quick inspection, shown in a web browser or viewer pane (like in RStudio viewer pane).

All of the following functions are available in my sjPlot-package on CRAN.

Read on …