A new update of my sjstats-package just arrived at CRAN. This blog post demontrates those functions of the sjstats-package that deal especially with Bayesian models. The update contains some new and some revised functions to compute summary statistics of Bayesian models, which are now described in more detail.
Before we start, we fit some models, including a mediation-object from the mediation-package, which we use for comparison with brms. The functions work with
Weiterlesen „R functions for Bayesian Model Statistics and Summaries #rstats #stan #brms“
I’m pleased to announce an update of my sjstats-package. New features are specifically implemented for the Anova and Bayesian statistic and summary functions. Here’s a short overview of what’s new…
Weiterlesen „Anova-Freak and Bayesian Hipster #rstats“
I’m pleased to announce an update for the sjmisc-package, which was just released on CRAN. Here I want to point out two important changes in the package.
New default option for recoding and transformation functions
First, a small change in the code with major impact on the workflow, as it affects argument defaults and is likely to break your existing code – if you’re using sjmisc: The
append-argument in recode and transformation functions like
row_means() now defaults to
The reason behind this change is that, in my experience and workflow, when transforming or recoding variables, I typically want to add these new variables to an existing data frame by default. Especially in a pipe-workflow, when I start my scripts with importing and basic tidying of my data, I almost always want to append the recoded variables to my existing data, e.g.:
# Example with following steps:
# 1. loading labelled data set
# 2. dropping unused labels
# 3. converting numeric into categorical, using labels as levels
# 4. center some variables
# 5. recode some other variables
center(var7, var9) %>%
rec(var11, rec = "2=0;1=1;else=copy")
Weiterlesen „Data transformation in #tidyverse style: package sjmisc updated #rstats“
Yesterday, at the last meeting of the Hamburg R User Group in this year, I had the pleasure to give a talk about Bayesian modelling and choosing (informative) priors in the rstanarm-package.
You can download the slides of my talk here.
Thanks to the Stan team and Tristan for proof reading my slides prior (<- hoho) to the talk. Disclaimer: Still, I'm fully responsible for the content of the slides, and I'm to blame for any false statements or errors in the code…
I’m pleased to announce the latest update from my sjPlot-package on CRAN. Beside some bug fixes and minor new features, the major update is a new function,
plot_model(), which is both an enhancement and replacement of
sjp.int(). The latter functions will become deprecated in the next updates and removed somewhen in the future.
plot_model() is a „generic“ plot function that accepts many model-objects, like
lmerMod etc. It offers various plotting types, like estimates/coefficient plots (aka forest or dot-whisker plots), marginal effect plots and plotting interaction terms, and sort of diagnostic plots.
In this blog post, I want to describe how to plot estimates as forest plots.
Weiterlesen „„One function to rule them all“ – visualization of regression models in #rstats w/ #sjPlot“
I started writing my first package as collection of various functions that I needed for (almost) daily work. Meanwhile, packages were growing and bit by bit I sourced out functions to put them into new packages. Although this means more work for CRAN members when they have more packages to manage on their network, from a user-perspective it is much better if packages have a clear focus and a well defined set of functions. That’s why I now released a new package on CRAN, sjlabelled, which contains all functions that deal with labelled data. These functions use to live in the sjmisc-package, where they now are deprecated and will be removed in a future update.
My aim is not only to provide packages with a clear focus, but also with a consistent design and philosophy, making it easier and more intuitive to use (see also here) – I prefer to follow the so called „tidyverse“-approach here. It is still work in progress, but so far I think I’m on a good way…
So, what are the packages I use for (almost) daily work?
- sjlabelled – reading, writing and working with labelled data (especially since I collaborate a lot with people who use Stata or SPSS)
- sjmisc – data and variable transformation utilities (the complement to dplyr and tidyr, when it comes down from data frames to variables within the data wrangling process)
- sjstats – out-of-the-box statistics that is not already provided by base R or packages
- sjPlot – to quickly generate tables and plot
- ggeffects – to visualize regression models
The next step is creating cheat sheets for my packages. I think if you can map the scope and idea of your package (functions) on a cheat sheet, its focus is probably well defined.
Btw, if you also use some of the above packages more or less regularly, you can install the „strengejacke“-package to load them all in one step. This package is not on CRAN, because its only purpose is to load other packages.
Disclaimer: Of course I use other packages everyday as well – this posting is just focussing on my packages that I created because I frequently needed these kind of functions.
Aim of the ggeffects-package
The aim of the ggeffects-package is similar to the broom-package: transforming “untidy” input into a tidy data frame, especially for further use with ggplot. However, ggeffects does not return model-summaries; rather, this package computes marginal effects at the mean or average marginal effects from statistical models and returns the result as tidy data frame (as tibbles, to be more precisely).
Since the focus lies on plotting the data (the marginal effects), at least one model term needs to be specified for which the effects are computed. It is also possible to compute marginal effects for model terms, grouped by the levels of another model’s predictor. The package also allows plotting marginal effects for two- or three-way-interactions, or for specific values of a model term only. Examples are shown below.
Weiterlesen „ggeffects: Create Tidy Data Frames of Marginal Effects for ‚ggplot‘ from Model Outputs #rstats“
The survey-package from Thomas Lumley is a great toolkit when analyzing complex samples. It provides
svyglm(), to fit generalised linear models to data from a complex survey design.
svyglm() covers all families that are also provided by R’s
glm() – however, the survey-package has no function to fit negative binomial models, which might be useful for overdispersed count models. Yet, the package provides a generic
svymle() to fit user-specified likelihood estimations. In his book, Appendix E, Thomas Lumley describes how to write your own likelihood-function, passed to
svymle(), to fit negative binomial models for complex samples. So I wrote a small „wrapper“ and implemented a function
svyglm.nb() in my sjstats-package.
# This example reproduces the results from
# Lumley 2010, figure E.7 (Appendix E, p256)
# create survey design
des <- svydesign(
id = ~ SDMVPSU,
strat = ~ SDMVSTRA,
weights = ~ WTINT2YR,
nest = TRUE,
data = nhanes_sample
# fit negative binomial regression
fit <- svyglm.nb(total ~ factor(RIAGENDR) * (log(age) + factor(RIDRETH1)), des)
# print coefficients and standard errors
round(cbind(coef(fit), survey::SE(fit)), 2)
#> [,1] [,2]
#> theta.(Intercept) 0.81 0.05
#> eta.(Intercept) 2.29 0.16
#> eta.factor(RIAGENDR)2 -0.80 0.18
#> eta.log(age) 1.07 0.23
#> eta.factor(RIDRETH1)2 0.08 0.15
#> eta.factor(RIDRETH1)3 0.09 0.18
#> eta.factor(RIDRETH1)4 0.82 0.30
#> eta.factor(RIDRETH1)5 0.06 0.38
#> eta.factor(RIAGENDR)2:log(age) -1.22 0.27
#> eta.factor(RIAGENDR)2:factor(RIDRETH1)2 -0.18 0.26
#> eta.factor(RIAGENDR)2:factor(RIDRETH1)3 0.60 0.19
#> eta.factor(RIAGENDR)2:factor(RIDRETH1)4 0.06 0.37
#> eta.factor(RIAGENDR)2:factor(RIDRETH1)5 0.38 0.44
The functions returns an object of class
svymle, so all methods provided by the survey-package for this class work – it’s just that there are only a few, and common methods like
predict() are currently not implemented. Maybe, hopefully, future updates of the survey-package will include such features.
This is another post of my series about how my packages integrate into a pipe-friendly workflow. The post focusses on my sjmisc-package, which was just updated on CRAN, and highlights some of the new features. Examples are based on data from the European Social Survey, which are freely available.
Please note: The statistical analyses at the end of this post mainly serve the purpose of demonstrating some features of the sjmisc-package that target „real life“ problems! For clarity reasons, I ran a quick-and-dirty model, which is not of high statistical quality or standard!
Weiterlesen „Exploring the European Social Survey (ESS) – pipe-friendly workflow with sjmisc, part 2 #rstats #tidyverse“
The current version 1.8.1 of my sjPlot package has two new functions to easily summarize mixed effects models as HTML-table:
sjt.glmer. Both are very similar, so I focus on showing how to use
# load required packages
library(sjPlot) # table functions
library(sjmisc) # sample data
library(lme4) # fitting models
Linear mixed models summaries as HTML table
sjt.lmer function prints summaries of linear mixed models (fitted with the
lmer function of the lme4-package) as nicely formatted html-tables. First, some sample models are fitted:
# load sample data
# prepare grouping variables
efc$grp = as.factor(efc$e15relat)
levels(x = efc$grp) <- get_val_labels(efc$e15relat)
efc$care.level <- as.factor(rec(efc$n4pstu, "0=0;1=1;2=2;3:4=4"))
levels(x = efc$care.level) <- c("none", "I", "II", "III")
# data frame for fitted model
mydf <- data.frame(neg_c_7 = as.numeric(efc$neg_c_7),
sex = as.factor(efc$c161sex),
c12hour = as.numeric(efc$c12hour),
barthel = as.numeric(efc$barthtot),
education = as.factor(efc$c172code),
grp = efc$grp,
carelevel = efc$care.level)
# fit sample models
fit1 <- lmer(neg_c_7 ~ sex + c12hour + barthel + (1|grp), data = mydf)
fit2 <- lmer(neg_c_7 ~ sex + c12hour + education + barthel + (1|grp), data = mydf)
fit3 <- lmer(neg_c_7 ~ sex + c12hour + education + barthel +
(1|carelevel), data = mydf)
The simplest way of producing the table output is by passing the fitted models as parameter. By default, estimates (B), confidence intervals (CI) and p-values (p) are reported. The models are named Model 1 and Model 2. The resulting table is divided into three parts:
- Fixed parts – the model’s fixed effects coefficients, including confidence intervals and p-values.
- Random parts – the model’s group count (amount of random intercepts) as well as the Intra-Class-Correlation-Coefficient ICC.
- Summary – Observations, AIC etc.
Weiterlesen „Beautiful table-outputs: Summarizing mixed effects models #rstats“