Schlagwort: R

Data visualization in social sciences – what’s new in the sjPlot-package? #rstats

My sjPlot package just reached version 2.0 and got many updates during the couple of last months. The focus was less on adding new functions; rather, I improved existing functions by adding new smaller and bigger features to make working with the package easier and more reliable. In this blog post, I will report some of the new features.

Consistent name style of arguments

Most notably, I tried to give all package functions a consistent naming style or pattern for arguments. In previous versions, mixing different name-styles was sometimes very confusing. For example, some functions used showNA, others na.rm or show.na. Or some functions used hideLegend, some showLegend and others again show.legend.

Now, all argument names are 1) lower case, 2) dot separated for longer words and are 3) grouped according to their function (i.e., if you open the docs for ?sjt.lm, you’ll find all show. arguments, then all string. and finally all digits. arguments). I know that this means that you most likely have to completely re-write your code that uses sjPlot-function calls, but I think, in the long run, this makes working with the sjPlot package easier

Support for different model families and link functions

In previous package versions, functions related to generalized linear models (like sjp.glm or sjp.glmer) were hard coded for binomial model families for most plot types. Some effect or prediction plots only worked for logistic regression, because predictions were based on plogis. Also, automatic entitling of plots always included „probability“, even for count models.

In the past package updates and especially and the last major update, prediction or effect plot are now based on the link-inverse function of the models, so all common model families and link functions should work with sjPlot now.

Predictions and effect plots

In some cases, it is easier to interprete the predicted probabilities, incidents rates or marginal effects instead of the related estimate numbers (odds ratios, incident rate ratios, beta). For linear models (sjp.lm), linear mixed models (sjp.lmer), generalized linear models (sjp.glm) and generalized linear mixed models (sjp.glmer), there are three different plot types to plot predicted values or marginal effects:

  1. type = "slope" (or type = "fe.slope" and type = "ri.slope" for mixed models) to plot unadjusted predicted values, i.e. the relation between model terms and response.
  2. type = "eff" to plot marginal effects, adjusted for all predictors.
  3. type = "pred" (and type = "pred.fe" for mixed models) to plot predicted values against reponse, for particular model terms.

The following examples are taken from the vignette of the sjp.glm-function.

1. Predicted values, unadjusted

The predicted values from this plot type are based on the intercept’s estimate and each specific term’s estimate. All other co-variates are set to zero (i.e. ignored), which corresponds to family(fit)$linkinv(eta = b0 + bi * xi) (where xi is the estimate).

Predicted values, unadjusted

A probability curve of all predictors is plotted, which indicates the probability of the event (indicated by the response) occuring for each value of the predictor (not adjusted for remaining co-variates). In the above example, the first panel in the plot would be interpreted as: with increasing Barthel-Index (which means, better functional / physical status), the probability that caring for a dependent person is negatively perceived, decreases (in short: the less dependent a person I care for is, the less negative is the impact of care).

2. Effect plots

For marginal effects (predicted marginal probabilities resp. predicted marginal incident rates), all remaining co-variates are set to the mean, so this plot type adjusts for co-variates. Obtained results are based on the effects-package.

Marginal effects, adjusted

The effect plots can now also be non-faceted, and for selected model terms only (using the facit.grid and vars arguments).

3. Predicting values

The plot-type for predicting values did not produce any useful results in former package versions, because it just called the predict function without relationship to any predictor, or meaningful data. Now, this plot-type was completely revised. With type = "pred" (formerly, "y.pc"), you can plot predicted values for the response, related to specific model predictors. The predicted values of the response are computed, which corresponds to predict(fit, type = "response"). This plot type requires the vars argument to select specific terms that should be used for the x-axis and – optional – as grouping factor. Hence, vars must be a character vector with the names of one or two model predictors.

Predicting values

Predicting values

Table functions for mixed models

The table functions were also revised, especially for mixed models. You now have more details in the random parts section of the table, which now also shows the variance components of the random parts, or (pseudo-)r2-values.

The tables are created as HTML-page and displayed in your IDE’s viewer or your web browser. You can see many examples at the package vignettes-page. For the following example, I have taken a screenshot, because else the blog’s style sheet would break the table layout. Anyway, this is an example of a quickly produced table:

table

Closing remarks

There have been a lot of improvements made in the sjPlot package during the past update(s). Above you see example of the most obvious user-visible changes. But there were also lots of other smaller and bigger improvements. E.g. plotting functions with different plot types, like sjp.glm, have many arguments; most of them only applied to specific plot types, while they were ignored by other plot types. Now, all plot types support more or mostly all arguments, and the documentation should be clearer about what the functions and their arguments do.

I hope you’ll enjoy the sjPlot-package. Feel free to submit issues or suggestions to the dedicated GitHub-page.

Beautiful table-outputs: Summarizing mixed effects models #rstats

The current version 1.8.1 of my sjPlot package has two new functions to easily summarize mixed effects models as HTML-table: sjt.lmer and sjt.glmer. Both are very similar, so I focus on showing how to use sjt.lmer here.

# load required packages
library(sjPlot) # table functions
library(sjmisc) # sample data
library(lme4) # fitting models

Linear mixed models summaries as HTML table

The sjt.lmer function prints summaries of linear mixed models (fitted with the lmer function of the lme4-package) as nicely formatted html-tables. First, some sample models are fitted:

# load sample data
data(efc)
# prepare grouping variables
efc$grp = as.factor(efc$e15relat)
levels(x = efc$grp) <- get_val_labels(efc$e15relat)
efc$care.level <- as.factor(rec(efc$n4pstu, "0=0;1=1;2=2;3:4=4"))
levels(x = efc$care.level) <- c("none", "I", "II", "III")

# data frame for fitted model
mydf <- data.frame(neg_c_7 = as.numeric(efc$neg_c_7),
                   sex = as.factor(efc$c161sex),
                   c12hour = as.numeric(efc$c12hour),
                   barthel = as.numeric(efc$barthtot),
                   education = as.factor(efc$c172code),
                   grp = efc$grp,
                   carelevel = efc$care.level)

# fit sample models
fit1 <- lmer(neg_c_7 ~ sex + c12hour + barthel + (1|grp), data = mydf)
fit2 <- lmer(neg_c_7 ~ sex + c12hour + education + barthel + (1|grp), data = mydf)
fit3 <- lmer(neg_c_7 ~ sex + c12hour + education + barthel +
              (1|grp) +
              (1|carelevel), data = mydf)

The simplest way of producing the table output is by passing the fitted models as parameter. By default, estimates (B), confidence intervals (CI) and p-values (p) are reported. The models are named Model 1 and Model 2. The resulting table is divided into three parts:

  • Fixed parts – the model’s fixed effects coefficients, including confidence intervals and p-values.
  • Random parts – the model’s group count (amount of random intercepts) as well as the Intra-Class-Correlation-Coefficient ICC.
  • Summary – Observations, AIC etc.

„Beautiful table-outputs: Summarizing mixed effects models #rstats“ weiterlesen

sjmisc – package for working with (labelled) data #rstats

The sjmisc-package

My last posting was about reading and writing data between R and other statistical packages like SPSS, Stata or SAS. After that, I decided to bundle all functions that are not directly related to plotting or printing tables, into a new package called sjmisc.

Basically, this package covers three domains of functionality:

  • reading and writing data between other statistical packages (like SPSS) and R, based on the haven and foreign packages; hence, sjmisc also includes function to work with labelled data.
  • frequently used statistical tests, or at least convenient wrappers for such test functions
  • frequently applied recoding and variable conversion tasks

In this posting, I want to give a quick and short introduction into the labeling features.

„sjmisc – package for working with (labelled) data #rstats“ weiterlesen

Reading from and writing to SPSS, SAS and STATA with R #rstats #sjPlot

On CRAN now

My sjPlot-package was updated on CRAN (binaries will be available soon, I guess). This update contains, besides many small improvements and fixes, two major features:

  1. First, new features to print table summaries of linear models and generalized linear models (for sjt.glm, the same new features were added as to sjt.lm – however, the manual page is not finished yet). I have introduced these features in a former posting.
  2. Second, functions for reading data from and writing to other statistical packages like SPSS, SAS or STATA have been revamped or new features have been added. Furthermore, there are improved getters and setters to extract and set variable and value labels. A short introduction is available online.

„Reading from and writing to SPSS, SAS and STATA with R #rstats #sjPlot“ weiterlesen

Beautiful tables for linear model summaries #rstats

Beautiful HTML tables of linear models

In this blog post I’d like to show some (old and) new features of the sjt.lm function from my sjPlot-package. These functions are currently only implemented in the development snapshot on GitHub. A package update is planned to be submitted soon to CRAN.

There are two new major features I added to this function: Comparing models with different predictors (e.g. stepwise regression) and automatic grouping of categorical predictors. There are examples below that demonstrate these features.

The sjt.lm function prints results and summaries of linear models as HTML-table. These tables can be viewed in the RStudio Viewer pane, web browser or easily exported to office applications. See also my former posts on the table printing functions of my package here and here.

Please note: The following tables may look a bit cluttered – this is because I just pasted the HTML-code created by knitr into this blog post, so style sheets may interfere. The original online-manual for this function can be found here.

„Beautiful tables for linear model summaries #rstats“ weiterlesen

sjPlot package and related online manuals updated #rstats # ggplot

My sjPlot package for data visualization has just been updated on CRAN. I’ve added some features to existing function, which I want to introduce here.

Plotting linear models

So far, plotting model assumptions of linear models or plotting slopes for each estimate of linear models were spread over several functions. Now, these plot types have been integrated into the sjp.lm function, where you can select the plot type with the type parameter. Furthermore, plotting standardized coefficients now also plot the related confidence intervals.

Detailed examples can be found here:
www.strengejacke.de/sjPlot/sjp.lm

Plotting generalized linear models

Beside odds ratios, you now can also plot the predicted probabilities of the outcome for each predictor of generalized linear models. In case you have continuous variables, these kind of plots may be more intuitive than an odds ratio value.

Detailed examples can be found here:
www.strengejacke.de/sjPlot/sjp.glm

Plotting (generalized) linear mixed effects models

The plotting function for creating plots of (generalized) linear mixed effects models (sjp.lmer and sjp.glmer) also got new plot types over the course of the last weeks.

For sjp.lmer, we have

  • re (default) for estimates of random effects
  • fe for estimates of fixed effects
  • fe.std for standardized estimates of fixed effects
  • fe.cor for correlation matrix of fixed effects
  • re.qq for a QQ-plot of random effects (random effects quantiles against standard normal quantiles)
  • fe.ri for fixed effects slopes depending on the random intercept.

and for sjp.glmer, we have

  • re (default) for odds ratios of random effects
  • fe for odds ratios of fixed effects
  • fe.cor for correlation matrix of fixed effects
  • re.qq for a QQ-plot of random effects (random effects quantiles against standard normal quantiles)
  • fe.pc or fe.prob to plot probability curves (predicted probabilities) of all fixed effects coefficients. Use facet.grid to decide whether to plot each coefficient as separate plot or as integrated faceted plot.
  • ri.pc or ri.prob to plot probability curves (predicted probabilities) of random intercept variances for all fixed effects coefficients. Use facet.grid to decide whether to plot each coefficient as separate plot or as integrated faceted plot.

Detailed examples can be found here:
www.strengejacke.de/sjPlot/sjp.lmer and www.strengejacke.de/sjPlot/sjp.glmer

Plotting interaction terms of (generalized) linear (mixed effects) models

Another function, where new features were added, is sjp.int (formerly known as sjp.lm.int). This function is now kind of generic and can plot interactions of

  • linar models (lm)
  • generalized linar models (glm)
  • linar mixed effects models (lme4::lmer)
  • generalized linar mixed effects models (lme4::glmer)

For linear models (both normal and mixed effects), slopes of interaction terms are plotted. For generalized linear models, the predicted probabilities of the outcome towards the interaction terms is plotted.

Detailed examples can be found here:
www.strengejacke.de/sjPlot/sjp.int

Plotting Likert scales

Finally, a comprehensive documentation for the sjp.likert function is finsihed, which can be found here:
www.strengejacke.de/sjPlot/sjp.likert

Visualizing (generalized) linear mixed effects models, part 2 #rstats #lme4

In the first part on visualizing (generalized) linear mixed effects models, I showed examples of the new functions in the sjPlot package to visualize fixed and random effects (estimates and odds ratios) of (g)lmer results. Meanwhile, I added further features to the functions, which I like to introduce here. This posting is based on the online manual of the sjPlot package.

In this posting, I’d like to give examples for diagnostic and probability plots of odds ratios. The latter examples, of course, only refer to the sjp.glmer function (generalized mixed models). To reproduce these examples, you need the version 1.59 (or higher) of the package, which can be found at GitHub. A submission to CRAN is planned for the next days…

„Visualizing (generalized) linear mixed effects models, part 2 #rstats #lme4“ weiterlesen

Visualizing (generalized) linear mixed effects models with ggplot #rstats #lme4

In the past week, colleagues of mine and me started using the lme4-package to compute multi level models. This inspired me doing two new functions for visualizing random effects (as retrieved by ranef()) and fixed effects (as retrieved by fixef()) of (generalized) linear mixed effect models.

The upcoming version of my sjPlot package will contain two new functions to plot fitted lmer and glmer models from the lme4 package: sjp.lmer and sjp.glmer (not that surprising function names). Since I’m new to mixed effects models, I would appreciate any suggestions on how to improve the functions, which results are important to report (plot) and so on. Furthermore, I’m not sure whether my approach of computing confident intervals for random effects is the best?

„Visualizing (generalized) linear mixed effects models with ggplot #rstats #lme4“ weiterlesen

sjPlot 1.6 – major revisions, anyone for beta testing? #rstats

In the last couple of weeks I have rewritten some core parts of my sjPlot-package and also revised the package- and online documentation.

Most notably are the changes that affect theming and appearance of plots and figures. There’s a new function called sjp.setTheme which now sets theme-options for all sjp-functions, which means

  1. you only need to specify theme / appearance option once and no longer need to repeat these parameter for each sjp-function call you make
  2. due to this change, all sjp-functions have much less parameters, making the functions and documentation clearer

Furthermore, due to some problems with connecting / updating to the RPubs server, I decided to upload my online documentation for the package to my own site. You will now find the latest, comprehensive documentation and examples for various functions of the sjPlot package at www.strengejacke.de/sjPlot/. For instance, take a look at customizing plot appearance and see how the new theming feature of the package allows both easier customization of plots as well as better integration of theming packages like ggthemr or ggthemes.

Updating the sjPlot package to CRAN is planned soon, however, I kindly ask you to test the current development snapshot, which is hosted on GitHub. You can easily install the package from there using the devtools-package (devtools::install_github("devel", "sjPlot")). The current snapshot is (very) stable and I appreciate any feedbacks or bug reports (if possible, use the issue tracker from GitHub).

The current change log with all new function, changes and bug fixes can also be found on GitHub.