Tag: Statistics

Beautiful table-outputs: Summarizing mixed effects models #rstats

The current version 1.8.1 of my sjPlot package has two new functions to easily summarize mixed effects models as HTML tables: sjt.lmer and sjt.glmer. Both are very similar, so I focus on showing how to use sjt.lmer here.

# load required packages
library(sjPlot) # table functions
library(sjmisc) # sample data
library(lme4) # fitting models

Linear mixed models summaries as HTML table

The sjt.lmer function prints summaries of linear mixed models (fitted with the lmer function of the lme4-package) as nicely formatted html-tables. First, some sample models are fitted:

# load sample data
data(efc)
# prepare grouping variables
efc$grp <- as.factor(efc$e15relat)
levels(x = efc$grp) <- get_val_labels(efc$e15relat)
efc$care.level <- as.factor(rec(efc$n4pstu, "0=0;1=1;2=2;3:4=4"))
levels(x = efc$care.level) <- c("none", "I", "II", "III")

# data frame for fitted model
mydf <- data.frame(neg_c_7 = as.numeric(efc$neg_c_7),
                   sex = as.factor(efc$c161sex),
                   c12hour = as.numeric(efc$c12hour),
                   barthel = as.numeric(efc$barthtot),
                   education = as.factor(efc$c172code),
                   grp = efc$grp,
                   carelevel = efc$care.level)

# fit sample models
fit1 <- lmer(neg_c_7 ~ sex + c12hour + barthel + (1|grp), data = mydf)
fit2 <- lmer(neg_c_7 ~ sex + c12hour + education + barthel + (1|grp), data = mydf)
fit3 <- lmer(neg_c_7 ~ sex + c12hour + education + barthel +
              (1|grp) +
              (1|carelevel), data = mydf)

The simplest way of producing the table output is to pass the fitted models as arguments. By default, estimates (B), confidence intervals (CI) and p-values (p) are reported. The models are named Model 1 and Model 2. The resulting table is divided into three parts:

  • Fixed parts – the model’s fixed effects coefficients, including confidence intervals and p-values.
  • Random parts – the model’s group count (number of random intercepts) as well as the intraclass correlation coefficient (ICC).
  • Summary – Observations, AIC etc.
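The ICC listed under the random parts can be checked by hand. The following is a minimal sketch, not the code sjt.lmer uses internally: it assumes a random-intercept model and uses the sleepstudy data shipped with lme4 instead of the efc data, so it runs without further preparation.

```r
library(lme4)

# Assumption: sleepstudy (from lme4) stands in for the efc sample data
fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Variance components as a data frame: one row per random effect plus residual
vc <- as.data.frame(VarCorr(fit))
tau00  <- vc$vcov[vc$grp == "Subject"]   # between-group (random intercept) variance
sigma2 <- vc$vcov[vc$grp == "Residual"]  # within-group (residual) variance

# ICC: share of the total variance that is due to the grouping factor
icc <- tau00 / (tau00 + sigma2)
icc
```

For models with more than one random intercept, such as fit3 above, each grouping factor would get its own variance in the numerator, while the denominator would be the sum of all variance components.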

Continue reading „Beautiful table-outputs: Summarizing mixed effects models #rstats“

sjmisc – package for working with (labelled) data #rstats

The sjmisc-package

My last posting was about reading and writing data between R and other statistical packages like SPSS, Stata or SAS. After that, I decided to bundle all functions that are not directly related to plotting or printing tables, into a new package called sjmisc.

Basically, this package covers three domains of functionality:

  • reading and writing data between other statistical packages (like SPSS) and R, based on the haven and foreign packages; hence, sjmisc also includes functions to work with labelled data.
  • frequently used statistical tests, or at least convenient wrappers for such test functions
  • frequently applied recoding and variable conversion tasks

In this posting, I want to give a quick and short introduction into the labeling features.

Continue reading „sjmisc – package for working with (labelled) data #rstats“

Visualize pre-post comparison of intervention #rstats

My sjPlot-package was just updated on CRAN, introducing a new function to plot estimated marginal means (least-squares means) of linear models with interaction terms. Or: plotting adjusted means of an ANCOVA.

The idea for this function came up when we wanted to analyze the effect of an intervention (an educational programme on knowledge about mental disorders and associated stigma) between two groups: a „treatment“ group (city) where a campaign on mental disorders was conducted and another city without this campaign. People from both cities were asked about their attitudes and knowledge about specific mental disorders at t0, before the campaign started in the one city. Some months later (t1), people from both cities were again asked the same questions. The intention was to see a) whether there were differences in knowledge and pro-social attitudes of people towards mental disorders and b) whether the campaign successfully reduced stigma and increased knowledge.

To analyse these questions, we used an ANCOVA with knowledge and stigma score as dependent variables, „city“ and „time“ (t0 versus t1) as predictors and adjusted for covariates like age, sex, education etc. The estimated marginal means (or least-squares means) show you the differences of the dependent variable.
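The idea can be sketched in base R. This is an illustration under stated assumptions, not the sjPlot function itself: the data are simulated, the variable names (city, time, age, knowledge) are hypothetical, and age is the only covariate. The adjusted means are the model predictions for each city-by-time cell with the covariate held at its sample mean.

```r
set.seed(1)
n <- 200

# Hypothetical, simulated data: two cities, two time points, one covariate
d <- data.frame(
  city = factor(rep(c("control", "campaign"), each = n / 2)),
  time = factor(rep(c("t0", "t1"), times = n / 2)),
  age  = rnorm(n, mean = 45, sd = 12)
)
# Simulated effect: knowledge rises only in the campaign city at t1
d$knowledge <- 10 + 0.05 * d$age +
  2 * (d$city == "campaign" & d$time == "t1") + rnorm(n)

# ANCOVA: interaction of city and time, adjusted for age
fit <- lm(knowledge ~ city * time + age, data = d)

# Estimated marginal means: predict each cell with age fixed at its mean
grid <- expand.grid(city = levels(d$city), time = levels(d$time))
grid$age <- mean(d$age)
grid$emm <- predict(fit, newdata = grid)
grid
```

With categorical covariates in the model, the adjusted means would additionally have to be averaged over the covariate levels, which is what dedicated packages for least-squares means take care of.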

Here’s an example plot, quickly done with the function:

Since the data is not publicly available, I’ve set up documentation with reproducible examples (though those examples do not fit very well…).

The latest development snapshot of my package is available on GitHub.

BTW: You may have noticed that this function is quite similar to the function for visually interpreting interaction terms in linear models…

Beautiful table outputs in R, part 2 #rstats #sjPlot

First of all, I’d like to thank my readers for all the feedback on my last post on beautiful outputs in R. I tried to consider all suggestions, updated the existing table-output-functions and added some new ones, which will be described in this post. The updated package is already available on CRAN.

This posting is divided into two major parts:

  1. the new functions are described, and
  2. the new features of all table-output-functions are introduced (including knitr-integration and office-import)

Read on …

No need for SPSS – beautiful output in R #rstats

Note: There’s a second part of this series here.

About one year ago, I seriously started migrating from SPSS to R. Though I’m still using SPSS (because I have to in some situations), I’m quite comfortable and happy with R now and have learnt a lot in the past months. But since SPSS is still very widespread in the social sciences, I get asked every now and then whether I really needed to learn R, because SPSS meets all my needs…

Well, learning R had at least two major benefits for me: 1.) I could improve my statistical knowledge a lot, simply by using formulas, asking why certain R commands do not automatically give the same results as SPSS, reading R resources and papers etc. and 2.) the possibilities of data visualization are way better in R than in SPSS (though SPSS can do well, too…). Of course, there are many more reasons to use R.

Still, one thing I often miss in R is a beautiful output of simple statistics or maybe even advanced statistics. Not always as plot or graph, but neither as „cryptic“ console output. I’d like to have a simple table view, just like the SPSS output window (though the SPSS output is not „beautiful“). That’s why I started writing functions that put the results of certain statistics in HTML tables. These tables can be saved to disk or, even better for quick inspection, shown in a web browser or viewer pane (such as the RStudio viewer pane).

All of the following functions are available in my sjPlot-package on CRAN.

Read on …

Emotional reactions toward people with dementia

Our paper on Emotional reactions toward people with dementia was accepted and is published online (though I don’t know and can’t check whether it’s behind a paywall – perhaps just visit me on ResearchGate). Here’s the abstract:

Emotional reactions toward people with disorders are an important component of stigma process. In this study, emotional reactions of the German public toward people with dementia were analyzed.

Analyses are based on a national mail survey conducted in 2012. Sample consists of persons aged 18 to 79 years living in private households in Germany. In all 1,795 persons filled out the questionnaire, reflecting a response rate of 78%. Respondents were asked about their emotional reactions and beliefs about dementia.

A vast majority of the respondents expressed pro-social reactions, i.e. they felt pity, sympathy, and the need to help a person with dementia. Dementia patients rarely evoked anger (10% or less). Between 25% and 50% of the population showed reactions indicating fear. Respondents who had contacts with a person having dementia or had cared for a dementia patient tended to show less negative reactions (fear, anger) and more pro-social reactions. Respondents who showed pronounced fearful reactions were less likely to believe that dementia patients had a high quality of life, were less willing to care for a family member with dementia at home, and were more skeptical about early detection of dementia. Comparison with the results of another study suggests that fearful reactions toward persons with dementia are much more pronounced than in the case of depression, and less pronounced than in the case of schizophrenia.

Fearful reactions toward people with dementia are quite common in the German general public. To reduce fear, educational programs and contact-based approaches should be considered.

Print glm-output to HTML table #rstats

We often use logistic regression models in our analyses and we also often need to publish the results as tables in our papers. And, we always use MS Word since this is the standard office application in our department. So I thought about an easy way of how to transfer the results of fitted (generalized) linear models from R to Word. An appropriate way – for me – is to create HTML tables, simply open them in Word and copy’n’paste them into my document. This works much better than all things I have tried with SPSS tables (if someone has an easier solution, let me know!).

I wrote two small functions called sjt.lm and sjt.glm, which are included in my sjPlot-R-package. These functions require one or more fitted (g)lm-objects. It’s recommended to supply labels of predictor and dependent variables as further parameters. Here are some examples of different table styles…

First, compute two fitted models and create labels:

y1 <- ifelse(swiss$Fertility<median(swiss$Fertility), 0, 1)
y2 <- ifelse(swiss$Agriculture<median(swiss$Agriculture), 0, 1)

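# logistic regressions; the fourth predictor matches the "Catholic" label below
fitOR1 <- glm(y1 ~ swiss$Education +
              swiss$Examination +
              swiss$Infant.Mortality +
              swiss$Catholic,
              family = binomial(link = "logit"))

fitOR2 <- glm(y2 ~ swiss$Education +
              swiss$Examination +
              swiss$Infant.Mortality +
              swiss$Catholic,
              family = binomial(link = "logit"))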
lab <- c("Education", "Examination", "Infant Mortality", "Catholic")
labdep <- c("Fertility", "Agriculture")

Now, generate the tables:

sjt.glm(fitOR1, fitOR2,

Default table style

sjt.glm(fitOR1, fitOR2,

Table with p-values as numbers

sjt.glm(fitOR1, fitOR2,

Table with separated column for CI

sjt.glm(fitOR1, fitOR2,

Table with p-values as numbers and separated column for CI

These html-files can be opened with many word processors and the table can be copied’n’pasted into your own document. If you don’t specify the file parameter, the table will be shown in your default browser or in the viewer pane of your R-IDE (for instance, the RStudio viewer pane).
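To make clear what such a table contains, here is a minimal base R sketch that builds a bare-bones HTML table of odds ratios and confidence intervals. It is not how sjt.glm works internally; the Wald intervals from confint.default, the plain table markup and the temporary output file are assumptions made for illustration.

```r
# Same kind of model as above, written with a data argument
y1 <- ifelse(swiss$Fertility < median(swiss$Fertility), 0, 1)
fit <- glm(y1 ~ Education + Examination + Infant.Mortality + Catholic,
           data = swiss, family = binomial(link = "logit"))

# Odds ratios and Wald confidence intervals (exponentiated log-odds)
or <- exp(cbind(OR = coef(fit), confint.default(fit)))

# One HTML table row per coefficient
rows <- sprintf("<tr><td>%s</td><td>%.2f</td><td>%.2f to %.2f</td></tr>",
                rownames(or), or[, 1], or[, 2], or[, 3])
html <- c("<table>",
          "<tr><th>Predictor</th><th>OR</th><th>CI</th></tr>",
          rows,
          "</table>")

# Assumption: a temporary file; any path ending in .html would do
out_file <- tempfile(fileext = ".html")
writeLines(html, out_file)
```

The resulting file can be opened in a browser or word processor in the same way as the tables described above, only without any styling.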

Examples for sjPlotting functions, including correlations and proportional tables with ggplot #rstats

Sometimes people ask me how the examples of my plotting functions I show here can be reproduced without having an SPSS data set (or at least, without having the data set I use, because it’s not public yet). So I started to write some examples that run „out of the box“, which I want to present to you here. Furthermore, two new plotting functions are introduced: plotting correlations and plotting proportional tables on a percentage scale.

As always, you can find the latest version of my R scripts on my download page.

The following plotting functions are described in this posting:

  • Plotting proportional tables: sjPlotPropTable.R
  • Plotting correlations: sjPlotCorr.R
  • Plotting frequencies: sjPlotFrequencies.R
  • Plotting grouped frequencies: sjPlotGroupFrequencies.R
  • Plotting linear model: sjPlotLinreg.R
  • Plotting generalized linear models: sjPlotOdds.R

Please note that I have changed function and parameter names in order to have consistent, logical names across all functions!

At the end of this posting you will find some explanation on the different parameters that allow you to fit the plotting results to your needs…

Continue reading this post…

Plotting lm and glm models with ggplot #rstats

I followed the advice from Tim’s comment and changed the scaling in the sjPlotOdds-function to logarithmic scaling. The screenshots below showing the plotted glm’s have been updated.

In this posting I will show how to plot results from linear and logistic regression models (lm and glm) with ggplot. As in my previous postings on ggplot, the main idea is to have a highly customizable function for representing data. You can download all my scripts from my script page.

The inspiration source
My following two functions are based on an idea which I saw at the Sustainable Research Blog. Actually, this was a kind of starting point for me to get started with R and learn more about its data visualization facilities. After playing around some time with ggplot, I built my own function based on the script posted at Sustainable Research.

Plotting odds ratios
Plotting odds ratios gives you mainly two display styles: bars or plots (dots). First, let me show you the dot-style. Assuming you have a glm-object (in my examples, it’s called logreg) and have loaded the function sjPlotOdds.R (see my script page for downloads), you can plot the results like this (I have used oddsLabels=lab, a vector with label-strings, which are used as axis-labels. If you leave out this parameter, the variable-names from the model will be taken.):

Odds ratios as dots, with confidence intervals, „positive“ effects (> 1) in blue.
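The original sjPlotOdds call is not preserved here. As a stand-in, the following sketch shows the underlying idea with plain ggplot2: odds ratios as dots with confidence intervals on a logarithmic axis. The model on the built-in swiss data, the Wald intervals and the column names of the helper data frame are assumptions for illustration.

```r
library(ggplot2)

# Assumption: a logistic regression on the swiss data stands in for 'logreg'
y <- ifelse(swiss$Fertility < median(swiss$Fertility), 0, 1)
logreg <- glm(y ~ Education + Examination + Infant.Mortality + Catholic,
              data = swiss, family = binomial(link = "logit"))

# Odds ratios with Wald confidence intervals, intercept dropped
ci <- confint.default(logreg)
odds <- data.frame(
  term  = names(coef(logreg)),
  or    = exp(coef(logreg)),
  lower = exp(ci[, 1]),
  upper = exp(ci[, 2])
)
odds <- odds[odds$term != "(Intercept)", ]
odds$positive <- odds$or > 1  # used to colour "positive" effects differently

p <- ggplot(odds, aes(x = term, y = or, colour = positive)) +
  geom_hline(yintercept = 1, linetype = "dashed") +   # reference line: no effect
  geom_pointrange(aes(ymin = lower, ymax = upper)) +  # dot plus confidence interval
  scale_y_log10() +                                   # logarithmic scaling
  coord_flip() +
  labs(x = NULL, y = "Odds ratio (log scale)")
print(p)
```

On the logarithmic scale, an odds ratio of 0.5 and one of 2 are equally far from the reference line at 1, which is the reason for the change in scaling mentioned at the top of this posting.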

In the above example, if you do not specify axis limits, the boundaries will be calculated according to the lowest and highest confidence interval, thus fitting the diagram to the highest possible „zoom“. The next example demonstrates this with bar charts:

Odds ratios with confidence intervals, fitting the axes to maximum „zoom“, too.

Both diagrams contain model summaries in the lower right corner. You can change many visual parameters, for instance hiding the summary, changing bar colors, changing border or background colors, line and bar size etc.

If you dislike that the grid bars become narrower with increasing odds ratio values, you can use the transformTicks parameter, which uses exponential distances between the tick marks. This results in grid bars with (almost) equal distances. The tick values, of course, are set accordingly:

Odds ratios, grid bars with exponential distance, thicker bars and no error bars at bar ends.


Plotting betas and standardized betas of linear regressions
Quite similar is my function sjPlotLinreg.R, which visualizes the results of linear regressions. Thus, it requires an lm-object.

       axisLimits=c(-0.5, 0.9),
       axisTitle.x="beta (blue) and std. beta (red)",
Linear regression, with beta-values and confidence intervals (in blue) as well as standardized beta values (in red)

As you can see, I have used predictorLabelSize=1 and breakLabelsAt=30 due to the long variable labels. By default, each label at the left axis would break into more lines, thus being narrower and harder to read. Then I used sort="std" to sort the predictors according to their standardized beta values (default would be ordering according to the beta values).
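For readers who want to see where the standardized beta values come from, here is a minimal base R sketch. It is not taken from sjPlotLinreg.R; the model on the built-in swiss data is an assumption for illustration. A standardized beta is the raw coefficient multiplied by the ratio of the predictor’s standard deviation to the outcome’s standard deviation, which gives the same result as refitting the model on z-standardized variables.

```r
# Assumption: a small linear model on the swiss data as an example
fit <- lm(Fertility ~ Education + Examination + Catholic, data = swiss)

# Raw betas without the intercept
b <- coef(fit)[-1]

# Standardized betas: b * sd(x) / sd(y)
sd_x  <- sapply(swiss[names(b)], sd)
std_b <- b * sd_x / sd(swiss$Fertility)

# Cross-check: refit with all variables z-standardized
fit_z <- lm(scale(Fertility) ~ scale(Education) + scale(Examination) +
              scale(Catholic), data = swiss)

# Ordering by standardized beta, comparable to sorting the plot by "std"
sorted <- sort(std_b)
sorted
```

Sorting by standardized instead of raw betas makes sense when predictors are measured on very different scales, because only the standardized values are directly comparable in size.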

Linear regression, only beta values shown

The showStandardBeta=FALSE makes the red dots (standardized beta values) and their connecting line disappear.

Linear regression, beta and standardized beta values are shown, value labels hidden.

This last example shows how to hide the value labels inside the diagram, so you only have the dots for beta and standardized beta coefficients.

Last remark
In between I have also updated my other scripts. For instance, the sjPlotGroupFrequencies.R function can now also plot box plots or violin plots (see examples at the end of that posting). So make sure you have the latest version from my script page.

Easily plotting grouped bars with ggplot #rstats

This tutorial shows how to create diagrams with grouped bar charts or dot plots with ggplot. The groups can also be displayed as facet grids.

Importing the data from SPSS
All following examples are based on an imported SPSS data set. Refer to this posting for more details on how to do that and to my script page to download the scripts. This is important to know because the way the variable and value labels are accessed may depend on whether you use an imported SPSS dataset or not (i.e. you may have to change parameters to get the sample running).

You can, for instance, import your SPSS data like this, if you are using my script:

efc <- importSPSS("GER_Services_FU_PV_dt.sav")
efc_vars <- getVariableLabels(efc)
efc_labels <- getValueLabels(efc)

The R script
You can download the script from my script page. I will not describe the code in detail because the source code is (hopefully) well commented. Basically, the script just transforms the data from two variables (one count variable with categories and one grouping variable) to fit the ggplot requirements for plotting bar charts. You can use a lot of parameters to change the style of the output, e.g. you can plot bars or dots, dodged or stacked bars, change colors etc., and you don’t need to know how this works in ggplot. You simply pass your „preferred settings“ as parameters.
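The transformation step can be sketched in a few lines of plain ggplot2. This is not the script itself, just an illustration of the idea; the built-in mtcars data (cyl as the count variable, gear as the grouping variable) is an assumption that replaces the imported SPSS data.

```r
library(ggplot2)

# Assumption: mtcars stands in for the SPSS data;
# cyl is the count variable, gear the grouping variable
counts <- as.data.frame(table(cyl = mtcars$cyl, gear = mtcars$gear))

# One row per combination of category and group, with its frequency
head(counts)

# Dodged bars; position = "stack" would give stacked bars instead
p <- ggplot(counts, aes(x = cyl, y = Freq, fill = gear)) +
  geom_col(position = "dodge") +
  labs(x = "Cylinders", y = "Count", fill = "Gears")
print(p)

# The groups can also be shown as facets instead of dodged bars
p_facets <- ggplot(counts, aes(x = cyl, y = Freq)) +
  geom_col() +
  facet_grid(. ~ gear)
```

The long format (one row per category and group) is what ggplot needs here: the x aesthetic takes the category, the fill aesthetic or the facet takes the group.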

You can include the script via this single line:


Continue reading this post…