sjPlot 1.6 – major revisions, anyone for beta testing? #rstats

In the last couple of weeks, I have rewritten some core parts of my sjPlot-package and also revised both the package documentation and the online documentation.

Most notable are the changes to the theming and appearance of plots and figures. There’s a new function called sjp.setTheme, which sets theme options for all sjp-functions. This means that

  1. you only need to specify theme and appearance options once and no longer need to repeat these parameters in each sjp-function call you make, and
  2. as a consequence, all sjp-functions have far fewer parameters, which makes the functions and their documentation clearer.
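To illustrate the "set once, use everywhere" idea, here is a rough base-R sketch of the general pattern. It is not how sjp.setTheme is implemented. The names set_theme, get_theme and plot_counts are hypothetical, and the options are kept in a private environment that each plotting function consults for its defaults.

```r
# Hypothetical sketch (not sjPlot's code): one shared environment stores
# theme options, and every plotting function reads its defaults from it.
.theme_env <- new.env()

# Store any number of named options, e.g. set_theme(bar.col = "steelblue")
set_theme <- function(...) {
  opts <- list(...)
  for (nm in names(opts)) assign(nm, opts[[nm]], envir = .theme_env)
  invisible(as.list(.theme_env))
}

# Look up an option; fall back to a default if it was never set
get_theme <- function(name, default = NULL) {
  if (exists(name, envir = .theme_env, inherits = FALSE)) {
    get(name, envir = .theme_env, inherits = FALSE)
  } else {
    default
  }
}

# A plotting function without any appearance parameters of its own
plot_counts <- function(x) {
  barplot(table(x),
          col  = get_theme("bar.col", "grey60"),
          main = get_theme("title", ""))
}

# Set the theme once ...
set_theme(bar.col = "steelblue", title = "Counts")
# ... and every later call, e.g. plot_counts(mtcars$cyl), picks it up
```

The design choice is simply that plotting functions stop taking appearance arguments and instead look them up, so a single call changes the look of all subsequent plots.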

Furthermore, because of some problems with connecting to and updating on the RPubs server, I decided to upload the online documentation for the package to my own site. You will now find the latest comprehensive documentation and examples for various functions of the sjPlot package at www.strengejacke.de/sjPlot/. For instance, take a look at customizing plot appearance and see how the new theming feature makes customizing plots easier and integrates better with theming packages like ggthemr or ggthemes.

An update of the sjPlot package on CRAN is planned soon. Before that, I kindly ask you to test the current development snapshot, which is hosted on GitHub. You can easily install the package from there using the devtools-package (devtools::install_github("sjPlot/devel")). The current snapshot is (very) stable, and I appreciate any feedback or bug reports (if possible, please use the GitHub issue tracker).

The current change log with all new functions, changes and bug fixes can also be found on GitHub.

Visualize pre-post comparison of intervention #rstats

My sjPlot-package was just updated on CRAN, introducing a new function called sjp.emm.int to plot estimated marginal means (least-squares means) of linear models with interaction terms. Or: plotting adjusted means of an ANCOVA.

The idea for this function came up when we wanted to analyze the effect of an intervention (an educational programme on knowledge about mental disorders and associated stigma) between two groups: a “treatment” group (city) where a campaign on mental disorders was conducted, and another city without this campaign. People from both cities were asked about their attitudes towards and knowledge about specific mental disorders at t0, before the campaign started in the one city. Some months later (t1), people from both cities were again asked the same questions. The intention was to see a) whether there were differences in knowledge and pro-social attitudes of people towards mental disorders and b) whether the campaign successfully reduced stigma and increased knowledge.

To analyse these questions, we used an ANCOVA with knowledge and stigma scores as dependent variables, “city” and “time” (t0 versus t1) and their interaction as predictors, adjusted for covariates like age, sex, education etc. The estimated marginal means (or least-squares means) show the differences in the dependent variable between cities and time points, adjusted for these covariates.
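For readers who want to see what the estimated marginal means actually are, here is a minimal base-R sketch. It assumes simulated data (the study data are not public, and all effect sizes below are made up) and only one numeric covariate, age. The model includes the city × time interaction, and the means are predicted for every city/time combination with age held at its sample mean. With factor covariates such as sex, one would additionally average over their levels, which this sketch does not do.

```r
# Simulated stand-in for the (non-public) study data
set.seed(1)
n <- 400
d <- data.frame(
  city = factor(sample(c("control", "campaign"), n, replace = TRUE),
                levels = c("control", "campaign")),
  time = factor(sample(c("t0", "t1"), n, replace = TRUE), levels = c("t0", "t1")),
  age  = round(runif(n, 18, 80))
)
# Made-up effect: only the campaign city gains knowledge at t1
d$score <- 50 + 0.1 * d$age +
  5 * (d$city == "campaign" & d$time == "t1") + rnorm(n, sd = 5)

# ANCOVA with the city x time interaction, adjusted for age
fit <- lm(score ~ city * time + age, data = d)

# Reference grid: all city/time combinations, age fixed at its mean
grid <- expand.grid(city = levels(d$city), time = levels(d$time))
grid$age <- mean(d$age)
grid$emm <- predict(fit, newdata = grid)
print(grid)

# Difference-in-differences of the adjusted means
emm <- function(cty, tm) grid$emm[grid$city == cty & grid$time == tm]
did <- (emm("campaign", "t1") - emm("campaign", "t0")) -
       (emm("control", "t1") - emm("control", "t0"))
did
```

Because age is held at the same value in all four cells, the difference-in-differences of the adjusted means equals the city × time interaction coefficient, which is the quantity most directly related to question b).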

Here’s an example plot, quickly done with the sjp.emm.int function:
[Figure: example plot created with sjp.emm.int]

Since the data is not publicly available, I’ve set up a documentation page with reproducible examples (though those examples do not fit this scenario very well…).

The latest development snapshot of my package is available on GitHub.

BTW: You may have noticed that this function is quite similar to the sjp.lm.int function for visually interpreting interaction terms in linear models…

sjPlot: New options for creating beautiful tables and documentation #rstats

A new update of my sjPlot package was just released on CRAN. Again, this release focused on improving existing functions and fixing bugs. The table output functions in particular (see my previous blog posts on table output functions here and here) improved a lot. Tables now offer more and better options for style customization and knitr integration. A basic introduction to the new features is given in this document.

To make it easier to understand all features, I have started to set up comprehensive documentation for all sjPlot functions on strengejacke.de.

sjPlot 1.3 available #rstats #sjPlot

I just submitted my package update (version 1.3) to CRAN. The download is already available (currently as source; binaries will follow). While the last two updates included new functions for table outputs (see here and here for details on these functions), the current update mostly provides small helper functions. The focus of this update was to improve existing functions and make them easier and more comfortable to use.

Automatic label detection

One major feature is that many functions now automatically detect variable and value labels, if possible. For instance, if you have imported an SPSS dataset (e.g. with the function sji.SPSS), value labels are automatically attached to all variables of the data frame. With the autoAttachVarLabels parameter set to TRUE, even variable labels will be attached to the data frame after importing the SPSS data. Most functions of the package now detect these labels automatically. This does not only apply to imported SPSS data, though: if you have factors with specified factor levels, these levels will also automatically be used as value labels. Furthermore, you can manually attach value and variable labels using the new functions sji.setVariableLabels and sji.setValueLabels.

But what exactly are the benefits of this new feature? Let me give an example. To plot a proportional table with axis and legend labels, prior to sjPlot 1.3 you needed the following code:

data(efc)
efc.val <- sji.getValueLabels(efc)
efc.var <- sji.getVariableLabels(efc)
sjp.xtab(efc$e16sex,
         efc$e42dep,
         axisLabels.x=efc.val[['e16sex']],
         legendTitle=efc.var['e42dep'],
         legendLabels=efc.val[['e42dep']])

Since version 1.3, you only need to write:

data(efc)
sjp.xtab(efc$e16sex, efc$e42dep)
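Under the hood, this kind of label detection boils down to reading attributes. The following base-R sketch shows one possible lookup order: an attached labels attribute first, then factor levels. The helpers get_value_labels and get_var_label are hypothetical, and the attribute names "labels" and "label" follow the convention of the haven package; they are an assumption and not necessarily what sjPlot 1.3 uses internally.

```r
# Hypothetical helper: value labels from an attached "labels" attribute,
# falling back to factor levels, else NULL
get_value_labels <- function(x) {
  labs <- attr(x, "labels", exact = TRUE)
  if (!is.null(labs)) return(names(labs))
  if (is.factor(x)) return(levels(x))
  NULL
}

# Hypothetical helper: variable label from a "label" attribute, or a fallback
get_var_label <- function(x, fallback = "") {
  lab <- attr(x, "label", exact = TRUE)
  if (is.null(lab)) fallback else lab
}

# A numeric variable with labels attached manually (as an import might do)
sex <- c(1, 2, 2, 1, 2)
attr(sex, "labels") <- c(male = 1, female = 2)
attr(sex, "label")  <- "carer's gender"

# A factor whose levels double as value labels
dep <- factor(c("independent", "slightly dependent", "independent"))

get_value_labels(sex)   # "male" "female"
get_var_label(sex)      # "carer's gender"
get_value_labels(dep)   # "independent" "slightly dependent"
```

A plotting function can call such helpers for its axis and legend labels whenever the user does not pass them explicitly, which is what makes the shorter call possible.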

Reliability check for index scores

One new table output function included in this update is sjt.itemanalysis, which helps you perform an item analysis on a scale or data frame if you want to develop index scores.

Let’s say you have several items and you want to compute a principal component analysis in order to identify different components that can be combined into an index score. In such cases, you might want to perform reliability and item discrimination tests. This is shown in the following example, which performs a PCA on the COPE-Index scale, followed by a reliability and item analysis of each extracted “score”:

data(efc)
# get column index of the first item of the COPE-index scale
start <- which(colnames(efc)=="c82cop1")
# get column index of the last item of the COPE-index scale
end <- which(colnames(efc)=="c90cop9")
# create data frame of cope-index-items
df <- as.data.frame(efc[,c(start:end)])
colnames(df) <- sji.getVariableLabels(efc)[c(start:end)]
# compute PCA on cope index and return
# "group classifications" of factors
factor.groups <- sjt.pca(df, no.output=TRUE)$factor.index
# perform item analysis
sjt.itemanalysis(df, factor.groups)

The result is the following table, in which two components were extracted via the PCA and the variables belonging to each component are treated as one “index score” (note that you don’t need to define groups; you can also treat a data frame as one single “index”):
[Figure: item analysis table (sjt.itemanalysis output) for the two extracted components]

The output of the computed PCA was suppressed by no.output=TRUE. To better understand the above figure, take a look at the PCA results, where two components have been extracted:
[Figure: PCA results (sjt.pca output) with two extracted components]
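To make the statistics behind such a table more concrete, here is a minimal base-R sketch of a reliability and item analysis. It assumes simulated items instead of the COPE index, uses a varimax-rotated PCA (prcomp plus varimax) as a stand-in for the grouping returned by sjt.pca, and computes Cronbach's alpha, the corrected item-total correlation (item discrimination) and alpha if item deleted. Whether sjt.itemanalysis reports exactly these statistics, or computes them the same way, is an assumption; cronbach_alpha and item_analysis are hypothetical helpers.

```r
# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of sum score)
cronbach_alpha <- function(df) {
  df <- na.omit(df)
  k <- ncol(df)
  k / (k - 1) * (1 - sum(apply(df, 2, var)) / var(rowSums(df)))
}

# Per-item statistics for one scale (needs at least three items)
item_analysis <- function(df) {
  df <- na.omit(df)
  stopifnot(ncol(df) >= 3)
  total <- rowSums(df)
  data.frame(
    item = colnames(df),
    # item discrimination: correlation of each item with the sum of the others
    corrected_item_total = sapply(seq_along(df), function(i) cor(df[[i]], total - df[[i]])),
    # reliability of the scale if this item were dropped
    alpha_if_deleted = sapply(seq_along(df), function(i) cronbach_alpha(df[-i])),
    row.names = NULL
  )
}

# Simulated items: a1-a3 load on one latent factor, b1-b3 on another
set.seed(42)
n <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
items <- data.frame(
  a1 = f1 + rnorm(n, sd = 0.6), a2 = f1 + rnorm(n, sd = 0.6), a3 = f1 + rnorm(n, sd = 0.6),
  b1 = f2 + rnorm(n, sd = 0.6), b2 = f2 + rnorm(n, sd = 0.6), b3 = f2 + rnorm(n, sd = 0.6)
)

# Varimax-rotated PCA with two components; each item is assigned to the
# component on which it loads highest (a stand-in for sjt.pca's factor.index)
pc <- prcomp(items, scale. = TRUE)
rot <- varimax(pc$rotation[, 1:2] %*% diag(pc$sdev[1:2]))
groups <- apply(abs(unclass(rot$loadings)), 1, which.max)

# Reliability and item statistics per extracted component
for (cols in split(names(items), groups)) {
  cat("Items:", cols, " alpha =", round(cronbach_alpha(items[cols]), 3), "\n")
  print(item_analysis(items[cols]))
}
```

The corrected item-total correlation excludes the item itself from the total, so an item cannot inflate its own discrimination value.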

Besides that, many functions – especially the table output functions – got new parameters to change the appearance of the output (number of digits, inclusion of NAs, additional information in tables etc.). Refer to the package news for a complete overview of what has changed since the last version.

The latest developer build can be found on github.

Developer snapshots of #sjPlot-package now on #Github #rstats

Finally, I managed to set up a GitHub repository. From now on, the latest developer snapshot of my sjPlot-package will be published right here: https://github.com/sjPlot/devel.

Please post issues there, download the latest developer build for testing purposes, or help develop the wiki page with examples of package usage etc.

Btw, if somebody knows why I can’t get GitHub running with RStudio, let me know… I always get this issue, which has already been reported by other users. Currently, I’m using the GitHub.app to commit changes.

Beautiful table outputs in R, part 2 #rstats #sjPlot

First of all, I’d like to thank my readers for all the feedback on my last post on beautiful outputs in R. I tried to consider all suggestions, updated the existing table output functions and added some new ones, which will be described in this post. The updated package is already available on CRAN.

This posting is divided into two major parts:

  1. the new functions are described, and
  2. the new features of all table-output-functions are introduced (including knitr-integration and office-import)

Read on …

Simply creating various scatter plots with ggplot #rstats

Inspired by these two postings, I thought about including a function in my package for simply creating scatter plots.

In my package, there’s a function called sjp.scatter for creating scatter plots. To reproduce these examples, first load the package and then attach the sample data set:

library(sjPlot)
data(efc)

The simplest function call just provides two variables, one for the x- and one for the y-axis:

sjp.scatter(efc$c160age, efc$e17age)

which plots the following graph:
[Figure: scatter plot of efc$c160age against efc$e17age]

If you have continuous variables with a wide range of values, overplotting (overlapping dots) is usually not a problem. It does, however, often occur with variables that have just a few categories (factor levels). The function automatically estimates the amount of overlapping dots and then jitters them if necessary, as in the following example, which also includes a marginal rug plot:

sjp.scatter(efc$e16sex,efc$neg_c_7, efc$c172code, showRug=TRUE)

[Figure: auto-jittered scatter plot with marginal rug plot]

The same plot, when auto-jittering is turned off, would look like this:

sjp.scatter(efc$e16sex,efc$neg_c_7, efc$c172code,
            showRug=TRUE, autojitter=FALSE)

[Figure: the same scatter plot without auto-jittering]
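The post does not spell out how the amount of overlapping dots is estimated. As an illustration only, here is a base-R sketch of one possible rule: compute the share of points that share their coordinates with at least one other point, and jitter if that share exceeds a threshold. The functions overlap_share and auto_jitter, the 15% threshold and the jitter amount are all assumptions, not the actual logic of sjp.scatter.

```r
# Hypothetical rule (not sjPlot's actual estimator): the share of points whose
# (x, y) coordinates coincide with at least one other point
overlap_share <- function(x, y) {
  xy <- paste(x, y)
  mean(duplicated(xy) | duplicated(xy, fromLast = TRUE))
}

# Jitter both coordinates only if the overlap exceeds an assumed threshold
auto_jitter <- function(x, y, threshold = 0.15, amount = 0.2) {
  if (overlap_share(x, y) > threshold) {
    x <- jitter(x, amount = amount)  # adds uniform noise in [-amount, amount]
    y <- jitter(y, amount = amount)
  }
  list(x = x, y = y)
}

# Few categories on x, integer scores on y: many points overlap
set.seed(3)
sex   <- sample(1:2, 200, replace = TRUE)
score <- sample(7:28, 200, replace = TRUE)
overlap_share(sex, score)
p <- auto_jitter(sex, score)
plot(p$x, p$y, xlab = "sex", ylab = "score")
rug(p$x, side = 1)
rug(p$y, side = 2)
```

For continuous data without ties the share is zero, so the points are left untouched.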

You can also add a grouping variable. The scatter plot is then “divided” into as many groups as indicated by the grouping variable. In the next example, two variables (elder’s and carer’s age) are grouped by different dependency levels of the elderly. Additionally, a fitted line for each group is plotted:

sjp.scatter(efc$c160age,efc$e17age, efc$e42dep, title="Scatter Plot",
            legendTitle=sji.getVariableLabels(efc)['e42dep'],
            legendLabels=sji.getValueLabels(efc)[['e42dep']],
            axisTitle.x=sji.getVariableLabels(efc)['c160age'],
            axisTitle.y=sji.getVariableLabels(efc)['e17age'],
            showGroupFitLine=TRUE)

[Figure: grouped scatter plot with a fitted line for each group]
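A fitted line per group is simply a separate linear regression within each group. The following base-R sketch shows that idea; it uses the built-in mtcars data as a stand-in for the efc data set (weight vs. mileage, grouped by number of cylinders) and base graphics instead of ggplot.

```r
# One linear fit per group (mtcars stands in for the efc data)
groups <- factor(mtcars$cyl)
fits <- lapply(split(mtcars, groups), function(d) lm(mpg ~ wt, data = d))

# Intercept and slope of each group's fit line
coefs <- t(sapply(fits, coef))
print(coefs)

# Scatter plot coloured by group, plus one fitted line per group
cols <- seq_along(levels(groups))
plot(mpg ~ wt, data = mtcars, col = cols[groups], pch = 19,
     xlab = "weight (1000 lbs)", ylab = "miles per gallon")
for (i in seq_along(fits)) abline(fits[[i]], col = cols[i])
legend("topright", legend = levels(groups), col = cols, pch = 19,
       title = "cylinders")
```

For a faceted layout like the next example, one could instead draw one panel per group, e.g. after par(mfrow = c(1, 3)).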

If the groups are difficult to distinguish in a single plot area, the graph can be faceted by groups. This is shown in the last example, where the same scatter plot as above is plotted with facets for each group:

sjp.scatter(efc$c160age,efc$e17age, efc$e42dep, title="Scatter Plot",
            legendTitle=sji.getVariableLabels(efc)['e42dep'],
            legendLabels=sji.getValueLabels(efc)[['e42dep']],
            axisTitle.x=sji.getVariableLabels(efc)['c160age'],
            axisTitle.y=sji.getVariableLabels(efc)['e17age'],
            showGroupFitLine=TRUE, useFacetGrid=TRUE, showSE=TRUE)

[Figure: faceted scatter plot with fitted lines and standard errors per group]

Find a complete overview of the various function options in the package-help or at inside-r.

No need for SPSS – beautiful output in R #rstats

Note: There’s a second part of this series here.

About one year ago, I seriously started migrating from SPSS to R. Though I’m still using SPSS (because I have to in some situations), I’m quite comfortable and happy with R now and have learnt a lot in the past months. But since SPSS is still very widespread in the social sciences, I get asked every now and then whether I really needed to learn R, since SPSS meets all my needs…

Well, learning R had at least two major benefits for me: 1.) I could improve my statistical knowledge a lot, simply by using formulas, asking why certain R commands do not automatically give the same results as SPSS, reading R resources and papers etc., and 2.) the possibilities for data visualization are way better in R than in SPSS (though SPSS does quite well, too…). Of course, there are many more reasons to use R.

Still, one thing I often miss in R is beautiful output of simple or even advanced statistics. Not always as a plot or graph, but not as “cryptic” console output either. I’d like to have a simple table view, just like the SPSS output window (though the SPSS output is not “beautiful”). That’s why I started writing functions that put the results of certain statistics into HTML tables. These tables can be saved to disk or, even better for quick inspection, shown in a web browser or viewer pane (like the RStudio viewer pane).

All of the following functions are available in my sjPlot-package on CRAN.

Read on …
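As a rough idea of what such a table function does, here is a minimal base-R sketch that turns a data frame into an HTML table and writes it to a temporary file. It is not the sjPlot implementation, which offers much more (styling, viewer integration and so on); df_to_html is a hypothetical helper, and browseURL is left commented out so the script also runs non-interactively.

```r
# Hypothetical helper: render a data frame as a minimal HTML table
df_to_html <- function(df, caption = NULL, digits = 2) {
  # escape characters that have a special meaning in HTML
  esc <- function(s) {
    s <- gsub("&", "&amp;", s, fixed = TRUE)
    s <- gsub("<", "&lt;", s, fixed = TRUE)
    gsub(">", "&gt;", s, fixed = TRUE)
  }
  # round numeric columns, convert every column to character
  cells <- lapply(df, function(col) {
    if (is.numeric(col)) as.character(round(col, digits)) else as.character(col)
  })
  header <- paste0("<tr>", paste0("<th>", esc(names(df)), "</th>", collapse = ""), "</tr>")
  rows <- vapply(seq_len(nrow(df)), function(i) {
    row <- vapply(cells, function(col) col[i], character(1))
    paste0("<tr>", paste0("<td>", esc(row), "</td>", collapse = ""), "</tr>")
  }, character(1))
  cap <- if (is.null(caption)) character(0) else paste0("<caption>", esc(caption), "</caption>")
  paste(c("<table>", cap, header, rows, "</table>"), collapse = "\n")
}

tab  <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
html <- df_to_html(tab, caption = "Mean mpg by number of cylinders")
out  <- tempfile(fileext = ".html")
writeLines(html, out)
# browseURL(out)   # open the file in the default web browser
```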

Comparing multiple (g)lm in one graph #rstats

It’s been a while since a user of my plotting-functions asked whether it would be possible to compare multiple (generalized) linear models in one graph (see comment). While it is already possible to compare multiple models as table output, I now managed to build a function that plots several (g)lm-objects in a single ggplot-graph.

The following examples are taken from my sjPlot package, which is available on CRAN. Once you’ve installed the package, you can run one of the examples provided in the function’s documentation:

# prepare dummy variables for binary logistic regression
y1 <- ifelse(swiss$Fertility<median(swiss$Fertility), 0, 1)
y2 <- ifelse(swiss$Infant.Mortality<median(swiss$Infant.Mortality), 0, 1)
y3 <- ifelse(swiss$Agriculture<median(swiss$Agriculture), 0, 1)

# Now fit the models. Note that all models share the same predictors
# and only differ in their dependent variable (y1, y2 and y3)
fitOR1 <- glm(y1 ~ swiss$Education+swiss$Examination+swiss$Catholic,
              family=binomial(link="logit"))
fitOR2 <- glm(y2 ~ swiss$Education+swiss$Examination+swiss$Catholic,
              family=binomial(link="logit"))
fitOR3 <- glm(y3 ~ swiss$Education+swiss$Examination+swiss$Catholic,
              family=binomial(link="logit"))

# plot multiple models
sjp.glmm(fitOR1, fitOR2, fitOR3)

[Figure: odds ratios of fitOR1, fitOR2 and fitOR3 plotted with sjp.glmm]
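Such a plot essentially needs one long table with the odds ratio and confidence interval of each term in each model. The following base-R sketch builds that table for the three swiss models from the example above and draws the models dodged around each predictor. It assumes Wald confidence intervals (confint.default), which may differ from what sjp.glmm computes, uses base graphics instead of ggplot, and odds_table is a hypothetical helper. The models are refitted with a data argument instead of swiss$ references so the block runs on its own.

```r
# Binary outcomes as in the example above (1 = at or above the median)
d <- within(swiss, {
  y1 <- as.integer(Fertility >= median(Fertility))
  y2 <- as.integer(Infant.Mortality >= median(Infant.Mortality))
  y3 <- as.integer(Agriculture >= median(Agriculture))
})

# Same predictors, different dependent variables
models <- list(
  Fertility        = glm(y1 ~ Education + Examination + Catholic, family = binomial, data = d),
  Infant.Mortality = glm(y2 ~ Education + Examination + Catholic, family = binomial, data = d),
  Agriculture      = glm(y3 ~ Education + Examination + Catholic, family = binomial, data = d)
)

# Long table: one row per model and term, odds ratios with Wald 95% CIs
odds_table <- function(models) {
  do.call(rbind, lapply(names(models), function(m) {
    fit <- models[[m]]
    est <- coef(fit)[-1]                             # drop the intercept
    ci  <- confint.default(fit)[-1, , drop = FALSE]  # Wald intervals
    p   <- summary(fit)$coefficients[-1, "Pr(>|z|)"]
    data.frame(model = m, term = names(est), OR = exp(est),
               lower = exp(ci[, 1]), upper = exp(ci[, 2]), p = p,
               row.names = NULL)
  }))
}
ors <- odds_table(models)
print(ors)

# Dodge the models vertically around each term so their intervals don't overlap
terms <- unique(ors$term)
k     <- match(ors$model, names(models))
pos   <- match(ors$term, terms) + (k - (length(models) + 1) / 2) * 0.2
plot(ors$OR, pos, log = "x", pch = 19, col = k, yaxt = "n",
     xlim = range(ors$lower, ors$upper), xlab = "Odds ratio", ylab = "")
segments(ors$lower, pos, ors$upper, pos, col = k)
abline(v = 1, lty = 2)
axis(2, at = seq_along(terms), labels = terms, las = 1)
legend("topright", legend = names(models), col = seq_along(models), pch = 19)
```

Computing a fixed offset per model is the base-graphics counterpart of dodged positioning in ggplot: each model keeps the same offset for every term, so the models appear in the same order at every predictor.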

Thanks to the help of a Stack Overflow user, I now know that the order of aes-parameters matters if you have dodged positioning of geoms on a discrete scale. An example: I use the following code in my function, ggplot(finalodds, aes(y=OR, x=xpos, colour=grp, alpha=pa)), to apply a different colour to each model and to set an alpha level for geoms depending on the p-level. If the alpha-aes appeared before the colour-aes, the order of the lines representing the models could differ between x-values (see Stack Overflow example).

Another, more appealing example (not reproducible, since it relies on data from a current research project):
[Figure: multiple models compared in one plot (research project data)]

And finally, an example where p-levels are represented by different shapes and non-significant odds ratios have a lower alpha level:
[Figure: p-levels shown as different shapes, non-significant odds ratios with a lower alpha level]

sjPlot 0.9 (data visualization package) now on CRAN #rstats

Since version 0.8, my package for data visualization using ggplot has been available on the Comprehensive R Archive Network (CRAN), which means you can simply install the package with install.packages("sjPlot").

Last week, version 0.9 was released. Binaries are already available for OS X and Windows, and source code for Linux. Further updates will no longer be announced on this blog (except for new functions, which may be described in dedicated blog postings), so please use the update function in order to make sure you are using the latest package version.