Tag: ggplot

Data visualization in social sciences – what’s new in the sjPlot-package? #rstats

My sjPlot package just reached version 2.0 and got many updates over the last couple of months. The focus was less on adding new functions; rather, I improved existing functions by adding smaller and bigger features to make working with the package easier and more reliable. In this blog post, I will report on some of the new features.

Consistent name style of arguments

Most notably, I tried to give all package functions a consistent naming style or pattern for arguments. In previous versions, mixing different name-styles was sometimes very confusing. For example, some functions used showNA, others na.rm or show.na. Or some functions used hideLegend, some showLegend and others again show.legend.

Now, all argument names are 1) lower case, 2) dot separated for longer words and 3) grouped according to their function (e.g., if you open the docs for ?sjt.lm, you’ll find all show. arguments, then all string. and finally all digits. arguments). I know that this means that you most likely have to completely re-write your code that uses sjPlot-function calls, but I think, in the long run, this makes working with the sjPlot package easier.

Support for different model families and link functions

In previous package versions, functions related to generalized linear models (like sjp.glm or sjp.glmer) were hard coded for binomial model families for most plot types. Some effect or prediction plots only worked for logistic regression, because predictions were based on plogis. Also, automatically generated plot titles always included “probability”, even for count models.

With the past package updates, and especially the last major update, prediction and effect plots are now based on the link-inverse function of the model, so all common model families and link functions should work with sjPlot now.
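To illustrate what "based on the link-inverse function" means, here is a minimal sketch in base R (it does not call any sjPlot function). The model, the built-in warpbreaks data and the object names are assumptions chosen for illustration only.

```r
# Sketch: predictions on the response scale via the model's own link-inverse.
# Uses only base R and the built-in 'warpbreaks' data (a count outcome).
fit_pois <- glm(breaks ~ wool + tension, data = warpbreaks,
                family = poisson(link = "log"))

# Linear predictor (link scale) for every observation
eta <- predict(fit_pois, type = "link")

# Back-transform with the family's link-inverse instead of a hard-coded plogis()
linkinv <- family(fit_pois)$linkinv
mu <- linkinv(eta)

# This matches what predict() returns on the response scale
same_as_predict <- all.equal(unname(mu),
                             unname(predict(fit_pois, type = "response")))

# For a logit model, the link-inverse is the logistic function, i.e. plogis()
logit_is_plogis <- all.equal(binomial(link = "logit")$linkinv(0.5), plogis(0.5))
```

Because the link-inverse is taken from the fitted model itself, the same code works unchanged for Poisson, binomial or other families.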

Predictions and effect plots

In some cases, it is easier to interpret the predicted probabilities, incident rates or marginal effects instead of the related estimates (odds ratios, incident rate ratios, beta). For linear models (sjp.lm), linear mixed models (sjp.lmer), generalized linear models (sjp.glm) and generalized linear mixed models (sjp.glmer), there are three different plot types to plot predicted values or marginal effects:

  1. type = "slope" (or type = "fe.slope" and type = "ri.slope" for mixed models) to plot unadjusted predicted values, i.e. the relation between model terms and response.
  2. type = "eff" to plot marginal effects, adjusted for all predictors.
  3. type = "pred" (and type = "pred.fe" for mixed models) to plot predicted values against response, for particular model terms.

The following examples are taken from the vignette of the sjp.glm-function.

1. Predicted values, unadjusted

The predicted values from this plot type are based on the intercept’s estimate and each specific term’s estimate. All other co-variates are set to zero (i.e. ignored), which corresponds to family(fit)$linkinv(eta = b0 + bi * xi) (where b0 is the intercept, bi is the estimate of the specific term and xi are the values of that term).

Predicted values, unadjusted

A probability curve of all predictors is plotted, which indicates the probability of the event (indicated by the response) occurring for each value of the predictor (not adjusted for remaining co-variates). In the above example, the first panel in the plot would be interpreted as: with increasing Barthel-Index (which means better functional / physical status), the probability that caring for a dependent person is negatively perceived decreases (in short: the less dependent a person I care for is, the less negative is the impact of care).
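The unadjusted curve described above can be sketched by hand in base R. This is only an illustration of the formula, not the code sjp.glm runs; the built-in infert data and the chosen predictors are assumptions.

```r
# Sketch: unadjusted predicted probabilities for one term of a logistic model.
fit <- glm(case ~ age + spontaneous, data = infert,
           family = binomial(link = "logit"))
b <- coef(fit)

# Values of the predictor of interest along its observed range
xi <- seq(min(infert$age), max(infert$age), length.out = 50)

# Linear predictor from the intercept and this term only;
# all other co-variates are implicitly set to zero
eta <- b["(Intercept)"] + b["age"] * xi

# Back-transform to the response scale with the model's link-inverse
pred_unadj <- family(fit)$linkinv(eta)

# Draw the curve only in an interactive session
if (interactive()) {
  plot(xi, pred_unadj, type = "l",
       xlab = "age", ylab = "Predicted probability (unadjusted)")
}
```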

2. Effect plots

For marginal effects (predicted marginal probabilities resp. predicted marginal incident rates), all remaining co-variates are set to the mean, so this plot type adjusts for co-variates. Obtained results are based on the effects-package.

Marginal effects, adjusted

The effect plots can now also be non-faceted, and for selected model terms only (using the facet.grid and vars arguments).
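The idea of holding the remaining co-variates at their mean can be approximated with predict() on a hand-built grid. The sketch below uses base R only and is an assumption about the general technique; it is not the computation performed by the effects package.

```r
# Sketch: adjusted predictions for 'age', other co-variates held at their mean.
fit <- glm(case ~ age + parity + spontaneous, data = infert, family = binomial)

grid <- data.frame(
  age         = seq(min(infert$age), max(infert$age), length.out = 50),
  parity      = mean(infert$parity),       # held constant at the mean
  spontaneous = mean(infert$spontaneous)   # held constant at the mean
)

# Predict on the link scale, so the confidence band can be built symmetrically
pr <- predict(fit, newdata = grid, type = "link", se.fit = TRUE)
linkinv <- family(fit)$linkinv

# Back-transform point estimates and Wald-type 95% limits to probabilities
grid$predicted <- linkinv(pr$fit)
grid$conf.low  <- linkinv(pr$fit - qnorm(0.975) * pr$se.fit)
grid$conf.high <- linkinv(pr$fit + qnorm(0.975) * pr$se.fit)

head(grid)
```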

3. Predicting values

The plot type for predicting values did not produce any useful results in former package versions, because it just called the predict function without relating the predictions to any predictor or to meaningful data. This plot type has now been completely revised. With type = "pred" (formerly "y.pc"), you can plot predicted values for the response, related to specific model predictors. The predicted values of the response are computed via predict(fit, type = "response"). This plot type requires the vars argument to select the specific terms that should be used for the x-axis and, optionally, as grouping factor. Hence, vars must be a character vector with the names of one or two model predictors.

Predicting values

Predicting values
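A rough base R sketch of the data behind such a plot: predictions on the response scale, summarised by one term for the x-axis and one grouping term. The vars vector here is a plain character vector that mirrors the argument described above; the model and the averaging step are assumptions, not sjPlot internals.

```r
# Sketch: predicted response values related to one or two model predictors.
fit <- glm(case ~ age + spontaneous + induced, data = infert, family = binomial)

dat <- infert
dat$predicted <- predict(fit, type = "response")

# First entry: x-axis term; second entry: grouping term (hypothetical choice)
vars <- c("age", "spontaneous")

# Average predicted probability for each combination of the selected terms;
# the averaged values end up in the column named 'x'
agg <- aggregate(dat$predicted, by = dat[vars], FUN = mean)

head(agg)
```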

Table functions for mixed models

The table functions were also revised, especially for mixed models. You now have more details in the random parts section of the table, which now also shows the variance components of the random parts, or (pseudo-)r2-values.

The tables are created as an HTML page and displayed in your IDE’s viewer or your web browser. You can see many examples on the package’s vignettes page. For the following example, I have taken a screenshot, because otherwise the blog’s style sheet would break the table layout. Anyway, this is an example of a quickly produced table:

table

Closing remarks

There have been a lot of improvements made in the sjPlot package during the past update(s). Above you see examples of the most obvious user-visible changes. But there were also lots of other smaller and bigger improvements. E.g. plotting functions with different plot types, like sjp.glm, have many arguments; most of them only applied to specific plot types, while they were ignored by other plot types. Now, all plot types support most or all arguments, and the documentation should be clearer about what the functions and their arguments do.

I hope you’ll enjoy the sjPlot-package. Feel free to submit issues or suggestions to the dedicated GitHub-page.

sjPlot package and related online manuals updated #rstats #ggplot

My sjPlot package for data visualization has just been updated on CRAN. I’ve added some features to existing functions, which I want to introduce here.

Plotting linear models

So far, plotting model assumptions of linear models or plotting slopes for each estimate of linear models were spread over several functions. Now, these plot types have been integrated into the sjp.lm function, where you can select the plot type with the type parameter. Furthermore, plotting standardized coefficients now also plots the related confidence intervals.

Detailed examples can be found here:
www.strengejacke.de/sjPlot/sjp.lm
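For readers who want to see what standardized coefficients with confidence intervals look like numerically, here is a minimal base R sketch. Refitting the model on z-standardized variables is one common approach; whether sjp.lm computes them this way is an assumption, and the mtcars model is only an example.

```r
# Sketch: standardized coefficients (betas) with 95% confidence intervals.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# z-standardize outcome and predictors, then refit the same model
z <- as.data.frame(scale(mtcars[, c("mpg", "wt", "hp", "qsec")]))
fit_std <- lm(mpg ~ wt + hp + qsec, data = z)

# Betas and their confidence limits, without the intercept row
std <- cbind(beta = coef(fit_std), confint(fit_std))[-1, ]
std

# Cross-check for one term: beta = b * sd(x) / sd(y)
beta_wt_manual <- coef(fit)["wt"] * sd(mtcars$wt) / sd(mtcars$mpg)
```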

Plotting generalized linear models

Besides odds ratios, you can now also plot the predicted probabilities of the outcome for each predictor of generalized linear models. In case you have continuous variables, this kind of plot may be more intuitive than an odds ratio value.

Detailed examples can be found here:
www.strengejacke.de/sjPlot/sjp.glm

Plotting (generalized) linear mixed effects models

The plotting function for creating plots of (generalized) linear mixed effects models (sjp.lmer and sjp.glmer) also got new plot types over the course of the last weeks.

For sjp.lmer, we have

  • re (default) for estimates of random effects
  • fe for estimates of fixed effects
  • fe.std for standardized estimates of fixed effects
  • fe.cor for correlation matrix of fixed effects
  • re.qq for a QQ-plot of random effects (random effects quantiles against standard normal quantiles)
  • fe.ri for fixed effects slopes depending on the random intercept.

and for sjp.glmer, we have

  • re (default) for odds ratios of random effects
  • fe for odds ratios of fixed effects
  • fe.cor for correlation matrix of fixed effects
  • re.qq for a QQ-plot of random effects (random effects quantiles against standard normal quantiles)
  • fe.pc or fe.prob to plot probability curves (predicted probabilities) of all fixed effects coefficients. Use facet.grid to decide whether to plot each coefficient as separate plot or as integrated faceted plot.
  • ri.pc or ri.prob to plot probability curves (predicted probabilities) of random intercept variances for all fixed effects coefficients. Use facet.grid to decide whether to plot each coefficient as separate plot or as integrated faceted plot.

Detailed examples can be found here:
www.strengejacke.de/sjPlot/sjp.lmer and www.strengejacke.de/sjPlot/sjp.glmer

Plotting interaction terms of (generalized) linear (mixed effects) models

Another function, where new features were added, is sjp.int (formerly known as sjp.lm.int). This function is now kind of generic and can plot interactions of

  • linear models (lm)
  • generalized linear models (glm)
  • linear mixed effects models (lme4::lmer)
  • generalized linear mixed effects models (lme4::glmer)

For linear models (both normal and mixed effects), slopes of interaction terms are plotted. For generalized linear models, the predicted probabilities of the outcome towards the interaction terms are plotted.

Detailed examples can be found here:
www.strengejacke.de/sjPlot/sjp.int
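As a rough illustration of "slopes of interaction terms" for a linear model, the following base R sketch computes predicted lines for one predictor at two levels of a moderator. The model, the choice of quartiles as moderator levels and all object names are assumptions; sjp.int may choose its moderator values differently.

```r
# Sketch: slopes of 'wt' at two levels of the moderator 'hp'.
fit <- lm(mpg ~ wt * hp, data = mtcars)

# Moderator levels: lower and upper quartile (an arbitrary choice)
hp_levels <- quantile(mtcars$hp, c(0.25, 0.75))

# Prediction grid: the full range of wt at each moderator level
grid <- expand.grid(
  wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 25),
  hp = unname(hp_levels)
)
grid$predicted <- predict(fit, newdata = grid)

# Analytic simple slopes: d(mpg)/d(wt) = b_wt + b_wt:hp * hp
b <- coef(fit)
slopes <- b["wt"] + b["wt:hp"] * hp_levels
slopes
```

Plotting predicted against wt, with one line per hp level, gives the kind of interaction plot described above.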

Plotting Likert scales

Finally, a comprehensive documentation for the sjp.likert function is finished, which can be found here:
www.strengejacke.de/sjPlot/sjp.likert

Visualizing (generalized) linear mixed effects models, part 2 #rstats #lme4

In the first part on visualizing (generalized) linear mixed effects models, I showed examples of the new functions in the sjPlot package to visualize fixed and random effects (estimates and odds ratios) of (g)lmer results. Meanwhile, I added further features to the functions, which I would like to introduce here. This posting is based on the online manual of the sjPlot package.

In this posting, I’d like to give examples for diagnostic and probability plots of odds ratios. The latter examples, of course, only refer to the sjp.glmer function (generalized mixed models). To reproduce these examples, you need the version 1.59 (or higher) of the package, which can be found at GitHub. A submission to CRAN is planned for the next days…

Continue reading “Visualizing (generalized) linear mixed effects models, part 2 #rstats #lme4”

Visualize pre-post comparison of intervention #rstats

My sjPlot-package was just updated on CRAN, introducing a new function called sjp.emm.int to plot estimated marginal means (least-squares means) of linear models with interaction terms. Or: plotting adjusted means of an ANCOVA.

The idea for this function came up when we wanted to analyze the effect of an intervention (an educational programme on knowledge about mental disorders and associated stigma) between two groups: a “treatment” group (city) where a campaign on mental disorders was conducted and another city without this campaign. People from both cities were asked about their attitudes and knowledge about specific mental disorders at t0, before the campaign started in the one city. Some months later (t1), people from both cities were again asked the same questions. The intention was to see a) whether there were differences in knowledge and pro-social attitudes of people towards mental disorders and b) whether the campaign successfully reduced stigma and increased knowledge.

To analyse these questions, we used an ANCOVA with knowledge and stigma score as dependent variables, „city“ and „time“ (t0 versus t1) as predictors and adjusted for covariates like age, sex, education etc. The estimated marginal means (or least-squares means) show you the differences of the dependent variable.
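Since the study data are not available, the following base R sketch uses the built-in mtcars data as a stand-in: two factors (am and vs) play the role of the two crossed predictors, and wt plays the role of a covariate. It shows the general idea of estimated marginal means for an interaction with a single covariate, namely predictions for each cell with the covariate held at its mean; this is an assumption about the technique, not the code of sjp.emm.int.

```r
# Sketch: adjusted (estimated marginal) means for a 2 x 2 interaction.
dat <- transform(mtcars, am = factor(am), vs = factor(vs))

# ANCOVA-style model: two crossed factors plus one covariate
fit <- lm(mpg ~ am * vs + wt, data = dat)

# Reference grid: every factor combination, covariate fixed at its mean
grid <- expand.grid(am = levels(dat$am), vs = levels(dat$vs),
                    wt = mean(dat$wt))

# Adjusted cell means with 95% confidence limits
pr <- predict(fit, newdata = grid, interval = "confidence")
emm <- cbind(grid[c("am", "vs")], pr)
emm
```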

Here’s an example plot, quickly done with the sjp.emm.int function:
sjpemmint

Since the data is not publicly available, I’ve set up a documentation with reproducible examples (though those examples do not fit very well…).

The latest development snapshot of my package is available on GitHub.

BTW: You may have noticed that this function is quite similar to the sjp.lm.int function for visually interpreting interaction terms in linear models…

Simply creating various scatter plots with ggplot #rstats

Inspired by these two postings, I thought about including a function in my package for simply creating scatter plots.

In my package, there’s a function called sjp.scatter for creating scatter plots. To reproduce these examples, first load the package and then attach the sample data set:

library(sjPlot)
data(efc)

The simplest function call is by just providing two variables, one for the x- and one for the y-axis:

sjp.scatter(efc$c160age, efc$e17age)

which plots the following graph:
sct_01

If you have continuous variables with a larger scale, you shouldn’t have problems with overplotting or overlaying dots. However, this problem usually occurs if you have variables with just a few categories (factor levels). The function automatically estimates the amount of overlaying dots and then automatically jitters them, as in the following example, which also includes a marginal rug-plot:

sjp.scatter(efc$e16sex,efc$neg_c_7, efc$c172code, showRug=TRUE)

sct_02

The same plot, when auto-jittering is turned off, would look like this:

sjp.scatter(efc$e16sex,efc$neg_c_7, efc$c172code,
            showRug=TRUE, autojitter=FALSE)

sct_03
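The idea of automatic jittering can be sketched in base R as follows. The overlap measure (share of duplicated x/y pairs) and the threshold are hypothetical choices for illustration; the actual rule used by sjp.scatter is not documented in this post.

```r
# Sketch: jitter points only when many of them overlap.
set.seed(1)  # jitter() adds random noise; fix the seed for reproducibility

x <- mtcars$cyl   # variables with only a few distinct values
y <- mtcars$gear

# Share of points that sit exactly on top of an earlier point
overlap <- mean(duplicated(data.frame(x, y)))

threshold <- 0.15  # hypothetical cut-off, not taken from sjPlot

if (overlap > threshold) {
  xj <- jitter(x)
  yj <- jitter(y)
} else {
  xj <- x
  yj <- y
}

# Scatter plot with a marginal rug, drawn only in an interactive session
if (interactive()) {
  plot(xj, yj, xlab = "cyl", ylab = "gear")
  rug(xj, side = 1)
  rug(yj, side = 2)
}
```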

You can also add a grouping variable. The scatter plot is then „divided“ into as many groups as indicated by the grouping variable. In the next example, two variables (elder’s and carer’s age) are grouped by different dependency levels of the elderly. Additionally, a fitted line for each group is plotted:

sjp.scatter(efc$c160age,efc$e17age, efc$e42dep, title="Scatter Plot",
            legendTitle=sji.getVariableLabels(efc)['e42dep'],
            legendLabels=sji.getValueLabels(efc)[['e42dep']],
            axisTitle.x=sji.getVariableLabels(efc)['c160age'],
            axisTitle.y=sji.getVariableLabels(efc)['e17age'],
            showGroupFitLine=TRUE)

sct_04

If the groups are difficult to distinguish in a single plot area, the graph can be faceted by groups. This is shown in the last example, where the same scatter plot as above is plotted with facets for each group:

sjp.scatter(efc$c160age,efc$e17age, efc$e42dep, title="Scatter Plot",
            legendTitle=sji.getVariableLabels(efc)['e42dep'],
            legendLabels=sji.getValueLabels(efc)[['e42dep']],
            axisTitle.x=sji.getVariableLabels(efc)['c160age'],
            axisTitle.y=sji.getVariableLabels(efc)['e17age'],
            showGroupFitLine=TRUE, useFacetGrid=TRUE, showSE=TRUE)

sct_05

Find a complete overview of the various function options in the package-help or at inside-r.

Comparing multiple (g)lm in one graph #rstats

It’s been a while since a user of my plotting-functions asked whether it would be possible to compare multiple (generalized) linear models in one graph (see comment). While it is already possible to compare multiple models as table output, I now managed to build a function that plots several (g)lm-objects in a single ggplot-graph.

The following examples are taken from my sjPlot package, which is available on CRAN. Once you’ve installed the package, you can run one of the examples provided in the function’s documentation:

# prepare dummy variables for binary logistic regression
y1 <- ifelse(swiss$Fertility<median(swiss$Fertility), 0, 1)
y2 <- ifelse(swiss$Infant.Mortality<median(swiss$Infant.Mortality), 0, 1)
y3 <- ifelse(swiss$Agriculture<median(swiss$Agriculture), 0, 1)

# Now fit the models. Note that all models share the same predictors
# and only differ in their dependent variable (y1, y2 and y3)
fitOR1 <- glm(y1 ~ swiss$Education+swiss$Examination+swiss$Catholic,
              family=binomial(link="logit"))
fitOR2 <- glm(y2 ~ swiss$Education+swiss$Examination+swiss$Catholic,
              family=binomial(link="logit"))
fitOR3 <- glm(y3 ~ swiss$Education+swiss$Examination+swiss$Catholic,
              family=binomial(link="logit"))

# plot multiple models
sjp.glmm(fitOR1, fitOR2, fitOR3)

multiodds1
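To make the data behind such a multi-model plot more tangible, here is a base R sketch that collects odds ratios, Wald confidence intervals and p-values of several models into one data frame. The helper make_or_table and the column names are hypothetical and not part of sjPlot; the models follow the swiss example above, written with a data argument.

```r
# Sketch: one long table of odds ratios from several logistic regressions.
y1 <- ifelse(swiss$Fertility < median(swiss$Fertility), 0, 1)
y2 <- ifelse(swiss$Infant.Mortality < median(swiss$Infant.Mortality), 0, 1)
y3 <- ifelse(swiss$Agriculture < median(swiss$Agriculture), 0, 1)

fit_model <- function(y) {
  glm(y ~ Education + Examination + Catholic, data = swiss,
      family = binomial(link = "logit"))
}

# Hypothetical helper: odds ratios with Wald 95% CIs, intercept dropped
make_or_table <- function(fit, label) {
  ci <- confint.default(fit)  # Wald intervals on the log-odds scale
  data.frame(
    model = label,
    term  = names(coef(fit)),
    OR    = exp(coef(fit)),
    lower = exp(ci[, 1]),
    upper = exp(ci[, 2]),
    p     = summary(fit)$coefficients[, "Pr(>|z|)"],
    row.names = NULL
  )[-1, ]
}

odds <- rbind(
  make_or_table(fit_model(y1), "Fertility"),
  make_or_table(fit_model(y2), "Infant.Mortality"),
  make_or_table(fit_model(y3), "Agriculture")
)
odds
```

A table in this long format, with one row per term and model, is what a dodged ggplot with one colour per model would be drawn from.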

Thanks to the help of a stackoverflow user, I now know that the order of aes-parameters matters in case you have dodged positioning of geoms on a discrete scale. An example: I use the following code in my function, ggplot(finalodds, aes(y=OR, x=xpos, colour=grp, alpha=pa)), to apply different colours to each model and to set an alpha-level for geoms depending on the p-level. If the alpha-aes appeared before the colour-aes, the order of lines representing a model might be different for different x-values (see stackoverflow example).

Another more appealing example (not reproducible, since it relies on data from a current research project):
multiodds2

And finally an example where p-levels are represented by different shapes and non-significant odds have a lower alpha-level:
multiodds3

sjPlot 0.9 (data visualization package) now on CRAN #rstats

Since version 0.8, my package for data visualization using ggplot has been released on the Comprehensive R Archive Network (CRAN), which means you can simply install the package with install.packages("sjPlot").

Last week, version 0.9 was released. Binaries are already available for OS X and Windows, and source code for Linux. Further updates will no longer be announced on this blog (except for new functions which may be described in dedicated blog postings), so please use the update function in order to make sure you are using the latest package version.

sjPlot – data visualization for statistics (in social science) #rstats

I’d like to announce the release of version 0.7 of my R package for data visualization and give a small overview of this package (download and installation instructions can be found on the package page, detailed examples on RPubs).

What does this package do?
In short, the functions in this package mostly do two things:

  1. compute basic or advanced statistical analyses
  2. either plot the results as ggplot-diagram or print them as html-table

Meanwhile, however, the number of functions has increased, hence you’ll also find some utility functions besides the plotting functions.

How does this package help me?
Basically, this package helps those users…

  • who have difficulties using and/or understanding all the possibilities that ggplot offers to create plots, simply by providing intuitive function parameters, which allow for manipulating the appearance of plots; or
  • who don’t want to set up complex ggplot-objects from scratch each time; or
  • who like quick inspections of (basic) statistics via (html-)tables that are shown in the GUI viewer pane or default browser; or
  • who want to easily create beautiful table outputs that can be imported into office applications.

Furthermore, for advanced users, each function returns either the prepared ggplot-object (in case of sjp-plotting functions) or the HTML-tables (in case of sjt-table-output functions), which can then be manipulated even further (for instance, for ggplot-objects, you can specify certain parameters that cannot be modified via the sjPlot package, or html-tables could be integrated into knitr-documents).

What are all these functions about?
There’s a certain naming convention for the functions:

  • sjc – collection of functions useful for carrying out cluster analyses
  • sji – collection of functions for data import and manipulation
  • sjp – collection of plotting functions, the “core” of this package
  • sjt – collection of functions that create (HTML) table outputs (instead of ggplot-graphics)
  • sju – collection of statistical utility functions

Use cases?

  • You can plot results of Anova, correlations, histograms, box plots, bar plots, (generalized) linear models, likert scales, PCA, proportional tables as bar chart etc.
  • You can create plots to analyse model assumptions (lm, glm), predictor interactions, multiple contingency tables etc.
  • You can create table outputs instead of graphs for most plotting functions
  • With the import and utility functions, you can, for instance, extract beta coefficients of linear models, convert numeric scales into grouped factors, perform statistical tests, import SPSS data sets (and retrieve variable and value labels from the imported data), convert factors to numeric variables (and vice versa)…
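Two of the utility tasks named in the list can be sketched with base R equivalents. The snippet below does not use the package's sju or sji functions, whose names and signatures are not given here; the cut points and example values are arbitrary assumptions.

```r
# Sketch: convert a numeric scale into a grouped factor.
age <- infert$age
age_grp <- cut(age, breaks = c(-Inf, 29, 34, Inf),
               labels = c("up to 29", "30 to 34", "35 and older"))
table(age_grp)

# Sketch: convert a factor with numeric labels back to numeric values.
f <- factor(c("10", "20", "20", "30"))
codes  <- as.numeric(f)                # internal level codes, not the labels
values <- as.numeric(as.character(f))  # the numeric values of the labels
```

Going through as.character() first matters, because as.numeric() on a factor returns the level codes rather than the labelled values.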

Final remarks
At the bottom of my package page you’ll find some examples of selected functions that have been published on this blog before I created the package. Furthermore, the package includes a sample dataset from one of my research projects. Once the package is installed, you can test each function by running the examples. All news and recent changes can be found in the NEWS section of the package help (type ?sjPlot to access the help file inside R).

I tried to write very comprehensive documentation for each function and its parameters. Hopefully, this will help you use my package…

Any comments, suggestions etc. are very welcome!

sjPlotting functions now available as package #rstats

This weekend I had some time to deal with package building in R. After some struggling, I now managed to set up RStudio, Roxygen and MikTex properly so I can compile my collection of R-scripts into a package that even passes the package check.

Downloads (package and manual) as well as package description are available at the package information page!

Since the package successfully passed the package check and a manual could also be created, I’ll probably submit my package to CRAN. Currently, I’m only able to compile the source and the Windows binaries of the package, because at home I use RStudio on my Mac with OS X 10.9 Mavericks. It seems that there’s an issue with the GNU Tar on Mavericks, which is needed to compile the OS X binaries… I’m not sure whether it’s enough to just submit the source to CRAN.

Anyway, please check out my package and let me know if you encounter any problems or if you have suggestions on improving the documentation etc.

Open questions

  • How do I write an „ü“ in the R-documentation (needed for my family name in author information)? The documentation is inside the R-files, the RD-files are created using Roxygen.
  • How do I include datasets inside an R-package? I would like to include an SPSS-dataset (.sav-File), so I can make the examples of my sji.XYZ functions run… (currently they’re commented out so the package will compile and pass its check properly)
  • How to include a change log inside R-packages?