sjPlot 1.3 available #rstats #sjPlot

I just submitted my package update (version 1.3) to CRAN. The download is already available (currently source, binaries follow). While the last two updates included new functions for table outputs (see here and here for details on these functions), the current update mostly provides small helper functions. The focus of this update was to improve existing functions and make their handling easier and more comfortable.

Automatic label detection

One major feature is that many functions now automatically detect variables and value labels, if possible. For instance, if you have imported an SPSS dataset (e.g. with the function sji.SPSS), value labels are automatically attached to all variables of the data frame. With the autoAttachVarLabels parameter set to TRUE, even variable labels will be attached to the data frame after importing the SPSS data. These labels are automatically detected by most functions of the package now. But this does not only apply to importet SPSS-data. If you have factors with specified factor levels, these will also automatically be used as value labels. Furthermore, you can manually attach value and variable labels using the new function sji.setVariableLabels and sji.setValueLabels.

But what are the exactly the benefits of this new feature? Let me give an example. To plot a proportional table with axis and legend labels, prior to sjPlot 1.3 you needed following code:

data(efc)
efc.val <- sji.getValueLabels(efc)
efc.var <- sji.getVariableLabels(efc)
sjp.xtab(efc$e16sex,
         efc$e42dep,
         axisLabels.x=efc.val[['e16sex']],
         legendTitle=efc.var['e42dep'],
         legendLabels=efc.val[['e42dep']])

Since version 1.3, you only need to write:

data(efc)
sjp.xtab(efc$e16sex, efc$e42dep)

Reliability check for index scores

One new table output function included in this update is sjt.itemanalysis, which helps performing an item analysis on a scale or data frame if you want to develop index scores.

Let’s say you have several items and you want to compute a principal component analysis in order to identify different components that can be composed to an index score. In such cases, you might want to perform reliability and item discrimination tests. This is shown in the following example, which performs a PCA on the COPE-Index-scale, followed by a reliability and item analysis of each extracted “score”:

data(efc)
# recveive first item of COPE-index scale
start <- which(colnames(efc)=="c82cop1")
# recveive last item of COPE-index scale
end <- which(colnames(efc)=="c90cop9")
# create data frame of cope-index-items
df <- as.data.frame(efc[,c(start:end)])
colnames(df) <- sji.getVariableLabels(efc)[c(start:end)]
# compute PCA on cope index and return
# "group classifications" of factors
factor.groups <- sjt.pca(df, no.output=TRUE)$factor.index
# perform item analysis
sjt.itemanalysis(df, factor.groups)

The result is following table, where two components have been extracted via the PCA, and the variables belonging each component are treated as one “index score” (note that you don’t need to define groups, you can also treat a data frame as one single “index”):
relia

The output of the computed PCA was suppressed by no.output=TRUE. To better understand the above figure, take a look at the PCA results, where two components have been extracted:
pca_item_reli

Beside that, many functions – especially the table output functions – got new parameters to change the appearance of the output (amount of digits, including NA’s, additional information in tables etc.). Refer to the package news to get a complete overview of what was changed since the last version.

The latest developer build can be found on github.

sjPlot – data visualization for statistics (in social science) #rstats

I’d like to announce the release of version 0.7 of my R package for data visualization and give a small overview of this package (download and installation instructions can be found on the package page, detailed examples on RPubs).

What does this package do?
In short, the functions in this package mostly do two things:

  1. compute basic or advanced statistical analyses
  2. either plot the results as ggplot-diagram or print them as html-table

However, meanwhile the amount of functions has increased, hence you’ll also find some utility functions beside the plotting functions.

How does this package help me?
Basically, this package either helps those users…

  • who have difficulties using and/or understanding all possibilities that ggplot offers to create plots, simply by providing intuitive function parameters, which allow for manipulating the appearance of plots; or
  • who don’t want to set up complex ggplot-object each time from the scratch; or
  • like quick inspections of (basic) statistics via (html-)tables that are shown in the GUI viewer pane or default browser; or
  • want easily create beautiful table outputs that can be imported in office applications.

Furthermore, for advanced users, each functions returns either the prepared ggplot-object (in case of sjp-plotting functions) or the HTML-tables (in case of sjt-table-output functions), which than can be manipulated even further (for instance, for ggplot-objects, you can specify certain parameters that cannot be modified via the sjPlot package or html-tables could be integrated into knitr-documents).

What are all these functions about?
There’s a certain naming convention for the functions:

  • sjc – collection of functions useful for carrying out cluster analyses
  • sji – collection of functions for data import and manipulation
  • sjp – collection plotting functions, the “core” of this package
  • sjt – collection of function that create (HTML) table outputs (instead of ggplot-graphics
  • sju – collection of statistical utility functions

Use cases?

  • You can plot results of Anova, correlations, histograms, box plots, bar plots, (generalized) linear models, likert scales, PCA, proportional tables as bar chart etc.
  • You can create plots to analyse model assumptions (lm, glm), predictor interactions, multiple contigency tables etc.
  • You can create table outputs instead of graphs for most plotting functions
  • With the import and utility functions, you can, for instance, extract beta coefficients of linear models, convert numeric scales into grouped factors, perform statistical tests, import SPSS data sets (and retrieve variable and value labels from the importet data), convert factors to numeric variables (and vice versa)…

Final remarks
At the bottom of my package page you’ll find some examples of selected functions that have been published on this blog before I created the package. Furthermore, the package includes a sample dataset from one of my research projects. Once the package is installed, you can test each function by running the examples. All news and recent changes can be found in the NEWS section of the package help (type ?sjPlot to access the help file inside R).

I tried to write a very comprehensive documentation for each function and their parameters, hopefully this will help you using my package…

Any comments, suggestions etc. are very welcome!

sjPlotting functions now as package available #rstats

This weekend I had some time to deal with package building in R. After some struggling, I now managed to setup RStudio, Roxygen and MikTex properly so I can compile my collection of R-scripts into a package that even succeeds the package check.

Downloads (package and manual) as well as package description are available at the package information page!

Since the packages successfully passed the package check and a manual could also be created, I’ll probably submit my package to the CRAN. Currently, I’m only able to compile the source and the Windows binaries of the package, because at home I use RStudio on my Mac with OS X 10.9 Mavericks. It seems that there’s an issue with the GNU Tar on Mavericks, which is needed to compile the OS X binaries… I’m not sure whether it’s enough to just submit the source the the CRAN.

Anyway, please check out my package and let me know if you encounter any problems or if you have suggestions on improving the documentation etc.

Open questions

  • How do I write an “ü” in the R-documentation (needed for my family name in author information)? The documentation is inside the R-files, the RD-files are created using Roxygen.
  • How do I include datasets inside an R-package? I would like to include an SPSS-dataset (.sav-File), so I can make the examples of my sji.XYZ functions running… (currently they’re outcommented so the package will compile and pass its check properly)
  • How to include a change log inside R-packages?