Reading from and writing to SPSS, SAS and STATA with R #rstats #sjPlot

On CRAN now

My sjPlot package has been updated on CRAN (binaries should be available soon). Besides many small improvements and fixes, this update contains two major features:

  1. First, new features for printing table summaries of linear and generalized linear models (sjt.glm received the same new features as sjt.lm, although its manual page is not finished yet). I introduced these features in an earlier post.
  2. Second, the functions for reading data from and writing data to other statistical packages such as SPSS, SAS or STATA have been revamped or extended. There are also improved getters and setters for extracting and setting variable and value labels. A short introduction is available online.

The haven package

There are two reasons why this update focuses on reading and writing data as well as on getting and setting value and variable labels. First, I wanted to rename all functions that formerly had the prefixes sji. or sju., giving them more intuitive names so that people better understand what these functions do.

The second reason is the release of the haven package, which supports fast reading from and writing to different file formats (such as SPSS, SAS or STATA). I believe this package will be widely used for exchanging data with other formats, so I wanted to ensure that sjPlot works with haven-imported data.

The haven package reads data into a data frame in which variables with value labels are of class labelled. These variables are atomic (i.e. they store numeric values, even if they represent categories or factors; see this introduction on RStudio), and each variable carries, where applicable, a variable label attribute and a value labels attribute.
An example:

## Class 'labelled'  atomic [1:908] 3 3 3 4 4 4 4 4 4 4 ...
##   ..- attr(*, "label")= chr "how dependent is the elder?"
##   ..- attr(*, "labels")= Named int [1:4] 1 2 3 4
##   .. ..- attr(*, "names")= chr [1:4] "independent" "slightly dependent" "moderately dependent" "severely dependent"

Until recently, the sjPlot package relied solely on the read.spss function from the foreign package to read SPSS data. The foreign package uses the following structure to store value and variable labels:

##  atomic [1:908] 3 3 3 4 4 4 4 4 4 4 ...
##  - attr(*, "value.labels")= Named chr [1:4] "1" "2" "3" "4"
##   ..- attr(*, "names")= chr [1:4] "independent" "slightly dependent" "moderately dependent" "severely dependent"
##  - attr(*, "variable.label")= chr "how dependent is the elder?"
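To see that both layouts carry the same information under different attribute names, the following base-R sketch builds each one by hand. It needs neither haven nor foreign; the short data vector is made up, and the codes 1 to 4 simply mirror the output shown above.

```r
# Made-up sample of category codes (1 = independent ... 4 = severely dependent)
x <- c(3, 3, 3, 4, 4, 1, 2)
lab_names <- c("independent", "slightly dependent",
               "moderately dependent", "severely dependent")

# haven-style layout: class "labelled", variable label in "label",
# value labels as a named numeric vector in "labels"
haven_style <- structure(
  x,
  label  = "how dependent is the elder?",
  labels = setNames(c(1, 2, 3, 4), lab_names),
  class  = "labelled"
)

# foreign-style layout: no class, value labels as a named character vector
# in "value.labels", variable label in "variable.label"
foreign_style <- structure(
  x,
  value.labels   = setNames(c("1", "2", "3", "4"), lab_names),
  variable.label = "how dependent is the elder?"
)

str(haven_style)
str(foreign_style)
```

The only real differences are the attribute names, the class, and whether the codes are stored as numbers or as strings.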

Since version 1.7, sjPlot can also read data using haven's read functions: simply call my_dataframe <- read_spss("path/to/spss-file.sav", option = "haven").

These kinds of attributes, whether from haven or foreign, are a huge advantage when you want to plot or print (summaries of) variables without manually setting axis labels or titles, because this information can be extracted from each variable’s attributes. This is one of the core features of all sjPlot plotting and table printing functions:

library(sjPlot)
# load sample data
data(efc)
# set plot theme
sjp.setTheme(theme = "539")
# plot frequencies
sjp.frq(efc$e42dep)

The updated sjPlot handles both structures, whether the data were imported with haven or with foreign. It doesn’t matter whether the e42dep variable from the above example was read with foreign or is a labelled class vector from haven.
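The core idea, that titles and category names come from the data rather than from the user, can be sketched without sjPlot. The helper labelled_freq() below is hypothetical and not part of any package; it assumes foreign-style attributes and uses made-up data.

```r
# Made-up vector with foreign-style label attributes
x <- structure(
  c(3, 3, 3, 4, 4, 1, 2),
  value.labels   = c("independent" = "1", "slightly dependent" = "2",
                     "moderately dependent" = "3", "severely dependent" = "4"),
  variable.label = "how dependent is the elder?"
)

# Hypothetical helper: a frequency table whose title and category names are
# read from the attributes, so nothing has to be typed by hand
labelled_freq <- function(v) {
  vl <- attr(v, "value.labels", exact = TRUE)
  # using every labelled code as a level keeps empty categories in the table
  counts <- table(factor(as.vector(v), levels = as.numeric(vl),
                         labels = names(vl)))
  list(title = attr(v, "variable.label", exact = TRUE), counts = counts)
}

res <- labelled_freq(x)
cat(res$title, "\n")
print(res$counts)
# The same two pieces could feed a plot, e.g. barplot(res$counts, main = res$title)
```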

Reading value and variable labels also works for both vector types: get_var_labels() and get_val_labels() extract variable and value labels from both haven-imported and foreign-imported data.
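A getter that copes with both layouts only has to look for the haven attribute names first and fall back to the foreign ones. The two helpers below are a hypothetical base-R sketch of that idea, not sjPlot’s get_var_labels() or get_val_labels(); the data are made up.

```r
# Two made-up vectors holding the same labels in the two layouts
lab_names <- c("independent", "slightly dependent",
               "moderately dependent", "severely dependent")
from_haven <- structure(c(3, 4, 1), label = "how dependent is the elder?",
                        labels = setNames(c(1, 2, 3, 4), lab_names),
                        class = "labelled")
from_foreign <- structure(c(3, 4, 1),
                          value.labels = setNames(c("1", "2", "3", "4"), lab_names),
                          variable.label = "how dependent is the elder?")

# Hypothetical helpers. exact = TRUE stops attr() from partially
# matching "label" to "labels" on vectors that only have value labels.
var_label_any <- function(v) {
  lab <- attr(v, "label", exact = TRUE)                  # haven layout
  if (is.null(lab)) lab <- attr(v, "variable.label", exact = TRUE)  # foreign layout
  lab
}

val_labels_any <- function(v) {
  labs <- attr(v, "labels", exact = TRUE)                # haven layout
  if (is.null(labs)) labs <- attr(v, "value.labels", exact = TRUE)  # foreign layout
  if (is.null(labs)) return(NULL)
  # label names keyed by their code, e.g. c("1" = "independent", ...)
  setNames(names(labs), as.character(labs))
}

var_label_any(from_haven)
var_label_any(from_foreign)
val_labels_any(from_haven)
```

Returning one common shape from both branches is what lets downstream plotting or table code ignore where the data came from.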

Writing data

The constructor of haven’s labelled class only supports creating value labels, not variable labels. Thus, writing data back to SPSS or STATA does not save variable labels by default, at least for newly created variables. Variables that were read with haven and already carry the variable label attribute label will have their variable labels saved back correctly.

So I wrote wrapper functions for writing data, called write_spss and write_stata. These functions convert your data, regardless of whether it was imported with foreign or haven or whether you created new variables manually, into a format that keeps value and variable labels when writing to SPSS or STATA.
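What such a conversion has to do can be sketched in base R. The helper to_haven_style() below is hypothetical and is not how write_spss or write_stata are implemented; it only assumes the two attribute layouts shown earlier. It renames the foreign-style attributes to the haven-style ones, turns the character codes into numbers and keeps the variable label.

```r
# Hypothetical converter from the foreign layout to the haven layout
to_haven_style <- function(v) {
  vl   <- attr(v, "value.labels", exact = TRUE)
  vlab <- attr(v, "variable.label", exact = TRUE)
  if (is.null(vl) && is.null(vlab)) return(v)  # nothing foreign-style to convert
  out <- as.vector(v)  # plain values, all old attributes dropped
  if (!is.null(vl)) {
    # foreign stores codes as character; the haven layout uses numeric codes
    attr(out, "labels") <- setNames(as.numeric(vl), names(vl))
    class(out) <- "labelled"
  }
  # keep the variable label instead of silently losing it
  if (!is.null(vlab)) attr(out, "label") <- vlab
  out
}

dummy <- structure(c(2, 2, 3, 1, 4),
                   value.labels = c("very low" = "1", "low" = "2",
                                    "mid" = "3", "hi" = "4"),
                   variable.label = "This is a dummy")
converted <- to_haven_style(dummy)
str(converted)
```

Vectors that already use the haven layout are returned unchanged, so the helper can be applied to every column of a mixed data frame before handing it to a writer.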

When you create new variables, use set_val_labels and set_var_labels to add the necessary label attributes:

library(sjPlot)
# create dummy variable
dummy <- sample(1:4, 40, replace = TRUE)
# manually attach value and variable labels
dummy <- set_val_labels(dummy, c("very low", "low", "mid", "hi"))
dummy <- set_var_labels(dummy, "This is a dummy")
# check structure of dummy
str(dummy)
##  atomic [1:40] 2 2 2 3 3 2 1 4 4 2 ...
##  - attr(*, "value.labels")= Named chr [1:4] "1" "2" "3" "4"
##   ..- attr(*, "names")= chr [1:4] "very low" "low" "mid" "hi"
##  - attr(*, "variable.label")= chr "This is a dummy"
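For readers curious about what the two setters attach, here is a base-R sketch that produces the same attribute layout as the str() output above. The helpers add_val_labels() and add_var_label() are hypothetical stand-ins, not sjPlot’s code, and they assume the labels belong to the codes 1, 2, 3, ... in the given order, as in the example.

```r
# Hypothetical stand-in for attaching value labels (foreign-style layout);
# assumes the labels belong to the codes 1..n in the given order
add_val_labels <- function(x, labels) {
  attr(x, "value.labels") <- setNames(as.character(seq_along(labels)), labels)
  x
}

# Hypothetical stand-in for attaching a variable label
add_var_label <- function(x, label) {
  attr(x, "variable.label") <- label
  x
}

set.seed(42)  # reproducible dummy data
dummy <- sample(1:4, 40, replace = TRUE)
dummy <- add_val_labels(dummy, c("very low", "low", "mid", "hi"))
dummy <- add_var_label(dummy, "This is a dummy")
str(dummy)
```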

Finally, I’d like to mention some convenient conversion functions, e.g. for converting atomic variables into factors without losing the label attributes: to_fac, to_label and to_value. Further notes on the read and write functions of the sjPlot package are available in the online manual.
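The basic step behind such a conversion can be sketched in base R. The helper labels_to_factor() below is hypothetical and not sjPlot’s to_fac() or to_label(); it assumes foreign-style attributes, uses the value labels as factor levels and copies the variable label onto the factor. The labels in the made-up example are deliberately stored out of order to show that the levels are sorted by code.

```r
# Hypothetical helper: value labels become factor levels, and the
# variable label is copied onto the resulting factor
labels_to_factor <- function(v) {
  vl <- attr(v, "value.labels", exact = TRUE)
  if (is.null(vl)) return(factor(as.vector(v)))  # no labels: plain factor
  vl <- vl[order(as.numeric(vl))]  # order levels by code, whatever the stored order
  f <- factor(as.vector(v), levels = as.numeric(vl), labels = names(vl))
  attr(f, "variable.label") <- attr(v, "variable.label", exact = TRUE)
  f
}

x <- structure(c(3, 3, 1, 4),
               value.labels = c("severely dependent" = "4", "moderately dependent" = "3",
                                "slightly dependent" = "2", "independent" = "1"),
               variable.label = "how dependent is the elder?")
f <- labels_to_factor(x)
levels(f)
attr(f, "variable.label")
```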

