Simplify your R workflow with functions #rstats

Update: Thanks to Bernd, I was able to improve the function that imports the data, so here is the updated script!

In R, you will often have scripts or code snippets that you want to reuse. In such cases, you can write functions for your everyday tasks. Importing and converting data is one such task. I have written a small script, importSPSS.R, to do this:

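importSPSS <- function(path, enc=NA) {
  # init foreign package
  library(foreign)
  # import data as data frame
  data.spss <- read.spss(path, to.data.frame=TRUE, use.value.labels=FALSE, reencode=enc)
  # return data frame
  return(data.spss)
}

# return the value labels of every variable as a list
getValueLabels <- function(dat) {
  a <- lapply(dat, FUN = getValLabels)
  return(a)
}

# return the value labels of a single variable, in reversed order
getValLabels <- function(x) {
  rev(names(attr(x, "value.labels")))
}

# return the variable labels of the data frame
getVariableLabels <- function(dat) {
  return(attr(dat, "variable.labels"))
}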
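
To try these helpers without an SPSS file at hand, here is a minimal sketch. It builds a toy data frame that carries the same "value.labels" and "variable.labels" attributes that read.spss() attaches when value labels are not converted to factors. The variable names (q1, q2) and all labels are invented for illustration, and the helper functions are repeated so that the snippet runs on its own.

```r
# Helpers as in the script above, repeated so the snippet is self-contained
getValLabels <- function(x) {
  rev(names(attr(x, "value.labels")))
}
getValueLabels <- function(dat) {
  lapply(dat, FUN = getValLabels)
}
getVariableLabels <- function(dat) {
  attr(dat, "variable.labels")
}

# Toy stand-in for an imported SPSS file (names and labels are invented)
toy <- data.frame(q1 = c(1, 2, 2, 3), q2 = c(0, 1, 1, 0))
attr(toy$q1, "value.labels") <- c(often = 3, sometimes = 2, never = 1)
attr(toy$q2, "value.labels") <- c(yes = 1, no = 0)
attr(toy, "variable.labels") <- c(q1 = "How often do you use R?",
                                  q2 = "Do you use SPSS?")

# Same calls as for a real data set
toy_vars <- getVariableLabels(toy)
toy_labels <- getValueLabels(toy)
print(toy_labels$q1)
print(toy_vars[["q2"]])
```

Because the toy labels are stored from the highest code to the lowest, the rev() call in getValLabels returns them in ascending code order (never, sometimes, often), which is the order needed for plotting.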

These small functions save only a little typing. Compared with the code example under Migration, step 3: Importing (SPSS) variable and value labels, the following things change:

# Use "source" instead of "library"
# (assumes importSPSS.R is in the working directory)
source("importSPSS.R")
# load data as data frame (function call)
myDat <- importSPSS("SPSS-dataset.sav")
# copy all variable labels in separated list
myDat_vars <- getVariableLabels(myDat)
# copy all value labels as separated list (function call)
myDat_labels <- getValueLabels(myDat)

The benefit especially lies in getting access to value labels. Instead of

hist(myDat[,86], main=myDat_vars[86], labels=rev(attr(myDat_labels[[86]], "names")), breaks=c(0:4), ylim=c(0,400), xlab=NULL, ylab=NULL)

we can now write

hist(myDat[,86], main=myDat_vars[86], labels=myDat_labels[[86]], breaks=c(0:4), ylim=c(0,400), xlab=NULL, ylab=NULL)

so we no longer need to call attr() or remember to reverse the label order for plotting.
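
Taking the same idea one step further, the repeated hist() arguments could themselves go into a function. The following is only a sketch: plotLabelledHist is a hypothetical name that is not part of importSPSS.R, the toy data is invented, and the code assumes that the answer codes run from 1 to the number of value labels, so that breaks of 0 to k give one bar per label.

```r
# Hypothetical wrapper (not part of the original script): draws a labelled
# histogram for column i of a data frame.
# Assumption: the codes of the variable run from 1 to the number of labels.
plotLabelledHist <- function(dat, varLabels, valLabels, i) {
  k <- length(valLabels[[i]])
  hist(dat[, i], main = varLabels[i], labels = valLabels[[i]],
       breaks = 0:k, xlab = "", ylab = "")
}

# Invented toy data in the shape produced by the helper functions
toy <- data.frame(q1 = c(1, 2, 2, 3, 3, 3))
toy_vars <- c(q1 = "How often do you use R?")
toy_labels <- list(q1 = c("never", "sometimes", "often"))

# One short call instead of the long hist() line
h <- plotLabelledHist(toy, toy_vars, toy_labels, 1)
```

With the real data, the call would look like plotLabelledHist(myDat, myDat_vars, myDat_labels, 86). Fixed axis limits such as ylim are left out here and would have to be added as further arguments if needed.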


6 comments on "Simplify your R workflow with functions #rstats"

  1. Daniel, it is good to hear that you are switching to R. So, please allow me to make a few suggestions:

    1. You might want to put your frequently-used-functions in your file. See or for more information. Or, in the long run, you might want to learn how to write your own package, which, actually, is not that difficult.

    2. The getValueLabel() function does not look very R-like. You almost never have to use loops in R. Instead, you are looking for a vectorized solution („implicit loops“) (google for „R vectorization“).

    Given that you have loaded the spss dataset via

    … = TRUE, use.value.labels = FALSE…

    you can access the value labels for variable „f111463“ (I am using the Generations and Gender Survey in my example) via

    names(attr(a$f111463, "value.labels"))

    Sure, you might want to write a function to avoid the ugly names(attr(…)) expression.

    3. Converting a list to a data.frame may work in this case…

    # convert list to data frame
    efc <-

    but may fail in many other cases. I suggest that you better use

    read.spss(… = TRUE …)

    1. Hello Bernd,
      thank you for your comments! As you’ve noticed, I’m not comprehensively familiar with all R functions right now, that’s why I still often come back to loops instead of possible more elegant and „R-like“ solutions. So, it’s still learn-in-progress for me, and I appreciate any comments like yours to improve my R knowledge. 🙂
      If you recommend setting the parameter to TRUE, I might consider writing another function which copies the value labels to a list (to avoid the names(attr) expression).
