Simplify your R workflow with functions #rstats

Update: Thanks to Bernd, I was able to improve the data import function, so here’s the updated script!

In R, you will often have scripts or code snippets that you reuse. In such cases, you can write functions for your everyday tasks. Importing and converting data is one such task. I have written a small script, importSPSS.R, with a few functions to do this:

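importSPSS <- function(path, enc=NA) {
  # load the foreign package (stops with an error if it is not installed)
  library(foreign)
  # import data as data frame, keeping the numeric codes instead of factor labels
  data.spss <- read.spss(path, to.data.frame=TRUE, use.value.labels=FALSE, reencode=enc)
  # return data frame
  return(data.spss)
}
# return the value labels of every variable in a data frame as a list
getValueLabels <- function(dat) {
  a <- lapply(dat, FUN = getValLabels)
  return(a)
}
# return the value labels of a single variable, in reversed order
getValLabels <- function(x) {
  rev(names(attr(x, "value.labels")))
}
# return the variable labels of a data frame
getVariableLabels <- function(dat) {
  return(attr(dat, "variable.labels"))
}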
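To see what these helpers return without needing an SPSS file, here is a minimal sketch on a hand-built data frame. It assumes the attribute layout that read.spss produces with the settings above (a "value.labels" attribute on each labelled column and a "variable.labels" attribute on the data frame), with the value labels stored highest code first. The names toy and health and all labels are made up for illustration.

```r
# Same helper definitions as in the script, repeated so this sketch runs on its own
getValLabels <- function(x) {
  rev(names(attr(x, "value.labels")))
}
getValueLabels <- function(dat) {
  lapply(dat, FUN = getValLabels)
}
getVariableLabels <- function(dat) {
  attr(dat, "variable.labels")
}

# Hand-built stand-in for an imported SPSS data set (hypothetical names and labels)
health <- c(1, 3, 2, 2, 1)
# assumed layout: names are the labels, values are the codes, highest code first
attr(health, "value.labels") <- c(good = 3, fair = 2, poor = 1)

toy <- data.frame(age = c(34, 51, 47, 29, 62))
toy$health <- health
attr(toy, "variable.labels") <- c(age = "Age in years", health = "Self-rated health")

toy_labels <- getValueLabels(toy)    # list with one entry per column
toy_vars <- getVariableLabels(toy)   # named character vector

print(toy_labels$health)             # labels in ascending order of their codes
print(toy_labels$age)                # NULL, because age has no value labels
print(toy_vars[["health"]])
```

Columns without value labels simply yield NULL, so the helper can be applied to the whole data frame without checking each variable first.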

On their own, these small functions save only a little typing. Compared with the code example under Migration, step 3: Importing (SPSS) variable and value labels, the following things change:

# Use "source" instead of "library"
source("lib/importSPSS.R")
# load data as data frame (function call)
myDat <- importSPSS("SPSS-dataset.sav")
# copy all variable labels into a separate object
myDat_vars <- getVariableLabels(myDat)
# copy all value labels into a separate list (function call)
myDat_labels <- getValueLabels(myDat)

The main benefit lies in easier access to the value labels. Instead of

hist(myDat[,86], main=myDat_vars[86], labels=rev(attr(myDat_labels[[86]], "names")), breaks=c(0:4), ylim=c(0,400), xlab=NULL, ylab=NULL)

we can now write

hist(myDat[,86], main=myDat_vars[86], labels=myDat_labels[[86]], breaks=c(0:4), ylim=c(0,400), xlab=NULL, ylab=NULL)

so we no longer need to call attr() or remember to reverse the label order for plotting.
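The reason a plain character vector is enough here is that hist() accepts one for its labels argument and draws one entry above each bar, so the vector needs one label per bin. A minimal sketch with made-up codes and labels (codes and labs are hypothetical stand-ins for myDat[,86] and myDat_labels[[86]]):

```r
# Hypothetical item coded 1 to 4, with labels already in ascending order
codes <- c(1, 2, 2, 3, 3, 3, 4, 4)
labs <- c("never", "rarely", "often", "always")

# breaks = 0:4 gives four bins: (0,1], (1,2], (2,3], (3,4]
h <- hist(codes, breaks = c(0:4), labels = labs, main = "Toy item")

# one count per bin, in the same order as the labels
print(h$counts)
```

If the number of labels does not match the number of bins, the labels and bars no longer line up, so it is worth checking length(labs) against length(h$counts).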


6 thoughts on “Simplify your R workflow with functions #rstats”

  2. Daniel, it is good to hear that you are switching to R. So, please allow me to make a few suggestions:

    1. You might want to put your frequently used functions in your Rprofile.site file. See http://stat.ethz.ch/R-manual/R-patched/library/base/html/Startup.html or http://www.statmethods.net/interface/customizing.html for more information. Or, in the long run, you might want to learn how to write your own package, which is actually not that difficult.

    2. The getValueLabel() function does not look very R-like. You almost never have to use loops in R. Instead, you are looking for a vectorized solution (“implicit loops”) (google for “R vectorization”).

    Given that you have loaded the spss dataset via

    … to.data.frame = TRUE, use.value.labels = FALSE…

    you can access the value labels for variable "f111463" (I am using the Generations and Gender Survey in my example) via

    names(attr(a$f111463, "value.labels"))

    Sure, you might want to write a function to avoid the ugly names(attr(…)) expression.

    3. Converting a list to a data.frame may work in this case…

    # convert list to data frame
    efc <- as.data.frame(data.spss)

    but may fail in many other cases. I suggest that you better use

    read.spss(… to.data.frame = TRUE …)

    • Hello Bernd,
      thank you for your comments! As you’ve noticed, I’m not yet familiar with all R functions, which is why I still often fall back on loops instead of possibly more elegant and “R-like” solutions. So, it’s still learning in progress for me, and I appreciate any comments like yours that improve my R knowledge. :-)
      If you recommend setting the to.data.frame parameter to TRUE, I might consider writing another function which copies the value labels to a list (to avoid the names(attr) expression).
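The loop-versus-vectorized point from this exchange can be sketched on a toy list. This is only an illustration under assumptions: cols is a hypothetical stand-in for imported columns carrying a "value.labels" attribute, and the loop is a generic example, not the original version of the function.

```r
# Hypothetical columns with a "value.labels" attribute, as read.spss would attach
cols <- list(
  a = structure(c(1, 2), value.labels = c(yes = 2, no = 1)),
  b = structure(c(1, 3), value.labels = c(high = 3, mid = 2, low = 1))
)

# Explicit loop: pre-allocate a list, then fill it element by element
labels_loop <- vector("list", length(cols))
names(labels_loop) <- names(cols)
for (nm in names(cols)) {
  labels_loop[[nm]] <- names(attr(cols[[nm]], "value.labels"))
}

# Implicit loop: lapply applies the same expression to every element
labels_apply <- lapply(cols, function(x) names(attr(x, "value.labels")))

# Both approaches should produce the same named list
print(identical(labels_loop, labels_apply))
```

The lapply version needs no bookkeeping for the result list, which is the main reason it reads as more R-like.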

