Organizational Behaviour im Kooperationsnetzwerk

eine systemtheoretisch-qualitative Analyse von Kooperationsnetzwerken unter den Bedingungen polykontexturaler Verhältnisse

Das ist das Thema meines Beitrags zum frisch bewilligten DFG-Netzwerkantrag Organizational Behaviour in health care institutions in Germany – theoretical approaches, methods and empirical results im Rahmen der DGMS-AG-Versorgungsforschung.

Weiter zum Abstract…

sjPlot 1.3 available #rstats #sjPlot

I just submitted my package update (version 1.3) to CRAN. The download is already available (currently source, binaries follow). While the last two updates included new functions for table outputs (see here and here for details on these functions), the current update mostly provides small helper functions. The focus of this update was to improve existing functions and make their handling easier and more comfortable.

Automatic label detection

One major feature is that many functions now automatically detect variables and value labels, if possible. For instance, if you have imported an SPSS dataset (e.g. with the function sji.SPSS), value labels are automatically attached to all variables of the data frame. With the autoAttachVarLabels parameter set to TRUE, even variable labels will be attached to the data frame after importing the SPSS data. These labels are automatically detected by most functions of the package now. But this does not only apply to importet SPSS-data. If you have factors with specified factor levels, these will also automatically be used as value labels. Furthermore, you can manually attach value and variable labels using the new function sji.setVariableLabels and sji.setValueLabels.

But what are the exactly the benefits of this new feature? Let me give an example. To plot a proportional table with axis and legend labels, prior to sjPlot 1.3 you needed following code:

data(efc)
efc.val <- sji.getValueLabels(efc)
efc.var <- sji.getVariableLabels(efc)
sjp.xtab(efc$e16sex,
         efc$e42dep,
         axisLabels.x=efc.val[['e16sex']],
         legendTitle=efc.var['e42dep'],
         legendLabels=efc.val[['e42dep']])

Since version 1.3, you only need to write:

data(efc)
sjp.xtab(efc$e16sex, efc$e42dep)

Reliability check for index scores

One new table output function included in this update is sjt.itemanalysis, which helps performing an item analysis on a scale or data frame if you want to develop index scores.

Let’s say you have several items and you want to compute a principal component analysis in order to identify different components that can be composed to an index score. In such cases, you might want to perform reliability and item discrimination tests. This is shown in the following example, which performs a PCA on the COPE-Index-scale, followed by a reliability and item analysis of each extracted “score”:

data(efc)
# recveive first item of COPE-index scale
start <- which(colnames(efc)=="c82cop1")
# recveive last item of COPE-index scale
end <- which(colnames(efc)=="c90cop9")
# create data frame of cope-index-items
df <- as.data.frame(efc[,c(start:end)])
colnames(df) <- sji.getVariableLabels(efc)[c(start:end)]
# compute PCA on cope index and return
# "group classifications" of factors
factor.groups <- sjt.pca(df, no.output=TRUE)$factor.index
# perform item analysis
sjt.itemanalysis(df, factor.groups)

The result is following table, where two components have been extracted via the PCA, and the variables belonging each component are treated as one “index score” (note that you don’t need to define groups, you can also treat a data frame as one single “index”):
relia

The output of the computed PCA was suppressed by no.output=TRUE. To better understand the above figure, take a look at the PCA results, where two components have been extracted:
pca_item_reli

Beside that, many functions – especially the table output functions – got new parameters to change the appearance of the output (amount of digits, including NA’s, additional information in tables etc.). Refer to the package news to get a complete overview of what was changed since the last version.

The latest developer build can be found on github.

Developer snapshots of #sjPlot-package now on #Github #rstats

Finally, I managed to setup a GitHub repository. From now on, the latest developer snapshot of my sjPlot-package will be published right here: https://github.com/sjPlot/devel.

Please post issues there, download the latest developer build for testing purposes or help developing the wiki-page with examples for package usage etc.

Btw, if somebody knows, why I can’t get GitHub running with RStudio, let me know… I always get this issue, which was already reported by other users. Currently, I’m using the GitHub.app to commit changes.

Beautiful table outputs in R, part 2 #rstats #sjPlot

First of all, I’d like to thank my readers for the lots of feedback on my last post on beautiful outputs in R. I tried to consider all suggestions, updated the existing table-output-functions and added some new ones, which will be described in this post. The updated package is already available on CRAN.

This posting is divided in two major parts:

  1. the new functions are described, and
  2. the new features of all table-output-functions are introduced (including knitr-integration and office-import)

Read on …

Simply creating various scatter plots with ggplot #rstats

Inspired by these two postings, I thought about including a function in my package for simply creating scatter plots.

In my package, there’s a function called sjp.scatter for creating scatter plots. To reproduce these examples, first load the package and then attach the sample data set:

data(efc)

The simplest function call is by just providing two variables, one for the x- and one for the y-axis:

sjp.scatter(efc$c160age, efc$e17age)

which plots following graph:
sct_01

If you have continuous variables with a larger scale, you shouldn’t have problems with overplotting or overlaying dots. However, this problem usually occurs, if you have variables with just a few categories (factor levels). The function automatically estimates the amount of overlaying dots and then automatically jitters them, like in following example, which also includes a marginal rug-plot:

sjp.scatter(efc$e16sex,efc$neg_c_7, efc$c172code, showRug=TRUE)

sct_02

The same plot, when auto-jittering is turned off, would look like this:

sjp.scatter(efc$e16sex,efc$neg_c_7, efc$c172code,
            showRug=TRUE, autojitter=FALSE)

sct_03

You can also add a grouping variable. The scatter plot is then “divided” into as many groups as indicated by the grouping variable. In the next example, two variables (elder’s and carer’s age) are grouped by different dependency levels of the elderly. Additionally, a fitted line for each group is plotted:

sjp.scatter(efc$c160age,efc$e17age, efc$e42dep, title="Scatter Plot",
            legendTitle=sji.getVariableLabels(efc)['e42dep'],
            legendLabels=sji.getValueLabels(efc)[['e42dep']],
            axisTitle.x=sji.getVariableLabels(efc)['c160age'],
            axisTitle.y=sji.getVariableLabels(efc)['e17age'],
            showGroupFitLine=TRUE)

sct_04

If the groups are difficult to distinguish in a single plot area, the graph can be faceted by groups. This is shown in the last example, where the same scatter plot as above is plotted with facets for each group:

sjp.scatter(efc$c160age,efc$e17age, efc$e42dep, title="Scatter Plot",
            legendTitle=sji.getVariableLabels(efc)['e42dep'],
            legendLabels=sji.getValueLabels(efc)[['e42dep']],
            axisTitle.x=sji.getVariableLabels(efc)['c160age'],
            axisTitle.y=sji.getVariableLabels(efc)['e17age'],
            showGroupFitLine=TRUE, useFacetGrid=TRUE, showSE=TRUE)

sct_05

Find a complete overview of the various function options in the package-help or at inside-r.

No need for SPSS – beautiful output in R #rstats

Note: There’s a second part of this series here.

About one year ago, I seriously started migrating from SPSS to R. Though I’m still using SPSS (because I have to in some situations), I’m quite comfortable and happy with R now and learnt a lot in the past months. But since SPSS is still very wide spread in social sciences, I get asked every now and then, whether I really needed to learn R, because SPSS meets all my needs…

Well, learning R had at least two major benefits for me: 1.) I could improve my statistical knowledge a lot, simply by using formulas, asking why certain R commands do not automatically give the same results like SPSS, reading R resources and papers etc. and 2.) the possibilities of data visualization are way better in R than in SPSS (though SPSS can do well as well…). Of course, there are even many more reasons to use R.

Still, one thing I often miss in R is a beautiful output of simple statistics or maybe even advanced statistics. Not always as plot or graph, but neither as “cryptic” console output. I’d like to have a simple table view, just like the SPSS output window (though the SPSS output is not “beautiful”). That’s why I started writing functions that put the results of certain statistics in HTML tables. These tables can be saved to disk or, even better for quick inspection, shown in a web browser or viewer pane (like in RStudio viewer pane).

All of the following functions are available in my sjPlot-package on CRAN.

Read on …

Comparing multiple (g)lm in one graph #rstats

It’s been a while since a user of my plotting-functions asked whether it would be possible to compare multiple (generalized) linear models in one graph (see comment). While it is already possible to compare multiple models as table output, I now managed to build a function that plots several (g)lm-objects in a single ggplot-graph.

The following examples are take from my sjPlot package which is available on CRAN. Once you’ve installed the package, you can run one of the examples provided in the function’s documentation:

# prepare dummy variables for binary logistic regression
y1 <- ifelse(swiss$Fertility<median(swiss$Fertility), 0, 1)
y2 <- ifelse(swiss$Infant.Mortality<median(swiss$Infant.Mortality), 0, 1)
y3 <- ifelse(swiss$Agriculture<median(swiss$Agriculture), 0, 1)

# Now fit the models. Note that all models share the same predictors
# and only differ in their dependent variable (y1, y2 and y3)
fitOR1 <- glm(y1 ~ swiss$Education+swiss$Examination+swiss$Catholic,
              family=binomial(link="logit"))
fitOR2 <- glm(y2 ~ swiss$Education+swiss$Examination+swiss$Catholic,
              family=binomial(link="logit"))
fitOR3 <- glm(y3 ~ swiss$Education+swiss$Examination+swiss$Catholic,
              family=binomial(link="logit"))

# plot multiple models
sjp.glmm(fitOR1, fitOR2, fitOR3)

multiodds1

Thanks to the help of a stackoverflow user, I now know that the order of aes-parameters matters in case you have dodged positioning of geoms on a discrete scale. An example: I use following code in my function ggplot(finalodds, aes(y=OR, x=xpos, colour=grp, alpha=pa)) to apply different colours to each model and setting an alpha-level for geoms depending on the p-level. If the alpha-aes would appear before the colour-aes, the order of lines representing a model may be different for different x-values (see stackoverflow example).

Another more appealing example (not reproducable, since it relies on data from a current research project):
multiodds2

And finally an example where p-levels are represented by different shapes and non-significant odds have a lower alpha-level:
multiodds3

Arbeiten mit Zettelkästen – das Netzkartenprinzip

Kürzlich erhielt ich eine E-Mail mit Feedback zu meinem Zettelkasten, in der eine – wie ich finde – ganz interessante Arbeitsweise mit dem Zettelkasten beschrieben wurde. Mit Erlaubnis des “Urhebers”, der ungenannt bleiben möchte, möchte ich diese Vorgehensweise hier zeigen, ist sie doch eine gute Ergänzung zu anderen hier bereits beschriebenen Methoden des Umgangs mit dem Zettelkasten (z.B. hier und hier).

Im Folgenden also der Auszug aus der E-Mail inklusive veranschaulichender Screenhots.

Das Netzkartenprinzip

Die Vorteile deines Programms liegen für mich in der Möglichkeit der direkten, freien Vernetzung (nicht von den Zetteln als ganzen, sondern von Begriffen und Ideen in diesen Zetteln). Ich habe eine Indexkarte (Abb. 1) mit Begriffen:

Abb 1

die auf Netzkarten verweisen, hier zB Formalisierung:

Abb. 2

die dann wiederum auf die einzelnen Karten verweisen:

Abb. 3

Jeder der Links auf dieser Karte verweist auf eine Netzkarte. Die Anzahl der Netzkarten ist nicht begrenzt. Wenn ich meine, dass ich zu einem Begriff eine brauche, dann lege ich sie an und mache einen Verweis auf die Indexkarte.

Auf der rechten Seite habe ich, wie auf den Screenshots zu sehen ist, fast immer die Überschriftenansicht eingestellt, die mir die Netzkarten (oder Netzzettel) anzeigt. Das sind praktisch meine individuellen Hubs. Die einzelnen Zettel verweisen manchmal direkt aufeinander, aber jedenfalls immer auf mindestens eine Netzkarte (und die Netzkarte verweist mit einer entsprechenden Kurzüberschrift natürlich auf die Karte zurück.) Jeder Link hat also einen Namen. Das ist für mich ein entscheidendes Prinzip.

Nur soviel, dass man einen Eindruck bekommst wie ich den Zettelkasten nutze. Ich dachte, es könnte auch für dich (und andere) interessant sein zu sehen, welche Nutzungsmöglichkeiten dein Zettelkasten noch bietet.

sjPlot 0.9 (data visualization package) now on CRAN #rstats

Since version 0.8, my package for data visualization using ggplot has been released on the Comprehensive R Archive Network (CRAN), which means you can simply install the package with install.packages("sjPlot").

Last week, version 0.9 was released. Binaries are already available for OS X and Windows, and source code for Linux. Further updates will no longer be announced on this blog (except for new functions which may be described in dedicated blog postings), so please use the update function in order make sure you are using the latest package version.

2013 wird geprüft

Die WordPress.com-Statistik-Elfen fertigten einen Jahresbericht dieses Blogs für das Jahr 2013 an.

Hier ist ein Auszug:

Die Konzerthalle im Sydney Opernhaus fasst 2.700 Personen. Dieses Blog wurde in 2013 etwa 46.000 mal besucht. Wenn es ein Konzert im Sydney Opernhaus wäre, würde es etwa 17 ausverkaufte Aufführungen benötigen um so viele Besucher zu haben, wie dieses Blog.

Klicke hier um den vollständigen Bericht zu sehen.