Plotting lm and glm models with ggplot #rstats

Update
I followed the advice from Tim’s comment and changed the scaling in the sjPlotOdds-function to logarithmic scaling. The screenshots below showing the plotted glm’s have been updated.

Summary
In this posting I will show how to plot results from linear and logistic regression models (lm and glm) with ggplot. As in my previous postings on ggplot, the main idea is to have a highly customizable function for representing data. You can download all my scripts from my script page.

The inspiration source
My following two functions are based on an idea which I saw at the Sustainable Research Blog. Actually, this was a kind of starting point for me to get started with R and learn more about its data visualization facilities. After playing around some time with ggplot, I built my own function based on the script posted at Sustainable Research.

Plotting odds ratios
Plotting odds ratios gives you mainly two display styles: bars or plots (dots). First, let me show you the dot-style. Assuming you have a glm-object (in my examples, it’s called logreg) and have loaded the function sjPlotOdds.R (see my script page for downloads), you can plot the results like this (I have used oddsLabels=lab , a vector with label-strings, which are used as axis-labels. If you leave out this parameter, the variable-names from the model will be taken.):

sjp.glm(logreg,
        axisLabels.y=lab,
        gridBreaksAt=0.4)
Odds ratios as dots, with confidence intervals, "positive" effects (> 1) in blue.

Odds ratios as dots, with confidence intervals, “positive” effects (> 1) in blue.

In the above example, if you do not specifiy axis limits, the boundaries will be calculated according to the lowest and highest confidence interval, thus fitting the diagram to the highest possible “zoom”. The next example demonstrates this with bar charts:

sjp.glm(logreg,
        axisLabels.y=lab,
        type="bars",
        gridBreaksAt=0.4)
Odds ratios with confidence intervals, fitting the axes to maximum "zoom", too.

Odds ratios with confidence intervals, fitting the axes to maximum “zoom”, too.

Both diagrams contain model summaries in the lower right corner. You can change many visual parameters, for instance hiding the summary, changing bar colors, changing border or background colors, line and bar size etc.

If you dislike the grid bars to become narrower with increasing odds ratio values, you can use the transformTicks parameter, which uses exponential distances between the tick marks. This results in grid bars with (almost) equal distances. However, the tick values, of course, are accordingly set:

sjp.glm(logreg,
        axisLabels.y=lab,
        transformTicks=TRUE,
        gridBreaksAt=0.2,
        errorBarWidth=0,
        errorBarSize=1)
Odds ratios, grid bars with exponential distance, thicker bars and no error bars at bar ends.

Odds ratios, grid bars with exponential distance, thicker bars and no error bars at bar ends.

 

Plotting betas and standardized betas of linear regressions
Quite similar is my function sjPlotLinreg.R which visualizes the results of linear regressions. Thus, it requires a lm-object.

sjp.lm(linreg,
       axisLimits=c(-0.5, 0.9),
       axisTitle.x="beta (blue) and std. beta (red)",
       sort="std",
       axisLabels.y=lab,
       axisLabelSize=1,
       breakLabelsAt=30)
Linear regression, with beta-values and confidence intervals (in blue) as well as standardized beta values (in red)

Linear regression, with beta-values and confidence intervals (in blue) as well as standardized beta values (in red)

As you can see, I have used predictorLabelSize=1 and breakLabelsAt=30 due to the long variable labels. By default, each label at the left axis would break into more lines, thus being narrower and worse to read. Then I used sort="std" to sort the predictors according to their standardized beta values (default would be ordering according to the beta values).

sjp.lm(linreg,
       axisLabels.y=lab,
       axisLabelSize=1,
       breakLabelsAt=30,
       showStandardBeta=FALSE)
Linear regression, only beta values shown

Linear regression, only beta values shown

The showStandardBeta=FALSE makes the red dots (standardized beta values) and their connecting line disappear.

sjp.lm(linreg,
        axisLabels.y=lab,
        axisLabelSize=1,
        breakLabelsAt=30,
        showValues=FALSE,
        showPValues=FALSE)
Linear regression, beta and standardized beta values are shown, value labels hidden.

Linear regression, beta and standardized beta values are shown, value labels hidden.

This last example shows how to hide the value labels inside the diagram, so you only have the dots for beta and standardized beta coefficients.

Last remark
In between I have also updated my other scripts. For instance, the sjPlotGroupFrequencies.R function can now also plot box plots or violin plots (see examples at the end of that posting). So make sure you have the latest version from my script page.

About these ads

36 Gedanken zu “Plotting lm and glm models with ggplot #rstats

  1. [...] Plotting lm and glm models with ggplot #rstats | Strenge Jacke! [...]

  2. Odds ratios should be plotted on a log scale, not a linear scale as you have used. See the example on the Sustainable Research blog that inspired you – that’s the correct way to do it.

  3. Good looking plots.

  4. Daniel, great job. It worked perfectly for me — I’m using the sjPlotOdds function. But I have a doubt though. Is it possible to insert two or three models in the same graph — to compare the results — using the sjPlotOdds function?

  5. Cool. I’ll try to figure out a way to do it and will keep you posted. And, again, great job.

  6. Hi! Great work! Can you provide your GLM code? I have some troubles with my model. Iam using family = binomial with logit link. Thank you

    • # Datensatz einlesen
      source("lib/sjImportSPSS.R")
      efc <- importSPSS("NWIN-Buch/GER_Services_FU_PV_dt.sav")
      efc_vars <- getVariableLabels(efc)
      efc_labels <- getValueLabels(efc)
      
      # ---
      # Logistische Regression (LogIt-Modell)
      # bei binary response (outcome, dependent) Variablen
      # ---
      # Logistic regression, also called a logit model, is used to model 
      # dichotomous outcome variables. In the logit model the log odds of the 
      # outcome is modeled as a linear combination of the predictor variables.
      #
      # remove missing values for predictors
      df <- na.omit(data.frame(tot_sc_e=efc$tot_sc_e,
                               e42dep=efc$e42dep,
                               neg_c_7=efc$neg_c_7, 
                               c97qol1=efc$c97qol1, 
                               c161sex_0=efc$c161sex_0, 
                               c172high=efc$c172high, 
                               c172mid=efc$c172mid))
      
      logreg <- glm(factor(tot_sc_e) ~ 
                      e42dep + 
                      neg_c_7 + 
                      c97qol1 + 
                      c161sex_0 + 
                      c172high + 
                      c172mid, 
                      family=binomial(link="logit"), data=df, na.action=na.exclude)
      
      # ------------------------------------------------------
      # plot results of log reg
      # ------------------------------------------------------
      lab <- c(paste(efc_vars['e42dep']),
               paste(efc_vars['neg_c_7']),
               paste(efc_vars['c97qol1']),
               paste(efc_vars['c161sex_0']),
               paste(efc_vars['c172high']),
               paste(efc_vars['c172mid']))
      source("lib/sjPlotOdds.R")
      plotModelAssumptions(logreg)
      plotOdds(logreg, oddsLabels=lab, type="bars", gridBreaksAt=0.4)
      plotOdds(logreg, oddsLabels=lab, gridBreaksAt=0.4, errorBarWidth=0, errorBarSize=1)
    • The data set is not yet public, but you can look at the questionnaires we used to collect the data:

      http://www.uke.de/extern/eurofamcare/deli.php

      http://www.uke.de/extern/eurofamcare/documents/deliverables/cat_uk.pdf

      You can easily find the related questions because the variable names follow the same numbering / order.

  7. Owusu Michael

    Hi SJ,
    Just saw your code and has been trying to use it on my data. The x-axis limits is however giving me problems. If I use the format : axisLimits=c(0.8, 10.0),
    gridBreaksAt=0.2). The x- axis seem not to be equally spaced and shrinks towards the tail end of the x-axis scale. Your graph however looked good and equally spaced.
    My odd ratios are 1.15, 2.37, 3.7 and 6.54.
    I loaded your code and used plotted the graph like this:
    plotOdds(glm1,
    oddsLabels=NULL,
    axisLimits=c(0.8, 10.0),
    gridBreaksAt=0.2)
    Kindly advice me on what to do.

  8. I’ve tried the script and it seems like not graph the reference (intercept) factor/level. Is it correct?

    • Yes, the intercept is not plotted in both functions. However, in the model summary of the sjPlotLinreg, you find the intercept value as “y=…”. If this is not the correct notation for the intercept, please let me know and I’ll fix that.

      • I think the graph should have all factors/categories, like in “effects” library.

      • In my experience, the intercept value may deviate a lot from the beta coefficient values, which means you may have a large gap on the x axis between intercept and beta values (for instance, intercept of, let’s say -5, betas ranging from 2 to 3). To avoid this, I decided to not plot the intercept but provide this value in the plot annotation. Perhaps I should make it optional to include the intercept in the graph.

      • Daniel, it would be great to have an option to plot reference factor.

    • I’ve update the script and it’s now possible to plot the intercept for the glm (sjPlotOdds). It’s not implemented for the lm-plotting-function yet, but will come soon.

  9. Owusu Michael

    I think your code is good especially relating to studies in epidemiology. I just used the odd ratio plots in a presentation today. Great!! keep it up.

    • Thank you for your comment! Good to know my scripts are useful to someone. :-)

      I’ve just updated some of my scripts and will upload them on my page in the course of this week, hopefully. Some bug fixes for the frequency-plotting-functions, and standardization of parameter names for all functions.

  10. [...] linear model: sjPlotLinreg.R Plotting (generalized) linear models have also already been described in a posting, so I will keep it short here and just give a running [...]

  11. Perhaps I’m being dense, but I’m unable to get this script to work.

    I try

    
    source('http://www.strengejacke.de/R-Stuff/sjPlotOdds_0_7.R')
    plotModelAssumptions.glm(myModel.glm)

    But I get a few errors. The function “halfnorm” is missing, which does not appear to come from a CRAN listed package. I am also not getting the package “faraway”. Finally, there doesn’t appear to be a

    plotOdds

    function to call.

    I’m probably missing a simple step or there is something slightly different about my environment that wasn’t accounted for…

  12. Sorry, I should have updated my blog posts. The function name has changed, so the function call is now: sip.glm(…). See this posting for more examples.

    To plot the model assumptions, you need following two packages:
    car, faraway

    Best wishes
    Daniel

  13. […] to ggplot2 – How to customize ggplot2 graphics – Easily plotting grouped bars with ggplot – Plotting lm and glm models with ggplot – Kalkalash! Pinpointing the Moments “The Simpsons” became less Cromulent (se også denne) – […]

  14. Thanks for this great script, it is very helpful. I was wondering if there is a way to avoid reordering the variables by the size of the coefficients. I want to compare the same predictors for different outcome variables and so it would be helpful if they were presented in the same order. I have been playing around with the script but unfortunately it has been unsuccessful. Thanks for yours or anyones help!

    • Hi Rich,
      I’m planning to enhance the script so you can plot several fitted models in one plot, each outcome variable (or each model) represented by a certain color. This may take some time, so I probably will first supply an update that has a “sort” parameter. The sjPlotLinreg-function already has such a parameter, but only offers ordering according to beta or standardized beta values, so I would update this function as well.

    • Ok, update is online, sorting should work now. Use “sortOdds”. Please refere to my script page for the updated script and make sure to read the change log first!

  15. […] asked whether it would be possible to compare multiple (generalized) linear models in one graph (see comment). While it is already possible to compare multiple models as table output, I now managed to build a […]

  16. […] asked whether it would be possible to compare multiple (generalized) linear models in one graph (see comment). While it is already possible to compare multiple models as table output, I now managed to build a […]

  17. […] Plotting lm and glm models with ggplot #rstats […]

  18. Odds-ratio Grapher

    I really like the sjp.glm plots. Is there any chance this can be compatible with the survey library?

    I get thrown the following errors when attempting to use:

    Error in seq.default(dots[[1L]][[1L]], dots[[2L]][[1L]], length = dots[[3L]][[1L]]) :
    ‘from’ cannot be NA, NaN or infinite
    In addition: Warning messages:
    1: In sjp.glm(logit1, axisLabels.y = unname(var.names), breakLabelsAt = 30) :
    Exp. coefficients and/or exp. confidence intervals may be out of printable bounds. Consider using “axisLimits” parameter!
    2: In logLik.svyglm(fit) : svyglm not fitted by maximum likelihood.
    3: In logLik.svyglm(fit) : svyglm not fitted by maximum likelihood.

    • In some cases, the exponentiated Beta-coeffients have such high values (due to exponentiation) that ggplot can’t cope with it. Perhaps it’s something ggplot can actually handle, if so, however, I haven’t found out how…
      Does the survey-package in general not work with sjPlot? Or just this specific regression results?

      • exponentiated Beta-coeffients

        Seems like it does have something to do with the exponentiated Beta-coeffients. Some of the other results work with the sjPlot package. Thanks for following up.

Kommentar verfassen

Trage deine Daten unten ein oder klicke ein Icon um dich einzuloggen:

WordPress.com-Logo

Du kommentierst mit Deinem WordPress.com-Konto. Abmelden / Ändern )

Twitter-Bild

Du kommentierst mit Deinem Twitter-Konto. Abmelden / Ändern )

Facebook-Foto

Du kommentierst mit Deinem Facebook-Konto. Abmelden / Ändern )

Google+ photo

Du kommentierst mit Deinem Google+-Konto. Abmelden / Ändern )

Verbinde mit %s