Descriptive summary: Proportions of values in a vector #rstats

When describing a sample, researchers in my field often report the proportions of specific characteristics, for instance the proportion of female persons or of persons with higher or lower income. Since I often want to know these characteristics when exploring data, I decided to write a function, prop(), which is part of my sjstats-package, a package dedicated to summary functions, mostly for fit or association measures of regression models and for descriptive statistics.

prop() follows the same design as most functions of my sjmisc-package: first the data, then a user-defined number of logical comparisons that define the proportions. A single comparison returns a vector; multiple comparisons return a tibble, where the first column contains the comparison and the second the related proportion.

An example from the mtcars dataset:

library(sjstats)
data(mtcars)
# proportions of observations in mpg that are greater than 25
prop(mtcars, mpg > 25)
#> [1] 0.1875

prop(mtcars, mpg > 25, disp > 200, gear == 4)
#> # A tibble: 3 × 2
#>   condition   prop
#>       <chr>  <dbl>
#> 1    mpg>25 0.1875
#> 2  disp>200 0.5000
#> 3   gear==4 0.3750
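
As a cross-check, the same numbers can be reproduced in base R, because the mean of a logical vector is the share of TRUE values. This is only a sketch of the underlying arithmetic, not how prop() is implemented; the object names p_single and p_multi are made up for illustration.

```r
data(mtcars)

# The mean of a logical vector is the proportion of TRUE values
p_single <- mean(mtcars$mpg > 25)

# Several conditions at once, evaluated inside the data frame
p_multi <- with(mtcars, c(
  "mpg>25"   = mean(mpg > 25),
  "disp>200" = mean(disp > 200),
  "gear==4"  = mean(gear == 4)
))

print(p_single)
print(p_multi)
```

mtcars has no missing values; for data with NAs, mean(..., na.rm = TRUE) would be needed to get a proportion of the valid observations.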

The function also works on grouped data frames and with labelled data. In the following example, we group a dataset on family carers by the carers' gender and education, and then get the proportions of observations where care-receivers are at least moderately dependent and where care-receivers are male. To get an impression of what the raw variables look like, we first compute simple frequency tables with frq().

library(sjmisc) # for frq()-function
library(dplyr)  # for select(), group_by() and %>%
data(efc)
frq(efc, e42dep)
#> # elder's dependency
#> 
#>  val                label frq raw.prc valid.prc cum.prc
#>    1          independent  66    7.27      7.33    7.33
#>    2   slightly dependent 225   24.78     24.97   32.30
#>    3 moderately dependent 306   33.70     33.96   66.26
#>    4   severely dependent 304   33.48     33.74  100.00
#>    5                   NA   7    0.77        NA      NA

frq(efc, e16sex)
#> # elder's gender
#> 
#>  val  label frq raw.prc valid.prc cum.prc
#>    1   male 296   32.60     32.85   32.85
#>    2 female 605   66.63     67.15  100.00
#>    3     NA   7    0.77        NA      NA

efc %>%
  select(e42dep, c161sex, c172code, e16sex) %>%
  group_by(c161sex, c172code) %>%
  prop(e42dep > 2, e16sex == 1)

#> # A tibble: 6 × 4
#>   `carer's gender`    `carer's level of education` `e42dep>2` `e16sex==1`
#>              <chr>                           <chr>      <dbl>       <dbl>
#> 1             Male          low level of education     0.6829      0.3659
#> 2             Male intermediate level of education     0.6590      0.3155
#> 3             Male         high level of education     0.7872      0.2766
#> 4           Female          low level of education     0.7101      0.4638
#> 5           Female intermediate level of education     0.5929      0.2832
#> 6           Female         high level of education     0.6881      0.2752

So, within the group of male family carers with a low level of education, 68.29% of care-receivers are moderately or severely dependent, and 36.59% of care-receivers are male. Among female family carers with a high level of education, 68.81% of care-receivers are at least moderately dependent and 27.52% are male.
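
The grouped computation can be sketched in base R as well: split the data by the grouping variable and take the mean of each logical condition within every group. To keep the sketch self-contained, it uses mtcars grouped by cyl as a stand-in for the efc data; by_cyl and grouped are illustrative names, and this is not the code prop() runs internally.

```r
data(mtcars)

# One data frame per group (here: number of cylinders)
by_cyl <- split(mtcars, mtcars$cyl)

# Proportion of TRUE values per group, one column per condition (wide format)
grouped <- data.frame(
  cyl       = names(by_cyl),
  `mpg>25`  = sapply(by_cyl, function(d) mean(d$mpg > 25)),
  `gear==4` = sapply(by_cyl, function(d) mean(d$gear == 4)),
  check.names = FALSE,
  row.names   = NULL
)

print(grouped)
```

Each row corresponds to one group and each column to one condition, which mirrors the wide layout of the grouped tibble above.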

9 comments on "Descriptive summary: Proportions of values in a vector #rstats"

  1. Hey Daniel,
    thanks for this post. The way I did this until now was a bit more tedious, so I am happy to use this.

    One thing I noticed is that it does not seem to work with two or more logical statements, e.g.:
    prop(mtcars, mpg > 25 & mpg < 30)

    Is there a way to do/implement this?

    Thanks in advance.

    1. Two or more logical statements are a bit tricky, at least for the parsing method I currently use. Maybe there’s an easy solution for this; I’ll keep this issue in mind. It’s probably fairly easy to implement.

  2. Hi Daniel,

    Pretty good work with your packages, which I discovered last week!

    This issue with conditions seems very annoying, in particular because it can currently return wrong values.

    I tried the following, which should work but removes the weighting option (not tested on grouped data frames).

    Also, I think it would be more logical for the "prop" function to always return a tibble with the same structure: for grouped data frames I would then keep the "condition" and "prop" columns (rather than the current wide format) and leave it up to the user to cast it.

    get_proportion <- function (x, data, weight.by, na.rm=TRUE, digits=5)
    {
      # no weight.by
      x <- gsub(" ", "", deparse(x), fixed = T)
      x <- gsub("\"", "", x, fixed = TRUE)
      dummy <- with(data,eval(parse(text=paste("as.numeric(",x,")"))))
      if (na.rm)
        dummy [...]

    prop(mtcars, mpg==21, qsec>17, mpg==21&qsec>17)
    # A tibble: 3 × 2
            condition   prop
    1         mpg==21 0.0625
    2         qsec>17 0.7188
    3 mpg==21&qsec>17 0.0312

    1. Hmm, sorry, some truncation happened.

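  3. get_proportion <- function (x, data, weight.by, na.rm = TRUE, digits = 5)
    {
      # no weight.by: weights are ignored in this version
      # turn the condition into a string without spaces or quotes
      x <- gsub(" ", "", deparse(x), fixed = TRUE)
      x <- gsub("\"", "", x, fixed = TRUE)
      # evaluate the condition inside the data frame (1 = TRUE, 0 = FALSE)
      dummy <- with(data, eval(parse(text = paste("as.numeric(", x, ")"))))
      # drop missing values before computing the proportion
      if (na.rm) dummy <- na.omit(dummy)
      round(sum(dummy, na.rm = TRUE) / length(dummy), digits = digits)
    }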
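
The two points raised in this thread, compound conditions and weighting, can be sketched without any string parsing by capturing each condition as an unevaluated expression and evaluating it inside the data frame. The helper prop_sketch() and its weights argument are hypothetical and not part of sjstats; how sjstats handles weight.by internally is not shown in the post, so the weighted mean used here is an assumption.

```r
# Hypothetical helper, not part of sjstats
prop_sketch <- function(data, ..., weights = NULL, na.rm = TRUE, digits = 5) {
  conditions <- eval(substitute(alist(...)))  # unevaluated condition expressions
  env <- parent.frame()                       # where non-column names are looked up
  if (is.null(weights)) weights <- rep(1, nrow(data))

  props <- vapply(conditions, function(cond) {
    hits <- as.numeric(eval(cond, data, env))  # 1 = condition met, 0 = not met
    round(weighted.mean(hits, w = weights, na.rm = na.rm), digits)
  }, numeric(1))

  # Label each result with its condition, spaces removed
  names(props) <- vapply(conditions, function(cond) {
    gsub(" ", "", paste(deparse(cond), collapse = ""), fixed = TRUE)
  }, character(1))
  props
}

data(mtcars)

# Compound conditions are evaluated like any other logical expression
res <- prop_sketch(mtcars, mpg == 21, qsec > 17, mpg == 21 & qsec > 17)

# Weighted variant; the wt column only serves as stand-in weights here
res_w <- prop_sketch(mtcars, mpg > 25 & mpg < 30, weights = mtcars$wt)

print(res)
print(res_w)
```

Because the conditions are never converted to text and back, operators such as & and | need no special handling; the trade-off is that this sketch returns a named vector rather than a tibble.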
