Matt Gregory bio photo

Matt Gregory

Data Scientist

Twitter Github

I’ve been developing a package where I needed a function to take numerous different actions (different mutations) depending on the values of different variables within each row of a dataframe. I started off by using a series of nested dplyr::if_else functions inside of a dplyr::mutate call. I ended up with a bit of a mess, perhaps a dozen or so if_else calls… that’s when I got some abuse from my colleague following a Github pull request.

Vector example

library(tidyverse)
 
x <- 1:50  #  a numeric vector from one to fifty
  
#  divisible by 35 with no remainder
 
if_else(
  x %% 35 == 0, "fizz buzz", 
  if_else(x %% 5 == 0, "fizz",
          if_else(x %% 7 == 0, "buzz",
                  "flat"
          )
  )
) %>%
  #  give the vector of character strings a nice name
  table(dnn = "fizzybuzzyness") %>%  
  #  give the frequency a nice name
  as_tibble(x = ., n = "how_many") %>%  
  #  the . means "the output piped from the previous step"
  ggplot(., aes(fizzybuzzyness, how_many)) +  
  #  strings are sorted "alphabetically"
  geom_bar(stat = "identity") +  
  govstyle::theme_gov() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

plot of chunk 2017-02-23-barplot-fizz_if_else

There must be a better way? A quick Google got me here. A neat idea, but I noticed the suggestion of the new-ish dplyr::case_when function in the comments section of this excellent blog post. I’ve taken up the baton, by exploring this function in this post.

The example from the ?case_when is quite informative; I give a visual interpretation (see the code comments for details). However, we should explore it further and apply it to a dataframe to expand our understanding.

# ?case_when
 
x <- 1:50  #  a numeric vector from one to fifty
 
case_when(
  #  if divisible by 35 make "fizz buzz" else
  x %% 35 == 0 ~ "fizz buzz",  
  #  if divisible by 5 make "fizz", unless already "fizz buzz" else
  x %% 5 == 0 ~ "fizz",  
  #  if divisible by 7 make "buzz" unless already "fizz buzz" or "fizz"
  x %% 7 == 0 ~ "buzz",  
  #  anything else convert into flat
  TRUE ~ "flat"  
) %>%
   #  give the vector of character strings a nice name
  table(dnn = "fizzybuzzyness") %>% 
  #  give the frequency a nice name, default is n
  as_tibble(x = ., n = "how_many") %>%  
  #  the . means "the output piped from the previous step"
  ggplot(., aes(fizzybuzzyness, how_many)) +  
  geom_bar(stat = "identity") +  
  govstyle::theme_gov() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

plot of chunk 2017-02-23-barplot-fizz_case_when

The same outcome but which did you find the more readable? (Imagine you were quality assuring this code?)

Dataframes example

As pointed out in the case_when help examples, ordering is important where you want to go from most specific to least specific. In the example below we wanted the Mazda RX4 Wag to be labbelled as a Mazda Wagon in the newly created brand variable. This failed due to our ordering; to suceed we should move this before the left hand side (LHS) first argument. Notice how the right hand side of the ~ provides the replacement value. Try replacing the "Wow" with a numeric 50, what happens when you run the code?

mtcars %>%
  #  convert row names to an explicit column
  tibble::rownames_to_column("thecar") %>%  
  
  
  
  #  You should start with the most specific, as ordering is important, and you
  #  can also use logical tests e.g. > <
                           
  mutate(
    brand = case_when(
      .$thecar == "Mazda RX4" | .$thecar == "Mazda RX4 Wag"  ~ "Mazda",
      .$thecar == "Mazda RX4 Wag" ~ "Mazda Wagon", 
      .$thecar == "Maserati Bora" & .$hp > 300 ~ "Wow!", 
      TRUE ~ "Not Mazda"
      )  
  ) %>%
  select(brand) %>%
  table()
## .
##     Mazda Not Mazda      Wow! 
##         2        29         1

case_when is still somewhat new and experimental. For now I may stick with nested if_else statements until this is more stable and works well within mutate despite case_when being a bit easier to read. If you play around with this demo code it’s quite easy to break, this may be in part to some useful non-standard evaluation intrinsic to mutate. For example replacing the & with && causes it to error. Try it with your own data and keep your eyes peeled for further developments! In Hadley we trust.

Conclusion

A relatively new offering from the tidyverse on making nested if_else statements more readible.

devtools::session_info()
##  setting  value                       
##  version  R version 3.3.2 (2016-10-31)
##  system   x86_64, linux-gnu           
##  ui       RStudio (1.0.136)           
##  language en_GB:en                    
##  collate  en_GB.UTF-8                 
##  tz       GB                          
##  date     2017-02-24                  
## 
##  package      * version  date      
##  AlgDesign      1.1-7.3  2014-10-15
##  assertthat     0.1      2013-12-06
##  broom          0.4.2    2017-02-13
##  car            2.1-4    2016-12-02
##  caret        * 6.0-73   2016-11-10
##  checkpoint     0.3.18   2016-10-31
##  coda           0.18-1   2015-10-16
##  codetools      0.2-15   2016-10-05
##  colorspace     1.3-1    2016-11-18
##  data.table     1.9.8    2016-11-25
##  DBI            0.5-1    2016-09-10
##  devtools       1.12.0   2016-06-24
##  digest         0.6.11   2017-01-03
##  dplyr        * 0.5.0    2016-06-24
##  emoa           0.5-0    2012-09-25
##  estimability   1.2      2016-11-19
##  evaluate       0.10     2016-10-11
##  forcats        0.2.0    2017-01-23
##  foreach        1.4.3    2015-10-13
##  foreign        0.8-67   2016-09-13
##  GGally       * 1.3.0    2016-11-13
##  ggplot2      * 2.2.1    2016-12-30
##  ggthemes     * 3.3.0    2016-11-24
##  govstyle     * 0.1.2    2017-01-22
##  gtable         0.2.0    2016-02-26
##  haven          1.0.0    2016-09-23
##  highr          0.6      2016-05-09
##  hms            0.3      2016-11-22
##  httr           1.2.1    2016-07-03
##  iterators      1.0.8    2015-10-13
##  jsonlite       1.2      2016-12-31
##  knitr          1.15.1   2016-11-22
##  labeling       0.3      2014-08-23
##  lattice      * 0.20-34  2016-09-06
##  lazyeval       0.2.0    2016-06-12
##  lme4           1.1-12   2016-04-16
##  lsmeans        2.25     2016-11-19
##  lubridate      1.6.0    2016-09-13
##  magrittr     * 1.5      2014-11-22
##  MASS           7.3-45   2015-11-10
##  Matrix         1.2-8    2017-01-20
##  MatrixModels   0.4-1    2015-08-22
##  mco            1.0-15.1 2014-11-29
##  memoise        1.0.0    2016-01-29
##  mgcv           1.8-16   2016-11-07
##  minqa          1.2.4    2014-10-09
##  mnormt         1.5-5    2016-10-15
##  ModelMetrics   1.1.0    2016-08-26
##  modelr         0.1.0    2016-08-31
##  multcomp       1.4-6    2016-07-14
##  munsell        0.4.3    2016-02-13
##  mvtnorm        1.0-5    2016-02-02
##  nlme           3.1-131  2017-02-06
##  nloptr         1.0.4    2014-08-04
##  nnet           7.3-12   2016-02-02
##  pbkrtest       0.4-6    2016-01-27
##  plyr           1.8.4    2016-06-08
##  psych          1.6.12   2017-01-08
##  purrr        * 0.2.2    2016-06-18
##  quantreg       5.29     2016-09-04
##  R6             2.2.0    2016-10-05
##  randomForest   4.6-12   2015-10-07
##  RColorBrewer   1.1-2    2014-12-07
##  Rcpp           0.12.9   2017-01-14
##  readr        * 1.0.0    2016-08-03
##  readxl         0.1.1    2016-03-28
##  reshape        0.8.6    2016-10-21
##  reshape2       1.4.2    2016-10-22
##  rgp          * 0.4-1    2014-08-08
##  rmd2md       * 0.1.1    2017-01-28
##  rpart          4.1-10   2015-06-29
##  rsm            2.8      2016-10-16
##  rvest          0.3.2    2016-06-17
##  sandwich       2.3-4    2015-09-24
##  scales         0.4.1    2016-11-09
##  SparseM        1.74     2016-11-10
##  SPOT         * 1.1.0    2016-06-09
##  stringi        1.1.2    2016-10-01
##  stringr        1.2.0    2017-02-18
##  survival       2.39-4   2016-05-11
##  TH.data        1.0-7    2016-01-28
##  tibble       * 1.2      2016-08-26
##  tidyr        * 0.6.1    2017-01-10
##  tidyverse    * 1.1.1    2017-01-27
##  withr          1.0.2    2016-06-20
##  xgboost      * 0.6-4    2017-01-05
##  xml2           1.1.1    2017-01-24
##  xtable         1.8-2    2016-02-05
##  zoo            1.7-13   2016-05-03
##  source                                    
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.2.3)                            
##  cran (@0.4.2)                             
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.3.1)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.2.3)                            
##  CRAN (R 3.3.2)                            
##  cran (@0.6.11)                            
##  CRAN (R 3.2.3)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.3.2)                            
##  cran (@0.2.0)                             
##  CRAN (R 3.2.3)                            
##  CRAN (R 3.3.1)                            
##  CRAN (R 3.3.2)                            
##  cran (@2.2.1)                             
##  CRAN (R 3.3.2)                            
##  Github (ukgovdatascience/govstyle@8cd6098)
##  CRAN (R 3.2.3)                            
##  CRAN (R 3.2.3)                            
##  CRAN (R 3.2.3)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.2.3)                            
##  CRAN (R 3.2.3)                            
##  cran (@1.2)                               
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.2.3)                            
##  CRAN (R 3.3.1)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.2.3)                            
##  CRAN (R 3.2.3)                            
##  CRAN (R 3.2.5)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.2.3)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.2.3)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.2.3)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.2.5)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.2.3)                            
##  cran (@1.6.12)                            
##  CRAN (R 3.2.3)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.2.3)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.2.3)                            
##  cran (@0.12.9)                            
##  CRAN (R 3.2.3)                            
##  CRAN (R 3.2.3)                            
##  CRAN (R 3.3.2)                            
##  cran (@1.4.2)                             
##  CRAN (R 3.3.2)                            
##  Github (ivyleavedtoadflax/rmd2md@3434815) 
##  CRAN (R 3.2.1)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.2.3)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.2.3)                            
##  cran (@1.2.0)                             
##  CRAN (R 3.3.1)                            
##  CRAN (R 3.3.2)                            
##  CRAN (R 3.2.3)                            
##  cran (@0.6.1)                             
##  cran (@1.1.1)                             
##  CRAN (R 3.2.3)                            
##  CRAN (R 3.3.2)                            
##  cran (@1.1.1)                             
##  CRAN (R 3.2.3)                            
##  CRAN (R 3.3.2)