map_df()

I’m using Hadley’s purrr package more and more, and it’s beginning to change the way I program in R, much like dplyr did.

map() is a great function, and one of its incarnations that I really like is map_df(). This applies a function to each element of a list and then binds the resulting dataframes together (assuming they can be combined). Through the .id argument it also lets us add a column to the final dataframe recording which list element each row came from, taken from the names of the list elements (or their positions, if the list is unnamed).

Here’s a simple example:

library(dplyr)
library(purrr)
library(ggplot2)
 
# See ?accumulate for my inspiration for this example
 
rerun(6, rnorm(100)) %>%
  map_df(
    ~ data_frame(x = .x), 
    .id = "dist"
  ) %>%
  ggplot +
  aes(
    x = x,
    fill = dist
  ) +
  geom_histogram() +
  facet_wrap(
    ~dist,
    ncol = 2 
  )

[Plot: six histograms, one facet per value of dist, arranged in two columns]

So what’s going on here?

First I create a list of six samples, each of 100 draws from a standard normal distribution, using rerun() and rnorm().
Passing this to map_df() with the function (.f) argument as data_frame(x = .x) converts each of these vectors of values into a dataframe, naming the column of values x.
map_df() essentially does a bind_rows() and outputs a single long dataframe, adding a new variable dist that identifies the list element each row came from. Because rerun() returns an unnamed list, dist here holds each element's position rather than a name; with a named list it would hold the names.
Finally this is passed to ggplot(), which creates histograms with geom_histogram() and facets them into six panels with facet_wrap().
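
To make the role of .id a bit more concrete, here is a minimal sketch comparing a named and an unnamed list. The list names (a, b, c), the sample size and the seed are made up for illustration, and I use base data.frame() rather than data_frame() so that it should run on both older and current versions of the packages.

```r
library(purrr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# A named list of three small samples; the names are hypothetical
samples <- list(a = rnorm(5), b = rnorm(5), c = rnorm(5))

# With a named list, the .id column is filled with the element names
named <- map_df(samples, ~ data.frame(x = .x), .id = "dist")
unique(named$dist)

# With the names removed, .id falls back to the position of each element
unnamed <- map_df(unname(samples), ~ data.frame(x = .x), .id = "dist")
unique(unnamed$dist)
```

Either way the result has one row per value and one extra column, which is exactly the long format that facet_wrap() wants.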

map_df

## function (.x, .f, ..., .id = NULL) 
## {
##     .f <- as_function(.f, ...)
##     res <- map(.x, .f, ...)
##     dplyr::bind_rows(res, .id = .id)
## }
## <environment: namespace:purrr>
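
Going by the source printed above, map_df() should give the same result as calling map() and then bind_rows() by hand. A small sketch to check this, with made-up list names and an arbitrary seed:

```r
library(purrr)
library(dplyr)

set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Two small named samples; the names are hypothetical
samples <- list(a = rnorm(3), b = rnorm(3))

# One step: map_df() applies the function and binds the rows
one_step <- map_df(samples, ~ data.frame(x = .x), .id = "dist")

# Two steps: the same thing spelled out with map() and bind_rows()
two_step <- samples %>%
  map(~ data.frame(x = .x)) %>%
  bind_rows(.id = "dist")

# Compare the two results
all.equal(one_step, two_step)
```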

It’s a very simple function, but nonetheless very useful. This is a fairly contrived example, but I have found myself using these map() functions a lot recently, especially when training models and when working with lists or dataframes full of dataframes.
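
As a rough illustration of the model-training case, the sketch below splits the built-in mtcars data by cylinder count, fits the same linear model to each piece, and collects the coefficients into one dataframe. The model (mpg ~ wt) and the helper fit_one() are my own choices for the sake of the example, not anything prescribed by purrr.

```r
library(purrr)

# One dataframe per cylinder count; split() names the list "4", "6", "8"
by_cyl <- split(mtcars, mtcars$cyl)

# Hypothetical helper: fit a simple model and return a one-row dataframe
fit_one <- function(df) {
  fit <- lm(mpg ~ wt, data = df)
  data.frame(
    intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    n         = nrow(df)
  )
}

# One row per group, with the group label stored in the cyl column
coefs <- map_df(by_cyl, fit_one, .id = "cyl")
coefs
```

Because each call returns a one-row dataframe, the combined result stays tidy and is easy to pass on to dplyr or ggplot2.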

sessionInfo()

## R version 3.3.0 (2016-05-03)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.4 LTS
## 
## locale:
##  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
##  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
##  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  base     
## 
## other attached packages:
## [1] ggplot2_2.1.0  purrr_0.2.1    dplyr_0.4.3    testthat_0.8.1
## [5] knitr_1.13    
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.5      digest_0.6.4     assertthat_0.1   plyr_1.8.3      
##  [5] grid_3.3.0       R6_2.1.2         gtable_0.2.0     DBI_0.4-1       
##  [9] formatR_1.4      magrittr_1.5     scales_0.4.0     evaluate_0.9    
## [13] lazyeval_0.1.10  labeling_0.3     tools_3.3.0      stringr_0.6.2   
## [17] munsell_0.4.3    parallel_3.3.0   colorspace_1.2-6 methods_3.3.0

Matt Upson
