I’m using Hadley’s purrr package more and more, and its beginning to change the way I program in R, much like dplyr did.
map()
is a great function and one of its incarnations that I really like is map_df()
.This will apply a function to elements of a list, and then bind the dataframes together (assuming they can be combined).
It also allows us to specify additional columns for our final dataframe which takes the names of the elements of the list.
Here’s a simple example:
library(dplyr)
library(purrr)
library(ggplot2)
# See ?accumulate for my inspiration for this example
rerun(6, rnorm(100)) %>%
map_df(
~ data_frame(x = .x),
.id = "dist"
) %>%
ggplot +
aes(
x = x,
fill = dist
) +
geom_histogram() +
facet_wrap(
~dist,
ncol = 2
)
So what’s going on here?
- First I create a list of six univariate normal distributions using
rnorm()
. - Passing this to
map_df()
with the function (.f
) argument asdata_frame(x = .x)
will convert each of these vectors of variables into a dataframe, naming the column of variables asx
. map_df()
essentially does abind_rows()
and outputs a single dataframe, adding a new variabledist
which takes the names of the elements of the list, outputting a long dataframe.- Finally this is passed to
ggplot()
which creates histograms withgeom_histogram()
, and facets them into six panes withfacet_wrap()
.
map_df
## function (.x, .f, ..., .id = NULL)
## {
## .f <- as_function(.f, ...)
## res <- map(.x, .f, ...)
## dplyr::bind_rows(res, .id = .id)
## }
## <environment: namespace:purrr>
It’s a very simply function, but nonetheless very useful.
This is a fairly contrived example, but I find myself using these map()
functions a lot recently - especially when training models, and working with lists or dataframes full of dataframes.
sessionInfo()
## R version 3.3.0 (2016-05-03)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.4 LTS
##
## locale:
## [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
## [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
## [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets base
##
## other attached packages:
## [1] ggplot2_2.1.0 purrr_0.2.1 dplyr_0.4.3 testthat_0.8.1
## [5] knitr_1.13
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.5 digest_0.6.4 assertthat_0.1 plyr_1.8.3
## [5] grid_3.3.0 R6_2.1.2 gtable_0.2.0 DBI_0.4-1
## [9] formatR_1.4 magrittr_1.5 scales_0.4.0 evaluate_0.9
## [13] lazyeval_0.1.10 labeling_0.3 tools_3.3.0 stringr_0.6.2
## [17] munsell_0.4.3 parallel_3.3.0 colorspace_1.2-6 methods_3.3.0