Introduction

The respectables package provides a framework to

  • create recipes that define how to derive interdependent variables
  • create variables using generating functions

For this vignette we load the respectables and dplyr packages:

library(respectables)
## Loading required package: tibble
## Registered S3 methods overwritten by 'tibble':
##   method     from  
##   format.tbl pillar
##   print.tbl  pillar
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Note that the respectables package is still under development.

Simple Dataset

Let's start by defining a simple dataset dm with a single variable id.

gen_id <- function(n) {
  paste0("id-", 1:n)
}

dm_recipe <- tribble(
  ~variables, ~dependencies,  ~func,   ~func_args,
  "id",       no_deps,        gen_id,  no_args
)

gen_table_data(N = 2, recipe = dm_recipe)
##     id
## 1 id-1
## 2 id-2

Note that the argument n is supplied by respectables; in this case it is equal to N.
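Because a generating function is an ordinary R function, it can be checked on its own before it goes into a recipe. A minimal sketch, calling gen_id by hand with the value that respectables would pass as n:

```r
gen_id <- function(n) {
  paste0("id-", 1:n)
}

# Call the generating function directly, as if N = 3 had been requested.
ids <- gen_id(3)
ids
## [1] "id-1" "id-2" "id-3"
```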

We can reuse the recipe dm_recipe to create a different dataset:

gen_table_data(N = 5, recipe = dm_recipe)
##     id
## 1 id-1
## 2 id-2
## 3 id-3
## 4 id-4
## 5 id-5

Adding Multiple Variables

We will now add the variables height and weight to the dm recipe:

gen_hw <- function(n) {
  bmi <- 17 + abs(rnorm(n, mean = 3, sd = 3))

  data.frame(height = runif(n, min = 1.5, max = 1.95)) %>%
    mutate(weight = bmi * height^2)
}

dm_recipe <- tribble(
  ~variables,               ~dependencies,  ~func,   ~func_args,
  "id",                     no_deps,        gen_id,  no_args,
   c("height", "weight"),   no_deps,        gen_hw,  no_args
)

gen_table_data(N = 2, recipe = dm_recipe)
##     id   height   weight
## 1 id-1 1.776615 59.15417
## 2 id-2 1.561853 43.35896

Note that gen_hw uses random number generators, hence rerunning gen_table_data gives different values:

gen_table_data(N = 2, recipe = dm_recipe)
##     id   height   weight
## 1 id-1 1.941219 71.03177
## 2 id-2 1.774771 62.25835
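If reproducible values are needed, one option is to fix the seed of R's random number generator before generating data. The sketch below uses a base R variant of gen_hw and calls it directly, so it does not depend on respectables. We assume, but have not verified, that calling set.seed before gen_table_data has the same effect.

```r
# Base R variant of gen_hw (no dplyr needed), drawing in the same order.
gen_hw <- function(n) {
  bmi <- 17 + abs(rnorm(n, mean = 3, sd = 3))
  height <- runif(n, min = 1.5, max = 1.95)
  data.frame(height = height, weight = bmi * height^2)
}

set.seed(123)
first <- gen_hw(2)

set.seed(123)
second <- gen_hw(2)

# Same seed, same draws.
identical(first, second)
## [1] TRUE
```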

Variable Dependencies

We will now continue our dm example by defining the variable age, which for illustrative purposes depends on height.

gen_age <- function(n, .df) {
  .df %>%
    transmute(age = height*25)
}

dm_recipe <- tribble(
  ~variables,               ~dependencies,  ~func,   ~func_args,
  "id",                     no_deps,        gen_id,  no_args,
   c("height", "weight"),   no_deps,        gen_hw,  no_args,
  "age",                    "height",       gen_age, no_args
)

gen_table_data(N = 2, recipe = dm_recipe)
##     id   height   weight      age
## 1 id-1 1.870419 62.85484 46.76048
## 2 id-2 1.609769 59.07986 40.24424

Note that respectables creates the arguments n and .df on the fly. It also determines the evaluation order of the variables from the dependency structure. That is, respectables does not guarantee that the recipe rows are evaluated in the order in which they are written.
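One way to derive such an order is to repeatedly pick every row whose dependencies are already available. The sketch below illustrates the idea in base R; it is not the algorithm respectables uses, and resolve_order and its list inputs are hypothetical names introduced only for this example.

```r
# Hypothetical helper, not part of respectables: returns the indices of the
# recipe rows in an order where every dependency is generated before its use.
resolve_order <- function(variables, dependencies) {
  done <- character(0)            # variables generated so far
  pending <- seq_along(variables) # rows not yet scheduled
  ordered <- integer(0)
  while (length(pending) > 0) {
    is_ready <- vapply(
      pending,
      function(i) all(dependencies[[i]] %in% done),
      logical(1)
    )
    ready <- pending[is_ready]
    if (length(ready) == 0) {
      stop("circular or unsatisfiable dependencies")
    }
    ordered <- c(ordered, ready)
    done <- c(done, unlist(variables[ready]))
    pending <- setdiff(pending, ready)
  }
  ordered
}

# age is listed first but depends on height, so its row is scheduled last.
variables <- list("age", "id", c("height", "weight"))
dependencies <- list("height", character(0), character(0))
row_order <- resolve_order(variables, dependencies)
row_order
## [1] 2 3 1
```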

Configurable Arguments

If we want configurable variable-generating functions, we can specify their arguments in the recipe:

gen_color <- function(n, colors = grDevices::colors()) {
  data.frame(color = sample(colors, n, replace = TRUE))
}

dm_recipe <- tribble(
  ~variables,               ~dependencies,  ~func,   ~func_args,
  "id",                     no_deps,        gen_id,     no_args,
   c("height", "weight"),   no_deps,        gen_hw,     no_args,
  "age",                    "height",       gen_age,    no_args,
  "color",                  no_deps,        gen_color,  list(colors = c("blue", "red"))
)

gen_table_data(N = 4, recipe = dm_recipe)
##     id   height   weight      age color
## 1 id-1 1.829431 57.53877 45.73578  blue
## 2 id-2 1.595231 49.75982 39.88078  blue
## 3 id-3 1.865943 66.74266 46.64857  blue
## 4 id-4 1.835725 57.99518 45.89312  blue
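To see how the entries of func_args might reach the generating function, the sketch below combines a framework-supplied n with the recipe's argument list using do.call. This is an assumption about the mechanism for illustration only; the internals of respectables may differ.

```r
gen_color <- function(n, colors = grDevices::colors()) {
  data.frame(color = sample(colors, n, replace = TRUE))
}

# Assumption: the framework-supplied n and the recipe's func_args are
# combined into one argument list and passed to func.
func_args <- list(colors = c("blue", "red"))
res <- do.call(gen_color, c(list(n = 4), func_args))

nrow(res)
## [1] 4
all(res$color %in% c("blue", "red"))
## [1] TRUE
```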

Injecting Missing Data

The miss_recipe argument of gen_table_data injects missing values as the last step of data generation: first the data generation recipe is executed, then the missing data is injected. Hence all variables are available at execution time, and the .df argument is supplied to func. In the example below, func returns a logical vector, and age is set to NA in the rows where that vector is TRUE.

gen_alternate_na <- function(.df) {
  n <- nrow(.df)
  rep(c(TRUE, FALSE), length.out = n)
}

dm_na_recipe <- tribble(
  ~variables,       ~func,             ~func_args,
  "age",            gen_alternate_na,  no_args
)

gen_table_data(N = 4, recipe = dm_recipe, miss_recipe = dm_na_recipe)
##     id   height   weight      age color
## 1 id-1 1.698284 69.90507       NA  blue
## 2 id-2 1.530274 41.59611 38.25684  blue
## 3 id-3 1.933799 94.47415       NA   red
## 4 id-4 1.871555 75.01149 46.78889   red

Note that this currently only works with one variable per row in the missing recipe. We are still working on support for more complex missing-data structures.
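The effect of gen_alternate_na can be reproduced by hand on a small data frame. The sketch below assumes, consistent with the output above, that TRUE marks the rows whose value is replaced by NA; the data frame df is a made-up stand-in for generated data.

```r
gen_alternate_na <- function(.df) {
  n <- nrow(.df)
  rep(c(TRUE, FALSE), length.out = n)
}

# Small stand-in for generated data (values are made up).
df <- data.frame(id = paste0("id-", 1:4), age = c(42, 38, 48, 46))

# Assumption: TRUE marks the rows whose value is replaced by NA.
mask <- gen_alternate_na(df)
df$age[mask] <- NA
df$age
## [1] NA 38 NA 46
```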

Scaffolding

For this example we create a data frame aseq with the variable seqterm being c("step 1", ..., "step i"), where i is extracted from the variable id.

dm <- gen_table_data(N = 3, recipe = dm_recipe)

# grow dataset
gen_seq <- function(.db) {

  dm <- .db$dm

  ni <- as.numeric(substring(dm$id, 4))

  df_grow <- data.frame(
    id = rep(dm$id, ni),
    seq = sequence(ni)  # 1..ni per id; avoids sapply() simplifying to a matrix
  )

  left_join(dm, df_grow, by = "id")
}

aseq_scf_recipe <- tribble(
  ~foreign_tbl, ~foreign_key, ~func,     ~func_args,
  "dm",         "id",         gen_seq,   no_args
)

gen_seq_term <- function(.df, ...) {
  data.frame(seqterm = paste("step", .df$seq))
}

aseq_recipe <- tribble(
  ~variables,      ~dependencies,  ~func,            ~func_args,
  "seqterm",       "seq",          gen_seq_term,     no_args
)

gen_reljoin_table(joinrec = aseq_scf_recipe, tblrec = aseq_recipe, db = list(dm = dm))
##     id   height   weight      age color seq seqterm
## 1 id-1 1.699037 69.82811 42.47592   red   1  step 1
## 2 id-2 1.670821 55.50278 41.77051   red   1  step 1
## 3 id-2 1.670821 55.50278 41.77051   red   2  step 2
## 4 id-3 1.873748 64.04400 46.84370  blue   1  step 1
## 5 id-3 1.873748 64.04400 46.84370  blue   2  step 2
## 6 id-3 1.873748 64.04400 46.84370  blue   3  step 3

The steps here are:

  1. use joinrec to grow a new data frame, say A, possibly from db
  2. call gen_table_data with the following arguments
    • A for df
    • tblrec for recipe
    • miss_recipe, forwarded as is

Note that this functionality is under development. Currently aseq_scf_recipe must be a tibble with exactly one row, and foreign_key is not yet used.
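The growing step itself needs nothing beyond base R. The sketch below rebuilds the id, seq and seqterm columns for three ids without respectables, to show which rows the scaffolding is expected to produce; it leaves out the join back to dm.

```r
ids <- paste0("id-", 1:3)

# Number of rows to create per id, taken from the numeric part of the id.
ni <- as.numeric(substring(ids, 4))

df_grow <- data.frame(
  id = rep(ids, ni),
  seq = sequence(ni)
)
df_grow$seqterm <- paste("step", df_grow$seq)
df_grow
##     id seq seqterm
## 1 id-1   1  step 1
## 2 id-2   1  step 1
## 3 id-2   2  step 2
## 4 id-3   1  step 1
## 5 id-3   2  step 2
## 6 id-3   3  step 3
```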

Comparison with dplyr

This section needs further work.

Let’s map the following code onto respectables recipes:

iris %>%
  mutate(SPECIES = toupper(Species)) %>%
  head()
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species SPECIES
## 1          5.1         3.5          1.4         0.2  setosa  SETOSA
## 2          4.9         3.0          1.4         0.2  setosa  SETOSA
## 3          4.7         3.2          1.3         0.2  setosa  SETOSA
## 4          4.6         3.1          1.5         0.2  setosa  SETOSA
## 5          5.0         3.6          1.4         0.2  setosa  SETOSA
## 6          5.4         3.9          1.7         0.4  setosa  SETOSA

There are multiple ways to map this to the respectables framework. One of them is:

gen_toupper <- function(varname, .df, ...) {
   toupper(.df[[varname]])
}

rcp <- tribble(
    ~variables, ~dependencies,  ~func,          ~func_args,
    "SPECIES",  "Species",       gen_toupper,   list(varname = "Species")
)

gen_table_data(recipe = rcp, df = iris) %>%
  head()
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species SPECIES
## 1          5.1         3.5          1.4         0.2  setosa  SETOSA
## 2          4.9         3.0          1.4         0.2  setosa  SETOSA
## 3          4.7         3.2          1.3         0.2  setosa  SETOSA
## 4          4.6         3.1          1.5         0.2  setosa  SETOSA
## 5          5.0         3.6          1.4         0.2  setosa  SETOSA
## 6          5.4         3.9          1.7         0.4  setosa  SETOSA

Note that in gen_toupper we use the ellipsis ... to absorb unused arguments such as n.
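The role of the ellipsis can be checked by calling the function directly with an extra argument. In the sketch below, gen_strict is a hypothetical variant without ... that is included only for contrast.

```r
gen_toupper <- function(varname, .df, ...) {
  toupper(.df[[varname]])
}

# The extra argument n is absorbed by ... and ignored.
up <- gen_toupper(varname = "Species", .df = head(iris, 3), n = 3)
up
## [1] "SETOSA" "SETOSA" "SETOSA"

# Hypothetical variant without ...: the same call fails with an
# "unused argument" error, which we catch here.
gen_strict <- function(varname, .df) {
  toupper(.df[[varname]])
}
res <- tryCatch(
  gen_strict(varname = "Species", .df = head(iris, 3), n = 3),
  error = function(e) "error"
)
res
## [1] "error"
```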