Generate variables in a table

gen_table_data(
  N = if (is.null(df)) 400 else NROW(df),
  recipe,
  df = NULL,
  df_keepcols = if (is.null(df)) character() else names(df),
  miss_recipe = NULL
)

Arguments

N

numeric(1). Number of rows to generate. Defaults to 400, or the number of rows in df if provided.

recipe

tibble. A recipe for generating variables into the dataset. See Details.

df

data.frame/tibble. Existing partial data which new variables should be added to, or NULL (the default).

df_keepcols

character. Names of the columns from df to retain in the resulting dataset. Defaults to all columns present in df.

miss_recipe

tibble. A recipe for generating missingness positions, or NULL (the default).

Details

The recipe parameter should be a tibble made up of one or more rows which define variable recipes via the following columns:

variables

(list column as needed) Names of the variables generated by that row. Empty (length-0) entries are not allowed.

dependencies

(list column) Names of variables which must already have been populated for the variables in this row to be synthesized.

func

(list column) A character value which can be used to look up a function, or the function object itself, which accepts n, .df, and ... and returns either an atomic vector of length n, or a data.frame/tibble with n rows

func_args

(list column) a list of arguments which should be passed to func in addition to n and .df
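As an illustration of the func contract described above, here is a minimal sketch of two generator functions in base R. The names rand_age and rand_visit and their arguments are hypothetical and are not part of the package; the do.call() line only illustrates how func_args could be combined with n and .df, and is an assumption about the calling convention rather than the package's actual code.

```r
# Hypothetical generator returning an atomic vector of length n.
# It accepts n, .df, and ... as the func column requires; min_age and
# max_age are illustrative extra arguments that would come from func_args.
rand_age <- function(n, .df, min_age = 18, max_age = 90, ...) {
  min_age + sample.int(max_age - min_age + 1, n, replace = TRUE) - 1
}

# Hypothetical generator returning a data.frame with n rows, as a recipe
# row that lists two names in its variables entry might use.
rand_visit <- function(n, .df, ...) {
  data.frame(
    visit = sample(c("baseline", "followup"), n, replace = TRUE),
    visit_day = sample.int(100, n, replace = TRUE),
    stringsAsFactors = FALSE
  )
}

# Assumed calling convention: n and .df first, then the row's func_args.
partial <- data.frame(id = 1:5)
func_args <- list(min_age = 30, max_age = 40)
ages <- do.call(rand_age, c(list(n = 5, .df = partial), func_args))
visits <- rand_visit(3, partial)
```

Including ... in the signature lets a generator ignore arguments it does not use, which keeps one function reusable across several recipe rows.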

The algorithm for synthesizing the table from the recipe is as follows:

  1. Recipe rows which have no dependencies are processed in the order they appear in the recipe tibble, and the columns they generate are appended to the dataset under the names given in variables.

  2. Recipe rows containing dependencies are checked in the order they appear in the recipe table for whether their dependencies are met, and if so data is synthesized for the corresponding variables and added to the dataset. This step is repeated until all recipe rows have been resolved, or until a full pass through the unresolved recipe rows does not lead to any new data synthesis.

  3. After all data synthesis is complete, columns are reordered so that any columns of df come first, followed by newly synthesized variables in the order they appear in the recipe table's variables column.
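
The three steps above can be sketched in base R roughly as follows. This is an illustrative reimplementation under stated assumptions, not the package's code: recipe rows are plain lists instead of tibble rows, no existing df is merged in, and the helper names run_row and resolve_recipe are hypothetical. Stopping with an error when a pass makes no progress is also an assumption; the text above only says that the loop ends.

```r
# Run one recipe row and add its result to the list of columns so far.
run_row <- function(row, n, out) {
  .df <- as.data.frame(out, stringsAsFactors = FALSE)
  res <- do.call(row$func, c(list(n = n, .df = .df), row$func_args))
  if (is.data.frame(res)) {
    out[row$variables] <- as.list(res)   # several variables from one row
  } else {
    out[[row$variables]] <- res          # a single atomic vector
  }
  out
}

resolve_recipe <- function(n, rows) {
  has_deps <- vapply(rows, function(r) length(r$dependencies) > 0, logical(1))
  out <- list()

  # Step 1: rows without dependencies, in recipe order.
  for (row in rows[!has_deps]) out <- run_row(row, n, out)

  # Step 2: repeated passes over the unresolved rows.
  pending <- rows[has_deps]
  while (length(pending) > 0) {
    resolved <- logical(length(pending))
    for (i in seq_along(pending)) {
      if (all(pending[[i]]$dependencies %in% names(out))) {
        out <- run_row(pending[[i]], n, out)
        resolved[i] <- TRUE
      }
    }
    # Assumption: a pass with no progress is treated as an error here.
    if (!any(resolved)) stop("some recipe rows have unmet dependencies")
    pending <- pending[!resolved]
  }

  # Step 3: reorder columns to follow the recipe's variables order.
  vars <- unlist(lapply(rows, `[[`, "variables"))
  as.data.frame(out[vars], stringsAsFactors = FALSE)
}

# The dependent row is listed first on purpose; it is still resolved
# after "id" and then placed first by the final reordering.
rows <- list(
  list(variables = "specialid", dependencies = "id",
       func = function(n, .df, ...) paste0("special-", .df$id),
       func_args = NULL),
  list(variables = "id", dependencies = character(0),
       func = function(n, .df, ...) seq_len(n),
       func_args = NULL)
)
sketch <- resolve_recipe(3, rows)
```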

Examples

library(tibble)
dat <- cbind(model = row.names(mtcars), as_tibble(mtcars))
recipe <- tribble(
  ~variables,  ~dependencies, ~func,   ~func_args,
  "id",        no_deps,       "seq_n", NULL,
  "specialid", "id",          function(n, .df) paste0("special-", .df$id), NULL
)
gen_table_data(10, recipe)
#>    id  specialid
#> 1   1  special-1
#> 2   2  special-2
#> 3   3  special-3
#> 4   4  special-4
#> 5   5  special-5
#> 6   6  special-6
#> 7   7  special-7
#> 8   8  special-8
#> 9   9  special-9
#> 10 10 special-10