Build a single-level, grouped, or longitudinal simulation design and evaluate generator specifications into one shared data table.
Usage
simulate_data(
  generators,
  n = NULL,
  n_groups = NULL,
  n_per_group = NULL,
  group_id = "group_id",
  obs_id = "obs_id",
  time_id = NULL,
  time_values = NULL,
  time_truncate = TRUE,
  seed = NULL
)
Arguments
- generators
Non-empty named list of generator specifications created by gen_*() functions.
- n
Total number of observations for a single-level design. For grouped designs, n may be supplied instead of n_per_group; rows are distributed as evenly as possible across groups.
- n_groups
Number of groups. When NULL, a single-level design is created.
- n_per_group
Group sizes. May be a scalar/vector, a function of n_groups, or a count-distribution list such as list(distribution = "poisson", lambda = 4, minimum = 1).
- group_id, obs_id
Character scalars naming the group and observation index columns.
- time_id
Optional character scalar naming a time column.
- time_values
Optional time values. May be a vector shared by units or a function called per unit/group.
- time_truncate
Logical; when TRUE, a vector time_values is truncated to each group's size. When FALSE, every group's size must equal the length of time_values.
- seed
Optional random seed. When supplied, the caller's existing random seed is restored after simulation.
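The seed behaviour described above can be illustrated in base R. The following is a minimal sketch, not the package's implementation: with_local_seed() is a hypothetical helper introduced here for illustration, and it assumes that "restoring the caller's seed" means putting back the global .Random.seed as it was before the call.

```r
# Hypothetical helper (not part of the package): evaluate `expr` under a
# fixed seed, then put the caller's RNG state back the way it was.
with_local_seed <- function(seed, expr) {
  had_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  old_seed <- if (had_seed) get(".Random.seed", envir = globalenv()) else NULL
  on.exit({
    if (had_seed) {
      # Restore the state that was active before the call
      assign(".Random.seed", old_seed, envir = globalenv())
    } else if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      # No state existed before the call, so remove the one we created
      rm(".Random.seed", envir = globalenv())
    }
  }, add = TRUE)
  set.seed(seed)
  expr  # lazily evaluated here, after the seed has been set
}

# Draw once from the caller's stream
set.seed(1)
a <- runif(1)

# Reset, run a seeded computation in between, then draw again
set.seed(1)
draws <- with_local_seed(10, rnorm(3))
b <- runif(1)

# TRUE: the seeded call did not advance the caller's stream
same_stream <- identical(a, b)
```

Under this assumption, a seeded simulation is reproducible on its own while leaving any surrounding random draws in the calling script unaffected.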
Value
An object of class mlsim_data, a list with:
- data
A data.table::data.table() containing design columns and generated variables.
- metadata
Simulation design metadata, seed, index metadata, and generator order.
- generator_specs
Generator specifications with simulator closures removed.
- generator_metadata
Per-generator metadata recorded during simulation.
Details
Generators are evaluated in list order. Each generator can depend on columns
produced by earlier generators through the simulation context. Generated
column names must be unique, and they must remain unique after being passed
through make.names().
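To see why two distinct names can still clash, here is a small base R sketch. It only demonstrates how make.names() maps different inputs to the same syntactic name; the column names are made up for illustration, and how simulate_data() reports such a clash is not shown here.

```r
# Hypothetical column names: the first two differ as raw strings
proposed <- c("score 1", "score.1", "y")

# make.names() replaces characters that are invalid in R names with "."
sanitized <- make.names(proposed)
sanitized
#> [1] "score.1" "score.1" "y"

# Distinct before sanitizing, duplicated afterwards
unique_before <- anyDuplicated(proposed) == 0
collides_after <- anyDuplicated(sanitized) > 0
```

Choosing names that are already syntactic R names (letters, digits, "." and "_", not starting with a digit) avoids this kind of collision.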
Examples
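# Grouped design: 3 groups with 2 observations each.
# group_x is generated at level2, so it is constant within a group.
sim <- simulate_data(
  n_groups = 3,
  n_per_group = 2,
  seed = 10,
  generators = list(
    group_x = gen_normal("group_x", level = "level2", mean = 0, sd = 1),
    y = gen_poisson("y", lambda = 2)
  )
)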
sim$data
#>    group_id obs_id     group_x     y
#>       <int>  <int>       <num> <int>
#> 1:        1      1  0.01874617     1
#> 2:        1      2  0.01874617     1
#> 3:        2      1 -0.18425254     2
#> 4:        2      2 -0.18425254     2
#> 5:        3      1 -1.37133055     2
#> 6:        3      2 -1.37133055     2
summary(sim)
#> <summary.mlsim_data>
#>
#> Design:
#> n n_cols n_generated_cols n_groups group_id group_size_min
#> <int> <int> <int> <int> <char> <int>
#> 6 4 2 3 group_id 2
#> group_size_median group_size_max obs_id time_id seed n_generators
#> <int> <int> <char> <char> <int> <int>
#> 2 2 obs_id <NA> 10 2
#>
#> Generators:
#> generator distribution level vars n_vars parameter_level parameter_count
#> <char> <char> <char> <char> <int> <char> <int>
#> group_x normal level2 group_x 1 group 3
#> y poisson single y 1 row 6
#> has_row_parameters has_group_parameters has_fixed_parameters has_random_cov
#> <lgcl> <lgcl> <lgcl> <lgcl>
#> FALSE TRUE FALSE FALSE
#> TRUE FALSE FALSE FALSE
#> has_random_effects has_residuals has_scale_model has_formula has_ar_terms
#> <lgcl> <lgcl> <lgcl> <lgcl> <lgcl>
#> FALSE FALSE FALSE FALSE FALSE
#> FALSE FALSE FALSE FALSE FALSE
#> has_composition has_custom_output
#> <lgcl> <lgcl>
#> FALSE FALSE
#> FALSE FALSE
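# Longitudinal design: supplying time_id adds a time column ("visit")
# and records the design as longitudinal in the index metadata.
longitudinal <- simulate_data(
  n_groups = 2,
  n_per_group = 3,
  time_id = "visit",
  generators = list(x = gen_normal("x"))
)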
longitudinal$metadata$index
#> $longitudinal
#> [1] TRUE
#>
#> $unit_id
#> [1] "group_id"
#>
#> $obs_id
#> [1] "obs_id"
#>
#> $time_id
#> [1] "visit"
#>
#> $order
#> [1] "group_id" "visit"
#>
#> $time_type
#> [1] "integer"
#>
#> $time_unique_within_unit
#> [1] TRUE
#>