Create expanded grid of valid task ID and output type value combinations
Source: R/expand_model_out_grid.R
expand_model_out_grid.Rd
Usage
expand_model_out_grid(
  config_tasks,
  round_id,
  required_vals_only = FALSE,
  all_character = FALSE,
  output_type_id_datatype = c("from_config", "auto", "character", "double", "integer",
    "logical", "Date"),
  as_arrow_table = FALSE,
  bind_model_tasks = TRUE,
  include_sample_ids = FALSE,
  compound_taskid_set = NULL,
  output_types = NULL,
  derived_task_ids = NULL
)
Arguments
- config_tasks
A list version of the contents of a hub's tasks.json config file, accessed through the "config_tasks" attribute of a <hub_connection> object or the function hubUtils::read_config().
- round_id
Character string. Round identifier. If the round is set to round_id_from_variable: true, IDs are values of the task ID defined in the round's round_id property of config_tasks. Otherwise, it should match the round's round_id value in the config. Ignored if the hub contains only a single round.
- required_vals_only
Logical. Whether to return only combinations of required task ID and related output type ID values.
- all_character
Logical. Whether to return all columns as character.
- output_type_id_datatype
Character string. One of "from_config", "auto", "character", "double", "integer", "logical", "Date". Defaults to "from_config", which uses the setting in the output_type_id_datatype property in the tasks.json config file if available. If the property is not set in the config, the argument falls back to "auto", which determines the output_type_id data type automatically from the tasks.json config file as the simplest data type required to represent all output type ID values across all output types in the hub. Other data type values can be used to override automatic determination. Note that attempting to coerce output_type_id to a data type that is not valid for the data (e.g. trying to coerce "character" values to "double") will likely result in an error or potentially unexpected behaviour, so use with care.
- as_arrow_table
Logical. Whether to return an arrow table. Defaults to FALSE.
- bind_model_tasks
Logical. Whether to bind expanded grids of values from multiple modeling tasks into a single tibble/arrow table or return a list.
- include_sample_ids
Logical. Whether to include sample identifiers in the output_type_id column.
- compound_taskid_set
List of character vectors, one for each modeling task in the round. Can be used to override the compound task ID set defined in the config. If NULL is provided for a given modeling task, a compound task ID set of all task IDs is used.
- output_types
Character vector of output type names to include. Use to subset grids to specific output types.
- derived_task_ids
Character vector of derived task ID names (task IDs whose values depend on other task IDs) to ignore. Columns for such task IDs will contain NAs.
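The "auto" option picks the simplest data type able to represent every output type ID value in the config. The base R sketch below illustrates that idea only. It assumes the values have already been parsed from tasks.json into R values and assumes the type ordering logical < integer < double < character. simplest_type() is a hypothetical helper, not a hubUtils or hubValidations function, and the package's own logic may differ (for example, it does not cover "Date").

```r
# Hypothetical helper (not part of hubUtils/hubValidations): returns the
# simplest R type able to represent all values, assuming the ordering
# logical < integer < double < character.
type_hierarchy <- c("logical", "integer", "double", "character")

simplest_type <- function(values) {
  # typeof() of each parsed output type ID value; NA alone is "logical"
  types <- vapply(values, typeof, character(1))
  # The most general type present determines the result
  type_hierarchy[max(match(types, type_hierarchy))]
}

# Illustrative values: a mean output type (NA) plus quantile levels
simplest_type(list(NA, 0.01, 0.5, 0.99))
#> [1] "double"
# Mixing pmf categories with quantile levels requires character
simplest_type(list("large_decrease", "stable", 0.5))
#> [1] "character"
```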
Value
If bind_model_tasks = TRUE (default), a tibble or arrow table containing all possible task ID and related output type ID value combinations. If bind_model_tasks = FALSE, a list containing a tibble or arrow table for each modeling task in the round.
Columns are coerced to data types according to the hub schema, unless all_character = TRUE. If all_character = TRUE, all columns are returned as character, which can be faster when large expanded grids are expected. If required_vals_only = TRUE, values are limited to the combinations of required values only.
Note that if required_vals_only = TRUE and an optional output type is requested through output_types, a zero-row grid will be returned. If all output types are requested, however (i.e. when output_types = NULL), and they are all optional, a grid of required task ID values only will be returned.
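Conceptually, the grid is a Cartesian product of task ID values and output type ID values, and required_vals_only = TRUE restricts each dimension to its required values. The base R sketch below illustrates only that idea with expand.grid(). The simplified required/optional layout, the values, and the expand_values() helper are all hypothetical; the sketch ignores details the real function handles, such as output types, derived task IDs and sample IDs.

```r
# Hypothetical, simplified task ID config with required and optional values
task_ids <- list(
  horizon  = list(required = 2L, optional = 1L),
  location = list(required = "US", optional = c("01", "02"))
)
# Hypothetical pmf output type IDs, all required
pmf_ids <- list(
  required = c("large_decrease", "decrease", "stable", "increase", "large_increase"),
  optional = NULL
)

# Hypothetical helper: Cartesian product of task ID and output type ID values
expand_values <- function(task_ids, output_type_ids, required_only = FALSE) {
  pick <- function(x) if (required_only) x$required else c(x$required, x$optional)
  vals <- lapply(task_ids, pick)
  vals$output_type_id <- pick(output_type_ids)
  # expand.grid() accepts a single list and varies the first column fastest
  expand.grid(vals, stringsAsFactors = FALSE)
}

nrow(expand_values(task_ids, pmf_ids))                        # 2 x 3 x 5
#> [1] 30
nrow(expand_values(task_ids, pmf_ids, required_only = TRUE))  # 1 x 1 x 5
#> [1] 5
```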
Details
When a round is set to round_id_from_variable: true, the value of the task ID from which round IDs are derived (i.e. the task ID specified in the round_id property of config_tasks) is set to the value of the round_id argument in the returned output.
When sample output types are included in the output and include_sample_ids = TRUE, the output_type_id column contains example sample indexes, which are useful for identifying the compound task ID structure of multivariate sampling distributions in particular, i.e. which combinations of task ID values represent individual samples.
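One way to read the example sample indexes is that rows sharing the same values of the compound task IDs belong to the same sample, so each unique combination of those columns gets one index. The base R sketch below assigns indexes under that assumption. assign_sample_ids() is a hypothetical helper rather than the package's implementation, and the grid values are illustrative. The resulting groupings resemble those in the compound task ID examples below, although the package's row ordering and index numbering may differ.

```r
# Illustrative grid of sample rows (values are hypothetical)
grid <- expand.grid(
  horizon = c(2L, 1L),
  location = c("US", "01", "02"),
  stringsAsFactors = FALSE
)
grid$forecast_date <- "2022-12-26"
grid$target <- "wk ahead inc flu hosp"

# Hypothetical helper: one sample index per unique combination of the
# compound task ID columns, numbered in order of first appearance
assign_sample_ids <- function(df, compound_taskids) {
  # Join the compound task ID values of each row into a single key
  key <- do.call(paste, c(df[compound_taskids], sep = "\u001f"))
  match(key, unique(key))
}

# Locations within a horizon share a sample index
assign_sample_ids(grid, c("forecast_date", "target", "horizon"))
#> [1] 1 2 1 2 1 2
# Coarser set: all rows belong to a single sample
assign_sample_ids(grid, c("forecast_date", "target"))
#> [1] 1 1 1 1 1 1
# All task IDs: every row is its own sample
assign_sample_ids(grid, names(grid))
#> [1] 1 2 3 4 5 6
```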
Examples
hub_con <- hubData::connect_hub(
system.file("testhubs/flusight", package = "hubUtils")
)
config_tasks <- attr(hub_con, "config_tasks")
expand_model_out_grid(config_tasks, round_id = "2023-01-02")
#> # A tibble: 3,132 × 6
#> forecast_date target horizon location output_type output_type_id
#> <date> <chr> <int> <chr> <chr> <chr>
#> 1 2023-01-02 wk flu hosp rate c… 2 US pmf large_decrease
#> 2 2023-01-02 wk flu hosp rate c… 1 US pmf large_decrease
#> 3 2023-01-02 wk flu hosp rate c… 2 01 pmf large_decrease
#> 4 2023-01-02 wk flu hosp rate c… 1 01 pmf large_decrease
#> 5 2023-01-02 wk flu hosp rate c… 2 02 pmf large_decrease
#> 6 2023-01-02 wk flu hosp rate c… 1 02 pmf large_decrease
#> 7 2023-01-02 wk flu hosp rate c… 2 04 pmf large_decrease
#> 8 2023-01-02 wk flu hosp rate c… 1 04 pmf large_decrease
#> 9 2023-01-02 wk flu hosp rate c… 2 05 pmf large_decrease
#> 10 2023-01-02 wk flu hosp rate c… 1 05 pmf large_decrease
#> # ℹ 3,122 more rows
expand_model_out_grid(
config_tasks,
round_id = "2023-01-02",
required_vals_only = TRUE
)
#> # A tibble: 28 × 5
#> forecast_date horizon location output_type output_type_id
#> <date> <int> <chr> <chr> <chr>
#> 1 2023-01-02 2 US pmf large_decrease
#> 2 2023-01-02 2 US pmf decrease
#> 3 2023-01-02 2 US pmf stable
#> 4 2023-01-02 2 US pmf increase
#> 5 2023-01-02 2 US pmf large_increase
#> 6 2023-01-02 2 US quantile 0.01
#> 7 2023-01-02 2 US quantile 0.025
#> 8 2023-01-02 2 US quantile 0.05
#> 9 2023-01-02 2 US quantile 0.1
#> 10 2023-01-02 2 US quantile 0.15
#> # ℹ 18 more rows
# Specifying a round in a hub with multiple round configurations.
hub_con <- hubData::connect_hub(
system.file("testhubs/simple", package = "hubUtils")
)
config_tasks <- attr(hub_con, "config_tasks")
expand_model_out_grid(config_tasks, round_id = "2022-10-01")
#> # A tibble: 5,184 × 6
#> origin_date target horizon location output_type output_type_id
#> <date> <chr> <int> <chr> <chr> <dbl>
#> 1 2022-10-01 wk inc flu hosp 1 US mean NA
#> 2 2022-10-01 wk inc flu hosp 2 US mean NA
#> 3 2022-10-01 wk inc flu hosp 3 US mean NA
#> 4 2022-10-01 wk inc flu hosp 4 US mean NA
#> 5 2022-10-01 wk inc flu hosp 1 01 mean NA
#> 6 2022-10-01 wk inc flu hosp 2 01 mean NA
#> 7 2022-10-01 wk inc flu hosp 3 01 mean NA
#> 8 2022-10-01 wk inc flu hosp 4 01 mean NA
#> 9 2022-10-01 wk inc flu hosp 1 02 mean NA
#> 10 2022-10-01 wk inc flu hosp 2 02 mean NA
#> # ℹ 5,174 more rows
# Later round_id maps to round config that includes additional task ID 'age_group'.
expand_model_out_grid(config_tasks, round_id = "2022-10-29")
#> # A tibble: 25,920 × 7
#> origin_date target horizon location age_group output_type output_type_id
#> <date> <chr> <int> <chr> <chr> <chr> <dbl>
#> 1 2022-10-29 wk inc flu… 1 US 65+ mean NA
#> 2 2022-10-29 wk inc flu… 2 US 65+ mean NA
#> 3 2022-10-29 wk inc flu… 3 US 65+ mean NA
#> 4 2022-10-29 wk inc flu… 4 US 65+ mean NA
#> 5 2022-10-29 wk inc flu… 1 01 65+ mean NA
#> 6 2022-10-29 wk inc flu… 2 01 65+ mean NA
#> 7 2022-10-29 wk inc flu… 3 01 65+ mean NA
#> 8 2022-10-29 wk inc flu… 4 01 65+ mean NA
#> 9 2022-10-29 wk inc flu… 1 02 65+ mean NA
#> 10 2022-10-29 wk inc flu… 2 02 65+ mean NA
#> # ℹ 25,910 more rows
# Coerce all columns to character
expand_model_out_grid(config_tasks,
round_id = "2022-10-29",
all_character = TRUE
)
#> # A tibble: 25,920 × 7
#> origin_date target horizon location age_group output_type output_type_id
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2022-10-29 wk inc flu… 1 US 65+ mean NA
#> 2 2022-10-29 wk inc flu… 2 US 65+ mean NA
#> 3 2022-10-29 wk inc flu… 3 US 65+ mean NA
#> 4 2022-10-29 wk inc flu… 4 US 65+ mean NA
#> 5 2022-10-29 wk inc flu… 1 01 65+ mean NA
#> 6 2022-10-29 wk inc flu… 2 01 65+ mean NA
#> 7 2022-10-29 wk inc flu… 3 01 65+ mean NA
#> 8 2022-10-29 wk inc flu… 4 01 65+ mean NA
#> 9 2022-10-29 wk inc flu… 1 02 65+ mean NA
#> 10 2022-10-29 wk inc flu… 2 02 65+ mean NA
#> # ℹ 25,910 more rows
# Return an arrow table
expand_model_out_grid(config_tasks,
round_id = "2022-10-29",
all_character = TRUE,
as_arrow_table = TRUE
)
#> Table
#> 25920 rows x 7 columns
#> $origin_date <string>
#> $target <string>
#> $horizon <string>
#> $location <string>
#> $age_group <string>
#> $output_type <string>
#> $output_type_id <string>
# Hub with sample output type
config_tasks <- hubUtils::read_config_file(system.file("config", "tasks.json",
package = "hubValidations"
))
expand_model_out_grid(config_tasks,
round_id = "2022-12-26"
)
#> # A tibble: 42 × 6
#> forecast_date target horizon location output_type output_type_id
#> <date> <chr> <int> <chr> <chr> <chr>
#> 1 2022-12-26 wk ahead inc flu h… 2 US sample NA
#> 2 2022-12-26 wk ahead inc flu h… 1 US sample NA
#> 3 2022-12-26 wk ahead inc flu h… 2 01 sample NA
#> 4 2022-12-26 wk ahead inc flu h… 1 01 sample NA
#> 5 2022-12-26 wk ahead inc flu h… 2 02 sample NA
#> 6 2022-12-26 wk ahead inc flu h… 1 02 sample NA
#> 7 2022-12-26 wk ahead inc flu h… 2 US mean NA
#> 8 2022-12-26 wk ahead inc flu h… 1 US mean NA
#> 9 2022-12-26 wk ahead inc flu h… 2 01 mean NA
#> 10 2022-12-26 wk ahead inc flu h… 1 01 mean NA
#> # ℹ 32 more rows
# Include sample IDs
expand_model_out_grid(config_tasks,
round_id = "2022-12-26",
include_sample_ids = TRUE
)
#> # A tibble: 42 × 6
#> forecast_date target horizon location output_type output_type_id
#> <date> <chr> <int> <chr> <chr> <chr>
#> 1 2022-12-26 wk ahead inc flu h… 2 US mean NA
#> 2 2022-12-26 wk ahead inc flu h… 1 US mean NA
#> 3 2022-12-26 wk ahead inc flu h… 2 01 mean NA
#> 4 2022-12-26 wk ahead inc flu h… 1 01 mean NA
#> 5 2022-12-26 wk ahead inc flu h… 2 02 mean NA
#> 6 2022-12-26 wk ahead inc flu h… 1 02 mean NA
#> 7 2022-12-26 wk ahead inc flu h… 2 US sample s1
#> 8 2022-12-26 wk ahead inc flu h… 1 US sample s2
#> 9 2022-12-26 wk ahead inc flu h… 2 01 sample s3
#> 10 2022-12-26 wk ahead inc flu h… 1 01 sample s4
#> # ℹ 32 more rows
# Hub with sample output type and compound task ID structure
config_tasks <- hubUtils::read_config_file(
system.file("config", "tasks-comp-tid.json", package = "hubValidations")
)
expand_model_out_grid(config_tasks,
round_id = "2022-12-26",
include_sample_ids = TRUE
)
#> # A tibble: 42 × 6
#> forecast_date target horizon location output_type output_type_id
#> <date> <chr> <int> <chr> <chr> <chr>
#> 1 2022-12-26 wk ahead inc flu h… 2 US mean NA
#> 2 2022-12-26 wk ahead inc flu h… 1 US mean NA
#> 3 2022-12-26 wk ahead inc flu h… 2 01 mean NA
#> 4 2022-12-26 wk ahead inc flu h… 1 01 mean NA
#> 5 2022-12-26 wk ahead inc flu h… 2 02 mean NA
#> 6 2022-12-26 wk ahead inc flu h… 1 02 mean NA
#> 7 2022-12-26 wk ahead inc flu h… 2 US sample 1
#> 8 2022-12-26 wk ahead inc flu h… 2 01 sample 1
#> 9 2022-12-26 wk ahead inc flu h… 2 02 sample 1
#> 10 2022-12-26 wk ahead inc flu h… 1 US sample 2
#> # ℹ 32 more rows
# Override config compound task ID set
# Create a coarser compound task ID set for the first modeling task, which
# contains samples
expand_model_out_grid(config_tasks,
round_id = "2022-12-26",
include_sample_ids = TRUE,
compound_taskid_set = list(
c("forecast_date", "target"),
NULL
)
)
#> # A tibble: 42 × 6
#> forecast_date target horizon location output_type output_type_id
#> <date> <chr> <int> <chr> <chr> <chr>
#> 1 2022-12-26 wk ahead inc flu h… 2 US mean NA
#> 2 2022-12-26 wk ahead inc flu h… 1 US mean NA
#> 3 2022-12-26 wk ahead inc flu h… 2 01 mean NA
#> 4 2022-12-26 wk ahead inc flu h… 1 01 mean NA
#> 5 2022-12-26 wk ahead inc flu h… 2 02 mean NA
#> 6 2022-12-26 wk ahead inc flu h… 1 02 mean NA
#> 7 2022-12-26 wk ahead inc flu h… 2 US sample 1
#> 8 2022-12-26 wk ahead inc flu h… 1 US sample 1
#> 9 2022-12-26 wk ahead inc flu h… 2 01 sample 1
#> 10 2022-12-26 wk ahead inc flu h… 1 01 sample 1
#> # ℹ 32 more rows
expand_model_out_grid(config_tasks,
round_id = "2022-12-26",
include_sample_ids = TRUE,
compound_taskid_set = list(
NULL,
NULL
)
)
#> # A tibble: 42 × 6
#> forecast_date target horizon location output_type output_type_id
#> <date> <chr> <int> <chr> <chr> <chr>
#> 1 2022-12-26 wk ahead inc flu h… 2 US mean NA
#> 2 2022-12-26 wk ahead inc flu h… 1 US mean NA
#> 3 2022-12-26 wk ahead inc flu h… 2 01 mean NA
#> 4 2022-12-26 wk ahead inc flu h… 1 01 mean NA
#> 5 2022-12-26 wk ahead inc flu h… 2 02 mean NA
#> 6 2022-12-26 wk ahead inc flu h… 1 02 mean NA
#> 7 2022-12-26 wk ahead inc flu h… 2 US sample 1
#> 8 2022-12-26 wk ahead inc flu h… 1 US sample 2
#> 9 2022-12-26 wk ahead inc flu h… 2 01 sample 3
#> 10 2022-12-26 wk ahead inc flu h… 1 01 sample 4
#> # ℹ 32 more rows
# Subset output types
config_tasks <- hubUtils::read_config(
system.file("testhubs", "samples", package = "hubValidations")
)
expand_model_out_grid(config_tasks,
round_id = "2022-10-29",
include_sample_ids = TRUE,
bind_model_tasks = FALSE,
output_types = c("sample", "pmf")
)
#> [[1]]
#> # A tibble: 2,560 × 7
#> reference_date target horizon location target_end_date output_type
#> <date> <chr> <int> <chr> <date> <chr>
#> 1 2022-10-29 wk flu hosp rate… 0 US 2022-10-22 pmf
#> 2 2022-10-29 wk flu hosp rate… 1 US 2022-10-22 pmf
#> 3 2022-10-29 wk flu hosp rate… 2 US 2022-10-22 pmf
#> 4 2022-10-29 wk flu hosp rate… 3 US 2022-10-22 pmf
#> 5 2022-10-29 wk flu hosp rate… 0 01 2022-10-22 pmf
#> 6 2022-10-29 wk flu hosp rate… 1 01 2022-10-22 pmf
#> 7 2022-10-29 wk flu hosp rate… 2 01 2022-10-22 pmf
#> 8 2022-10-29 wk flu hosp rate… 3 01 2022-10-22 pmf
#> 9 2022-10-29 wk flu hosp rate… 0 02 2022-10-22 pmf
#> 10 2022-10-29 wk flu hosp rate… 1 02 2022-10-22 pmf
#> # ℹ 2,550 more rows
#> # ℹ 1 more variable: output_type_id <chr>
#>
#> [[2]]
#> # A tibble: 640 × 7
#> reference_date target horizon location target_end_date output_type
#> <date> <chr> <int> <chr> <date> <chr>
#> 1 2022-10-29 wk inc flu hosp 0 US 2022-10-22 sample
#> 2 2022-10-29 wk inc flu hosp 1 US 2022-10-22 sample
#> 3 2022-10-29 wk inc flu hosp 2 US 2022-10-22 sample
#> 4 2022-10-29 wk inc flu hosp 3 US 2022-10-22 sample
#> 5 2022-10-29 wk inc flu hosp 0 US 2022-10-29 sample
#> 6 2022-10-29 wk inc flu hosp 1 US 2022-10-29 sample
#> 7 2022-10-29 wk inc flu hosp 2 US 2022-10-29 sample
#> 8 2022-10-29 wk inc flu hosp 3 US 2022-10-29 sample
#> 9 2022-10-29 wk inc flu hosp 0 US 2022-11-05 sample
#> 10 2022-10-29 wk inc flu hosp 1 US 2022-11-05 sample
#> # ℹ 630 more rows
#> # ℹ 1 more variable: output_type_id <chr>
#>
expand_model_out_grid(config_tasks,
round_id = "2022-10-29",
include_sample_ids = TRUE,
bind_model_tasks = TRUE,
output_types = "sample"
)
#> # A tibble: 640 × 7
#> reference_date target horizon location target_end_date output_type
#> <date> <chr> <int> <chr> <date> <chr>
#> 1 2022-10-29 wk inc flu hosp 0 US 2022-10-22 sample
#> 2 2022-10-29 wk inc flu hosp 1 US 2022-10-22 sample
#> 3 2022-10-29 wk inc flu hosp 2 US 2022-10-22 sample
#> 4 2022-10-29 wk inc flu hosp 3 US 2022-10-22 sample
#> 5 2022-10-29 wk inc flu hosp 0 US 2022-10-29 sample
#> 6 2022-10-29 wk inc flu hosp 1 US 2022-10-29 sample
#> 7 2022-10-29 wk inc flu hosp 2 US 2022-10-29 sample
#> 8 2022-10-29 wk inc flu hosp 3 US 2022-10-29 sample
#> 9 2022-10-29 wk inc flu hosp 0 US 2022-11-05 sample
#> 10 2022-10-29 wk inc flu hosp 1 US 2022-11-05 sample
#> # ℹ 630 more rows
#> # ℹ 1 more variable: output_type_id <chr>
# Ignore derived task IDs
expand_model_out_grid(config_tasks,
round_id = "2022-10-29",
include_sample_ids = TRUE,
bind_model_tasks = FALSE,
output_types = "sample",
derived_task_ids = "target_end_date"
)
#> [[1]]
#> # A tibble: 0 × 0
#>
#> [[2]]
#> # A tibble: 20 × 7
#> reference_date target horizon location target_end_date output_type
#> <date> <chr> <int> <chr> <date> <chr>
#> 1 2022-10-29 wk inc flu hosp 0 US NA sample
#> 2 2022-10-29 wk inc flu hosp 1 US NA sample
#> 3 2022-10-29 wk inc flu hosp 2 US NA sample
#> 4 2022-10-29 wk inc flu hosp 3 US NA sample
#> 5 2022-10-29 wk inc flu hosp 0 01 NA sample
#> 6 2022-10-29 wk inc flu hosp 1 01 NA sample
#> 7 2022-10-29 wk inc flu hosp 2 01 NA sample
#> 8 2022-10-29 wk inc flu hosp 3 01 NA sample
#> 9 2022-10-29 wk inc flu hosp 0 02 NA sample
#> 10 2022-10-29 wk inc flu hosp 1 02 NA sample
#> 11 2022-10-29 wk inc flu hosp 2 02 NA sample
#> 12 2022-10-29 wk inc flu hosp 3 02 NA sample
#> 13 2022-10-29 wk inc flu hosp 0 04 NA sample
#> 14 2022-10-29 wk inc flu hosp 1 04 NA sample
#> 15 2022-10-29 wk inc flu hosp 2 04 NA sample
#> 16 2022-10-29 wk inc flu hosp 3 04 NA sample
#> 17 2022-10-29 wk inc flu hosp 0 05 NA sample
#> 18 2022-10-29 wk inc flu hosp 1 05 NA sample
#> 19 2022-10-29 wk inc flu hosp 2 05 NA sample
#> 20 2022-10-29 wk inc flu hosp 3 05 NA sample
#> # ℹ 1 more variable: output_type_id <chr>
#>