Create a model output submission file template
Usage
submission_tmpl(
hub_con,
config_tasks,
round_id,
required_vals_only = FALSE,
force_output_types = FALSE,
complete_cases_only = TRUE,
compound_taskid_set = NULL,
output_types = NULL,
derived_task_ids = NULL
)
Arguments
- hub_con
A
<hub_connection
> class object.- config_tasks
a list version of the content's of a hub's
tasks.json
config file, accessed through the"config_tasks"
attribute of a<hub_connection>
object or functionhubUtils::read_config()
.- round_id
Character string. Round identifier. If the round is set to
round_id_from_variable: true
, IDs are values of the task ID defined in the round'sround_id
property ofconfig_tasks
. Otherwise should match round'sround_id
value in config. Ignored if hub contains only a single round.- required_vals_only
Logical. Whether to return only combinations of Task ID and related output type ID required values.
- force_output_types
Logical. Whether to force all output types to be required. If
TRUE
, all output type ID values are treated as required regardless of the value of theis_required
property. Useful for creating grids of required values for optional output types.- complete_cases_only
Logical. If
TRUE
(default) andrequired_vals_only = TRUE
, only rows with complete cases of combinations of required values are returned. IfFALSE
, rows with incomplete cases of combinations of required values are included in the output.- compound_taskid_set
List of character vectors, one for each modeling task in the round. Can be used to override the compound task ID set defined in the config. If
NULL
is provided for a given modeling task, a compound task ID set of all task IDs is used.- output_types
Character vector of output type names to include. Use to subset for grids for specific output types.
- derived_task_ids
Character vector of derived task ID names (task IDs whose values depend on other task IDs) to ignore. Columns for such task ids will contain
NA
s. IfNULL
, defaults to extracting derived task IDs fromconfig_tasks
or theconfig_tasks
attribute ofhub_con
. Seeget_config_derived_task_ids()
for more details.
Value
a tibble template containing an expanded grid of valid task ID and
output type ID value combinations for a given submission round
and output type.
If required_vals_only = TRUE
, values are limited to the combination of required
values only.
Details
For task IDs where all values are optional, by default, columns
are created as columns of NA
s when required_vals_only = TRUE
.
When such columns exist, the function returns a tibble with zero rows, as no
complete cases of required value combinations exists.
(Note that determination of complete cases does excludes valid NA
output_type_id
values in "mean"
and "median"
output types).
To return a template of incomplete required cases, which includes NA
columns, use
complete_cases_only = FALSE
.
To include output types that are optional in the submission template
when required_vals_only = TRUE
and complete_cases_only = FALSE
, use
force_output_types = TRUE
. Use this in combination with sub-setting for
output types you plan to submit via argument output_types
to create a
submission template customised to your submission plans.
Tip: to ensure you create a template with all required output types, it's
a good idea to first run the functions without subsetting or forcing output
types and examing the unique values in output_type
to check which output
types are required.
When sample output types are included in the output, the output_type_id
column contains example sample indexes which are useful for identifying the
compound task ID structure of multivariate sampling distributions in particular,
i.e. which combinations of task ID values represent individual samples.
When a round is set to round_id_from_variable: true
,
the value of the task ID from which round IDs are derived (i.e. the task ID
specified in round_id
property of config_tasks
) is set to the value of the
round_id
argument in the returned output.
Examples
hub_con <- hubData::connect_hub(
system.file("testhubs/flusight", package = "hubUtils")
)
submission_tmpl(hub_con, round_id = "2023-01-02")
#> # A tibble: 3,132 × 7
#> forecast_date target horizon location output_type output_type_id value
#> <date> <chr> <int> <chr> <chr> <chr> <dbl>
#> 1 2023-01-02 wk flu hosp … 2 US pmf large_decrease NA
#> 2 2023-01-02 wk flu hosp … 1 US pmf large_decrease NA
#> 3 2023-01-02 wk flu hosp … 2 01 pmf large_decrease NA
#> 4 2023-01-02 wk flu hosp … 1 01 pmf large_decrease NA
#> 5 2023-01-02 wk flu hosp … 2 02 pmf large_decrease NA
#> 6 2023-01-02 wk flu hosp … 1 02 pmf large_decrease NA
#> 7 2023-01-02 wk flu hosp … 2 04 pmf large_decrease NA
#> 8 2023-01-02 wk flu hosp … 1 04 pmf large_decrease NA
#> 9 2023-01-02 wk flu hosp … 2 05 pmf large_decrease NA
#> 10 2023-01-02 wk flu hosp … 1 05 pmf large_decrease NA
#> # ℹ 3,122 more rows
submission_tmpl(
hub_con,
round_id = "2023-01-02",
required_vals_only = TRUE
)
#> # A tibble: 0 × 7
#> # ℹ 7 variables: forecast_date <date>, target <chr>, horizon <int>,
#> # location <chr>, output_type <chr>, output_type_id <chr>, value <dbl>
submission_tmpl(
hub_con,
round_id = "2023-01-02",
required_vals_only = TRUE,
complete_cases_only = FALSE
)
#> ! Column "target" whose values are all optional included as all `NA` column.
#> ! Round contains more than one modeling task (2)
#> ℹ See Hub's tasks.json file or <hub_connection> attribute "config_tasks" for
#> details of optional task ID/output_type/output_type ID value combinations.
#> # A tibble: 28 × 7
#> forecast_date target horizon location output_type output_type_id value
#> <date> <chr> <int> <chr> <chr> <chr> <dbl>
#> 1 2023-01-02 NA 2 US pmf large_decrease NA
#> 2 2023-01-02 NA 2 US pmf decrease NA
#> 3 2023-01-02 NA 2 US pmf stable NA
#> 4 2023-01-02 NA 2 US pmf increase NA
#> 5 2023-01-02 NA 2 US pmf large_increase NA
#> 6 2023-01-02 NA 2 US quantile 0.01 NA
#> 7 2023-01-02 NA 2 US quantile 0.025 NA
#> 8 2023-01-02 NA 2 US quantile 0.05 NA
#> 9 2023-01-02 NA 2 US quantile 0.1 NA
#> 10 2023-01-02 NA 2 US quantile 0.15 NA
#> # ℹ 18 more rows
# Specifying a round in a hub with multiple rounds
hub_con <- hubData::connect_hub(
system.file("testhubs/simple", package = "hubUtils")
)
submission_tmpl(hub_con, round_id = "2022-10-01")
#> # A tibble: 5,184 × 7
#> origin_date target horizon location output_type output_type_id value
#> <date> <chr> <int> <chr> <chr> <dbl> <int>
#> 1 2022-10-01 wk inc flu hosp 1 US mean NA NA
#> 2 2022-10-01 wk inc flu hosp 2 US mean NA NA
#> 3 2022-10-01 wk inc flu hosp 3 US mean NA NA
#> 4 2022-10-01 wk inc flu hosp 4 US mean NA NA
#> 5 2022-10-01 wk inc flu hosp 1 01 mean NA NA
#> 6 2022-10-01 wk inc flu hosp 2 01 mean NA NA
#> 7 2022-10-01 wk inc flu hosp 3 01 mean NA NA
#> 8 2022-10-01 wk inc flu hosp 4 01 mean NA NA
#> 9 2022-10-01 wk inc flu hosp 1 02 mean NA NA
#> 10 2022-10-01 wk inc flu hosp 2 02 mean NA NA
#> # ℹ 5,174 more rows
submission_tmpl(hub_con, round_id = "2022-10-29")
#> # A tibble: 25,920 × 8
#> origin_date target horizon location age_group output_type output_type_id
#> <date> <chr> <int> <chr> <chr> <chr> <dbl>
#> 1 2022-10-29 wk inc flu… 1 US 65+ mean NA
#> 2 2022-10-29 wk inc flu… 2 US 65+ mean NA
#> 3 2022-10-29 wk inc flu… 3 US 65+ mean NA
#> 4 2022-10-29 wk inc flu… 4 US 65+ mean NA
#> 5 2022-10-29 wk inc flu… 1 01 65+ mean NA
#> 6 2022-10-29 wk inc flu… 2 01 65+ mean NA
#> 7 2022-10-29 wk inc flu… 3 01 65+ mean NA
#> 8 2022-10-29 wk inc flu… 4 01 65+ mean NA
#> 9 2022-10-29 wk inc flu… 1 02 65+ mean NA
#> 10 2022-10-29 wk inc flu… 2 02 65+ mean NA
#> # ℹ 25,910 more rows
#> # ℹ 1 more variable: value <int>
submission_tmpl(hub_con,
round_id = "2022-10-29",
required_vals_only = TRUE
)
#> # A tibble: 0 × 8
#> # ℹ 8 variables: origin_date <date>, target <chr>, horizon <int>,
#> # location <chr>, age_group <chr>, output_type <chr>, output_type_id <dbl>,
#> # value <int>
submission_tmpl(hub_con,
round_id = "2022-10-29",
required_vals_only = TRUE,
complete_cases_only = FALSE
)
#> ! Column "location" whose values are all optional included as all `NA` column.
#> ℹ See Hub's tasks.json file or <hub_connection> attribute "config_tasks" for
#> details of optional task ID/output_type/output_type ID value combinations.
#> # A tibble: 23 × 8
#> origin_date target horizon location age_group output_type output_type_id
#> <date> <chr> <int> <chr> <chr> <chr> <dbl>
#> 1 2022-10-29 wk inc flu… 1 NA 65+ quantile 0.01
#> 2 2022-10-29 wk inc flu… 1 NA 65+ quantile 0.025
#> 3 2022-10-29 wk inc flu… 1 NA 65+ quantile 0.05
#> 4 2022-10-29 wk inc flu… 1 NA 65+ quantile 0.1
#> 5 2022-10-29 wk inc flu… 1 NA 65+ quantile 0.15
#> 6 2022-10-29 wk inc flu… 1 NA 65+ quantile 0.2
#> 7 2022-10-29 wk inc flu… 1 NA 65+ quantile 0.25
#> 8 2022-10-29 wk inc flu… 1 NA 65+ quantile 0.3
#> 9 2022-10-29 wk inc flu… 1 NA 65+ quantile 0.35
#> 10 2022-10-29 wk inc flu… 1 NA 65+ quantile 0.4
#> # ℹ 13 more rows
#> # ℹ 1 more variable: value <int>
# Hub with sample output type
config_tasks <- read_config_file(system.file("config", "tasks.json",
package = "hubValidations"
))
submission_tmpl(
config_tasks = config_tasks,
round_id = "2022-12-26"
)
#> # A tibble: 42 × 7
#> forecast_date target horizon location output_type output_type_id value
#> <date> <chr> <int> <chr> <chr> <chr> <dbl>
#> 1 2022-12-26 wk ahead inc… 2 US mean NA NA
#> 2 2022-12-26 wk ahead inc… 1 US mean NA NA
#> 3 2022-12-26 wk ahead inc… 2 01 mean NA NA
#> 4 2022-12-26 wk ahead inc… 1 01 mean NA NA
#> 5 2022-12-26 wk ahead inc… 2 02 mean NA NA
#> 6 2022-12-26 wk ahead inc… 1 02 mean NA NA
#> 7 2022-12-26 wk ahead inc… 2 US sample s1 NA
#> 8 2022-12-26 wk ahead inc… 1 US sample s2 NA
#> 9 2022-12-26 wk ahead inc… 2 01 sample s3 NA
#> 10 2022-12-26 wk ahead inc… 1 01 sample s4 NA
#> # ℹ 32 more rows
# Hub with sample output type and compound task ID structure
config_tasks <- read_config_file(system.file("config", "tasks-comp-tid.json",
package = "hubValidations"
))
submission_tmpl(
config_tasks = config_tasks,
round_id = "2022-12-26"
)
#> # A tibble: 42 × 7
#> forecast_date target horizon location output_type output_type_id value
#> <date> <chr> <int> <chr> <chr> <chr> <dbl>
#> 1 2022-12-26 wk ahead inc… 2 US mean NA NA
#> 2 2022-12-26 wk ahead inc… 1 US mean NA NA
#> 3 2022-12-26 wk ahead inc… 2 01 mean NA NA
#> 4 2022-12-26 wk ahead inc… 1 01 mean NA NA
#> 5 2022-12-26 wk ahead inc… 2 02 mean NA NA
#> 6 2022-12-26 wk ahead inc… 1 02 mean NA NA
#> 7 2022-12-26 wk ahead inc… 2 US sample 1 NA
#> 8 2022-12-26 wk ahead inc… 2 01 sample 1 NA
#> 9 2022-12-26 wk ahead inc… 2 02 sample 1 NA
#> 10 2022-12-26 wk ahead inc… 1 US sample 2 NA
#> # ℹ 32 more rows
# Override config compound task ID set
# Create coarser compound task ID set for the first modeling task which contains
# samples
submission_tmpl(
config_tasks = config_tasks,
round_id = "2022-12-26",
compound_taskid_set = list(
c("forecast_date", "target"),
NULL
)
)
#> # A tibble: 42 × 7
#> forecast_date target horizon location output_type output_type_id value
#> <date> <chr> <int> <chr> <chr> <chr> <dbl>
#> 1 2022-12-26 wk ahead inc… 2 US mean NA NA
#> 2 2022-12-26 wk ahead inc… 1 US mean NA NA
#> 3 2022-12-26 wk ahead inc… 2 01 mean NA NA
#> 4 2022-12-26 wk ahead inc… 1 01 mean NA NA
#> 5 2022-12-26 wk ahead inc… 2 02 mean NA NA
#> 6 2022-12-26 wk ahead inc… 1 02 mean NA NA
#> 7 2022-12-26 wk ahead inc… 2 US sample 1 NA
#> 8 2022-12-26 wk ahead inc… 1 US sample 1 NA
#> 9 2022-12-26 wk ahead inc… 2 01 sample 1 NA
#> 10 2022-12-26 wk ahead inc… 1 01 sample 1 NA
#> # ℹ 32 more rows
# Subsetting for a single output type
submission_tmpl(
config_tasks = config_tasks,
round_id = "2022-12-26",
output_types = "sample"
)
#> # A tibble: 6 × 7
#> forecast_date target horizon location output_type output_type_id value
#> <date> <chr> <int> <chr> <chr> <chr> <dbl>
#> 1 2022-12-26 wk ahead inc … 2 US sample 1 NA
#> 2 2022-12-26 wk ahead inc … 2 01 sample 1 NA
#> 3 2022-12-26 wk ahead inc … 2 02 sample 1 NA
#> 4 2022-12-26 wk ahead inc … 1 US sample 2 NA
#> 5 2022-12-26 wk ahead inc … 1 01 sample 2 NA
#> 6 2022-12-26 wk ahead inc … 1 02 sample 2 NA
# Derive a template with ignored derived task ID. Useful to avoid creating
# a template with invalid derived task ID value combinations.
config_tasks <- read_config(
system.file("testhubs", "flusight", package = "hubValidations")
)
submission_tmpl(
config_tasks = config_tasks,
round_id = "2022-12-12",
output_types = "pmf",
derived_task_ids = "target_end_date",
complete_cases_only = FALSE
)
#> # A tibble: 540 × 8
#> forecast_date target_end_date target horizon location output_type
#> <date> <date> <chr> <int> <chr> <chr>
#> 1 2022-12-12 NA wk flu hosp rate … 2 US pmf
#> 2 2022-12-12 NA wk flu hosp rate … 1 US pmf
#> 3 2022-12-12 NA wk flu hosp rate … 2 01 pmf
#> 4 2022-12-12 NA wk flu hosp rate … 1 01 pmf
#> 5 2022-12-12 NA wk flu hosp rate … 2 02 pmf
#> 6 2022-12-12 NA wk flu hosp rate … 1 02 pmf
#> 7 2022-12-12 NA wk flu hosp rate … 2 04 pmf
#> 8 2022-12-12 NA wk flu hosp rate … 1 04 pmf
#> 9 2022-12-12 NA wk flu hosp rate … 2 05 pmf
#> 10 2022-12-12 NA wk flu hosp rate … 1 05 pmf
#> # ℹ 530 more rows
#> # ℹ 2 more variables: output_type_id <chr>, value <dbl>
# Force optional output type, in this case "mean".
submission_tmpl(
config_tasks = config_tasks,
round_id = "2022-12-12",
required_vals_only = TRUE,
output_types = c("pmf", "quantile", "mean"),
force_output_types = TRUE,
derived_task_ids = "target_end_date",
complete_cases_only = FALSE
)
#> ! Columns "target_end_date" and "target" whose values are all optional included
#> as all `NA` columns.
#> ! Round contains more than one modeling task (2)
#> ℹ See Hub's tasks.json file or <hub_connection> attribute "config_tasks" for
#> details of optional task ID/output_type/output_type ID value combinations.
#> # A tibble: 29 × 8
#> forecast_date target_end_date target horizon location output_type
#> <date> <date> <chr> <int> <chr> <chr>
#> 1 2022-12-12 NA NA 2 US pmf
#> 2 2022-12-12 NA NA 2 US pmf
#> 3 2022-12-12 NA NA 2 US pmf
#> 4 2022-12-12 NA NA 2 US pmf
#> 5 2022-12-12 NA NA 2 US pmf
#> 6 2022-12-12 NA NA 2 US quantile
#> 7 2022-12-12 NA NA 2 US quantile
#> 8 2022-12-12 NA NA 2 US quantile
#> 9 2022-12-12 NA NA 2 US quantile
#> 10 2022-12-12 NA NA 2 US quantile
#> # ℹ 19 more rows
#> # ℹ 2 more variables: output_type_id <chr>, value <dbl>