Include custom validation functions

library(hubValidations)

Custom validation functions can be included and configured within standard hubValidation workflows by including a validations.yml file in the hub-config directory. Alternatively, an appropriately structured file can be included at a different location and the path to the file provided through argument validations_cfg_path.

hubValidations uses the config package to get validation configuration. This allows for configuration inheritance and the ability to include executable R code. See the confog package vignette on inheritance and R expressions for more details.

`validations.yml` structure

validations.yml files should follow the nested structure described below:

Default configuration

The top of any validations.yml file, under the required default: top level property, should contain default custom validation configurations that will be executed regardless of round ID.

Within the default configuration, individual checks can be configured for each of the 3 validation functions run as part of validate_submission(), using the following structure for each validation function:

<name-of-caller-function>: One of validate_model_data, validate_model_metadata and validate_model_file depending on the function the custom check is to be included in.
- <name-of-check>: The name of the check. This is the name of the element containing the result of the check when hub_validations is returned (required).
  - fn: The name of the check function to be run, as character string (required).
  - pkg: The name of the package namespace from which to get check function. Must be supplied if function is distributed as part of a package.
  - source: Path to .R script containing function code to be sourced. If relative, should be relative to the hub’s directory root. Must be supplied if function is not part of a package and only exists as a script.
  - args: A yaml dictionary of key/value pairs or arguments to be passed to the custom function. Values can be yaml lists or even executable R code (optional).

Note that each of the validate_*() functions contain a standard objects in their call environment which are passed automatically to any custom check function and therefore do not need including in the args configuration.

validate_model_file:
- file_path: character string of path to file being validated relative to the model-output directory.
- hub_path: character string of path to hub.
- round_id: character string of round_id
- file_meta: named list containing round_id, team_abbr, model_abbr and model_id details.
validate_model_data:
- tbl: a tibble of the model output data being validated.
- file_path: character string of path to file being validated relative to the model-output directory.
- hub_path: character string of path to hub.
- round_id: character string of round_id
- file_meta: named list containing round_id, team_abbr, model_abbr and model_id details.
- round_id_col: character string of name of tbl column containing round_id information.
validate_model_metadata:
- file_path: character string of path to file being validated relative to the model-output directory.
- hub_path: character string of path to hub.
- round_id: character string of round_id
- file_meta: named list containing round_id, team_abbr, model_abbr and model_id details.

The args configuration can be used to override objects from the caller environment as well as defaults.

Here’s an example configuration for a single check (opt_check_tbl_horizon_timediff()) to be run as part of the validate_model_data() validation function which checks the content of the model data submission files.

default:
    validate_model_data:
      horizon_timediff:
        fn: "opt_check_tbl_horizon_timediff"
        pkg: "hubValidations"
        args:
          t0_colname: "forecast_date"
          t1_colname: "target_end_date"

The above configuration file relies on default values for arguments horizon_colname ("horizon") and timediff (lubridate::weeks()). We can use the validations.yml args list to override the default values. Here’s an example that includes executable r code as the value of an argument.

default:
    validate_model_data:
      horizon_timediff:
        fn: "opt_check_tbl_horizon_timediff"
        pkg: "hubValidations"
        args:
          t0_colname: "forecast_date"
          t1_colname: "target_end_date"
          horizon_colname: "horizons"
          timediff: !expr lubridate::weeks(2)

Round specific configuration

Additional round specific configurations can be included in validations.yml that can add to or override default configurations.

For example, in the following validations.yml which deploys the opt_check_tbl_col_timediff() optional check, if the file being validated is being submitted to a round with round ID "2023-08-15", default col_timediff check configuration will be overridden by the 2023-08-15 configuration.

default:
    validate_model_data:
      col_timediff:
        fn: "opt_check_tbl_col_timediff"
        pkg: "hubValidations"
        args:
          t0_colname: "forecast_date"
          t1_colname: "target_end_date"

2023-08-15:
    validate_model_data:
      col_timediff:
        fn: "opt_check_tbl_col_timediff"
        pkg: "hubValidations"
        args:
          t0_colname: "forecast_date"
          t1_colname: "target_end_date"
          timediff: !expr lubridate::weeks(1)

Available optional functions

hubValidations includes a number of optional checks or checks that require administrator configuration to be run, detailed below.

For more detail on each function and its configuration parameters, consult the function documentation.

For deploying through `validate_model_data`

Details of available optional checks or checks requiring configuration for `validate_model_data()`.
check fun	Check	Early return	Fail output
opt_check_tbl_col_timediff	Time difference between values in two date columns equal a defined period.	FALSE	check_failure
opt_check_tbl_counts_lt_popn	Predicted values per location are less than total location population.	FALSE	check_failure
opt_check_tbl_horizon_timediff	Time difference between values in two date columns equals a defined time period defined by values in a horizon column.	FALSE	check_failure

For deploying through `validate_model_metadata`

Details of available optional checks or checks requiring configuration for `validate_model_metadata()`.
check fun	Check	Early return	Fail output	Extra info
opt_check_metadata_team_max_model_n	The number of metadata files submitted by a single team does not exceed the maximum number allowed.	FALSE	check_failure

Managing dependencies of custom sourced functions

TODO

validations.yml structure