
Include custom validation functions
Custom validation functions can be included and configured within
standard hubValidations workflows by including a
validations.yml file in the hub-config
directory. Alternatively, an appropriately structured file can
be placed at a different location and its path supplied
through the validations_cfg_path argument.
hubValidations uses the config
package to read validation configuration. This allows for configuration
inheritance and the ability to include executable R code. See the
config package vignette on inheritance
and R expressions for more details.
validations.yml structure
validations.yml files should follow the nested structure
described below:
Default configuration
The top of any validations.yml file, under the required
default: top-level property, should contain the default custom
validation configurations, which are executed regardless of round
ID.
Within the default configuration, individual checks can be configured
for each of the 3 validation functions run as part of
validate_submission(), using the following structure for
each validation function:
- `<name-of-caller-function>`: One of validate_model_data, validate_model_metadata and validate_model_file, depending on the function the custom check is to be included in.
  - `<name-of-check>`: The name of the check. This is the name of the element containing the result of the check when the hub_validations object is returned (required).
    - fn: The name of the check function to be run, as a character string (required).
    - pkg: The name of the package namespace from which to get the check function. Must be supplied if the function is distributed as part of a package.
    - source: Path to the .R script containing the function code to be sourced. If relative, it should be relative to the hub’s root directory. Must be supplied if the function is not part of a package and only exists as a script.
    - args: A YAML dictionary of key/value pairs of arguments to be passed to the custom function. Values can be YAML lists or even executable R code (optional).
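The fn, pkg and source fields together tell the validation workflow where to find a check function. The sketch below is a hypothetical helper, resolve_check_fn(), which is not part of hubValidations (the package's own lookup may work differently). It shows one way these fields could be resolved in base R: an exported function is fetched from its package namespace, while a script-only function is sourced from a path taken relative to the hub root. The script path and function names in the usage lines are illustrative only.

```r
# Hypothetical helper (NOT part of hubValidations): resolve a check function
# from the `fn`, `pkg` and `source` fields of a validations.yml entry.
resolve_check_fn <- function(fn, pkg = NULL, source = NULL, hub_path = ".") {
  if (!is.null(pkg)) {
    # Function distributed in a package: look it up among the exports
    return(getExportedValue(pkg, fn))
  }
  if (!is.null(source)) {
    # Script-only function: relative paths are resolved from the hub root
    is_abs <- grepl("^(/|~|[A-Za-z]:)", source)
    script <- if (is_abs) source else file.path(hub_path, source)
    env <- new.env()
    sys.source(script, envir = env)
    return(get(fn, envir = env, mode = "function", inherits = FALSE))
  }
  stop("Custom check `", fn, "` needs either `pkg` or `source`.")
}

# Usage 1: an exported package function (base `stats` as a stand-in)
f1 <- resolve_check_fn(fn = "median", pkg = "stats")

# Usage 2: a function defined in a script inside a temporary "hub"
hub <- tempfile("hub")
dir.create(file.path(hub, "src"), recursive = TRUE)
writeLines(
  "check_always_ok <- function(...) TRUE",
  file.path(hub, "src", "check_always_ok.R")
)
f2 <- resolve_check_fn(
  fn = "check_always_ok",
  source = "src/check_always_ok.R",
  hub_path = hub
)
```

Sourcing into a fresh environment keeps the script's definitions from leaking into the global environment.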
Note that each of the validate_*() functions contains a set of
standard objects in its call environment, which are passed
automatically to any custom check function and therefore do not need
to be included in the args configuration.
- validate_model_file:
  - file_path: character string of the path to the file being validated, relative to the model-output directory.
  - hub_path: character string of the path to the hub.
  - round_id: character string of the round_id.
  - file_meta: named list containing round_id, team_abbr, model_abbr and model_id details.
- validate_model_data:
  - tbl: a tibble of the model output data being validated.
  - file_path: character string of the path to the file being validated, relative to the model-output directory.
  - hub_path: character string of the path to the hub.
  - round_id: character string of the round_id.
  - file_meta: named list containing round_id, team_abbr, model_abbr and model_id details.
  - round_id_col: character string of the name of the tbl column containing round_id information.
- validate_model_metadata:
  - file_path: character string of the path to the file being validated, relative to the model-output directory.
  - hub_path: character string of the path to the hub.
  - round_id: character string of the round_id.
  - file_meta: named list containing round_id, team_abbr, model_abbr and model_id details.
The args configuration can be used to override objects
from the caller environment as well as the check function's default argument values.
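The interplay between caller objects, args and function defaults can be pictured with a small base R sketch. It is not hubValidations' internal code: run_custom_check() and check_tbl_min_value() are hypothetical, and filtering the caller objects down to the check's formal arguments is an assumption made for this sketch. Values in args replace caller objects of the same name, and any argument supplied by neither falls back to the function default.

```r
# Hypothetical dispatcher (NOT hubValidations' internal code).
run_custom_check <- function(check_fn, caller_objs, args = list()) {
  call_args <- caller_objs
  # Values from `args` (validations.yml) override same-named caller objects
  if (length(args) > 0) call_args[names(args)] <- args
  # Pass only arguments the check accepts, unless it takes `...`
  fn_args <- names(formals(check_fn))
  if (!"..." %in% fn_args) {
    call_args <- call_args[names(call_args) %in% fn_args]
  }
  do.call(check_fn, call_args)
}

# Hypothetical custom check: are all values at or above `min_value`?
check_tbl_min_value <- function(tbl, file_path, min_value = 0) {
  list(
    file_path = file_path,
    passed = all(tbl$value >= min_value, na.rm = TRUE)
  )
}

# Illustrative caller-environment objects for validate_model_data
caller_objs <- list(
  tbl = data.frame(value = c(1, 5, 10)),
  file_path = "team1-goodmodel/2023-08-15-team1-goodmodel.csv",
  hub_path = ".",
  round_id = "2023-08-15"
)

# Function default (min_value = 0) applies
res_default <- run_custom_check(check_tbl_min_value, caller_objs)
# `args` overrides the function default
res_args <- run_custom_check(check_tbl_min_value, caller_objs,
                             args = list(min_value = 2))
# `args` overrides a caller-environment object
res_path <- run_custom_check(check_tbl_min_value, caller_objs,
                             args = list(file_path = "other.csv"))
```

A real check would return a hubValidations check condition object rather than the plain list used here to keep the sketch runnable without the package.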
Here’s an example configuration for a single check
(opt_check_tbl_horizon_timediff()) to be run as part of the
validate_model_data() validation function which checks the
content of the model data submission files.
```yaml
default:
  validate_model_data:
    horizon_timediff:
      fn: "opt_check_tbl_horizon_timediff"
      pkg: "hubValidations"
      args:
        t0_colname: "forecast_date"
        t1_colname: "target_end_date"
```

The above configuration file relies on the default values of the arguments
horizon_colname ("horizon") and
timediff (lubridate::weeks()). We can use the
args list in validations.yml to override these
default values. Here’s an example that includes executable R
code as the value of an argument.
```yaml
default:
  validate_model_data:
    horizon_timediff:
      fn: "opt_check_tbl_horizon_timediff"
      pkg: "hubValidations"
      args:
        t0_colname: "forecast_date"
        t1_colname: "target_end_date"
        horizon_colname: "horizons"
        timediff: !expr lubridate::weeks(2)
```
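To see what these arguments control, here is a simplified, hypothetical re-implementation of the idea behind the horizon time-difference check. It is not the package's code: horizon_timediff_ok() is invented for illustration, and base R's as.difftime() stands in for the lubridate period used in the YAML. Each row passes when the gap between the t1 and t0 dates equals the horizon value multiplied by timediff.

```r
# Simplified, hypothetical sketch of the idea behind
# opt_check_tbl_horizon_timediff(); NOT the package's implementation.
horizon_timediff_ok <- function(tbl, t0_colname, t1_colname,
                                horizon_colname = "horizon",
                                timediff = as.difftime(1, units = "weeks")) {
  t0 <- as.Date(tbl[[t0_colname]])
  t1 <- as.Date(tbl[[t1_colname]])
  # Expected gap in days = horizon * timediff
  expected_days <- tbl[[horizon_colname]] * as.numeric(timediff, units = "days")
  observed_days <- as.numeric(difftime(t1, t0, units = "days"))
  observed_days == expected_days
}

# Illustrative data using the column names from the YAML above
tbl <- data.frame(
  forecast_date = "2023-08-15",
  target_end_date = c("2023-08-29", "2023-09-12", "2023-09-20"),
  horizons = c(1, 2, 3)
)

ok <- horizon_timediff_ok(
  tbl,
  t0_colname = "forecast_date",
  t1_colname = "target_end_date",
  horizon_colname = "horizons",
  timediff = as.difftime(2, units = "weeks")  # stand-in for lubridate::weeks(2)
)
ok  # the third row is 36 days after forecast_date rather than 42
```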
Round specific configuration
Additional round-specific configurations can be included in
validations.yml to add to or override the default
configurations.
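The add-or-override behaviour is roughly analogous to a recursive list merge in which the round's entries are layered over the defaults. The sketch below imitates this with utils::modifyList(); hubValidations delegates the real merging to the config package, so treat get_round_cfg() and the list layout as hypothetical illustrations rather than the package's behaviour.

```r
# Hypothetical sketch: layer round-specific settings over the defaults.
# This only mimics the idea; the config package does the real merging.
default_cfg <- list(
  validate_model_data = list(
    col_timediff = list(
      fn = "opt_check_tbl_col_timediff",
      pkg = "hubValidations",
      args = list(t0_colname = "forecast_date", t1_colname = "target_end_date")
    )
  )
)

# Round-specific settings keyed by round ID; only the new argument given
round_cfgs <- list(
  "2023-08-15" = list(
    validate_model_data = list(
      col_timediff = list(
        # Text stand-in for a `!expr lubridate::weeks(1)` value
        args = list(timediff = "lubridate::weeks(1)")
      )
    )
  )
)

get_round_cfg <- function(round_id, default_cfg, round_cfgs) {
  round_cfg <- round_cfgs[[round_id]]
  if (is.null(round_cfg)) {
    return(default_cfg)  # no round-specific entry: defaults only
  }
  # Round entries add to, or replace, matching default entries
  utils::modifyList(default_cfg, round_cfg)
}

cfg_aug15 <- get_round_cfg("2023-08-15", default_cfg, round_cfgs)
cfg_other <- get_round_cfg("2023-08-22", default_cfg, round_cfgs)
str(cfg_aug15$validate_model_data$col_timediff$args)
```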
For example, in the following validations.yml, which
deploys the opt_check_tbl_col_timediff() optional check, if
the file being validated is submitted to the round with round ID
"2023-08-15", the default col_timediff check
configuration will be overridden by the 2023-08-15
configuration.
```yaml
default:
  validate_model_data:
    col_timediff:
      fn: "opt_check_tbl_col_timediff"
      pkg: "hubValidations"
      args:
        t0_colname: "forecast_date"
        t1_colname: "target_end_date"
2023-08-15:
  validate_model_data:
    col_timediff:
      fn: "opt_check_tbl_col_timediff"
      pkg: "hubValidations"
      args:
        t0_colname: "forecast_date"
        t1_colname: "target_end_date"
        timediff: !expr lubridate::weeks(1)
```

Available optional functions
hubValidations includes a number of optional checks, i.e.
checks that require administrator configuration before they are run. These are detailed
below.
For more detail on each function and its configuration parameters, consult the function documentation.
For deploying through validate_model_data
| Check function | Check | Early return | Fail output | Extra info |
|---|---|---|---|---|
| opt_check_tbl_col_timediff | Time difference between values in two date columns equals a defined period. | FALSE | check_failure | |
| opt_check_tbl_counts_lt_popn | Predicted values per location are less than total location population. | FALSE | check_failure | |
| opt_check_tbl_horizon_timediff | Time difference between values in two date columns equals a time period defined by the values in a horizon column. | FALSE | check_failure | |
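As an illustration of the population check described above, here is a simplified, hypothetical sketch in base R. It is not the implementation of opt_check_tbl_counts_lt_popn(): the function name counts_lt_popn_failures() and the column names location, value and population are assumptions made for this sketch. It joins predictions to a population table by location and reports the locations where a predicted value is not below the population.

```r
# Simplified, hypothetical sketch of the idea behind
# opt_check_tbl_counts_lt_popn(); column names are assumptions.
counts_lt_popn_failures <- function(tbl, popn, loc_col = "location",
                                    value_col = "value",
                                    popn_col = "population") {
  joined <- merge(tbl, popn[, c(loc_col, popn_col)],
                  by = loc_col, all.x = TRUE)
  below <- joined[[value_col]] < joined[[popn_col]]
  # Rows where the comparison is missing (e.g. no population entry)
  # are treated as failures in this sketch
  unique(joined[[loc_col]][is.na(below) | !below])
}

tbl <- data.frame(
  location = c("A", "A", "B"),
  value = c(10, 200, 50),
  stringsAsFactors = FALSE
)
popn <- data.frame(
  location = c("A", "B"),
  population = c(100, 1000),
  stringsAsFactors = FALSE
)

failing <- counts_lt_popn_failures(tbl, popn)
failing  # location A has a predicted value (200) above its population (100)
```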
For deploying through validate_model_metadata
| Check function | Check | Early return | Fail output | Extra info |
|---|---|---|---|---|
| opt_check_metadata_team_max_model_n | The number of metadata files submitted by a single team does not exceed the maximum number allowed. | FALSE | check_failure | |
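The idea behind the per-team metadata limit can be sketched in a few lines of base R. This is a hypothetical illustration, not the package's code: team_within_model_limit() and its n_max parameter are this sketch's own names, and it assumes metadata files are named "<team_abbr>-<model_abbr>.yml". It counts the files whose names start with the team's prefix and compares that count with the allowed maximum.

```r
# Hypothetical sketch of the idea behind
# opt_check_metadata_team_max_model_n(); NOT the package's implementation.
team_within_model_limit <- function(metadata_files, team_abbr, n_max = 2L) {
  # Assumes file names of the form "<team_abbr>-<model_abbr>.yml"
  prefix <- paste0(team_abbr, "-")
  n_team <- sum(startsWith(basename(metadata_files), prefix))
  n_team <= n_max
}

files <- c("team1-modelA.yml", "team1-modelB.yml",
           "team1-modelC.yml", "team2-modelA.yml")

team_within_model_limit(files, "team1")  # three files for team1
team_within_model_limit(files, "team2")  # one file for team2
```

Matching on the "<team_abbr>-" prefix rather than the bare abbreviation avoids counting a team such as team10 towards team1.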