Deploying custom validation functions
Custom validation functions can be included and configured within standard `hubValidations` workflows by including a `validations.yml` file in the `hub-config` directory. Alternatively, an appropriately structured file can be included at a different location and the path to the file provided through the `validations_cfg_path` argument.
`hubValidations` uses the `config` package to get validation configuration. This allows for configuration inheritance and the ability to include executable R code. See the `config` package vignette on inheritance and R expressions for more details.
`validations.yml` structure

`validations.yml` files should follow the nested structure described below:
Default configuration
The top of any `validations.yml` file, under the required `default:` top-level property, should contain default custom validation configurations that will be executed regardless of round ID.
Within the default configuration, individual checks can be configured for each of the three validation functions run as part of `validate_submission()`, using the following structure for each validation function:

- `<name-of-caller-function>`: One of `validate_model_data`, `validate_model_metadata` and `validate_model_file`, depending on the function the custom check is to be included in.
  - `<name-of-check>`: The name of the check. This is the name of the element containing the result of the check when `hub_validations` is returned (required).
    - `fn`: The name of the check function to be run, as a character string (required).
    - `pkg`: The name of the package namespace from which to get the check function. Must be supplied if the function is distributed as part of a package.
    - `source`: Path to the `.R` script containing the function code to be sourced. If relative, it should be relative to the hub’s directory root. Must be supplied if the function is not part of a package and only exists as a script.
    - `args`: A YAML dictionary of key/value pairs of arguments and their values to be passed to the custom function. Values can be YAML lists or even executable R code (optional).
Each of the `validate_*()` functions contains a number of standard objects in its call environment. These are available for downstream check functions to use as arguments and are passed automatically to arguments of optional/custom functions with the same name. Values for such arguments therefore do not need to be included in the function deployment configuration, but they can be overridden through a function’s `args` configuration in `validations.yml` during deployment.
All `validate_*()` functions will contain the following five objects in their caller environment:

- `file_path`: character string of path to file being validated relative to the `model-output` directory.
- `hub_path`: character string of path to hub.
- `round_id`: character string of `round_id` derived from the model file name.
- `file_meta`: named list containing `round_id`, `team_abbr`, `model_abbr` and `model_id` details.
- `validations_cfg_path`: character string of path to `validations.yml` file. Defaults to `hub-config/validations.yml`.
`validate_model_data()` will contain the following additional objects:

- `tbl`: a tibble of the model output data being validated.
- `tbl_chr`: a tibble of the model output data being validated with all columns coerced to character type.
- `round_id_col`: character string of name of `tbl` column containing `round_id` information. Defaults to `NULL` and is usually determined from the `tasks.json` config if applicable, unless explicitly provided as an argument to `validate_model_data()`.
- `output_type_id_datatype`: character string. The value of the `output_type_id_datatype` argument. This value is useful in functions like `hubData::create_hub_schema()` or `hubValidations::expand_model_out_grid()` to set the data type of the `output_type_id` column.
- `derived_task_ids`: character vector or `NULL`. The value of the `derived_task_ids` argument, i.e. the names of task IDs whose values depend on other task IDs.
The `args` configuration can be used to override objects from the caller environment as well as defaults during deployment.
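The name-based matching and overriding described above can be illustrated with a small base R sketch. It only illustrates the idea and is not the `hubValidations` implementation: `call_check()`, `caller_env`, `cstm_check_demo()` and its `max_rows` argument are hypothetical names introduced for this example.

```r
# Hypothetical helper: call `fn`, filling its arguments from objects available
# in the caller environment and letting configured `args` take precedence.
call_check <- function(fn, caller_env, args = list()) {
  # Keep only caller objects whose names match formal arguments of `fn`
  matched <- caller_env[names(caller_env) %in% names(formals(fn))]
  # Values configured under `args` override matched objects and add extras
  final_args <- utils::modifyList(matched, args)
  do.call(fn, final_args)
}

# Hypothetical check function with one argument matched by name (`round_id`)
# and one argument with a default value (`max_rows`)
cstm_check_demo <- function(round_id, max_rows = 10) {
  paste0("round ", round_id, ", max_rows = ", max_rows)
}

# Stand-in for the objects found in a validate_*() call environment
caller_env <- list(
  file_path = "team-model/2023-08-15-team-model.csv",
  hub_path = ".",
  round_id = "2023-08-15"
)

# No `args` configured: `round_id` comes from the caller environment
res_default <- call_check(cstm_check_demo, caller_env)

# `args` overrides both the caller object and the function default
res_override <- call_check(
  cstm_check_demo, caller_env,
  args = list(round_id = "2023-08-22", max_rows = 5)
)
```

In the first call only `round_id` is supplied, taken from the caller objects, while `file_path` and `hub_path` are ignored because the function has no arguments with those names. In the second call the configured values replace both the caller object and the default.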
Deploying optional `hubValidations` functions
Here’s an example configuration for a single optional `hubValidations` check, `opt_check_tbl_horizon_timediff()`, which checks that the temporal difference between the values in two date columns (defined by arguments `t0_colname` and `t1_colname`) is equal to a time period defined by horizon values (contained in a column defined by `horizon_colname`) and the length of a single horizon, defined by argument `timediff`.

The check is to be run as part of the `validate_model_data()` validation function, which checks the content of the model data submission files.
```yaml
default:
  validate_model_data:
    horizon_timediff:
      fn: "opt_check_tbl_horizon_timediff"
      pkg: "hubValidations"
      args:
        t0_colname: "forecast_date"
        t1_colname: "target_end_date"
```
The above configuration file relies on default values for arguments `horizon_colname` (`"horizon"`) and `timediff` (`lubridate::weeks()`). We can use the `validations.yml` `args` list to override the `horizon_colname` and `timediff` argument default values. In this example, we also include executable R code as the value of the `timediff` argument.
```yaml
default:
  validate_model_data:
    horizon_timediff:
      fn: "opt_check_tbl_horizon_timediff"
      pkg: "hubValidations"
      args:
        t0_colname: "forecast_date"
        t1_colname: "target_end_date"
        horizon_colname: "horizons"
        timediff: !expr lubridate::weeks(2)
```
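To see how an `!expr` value behaves, the following minimal sketch reads a small configuration file directly with `config::get()`. It assumes the `config` package is installed. The argument name `n_days` is hypothetical and is used in place of a `lubridate` call so that the example has no further dependencies.

```r
# Write a small config to a temporary file. The `!expr` tag marks a value
# as R code to be evaluated when the configuration is read.
cfg_file <- tempfile(fileext = ".yml")
writeLines(c(
  'default:',
  '  validate_model_data:',
  '    horizon_timediff:',
  '      args:',
  '        horizon_colname: "horizons"',
  '        n_days: !expr 7 * 2'  # hypothetical argument, for illustration
), cfg_file)

# Read the default configuration; `!expr` values are returned evaluated
cfg <- config::get(file = cfg_file, config = "default")
args <- cfg$validate_model_data$horizon_timediff$args

# Plain YAML values are returned as-is, `!expr` values as evaluated R objects
str(args)
```

Reading a `validations.yml` file this way can be a quick local check that an expression evaluates to the object you expect before the configuration is deployed.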
Deploying custom functions
The above example involved an optional `hubValidations` function. To deploy a custom function that is not part of `hubValidations` or any other package, you should store the script containing the function in the `src/validations/R/` directory (relative to the root of your hub) and include the path to the script in the `source` property of the configuration file.
```yaml
default:
  validate_model_data:
    custom_check:
      fn: "cstm_check_tbl_example"
      source: "src/validations/R/cstm_check_tbl_example.R"
```
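As a rough sketch of what such a script might contain, the example below defines a check that passes when no value in the submission is negative. The check logic, the helper `all_values_non_negative()` and the assumption of a numeric `value` column are illustrative only. The call to `hubValidations::capture_check_cnd()` uses argument names as they are understood here, so consult that function’s documentation before relying on it.

```r
# src/validations/R/cstm_check_tbl_example.R (illustrative content only)

# Hypothetical helper: TRUE when no entry in `tbl$value` is negative.
# Assumes the model output table has a numeric `value` column.
all_values_non_negative <- function(tbl) {
  !any(tbl[["value"]] < 0, na.rm = TRUE)
}

# Check function. `tbl` and `file_path` are matched by name to objects in the
# validate_model_data() call environment, so they need no `args` entry.
cstm_check_tbl_example <- function(tbl, file_path) {
  check <- all_values_non_negative(tbl)
  # Build the check result from the logical outcome (assumed argument names)
  hubValidations::capture_check_cnd(
    check = check,
    file_path = file_path,
    msg_subject = "Model output values",
    msg_attribute = "non-negative.",
    error = FALSE
  )
}
```

Keeping the logic in a small helper that takes only a data frame makes it easy to test the rule on toy data without setting up a hub.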
Round specific configuration
Additional round-specific configurations can be included in `validations.yml` that can add to or override default configurations.

For example, in the following `validations.yml`, which deploys the `opt_check_tbl_col_timediff()` optional check, if the file being validated is being submitted to a round with round ID `"2023-08-15"`, the default `col_timediff` check configuration will be overridden by the `2023-08-15` configuration.
```yaml
default:
  validate_model_data:
    col_timediff:
      fn: "opt_check_tbl_col_timediff"
      pkg: "hubValidations"
      args:
        t0_colname: "forecast_date"
        t1_colname: "target_end_date"

2023-08-15:
  validate_model_data:
    col_timediff:
      fn: "opt_check_tbl_col_timediff"
      pkg: "hubValidations"
      args:
        t0_colname: "forecast_date"
        t1_colname: "target_end_date"
        timediff: !expr lubridate::weeks(1)
```
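The way a round configuration combines with the default can be explored with the `config` package directly. The sketch below assumes the `config` package is installed and that the round ID is used as the configuration name, as in the file above. The `n_weeks` argument is hypothetical and stands in for `timediff` to avoid a `lubridate` dependency.

```r
# Write a config with a default section and a round-specific section that
# only lists the value it adds.
cfg_file <- tempfile(fileext = ".yml")
writeLines(c(
  'default:',
  '  validate_model_data:',
  '    col_timediff:',
  '      fn: "opt_check_tbl_col_timediff"',
  '      args:',
  '        t0_colname: "forecast_date"',
  '2023-08-15:',
  '  validate_model_data:',
  '    col_timediff:',
  '      args:',
  '        n_weeks: 1'  # hypothetical argument, for illustration
), cfg_file)

# Configuration seen by rounds without a dedicated section
default_cfg <- config::get(file = cfg_file, config = "default")

# Configuration for the round: inherits from `default`, then applies overrides
round_cfg <- config::get(file = cfg_file, config = "2023-08-15")

str(round_cfg$validate_model_data$col_timediff)
```

Because named configurations inherit from `default` and are merged recursively by `config`, the round section here picks up `fn` and `t0_colname` from the default while adding its own value. Repeating the full check configuration in each round, as in the example above, is also valid and can be easier to read.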
Available optional functions
`hubValidations` includes a number of optional checks, or checks that require administrator configuration to be run, detailed below.
For more detail on each function and its configuration parameters, consult the function documentation.
For deploying through validate_model_data
| Check function | Check | Early return | Fail output | Extra info |
|---|---|---|---|---|
| `opt_check_tbl_col_timediff` | Time difference between values in two date columns equals a defined period. | FALSE | check_failure | |
| `opt_check_tbl_counts_lt_popn` | Predicted values per location are less than total location population. | FALSE | check_failure | |
| `opt_check_tbl_horizon_timediff` | Time difference between values in two date columns equals a time period defined by values in a horizon column. | FALSE | check_failure | |
For deploying through validate_model_metadata
| Check function | Check | Early return | Fail output | Extra info |
|---|---|---|---|---|
| `opt_check_metadata_team_max_model_n` | The number of metadata files submitted by a single team does not exceed the maximum number allowed. | FALSE | check_failure | |
Managing dependencies of custom functions
If any custom functions you are deploying depend on additional packages, you will need to ensure these packages are available during validation.
Available dependencies
All `hubValidations` exported functions are available for use in your custom check functions, as well as functions from hubverse packages `hubUtils`, `hubAdmin` and `hubData`.
In addition, functions in packages from the `hubValidations` dependency tree are also generally available, both locally (once `hubValidations` is installed) and in the hubverse `validate-submission` GitHub Action.
Functions from these packages can be used in your custom checks without specifying them as additional dependencies.
Additional dependencies
If any custom functions you are deploying depend on additional packages, you will need to ensure these packages are available during validation.
The simplest way to ensure they are available is to edit the `setup-r-dependencies` step in the `hubverse-actions` `validate-submission.yaml` GitHub Action workflow of your hub and add any additional dependency to the `packages` field list.

In the following pseudo example we add the `additionalPackage` package to the list of standard dependencies:
```yaml
- uses: r-lib/actions/setup-r-dependencies@v2
  with:
    packages: |
      any::hubValidations
      any::sessioninfo
      any::additionalPackage
```
Note that this ensures the additional dependency is available during validation on GitHub but does not guarantee it will be installed locally for hub administrators or submitting teams. Indeed, such missing dependencies could lead to execution errors in custom checks when running `validate_submission()` locally.
You could use documentation, such as your hub’s README, to communicate additional required dependencies for validation to submitting teams. Even better, you could add a check to the top of your function to catch missing dependencies and provide a helpful error message to the user.
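One possible way to write such a guard in base R is sketched below. The helper `assert_pkg_installed()` and the check `cstm_check_needs_pkg()` are hypothetical names, and `additionalPackage` is the placeholder package from the pseudo example above.

```r
# Hypothetical helper: stop with an informative message when a package that a
# custom check depends on is not installed.
assert_pkg_installed <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      "Package '", pkg, "' is required to run this custom check. ",
      'Install it with install.packages("', pkg, '") and try again.',
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Illustrative custom check that guards its dependency before doing any work
cstm_check_needs_pkg <- function(tbl, file_path) {
  assert_pkg_installed("additionalPackage")
  # ... check logic that uses additionalPackage would follow here ...
}
```

With a guard like this, a missing package surfaces as a clear instruction to install it as soon as the check is called, instead of as an obscure error from deeper in the check.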
Deploying custom functions as a package
To deploy custom functions managed as a package in `src/validations`, you can use the `pkg` configuration property in the `validations.yml` file to specify the package namespace.

For example, if you have created a simple package in `src/validations/` with a `cstm_check_tbl_example.R` script in `src/validations/R` containing the specification of a `cstm_check_tbl_example()` function, you can use the following configuration in your `validations.yml` file to source the function from the installed `validations` package namespace:
```yaml
default:
  validate_model_data:
    custom_check:
      fn: "cstm_check_tbl_example"
      pkg: "validations"
```
To ensure the package (and any additional dependencies it depends on) is installed and available during validation, you must add the package to the `setup-r-dependencies` step in the `hubverse-actions` `validate-submission.yaml` GitHub Action workflow of your hub like so: