
Validate dataset level properties of a given target type
Source: R/validate_target_dataset.R
Usage
validate_target_dataset(
  hub_path,
  target_type = c("time-series", "oracle-output"),
  validations_cfg_path = NULL,
  round_id = "default"
)
Arguments
- hub_path
Either a character string path to a local Modeling Hub directory or an object of class <SubTreeFileSystem> created using functions s3_bucket() or gs_bucket() by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details consult the "Using cloud storage (S3, GCS)" article in the arrow package. The hub must be fully configured with valid admin.json and tasks.json files within the hub-config directory.
- target_type
Type of target data to retrieve matching files. One of "time-series" or "oracle-output". Defaults to "time-series".
- validations_cfg_path
Path to YAML file configuring custom validation checks. If NULL, defaults to the standard hub-config/validations.yml path. For more details see the article on custom validation checks.
- round_id
Character string. Not generally relevant to target datasets but can be used to specify a specific block of custom validation checks. Otherwise best set to "default", which will deploy the default custom validation checks.
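The `target_type = c("time-series", "oracle-output")` default in the usage signature follows a common R idiom in which the first element of the vector acts as the default. The following is a minimal base R sketch of that idiom, assuming `match.arg()`-style resolution; the package may use a different helper internally, and `resolve_target_type()` is a hypothetical name used only for illustration.

```r
# Hypothetical helper illustrating how a vector default is commonly resolved.
# This is NOT the hubValidations implementation, only the general idiom.
resolve_target_type <- function(target_type = c("time-series", "oracle-output")) {
  # match.arg() returns the first choice when the argument is left at its
  # default, and raises an error if a value outside the choices is supplied.
  match.arg(target_type)
}

default_type <- resolve_target_type()
oracle_type <- resolve_target_type("oracle-output")

# An unsupported value is rejected; the error is caught here for illustration.
invalid <- tryCatch(resolve_target_type("forecast"), error = function(e) "error")
```

Under this assumption, omitting `target_type` is equivalent to passing "time-series" explicitly.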
Value
An object of class hub_validations. Each named element contains a hub_check class object reflecting the result of a given check. The function will return early if a check returns an error.
For more details on the structure of <hub_validations> objects, including how to access more information on individual checks, see the article on <hub_validations> S3 class objects.
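Because the returned object is described as a collection of named elements, individual checks can in principle be accessed by name and filtered by class. The sketch below uses plain stand-in objects rather than real <hub_validations> output: `new_check()` is a hypothetical constructor, and the `check_success` class name is an assumption, while `check_failure` and `check_error` are taken from the "Fail output" column in the Details table.

```r
# Stand-in objects only: real <hub_validations> objects are created by
# validate_target_dataset(). new_check() is hypothetical, and the
# "check_success" class name is an assumption made for this sketch.
new_check <- function(msg, class) {
  structure(list(message = msg), class = c(class, "hub_check"))
}

checks <- list(
  target_dataset_exists = new_check("dataset detected.", "check_success"),
  target_dataset_rows_unique = new_check("duplicate rows found.", "check_failure")
)

# Access a single check by its name.
checks[["target_dataset_exists"]]$message

# Flag checks that did not pass (failure or error classes).
not_passed <- vapply(
  checks,
  function(x) inherits(x, c("check_failure", "check_error")),
  logical(1)
)
names(checks)[not_passed]
```

The same name-based access and `inherits()` filtering should carry over to real results, provided the elements expose these classes; the article on <hub_validations> S3 class objects is the authoritative reference.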
Details
Details of checks performed by validate_target_dataset()
Name | Check | Early return | Fail output | Extra info |
---|---|---|---|---|
target_dataset_exists | Target dataset can be successfully detected for a given target type. | TRUE | check_error | |
target_dataset_unique | A single unique target dataset exists for a given target type. | TRUE | check_error | |
target_dataset_file_ext_unique | All files of a given target type share a single unique file format. | TRUE | check_error | |
target_dataset_rows_unique | Target dataset rows are all unique. | FALSE | check_failure | |
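To make two of the checks in the table concrete, the following base R sketch reproduces their underlying idea on toy data. It is an illustration only and not the package's implementation; the column names, values, and file paths are invented for the example.

```r
# Toy target data with one duplicated row (illustrative values only).
target_data <- data.frame(
  location = c("US", "US", "CA"),
  target_end_date = c("2024-01-06", "2024-01-06", "2024-01-06"),
  observation = c(10, 10, 7)
)

# Row uniqueness: duplicated() marks rows that repeat an earlier row.
rows_unique <- !any(duplicated(target_data))

# File format uniqueness: all files of a target type should share one extension.
files <- c("time-series/part-0.parquet", "time-series/part-1.parquet")
ext_unique <- length(unique(tools::file_ext(files))) == 1L
```

Here `rows_unique` is `FALSE` because the first two rows are identical, which corresponds to the situation reported as a `check_failure` by `target_dataset_rows_unique`, while `ext_unique` is `TRUE` because both files are Parquet.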
Examples
# Validate single file target datasets
hub_path <- system.file("testhubs/v5/target_file", package = "hubUtils")
validate_target_dataset(hub_path,
  target_type = "time-series"
)
#>
#> ── time-series.csv ────
#>
#> ✔ [target_dataset_exists]: time-series dataset detected.
#> ✔ [target_dataset_unique]: target-data directory contains single unique
#> time-series dataset.
#> ✔ [target_dataset_file_ext_unique]: time-series dataset files share single
#> unique file format.
#> ✔ [target_dataset_rows_unique]: time-series target dataset rows are unique.
validate_target_dataset(hub_path,
  target_type = "oracle-output"
)
#>
#> ── oracle-output.csv ────
#>
#> ✔ [target_dataset_exists]: oracle-output dataset detected.
#> ✔ [target_dataset_unique]: target-data directory contains single unique
#> oracle-output dataset.
#> ✔ [target_dataset_file_ext_unique]: oracle-output dataset files share single
#> unique file format.
#> ✔ [target_dataset_rows_unique]: oracle-output target dataset rows are unique.
# Validate multi-file partitioned target datasets
hub_path <- system.file("testhubs/v5/target_dir", package = "hubUtils")
validate_target_dataset(hub_path,
  target_type = "time-series"
)
#>
#> ── time-series ────
#>
#> ✔ [target_dataset_exists]: time-series dataset detected.
#> ✔ [target_dataset_unique]: target-data directory contains single unique
#> time-series dataset.
#> ✔ [target_dataset_file_ext_unique]: time-series dataset files share single
#> unique file format.
#> ✔ [target_dataset_rows_unique]: time-series target dataset rows are unique.
validate_target_dataset(hub_path,
  target_type = "oracle-output"
)
#>
#> ── oracle-output ────
#>
#> ✔ [target_dataset_exists]: oracle-output dataset detected.
#> ✔ [target_dataset_unique]: target-data directory contains single unique
#> oracle-output dataset.
#> ✔ [target_dataset_file_ext_unique]: oracle-output dataset files share single
#> unique file format.
#> ✔ [target_dataset_rows_unique]: oracle-output target dataset rows are unique.