
Validate dataset level properties of a given target type
Source: R/validate_target_dataset.R
Usage
validate_target_dataset(
  hub_path,
  target_type = c("time-series", "oracle-output"),
  validations_cfg_path = NULL,
  round_id = "default"
)
Arguments
- hub_path
 Either a character string path to a local Modeling Hub directory or an object of class <SubTreeFileSystem> created using functions s3_bucket() or gs_bucket() by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details consult the "Using cloud storage (S3, GCS)" article in the arrow package. The hub must be fully configured with valid admin.json and tasks.json files within the hub-config directory.
- target_type
 Type of target data to retrieve matching files. One of "time-series" or "oracle-output". Defaults to "time-series".
- validations_cfg_path
 Path to YAML file configuring custom validation checks. If NULL, defaults to the standard hub-config/validations.yml path. For more details see the article on custom validation checks.
- round_id
 Character string. Not generally relevant to target datasets but can be used to specify a specific block of custom validation checks. Otherwise best set to "default", which will deploy the default custom validation checks.
Value
An object of class hub_validations. Each named element contains
a hub_check class object reflecting the result of a given check. Function
will return early if a check returns an error.
For more details on the structure of <hub_validations> objects, including
how to access more information on individual checks,
see article on <hub_validations> S3 class objects.
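As a rough sketch of how the returned object might be inspected, the snippet below treats it as a named list of check results, as described above. It assumes the hubValidations and hubUtils packages are installed and that the example hub used in the Examples section ships with hubUtils. The class names check_error and check_failure are taken from the table in Details; the helper variable names are illustrative only.

```r
# Assumption: hubValidations and hubUtils are installed
library(hubValidations)

hub_path <- system.file("testhubs/v5/target_file", package = "hubUtils")
checks <- validate_target_dataset(hub_path, target_type = "time-series")

# Names of the checks that were actually run
names(checks)

# Pull out a single check result by name and look at its class
rows_check <- checks[["target_dataset_rows_unique"]]
class(rows_check)

# Flag any checks whose result is a failure or an error
not_passed <- vapply(
  checks,
  function(x) inherits(x, c("check_failure", "check_error")),
  logical(1)
)
names(checks)[not_passed]
```

Because the function returns early on a check_error, the number of elements in the result can be smaller than the number of checks listed in Details.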
Details
Details of checks performed by validate_target_dataset()
| Name | Check | Early return | Fail output | Extra info | 
|---|---|---|---|---|
| target_dataset_exists | Target dataset can be successfully detected for a given target type. | TRUE | check_error | |
| target_dataset_unique | A single unique target dataset exists for a given target type. | TRUE | check_error | |
| target_dataset_file_ext_unique | All files of a given target type share a single unique file format. | TRUE | check_error | |
| target_dataset_rows_unique | Target dataset rows are all unique. | FALSE | check_failure | |
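To illustrate what the target_dataset_rows_unique check is looking for, here is a minimal base R sketch that detects fully duplicated rows in a toy data frame. This is not the package's implementation: the column names and values are hypothetical, and whether the real check compares all columns or only a subset is not stated on this page.

```r
# Toy time-series target data (hypothetical columns and values)
target_data <- data.frame(
  target_end_date = as.Date(c("2023-01-07", "2023-01-14", "2023-01-14")),
  location = c("US", "US", "US"),
  observation = c(10, 12, 12)
)

# duplicated() marks each row that repeats an earlier row exactly
dup_rows <- duplicated(target_data)

# TRUE here means the dataset would not satisfy a row-uniqueness requirement
any(dup_rows)

# Show the offending rows so they can be removed or corrected
target_data[dup_rows, ]
```

In this toy data the third row repeats the second, so it is the one flagged as a duplicate.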
Examples
# Validate single file target datasets
hub_path <- system.file("testhubs/v5/target_file", package = "hubUtils")
validate_target_dataset(hub_path,
  target_type = "time-series"
)
#> 
#> ── time-series.csv ────
#> 
#> ✔ [target_dataset_exists]: time-series dataset detected.
#> ✔ [target_dataset_unique]: target-data directory contains single unique
#>   time-series dataset.
#> ✔ [target_dataset_file_ext_unique]: time-series dataset files share single
#>   unique file format.
#> ✔ [target_dataset_rows_unique]: time-series target dataset rows are unique.
validate_target_dataset(hub_path,
  target_type = "oracle-output"
)
#> 
#> ── oracle-output.csv ────
#> 
#> ✔ [target_dataset_exists]: oracle-output dataset detected.
#> ✔ [target_dataset_unique]: target-data directory contains single unique
#>   oracle-output dataset.
#> ✔ [target_dataset_file_ext_unique]: oracle-output dataset files share single
#>   unique file format.
#> ✔ [target_dataset_rows_unique]: oracle-output target dataset rows are unique.
# Validate multi-file partitioned target datasets
hub_path <- system.file("testhubs/v5/target_dir", package = "hubUtils")
validate_target_dataset(hub_path,
  target_type = "time-series"
)
#> 
#> ── time-series ────
#> 
#> ✔ [target_dataset_exists]: time-series dataset detected.
#> ✔ [target_dataset_unique]: target-data directory contains single unique
#>   time-series dataset.
#> ✔ [target_dataset_file_ext_unique]: time-series dataset files share single
#>   unique file format.
#> ✔ [target_dataset_rows_unique]: time-series target dataset rows are unique.
validate_target_dataset(hub_path,
  target_type = "oracle-output"
)
#> 
#> ── oracle-output ────
#> 
#> ✔ [target_dataset_exists]: oracle-output dataset detected.
#> ✔ [target_dataset_unique]: target-data directory contains single unique
#>   oracle-output dataset.
#> ✔ [target_dataset_file_ext_unique]: oracle-output dataset files share single
#>   unique file format.
#> ✔ [target_dataset_rows_unique]: oracle-output target dataset rows are unique.