
Check target data rows are all unique
Source: R/check_target_tbl_rows_unique.R
Check that there are no duplicate rows in the target data file being validated.
Usage
check_target_tbl_rows_unique(
  target_tbl,
  target_type = c("time-series", "oracle-output"),
  file_path,
  hub_path,
  config_target_data = NULL
)
Arguments
- target_tbl
A tibble/data.frame of the contents of the target data file being validated.
- target_type
Type of target data, used to retrieve matching files. One of "time-series" or "oracle-output". Defaults to "time-series".
- file_path
A character string giving the path to the target data file, relative to the target-data directory.
- hub_path
Either a character string path to a local Modeling Hub directory or an object of class <SubTreeFileSystem> created using functions s3_bucket() or gs_bucket() by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details, consult the Using cloud storage (S3, GCS) article in the arrow package. The hub must be fully configured with valid admin.json and tasks.json files within the hub-config directory.
- config_target_data
Optional. A target-data.json config object. If provided, validation uses the deterministic schema from the config. If NULL (default), validation relies on schema inference from tasks.json.
Value
Depending on whether validation has succeeded, one of:
- <message/check_success> condition class object.
- <error/check_failure> condition class object.
The returned object also inherits from subclass <hub_check>.
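Because both outcomes share the <hub_check> subclass, a caller can branch on the more specific class with inherits(). The sketch below is illustrative only: mock_check() is a hypothetical helper that builds a stand-in condition carrying the classes named above, and the real object returned by check_target_tbl_rows_unique() may carry additional classes and fields.

```r
# Hypothetical mock, NOT the hubValidations constructor: a condition object
# carrying the classes described in the Value section.
mock_check <- function(success, msg) {
  cls <- if (success) {
    c("check_success", "hub_check", "message", "condition")
  } else {
    c("check_failure", "hub_check", "error", "condition")
  }
  structure(list(message = msg, call = NULL), class = cls)
}

# Branch on the specific subclass; both outcomes also inherit from hub_check.
report <- function(res) {
  if (inherits(res, "check_failure")) {
    "failed"
  } else if (inherits(res, "check_success")) {
    "passed"
  } else {
    "unknown"
  }
}

report(mock_check(TRUE, "Target data rows are unique."))
report(mock_check(FALSE, "Duplicate rows detected."))
```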
Details
Row uniqueness is determined by checking for duplicate combinations of key columns (excluding value columns).
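The core idea can be sketched in base R: drop the value column(s), then look for repeated combinations of the remaining key columns. This is a minimal sketch, not the package's implementation; find_duplicate_rows() is a hypothetical helper, and the value column name "observation" is an assumption chosen for illustration.

```r
# Hypothetical helper (not hubValidations code): return every row whose
# key columns (all columns except the value columns) occur more than once.
find_duplicate_rows <- function(tbl, value_cols = "observation") {
  key_cols <- setdiff(names(tbl), value_cols)
  keys <- tbl[, key_cols, drop = FALSE]
  # Scanning from both ends flags all members of a duplicated group,
  # not only the later copies.
  dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)
  tbl[dup, , drop = FALSE]
}

target_tbl <- data.frame(
  date = as.Date(c("2024-01-06", "2024-01-06", "2024-01-13")),
  location = c("US", "US", "US"),
  observation = c(10, 12, 15)
)

# Rows 1 and 2 share date and location, so both are reported even though
# their observation values differ.
find_duplicate_rows(target_tbl)
```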
With target-data.json config:
Columns to check are determined from the config's observable_unit
specification. For oracle-output data with output type IDs, the
output_type and output_type_id columns are also included in the
uniqueness check.
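With a config, the set of key columns comes from the config rather than from the table's own columns. The sketch below assumes the parsed config exposes observable_unit as a top-level list or vector of column names; the real target-data.json schema may nest this differently, and uniqueness_cols() and its has_output_type_ids argument are hypothetical names used only for illustration.

```r
# Hypothetical sketch: derive the uniqueness key from a config-like list.
# `config` mimics a parsed target-data.json; only observable_unit is used.
uniqueness_cols <- function(config,
                            target_type = c("time-series", "oracle-output"),
                            has_output_type_ids = FALSE) {
  target_type <- match.arg(target_type)
  cols <- unlist(config$observable_unit)
  # Oracle-output data with output type IDs also keys on the output type.
  if (target_type == "oracle-output" && has_output_type_ids) {
    cols <- c(cols, "output_type", "output_type_id")
  }
  cols
}

config <- list(observable_unit = list("target_end_date", "location", "target"))
uniqueness_cols(config, "time-series")
uniqueness_cols(config, "oracle-output", has_output_type_ids = TRUE)
```

The resulting column vector could then be used to subset the table before calling duplicated() or anyDuplicated().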
Without target-data.json config:
For time-series data, if versioned, multiple observations are allowed
so long as they have different as_of values. The as_of column is
therefore included when determining duplicates.
For oracle-output data, there should be only a single observation,
regardless of the as_of value, so the column is not included when
determining duplicates.
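The as_of rule without a config can be illustrated with two small tables that differ only in target type. This is a sketch under stated assumptions, not the package's code: key_cols_no_config() and has_duplicates() are hypothetical helpers, and the value column names "observation" and "oracle_value" are assumed for illustration.

```r
# Hypothetical sketch of the no-config rule: as_of stays in the key for
# time-series data but is dropped for oracle-output data.
key_cols_no_config <- function(tbl, target_type, value_col) {
  key_cols <- setdiff(names(tbl), value_col)
  if (target_type == "oracle-output") {
    key_cols <- setdiff(key_cols, "as_of")
  }
  key_cols
}

has_duplicates <- function(tbl, target_type, value_col) {
  key_cols <- key_cols_no_config(tbl, target_type, value_col)
  anyDuplicated(tbl[, key_cols, drop = FALSE]) > 0
}

# Two versions of the same observation, distinguished only by as_of.
ts_tbl <- data.frame(
  date = as.Date(c("2024-01-06", "2024-01-06")),
  location = "US",
  as_of = as.Date(c("2024-01-08", "2024-01-15")),
  observation = c(10, 12)
)
has_duplicates(ts_tbl, "time-series", "observation")  # FALSE: as_of differs

oo_tbl <- data.frame(
  date = as.Date(c("2024-01-06", "2024-01-06")),
  location = "US",
  as_of = as.Date(c("2024-01-08", "2024-01-15")),
  oracle_value = c(10, 12)
)
has_duplicates(oo_tbl, "oracle-output", "oracle_value")  # TRUE: as_of ignored
```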