
Check target data rows are all unique
Source: R/check_target_tbl_rows_unique.R
Check that there are no duplicate rows in target data files being validated.
Usage
check_target_tbl_rows_unique(
target_tbl,
target_type = c("time-series", "oracle-output"),
file_path,
hub_path
)
Arguments
- target_tbl
A tibble/data.frame of the contents of the target data file being validated.
- target_type
Type of target data to retrieve matching files. One of "time-series" or "oracle-output". Defaults to "time-series".
- file_path
A character string representing the path to the target data file relative to the target-data directory.
- hub_path
Either a character string path to a local Modeling Hub directory or an object of class <SubTreeFileSystem> created using functions s3_bucket() or gs_bucket() by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details, consult Using cloud storage (S3, GCS) in the arrow package. The hub must be fully configured with valid admin.json and tasks.json files within the hub-config directory.
Value
Depending on whether validation has succeeded, one of:
- <message/check_success> condition class object.
- <error/check_failure> condition class object.
The returned object also inherits from subclass <hub_check>.
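Because the outcome is encoded in the class of the returned condition object, it can be branched on with inherits(). The following is a minimal sketch, not package code: fake_check is a hand-built stand-in with a placeholder message, its class vector is an assumption for illustration only, and check_outcome() is a hypothetical helper.

```r
# Stand-in for a returned check object. The real object is created by the
# package; its full class vector and message text may differ from this.
fake_check <- structure(
  list(message = "placeholder message", call = NULL),
  class = c("check_success", "hub_check", "message", "condition")
)

# Hypothetical helper: summarise the outcome from the condition classes.
check_outcome <- function(x) {
  stopifnot(inherits(x, "hub_check"))
  if (inherits(x, "check_failure")) {
    "failure"
  } else if (inherits(x, "check_success")) {
    "success"
  } else {
    "other"
  }
}

check_outcome(fake_check)
```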
Details
If datasets are versioned, multiple observations are allowed in time-series target data, so long as they have different as_of values. The as_of column is therefore included when determining duplicates.
In oracle-output data, there should be only a single observation, regardless of the as_of value, so the column is not included when determining duplicates.
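The rule above can be illustrated with a small base R sketch. This is not the package's implementation: find_duplicate_rows() is a hypothetical helper, the example data and column names are made up, and comparing whole rows (minus as_of where applicable) is a simplifying assumption.

```r
# Hypothetical helper: flag duplicate rows, ignoring `as_of` for
# oracle-output data and including it for time-series data.
find_duplicate_rows <- function(target_tbl,
                                target_type = c("time-series", "oracle-output")) {
  target_type <- match.arg(target_type)
  cols <- names(target_tbl)
  if (target_type == "oracle-output") {
    # as_of does not distinguish rows in oracle-output data
    cols <- setdiff(cols, "as_of")
  }
  # TRUE for each row that repeats an earlier row on the compared columns
  duplicated(target_tbl[, cols, drop = FALSE])
}

# Made-up data: the same observation recorded at two as_of dates.
tbl <- data.frame(
  target_end_date = c("2024-01-06", "2024-01-06"),
  location = c("US", "US"),
  observation = c(10, 10),
  as_of = c("2024-01-08", "2024-01-15")
)

any(find_duplicate_rows(tbl, "time-series"))
any(find_duplicate_rows(tbl, "oracle-output"))
```

Under these assumptions, the two rows count as distinct versions when treated as time-series data, while the second row is flagged as a duplicate when treated as oracle-output data.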