
Check that a target data file has the correct column names according to target type
Source: R/check_target_tbl_colnames.R
Check that a target data file has the correct column names according to target type.
Usage
check_target_tbl_colnames(
  target_tbl,
  target_type = c("time-series", "oracle-output"),
  file_path,
  hub_path,
  config_target_data = NULL,
  date_col = NULL
)
Arguments
- target_tbl
A tibble/data.frame of the contents of the target data file being validated.
- target_type
Type of target data to retrieve matching files. One of "time-series" or "oracle-output". Defaults to "time-series".
- file_path
A character string representing the path to the target data file relative to the
target-data directory.
- hub_path
Either a character string path to a local Modeling Hub directory or an object of class
<SubTreeFileSystem> created using functions s3_bucket() or gs_bucket() by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details consult the Using cloud storage (S3, GCS) article in the arrow package. The hub must be fully configured with valid admin.json and tasks.json files within the hub-config directory.
- config_target_data
Optional. A
target-data.json config object. If provided, validation uses the deterministic schema from the config. If NULL (default), validation uses inference from tasks.json.
- date_col
Optional. Name of the date column in target data (e.g.,
"target_end_date") representing the date observations actually occurred. Only relevant when it is not a task ID defined in tasks.json. Enables deterministic validation in inference mode. Ignored when config_target_data is provided.
Value
Depending on whether validation has succeeded, one of:
- <message/check_success> condition class object.
- <error/check_error> condition class object.
The returned object also inherits from subclass <hub_check>.
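As a minimal sketch of how a caller might branch on the returned condition, the snippet below builds a stand-in object carrying the class names listed in this section and tests it with base R's inherits(). The constructor fake_check() and the helper describe_check() are hypothetical and exist only for illustration; a real call to check_target_tbl_colnames() would return the object directly, and its exact class vector may contain additional entries.

```r
# Hypothetical stand-in for a check result. Class names follow the Value
# section; the real object may carry additional classes.
fake_check <- function(success = TRUE, msg = "") {
  outcome <- if (success) c("check_success", "message") else c("check_error", "error")
  structure(
    list(message = msg, call = NULL),
    class = c(outcome[1], "hub_check", outcome[2], "condition")
  )
}

# Hypothetical helper: branch on the outcome using base R class tests.
describe_check <- function(check) {
  stopifnot(inherits(check, "hub_check"))
  if (inherits(check, "check_error")) "failed" else "passed"
}

ok <- fake_check(TRUE, "Column names are consistent.")
bad <- fake_check(FALSE, "Column names are not consistent.")

describe_check(ok)
describe_check(bad)
```

Testing for the check_error class rather than inspecting the message text keeps the branch independent of how the message is worded.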
Details
Column name validation depends on whether a target-data.json configuration
file is provided:
With target-data.json config:
Expected columns are determined directly from the configuration. The target
table must contain exactly the columns defined in the config.
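The exact-match rule can be illustrated with a short base R sketch. This is not the package's implementation: the helper compare_exact_colnames() is hypothetical, and the column names are example values rather than a required schema.

```r
# Hypothetical helper: compare observed column names with an exact expected set.
compare_exact_colnames <- function(target_tbl, expected_cols) {
  observed <- names(target_tbl)
  list(
    missing = setdiff(expected_cols, observed),    # expected but absent
    unexpected = setdiff(observed, expected_cols)  # present but not expected
  )
}

# Illustrative data; these column names are assumptions for the example.
target_tbl <- data.frame(
  target_end_date = as.Date("2024-01-06"),
  location = "US",
  observation = 10,
  extra = NA
)
expected_cols <- c("target_end_date", "location", "observation", "as_of")

res <- compare_exact_colnames(target_tbl, expected_cols)
res$missing
res$unexpected
```

Under an exact-match rule, a table passes only when both vectors are empty.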
Without target-data.json config (inference mode):
Expected columns are inferred from the task ID configuration in tasks.json,
allowed columns according to the target type, and expectations based on the
detected output types in the target data. Additional optional columns
(e.g., as_of) are allowed for time-series data.
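A loose illustration of the idea of an allowed set with optional extras follows. It is a simplification under stated assumptions, not the actual inference logic: the helper find_disallowed_cols() is hypothetical, the task ID columns would in practice come from tasks.json, and the column names used here are example values.

```r
# Hypothetical helper: flag columns that fall outside the allowed set.
find_disallowed_cols <- function(target_tbl, task_id_cols, type_cols,
                                 optional_cols = character()) {
  allowed <- unique(c(task_id_cols, type_cols, optional_cols))
  setdiff(names(target_tbl), allowed)
}

# Illustrative time-series table with one optional and one stray column.
ts_tbl <- data.frame(
  target_end_date = as.Date("2024-01-06"),
  location = "US",
  observation = 10,
  as_of = as.Date("2024-01-10"),
  notes = "x"
)

disallowed <- find_disallowed_cols(
  ts_tbl,
  task_id_cols = c("target_end_date", "location"),  # assumed task IDs
  type_cols = "observation",                        # assumed type-specific column
  optional_cols = "as_of"                           # optional for time-series
)
disallowed
```

In this example as_of is accepted because it is listed as optional, while notes is flagged.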
Note on date columns: Target data always contains a date column (e.g.,
target_end_date) representing when observations occurred. However, in
horizon-based forecast hubs, task IDs may only define origin_date
and horizon (with target dates calculated from these). In such cases,
provide date_col to enable deterministic validation of the date column
when it is not a valid task ID. Validation of date column existence and
type is performed by check_target_tbl_coltypes().
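A hedged sketch of a call that supplies date_col in inference mode is shown below. It assumes the function is exported by the hubValidations package, and the hub path and file name are placeholders to be replaced with a real, fully configured hub. The table is read with utils::read.csv() purely for illustration; the package may offer its own reader. The call is skipped when the package or the file is unavailable.

```r
# Placeholder locations (assumptions); replace with a real hub and target file.
hub_path <- "path/to/hub"
file_path <- "time-series.csv"
full_path <- file.path(hub_path, "target-data", file_path)

if (requireNamespace("hubValidations", quietly = TRUE) && file.exists(full_path)) {
  # Read the target data file; only its column names matter for this check.
  target_tbl <- utils::read.csv(full_path)

  check <- hubValidations::check_target_tbl_colnames(
    target_tbl,
    target_type = "time-series",
    file_path = file_path,
    hub_path = hub_path,
    date_col = "target_end_date"  # date column that is not a task ID
  )
  print(check)
} else {
  message("Skipping: hubValidations or the example file is not available.")
}
```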
Inference mode validation for time-series data is limited. For robust
validation, create a target-data.json config file. See the
target-data.json schema for more information on the JSON schema specifics.