Check model output data tbl sample compound task id sets for each modeling task match or are coarser than the expected set defined in the config.
Source: R/check_tbl_spl_compound_taskid_set.R
This check detects the compound task ID sets of samples, implied by the output_type_id and task ID values, and checks them for internal consistency and compliance with the compound_taskid_set defined for each round modeling task in the tasks.json config.
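The idea of an "implied" compound task ID set can be illustrated with a small base R sketch. This is a simplified illustration, not the package's implementation: the toy data, the helper implied_comp_tids() and the config value are all hypothetical. It treats a task ID as a compound task ID when it holds a single unique value within each sample, then checks that all samples agree and that no task ID outside the configured set was detected.

```r
# Toy sample rows; all columns are character, as the check expects.
tbl <- data.frame(
  location       = c("US", "US", "US", "US"),
  horizon        = c("1", "2", "1", "2"),
  output_type    = "sample",
  output_type_id = c("s1", "s1", "s2", "s2"),
  value          = c("10", "12", "11", "14"),
  stringsAsFactors = FALSE
)
task_ids <- c("location", "horizon")

# Hypothetical helper: task IDs that hold a single unique value within one sample.
implied_comp_tids <- function(rows, task_ids) {
  is_constant <- vapply(
    task_ids,
    function(tid) length(unique(rows[[tid]])) == 1L,
    logical(1)
  )
  task_ids[is_constant]
}

# One implied compound task ID set per sample (per output_type_id).
tbl_comp_tids <- lapply(
  split(tbl, tbl$output_type_id),
  implied_comp_tids,
  task_ids = task_ids
)

# Internal consistency: every sample should imply the same set.
n_sets <- length(unique(tbl_comp_tids))

# Compliance: the implied set may match or be coarser than (a subset of)
# the configured set. The config value here is assumed for illustration.
config_comp_tids <- c("location")
invalid_tbl_comp_tids <- setdiff(unique(unlist(tbl_comp_tids)), config_comp_tids)
```

In this toy table each sample spans both horizons for a single location, so the implied set is location only, which matches the assumed config. Had each sample covered a single horizon as well, horizon would show up in invalid_tbl_comp_tids as a "finer" compound task ID.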
Usage
check_tbl_spl_compound_taskid_set(
  tbl,
  round_id,
  file_path,
  hub_path,
  derived_task_ids = NULL
)
Arguments
- tbl
a tibble/data.frame of the contents of the file being validated. Column types must all be character.
- round_id
character string. The round identifier.
- file_path
character string. Path to the file being validated relative to the hub's model-output directory.
- hub_path
Either a character string path to a local Modeling Hub directory or an object of class <SubTreeFileSystem> created using functions s3_bucket() or gs_bucket() by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details consult the Using cloud storage (S3, GCS) article in the arrow package. The hub must be fully configured with valid admin.json and tasks.json files within the hub-config directory.
- derived_task_ids
Character vector of derived task ID names (task IDs whose values depend on other task IDs) to ignore. Columns for such task IDs will contain NAs.
Value
Depending on whether validation has succeeded, one of:
- <message/check_success> condition class object.
- <error/check_error> condition class object.
Returned object also inherits from subclass <hub_check>.
Details
If the check fails, the output of the check includes an errors element, a list of items, one for each modeling task failing validation. The structure depends on the reason the check failed.
If the check failed because more than a single unique compound_taskid_set was found for a given model task, the errors object will be a list with one element for each compound_taskid_set detected and will have the following structure:
- tbl_comp_tids: a compound task ID set detected in the tbl.
- output_type_ids: the output type ID of the sample that does not contain a single, unique value for each compound task ID.
If the check failed because task IDs that are not allowed in the config were identified as compound task IDs (i.e. samples describe "finer" compound modeling tasks) for a given model task, the errors object will be a list with the structure described above as well as the following additional elements:
- config_comp_tids: the allowed compound_taskid_set defined in the modeling task config.
- invalid_tbl_comp_tids: the names of invalid compound task IDs.
The name of each element is the index (mt_id) identifying the config modeling task the sample is associated with.
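As a rough illustration of how such an errors element might be inspected, the sketch below builds a mock list by hand and summarises it per modeling task. The values are invented, and the exact nesting is an assumption based on the description above rather than output captured from the package; only the element names come from this page.

```r
# Mock errors element for a "finer than allowed" failure in modeling task 1.
# Values are invented; the nesting is assumed from the description above.
errors <- list(
  "1" = list(
    tbl_comp_tids         = c("location", "horizon"),
    output_type_ids       = c("s1", "s2"),
    config_comp_tids      = "location",
    invalid_tbl_comp_tids = "horizon"
  )
)

# Build one summary line per failing modeling task, keyed by mt_id.
summary_lines <- vapply(
  names(errors),
  function(mt_id) {
    err <- errors[[mt_id]]
    sprintf(
      "Modeling task %s: detected {%s}, allowed {%s}, invalid {%s}",
      mt_id,
      paste(err$tbl_comp_tids, collapse = ", "),
      paste(err$config_comp_tids, collapse = ", "),
      paste(err$invalid_tbl_comp_tids, collapse = ", ")
    )
  },
  character(1)
)

# Print the summaries without the vector names.
cat(unname(summary_lines), sep = "\n")
```

With a real check result, the same loop would be applied to the errors element of the returned condition object instead of the mock list.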
See hubverse documentation on samples for more details.