Utility functions for extracting properties from target-data.json configuration files (v6.0.0 schema). These functions handle defaults and inheritance patterns for target data configuration.
Usage
get_date_col(config_target_data)
get_observable_unit(
config_target_data,
dataset = c("time-series", "oracle-output")
)
get_versioned(config_target_data, dataset = c("time-series", "oracle-output"))
get_has_output_type_ids(config_target_data)
get_non_task_id_schema(config_target_data)
has_target_data_config(hub_path)
# Default S3 method
has_target_data_config(hub_path)
# S3 method for class 'SubTreeFileSystem'
has_target_data_config(hub_path)
Arguments
- config_target_data
A target-data config object created by
read_config(hub_path, "target-data").
- dataset
Character string specifying the dataset type: either
"time-series" or "oracle-output". Used for functions that extract dataset-specific properties.
- hub_path
Path to a hub. Can be a local directory path or cloud URL (S3, GCS).
Value
get_date_col() returns a character string: the name of the date
column that stores the date on which observed data actually occurred.
get_observable_unit() returns a character vector: column names
whose unique value combinations define the minimum observable unit.
get_versioned() returns a logical value: whether the dataset is
versioned using as_of dates.
get_has_output_type_ids() returns a logical value: whether
oracle-output data has output_type and output_type_id columns (default
FALSE if not specified).
get_non_task_id_schema() returns a named list: key-value pairs of
non-task ID column names and their data types, or NULL if not specified.
has_target_data_config() returns a logical value: TRUE if the
target-data.json file exists in the hub-config directory of the hub,
FALSE otherwise.
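For a hub stored on the local file system, the check described for has_target_data_config() can be approximated in base R. This is only a sketch under stated assumptions: local_has_target_data_config() is a hypothetical helper, not part of hubUtils, and it does not cover cloud hubs (the SubTreeFileSystem method).

```r
# Hypothetical helper (not part of hubUtils): looks for
# hub-config/target-data.json under a local hub directory.
local_has_target_data_config <- function(hub_path) {
  file.exists(file.path(hub_path, "hub-config", "target-data.json"))
}

# Build a throwaway hub directory that has no target-data.json yet.
hub <- tempfile("example-hub-")
dir.create(file.path(hub, "hub-config"), recursive = TRUE)
before_write <- local_has_target_data_config(hub)

# Add an empty JSON file and check again.
writeLines("{}", file.path(hub, "hub-config", "target-data.json"))
after_write <- local_has_target_data_config(hub)
```

Unlike the packaged function, this sketch returns an unnamed logical value and accepts only local paths.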
Details
Inheritance and Defaults
Some properties can be specified at both the global level and the dataset level:
- observable_unit: A dataset-specific value overrides the global value when specified; otherwise the global value is used.
- versioned: A dataset-specific value overrides the global value when specified; otherwise the global value is inherited (default FALSE if not specified anywhere).
Other properties are dataset-specific only:
- has_output_type_ids: Only for the oracle-output dataset (default FALSE).
- non_task_id_schema: Only for the time-series dataset (default NULL).
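The lookup order described above (dataset level first, then global, then a default) can be sketched in base R. This is an illustration, not the package's implementation: resolve_property() is a hypothetical helper, and the config below is a hand-written list whose key layout is assumed rather than taken from the v6.0.0 schema.

```r
# Hypothetical config as a plain R list; the key layout is an assumption.
config <- list(
  date_col = "target_end_date",
  observable_unit = c("target_end_date", "target", "location"),
  `time-series` = list(versioned = TRUE),
  `oracle-output` = list(has_output_type_ids = TRUE)
)

# Hypothetical helper: the dataset-level value wins, then the global
# value, then the supplied default.
resolve_property <- function(config, property, dataset, default = NULL) {
  dataset_value <- config[[dataset]][[property]]
  if (!is.null(dataset_value)) {
    return(dataset_value)
  }
  global_value <- config[[property]]
  if (!is.null(global_value)) {
    return(global_value)
  }
  default
}

# Set at the dataset level, so the dataset value is used.
resolve_property(config, "versioned", "time-series", default = FALSE)
# Not set anywhere, so the default is used.
resolve_property(config, "versioned", "oracle-output", default = FALSE)
# Set only globally, so the global value is used.
resolve_property(config, "observable_unit", "oracle-output")
```

Using is.null() to detect a missing value means an explicit FALSE at the dataset level still overrides a global TRUE, which matches the override rule stated above.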
Functions
- get_date_col(): Get the name of the date column across hub data.
- get_observable_unit(): Get observable unit column names. Returns dataset-specific observable_unit if configured, otherwise falls back to global.
- get_versioned(): Get whether target data is versioned for the specified dataset. Returns dataset-specific setting if configured, otherwise inherits from global (default FALSE if not specified).
- get_has_output_type_ids(): Get whether oracle-output data has output_type/output_type_id columns.
- get_non_task_id_schema(): Get the schema for non-task ID columns in time-series data.
- has_target_data_config(): Check if target data config file exists in hub.
Examples
hub_path <- system.file("testhubs/v6/target_dir", package = "hubUtils")
config <- read_config(hub_path, "target-data")
# Get the date column name
get_date_col(config)
#> [1] "target_end_date"
# Get observable unit (uses dataset-specific or falls back to global)
get_observable_unit(config, dataset = "time-series")
#> [1] "target_end_date" "target" "location"
get_observable_unit(config, dataset = "oracle-output")
#> [1] "target_end_date" "target" "location"
# Get versioned setting (inherits from global if not specified)
get_versioned(config, dataset = "time-series")
#> [1] FALSE
# Get oracle-output specific property
get_has_output_type_ids(config)
#> [1] TRUE
# Get time-series specific property
get_non_task_id_schema(config)
#> NULL
# Check if target data config exists
has_target_data_config(hub_path)
#> /home/runner/work/_temp/Library/hubUtils/testhubs/v6/target_dir/hub-config/target-data.json
#> TRUE
no_config_hub <- system.file("testhubs/v5/target_file/", package = "hubUtils")
has_target_data_config(no_config_hub)
#> /home/runner/work/_temp/Library/hubUtils/testhubs/v5/target_file/hub-config/target-data.json
#> FALSE
