Create an arrow schema from a tasks.json config file. For use when opening an arrow dataset.
Arguments
- config_tasks
a list version of the contents of a hub's tasks.json config file, created using the function hubUtils::read_config().
- partitions
a named list specifying the arrow data types of any partitioning column.
- output_type_id_datatype
character string. One of "from_config", "auto", "character", "double", "integer", "logical", "Date". Defaults to "from_config", which uses the setting in the output_type_id_datatype property in the tasks.json config file if available. If the property is not set in the config, the argument falls back to "auto", which determines the output_type_id data type automatically from the tasks.json config file as the simplest data type required to represent all output type ID values across all output types in the hub. When a hub collects only point estimate output types (where output_type_ids are NA), the output_type_id column is assigned a character data type when auto-determined. Other data type values can be used to override automatic determination. Note that attempting to coerce output_type_id to a data type that is not valid for the data (e.g. trying to coerce "character" values to "double") will likely result in an error or potentially unexpected behaviour, so use with care.
- r_schema
Logical. If FALSE (default), return an arrow::schema() object. If TRUE, return a character vector of R data types.
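The "auto" behaviour described for output_type_id_datatype can be pictured with a small base R sketch. This is not the package's implementation: the helper simplest_type() is hypothetical and only illustrates the idea of choosing the simplest type that can represent every output type ID value, with a fall back to character when all values are NA. The last line shows, again in base R only, why forcing an unsuitable type is risky.

```r
# Hypothetical helper (not part of any package): pick the simplest R type
# that can represent every non-missing output type ID value.
simplest_type <- function(values) {
  values <- values[!is.na(values)]
  # Only NA values (e.g. point estimates only): fall back to character.
  if (length(values) == 0L) {
    return("character")
  }
  as_num <- suppressWarnings(as.numeric(as.character(values)))
  # Any value that cannot be parsed as a number forces character.
  if (anyNA(as_num)) {
    return("character")
  }
  # Whole numbers fit in integer; anything else needs double.
  if (all(as_num == trunc(as_num))) "integer" else "double"
}

quantile_ids <- c(0.25, 0.5, 0.75)
category_ids <- c("low", "moderate", "high")

type_quantile <- simplest_type(quantile_ids)
type_mixed <- simplest_type(c(quantile_ids, category_ids))
type_point_only <- simplest_type(NA)

# Coercing non-numeric text to double loses the value in base R.
coerced <- suppressWarnings(as.double("moderate"))
```

In this sketch a hub collecting only quantiles would get a double column, while mixing quantile levels with category labels forces character, which is the kind of case where an explicit override may be needed.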
Value
an arrow schema object that can be used to define column datatypes when opening model output data. If r_schema = TRUE, a character vector of R data types.
Examples
hub_path <- system.file("testhubs/simple", package = "hubUtils")
config_tasks <- hubUtils::read_config(hub_path, "tasks")
schema <- create_hub_schema(config_tasks)
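The remaining arguments can be exercised in the same way. The sketch below assumes that create_hub_schema() is provided by the hubData package and that hubUtils and arrow are installed; the partition column name "model_id" is an illustrative assumption, not something stated above.

```r
# Assumption: create_hub_schema() comes from the hubData package.
library(hubData)

hub_path <- system.file("testhubs/simple", package = "hubUtils")
config_tasks <- hubUtils::read_config(hub_path, "tasks")

# Return a character vector of R data types instead of an arrow schema.
r_types <- create_hub_schema(config_tasks, r_schema = TRUE)

# Override automatic determination of the output_type_id data type and
# declare the arrow data type of a partitioning column
# ("model_id" is an illustrative column name).
schema_chr <- create_hub_schema(
  config_tasks,
  partitions = list(model_id = arrow::utf8()),
  output_type_id_datatype = "character"
)
```

Forcing "character" is the most permissive override, since any output type ID value can be represented as text.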