
Create time-series target data file schema

Usage

create_timeseries_schema(hub_path, date_col = NULL, na = c("NA", ""))

Arguments

hub_path

Either a character string path to a local Modeling Hub directory, or an object of class <SubTreeFileSystem> created with s3_bucket() or gs_bucket() from a string giving an S3 or GCS bucket name or the path to a Modeling Hub directory stored in the cloud. For more details, see the "Using cloud storage (S3, GCS)" article in the arrow package documentation. The hub must be fully configured, with valid admin.json and tasks.json files in the hub-config directory.

date_col

Optional name of a column to be interpreted as a date. Default is NULL. Useful when the required date column is a partitioning column in the target data and does not share its name with a date-typed task ID variable in the config.
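A minimal sketch of passing date_col. The hub path and the partitioning column name target_end_date are hypothetical placeholders, not taken from any real hub. The call is guarded so the snippet does nothing if the path does not exist.

```r
# Hypothetical path to a local, fully configured hub (assumption).
hub_path <- "path/to/my-hub"

# Hypothetical name of a date partitioning column in the target data (assumption).
date_partition_col <- "target_end_date"

schema <- NULL
if (dir.exists(hub_path)) {
  # Ask for the named column to be interpreted as a date in the schema.
  schema <- hubData::create_timeseries_schema(
    hub_path,
    date_col = date_partition_col
  )
  print(schema)
}
```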

na

A character vector of strings to interpret as missing values. Only applies to CSV files. The default is c("NA", ""). Useful when the data contain the literal character string "NA" as a valid value. In that case, use empty cells to indicate missing values in your files and set na = "".
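The difference between the two settings can be illustrated with base R's read.csv(), whose na.strings argument plays a similar role. This is only an analogy, not how create_timeseries_schema() reads files internally, and the CSV content is made up for illustration. Here "NA" stands for a legitimate location code (it is the ISO 3166-1 alpha-2 code for Namibia).

```r
# Made-up CSV: row 1 has the literal code "NA", row 2 has an empty cell.
csv_text <- "location,target\nNA,cases\n,cases\nUS,cases\n"

# Analogue of the default na = c("NA", ""): both the literal "NA" and the
# empty cell are read as missing.
default_na <- read.csv(
  text = csv_text,
  na.strings = c("NA", ""),
  colClasses = "character"
)

# Analogue of na = "": only the empty cell is read as missing, so the
# literal "NA" code survives as a character value.
empty_only <- read.csv(
  text = csv_text,
  na.strings = "",
  colClasses = "character"
)

is.na(default_na$location)
is.na(empty_only$location)
```

With the second setting, only rows with genuinely empty cells are flagged as missing.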

Value

An arrow <schema> class object.

Examples

# Clone example hub
tmp_hub_path <- withr::local_tempdir()
example_hub <- "https://github.com/hubverse-org/example-complex-forecast-hub.git"
gert::git_clone(url = example_hub, path = tmp_hub_path)
# Create target time-series schema
create_timeseries_schema(tmp_hub_path)
#> Schema
#> date: date32[day]
#> target: string
#> location: string
#> observation: double
# Create target time-series schema from a cloud hub
s3_hub_path <- s3_bucket("example-complex-forecast-hub")
create_timeseries_schema(s3_hub_path)
#> Schema
#> date: date32[day]
#> target: string
#> location: string
#> observation: double