Load forecasts from zoltardata.com in hubverse format

collect_zoltar retrieves data from a zoltardata.com project and transforms it from Zoltar's native download format into hubverse format. Zoltar (documentation here) is a pre-hubverse research project that implements a repository of model forecast results, including tools to administer, query, and visualize uploaded data, along with R and Python APIs (zoltr and zoltpy, respectively) to access data programmatically. This hubData function is itself implemented using the zoltr package.

Usage

collect_zoltar(
  project_name,
  models = NULL,
  timezeros = NULL,
  units = NULL,
  targets = NULL,
  types = NULL,
  as_of = NULL,
  point_output_type = "median"
)

Arguments

project_name: A string naming the Zoltar project to load forecasts from. Assumes the host is zoltardata.com.
models: A character vector that specifies the models to query. Must be model abbreviations. Defaults to NULL, which queries all models in the project.
timezeros: A character vector that specifies the timezeros to query. Must be in yyyy-mm-dd format. Defaults to NULL, which queries all timezeros in the project.
units: A character vector that specifies the units to query. Must be unit abbreviations. Defaults to NULL, which queries all units in the project.
targets: A character vector that specifies the targets to query. Must be target names. Defaults to NULL, which queries all targets in the project.
types: A character vector that specifies the forecast types to query. Choices are "bin", "point", "sample", "quantile", "mean", and "median". Defaults to NULL, which queries all types in the project. Note: While Zoltar supports "named" and "mode" forecasts, this function ignores them.
as_of: A datetime string that specifies the forecast version. The datetime must include timezone information for disambiguation, without which the query will fail. The datetime parsing function used internally (base::strftime) is extremely lenient when it comes to formatting, so please exercise caution. Defaults to NULL, which loads the latest version.
point_output_type: A string that specifies how to convert Zoltar point forecast data to a hubverse output type. Must be either "median" or "mean". Defaults to "median".
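
Because as_of must carry timezone information, it can help to build the string from a POSIXct value rather than typing it by hand. The following is a minimal sketch using base R only; the particular layout ("%Y-%m-%d %H:%M:%S %Z") and the example date are assumptions chosen for illustration, not a format this page specifies.

```r
# Build a datetime string that always carries an explicit timezone label.
# The date below is an arbitrary example value, not taken from any project.
as_of_time <- as.POSIXct("2021-05-10 12:00:00", tz = "UTC")

# "%Z" appends the timezone abbreviation, so the string is unambiguous.
as_of <- format(as_of_time, "%Y-%m-%d %H:%M:%S %Z")

# Simple guard: refuse to proceed if the timezone label is missing.
stopifnot(grepl("UTC$", as_of))
```

The resulting string could then be passed as the as_of argument of collect_zoltar.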

Value

A hubverse model_out_tbl containing the following columns: "model_id", "timezero", "season", "unit", "horizon", "target", "output_type", "output_type_id", and "value".

Details

Zoltar's data model differs from that of the hubverse in a few important ways. While Zoltar's model has the concepts of unit, target, and timezero, hubverse projects have hub-configurable columns, which makes the mapping from the former to the latter imperfect. In particular, Zoltar units translate roughly to hubverse task IDs, Zoltar targets include both the target outcome and numeric horizon in the target name, and Zoltar timezeros map to round IDs. Finally, Zoltar's forecast types differ from those of the hubverse. Whereas Zoltar has eight types (bin, named, point, sample, quantile, mean, median, and mode), the hubverse has six (cdf, mean, median, pmf, quantile, sample), only some of which overlap.
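
To illustrate how a Zoltar target name that embeds a numeric horizon might be separated into a horizon and a target, here is a rough sketch. The helper split_target is hypothetical and is not the package's implementation; it assumes the horizon is the first run of digits in the name, which this page does not confirm.

```r
# Hypothetical helper (not part of hubData or zoltr): split a Zoltar target
# name into a numeric horizon and the remaining target text.
# Assumption: the horizon is the first run of digits in the name.
split_target <- function(target_name) {
  hit <- regmatches(target_name, regexpr("[0-9]+", target_name))
  horizon <- if (length(hit) == 1) as.integer(hit) else NA_integer_
  # Remove the digits, then tidy up any leftover whitespace.
  target <- sub("[0-9]+", "", target_name)
  target <- trimws(gsub("\\s+", " ", target))
  list(horizon = horizon, target = target)
}

with_horizon <- split_target("1 wk ahead inc case")
without_horizon <- split_target("pct next week")
```

Under these assumptions, a name without digits keeps its text unchanged and gets a missing (NA) horizon.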

Additional notes:

Requires the user to have a Zoltar account (use the Zoltar contact page to request one).
Requires Z_USERNAME and Z_PASSWORD environment vars to be set to those of the user's Zoltar account.
While Zoltar supports "named" and "mode" forecasts, this function ignores them.
Rows with non-numeric values are ignored.
This function removes numeric_horizon mentions from Zoltar target names. A target name can contain at most one numeric_horizon. Example: "1 wk ahead inc case" -> "wk ahead inc case".
Querying a large number of rows may cause errors, so we recommend providing one or more filtering arguments (e.g., models, timezeros, etc.) to limit the result.
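
Since the function depends on the Z_USERNAME and Z_PASSWORD environment variables, a quick check before querying can save a failed request. This is a minimal base R sketch; the variable names come from the notes above, while the check itself and its message are only illustrative.

```r
# Check that the credentials named in the notes above are available.
required_vars <- c("Z_USERNAME", "Z_PASSWORD")

# Sys.getenv() returns "" for variables that are not set.
values <- Sys.getenv(required_vars)
missing_vars <- required_vars[values == ""]

if (length(missing_vars) > 0) {
  message(
    "Set these environment variables before calling collect_zoltar(): ",
    paste(missing_vars, collapse = ", ")
  )
}
```

One common way to set them for a session is Sys.setenv(), or an entry in a user-level .Renviron file so that the password does not appear in scripts.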

Examples

if (FALSE) { # \dontrun{
# Load every forecast in the project.
df <- collect_zoltar("Docs Example Project")
# Load a filtered subset, converting point forecasts to the "mean" output type.
df <- collect_zoltar(
  "Docs Example Project",
  models = c("docs_mod"),
  timezeros = c("2011-10-16"),
  units = c("loc1", "loc3"),
  targets = c("pct next week", "cases next week"),
  types = c("point"),
  as_of = NULL, point_output_type = "mean"
)
} # }