The hubVis package contains a function called plot_step_ahead_model_output() that can be used to plot model output in the form of forecasts or projections that look multiple horizons into the future.
This function plots forecasts/scenario projections and optional target data. Faceted plots can be created for multiple scenarios, locations, forecast dates, models, etc. Currently, the function can plot only quantile data, with the possibility to add “median” information from the model projections.
For more information about the Hubverse standard format, please refer to the HubDocs website.
This vignette describes the principal usage of the plot_step_ahead_model_output() function.
Plots are available in two output formats:
- “interactive” format: a Plotly output object with interactive legend, hover text, zoom-in and zoom-out options, etc.
- “static” format: a ggplot2 output object.
By default, the output plot is “interactive”, but it can be changed to “static” by setting the interactive parameter to FALSE. See the examples later in this document.
Load and Filter Data
To demonstrate the functionality of the plot_step_ahead_model_output() function, we will use the example data from the hubExamples package.
Scenario
- scenario_outputs: example scenario projection data that represents model outputs and an ensemble (generated with hubEnsemble) from a scenario hub with predictions for one target (inc hosp) in one location ("US"), one round (“2021-03-07”) and four scenarios.
- scenario_target_ts: contains time series target data associated with the scenario projection data.
Forecast
- forecast_outputs: example forecast data that represents model outputs from a forecast hub with predictions for three influenza-related targets (wk inc flu hosp, wk flu hosp rate category, and wk flu hosp rate) for two reference dates in 2022.
- forecast_target_ts: contains time series target data associated with the forecast projection data.
Load data
library(hubVis)
library(hubExamples)
# Scenario examples
head(scenario_outputs)
#> # A tibble: 6 × 9
#> model_id origin_date scenario_id location target horizon output_type
#> <chr> <date> <chr> <chr> <chr> <int> <chr>
#> 1 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 2 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 3 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 4 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 5 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 6 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> # ℹ 2 more variables: output_type_id <dbl>, value <dbl>
head(scenario_target_ts)
#> # A tibble: 6 × 4
#> location date observation target
#> <chr> <chr> <int> <chr>
#> 1 US 2020-10-03 300678 inc case
#> 2 US 2020-10-10 334493 inc case
#> 3 US 2020-10-17 388282 inc case
#> 4 US 2020-10-24 484422 inc case
#> 5 US 2020-10-31 571389 inc case
#> 6 US 2020-11-07 776479 inc case
# Forecast examples
head(forecast_outputs)
#> # A tibble: 6 × 9
#> model_id reference_date target horizon location target_end_date output_type
#> <chr> <date> <chr> <int> <chr> <date> <chr>
#> 1 Flusight-b… 2022-11-19 wk in… 0 25 2022-11-19 quantile
#> 2 Flusight-b… 2022-11-19 wk in… 0 25 2022-11-19 quantile
#> 3 Flusight-b… 2022-11-19 wk in… 0 25 2022-11-19 quantile
#> 4 Flusight-b… 2022-11-19 wk in… 0 25 2022-11-19 quantile
#> 5 Flusight-b… 2022-11-19 wk in… 0 25 2022-11-19 quantile
#> 6 Flusight-b… 2022-11-19 wk in… 0 25 2022-11-19 quantile
#> # ℹ 2 more variables: output_type_id <chr>, value <dbl>
head(forecast_target_ts)
#> # A tibble: 6 × 3
#> date location observation
#> <date> <chr> <dbl>
#> 1 2020-01-11 01 0
#> 2 2020-01-11 15 0
#> 3 2020-01-11 18 0
#> 4 2020-01-11 27 0
#> 5 2020-01-11 30 0
#> 6 2020-01-11 37 0
Data Preparation
The forecast and scenario output should be a model_out_tbl. In addition to the standard requirements for this class, the plot_step_ahead_model_output() function in hubVis has two further requirements:
- a Date column used for the x-axis of a “step ahead” plot. By default, the function expects a "target_date" column, although this can be overridden by specifying a different column with the x_col_name argument.
- quantile and median are the only accepted output types.
# Add a `target_date` column in the scenario example
projection_data <- dplyr::mutate(scenario_outputs,
                                 target_date = as.Date(origin_date) +
                                   (horizon * 7) - 1)
head(projection_data)
#> # A tibble: 6 × 10
#> model_id origin_date scenario_id location target horizon output_type
#> <chr> <date> <chr> <chr> <chr> <int> <chr>
#> 1 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 2 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 3 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 4 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 5 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 6 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> # ℹ 3 more variables: output_type_id <dbl>, value <dbl>, target_date <date>
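As a quick check of the date arithmetic used above, the following sketch computes target dates for a single origin date. It assumes, as the mutate() call does, that horizons are counted in weeks and that each target date falls on the last day of the corresponding week. The horizon values 1 to 4 are illustrative only and are not taken from the example data.

```r
# Illustrative values only: one origin date and four weekly horizons
origin_date <- as.Date("2021-03-07")
horizon <- 1:4

# Same formula as in the mutate() call: move forward `horizon` weeks,
# then step back one day to land on the last day of that week
target_date <- origin_date + (horizon * 7) - 1

data.frame(horizon, target_date)
# horizon 1 -> 2021-03-13, 2 -> 2021-03-20, 3 -> 2021-03-27, 4 -> 2021-04-03
```

Because the origin date is a Sunday, each computed target date is the Saturday that closes the projected week.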
# Filter only `quantile` output type in the forecast example
forecast_quantile <- dplyr::filter(forecast_outputs, output_type == "quantile")
head(forecast_quantile)
#> # A tibble: 6 × 9
#> model_id reference_date target horizon location target_end_date output_type
#> <chr> <date> <chr> <int> <chr> <date> <chr>
#> 1 Flusight-b… 2022-11-19 wk in… 0 25 2022-11-19 quantile
#> 2 Flusight-b… 2022-11-19 wk in… 0 25 2022-11-19 quantile
#> 3 Flusight-b… 2022-11-19 wk in… 0 25 2022-11-19 quantile
#> 4 Flusight-b… 2022-11-19 wk in… 0 25 2022-11-19 quantile
#> 5 Flusight-b… 2022-11-19 wk in… 0 25 2022-11-19 quantile
#> 6 Flusight-b… 2022-11-19 wk in… 0 25 2022-11-19 quantile
#> # ℹ 2 more variables: output_type_id <chr>, value <dbl>
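The two requirements listed in this section can be checked before calling the plotting function. The sketch below is a minimal illustration using base R only; check_plot_input() and the toy data frame are hypothetical and are not part of hubVis or hubExamples.

```r
# Hypothetical helper (not part of hubVis): checks that the x-axis column
# exists and is a Date, and that only quantile/median output types are present
check_plot_input <- function(df, x_col_name = "target_date") {
  has_date_col <- x_col_name %in% names(df) &&
    inherits(df[[x_col_name]], "Date")
  allowed_types <- all(df$output_type %in% c("quantile", "median"))
  c(date_column = has_date_col, output_types = allowed_types)
}

# Toy data with one unsupported output type ("mean")
toy <- data.frame(
  model_id = "model-a",
  target_date = as.Date(c("2021-03-13", "2021-03-20")),
  output_type = c("quantile", "mean"),
  output_type_id = c(0.5, NA),
  value = c(10, 12)
)

check_plot_input(toy)      # output_types is FALSE because of the "mean" row

# Keep only the supported output types, as done for the forecast example
toy_ok <- toy[toy$output_type %in% c("quantile", "median"), ]
check_plot_input(toy_ok)   # both checks are TRUE
```

A check of this kind only mirrors the requirements described here; any validation that the plotting function performs internally may differ.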
Plot
The plotting function requires only 2 parameters:
- model_out_tbl: a model_out_tbl object containing all the Hubverse standard columns, including "target_date" and "model_id" columns. As all model output in model_out_tbl will be plotted, any filtering needs to happen outside this function.
- target_data: a data.frame object containing the target data, including the columns "date" and "observation".
“Simple” plot
The projection_data and target data contain information for multiple locations and scenarios, so they are filtered before plotting.
Scenario
To plot the model projections for the US, Scenario A:
# Pre-filtering
projection_data_a_us <- dplyr::filter(projection_data,
                                      scenario_id == "A-2021-03-05",
                                      location == "US")
# Limit the date range for layout reasons
target_data_us <-
  dplyr::filter(scenario_target_ts, location == "US",
                date < min(projection_data$target_date) + 21,
                date > "2020-10-01")
plot_step_ahead_model_output(projection_data_a_us, target_data_us)
By default, the 50%, 80% and 95% intervals are plotted, with a specific color palette per model_id.
In general, it is hard to see multiple intervals when multiple models are plotted, so specifying only one interval can be useful:
plot_step_ahead_model_output(projection_data_a_us, target_data_us,
                             intervals = 0.8)
It is also possible to add a median line on the plot with the use_median_as_point parameter:
plot_step_ahead_model_output(projection_data_a_us, target_data_us,
                             intervals = 0.8,
                             use_median_as_point = TRUE)
By default, plots are interactive, but this can easily be switched to static:
plot_step_ahead_model_output(projection_data_a_us, target_data_us,
                             intervals = 0.8,
                             use_median_as_point = TRUE,
                             interactive = FALSE)
Forecast
To plot the forecast projections for one reference date (2022-11-19) for Massachusetts (25):
# Pre-filtering
# Convert the quantile levels from character to numeric
forecast_quantile <- dplyr::mutate(forecast_quantile,
                                   output_type_id = as.numeric(output_type_id))
# `location` is a character column, so compare against the string "25"
forecast_quantile_1 <- dplyr::filter(forecast_quantile,
                                     reference_date == "2022-11-19",
                                     location == "25")
# Limit the date range for layout reasons
forecast_target_ma <- dplyr::filter(forecast_target_ts, location == "25",
                                    date >= "2022-11-01",
                                    date <= "2023-01-01")
As the forecast projections use the column target_end_date and contain the quantiles "0.05", "0.1", "0.25", "0.5", "0.75", "0.9", "0.95", the parameters of plot_step_ahead_model_output() need to be adjusted:
plot_step_ahead_model_output(forecast_quantile_1, forecast_target_ma,
                             intervals = c(0.9, 0.5),
                             use_median_as_point = TRUE,
                             x_col_name = "target_end_date")
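To see why the default intervals do not fit this forecast data, it can help to map each interval level to the pair of quantile levels it needs. The sketch below assumes central (equal-tailed) intervals, so that a level p uses the quantiles (1 - p) / 2 and 1 - (1 - p) / 2. The helper interval_to_quantiles() is hypothetical, and whether hubVis performs exactly this calculation internally is an assumption.

```r
# Hypothetical helper (not part of hubVis): lower and upper quantile levels
# of a central prediction interval with coverage `level`
interval_to_quantiles <- function(level) {
  alpha <- 1 - level
  c(lower = alpha / 2, upper = 1 - alpha / 2)
}

# Quantile levels available in the forecast example
available <- c(0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95)

# An interval can be drawn only if both of its bounds are available;
# a small tolerance avoids floating point comparison issues
is_available <- function(q) any(abs(available - q) < 1e-8)

candidates <- c(0.5, 0.8, 0.9, 0.95)
can_plot <- vapply(
  candidates,
  function(level) all(vapply(interval_to_quantiles(level), is_available, logical(1))),
  logical(1)
)
data.frame(interval = candidates, can_plot)
```

Under this assumption, the 95% interval would need the 0.025 and 0.975 quantiles, which the forecast example does not contain, while the 50%, 80% and 90% intervals are covered by the available levels.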
Facet plot
Scenario
A “facet” (or subplot) plot can also be created for each scenario:
# Pre-filtering
projection_data_us <- dplyr::filter(projection_data,
                                    location == "US")
plot_step_ahead_model_output(projection_data_us, target_data_us,
                             facet = "scenario_id")
The layout of the “facets” can be adjusted with the different facet_ parameters:
plot_step_ahead_model_output(projection_data_us, target_data_us,
                             use_median_as_point = TRUE,
                             facet = "scenario_id", facet_scales = "free_x",
                             facet_nrow = 2, facet_title = "bottom left")
Or with the additional facet_ncol parameter for the static plot:
plot_step_ahead_model_output(projection_data_us, target_data_us,
                             use_median_as_point = TRUE, interactive = FALSE,
                             facet = "scenario_id", facet_scales = "free_x",
                             facet_ncol = 4, facet_title = "bottom left")
A “facet” (or subplot) plot can also be created for each model. In this case, the legend will be adapted to return the model_id value.
plot_step_ahead_model_output(projection_data_a_us, target_data_us,
                             facet = "model_id")
The legend can be removed with the parameter show_legend = FALSE.
plot_step_ahead_model_output(projection_data_a_us, target_data_us,
                             facet = "model_id", show_legend = FALSE)