
The hubVis package contains a function called plot_step_ahead_model_output() that can be used to plot model output in the form of forecasts or projections that look multiple horizons into the future.

This function plots forecasts/scenario projections and optional target data. Faceted plots can be created for multiple scenarios, locations, forecast dates, models, etc. Currently, the function can plot only quantile data, with the possibility to add “median” information from the model projections.

For more information about the Hubverse standard format, please refer to the HubDocs website.

The following vignette describes the principal usage of the plot_step_ahead_model_output() function.

Plots are available in two output formats:

  • “interactive” format: a Plotly output object with interactive legend, hover text, zoom-in and zoom-out options, etc.
  • “static” format: a ggplot2 output object.

By default, the output plot is “interactive”, but it can be changed to “static” by setting the interactive parameter to FALSE. See the examples later in this document.

Load and Filter Data

To demonstrate the functionality of the plot_step_ahead_model_output() function, we will use the example data from the hubExamples package.

Scenario

  • scenario_outputs: example scenario projection data that represents model outputs and an ensemble (generated with hubEnsemble) from a scenario hub with predictions for one target (inc case) in one location ("US"), one round ("2021-03-07") and four scenarios.

  • scenario_target_ts: contains time series target data associated with the scenario projection data.

Forecast

  • forecast_outputs: example forecast data that represents model outputs from a forecast hub with predictions for three influenza-related targets (wk inc flu hosp, wk flu hosp rate category, and wk flu hosp rate) for two reference dates in 2022.

  • forecast_target_ts: contains time series target data associated with the forecast projection data.

Load data

library(hubVis)
library(hubExamples)

# Scenario examples
head(scenario_outputs)
#> # A tibble: 6 × 9
#>   model_id        origin_date scenario_id  location target   horizon output_type
#>   <chr>           <date>      <chr>        <chr>    <chr>      <int> <chr>      
#> 1 HUBuni-simexamp 2021-03-07  A-2021-03-05 US       inc case       1 quantile   
#> 2 HUBuni-simexamp 2021-03-07  A-2021-03-05 US       inc case       1 quantile   
#> 3 HUBuni-simexamp 2021-03-07  A-2021-03-05 US       inc case       1 quantile   
#> 4 HUBuni-simexamp 2021-03-07  A-2021-03-05 US       inc case       1 quantile   
#> 5 HUBuni-simexamp 2021-03-07  A-2021-03-05 US       inc case       1 quantile   
#> 6 HUBuni-simexamp 2021-03-07  A-2021-03-05 US       inc case       1 quantile   
#> # ℹ 2 more variables: output_type_id <dbl>, value <dbl>
head(scenario_target_ts)
#> # A tibble: 6 × 4
#>   location date       observation target  
#>   <chr>    <chr>            <int> <chr>   
#> 1 US       2020-10-03      300678 inc case
#> 2 US       2020-10-10      334493 inc case
#> 3 US       2020-10-17      388282 inc case
#> 4 US       2020-10-24      484422 inc case
#> 5 US       2020-10-31      571389 inc case
#> 6 US       2020-11-07      776479 inc case

# Forecast examples
head(forecast_outputs)
#> # A tibble: 6 × 9
#>   model_id    reference_date target horizon location target_end_date output_type
#>   <chr>       <date>         <chr>    <int> <chr>    <date>          <chr>      
#> 1 Flusight-b… 2022-11-19     wk in…       0 25       2022-11-19      quantile   
#> 2 Flusight-b… 2022-11-19     wk in…       0 25       2022-11-19      quantile   
#> 3 Flusight-b… 2022-11-19     wk in…       0 25       2022-11-19      quantile   
#> 4 Flusight-b… 2022-11-19     wk in…       0 25       2022-11-19      quantile   
#> 5 Flusight-b… 2022-11-19     wk in…       0 25       2022-11-19      quantile   
#> 6 Flusight-b… 2022-11-19     wk in…       0 25       2022-11-19      quantile   
#> # ℹ 2 more variables: output_type_id <chr>, value <dbl>
head(forecast_target_ts)
#> # A tibble: 6 × 3
#>   date       location observation
#>   <date>     <chr>          <dbl>
#> 1 2020-01-11 01                 0
#> 2 2020-01-11 15                 0
#> 3 2020-01-11 18                 0
#> 4 2020-01-11 27                 0
#> 5 2020-01-11 30                 0
#> 6 2020-01-11 37                 0

Data Preparation

The forecast and scenario output should be a model_out_tbl. In addition to the standard requirements for this class, the plot_step_ahead_model_output() function in hubVis has two further requirements:

  • a Date column used for the x-axis of a “step ahead” plot. By default, the function expects a "target_date" column, although this can be overridden by specifying a different column with the x_col_name argument.
  • quantile and median are the only accepted output types.
# Add a `target_date` column in the scenario example
projection_data <- dplyr::mutate(scenario_outputs,
                                 target_date = as.Date(origin_date) +
                                   (horizon * 7) - 1)
head(projection_data)
#> # A tibble: 6 × 10
#>   model_id        origin_date scenario_id  location target   horizon output_type
#>   <chr>           <date>      <chr>        <chr>    <chr>      <int> <chr>      
#> 1 HUBuni-simexamp 2021-03-07  A-2021-03-05 US       inc case       1 quantile   
#> 2 HUBuni-simexamp 2021-03-07  A-2021-03-05 US       inc case       1 quantile   
#> 3 HUBuni-simexamp 2021-03-07  A-2021-03-05 US       inc case       1 quantile   
#> 4 HUBuni-simexamp 2021-03-07  A-2021-03-05 US       inc case       1 quantile   
#> 5 HUBuni-simexamp 2021-03-07  A-2021-03-05 US       inc case       1 quantile   
#> 6 HUBuni-simexamp 2021-03-07  A-2021-03-05 US       inc case       1 quantile   
#> # ℹ 3 more variables: output_type_id <dbl>, value <dbl>, target_date <date>
# Filter only `quantile` output type in the forecast example
forecast_quantile <- dplyr::filter(forecast_outputs, output_type == "quantile")
head(forecast_quantile)
#> # A tibble: 6 × 9
#>   model_id    reference_date target horizon location target_end_date output_type
#>   <chr>       <date>         <chr>    <int> <chr>    <date>          <chr>      
#> 1 Flusight-b… 2022-11-19     wk in…       0 25       2022-11-19      quantile   
#> 2 Flusight-b… 2022-11-19     wk in…       0 25       2022-11-19      quantile   
#> 3 Flusight-b… 2022-11-19     wk in…       0 25       2022-11-19      quantile   
#> 4 Flusight-b… 2022-11-19     wk in…       0 25       2022-11-19      quantile   
#> 5 Flusight-b… 2022-11-19     wk in…       0 25       2022-11-19      quantile   
#> 6 Flusight-b… 2022-11-19     wk in…       0 25       2022-11-19      quantile   
#> # ℹ 2 more variables: output_type_id <chr>, value <dbl>
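
As a quick sanity check of the date arithmetic used above to build target_date, the following base R sketch computes the target date for the first three horizons of the 2021-03-07 round. It assumes, as in the mutate() call above, that horizons are counted in weeks and that each target date falls one day before the weekly anniversary of the origin date. The variable names are illustrative only.

```r
# Origin date of the example scenario round
origin_date <- as.Date("2021-03-07")

# Weekly horizons 1 to 3 (illustrative subset)
horizon <- 1:3

# Same formula as in the mutate() call: origin + horizon weeks - 1 day
target_date <- origin_date + (horizon * 7) - 1

format(target_date)
#> [1] "2021-03-13" "2021-03-20" "2021-03-27"
```

If a hub defines its horizons differently (for example, in days), the formula would need to be adapted accordingly.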

Plot

The plotting function requires only two parameters:

  • model_out_tbl: a model_out_tbl object containing all the Hubverse standard columns, including "target_date" and "model_id" columns. Because every model output in model_out_tbl is plotted, any filtering must happen before calling the function.

  • target_data: a data.frame object containing the target data, including the columns: "date" and "observation".
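
Before plotting, it can help to confirm that the inputs carry the columns listed above. The sketch below uses a small helper, missing_cols(), which is hypothetical and not part of hubVis, together with a toy data frame that only mimics the structure of the example target data.

```r
# Hypothetical helper (not part of hubVis): returns the required
# column names that are absent from a data frame.
missing_cols <- function(df, required) {
  setdiff(required, names(df))
}

# Toy target data mimicking the structure of scenario_target_ts
toy_target <- data.frame(
  location = "US",
  date = as.Date(c("2020-10-03", "2020-10-10")),
  observation = c(300678L, 334493L)
)

# Columns required in the target data: none are missing
missing_cols(toy_target, c("date", "observation"))
#> character(0)

# The same check with the columns expected in the model output
missing_cols(toy_target, c("target_date", "model_id"))
#> [1] "target_date" "model_id"
```

An empty result means all required columns are present.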

“Simple” plot

The projection_data and target_data contain information for multiple locations and scenarios, so they are filtered before plotting.

Scenario

To plot the model projections for the US, Scenario A:

# Pre-filtering
projection_data_a_us <- dplyr::filter(projection_data,
                                      scenario_id == "A-2021-03-05",
                                      location == "US")

# Limit the date range for layout reasons
target_data_us <-
  dplyr::filter(scenario_target_ts, location == "US",
                date < min(projection_data$target_date) + 21,
                date > "2020-10-01")
plot_step_ahead_model_output(projection_data_a_us, target_data_us)

By default, the 50%, 80% and 95% intervals are plotted, with a specific color palette per model_id.

In general, it is hard to see multiple intervals when multiple models are plotted, so specifying only one interval can be useful:

plot_step_ahead_model_output(projection_data_a_us, target_data_us,
                             intervals = 0.8)

It is also possible to add a median line on the plot with the use_median_as_point parameter:

plot_step_ahead_model_output(projection_data_a_us, target_data_us,
                             intervals = 0.8,
                             use_median_as_point = TRUE)

By default, plots are interactive, but they can easily be switched to static:

plot_step_ahead_model_output(projection_data_a_us, target_data_us,
                             intervals = 0.8,
                             use_median_as_point = TRUE,
                             interactive = FALSE)

Forecast

To plot the forecasts for one reference date (2022-11-19) for Massachusetts (location "25"):

# Pre-filtering: convert the quantile levels from character to numeric
forecast_quantile <- dplyr::mutate(forecast_quantile,
                                   output_type_id = as.numeric(output_type_id))
# `location` is a character column, so compare against a string
forecast_quantile_1 <- dplyr::filter(forecast_quantile,
                                     reference_date == "2022-11-19",
                                     location == "25")

# Limit the date range for layout reasons
forecast_target_ma <- dplyr::filter(forecast_target_ts, location == "25",
                                    date >= "2022-11-01",
                                    date <= "2023-01-01")

The forecast data uses the column target_end_date for dates and contains only the quantiles "0.05", "0.1", "0.25", "0.5", "0.75", "0.9" and "0.95". The default 95% interval would need the 0.025 and 0.975 quantiles, which are not available, so the parameters of plot_step_ahead_model_output() need to be adjusted:

plot_step_ahead_model_output(forecast_quantile_1, forecast_target_ma,
                             intervals = c(0.9, 0.5),
                             use_median_as_point = TRUE,
                             x_col_name = "target_end_date")
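
The choice of intervals above follows from the available quantile levels: a central interval of width w is bounded by the quantiles (1 - w) / 2 and 1 - (1 - w) / 2. The base R sketch below checks which interval widths can be built from a given set of levels. The helpers interval_bounds() and is_supported() are hypothetical and not part of hubVis; how the package itself resolves intervals internally is not shown here.

```r
# Quantile levels available in the forecast example
available <- c(0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95)

# Hypothetical helper: lower and upper quantile levels of a central interval
interval_bounds <- function(width) {
  lower <- (1 - width) / 2
  c(lower = lower, upper = 1 - lower)
}

# Hypothetical helper: TRUE if both bounds exist among the levels.
# A tolerance is used because the bounds are floating point numbers.
is_supported <- function(width, levels, tol = 1e-8) {
  bounds <- interval_bounds(width)
  all(vapply(bounds, function(b) any(abs(levels - b) < tol), logical(1)))
}

# Check the 50%, 80%, 90% and 95% intervals
vapply(c(0.5, 0.8, 0.9, 0.95), is_supported, logical(1), levels = available)
#> [1]  TRUE  TRUE  TRUE FALSE
```

Under these assumptions, the 50%, 80% and 90% intervals can be drawn from the example data, while the 95% interval cannot.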

Facet plot

Scenario

A “facet” (or subplot) plot can also be created for each scenario:

# Pre-filtering
projection_data_us <- dplyr::filter(projection_data,
                                    location == "US")
plot_step_ahead_model_output(projection_data_us, target_data_us,
                             facet = "scenario_id")

The layout of the “facets” can be adjusted with the different facet_ parameters:

plot_step_ahead_model_output(projection_data_us, target_data_us,
                             use_median_as_point = TRUE,
                             facet = "scenario_id", facet_scales = "free_x",
                             facet_nrow = 2, facet_title = "bottom left")

The static plot also accepts the additional facet_ncol parameter:

plot_step_ahead_model_output(projection_data_us, target_data_us,
                             use_median_as_point = TRUE, interactive = FALSE,
                             facet = "scenario_id", facet_scales = "free_x",
                             facet_ncol = 4, facet_title = "bottom left")

A “facet” (or subplot) plot can also be created for each model. In this case, the legend is adapted to show the model_id value:

plot_step_ahead_model_output(projection_data_a_us, target_data_us,
                             facet = "model_id")

The legend can be removed by setting the parameter show_legend = FALSE:

plot_step_ahead_model_output(projection_data_a_us, target_data_us,
                             facet = "model_id", show_legend = FALSE)