Skip to contents

Introduction

The hubEnsembles package provides a flexible framework for aggregating model outputs, such as forecasts or projections, that are submitted to a hub by multiple models and combined into ensemble model outputs. The package includes two main functions: simple_ensemble and linear_pool. We illustrate these functions in this vignette, and briefly compare them.

This vignette uses the following R packages:

Example data: a forecast hub

We will use an example hub provided by the hubverse to demonstrate the functionality of the hubEnsembles package. This example hub was generated with modified forecasts from the FluSight forecasting challenge, a collaborative modeling exercise run by the US Centers for Disease Control and Prevention (CDC) since 2013 that solicits seasonal influenza forecasts from outside modeling teams. The example hub includes both example model output data and target data (sometimes known as “truth” data), which are stored in the hubExamples package as data objects named forecast_outputs and forecast_target_ts. Note that the toy model outputs contain predictions for only a small subset rows of select dates, locations, and output type IDs, far fewer than an actual modeling hub would typically collect.

The model output data includes mean, median, quantile, and sample forecasts of future incident influenza hospitalizations; as well as CDF and PMF forecasts of hospitalization intensity (the latter made up of categories determined by threshold of weekly hospital admissions per 100,000 population). Each forecast is made for five task ID variables, including the location for which the forecast was made (location), the date on which the forecast was made (reference_date), the number of steps ahead (horizon), the date of the forecast prediction (a combination of the date the forecast was made and the forecast horizon, target_end_date), and the forecast target (target). Below we print a subset of this example model output.

otid <- list(
  mean = NA,
  median = NA,
  quantile = c(0, 0.25, 0.75),
  sample = c("2101", "2102", "2103"),
  pmf = c("low", "moderate", "high", "very high"),
  cdf = c(1, 13, 15)
)

hubExamples::forecast_outputs |>
  dplyr::filter(
    output_type_id %in% unlist(otid),
    reference_date == "2022-12-17",
    location == "25",
    horizon == 1
  ) |>
  dplyr::arrange(model_id, dplyr::desc(target), output_type) |>
  print(n = 16)
#> # A tibble: 48 × 9
#>    model_id          reference_date target                    horizon location target_end_date output_type output_type_id   value
#>    <chr>             <date>         <chr>                       <int> <chr>    <date>          <chr>       <chr>            <dbl>
#>  1 Flusight-baseline 2022-12-17     wk inc flu hosp                 1 25       2022-12-24      mean        NA             5.82e+2
#>  2 Flusight-baseline 2022-12-17     wk inc flu hosp                 1 25       2022-12-24      median      NA             5.82e+2
#>  3 Flusight-baseline 2022-12-17     wk inc flu hosp                 1 25       2022-12-24      quantile    0.25           5.66e+2
#>  4 Flusight-baseline 2022-12-17     wk inc flu hosp                 1 25       2022-12-24      quantile    0.75           5.98e+2
#>  5 Flusight-baseline 2022-12-17     wk inc flu hosp                 1 25       2022-12-24      sample      2101           6.06e+2
#>  6 Flusight-baseline 2022-12-17     wk inc flu hosp                 1 25       2022-12-24      sample      2102           5.76e+2
#>  7 Flusight-baseline 2022-12-17     wk inc flu hosp                 1 25       2022-12-24      sample      2103           5.78e+2
#>  8 Flusight-baseline 2022-12-17     wk flu hosp rate category       1 25       2022-12-24      pmf         low            9.70e-6
#>  9 Flusight-baseline 2022-12-17     wk flu hosp rate category       1 25       2022-12-24      pmf         moderate       2.94e-3
#> 10 Flusight-baseline 2022-12-17     wk flu hosp rate category       1 25       2022-12-24      pmf         high           7.35e-2
#> 11 Flusight-baseline 2022-12-17     wk flu hosp rate category       1 25       2022-12-24      pmf         very high      9.24e-1
#> 12 Flusight-baseline 2022-12-17     wk flu hosp rate                1 25       2022-12-24      cdf         0.25           8.63e-9
#> 13 Flusight-baseline 2022-12-17     wk flu hosp rate                1 25       2022-12-24      cdf         0.75           4.83e-8
#> 14 Flusight-baseline 2022-12-17     wk flu hosp rate                1 25       2022-12-24      cdf         1              1.10e-7
#> 15 Flusight-baseline 2022-12-17     wk flu hosp rate                1 25       2022-12-24      cdf         13             1.00e+0
#> 16 Flusight-baseline 2022-12-17     wk flu hosp rate                1 25       2022-12-24      cdf         15             1.00e+0
#> # ℹ 32 more rows

We also have corresponding target data included in the hubExamples package. The example target data provide observed incident influenza hospitalizations (observation) in a given week (date) and for a given location (location). This target data could be used as calibration data for generating forecasts or for evaluating the forecasts post hoc. The forecast-specific task ID variables reference_date and horizon are not relevant for the target data.

head(hubExamples::forecast_target_ts, 10)
#> # A tibble: 10 × 3
#>    date       location observation
#>    <date>     <chr>          <dbl>
#>  1 2020-01-11 01                 0
#>  2 2020-01-11 15                 0
#>  3 2020-01-11 18                 0
#>  4 2020-01-11 27                 0
#>  5 2020-01-11 30                 0
#>  6 2020-01-11 37                 0
#>  7 2020-01-11 48                 0
#>  8 2020-01-11 US                 1
#>  9 2020-01-18 01                 0
#> 10 2020-01-18 15                 0

Creating ensembles with simple_ensemble

The simple_ensemble() function directly computes an ensemble from component model outputs by combining them via some function within each unique combination of task ID variables, output types, and output type IDs. This function can be used to summarize predictions of output types mean, median, quantile, CDF, and PMF. The mechanics of the ensemble calculations are the same for each of the output types, though the resulting statistical ensembling method differs for different output types.

By default, simple_ensemble() uses the mean for the aggregation function and equal weights for all models, though the user can create different types of weighted ensembles by specifying an aggregation function and weights.

Using the default options for simple_ensemble(), we can generate an equally weighted mean ensemble for each unique combination of values for the task ID variables, the output_type and the output_type_id. This means different ensemble methods will be used for different output types: for the quantile output type in our example data, the resulting ensemble is a quantile average, while for the mean, CDF, and PMF output types the ensemble is a linear pool. The simple_ensemble() function does not support the sample output type, so we remove the sample predictions from the forecast model outputs.

mean_ens <- hubExamples::forecast_outputs |>
  dplyr::filter(output_type != "sample") |>
  hubEnsembles::simple_ensemble(
    model_id = "simple-ensemble-mean"
  )

The resulting model output has the same structure as the original model output data, with columns for model ID, task ID variables, output type, output type ID, and value. We also use model_id = "simple-ensemble-mean" to change the name of this ensemble in the resulting model output; if not specified, the default will be “hub-ensemble”. A subset of the predictions is printed below.

mean_ens |>
  dplyr::filter(
    output_type_id %in% unlist(otid),
    reference_date == "2022-12-17",
    location == "25",
    horizon == 1
  )
#> # A tibble: 13 × 9
#>    model_id             reference_date target                    horizon location target_end_date output_type output_type_id      value
#>    <chr>                <date>         <chr>                       <int> <chr>    <date>          <chr>       <chr>               <dbl>
#>  1 simple-ensemble-mean 2022-12-17     wk flu hosp rate                1 25       2022-12-24      cdf         0.25             0.000284
#>  2 simple-ensemble-mean 2022-12-17     wk flu hosp rate                1 25       2022-12-24      cdf         0.75             0.000556
#>  3 simple-ensemble-mean 2022-12-17     wk flu hosp rate                1 25       2022-12-24      cdf         1                0.000767
#>  4 simple-ensemble-mean 2022-12-17     wk flu hosp rate                1 25       2022-12-24      cdf         13               0.947   
#>  5 simple-ensemble-mean 2022-12-17     wk flu hosp rate                1 25       2022-12-24      cdf         15               0.977   
#>  6 simple-ensemble-mean 2022-12-17     wk flu hosp rate category       1 25       2022-12-24      pmf         high             0.151   
#>  7 simple-ensemble-mean 2022-12-17     wk flu hosp rate category       1 25       2022-12-24      pmf         low              0.00437 
#>  8 simple-ensemble-mean 2022-12-17     wk flu hosp rate category       1 25       2022-12-24      pmf         moderate         0.0233  
#>  9 simple-ensemble-mean 2022-12-17     wk flu hosp rate category       1 25       2022-12-24      pmf         very high        0.821   
#> 10 simple-ensemble-mean 2022-12-17     wk inc flu hosp                 1 25       2022-12-24      mean        NA             627.      
#> 11 simple-ensemble-mean 2022-12-17     wk inc flu hosp                 1 25       2022-12-24      median      NA             620.      
#> 12 simple-ensemble-mean 2022-12-17     wk inc flu hosp                 1 25       2022-12-24      quantile    0.25           542.      
#> 13 simple-ensemble-mean 2022-12-17     wk inc flu hosp                 1 25       2022-12-24      quantile    0.75           704.

Changing the aggregation function

We can change the function that is used to aggregate model outputs. For example, we may want to calculate a median of the component models’ submitted values for each quantile. We do so by specifying agg_fun = median.

median_ens <- hubExamples::forecast_outputs |>
  dplyr::filter(output_type != "sample") |>
  hubEnsembles::simple_ensemble(
    agg_fun = median,
    model_id = "simple-ensemble-median"
  )

Custom functions can also be passed into the agg_fun argument. We illustrate this by defining a custom function to compute the ensemble prediction as a geometric mean of the component model predictions. Any custom function to be used must have an argument x for the vector of numeric values to summarize, and if relevant, an argument w of numeric weights.

geometric_mean <- function(x) {
  n <- length(x)
  prod(x)^(1 / n)
}
geometric_mean_ens <- hubExamples::forecast_outputs |>
  dplyr::filter(output_type != "sample") |>
  hubEnsembles::simple_ensemble(
    agg_fun = geometric_mean,
    model_id = "simple-ensemble-geometric"
  )

As expected, the mean, median, and geometric mean each give us slightly different resulting ensembles. The median point estimates, 50% prediction intervals, and 90% prediction intervals in the figure below demonstrate this. Note that the geometric mean ensemble and simple mean ensemble generate similar estimates in this case of predicting weekly incident influenza hospitalizations in Massachusetts.

Weighting model contributions

We can weight the contributions of each model in the ensemble using the weights argument of simple_ensemble(). This argument takes a data.frame that should include a model_id column containing each unique model ID and a weight column. In the following example, we include the baseline model in the ensemble, but give it less weight than the other forecasts.

model_weights <- data.frame(
  model_id = c("MOBS-GLEAM_FLUH", "PSI-DICE", "Flusight-baseline"),
  weight = c(0.4, 0.4, 0.2)
)
weighted_mean_ens <- hubExamples::forecast_outputs |>
  dplyr::filter(output_type != "sample") |>
  hubEnsembles::simple_ensemble(
    weights = model_weights,
    model_id = "simple-ensemble-weighted-mean"
  )
head(weighted_mean_ens, 10)
#> # A tibble: 10 × 9
#>    model_id                      reference_date target           horizon location target_end_date output_type output_type_id  value
#>    <chr>                         <date>         <chr>              <int> <chr>    <date>          <chr>       <chr>           <dbl>
#>  1 simple-ensemble-weighted-mean 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         0.25           0.0129
#>  2 simple-ensemble-weighted-mean 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         0.5            0.115 
#>  3 simple-ensemble-weighted-mean 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         0.75           0.546 
#>  4 simple-ensemble-weighted-mean 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         1              0.805 
#>  5 simple-ensemble-weighted-mean 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         1.25           0.910 
#>  6 simple-ensemble-weighted-mean 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         1.5            0.964 
#>  7 simple-ensemble-weighted-mean 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         1.75           0.989 
#>  8 simple-ensemble-weighted-mean 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         10             1     
#>  9 simple-ensemble-weighted-mean 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         10.25          1     
#> 10 simple-ensemble-weighted-mean 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         10.5           1

Creating ensembles with linear_pool

The linear_pool() function implements the linear opinion pool (LOP, also known as a distributional mixture) method (Stone 1961, Lichtendahl 2013) when ensembling predictions. This function can be used to combine predictions with output types mean, CDF, PMF, sample, and quantile. Unlike simple_ensemble(), this function handles its computation differently based on the output type. For the CDF, PMF, and mean output types, the linear pool method is equivalent to calling simple_ensemble() with a mean aggregation function, since simple_ensemble() produces a linear pool prediction (an average of individual model cumulative or bin probabilities).

For the sample output type, the LOP method pools the input sample predictions into a combined ensemble distribution. By default, the linear_pool() function will simply collect and return all provided samples, so that the number of samples for the ensemble is equal to the sum of the number of samples from all individual models. However, the user may also specify a number of sample predictions for the ensemble to return using the n_output_samples argument, in which case a random subset of predictions from individual models will be selected to create the linear pool of samples so that all component models are represented equally. This random selection of samples is stratified by model so that approximately the same number of samples from each individual model is included in the ensemble.

For the quantile output type, the linear_pool() function first must approximate a full probability distribution using the value-quantile level pairs from each component model. As a default, this is done with functions in the distfromq package, which defaults to fitting a monotonic cubic spline for the interior and a Gaussian normal distribution for the tails. Quasi-random samples are drawn from each distributional estimate, which are then collected and used to extract the desired quantiles from the final ensemble distribution.

Using the default options for linear_pool(), we can generate an equally-weighted linear pool for each of the output types in our example hub (except for the median output type, which must be excluded). The resulting distribution for the linear pool of quantiles is estimated using a default of n_samples = 1e4 quasi-random samples drawn from the distribution of each component model.

linear_pool_norm <- hubExamples::forecast_outputs |>
  dplyr::filter(output_type != "median") |>
  hubEnsembles::linear_pool(model_id = "linear-pool-normal")
head(linear_pool_norm, 10)
#> # A tibble: 10 × 9
#>    model_id           reference_date target           horizon location target_end_date output_type output_type_id  value
#>    <chr>              <date>         <chr>              <int> <chr>    <date>          <chr>       <chr>           <dbl>
#>  1 linear-pool-normal 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         0.25           0.0176
#>  2 linear-pool-normal 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         0.5            0.118 
#>  3 linear-pool-normal 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         0.75           0.550 
#>  4 linear-pool-normal 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         1              0.819 
#>  5 linear-pool-normal 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         1.25           0.919 
#>  6 linear-pool-normal 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         1.5            0.968 
#>  7 linear-pool-normal 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         1.75           0.990 
#>  8 linear-pool-normal 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         10             1     
#>  9 linear-pool-normal 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         10.25          1     
#> 10 linear-pool-normal 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         10.5           1

In the figure below, we compare ensemble results generated by simple_ensemble() and linear_pool() for model outputs of output types PMF and quantile. Panel A shows PMF type predictions of Massachusetts incident influenza hospitalization intensity while Panel B shows quantile type predictions of Massachusetts weekly incident influenza hospitalizations. As expected, the results from the two functions are equivalent for the PMF output type: for this output type, the simple_ensemble() method averages the predicted probability of each category across the component models, which is the definition of the linear pool ensemble method. This is not the case for the quantile output type, because the simple_ensemble() is computing a quantile average.

Weighting model contributions

Like with simple_ensemble(), we can change the default function settings. For example, weights that determine a model’s contribution to the resulting ensemble may be provided. (Note that we must exclude the sample output type here because it is not yet supported for weighted ensembles.)

model_weights <- data.frame(
  model_id = c("MOBS-GLEAM_FLUH", "PSI-DICE", "Flusight-baseline"),
  weight = c(0.4, 0.4, 0.2)
)
weighted_linear_pool_norm <- hubExamples::forecast_outputs |>
  dplyr::filter(!output_type %in% c("median", "sample")) |>
  hubEnsembles::linear_pool(
    weights = model_weights,
    model_id = "linear-pool-weighted"
  )
head(weighted_linear_pool_norm, 10)
#> # A tibble: 10 × 9
#>    model_id             reference_date target           horizon location target_end_date output_type output_type_id  value
#>    <chr>                <date>         <chr>              <int> <chr>    <date>          <chr>       <chr>           <dbl>
#>  1 linear-pool-weighted 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         0.25           0.0129
#>  2 linear-pool-weighted 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         0.5            0.115 
#>  3 linear-pool-weighted 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         0.75           0.546 
#>  4 linear-pool-weighted 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         1              0.805 
#>  5 linear-pool-weighted 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         1.25           0.910 
#>  6 linear-pool-weighted 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         1.5            0.964 
#>  7 linear-pool-weighted 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         1.75           0.989 
#>  8 linear-pool-weighted 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         10             1     
#>  9 linear-pool-weighted 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         10.25          1     
#> 10 linear-pool-weighted 2022-11-19     wk flu hosp rate       0 25       2022-11-19      cdf         10.5           1

Changing the parametric family used for extrapolation into distribution tails

We can also change the distribution that distfromq uses to approximate the tails of component models’ predictive distributions to either log normal or Cauchy using the tail_dist argument. This choice usually does not have a large impact on the resulting ensemble distribution, though, and can only be seen in its outer edges. (For more details and other function options, see the documentation in the distfromq package.)

linear_pool_lnorm <- hubExamples::forecast_outputs |>
  dplyr::filter(output_type == "quantile") |>
  hubEnsembles::linear_pool(
    model_id = "linear-pool-lognormal",
    tail_dist = "lnorm"
  )
linear_pool_cauchy <- hubExamples::forecast_outputs |>
  dplyr::filter(output_type == "quantile") |>
  hubEnsembles::linear_pool(
    model_id = "linear-pool-cauchy",
    tail_dist = "cauchy"
  )

Ensembling samples from joint distributions

All output types will summarize predictions from marginal distributions. The sample output type is unique in that it can additionally represent predictions from joint predictive distributions. Hence, the sample representation may capture dependence across multiple combinations of values for modeled task ID variables when recording samples from a joint predictive distribution. In this case, samples with the same index (specified by the output_type_id) may be assumed to correspond to a single sample from a joint distribution across multiple combinations of values for the task ID variables.

The example data for the sample output type has task ID variables "reference_date", "location", "horizon", "target", and "target_end_date". In this example, the samples capture dependence across different forecast "horizon"s. However, the samples do not capture dependence across different "reference_date"s, "location"s, or "target"s.

When specifically requesting a linear pool made up of a subset of the input sample predictions, the user must identify the dependence structure using the compound_taskid_set parameter to ensure the resulting ensemble is valid. The compound task ID set consists of task ID variables that, together, identify a “compound modeling task” corresponding to a single modeled unit with a multivariate outcome of interest. Samples are expected to capture dependence across different outcomes within the same compound task.

For example, a compound task could be predicting the number of weekly incident influenza hospitalizations ("target") in Massachusetts ("location") starting on November 19, 2022 ("reference_date"). Here, "horizon" is not part of the compound task ID set, indicating that sample predictions made at each horizon depend on those for the other horizons within every compound task for the sample output type. Each sample can therefore be interpreted as a trajectory giving a possible path of hospitalizations over time.

If samples are not expected to capture dependence across the levels of any task ID variables (that is, we are summarizing predictions for marginal distributions), the compound_taskid_set consists of all task ID variables. Please see the hubverse documentation on samples for more information about compound tasks.

Thus, in this example, those three task id variables ("reference_date", "location", and "target") make up the compound task ID set that is specified in the call to linear_pool(). The "target_end_date" for a given forecast is derived from the combination of "reference_date" and "horizon", and so it is specified as the argument for derived_tasks.

hubExamples::forecast_outputs |>
  dplyr::filter(output_type == "sample") |>
  dplyr::mutate(output_type_id = as.numeric(output_type_id)) |> # make indices numeric for readability
  hubEnsembles::linear_pool(
    weights = NULL,
    model_id = "linear-pool-joint",
    task_id_cols = c("reference_date", "location", "horizon", "target", "target_end_date"),
    compound_taskid_set = c("reference_date", "location", "target"),
    derived_tasks = "target_end_date",
    n_output_samples = 100
  )
#> # A tibble: 1,600 × 9
#>    model_id          reference_date target          horizon location target_end_date output_type output_type_id value
#>  * <chr>             <date>         <chr>             <int> <chr>    <date>          <chr>                <int> <dbl>
#>  1 linear-pool-joint 2022-11-19     wk inc flu hosp       0 25       2022-11-19      sample                   1     2
#>  2 linear-pool-joint 2022-11-19     wk inc flu hosp       0 25       2022-11-19      sample                   2    47
#>  3 linear-pool-joint 2022-11-19     wk inc flu hosp       0 25       2022-11-19      sample                   3    56
#>  4 linear-pool-joint 2022-11-19     wk inc flu hosp       0 25       2022-11-19      sample                   4    47
#>  5 linear-pool-joint 2022-11-19     wk inc flu hosp       0 25       2022-11-19      sample                   5    64
#>  6 linear-pool-joint 2022-11-19     wk inc flu hosp       0 25       2022-11-19      sample                   6    55
#>  7 linear-pool-joint 2022-11-19     wk inc flu hosp       0 25       2022-11-19      sample                   7    54
#>  8 linear-pool-joint 2022-11-19     wk inc flu hosp       0 25       2022-11-19      sample                   8    56
#>  9 linear-pool-joint 2022-11-19     wk inc flu hosp       0 25       2022-11-19      sample                   9    58
#> 10 linear-pool-joint 2022-11-19     wk inc flu hosp       0 25       2022-11-19      sample                  10    36
#> # ℹ 1,590 more rows

Derived task IDs are another subset of task ID variables whose values are “derived” from a combination of the values from other task ID variables, which may or may not be part of the compound task ID set. In the example, the derived task ID "target_end_date" is not part of the compound task ID set because "reference_date" and "horizon" are not both part of the compound task ID set.

Generally, the derived task IDs are not needed to identify a single model unit with a multivariate outcome of interest (the purpose of the compound task id set), unless all of the task ID variables their values depend upon are already a part of the compound task ID set.

Not all model outputs will contain derived task IDs, in which case the argument may be set to NULL (the default value). However, it is important to provide the linear_pool() function with any derived task IDs, as they are used to check that the provided compound task ID set is compatible with the input sample predictions to help ensure the resulting (multivariate) ensemble is valid.