The Hubverse: Streamlining Collaborative Infectious Disease Modeling

US-RSE Conference 2025

7 October 2025

Background

❌ The problem

Infectious disease modeling has scaled rapidly…

  • But the landscape is fragmented:
    • Inconsistent formats
    • Redundant or conflicting forecasts
    • Lack of coordination between modelers and stakeholders

“Comparing the accuracy of forecasting applications is difficult because forecasting methods, forecast outcomes, and reported validation metrics varied widely.”

✨ The promise of modeling hubs

Modeling hubs coordinate collaborative forecasting:

  • Provide a centralised location for coordinating effort

  • Define data standards and modeling targets

  • Improve transparency and comparability

  • Aggregate forecasts enabling ensembles

  • Facilitate timely public health decision-making


Reich et al. (2022), “Collaborative Hubs: Making the Most of Predictive Epidemic Modeling”, American Journal of Public Health

🕰️ Project origins

  • Pre-COVID: Forecasting code base existed for CDC influenza hubs
  • During COVID: That code was reused for new COVID-19 hubs, and international demand (e.g. in Europe) grew for similar setups
  • ❗ Problem: Each hub required manual editing of source code

➡️ Need for generalisation, modularity, and configurability

Timeline of forecasting hub development

Figure credits: Alex Vespignani and Nicole Samay

🌐 Enter the hubverse

An open-source software ecosystem to power modeling hubs:

  • GitHub repositories for centralising hub activity
  • Data standards for infectious disease modeling data
  • Schema-driven configuration for modeling tasks + hub setup
  • Modular tools for validation, access, evaluation, ensembling, communication and hub administration
  • Supports the full lifecycle, from hub setup and data submission through to decision-making

Hubverse overview

☑️ Standardised Data

Modeling hubs are built around a shared data standard:

  • Modeling task definition: targets (response variables), standard predictors, output types (e.g. mean, quantiles)
  • Structured hub layout: consistent file system for organizing submissions
  • Standard model output format: for file content and naming

✅ Enables comparability, validation, and streamlined data access

⚙️ Config-driven hub setup

Hub administrators configure hubs using structured JSON config files:

  • admin.json: hub-level metadata.
  • tasks.json: modeling task specification:
    • Task IDs: Targets (response), horizons, locations (predictors) etc.
    • Output types: accepted model outputs e.g. mean, median, quantiles, cdf, pmf, samples.
  • Configs are validated against a shared JSON schema
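
To make the idea of a validated task specification concrete, here is a minimal base R sketch. It is not the hubverse schema or the hubAdmin API: the `task` list, its field names, the target name, and the `check_task()` helper are all hypothetical simplifications, and the singular output type spellings are an assumption. The real configs are JSON files checked against the shared JSON schema.

```r
# Hypothetical, heavily simplified stand-in for one modeling task.
# Field names and values are illustrative, not the real hubverse schema.
task <- list(
  task_ids = list(
    target   = "example target",   # hypothetical target name
    horizon  = 0:3,
    location = c("US", "06")
  ),
  output_types = c("quantile", "mean")
)

# Output types listed in the talk (singular spellings assumed here)
known_output_types <- c("mean", "median", "quantile", "cdf", "pmf", "sample")

# Toy validator: returns a character vector of problems (empty if none found)
check_task <- function(task) {
  problems <- character(0)
  if (!is.list(task$task_ids) || length(task$task_ids) == 0) {
    problems <- c(problems, "task_ids must be a non-empty named list")
  }
  empty <- names(task$task_ids)[lengths(task$task_ids) == 0]
  if (length(empty) > 0) {
    problems <- c(problems, paste("task ID has no values:", empty))
  }
  unknown <- setdiff(task$output_types, known_output_types)
  if (length(unknown) > 0) {
    problems <- c(problems, paste("unknown output type:", unknown))
  }
  problems
}

# Every combination of task ID values is one thing a model may be asked to predict
task_grid <- expand.grid(task$task_ids, stringsAsFactors = FALSE)

check_task(task)  # character(0): no problems found
nrow(task_grid)   # 8 combinations (1 target x 4 horizons x 2 locations)
```

Because the specification is declarative, the same config can drive both validation of submissions and enumeration of the expected task combinations.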

📦 The R (and friends) package stack

The hubverse package ecosystem is organized by role. Each tool is designed to support a particular group of users in the hub workflow.

Hub roles

  • 🛠️ Hub administrators
  • 🔬 Modelers
  • 📊 Analysts
  • 🏛️ Policy makers

Tools & packages

  • hubAdmin : config creation + validation 🛠️

  • hubValidations : submission checks (structure, schema, content) 🔬 🛠️

  • hubData (R) / hubdata (Python) : access multi-file model output via Arrow 🛠️ 🔬 📊 🏛️

  • hubEvals : compute evaluation metrics 🛠️ 📊 🏛️

  • hubEnsembles : build weighted/unweighted ensembles 🛠️ 📊 🏛️

  • hubVis : visualise model outputs 🛠️ 📊 🏛️

📊 Dashboards & communication

  • Built with Quarto so easily customisable via Quarto configuration
  • Deployed as a fully static site, no backend required
  • Powered by JSON data prepared via GitHub workflows
  • Interactive UI built with client-side JavaScript (fast!)
  • New instances can be set up by copying/configuring the hub-dashboard-template

☁️ Cloud hub storage and access

  • Hubs mirrored to public AWS S3 buckets
  • Multi-file data can be opened as Arrow datasets
  • Enables queryable data access via R 📦 hubData and Python 📦 hubdata

🔁 GitHub workflows

We automate everything we can:

  • ✅ PR-level model output validation
  • ✅ Hub configuration validation
  • ☁️ AWS cloud hub data syncing
  • 📊 Dashboard data regeneration and model evaluation with each update

All hubverse actions are stored in the hubverse-actions repo and can be installed with hubCI::use_hub_github_action()

🪩 List of adopting hubs

https://hubverse.io/community/hubs.html

🦠 Real-world example: CDC FluSight Hub

https://github.com/cdcepi/FluSight-forecast-hub

Screenshot of CDC FluSight Hub GitHub repo

  • Used by US CDC to monitor influenza severity
  • Weekly forecasts from 70 different models, submitted by 40 teams.
  • Hosted on GitHub + S3 cloud mirror.
  • Managed using the full hubverse stack since the 2023/2024 season.

📁 File structure: model output (CDC FluSight)

Teams commit model outputs to version-controlled directories: one directory per model, one file per modeling round.

Screenshot of FluSight hub model output files
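
The layout above can be sketched in base R as a pair of helpers that build and parse a model output path. This is an illustration under stated assumptions, not hubverse code: the helper names are hypothetical, and the sketch assumes a `model-output/<model_id>/<round_id>-<model_id>.<ext>` naming pattern with the round ID written as a `YYYY-MM-DD` date.

```r
# Hypothetical helper: build the path of one model's file for one round.
# Assumes the pattern model-output/<model_id>/<round_id>-<model_id>.<ext>
model_output_path <- function(model_id, round_id, ext = "csv") {
  file.path("model-output", model_id, paste0(round_id, "-", model_id, ".", ext))
}

# Hypothetical helper: recover the round ID and model ID from a file path.
# Assumes the round ID is an ISO date, i.e. the first 10 characters of the name.
parse_model_output_file <- function(path) {
  stem <- tools::file_path_sans_ext(basename(path))
  list(
    round_id = substr(stem, 1, 10),
    model_id = substr(stem, 12, nchar(stem))  # skip the "-" at position 11
  )
}

path <- model_output_path("CADPH-FluCAT_Ensemble", "2023-10-14")
path
# "model-output/CADPH-FluCAT_Ensemble/2023-10-14-CADPH-FluCAT_Ensemble.csv"

parsed <- parse_model_output_file(path)
```

A predictable path per model and round is what lets tooling locate, validate, and read submissions without hub-specific code.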

✅ Model output validation with hubValidations

Model outputs are submitted through PRs and validated through GitHub Actions

Screenshot of FluSight hub model submission PRs

Screenshot of FluSight hub model submission validation results

📂 Accessing model output via hubData

Connect to Arrow dataset of forecast submissions

library(hubData)

hub_path <- s3_bucket(
  "cdcepi-flusight-forecast-hub"
)
hub_con <- connect_hub(
  hub_path,
  skip_checks = TRUE
)
hub_con
hub_connection
9 columns
reference_date: date32[day]
target: string
horizon: int32
target_end_date: date32[day]
location: string
output_type: string
output_type_id: string
value: double
model_id: string

Query and collect data

# Filter for one model and one target end date using dplyr
library(dplyr)
hub_con |>
  filter(
    model_id == "CADPH-FluCAT_Ensemble",
    target_end_date == "2023-10-28"
  ) |>
  collect_hub()
# A tibble: 92 × 9
   model_id   reference_date target horizon target_end_date location output_type
 * <chr>      <date>         <chr>    <int> <date>          <chr>    <chr>      
 1 CADPH-Flu… 2023-10-14     wk in…       2 2023-10-28      06       quantile   
 2 CADPH-Flu… 2023-10-14     wk in…       2 2023-10-28      06       quantile   
 3 CADPH-Flu… 2023-10-14     wk in…       2 2023-10-28      06       quantile   
 4 CADPH-Flu… 2023-10-14     wk in…       2 2023-10-28      06       quantile   
 5 CADPH-Flu… 2023-10-14     wk in…       2 2023-10-28      06       quantile   
 6 CADPH-Flu… 2023-10-14     wk in…       2 2023-10-28      06       quantile   
 7 CADPH-Flu… 2023-10-14     wk in…       2 2023-10-28      06       quantile   
 8 CADPH-Flu… 2023-10-14     wk in…       2 2023-10-28      06       quantile   
 9 CADPH-Flu… 2023-10-14     wk in…       2 2023-10-28      06       quantile   
10 CADPH-Flu… 2023-10-14     wk in…       2 2023-10-28      06       quantile   
# ℹ 82 more rows
# ℹ 2 more variables: output_type_id <chr>, value <dbl>

See more in the Accessing data vignette.

A Python analogue, hubdata, is also available.

🌐 Ensembling with hubEnsembles

Combine models using simple or weighted rules

forecast_df <- hub_con |>
  filter(
    model_id %in%
      c(
        "CADPH-FluCAT_Ensemble",
        "CEPH-Rtrend_fluH",
        "CFA_Pyrenew-Pyrenew_HE_Flu"
      ),
    output_type == "quantile"
  ) |>
  collect_hub()


hubEnsembles::simple_ensemble(
  forecast_df, 
  agg_fun = median,
  model_id = "simple-ensemble-median"
)
# A tibble: 282,716 × 9
   model_id   reference_date target horizon target_end_date location output_type
 * <chr>      <date>         <chr>    <int> <date>          <chr>    <chr>      
 1 simple-en… 2023-10-14     wk in…      -1 2023-10-07      01       quantile   
 2 simple-en… 2023-10-14     wk in…      -1 2023-10-07      01       quantile   
 3 simple-en… 2023-10-14     wk in…      -1 2023-10-07      01       quantile   
 4 simple-en… 2023-10-14     wk in…      -1 2023-10-07      01       quantile   
 5 simple-en… 2023-10-14     wk in…      -1 2023-10-07      01       quantile   
 6 simple-en… 2023-10-14     wk in…      -1 2023-10-07      01       quantile   
 7 simple-en… 2023-10-14     wk in…      -1 2023-10-07      01       quantile   
 8 simple-en… 2023-10-14     wk in…      -1 2023-10-07      01       quantile   
 9 simple-en… 2023-10-14     wk in…      -1 2023-10-07      01       quantile   
10 simple-en… 2023-10-14     wk in…      -1 2023-10-07      01       quantile   
# ℹ 282,706 more rows
# ℹ 2 more variables: output_type_id <chr>, value <dbl>
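
Conceptually, an unweighted median ensemble groups predictions by everything except the model and takes the median of the values within each group. The base R sketch below shows that idea on made-up data. The model names and numbers are hypothetical, and this is not a description of how hubEnsembles implements `simple_ensemble()` internally.

```r
# Toy forecasts from three hypothetical models for a single task and
# three quantile levels (all values are made up for illustration)
forecasts <- data.frame(
  model_id       = rep(c("model-a", "model-b", "model-c"), each = 3),
  reference_date = "2023-10-14",
  horizon        = 1L,
  location       = "06",
  output_type    = "quantile",
  output_type_id = rep(c("0.25", "0.5", "0.75"), times = 3),
  value          = c(10, 20, 30,  14, 22, 40,  12, 27, 35)
)

# Group by every column except the model identifier and the value itself
group_cols <- setdiff(names(forecasts), c("model_id", "value"))

# Median across models within each group
ensemble <- aggregate(
  forecasts["value"],
  by  = forecasts[group_cols],
  FUN = median
)
ensemble$model_id <- "toy-ensemble-median"

ensemble[order(ensemble$output_type_id), c("output_type_id", "value")]
# quantile levels 0.25, 0.5, 0.75 -> values 12, 22, 35
```

Swapping `median` for `mean` gives an equally weighted mean ensemble. A weighted rule would need a per-model weight column in addition to the grouping shown here.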

📈 Dashboard - forecasts

🩺 Dashboard - model evaluations

Evaluates forecasts against target (observed) data.
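
As an illustration of scoring a forecast against observed data, here is a base R sketch of the quantile (pinball) loss, a common score for quantile forecasts. The talk does not say which metrics the dashboard reports, so treat the choice of metric, the `pinball_loss()` helper, and the numbers as assumptions made for the example rather than as hubEvals behaviour.

```r
# Quantile (pinball) loss for a predicted quantile at level tau.
# Lower is better; misses are penalised asymmetrically depending on tau.
pinball_loss <- function(observed, predicted, tau) {
  ifelse(
    observed >= predicted,
    tau * (observed - predicted),
    (1 - tau) * (predicted - observed)
  )
}

# Made-up quantile forecast for one target, and a made-up observation
quantile_levels <- c(0.25, 0.5, 0.75)
predicted       <- c(12, 22, 35)
observed        <- 25

losses <- pinball_loss(observed, predicted, quantile_levels)
losses        # 3.25 1.50 2.50
mean(losses)  # average loss across the three quantile levels
```

Averaging such scores over locations, horizons, and rounds gives one number per model, which is the kind of summary an evaluation table can compare.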

💡 Lessons & wider relevance

  • Standards + automation reduce friction
  • 🧰 Open source keeps it free & accessible
  • 🏥 Collaborative infrastructure empowers public health
  • 🌍 Standardised, open data fuels downstream use cases like training, education, and reproducible research

🙏 Thank you!

Tip

Interested in getting involved in the community? Check out our Getting Involved page!