Score model output predictions
score_model_out.Rd
Scores model outputs with a single output_type against observed data.
Usage
score_model_out(
  model_out_tbl,
  target_observations,
  metrics = NULL,
  summarize = TRUE,
  by = "model_id",
  output_type_id_order = NULL
)
Arguments
- model_out_tbl: Model output tibble with predictions.
- target_observations: Observed 'ground truth' data to be compared to predictions.
- metrics: Character vector of scoring metrics to compute. If NULL (the default), appropriate metrics are chosen automatically; see Details for more.
- summarize: Boolean indicator of whether summaries of forecast scores should be computed. Defaults to TRUE.
- by: Character vector naming columns to summarize by. For example, specifying by = "model_id" (the default) will compute average scores for each model.
- output_type_id_order: For ordinal variables in pmf format, a vector of the pmf forecast levels in increasing order. Ignored for all other output types.
Details
Default metrics are provided by the scoringutils package. You can select metrics by passing a character vector of metric names to the metrics argument.
The following metrics can be selected (all are used by default) for the different output_types:
Quantile forecasts (output_type == "quantile"):
- wis
- overprediction
- underprediction
- dispersion
- bias
- interval_coverage_deviation
- ae_median
- interval_coverage_XX: interval coverage at the XX% level. For example, interval_coverage_95 is the 95% interval coverage rate, calculated from the quantiles at probability levels 0.025 and 0.975.
See scoringutils::get_metrics.forecast_quantile for details.
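The mapping from an "interval_coverage_XX" name to the quantile probability levels it relies on can be sketched in a few lines. The helper below (coverage_levels, a hypothetical function, not part of hubEvals or scoringutils) just makes the arithmetic explicit:

```r
# For a central XX% prediction interval, the lower and upper quantiles sit at
# probability levels alpha/2 and 1 - alpha/2, where alpha = 1 - XX/100.
coverage_levels <- function(xx) {
  alpha <- 1 - xx / 100
  c(lower = alpha / 2, upper = 1 - alpha / 2)
}

coverage_levels(95)  # 0.025, 0.975 -> used by interval_coverage_95
coverage_levels(80)  # 0.100, 0.900 -> used by interval_coverage_80
```

This is why model_out_tbl must contain forecasts at both of those probability levels for the corresponding coverage metric to be computable.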
Nominal forecasts (output_type == "pmf" and output_type_id_order is NULL):
- log_score
(Scoring for ordinal forecasts will be added in the future.)
See scoringutils::get_metrics.forecast_nominal for details.
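For intuition about what log_score measures for a pmf forecast, a minimal sketch with hypothetical category probabilities (not tied to the hubExamples data): the log score is the negative log of the probability the forecast assigned to the observed category, so confident wrong forecasts are penalized heavily and a zero probability on the observed category yields Inf (as in the example output below).

```r
# Hypothetical pmf forecast over three categories
probs <- c(low = 0.2, medium = 0.5, high = 0.3)
observed <- "medium"

# Log score: negative log probability of the observed category
log_score <- -log(probs[[observed]])
log_score  # approximately 0.693

# A category assigned probability 0 gives an infinite log score
-log(0)  # Inf
</imports>
```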
Median forecasts (output_type == "median"):
- ae_point: absolute error of the point forecast (recommended for the median; see Gneiting (2011))
See scoringutils::get_metrics.forecast_point for details.
Mean forecasts (output_type == "mean"):
- se_point: squared error of the point forecast (recommended for the mean; see Gneiting (2011))
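The difference between ae_point and se_point can be illustrated with toy numbers (hypothetical values, not from hubExamples). Absolute error is the recommended score when the point forecast targets the median; squared error, which penalizes large misses more heavily, is recommended when it targets the mean (Gneiting 2011):

```r
# Hypothetical observations and point forecasts
obs  <- c(10, 20, 30)
pred <- c(12, 18, 33)

ae_point <- abs(obs - pred)    # 2, 2, 3
se_point <- (obs - pred)^2     # 4, 4, 9

mean(ae_point)  # mean absolute error
mean(se_point)  # mean squared error; the miss of 3 contributes 9, not 3
```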
References
Gneiting, Tilmann (2011). Making and Evaluating Point Forecasts. Journal of the American Statistical Association.
Examples
# compute WIS and interval coverage rates at 80% and 90% levels based on
# quantile forecasts, summarized by the mean score for each model
quantile_scores <- score_model_out(
  model_out_tbl = hubExamples::forecast_outputs |>
    dplyr::filter(.data[["output_type"]] == "quantile"),
  target_observations = hubExamples::forecast_target_observations,
  metrics = c("wis", "interval_coverage_80", "interval_coverage_90"),
  by = "model_id"
)
quantile_scores
#> model_id wis interval_coverage_80 interval_coverage_90
#> <char> <num> <num> <num>
#> 1: Flusight-baseline 329.4545 0.0 0.1250
#> 2: MOBS-GLEAM_FLUH 315.2393 0.5 0.5625
#> 3: PSI-DICE 227.9527 0.5 0.5000
# compute log scores based on pmf predictions for categorical targets,
# summarized by the mean score for each combination of model and location.
# Note: if the model_out_tbl had forecasts for multiple targets using a
# pmf output_type with different bins, it would be necessary to score the
# predictions for those targets separately.
pmf_scores <- score_model_out(
  model_out_tbl = hubExamples::forecast_outputs |>
    dplyr::filter(.data[["output_type"]] == "pmf"),
  target_observations = hubExamples::forecast_target_observations,
  metrics = "log_score",
  by = c("model_id", "location", "horizon")
)
head(pmf_scores)
#> model_id location horizon log_score
#> <char> <char> <int> <num>
#> 1: Flusight-baseline 25 0 0.02107606
#> 2: Flusight-baseline 25 1 6.69652380
#> 3: Flusight-baseline 25 2 17.73313203
#> 4: Flusight-baseline 25 3 Inf
#> 5: Flusight-baseline 48 0 2.18418007
#> 6: Flusight-baseline 48 1 7.49960792