Forecast Evaluation
LSE CATS Operational Weather Risk Meeting @ GaTech

Overview
- Why evaluate point forecasts vs ensemble forecasts
- Evaluating forecast distributions
- Combining forecast distributions
- Disseminating results
- The DIME website
- Further perspectives

Why Evaluate Point Forecasts vs Ensemble Forecasts
- A point forecast is a single number:
  London temperature tomorrow: 25°
- An ensemble forecast is a set of numbers:
  London temperatures tomorrow: 23°, 26°, 27°, 29°, 31°
- Ensemble weather forecasts appear to be invaluable to weather-dependent business
- Methods for handling and valuing ensemble forecasts are the subject of ongoing research
- The ultimate criterion is their usefulness to end users

Temperature at London Heathrow – Point Forecast
[Figure: time series of point-forecast temperatures at London Heathrow]
Evaluating Point Forecasts
- For classical point forecasts we can evaluate the error:
  London temperature tomorrow: 25°
  Reality turns out to be: 28°
  Error: 3°
- A point forecast is considered good if the error is small on average
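The "small on average" criterion can be sketched as a mean absolute error. The forecast/observation pairs below are made-up numbers for illustration, not DIME data:

```python
# Point-forecast evaluation sketch: average the magnitude of the errors.
forecasts    = [25.0, 22.0, 30.0, 18.0]   # point forecasts (°C), illustrative
observations = [28.0, 21.0, 29.0, 20.0]   # what actually happened (°C)

def mean_absolute_error(forecasts, observations):
    """Average magnitude of the forecast error; smaller is better."""
    errors = [abs(f - o) for f, o in zip(forecasts, observations)]
    return sum(errors) / len(errors)

print(mean_absolute_error(forecasts, observations))  # → 1.75
```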
Temperature at London Heathrow – Ensemble Forecast
[Figure: time series of ensemble-forecast temperatures at London Heathrow]

Enhancing Point Forecasts – Method of Dressing
- Dressing point forecasts adds uncertainty information
  [Figure: a kernel function (probability vs temperature) centred on the point forecast]
- Kernel width needs adjustment – many possible ways to do that!

Dressing Ensemble Forecasts
  [Figure: dressed ensemble forecast – a kernel function on each member, probability vs temperature]
- How do we combine the probability densities?
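The dressing idea can be sketched in a few lines: put a Gaussian kernel on each ensemble member and average the kernels to get a forecast density. The member values and the kernel width `sigma` are illustrative assumptions; choosing the width well is exactly the open problem the slide mentions.

```python
import math

# Kernel-dressing sketch (assumed Gaussian kernels; DIME studies many variants).
ensemble = [23.0, 26.0, 27.0, 29.0, 31.0]  # ensemble members (°C), illustrative
sigma = 1.5                                 # kernel width -- needs tuning!

def dressed_density(x, members, sigma):
    """Forecast probability density at temperature x: the mean of the kernels."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    kernels = [norm * math.exp(-0.5 * ((x - m) / sigma) ** 2) for m in members]
    return sum(kernels) / len(kernels)

# Density is highest near the cluster of members and tiny far away:
print(dressed_density(27.0, ensemble, sigma))
print(dressed_density(40.0, ensemble, sigma))
```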
Combining Forecast Distributions
- There exist many different methods to combine forecast distributions
- Take into account that the different forecasts have different skills
- Determine the optimal combination by looking at the skill, e.g. Ignorance

Evaluating Forecast Distributions
- To quantify the potential usefulness of different forecast distributions we need to evaluate their skill
- How do we compare reality (a single number) with the forecast (a distribution)?
Different Problems Require Different Skill Scores!
- A forecast can be:
  - Specific (electricity output from a wind farm)
  - General (wind speed)
- These require different measures of skill
- What is the best dressing method for your application?

What DIME does
- DIME aims to investigate the skill of NWPs and dressing techniques using various skill scores
- DIME disseminates background information
Example of an Assessment of Skill
- Evaluating various skill scores of operational NWPs over a period of time:

  MODELS  | Ignorance Skill Score | Brier Skill Score
  MODEL 1 | 4.5                   | 0.25

- What do these skill scores mean?

Ignorance and Weather Roulette
- Bet on the outcome of tomorrow's weather
  [Figure: probabilities placed on tomorrow's temperature bins, 4–12°]
- Spread wealth for safety
- Restrict spread to make money

When Is a Forecast Distribution Good in Weather Roulette?
- Criterion: a forecast distribution is better the more money it yields in weather roulette
- A good forecast distribution balances between spread and accuracy
- The ignorance reflects the expected rate of wealth growth
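The weather-roulette picture can be sketched as follows: stake the forecast probabilities across temperature bins against fair odds from a reference distribution, so the wealth multiplier is p(outcome)/q(outcome); the ignorance, -log2 p(outcome), tracks the expected growth rate of wealth. All numbers below (the forecast, the climatology-like house odds, the outcome) are toy assumptions:

```python
import math

def ignorance(p_outcome):
    """Ignorance score of a forecast that gave probability p to what happened."""
    return -math.log2(p_outcome)

# Toy forecast over temperature bins 4..12° (illustrative, not DIME data):
forecast = {8: 0.4, 9: 0.3, 10: 0.2, 11: 0.1}
house    = {t: 1.0 / 9.0 for t in range(4, 13)}  # uniform, climatology-like odds

outcome = 9
growth = forecast.get(outcome, 0.0) / house[outcome]  # wealth multiplier

print(ignorance(forecast[outcome]))  # lower ignorance = faster wealth growth
print(growth)                        # > 1: this forecast made money
```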
Skill Scores for Binary Event Forecasts – The Brier Score
- Binary (yes/no) events are e.g.: Will it freeze? Will precipitation exceed a threshold?
- A forecast for a binary event:
  It freezes with probability p
  It doesn't with probability (1 - p)
- The Brier score reflects the quality of binary event forecasts
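The Brier score is the mean squared difference between the forecast probability p and the outcome (1 if the event happened, 0 if not); 0 is a perfect score. A minimal sketch with made-up "will it freeze?" forecasts:

```python
def brier_score(probabilities, outcomes):
    """Mean squared error of probability forecasts for a yes/no event."""
    squared = [(p - o) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared) / len(squared)

# Four "will it freeze?" forecasts and what actually happened (illustrative):
p = [0.9, 0.1, 0.7, 0.2]
o = [1,   0,   0,   0]
print(brier_score(p, o))  # ≈ 0.1375
```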
Comparing Forecasts
- Two different forecasts can be compared by means of their ignorance:

  NWP schemes | Ignorance Skill Score | Brier Skill Score
  MODEL 1     | 4.5                   | 0.25
  MODEL 2     | 3.8                   | 0.3

Skill Evaluation Done Properly
- Skill is a statistical quantity and needs error bars:

  NWP schemes | Ignorance Skill Score | Brier Skill Score
  MODEL 1     | 4.5 ± 0.2             | 0.25 ± 0.03
  MODEL 2     | 3.8 ± 0.1             | 0.3 ± 0.02

  (error bars will be suppressed from now on)
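One standard way to attach error bars to a skill score (shown here as an illustration; not necessarily the method used by DIME) is bootstrap resampling over forecast days: recompute the mean score on resampled days and report the spread. The daily scores below are made up:

```python
import random

def bootstrap_error_bar(scores, n_resamples=1000, seed=0):
    """Standard deviation of the mean score under resampling with replacement."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        sample = [rng.choice(scores) for _ in scores]
        means.append(sum(sample) / len(sample))
    mean = sum(means) / len(means)
    var = sum((m - mean) ** 2 for m in means) / len(means)
    return var ** 0.5

daily_ignorance = [4.1, 4.9, 4.3, 4.7, 4.5, 4.6, 4.4]  # toy daily scores
print(sum(daily_ignorance) / len(daily_ignorance))      # mean skill ≈ 4.5
print(bootstrap_error_bar(daily_ignorance))             # the ± error bar
```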
Combining Forecasts
- Dressing allows us to compare point forecasts and ensemble forecasts:

  MODELS                   | Ignorance Skill Score | Brier Skill Score
  MODEL 1 (Ens.)           | 4.5                   | 0.25
  MODEL 2 (Ens.)           | 3.8                   | 0.3
  MODEL 3 (Point Forecast) | 5.0                   | 0.6

- The table grows again:

  MODELS                   | Ignorance Skill Score | Brier Skill Score
  MODEL 1 (Ens.)           | 4.5                   | 0.25
  MODEL 2 (Ens.)           | 3.8                   | 0.3
  MODEL 3 (Point Forecast) | 5.0                   | 0.6
  MODEL 1 and MODEL 2      | 3.2                   | 0.2
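A combined forecast like "MODEL 1 and MODEL 2" can be built, for example, as a skill-weighted mixture of the individual forecast densities. This is a minimal sketch with made-up box densities and weights; the actual DIME combination schemes may differ:

```python
def mixture(densities, weights):
    """Combine forecast densities as a weighted average (weights from skill)."""
    total = sum(weights)
    def combined(x):
        return sum(w * d(x) for w, d in zip(weights, densities)) / total
    return combined

# Two toy forecast densities (uniform boxes over temperature ranges):
d1 = lambda x: 0.25 if 22.0 <= x < 26.0 else 0.0
d2 = lambda x: 0.5  if 25.0 <= x < 27.0 else 0.0

combined = mixture([d1, d2], weights=[0.7, 0.3])
print(combined(25.5))  # ≈ 0.325: mass where both forecasts agree
```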
DIME Objectives
- Objectively compare operational NWP model ensemble forecasts by their skill
- Suggest schemes to combine ensemble forecasts and evaluate the skill of these schemes
- Provide actual weather forecasts for specific locations using a selection of our methods

Dissemination of Results
- The medium to disseminate DIME results is the internet
- The web allows users to interactively request the products they are most interested in
- www.dime.lse.ac.uk