Two-sample Kalman filter and system error modelling for storm surge forecasting

Dissertation
for the degree of doctor
at the Technische Universiteit Delft,
by authority of the Rector Magnificus Prof. dr. ir. J.T. Fokkema,
chairman of the Board for Doctorates,
to be defended in public on Monday 19 October 2009 at 12:30
by
Julius Harry SUMIHAR
Mathematical engineer (wiskundig ingenieur)
born in Bandung, Indonesia.
This dissertation has been approved by the promotor:
Prof. dr. A.W. Heemink

Copromotor:
Dr. M. Verlaan
Composition of the doctoral committee:

Rector Magnificus, chairman
Prof. dr. A.W. Heemink, Technische Universiteit Delft, promotor
Dr. M. Verlaan, Technische Universiteit Delft, copromotor
Prof. dr. M.F.P. Bierkens, Utrecht University
Prof. dr. P.J. van Leeuwen, University of Reading, United Kingdom
Prof. dr. A. Mynett, UNESCO-IHE / Technische Universiteit Delft
Prof. dr. K. Ponnambalam, University of Waterloo, Canada
Dr. H. Madsen, DHI Group, Denmark
ISBN 978-90-8570-412-6
Copyright © 2009 by J.H. Sumihar, Delft Institute of Applied Mathematics, Delft University of Technology, The Netherlands.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system
or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or
otherwise, without the prior written permission of the author.
Typesetting system: LaTeX 2ε
Printed in The Netherlands by: Wöhrmann Print Service
Acknowledgements
Many people have contributed to the completion of this thesis, directly or indirectly.
Therefore, here I would like to extend my gratitude to those who somehow have made it
possible for me to finish this thesis.
I would like to extend my deep gratitude to Martin Verlaan, with whom I collaborated most closely in carrying out this research. His enthusiasm has helped me to endure and to enjoy the research process at the same time. He has guided me patiently throughout these years.
I would also like to thank Arnold Heemink for the opportunity he has given me to
pursue this PhD study. I thank him for his support. He has the patience to listen to and discuss any issues that I raised. It is very encouraging that, even during busy times, he remains accessible. I thank him as well for the careful reading and constructive comments
on the manuscript.
I would like to acknowledge Bruce Hackett from the Norwegian Meteorological Institute for introducing one of the ideas explored in this research. I thank Øyvind Sætra and Inger-Lisse Frogner for providing me with information about LAMEPS. I also acknowledge Hans de Vries, Gerrit Burgers, and Niels Zweers from KNMI for their help and discussions about HIRLAM.
I have also enjoyed technical support from Kees Lemmens, Eef Hartman, Carl Schneider, and Wim Tiwon. I thank them for their help with the computer systems. I thank Kees Lemmens also for his initiative in organizing lectures about programming and computational techniques. I would also like to thank Bas van ’t Hof from VORtech for his
encouragement and for helping me to solve some technical difficulties with WAQUA.
There was a lot of administrative work related to this PhD study. I have received significant help from Evelyn Sharabi and Dorothee Engering in handling many administrative
matters. They have always been ready to help and I appreciate them for that. I would
also like to thank Theda Olsder, Manon Post, Veronique van der Varst, Rene Tamboer,
and Paul Althaus from CICAT for their help. All their help has contributed to my well-being and kept me from being distracted from my research.
I would like to thank my former officemates, Loeky Haryanto, Nils van Velzen, Gosia Kaleta and Remus Hanea, for their help, the nice talks, and the good music. I thank Nils also for being my main guide in understanding the details of the WAQUA code. It is mainly thanks to his assistance that I was able to use and modify the WAQUA code for my experiments. I thank him as well for translating the summary and my propositions into Dutch. (Later on I learned that in doing the translation he was helped by his colleagues at VORtech, Jeroen and Erwin, so I thank them as well for their help.)
I thank all other former and present PhD colleagues for being part of the nice atmosphere. I would also like to extend special thanks to Eka Budiarto, Umer Althaf, Nils van
Velzen and Ivan Garcia for proofreading parts of this thesis. I thank all participants of the
weekly literature meeting for having shared their knowledge and for the meaningful and
fun discussions.
The community of International Student Chaplaincy has been a big support for me.
Here, I’d like to thank Ben Engelbertink, Waltraut Stroh, Avin Kunnekkadan, Ruben
Abellon, Henk van der Vaart, and Ton. I extend a big thank you to all the ISC choir members, the biggest support group for me since the first month of my stay in Delft. I thank
especially Mieke and Reini Knoppers, the main pillars of the choir. I’d also like to thank the ISC Rotterdam community, which, at a later stage of my stay in the Netherlands, has become an important group for me as well. I thank all other friends of mine, especially
Dwi, Xander, Eka, Dela, Iwa, Sinar, I Nengah Suparta, Loeky, and Ferry. I thank Xander
also for translating my summary into Dutch. I would also like to thank all my former
housemates for their cooperation and the good talks. I thank Mim, Carol Yu, Sarah Los,
Alex Peschanskyy and Umer Althaf (with his gesture) for their constructive comments on
the earlier design of the cover of this thesis. I would like also to thank all my friends in
Indonesia. I am fortunate to have had the opportunity to meet them at least once a year
during this period. Certainly, all these people have helped me enjoy my life and in turn,
helped maintain my stamina to carry out my research.
I thank Sandra Singgih for her courage, love, and for many other good reasons. I
would like to thank my family for the constant support they have been giving me. I thank
especially my mother, who has taught me the value of work. It is she who has initially
opened up the ways and opportunities that I find in life. I extend my deepest gratitude to
her for what she has done.
Delft, September 2009,
Julius H. Sumihar
Summary
Two directions for improving the accuracy of sea level forecasts are investigated in this study. The first direction seeks to improve the forecast accuracy of the astronomical tide component. Here, a method is applied to analyze and forecast the remaining periodic components in the harmonic analysis residual. This method is found to work reasonably well during calm weather, but poorly during stormy periods. This finding led us to continue the study in the second direction, which concerns data assimilation implemented in the operational two-dimensional storm surge forecast model.
The operational storm surge forecast system in the Netherlands uses a steady-state
Kalman filter to provide more accurate initial conditions for forecast runs. An important factor that determines the success of a Kalman filter is the specification of the system error covariance. In the operational system, the system error covariance is modelled explicitly by assuming isotropy and homogeneity. In this study, we investigate the use of the difference between the wind products of two similarly skillful atmospheric models as a proxy for the unknown error of the storm surge forecast model. To accommodate this investigation, a new method for computing a steady-state Kalman gain, called the two-sample Kalman filter, is developed in this study. It is an iterative procedure for computing the steady-state Kalman gain of a stochastic process by using two samples of the process. A number
of experiments have been performed to demonstrate that this algorithm produces correct
solutions and is potentially applicable to different models.
The two-sample Kalman filter algorithm is implemented by using the wind products
from two meteorological centers: the Royal Dutch Meteorological Institute (KNMI) and the UK Met Office (UKMO). Here, the investigation is focused on the random component of the system error. Therefore, the bias or systematic error is eliminated before the wind products are used in the two-sample Kalman filter. The system error spatial correlation estimated from these two wind products is found to be anisotropic, in contrast to the one assumed in the operational system. The steady-state Kalman filter based on this error covariance estimate is found to work well in steering the model closer to the observation data. For the stations along the Dutch coast, the data assimilation is found to improve the forecast accuracy for lead times up to about 12 hours. Moreover, it is also demonstrated that
this data assimilation system outperforms a steady-state Kalman filter based on isotropy
assumption.
To further improve the data assimilation system, the two-sample Kalman filter is extended to work with more samples. By using more samples, the error covariance can be computed by averaging over a shorter time window. This relaxes the stationarity assumption and is expected to better capture the state-dependent model error. In this study, this algorithm is implemented by using the wind ensemble of LAMEPS, which is operational at the Norwegian Meteorological Institute. This setup is found to perform about as well as the steady-state Kalman filter during large positive surges. However, the steady-state Kalman filter is found to perform better than the ensemble system in forecasting negative surges. The resulting ensemble spread during negative surges is found to be narrower than the standard deviation assumed by the steady-state Kalman filter. A further investigation of the wind ensemble is required.
Contents

Acknowledgements
Summary

1 Introduction
  1.1 Why sea level prediction?
  1.2 Operational sea level forecast system
    1.2.1 Astronomical tides
    1.2.2 Storm surge
  1.3 Data assimilation
  1.4 Research objectives
  1.5 Overview

2 Kalman filter
  2.1 Kalman filter
  2.2 Steady-state Kalman filter
  2.3 Ensemble Kalman filter
  2.4 Reduced Rank Square Root Kalman filter
  2.5 Error statistics

3 The Dutch Continental Shelf Model
  3.1 Model area
  3.2 Model equations
  3.3 Numerical approximations
  3.4 Operational setting

4 Analysis and prediction of tidal components within observed surge using Kalman filter
  4.1 Introduction
  4.2 Approach
    4.2.1 Narrow-band Processes
    4.2.2 Surge model
    4.2.3 Kalman filter
  4.3 System characteristics
    4.3.1 Bode diagram
    4.3.2 Cross Over
  4.4 Results and discussion
  4.5 Conclusion

5 Two-sample Kalman filter for steady state data assimilation
  5.1 Introduction
  5.2 Algorithm
    5.2.1 Statistics from two independent realizations
    5.2.2 Open-loop estimate
    5.2.3 Closed-loop estimate
    5.2.4 Proposed algorithm
  5.3 Experiment using a simple wave model
  5.4 Experiment using the Dutch Continental Shelf Model
  5.5 Experiment using the three-variable Lorenz model
  5.6 Conclusion

6 Storm surge forecasting using a two-sample Kalman filter
  6.1 Introduction
  6.2 Approach
    6.2.1 Introduction
    6.2.2 Two-sample Kalman filter algorithm
    6.2.3 System transformation
    6.2.4 Summary
  6.3 Experiment with a simple wave model
  6.4 Experiments with the Dutch Continental Shelf Model
    6.4.1 Introduction
    6.4.2 The Dutch Continental Shelf Model
    6.4.3 Wind stress error statistics
    6.4.4 Water level observation data
    6.4.5 Computing Kalman gain using the two-sample Kalman filter
    6.4.6 Hindcast experiment
    6.4.7 Forecast experiment
    6.4.8 Comparison with isotropic assumption based steady-state Kalman filter
    6.4.9 Tuning of observation error statistics
    6.4.10 Discussion
  6.5 Conclusion

7 Ensemble storm surge data assimilation and forecast
  7.1 Introduction
  7.2 Data assimilation algorithm
    7.2.1 System representation transformation
    7.2.2 Analysis algorithm
    7.2.3 Summary
  7.3 Setup of experiment
    7.3.1 The Model
    7.3.2 10 m wind speed forecasts ensemble
    7.3.3 Water-level data
  7.4 Wind stress error statistics
  7.5 Hindcast experiment
    7.5.1 Time series of ensemble spread
    7.5.2 Innovations statistics
    7.5.3 Rank histograms
    7.5.4 Water level residual rms
    7.5.5 Discussion
  7.6 Forecast experiment
  7.7 Comparison with a steady-state Kalman filter
  7.8 Conclusion

8 Conclusion

Bibliography
Samenvatting
Curriculum Vitae
1 Introduction
1.1 Why sea level prediction?
The work presented in this dissertation is about sea water level prediction. In particular,
the research work is focused on methods to improve the accuracy of water level prediction
in the North Sea, especially along the Dutch coast. In this section, we describe why it is
important to have an accurate sea level prediction system.
Predicting water level is one of the water management activities in the Netherlands.
There are different interests involved, which together motivate the effort to predict water
level accurately. These interests include safety, ecology, and the economy. All of these interests should be accommodated for successful water management.
The safety issue concerns the risk of storm surge flooding. For coastal areas, storm surges are the main contributor to flood risk. In fact, the risk deriving from storm
surges is a global, albeit regionalized, phenomenon affecting a large percentage of the
world population and many centers of commerce and trade (von Storch and Woth, 2008).
In the Netherlands, since parts of its area lie below mean sea level, the risk is even greater.
In fact, a number of occurrences have taken place in the past, where the storm surge
flooded the area near the coast. Throughout history, the populations of the Dutch coastal
provinces have been regularly afflicted by devastating storm surges. The last occurrence
happened on 1 February 1953. It claimed 1836 lives, and approximately 100,000 people were evacuated. It inundated about 136,500 ha of land and killed tens of thousands of livestock. The damage to buildings, dikes and other infrastructure was
enormous (Gerritsen, 2005).
Immediately after the devastating storm surge of 1953, a Delta Commission was installed with the task of developing measures to prevent such a disaster from recurring. The
key part of the plans recommended by the commission was the full closure of the Eastern
Scheldt with a regular dam to create a closed coastline.
In 1975, however, concerns about the economic and ecological issues were raised by
different social groups. If the sea is closed off with dams, the salt water behind the dams will gradually be changed into fresh water. This in turn will affect the ecosystem in that
area. A great variety of life lives in this ecosystem: more than 70 types of fish, 140 types
of water plants and algae, 350 types of animals living in the water and between 500 and
Figure 1.1: The 1953 storm surge flooding at Scharendijke, a village in Zeeland. Photo: B.Th. Boot / Rijkswaterstaat.
600 types on the land. The Eastern Scheldt is also an important area for birds, as a place
for food, brooding, as well as hibernation. Closing the inlets with dams will eventually
ruin this habitat, the impact of which is irreversible. This would also have had severe
economic consequences. Fishery has always been the largest source of income for the
traditional fishing villages in this area. People have been farming oysters in the Eastern
Scheldt since 1870. Although safety remains an important concern, other aspects like
fishery and the environment should also be considered (www.deltawerken.com).
Through many discussions, the parliament eventually agreed to replace the design of
the closed dams with moveable storm surge barriers. The barrier is normally open, so the water bodies on the two sides of the barrier remain connected. In times when
the water level is higher than a safety threshold, the barrier is closed to protect the land
from the water.
Another moveable storm surge barrier is located in the New Waterway. This channel
connects the North Sea with the Rotterdam port. It protects around 1 million people in
Rotterdam, Dordrecht and the surrounding towns from storm surges from the North Sea.
Rotterdam is the largest port in Europe, with an annual throughput of more than 400 million tonnes of goods, which is still expected to grow (www.portofrotterdam.com). It is a
gateway to a European market of more than 500 million consumers. A lot of ships, big
and small, come in and out of the port continuously via this channel. Considering the
economic value at stake, the barrier should be closed only if it is really necessary.
With a moveable storm surge barrier, safety, economic, and ecological demands are all satisfied. Nonetheless, for successful operation of a storm surge barrier, information
about the water level some time ahead is required to be able to close the barrier in time.
The procedure of closing the barrier at the New Waterway, for example, takes six hours.
Therefore, an accurate water level prediction is required, at least six hours ahead.
Besides being required for the closure of the storm surge barrier, water level prediction
is also required for raising alarms. To ensure that people have sufficient time to flee the area should a flood be expected, an alarm is raised at least six hours before the storm surge is expected to arrive. This can be done only if an accurate water level prediction
system is employed.
Accurate water level prediction is also required for port management. At times of
negative surges at low tides, it may be dangerous for supertankers and bulk carriers to
leave or enter the ports of Rotterdam, IJmuiden and the Westerschelde. The largest ships
that can enter the port of Rotterdam are confined to the Euro channel, with a depth of 25.4
m, and to the Meuse channel, with a depth of 23.75 m (Schiereck and Eisma, 2002). One of the biggest ships calling at Rotterdam is the Berge Stahl, which carries iron ore between Brazil and Rotterdam. It is 343 m long and 65 m wide, with a draft, or depth in the water, of 23 m. Passage must be timed to coincide with the time window
when the water is high to prevent the ship running aground.
Figure 1.2: Storm surge barrier at the Eastern Scheldt. Photo: Rijkswaterstaat-AGI.
1.2 Operational sea level forecast system
The two most dominant forces that determine the sea water level are the gravitational forces of the moon and sun on the earth and the wind stresses exerted by the atmospheric flow just
above the sea surface. The operational water level forecast system in the Netherlands
is based on the decomposition of the water level components according to these forces.
Each component is predicted separately and the predicted total water level is defined as
the superposition of both components (Heemink, 1986).
Figure 1.3: Storm surge barrier at the Nieuwe Waterweg. The picture is taken from "Flood Control," Microsoft Encarta Online Encyclopedia 2008.
1.2.1 Astronomical tides
The water level component, which is due to the gravitational forces of the moon and sun,
is called the astronomical component. The gravitational forces of the moon and the sun
on the ocean deform the water body and induce variations of the water level in the sea. At the same time, this deformation propagates as a tidal wave.
Since the relative motion of the earth, moon and sun is harmonic in time, the astronomical component is also harmonic. Hence, the astronomical component can be expressed
as the sum of several sinusoidal functions with different frequencies, each of which is referred to as a tidal constituent:
ξ(t) = Σ_{j=1}^{J} H_j cos(ω_j t + G_j)    (1.1)
where ξ(t) is the astronomical water level component at time t, while H_j, G_j, and ω_j are
the amplitude, phase and frequency of tidal constituent j, respectively. The frequency
of each tidal constituent corresponds to the characteristic frequencies of relative motion
of the moon-earth-sun system. For example, the tidal constituent M2 corresponds to the
motion of the moon around the earth and has a period of 12.421 hours. The frequencies of oceanic tidal constituents can be derived from the gravitational potential at the earth's surface due
to the moon and the sun (see, for example, Kantha and Clayson 1999). When the tidal
waves propagate into a shallow water area, higher harmonics of tides are formed due to
the nonlinearity of the shallow water system. The frequencies of these shallow water tides are linear combinations of the fundamental frequencies. For example, MS4 has a frequency that is the sum of those of M2 and S2 and is generated by the interaction of M2 and S2.
Due to its periodic nature, the analysis of the astronomical component is performed
by using harmonic analysis. Here, using a long sequence of observed water level data, usually one year long, Equation (1.1) is solved using a least-squares estimator to determine the amplitude and phase of each tidal constituent. During their propagation, the tidal waves are deformed by the structure and local characteristics of the sea, such as sea bed configurations, coastal shapes, etc. As a result, the astronomical tide has local characteristics. Hence, the number of tidal constituents required to provide accurate predictions
is different from one location to another. For the Dutch coast, typically more than one
hundred tidal constituents are needed.
The prediction of the astronomical component is done by using the same equation,
with the estimated amplitude and phase of each tidal constituent. The prediction of astronomical tide at a location can be done fairly accurately over a period of years in advance
provided that there is no structural change in the vicinity of that location, such as the building of dams or other constructions.
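As an illustration of the analysis and prediction steps, the sketch below poses Equation (1.1) as a linear least-squares problem by writing each constituent as a cosine and a sine term. It is only a minimal sketch: the constituent set, the sampling interval and all numbers are hypothetical, whereas the operational analysis uses about a year of data and over a hundred constituents.

```python
import numpy as np

def harmonic_analysis(t, zeta, omegas):
    """Least-squares fit of Equation (1.1): zeta(t) ~ sum_j H_j cos(omega_j t + G_j).

    Each constituent is written as a_j cos(omega_j t) + b_j sin(omega_j t), so the
    problem is linear in (a_j, b_j). Returns a mean level, amplitudes H_j and phases G_j.
    """
    # Design matrix: a constant column plus a cosine and sine column per constituent.
    A = np.column_stack([np.ones_like(t)]
                        + [f(omega * t) for omega in omegas for f in (np.cos, np.sin)])
    coef, *_ = np.linalg.lstsq(A, zeta, rcond=None)
    a, b = coef[1::2], coef[2::2]
    H = np.hypot(a, b)       # amplitude of each constituent
    G = np.arctan2(-b, a)    # phase: a cos(wt) + b sin(wt) = H cos(wt + G) with G = atan2(-b, a)
    return coef[0], H, G

def harmonic_prediction(t, mean_level, H, G, omegas):
    """Predict the astronomical component with the estimated amplitudes and phases."""
    return mean_level + sum(Hj * np.cos(om * t + Gj) for Hj, Gj, om in zip(H, G, omegas))

# Hypothetical example with two constituents (angular frequencies in rad/hour, periods of M2 and S2).
omegas = 2 * np.pi / np.array([12.421, 12.0])
t = np.arange(0.0, 24 * 365, 0.5)                       # half-hourly samples over one year
true = 0.1 + 0.8 * np.cos(omegas[0] * t + 0.3) + 0.2 * np.cos(omegas[1] * t - 1.0)
obs = true + 0.05 * np.random.randn(t.size)             # "observed" water level with noise
z0, H, G = harmonic_analysis(t, obs, omegas)
forecast = harmonic_prediction(t + 24 * 365, z0, H, G, omegas)   # predict one year ahead
```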
1.2.2 Storm surge
Another factor which affects the water level is the wind stress exerted on the sea surface.
When the wind blows in one direction, it will push against the water and cause the water to
pile up higher than the normal level of the sea. This pile of water is pushed and propagated
in the direction of the wind. The growth of the water level due to the wind stress depends
on wind strength and how long the wind blows in one direction. The meteorological
component of sea water level is often called a storm surge.
The operational storm surge prediction is performed by using a numerical model called
the Dutch Continental Shelf Model (DCSM). This model is based on the shallow water
equations. It describes how the water level and velocities are related and evolve in time in response to the wind forcing exerted on the sea surface as well as to the tidal waves coming in from the Atlantic Ocean. A more detailed description of this model is given in
Chapter 3.
Although the DCSM also includes the astronomical component as its input forcing,
its tidal prediction is found to be less accurate than the one obtained using harmonic analysis. Therefore, in the operational system it is used only for predicting the meteorological
component. The meteorological component is obtained by subtracting the periodic component from the total predicted water level. The DCSM periodic component is obtained
by running the DCSM without wind input (Figure 1.4).
1.3 Data assimilation
A mathematical model is developed to predict the actual behavior of the system of interest.
However, a model is just an approximate representation of the actual system. Therefore,
predictions made by the model will always contain errors.

Figure 1.4: Scheme of the operational sea level forecast system in the Netherlands.

The errors may be due, for
example, to neglected physical processes in the model equations or to uncertain model
parameters. The DCSM consists of a set of partial differential equations, which describes
how the state variables evolve in time. Solving the equations requires discretization in
time and space. This discretization entails that only processes with scales larger than several grid sizes can be reproduced reliably. The model also contains model parameters,
such as bathymetry and bottom friction coefficients, which are not known exactly and
should be determined empirically. Errors in the prediction can also be due to uncertain
initial and boundary conditions. A major source of errors in a storm surge model is the
uncertain wind input forcing. The wind data that is used as input forcing for a storm surge model is usually the product of a meteorological model, which itself contains errors. This multitude of error sources generates uncertainty in the model prediction.
Another source of information about the actual system in question is the information
from observations. In the Netherlands, for example, a network of tide gauges is available along the coast and offshore, which provides on-line information about the water level every ten minutes. Observation data are also available from satellite altimetry or from measuring ships. The information from observations is usually reasonably accurate. However, observed data usually only give information at a few locations. Moreover, observation data alone cannot provide predictions about the future state of the system.
In order to improve the prediction, one may use a data assimilation technique. Data
assimilation is the incorporation of observation data into a model to improve the forecast made by the model. The search for data assimilation algorithms was started by the
meteorological science community. Nowadays, data assimilation has become one of the
essential components of forecast systems in many earth system applications. In operational forecast systems, the state obtained at the end of the assimilation period is taken as
the initial state for the ensuing forecast.
There are two main classes of data assimilation algorithm: variational and sequential
methods. The variational data assimilation aims at globally adjusting a model solution
to all the observations available over the assimilation period (Talagrand, 1997). In this
method, the incorporation of observed data is done by first defining a scalar function,
which measures the deviation of the model results from the observations. The initial condition estimate is computed by minimizing this so-called cost function with respect to
the initial condition. For an efficient computation, the minimization of the cost function
requires the implementation of the adjoint of the linearized model in question. For some
cases, however, it is difficult to build an adjoint model. The difficulty in developing the
adjoint model makes the variational method impractical in some cases.
In the sequential data assimilation method, observation data is assimilated every time
it becomes available. The most popular method in this class is the Kalman filter. In the Kalman filter, the model is not only used to propagate the model state, but also to determine the uncertainty of the estimate, expressed in terms of the error covariance. The estimates of the uncertain parameters or model states are obtained as a weighted average
between observation and model prediction, where the weights are determined based on
the covariances of the model prediction and observation errors. The implementation of
a Kalman filter does not require an adjoint model. However, its implementation also has difficulties, namely the computational cost of the error covariance propagation and the treatment of nonlinearity. A more complete description of Kalman filtering is given in Chapter 2.
The data assimilation used in the operational storm surge forecast system in the Netherlands is based on the so-called steady-state Kalman filter. A more complete description of the steady-state Kalman filter is given in Chapter 2. Here it suffices to say that the steady-state Kalman filter is a Kalman filter with constant error covariances. For stable systems with constant error covariances, the weighting matrix, usually called the Kalman gain, will also be constant. Hence, it is possible to compute the Kalman gain before using the Kalman filter in an operational setting. Once the Kalman gain is available, it is simply inserted into the forecast system as shown in Figure 1.5.
Experience has shown that assimilating total sea level observations into the DCSM may have a negative effect on the forecast performance. This has to do with the fact that the DCSM cannot reconstruct the astronomical components accurately, as mentioned in Section 1.2.2. To avoid assimilating astronomical tide information into the DCSM, the adjusted observation described in Figure 1.6 is used instead of the total sea level observation. Here, the astronomical components as predicted by harmonic analysis are replaced by those obtained using the DCSM without wind input.
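As a small illustration, the adjustment of Figure 1.6 amounts to one line of arithmetic per station; the array names and values below are hypothetical:

```python
import numpy as np

# Hypothetical time series at one tide gauge (all in metres, on the same time axis):
# total_obs     - observed total water level
# tide_harmonic - astronomical tide predicted by harmonic analysis
# tide_dcsm     - water level from a DCSM run without wind input
total_obs = np.array([1.20, 1.35, 1.10])
tide_harmonic = np.array([0.90, 1.00, 0.85])
tide_dcsm = np.array([0.95, 1.02, 0.80])

# Replace the harmonic tide by the tide as the DCSM itself reproduces it, so that the
# assimilated signal contains no astronomical information the DCSM cannot represent.
adjusted_obs = total_obs - tide_harmonic + tide_dcsm
```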
1.4 Research objectives
The general objective of this research is to develop and implement methods for improving
sea level forecast. This objective is achieved by following several directions. From the
scheme of the operational forecast system in the Netherlands, we can identify various
directions, which we can take to improve the sea level forecast accuracy. Two directions
are explored in this study.
Figure 1.5: Scheme of the operational data assimilation system, H: observation operator, K: Kalman gain.

Figure 1.6: Block diagram of the adjusted observation.
The first direction is to improve the accuracy of astronomical tides prediction. The
decision to follow this direction is motivated by the fact that periodic components can still
be found in the residual of harmonic analysis. The idea is to use the method introduced
by Heemink et al. (1991) to predict the remaining harmonic components and use them as corrections to the astronomical tide forecasts obtained using harmonic analysis.
Another direction is to improve the data assimilation scheme, which is coupled to
the DCSM. As mentioned earlier, the operational data assimilation system is based on a
Kalman filter. The Kalman filter could in theory be used for optimal estimation of a system state. However, in practice it requires information about the error covariances of the forecast model, which is crucial but difficult to estimate (Dee, 1995). In fact, the issue of error covariance estimation applies to most data assimilation algorithms, which are rooted in statistical estimation theory.
One problem in specifying the forecast/model error covariance matrix is that it is usually enormous. We are forced to simplify it just to fit it into the computer. Even if we could fit it into the computer, we do not have sufficient statistical information to determine its elements. The second problem is that we do not know what the errors are, since we do not know the truth. We cannot produce samples of model or forecast errors since we do not know the true state (Fisher, 2003).
There are three possible ways to approach this problem. The first way is to explicitly model the structure of the error covariance. In the operational system, for example, the model error variance is assumed to be homogeneous in space. Moreover, the spatial correlation is assumed to be isotropic. This means that the correlation between the errors at two locations depends only on the distance between the two locations and not on direction. Here, the error covariance is parameterized by a few parameters, such as a correlation length and a standard deviation (Verlaan, 1998). Parameterizing the error covariance
into a few calibrating parameters is also proposed by Dee (1995) and used by Mitchell
and Houtekamer (2000).
The second possible way is to disentangle information about the error statistics from
the difference between the model results and observations (Hollingsworth and Lönnberg, 1986; Xu and Wei, 2001). This method is likely to give the most accurate estimate of the error covariance. However, it requires a lot of observation points within the model area. It gives less accurate information for areas with only a few or no observations.
The third method is to find a surrogate quantity whose statistics are expected to be similar to the unknown system error statistics (Fisher, 2003). The most popular
choice for a surrogate of samples of the error is to use differences between forecasts of
different length that verify at the same time, frequently referred to as the NMC method
(Parish and Derber, 1992; Yang et al., 2006a). The practical advantage of this approach
is that once such forecast samples have been generated, it is straightforward to determine
covariances of the error for the entire model domain, in terms of the model variables.
Moreover, the main advantage of such a method is that, in an operational NWP environment,
the forecasts required to calculate the statistics are already available in the operational
archives. No computationally expensive running of the forecast or analysis system is
required (Fisher, 2003). The difficulty with the approach is that the statistics of any chosen
surrogate quantity are likely to differ in some aspects from the true statistics of the error.
In this research, we develop a method for estimating the error covariance using a surrogate quantity. For the storm surge forecast system used in our study, wind field data produced by two similarly skillful atmospheric models are available. Here, we explore the possibility of representing the model error by the difference between the model results obtained by running the model with these two wind fields. This approach is similar to Alves and Robert (2005) and Leeuwenburgh (2007). However, their work is in the framework of ensemble Kalman filtering. In our research, as in the operational system, the
method is developed in the framework of steady-state Kalman filtering. To accommodate
this study we first develop a method for computing a steady-state Kalman gain from two
samples of the forecast system. We call it the two-sample Kalman filter. It is a recursive
algorithm for computing a steady-state Kalman gain of a stochastic system based on two
samples of the system. It is based on the assumption that the error is stationary. The error
covariance is computed by averaging over time.
It is also known that the model error actually varies in time along with the actual system state (Dee, 1995). To improve the data assimilation performance even further, we extend the two-sample Kalman filter algorithm to work with more samples. By using more samples, it is possible to compute the error covariance by averaging over a shorter time window. In this way we can relax the stationarity assumption. This will in principle result in a more accurate estimate.
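The sketch below illustrates the basic idea under the stationarity assumption described above: the bias-corrected difference of two model runs, forced by two equally skillful wind products, is averaged over time to estimate a system error covariance. All array names and numbers are hypothetical, and the factor 1/2 assumes independent errors of equal magnitude in the two runs.

```python
import numpy as np

def error_covariance_from_two_runs(x1, x2):
    """Estimate a stationary system error covariance from two model runs.

    x1, x2 : arrays of shape (n_times, n_state), e.g. model results forced with two
    different wind products. Their difference serves as a proxy for the model error;
    the systematic part (bias) is removed first so only the random part remains, and
    the covariance is obtained by averaging over time (stationarity assumption).
    """
    d = x1 - x2
    d = d - d.mean(axis=0)          # remove the systematic difference (bias)
    # If the two runs have independent errors of equal magnitude, var(d) = 2 * var(error),
    # hence the factor 1/2.
    return 0.5 * (d.T @ d) / (d.shape[0] - 1)

# Hypothetical example with a 3-variable state and 1000 time steps.
rng = np.random.default_rng(0)
truth = rng.standard_normal((1000, 3))
x_run1 = truth + 0.3 * rng.standard_normal((1000, 3))
x_run2 = truth + 0.3 * rng.standard_normal((1000, 3))
Q_hat = error_covariance_from_two_runs(x_run1, x_run2)
```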
1.5 Overview
The remainder of this book is organized as follows. The next two chapters give a general introduction related to the research, followed by four chapters describing the research work. Finally, the last chapter concludes this book with a summary and recommendations.
Chapter 2 is devoted to a brief description of Kalman filtering. It presents the basic equations of the Kalman filter algorithm. Some difficulties in the practical implementation of a Kalman filter are mentioned, and some variants of the Kalman filter algorithm found in the literature that cope with these difficulties are presented.
Chapter 3 gives a description of the DCSM, the numerical storm surge model used throughout this study. A brief explanation of the model area, the system of equations and the numerical method is presented.
The forecast residual of harmonic analysis is found to still contain some astronomical components. This has motivated us to improve the forecasts by extracting the astronomical components from the harmonic analysis residual. Hereafter, we refer to the residual of harmonic analysis as the observed surge. In Chapter 4, the results of the study on the analysis and forecast of astronomical components within the observed surge are presented. It describes how each tidal constituent is modelled and forecast.
Chapter 5 presents our proposed algorithm, called the two-sample Kalman filter, for
computing a steady-state Kalman gain. The development of this algorithm was motivated by the need for a framework for estimating the forecast error covariance by simply using two realizations of a dynamic system. Here, results of twin experiments using different models
are presented to demonstrate the validity of the algorithm.
In Chapter 6, we implement the two-sample Kalman filter on the DCSM. Two wind
fields from two different meteorological centers are used to force the DCSM to generate the two samples required for computing the steady-state Kalman gain. It is shown that
the algorithm performs well with the DCSM.
We have extended the two-sample Kalman filter algorithm to work with more than two samples. Chapter 7 describes both the modified algorithm and its implementation with the DCSM. In this experiment, we use an ensemble of wind field forecasts to
generate the ensemble of DCSM forecasts. Both the analysis and forecast ensembles are
evaluated.
Chapter 8 concludes the dissertation by summarizing the research work that has been done and giving recommendations for future research.
2 Kalman filter
The Kalman filter is a recursive algorithm for estimating the state of a dynamic system. It combines, sequentially in time, the state of a dynamic system with observations to produce an optimal estimate of the system state. Since its first publication (Kalman, 1960), the Kalman filter has received a lot of attention and found many applications. Although it originated in the field of control theory, it has found many applications in other fields as well, including earth system applications.
The purpose of this chapter is to provide the reader with some basic equations of the Kalman filter and its variants pertaining to this study. We do not intend to give a complete description of Kalman filter theory. The exposition will focus only on certain issues related to the Kalman filter algorithm and how certain algorithms found in the literature solve these issues. For a complete description of Kalman filter theory, the reader is referred to, for example, Jazwinski (1970) and Maybeck (1979).
We first give a brief description of the classical Kalman filter and mention some issues related to it. This is followed by a short description of some variants of the classical Kalman filter. Only three variants pertaining to this study are described: the steady-state Kalman filter, the Ensemble Kalman filter (EnKF), and the Reduced Rank Square Root Kalman (RRSQRT) filter. In the last section we give a brief discussion of a practical problem of implementing a Kalman filter, which we deal with in this research.
2.1 Kalman filter
The Kalman filter is an algorithm, which solves the general problem of estimating the
process state X(tk ) ∈ Rn at time tk that is governed by the linear stochastic difference
equation
X(t_k) = M(t_k) X(t_{k-1}) + B(t_k) u(t_{k-1}) + G(t_k) η(t_{k-1})    (2.1)

with a measurement Y(t_k) ∈ R^m given by

Y(t_k) = H(t_k) X(t_k) + ν(t_k)    (2.2)
where M(tk ) is the model operator, u(tk ) input forcing, B(tk ) input forcing matrix, G(tk )
model error matrix and H(tk ) observation operator. The random variable η(tk ) represents
the noise of the process due to model and/or input forcing error, while ν(tk ) represents
observation noise. They are assumed to be Gaussian white noise and independent of each
other, with covariance Q(tk ) ∈ Rn×n and R(tk ) ∈ Rm×m , respectively.
Due to the linearity of the system, the Gaussianity of the model state is preserved
in time. It is known that the statistics of a normally distributed random variable can be
described completely by using its mean and variance. Therefore, to obtain a complete
statistical description of the system state X(tk ) above, it is sufficient to propagate both
the mean and covariance of the system state. This is what is done in the Kalman filter: it
propagates in time the mean as well as covariance of the system state.
The Kalman filter uses the mean as the estimate of the system state. For a normally distributed random variable, the mean value is the optimal estimate that satisfies many different criteria of optimality, such as minimum variance, least squares, and maximum likelihood (Maybeck, 1979). Moreover, by using the mean value as the state estimate, the state covariance is in fact also the covariance of the error of that estimate. When observations are available, the filter uses the covariance to
define how the observation should be used to update the system state estimate.
The Kalman filter solves the estimation problem above in two steps: forecast and
analysis steps. Therefore, for clarity, we denote the estimates of the system state computed
at forecast and analysis steps as xf (tk ) and xa (tk ), respectively. Their corresponding
error covariances are denoted by Pf (tk ) and Pa (tk ). In the forecast step, the mean and
covariance of the process are propagated forward in time by:
x^f(t_k) = M(t_{k-1}) x^a(t_{k-1}) + B(t_{k-1}) u(t_{k-1})    (2.3)

P^f(t_k) = M(t_{k-1}) P^a(t_{k-1}) M(t_{k-1})^⊤ + G(t_{k-1}) Q(t_{k-1}) G(t_{k-1})^⊤    (2.4)

while at analysis time, that is, when an observation y^o is available, the forecast state is combined linearly with the observation to obtain an updated system state estimate:

x^a(t_k) = x^f(t_k) + K(t_k) (y^o(t_k) − H(t_k) x^f(t_k))    (2.5)

P^a(t_k) = (I − K(t_k) H(t_k)) P^f(t_k)    (2.6)

where K(t_k) is called the Kalman gain, given by:

K(t_k) = P^f(t_k) H(t_k)^⊤ [H(t_k) P^f(t_k) H(t_k)^⊤ + R(t_k)]^{-1}    (2.7)

The algorithm is run sequentially in time, starting from a specified initial condition:

x^a(t_0) = x_0^a    (2.8)

P^a(t_0) = P_0^a    (2.9)
In short, to implement a Kalman filter, we need to supply x^a(t_0) and P^a(t_0) as the initial guess and its covariance, as well as the model error covariance Q(t_k) and the measurement error covariance R(t_k) at each time t_k, as input parameters.
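For illustration, Equations (2.3)-(2.7) translate almost directly into code. The following is a minimal sketch of one forecast-analysis cycle for a small, hypothetical linear system; it is not the implementation used in this thesis.

```python
import numpy as np

def kalman_step(x_a, P_a, y_obs, M, B, u, G, Q, H, R):
    """One forecast/analysis cycle of the classical Kalman filter, Eqs. (2.3)-(2.7)."""
    # Forecast step: propagate the mean and the covariance.
    x_f = M @ x_a + B @ u                                   # (2.3)
    P_f = M @ P_a @ M.T + G @ Q @ G.T                       # (2.4)
    # Analysis step: combine the forecast with the observation.
    S = H @ P_f @ H.T + R
    K = P_f @ H.T @ np.linalg.inv(S)                        # (2.7)
    x_a_new = x_f + K @ (y_obs - H @ x_f)                   # (2.5)
    P_a_new = (np.eye(len(x_f)) - K @ H) @ P_f              # (2.6)
    return x_a_new, P_a_new

# Tiny hypothetical two-variable system with one observed component.
M = np.array([[0.9, 0.1], [0.0, 0.8]]); B = np.eye(2); G = np.eye(2)
Q = 0.01 * np.eye(2); H = np.array([[1.0, 0.0]]); R = np.array([[0.05]])
x, P = np.zeros(2), np.eye(2)
x, P = kalman_step(x, P, y_obs=np.array([0.3]), M=M, B=B, u=np.zeros(2), G=G, Q=Q, H=H, R=R)
```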
In principle, the optimal estimator for linear systems with Gaussian error process is
the Kalman filter. However, there are some difficulties in its implementation, especially
for earth system applications. One difficulty, which hampers the implementation of the
classical Kalman filter for large scale models is the computational cost due to the propagation of the error covariance. For a system with dimension n, the covariance matrix
will have a dimension of n × n. In earth system applications, the system state is usually of large dimension. For the storm surge forecast system that we are working with
here, for example, the dimension of the system state is about 60000. Working directly
with matrices whose dimension is the square of this size would take too much computational time. Another difficulty is related to the linearity assumption: most real-life systems are nonlinear. Many approximations to the Kalman filter have been proposed to solve these problems. In the remainder of this chapter, we give a brief explanation of the three algorithms that pertain to our study.
2.2 Steady-state Kalman filter
It should be noted here that the Kalman gain K(tk ) only depends on the forecast error
covariance P^f(t_k) and the measurement error covariance R(t_k), and is independent of the actual measurement realizations. Therefore, it is possible to prepare the Kalman gain before using the Kalman filter for data assimilation. Moreover, if the model is time-invariant and stable, the Kalman gain K(t_k) will converge to a limiting value K. When this steady-state Kalman gain is used for all measurement times, the estimate converges to the
optimal estimate for large times tk . This fact can be exploited to reduce the computational
cost of running the Kalman filter.
A time-invariant model is a special case of the system described by Equations 2.1-2.2, in which the system matrices M, B, G, H, Q and R are constant. A model is exponentially stable if the moduli of all eigenvalues of M are smaller than one. The stability of a linear, time-invariant system can also be evaluated by using the concepts of controllability and observability.
A system is controllable if all components or linear combinations of these components are affected by the system noise. The result is that the forecast error covariance
matrix will be positive definite. Controllability can be verified by computing the rank of
the controllability matrix C:
C = [G, MG, M^2 G, ..., M^{n-1} G].    (2.10)
The system is controllable if the rank of C is equal to n, where n is the dimension of the
state vector.
The concept of observability is related to the measurement information that is available. A system is observable if all variations of the system state can be observed in the measurements. The result of observability is that the covariance matrix of the analysis
error is bounded from above. Observability can be checked by computing the rank of the
observability matrix O:
O = [H^⊤, M^⊤ H^⊤, (M^⊤)^2 H^⊤, ..., (M^⊤)^{n-1} H^⊤].    (2.11)
The system is observable if the rank of O is equal to n.
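For small systems, both rank conditions are easy to check numerically. The sketch below is a minimal illustration of Equations (2.10) and (2.11); the matrices passed in are assumed to be small dense NumPy arrays.

```python
import numpy as np

def is_controllable(M, G):
    """Rank test of the controllability matrix C = [G, MG, ..., M^(n-1) G], Eq. (2.10)."""
    n = M.shape[0]
    C = np.hstack([np.linalg.matrix_power(M, k) @ G for k in range(n)])
    return np.linalg.matrix_rank(C) == n

def is_observable(M, H):
    """Rank test of the observability matrix O = [H^T, M^T H^T, ...], Eq. (2.11)."""
    n = M.shape[0]
    O = np.hstack([np.linalg.matrix_power(M.T, k) @ H.T for k in range(n)])
    return np.linalg.matrix_rank(O) == n

# Hypothetical example: both conditions hold for this small system.
M = np.array([[0.9, 0.1], [0.0, 0.8]])
G = np.array([[1.0], [1.0]])
H = np.array([[1.0, 0.0]])
print(is_controllable(M, G), is_observable(M, H))
```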
If a system is observable and controllable, then it is also exponentially stable. For
time invariant systems, controllability and observability also imply that the forecast error
statistic will converge to a steady-state and the Kalman filter for large time tk becomes
time invariant as well.
For a linear, time-invariant and stable system, the steady-state Kalman gain can be computed by running only the covariance propagation and using the forecast error covariance to compute the Kalman gain, i.e. Equations 2.4, 2.6 and 2.7. Once the solution of the equations has reached a constant value, the Kalman gain can be used for data assimilation. This
computation is performed off-line and only once. Hence, the computational problem of
the error covariance propagation is eliminated. Once the Kalman gain is available, only
the equations for the state forecast and analysis have to be solved on-line:
x^f(t_k) = M x^a(t_{k-1}) + B u(t_{k-1})    (2.12)

x^a(t_k) = x^f(t_k) + K (y^o(t_k) − H x^f(t_k))    (2.13)
The steady-state Kalman filter is computationally efficient. It adds only a small fraction to the running time of the model itself. Moreover, the implementation of the analysis equation requires only a few additional lines of code in the model program.
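As an illustration, for a small system the steady-state gain can be obtained by simply iterating Equations 2.4, 2.6 and 2.7 until the gain stops changing. The sketch below is only feasible for low-dimensional states; large models such as the DCSM require the dedicated methods discussed below and in Chapter 5.

```python
import numpy as np

def steady_state_gain(M, G, Q, H, R, tol=1e-9, max_iter=10000):
    """Iterate the covariance recursion (Eqs. 2.4, 2.6, 2.7) until the Kalman gain converges.

    A brute-force sketch, practical only for small state dimensions.
    """
    n = M.shape[0]
    P_a = np.eye(n)                                # arbitrary positive definite initialization
    K_old = np.zeros((n, H.shape[0]))
    for _ in range(max_iter):
        P_f = M @ P_a @ M.T + G @ Q @ G.T                          # (2.4)
        K = P_f @ H.T @ np.linalg.inv(H @ P_f @ H.T + R)           # (2.7)
        P_a = (np.eye(n) - K @ H) @ P_f                            # (2.6)
        if np.max(np.abs(K - K_old)) < tol:
            break
        K_old = K
    return K

# Once K is available, only Eqs. (2.12)-(2.13) need to be run on-line:
#   x_f = M @ x_a + B @ u
#   x_a = x_f + K @ (y_obs - H @ x_f)
```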
For many environmental systems, it is computationally expensive to directly solve Equations 2.4, 2.6 and 2.7 due to the large dimension of the matrices involved. To efficiently compute the steady-state Kalman gain, different methods can be used. Heemink
(1986), for example, used a discrete form of the Chandrasekhar-type algorithm to compute a steady-state Kalman filter for a storm surge forecast system. This algorithm was
first proposed by Morf, Sidhu and Kailath (1974) and is based on the recursion of the
incremental covariance ∆P(tk ) = Pa (tk ) − Pa (tk−1 ). The advantage of working with incremental covariance is that the rank of this matrix is equal to p, where p is the dimension
of the system noise process and usually much smaller than the dimension of the system
state n, i.e. p << n. This reduces the computational time significantly.
Although in reality the error statistics may vary in time with the actual system state
(Dee, 1995), the steady-state Kalman filter algorithm has been proven efficient in many
applications (e.g. Heemink, 1990; Canizares et al., 2001; Oke et al., 2002; Verlaan et al.,
2005; Sørensen et al., 2006; Serafy and Mynett, 2008). The limitation of this approach
is that it requires a fixed observing network; that is, the observation matrix H is constant.
Moreover, this algorithm does not solve the nonlinearity issue. The approach usually
taken for dealing with the nonlinearity issue is to propagate the error covariance matrix by
using a linearized model and to compute the steady-state Kalman gain with this simpler
model. Sometimes, this approach is combined with using a coarser grid to solve the
computational cost problem in propagating the error covariance (Heemink, 1990).
2.3 Ensemble Kalman filter
A popular method to solve the nonlinearity and computational cost issues in implementing the Kalman filter is the ensemble Kalman filter (EnKF), first introduced by Evensen (1994). It is based on the Monte Carlo method, where the uncertainty of the model state is represented by an ensemble of model state realizations. Here, the propagation of the error covariance is replaced by the propagation of each member of the ensemble. The number of ensemble members q is chosen to be much smaller than the number of columns of the error covariance matrix, q << n. Therefore, it reduces the computational cost considerably. Moreover, it is relatively simple to implement. Due to these nice properties, it has been the subject of much research and many applications. A complete description of the derivation of the EnKF can be found in Evensen (2003). In this section, we give only a basic description of the algorithm.
The equations for the EnKF are as follows. Suppose we have an N-member ensemble of the model state, ξ_1, ξ_2, ..., ξ_N. At the forecast step, each ensemble member is propagated in time and the forecast perturbation matrix E^f is formed:

ξ_i^f(t_k) = M(t_{k-1}) ξ_i^a(t_{k-1}) + B(t_{k-1}) u(t_{k-1}) + G(t_{k-1}) η_i(t_{k-1}),   i = 1, ..., N    (2.14)

x^f(t_k) = (1/N) Σ_{i=1}^{N} ξ_i^f(t_k)    (2.15)

E^f(t_k) = (1/√(N−1)) [ξ_1^f(t_k) − x^f(t_k), ..., ξ_N^f(t_k) − x^f(t_k)]    (2.16)

At the analysis step, each ensemble member is updated using a perturbed observation and the analysis perturbation matrix is defined:

ξ_i^a(t_k) = ξ_i^f(t_k) + K(t_k) (y^o(t_k) − H(t_k) ξ_i^f(t_k) + ν_i(t_k))    (2.17)

E^a(t_k) = (1/√(N−1)) [ξ_1^a(t_k) − x^a(t_k), ..., ξ_N^a(t_k) − x^a(t_k)]    (2.18)

where the Kalman gain is computed by

K(t_k) = E^f(t_k) E^f(t_k)^⊤ H(t_k)^⊤ [H(t_k) E^f(t_k) E^f(t_k)^⊤ H(t_k)^⊤ + R(t_k)]^{-1}    (2.19)
As can be seen from the equations above, instead of working directly with error covariance matrices Pf and Pa , the EnKF works with the perturbation matrix Ef and Ea ,
which are of lower rank. Therefore, the computational expense is reduced. When needed,
the covariance matrices can be computed from Pf = Ef (Ef )⊤ and Pa = Ea (Ea )⊤ .
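A minimal sketch of the analysis step of Equations (2.16)-(2.19) for a small, hypothetical ensemble is given below; the forecast covariance is never formed explicitly.

```python
import numpy as np

def enkf_analysis(ens_f, y_obs, H, R, rng):
    """EnKF analysis with perturbed observations, following Eqs. (2.16)-(2.19).

    ens_f : (n, N) array whose columns are the forecast ensemble members.
    """
    n, N = ens_f.shape
    x_f = ens_f.mean(axis=1, keepdims=True)
    E_f = (ens_f - x_f) / np.sqrt(N - 1)             # forecast perturbations, Eq. (2.16)
    PHt = E_f @ (H @ E_f).T                          # P^f H^T without forming P^f
    S = H @ PHt + R
    K = PHt @ np.linalg.inv(S)                       # Eq. (2.19)
    # Perturb the observation for each member to keep the analysis spread correct, Eq. (2.17).
    ens_a = np.empty_like(ens_f)
    for i in range(N):
        nu = rng.multivariate_normal(np.zeros(R.shape[0]), R)
        ens_a[:, i] = ens_f[:, i] + K @ (y_obs + nu - H @ ens_f[:, i])
    return ens_a

# Hypothetical use: a 50-member ensemble of a 10-variable state, one observation of x[0].
rng = np.random.default_rng(1)
ens = rng.standard_normal((10, 50))
H = np.zeros((1, 10)); H[0, 0] = 1.0
ens_a = enkf_analysis(ens, y_obs=np.array([0.5]), H=H, R=np.array([[0.1]]), rng=rng)
```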
Note that at each analysis step, a random vector drawn from observation error distribution is used to perturb the observation, as in Equation (2.17). This is necessary in
order to prevent systematic underestimation of the analysis error covariance (Evensen,
2003). While it helps to produce correct analysis error covariance, the addition of perturbed observations adds an additional source of sampling error related to the estimation
of the observation error covariances. Some modifications of this approach have been proposed to avoid having to perturb the observations randomly (Whitaker and Hamill, 2002; Tippett et al., 2003; Evensen, 2003; Sakov and Oke, 2008). Reformulating the analysis
deterministically reduces the sampling error in the analysis step.
Because the error covariance matrix is represented by the multiplication of its square
root matrix, it will always be positive definite. The algorithm can also be implemented
for nonlinear models without any conceptual difficulty. Here, like for linear models, the
propagation of error covariance for nonlinear models is simply done by propagating each
ensemble member by the nonlinear model.
A limitation of the EnKF is that the standard deviation of the errors in the state estimate is of a statistical nature and converges slowly with the ensemble size. In fact, due to the
limitation of computational resources, application of EnKF for large dimensional systems
requires one to use a small ensemble. This in turn will cause a problem of spurious
correlation, due to sampling error. A popular ad hoc technique to solve this problem is the
covariance localization method (Houtekamer and Mitchell, 2001). This method involves
the Schur product of the empirical covariances with an analytical covariance function
(Gaspari and Cohn, 1999), which sets the spurious correlations at locations distant from a reference point to zero. Covariance localization, however, has certain disadvantages.
It induces, for example, the issue of imbalance (e.g. Mitchell et al., 2002), which may
be important for certain applications such as for large-scale atmospheric systems. It also
adds an additional parameter, which requires tuning.
Another problem arising from the small ensemble size is that the error covariance
tends to be underestimated. A popular method for solving this problem is to use the
covariance inflation algorithm (Anderson and Anderson, 1999). Here, the underestimated
prior ensemble state covariance is increased by linearly inflating each ensemble member
with respect to the ensemble mean.
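A minimal sketch of this inflation step, under the assumption that the ensemble is stored column-wise, is:

import numpy as np

def inflate(ensemble, factor=1.05):
    """Linearly inflate the ensemble members about their mean (covariance inflation)."""
    mean = ensemble.mean(axis=1, keepdims=True)
    return mean + factor * (ensemble - mean)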
2.4 Reduced Rank Square Root Kalman filter
Another variant of the Kalman filter algorithm, which solves the problem of computational
cost, is the Reduced Rank Square Root (RRSQRT) Kalman filter (Verlaan and Heemink,
1997). It has been applied for data assimilation research in estuarine systems (Bertino
et al., 2002), coastal modelling (Madsen and Canizares, 1999) and atmospheric chemistry
models (Segers et al., 2000).
The RRSQRT method can be considered as an ensemble-based filter as well. The
difference is that in the RRSQRT algorithm the ensemble members are not selected randomly, but in the direction of the most important eigenvectors. Here, a square root
decomposition of the error covariance is used to ensure that the error covariance remains
positive at all times. Instead of working with the full covariance matrix, the RRSQRT
propagates the square root of the covariance with lower rank. The forecast step of the
RRSQRT can be written as
\[ x^f(t_k) = M(t_{k-1})\,x^a(t_{k-1}) + B(t_{k-1})\,u(t_{k-1}) \qquad (2.20) \]

\[ L_c^f(t_k) = \left[\,M(t_{k-1})L^a(t_{k-1}),\; G(t_{k-1})Q(t_{k-1})^{1/2}\,\right] \qquad (2.21) \]

\[ L^f(t_k) = \Pi^f(t_k)\,L_c^f(t_k) \qquad (2.22) \]

while the analysis step reads

\[ K(t_k) = L^f(t_k)L^f(t_k)^\top H(t_k)^\top \left[H(t_k)L^f(t_k)L^f(t_k)^\top H(t_k)^\top + R(t_k)\right]^{-1} \qquad (2.23) \]

\[ x^a(t_k) = x^f(t_k) + K(t_k)\left(y^o(t_k) - H(t_k)x^f(t_k)\right) \qquad (2.24) \]

\[ L_c^a(t_k) = \left[\,(I - K(t_k)H(t_k))L^f(t_k),\; K(t_k)R(t_k)^{1/2}\,\right] \qquad (2.25) \]

\[ L^a(t_k) = \Pi^a(t_k)\,L_c^a(t_k) \qquad (2.26) \]
where L^f(t_k) and L^a(t_k) are the n × q square root estimates of the error covariances P^f(t_k) and P^a(t_k), respectively. The notation [ , ] denotes that the matrix is built from the two block matrices on the left- and right-hand side of the comma. Note that the propagation M(t_{k-1})L^a(t_{k-1}) is much faster than that of the original Kalman filter, because in the RRSQRT the matrix L^f(t_k) contains only q columns, with q ≪ n. The addition of the square root of the model error covariance in Equation (2.21), however, results in a matrix with a larger dimension. Hence, a reduction step must follow to reduce the number of columns back to q. The symbol Π denotes the reduction operator. The basic idea of this approximation is to use only the first q leading eigenvalues and eigenvectors of the approximate error covariance matrix P_c^f(t_k) = L_c^f(t_k) L_c^f(t_k)^⊤. The covariance matrix P_c^f(t_k) itself is actually never computed. The computations are based on the eigendecomposition of the smaller matrix L_c^f(t_k)^⊤ L_c^f(t_k), hence it is computationally cheaper than the full Kalman filter. A similar procedure is also performed for the analysis error covariance.
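The reduction operator can be sketched as follows; this is an illustrative NumPy version based on the eigendecomposition of the small matrix L_c^⊤ L_c described above, not the WAQUA implementation.

import numpy as np

def reduce_modes(L_c, q):
    """Reduction operator: keep the q leading modes of P = L_c L_c^T,
    using the eigendecomposition of the small matrix L_c^T L_c."""
    small = L_c.T @ L_c                    # p x p, with p = number of columns of L_c
    vals, vecs = np.linalg.eigh(small)     # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:q]       # indices of the q leading eigenvalues
    return L_c @ vecs[:, idx]              # reduced n x q square root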
Besides being computationally cheaper than the original full Kalman filter, the RRSQRT
can be extended to work with nonlinear models. The changes needed are conceptually not
very difficult. For nonlinear models, the time propagation of the estimate is performed using the non-linear model, M, while the time propagation of the error covariance estimate
is done by the tangent linear model, as follows
\[ x^f(t_k) = M[x^a(t_{k-1}), u(t_{k-1})] \qquad (2.27) \]

\[ l_i^f = \frac{1}{\epsilon}\left\{ M[x^a(t_{k-1}) + \epsilon\, l_i^a(t_{k-1}), u(t_{k-1})] - M[x^a(t_{k-1}), u(t_{k-1})] \right\} \qquad (2.28) \]

where l_i^f is the ith column (mode) of the matrix L^f.
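A sketch of this finite-difference propagation of the modes, with the nonlinear model represented by an arbitrary callable, is:

import numpy as np

def propagate_modes(model, x_a, L_a, u, eps=1e-4):
    """Propagate each covariance mode with the finite-difference approximation
    of the tangent linear model, as in Equation (2.28).

    model : callable model(x, u) -> new state (the nonlinear operator M)
    x_a   : analysis state estimate
    L_a   : n x q matrix whose columns are the analysis modes
    """
    base = model(x_a, u)
    cols = [(model(x_a + eps * L_a[:, i], u) - base) / eps
            for i in range(L_a.shape[1])]
    return np.column_stack(cols)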
The weakest point of the RRSQRT Kalman filter is the reduction step [Equations (2.22) and (2.26)]. This step can be computationally expensive and it is also not invariant under a scaling of the state variables. Some modifications have been proposed to cope with these
problems, for example, by Treebushny and Madsen (2005) and Segers et al. (2000).
2.5 Error statistics
In the previous sections, we have described three approximate algorithms, which solve
the computational and nonlinearity issues of Kalman filter implementation. Another important issue related to the implementation of the Kalman filter is the specification of the error
statistics. Although a Kalman filter computes a forecast error covariance internally, this is
a valid depiction of the true errors committed by the filter only to the extent that the filter’s
own system model adequately portrays true system behavior (Maybeck, 1979). In other
words, the Kalman filter will give an optimal estimate only if the specified error statistics
exactly represent the true measurement and model error statistics. In practice, however, it
is impossible to obtain the true model error statistics, since we do not know the true state.
Note that the issue of specifying the error covariance also applies to most other data assimilation methods. To solve this problem one usually resorts to explicitly modelling the error covariance and representing it as a simple function of a few calibrating parameters. As mentioned earlier, another way to approach this issue is to find a surrogate quantity whose
statistics are expected to be similar to the unknown true statistics.
In this research project, we propose a method based on such a surrogate quantity to represent the model error statistics. This method is based on computing error statistics using two system realizations. The two system realizations are generated, for example, by using two different but similarly skillful models. Here, the difference between the two models is assumed to have statistics similar to those of the unknown true
error. The algorithm is developed in the framework of steady-state Kalman filtering. We
first develop an algorithm to accommodate this study and describe it in Chapter 5. In
Chapter 6 we implement this algorithm for the operational storm surge forecast system in
the Netherlands. To gain more improvement, we have extended the algorithm to work with
more realizations, resulting in an ensemble-based Kalman filter algorithm. This extended
algorithm is presented in Chapter 7.
3
The Dutch Continental Shelf Model
To provide the readers with information about the numerical model used in this study, this chapter presents a description of the Dutch Continental Shelf Model (DCSM). This model has been used operationally in the Netherlands since the mid-1980s for water level and storm surge forecasting. Developments of the model have been taking place since then and new developments are still ongoing (Verlaan et al., 2005). The DCSM is based on the WAQUA simulation software package developed by the Department of Public Works (Rijkswaterstaat). The exposition below is mainly derived from Gerritsen et al. (1995), Verlaan et al. (2005) and the technical documentation of WAQUA.
3.1 Model area
The decision about the area to be covered by the DCSM was taken at the beginning of the 1980s. The area was chosen to model explicitly the nonlinearities of the surge-tide interaction and to give a better response to approaching oceanic depression systems by including their effect in the boundary condition. These requirements imply a model extent beyond the 200 m depth contour and therefore the model area includes the British Isles.
This results in one open-sea model boundary along deep water (Figure 3.1).
3.2 Model equations
The basic equations of the DCSM are the 2D shallow water equations, which describe
the large-scale motion of water level and depth-integrated horizontal flow. They are well
suited for modelling tidal and meteorologically induced water levels and flows.

Figure 3.1: DCSM area with the 50 m and 200 m depth contour lines.

The equations are derived from the principles of mass and momentum conservation, which in a Cartesian coordinate system read:
\[ \frac{\partial \xi}{\partial t} + \frac{\partial (Hu)}{\partial x} + \frac{\partial (Hv)}{\partial y} = 0 \qquad (3.1) \]

\[ \frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} + v\frac{\partial u}{\partial y} + g\frac{\partial \xi}{\partial x} - fv + \frac{g u \sqrt{u^2 + v^2}}{C^2 H} - \nu\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right) = \frac{1}{\rho_w H}\tau_x - \frac{1}{\rho_w}\frac{\partial p_a}{\partial x} \qquad (3.2) \]

\[ \frac{\partial v}{\partial t} + u\frac{\partial v}{\partial x} + v\frac{\partial v}{\partial y} + g\frac{\partial \xi}{\partial y} + fu + \frac{g v \sqrt{u^2 + v^2}}{C^2 H} - \nu\left(\frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2}\right) = \frac{1}{\rho_w H}\tau_y - \frac{1}{\rho_w}\frac{\partial p_a}{\partial y} \qquad (3.3) \]
where:

x, y = Cartesian coordinates in the horizontal plane
u, v = depth-averaged current in x and y direction respectively
ξ = water level above a reference plane
D = water depth below a reference plane
H = total water depth (H = D + ξ)
g = gravity acceleration
f = coefficient for the Coriolis force
C = Chezy coefficient
τx, τy = wind stress components in x and y direction respectively
ρw = density of sea water
pa = air pressure at the surface
ν = viscosity coefficient
Further, it is noted that the North West European continental shelf is too small and too
shallow to generate tides internally. It features a tide that is driven by and co-oscillates with that in the Atlantic Ocean. Therefore, the 2D shallow water equations do not include a tide generating volume force. The effect of tidal forcing is included by specifying the tidal signal along the open boundaries.
The Chezy coefficient for the bottom friction is a largely empirical parameter and is assumed to be a function of depth:

\[ C = \begin{cases} 65, & D \le 40\,\text{m} \\ 65 + (D - 40), & 40\,\text{m} < D \le 65\,\text{m} \\ 90, & D > 65\,\text{m} \end{cases} \qquad (3.4) \]
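For illustration, the depth dependence of Equation (3.4) can be coded directly; the function below is only a restatement of the formula, with the depth D in metres.

def chezy_coefficient(D):
    """Depth-dependent Chezy coefficient of Equation (3.4); D is the depth in metres."""
    if D <= 40.0:
        return 65.0
    elif D <= 65.0:
        return 65.0 + (D - 40.0)
    else:
        return 90.0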
On the other hand, the shear stress at the surface is dependent on the wind velocity. Without wind forcing, the shear stress at the free surface should be zero. With wind, the wind
stress is computed using the wind velocity at 10 m above the sea surface by
\[ \tau_x = C_d\, \rho_a\, u_{10} \sqrt{u_{10}^2 + v_{10}^2} \qquad (3.5) \]

\[ \tau_y = C_d\, \rho_a\, v_{10} \sqrt{u_{10}^2 + v_{10}^2} \qquad (3.6) \]
where ρa is the air density, Cd the air-sea drag coefficient, while u10 and v10 are the 10 m
wind velocity components in x and y directions, respectively.
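As a small illustration of Equations (3.5)-(3.6), the wind stress can be computed from the 10 m wind components as below; the values of the air density and the drag coefficient are placeholders, since the text does not fix them here.

import math

def wind_stress(u10, v10, rho_a=1.25, Cd=2.5e-3):
    """Wind stress components of Equations (3.5)-(3.6) from the 10 m wind.

    rho_a and Cd are assumed placeholder values for illustration only.
    """
    speed = math.sqrt(u10**2 + v10**2)
    tau_x = Cd * rho_a * u10 * speed
    tau_y = Cd * rho_a * v10 * speed
    return tau_x, tau_y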
To make the mathematical problem well-posed, the equations are supplemented by
boundary conditions. At closed boundaries, such as coastlines and dams, the boundary
condition is specified as
\[ v_{\perp} = 0 \qquad (3.7) \]

which means that no inflow or outflow can occur through these boundaries. At open sea boundaries, two boundary conditions are specified. The first is by specifying the inflow as

\[ v_{\parallel} = 0 \qquad (3.8) \]

while the second is the tidal water level, which in its simplest form is given by

\[ \xi_b(t) = \sum_{j=1}^{J} A_j \cos(\omega_j t + G_j) \qquad (3.9) \]

where:
ξb(t) = water level elevation at time t
Aj = amplitude of tidal constituent j
ωj = angular velocity of harmonic constituent j
Gj = phase of harmonic constituent j
Ten tidal constituents are used; those are M2, S2, N2, K2, O1, K1, Q1, P1, ν2, and L2. The tidal constants (Hj, Gj) were estimated by using results from encompassing models, matched with nearby coastal and pelagic tidal data. For simulations that include meteorological effects, the effects of atmospheric pressure are parameterized by a correction term at the boundaries. This correction is a function of the deviation from the average pressure Pavg:

\[ \Delta\xi = -\frac{1}{\rho g}\,(P - P_{avg}) \qquad (3.10) \]

with Pavg = 1012 hPa. This term is responsible for initializing the model with approaching oceanic depression systems.
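Combining Equations (3.9) and (3.10), the water level prescribed at an open boundary point can be sketched as follows; the sea-water density and gravity values are assumptions for illustration only.

import numpy as np

def boundary_water_level(t, A, omega, G, p, p_avg=1012.0e2, rho=1025.0, g=9.81):
    """Open-boundary water level: tidal sum of Eq. (3.9) plus the pressure correction of Eq. (3.10).

    t           : time in seconds
    A, omega, G : arrays of amplitudes [m], angular velocities [rad/s] and phases [rad]
    p, p_avg    : local and average air pressure [Pa]
    rho, g      : assumed sea-water density and gravitational acceleration (placeholders)
    """
    xi_tide = np.sum(A * np.cos(omega * t + G))     # Eq. (3.9)
    d_xi = -(p - p_avg) / (rho * g)                 # Eq. (3.10)
    return xi_tide + d_xi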
3.3 Numerical approximations
The above system of equations has to be solved numerically. The numerical methods were chosen to satisfy the following demands: unconditional stability, at least second-order consistency, suitability for both time-dependent and steady-state problems, and computational efficiency (Stelling, 1984).
The grid sizes were selected to ensure the maximum resolution allowed by the available computing power. This resulted in a spherical grid size of 1/8° × 1/12°. Covering the area of the North-East European continental shelf from 12°W to 13°E and 48°N to 62.33°N, the discretization results in 173 × 201 computational grid cells. The standard staggered C-grid is used for spatial discretization.
The time step used for the integration is 10 minutes. To ensure numerical stability, an
implicit method is needed for temporal integration. However, an implicit method involves a large computational cost. Here, the alternating direction implicit (ADI) method, introduced by Leendertse (1967) and later extended by Stelling (1984), is used. The advantage
of the ADI-method is that it is computationally cheap, which was required at the time this
simulation package was developed. The ADI-method splits one time step into two stages.
Each stage consists of half a time step. In both stages, the terms of the model equations
are solved in a consistent way. In the first half time step, the x-derivatives are solved
implicitly and in the second half time step the y-derivatives are solved implicitly. The
v-momentum equation is solved first explicitly in the first half time step, so that the new v
velocity components are available for the cross terms in the u-momentum equation. The
u-momentum equation is then coupled with the mass conservation equation and solved
implicitly. A similar procedure is performed in the second half time step, but now first for the u-momentum explicitly, followed by the v-momentum and conservation of mass implicitly. For
a complete time step, each separate term of the equations is still a second-order consistent
approximation to the differential equations.
Figure 3.2: The computational grid.
As an illustration, we give below the finite difference equations used to approximate the homogeneous part of the system of equations described in the previous section, excluding the horizontal viscous terms. Here, we denote the index of a grid cell in the (x, y) coordinates by (m, n) and the time index by k, as illustrated in Figure 3.2.
step 1: from time k to k + 1/2

solve for v^{k+1/2} explicitly:

\[ \frac{v_{m,n}^{k+1/2} - v_{m,n}^{k}}{\Delta t/2} + u_{m,n}^{k}\, S_{+x}[v_{m,n}^{k+1/2}] + v_{m,n}^{k}\, \frac{v_{m,n+1}^{k} - v_{m,n-1}^{k}}{2\Delta y} + g\, \frac{\xi_{m,n+1}^{k} - \xi_{m,n}^{k}}{\Delta y} + f u_{m,n}^{k} + \frac{g\, v_{m,n}^{k+1/2} \sqrt{(u_{m,n}^{k})^2 + (v_{m,n}^{k})^2}}{(H^v)_{m,n}^{k}\, C^2} = 0 \qquad (3.11) \]

solve for u^{k+1/2} and ξ^{k+1/2} implicitly:

\[ \frac{u_{m,n}^{k+1/2} - u_{m,n}^{k}}{\Delta t/2} + u_{m,n}^{k}\, \frac{u_{m+1,n}^{k} - u_{m-1,n}^{k}}{2\Delta x} + v_{m,n}^{k+1/2}\, S_{oy}[u_{m,n}^{k}] + g\, \frac{\xi_{m+1,n}^{k+1/2} - \xi_{m,n}^{k+1/2}}{\Delta x} - f v_{m,n}^{k+1/2} + \frac{g\, u_{m,n}^{k+1/2} \sqrt{(u_{m,n}^{k})^2 + (v_{m,n}^{k+1/2})^2}}{(H^u)_{m,n}^{k}\, C^2} = 0 \qquad (3.12) \]

\[ \frac{\xi_{m,n}^{k+1/2} - \xi_{m,n}^{k}}{\Delta t/2} + \frac{(H^u u)_{m,n}^{k+1/2} - (H^u u)_{m-1,n}^{k+1/2}}{\Delta x} + \frac{(H^v v)_{m,n}^{k} - (H^v v)_{m,n-1}^{k}}{\Delta y} = 0 \qquad (3.13) \]

step 2: from time k + 1/2 to k + 1

solve for u^{k+1} explicitly:

\[ \frac{u_{m,n}^{k+1} - u_{m,n}^{k+1/2}}{\Delta t/2} + u_{m,n}^{k+1/2}\, \frac{u_{m+1,n}^{k+1/2} - u_{m-1,n}^{k+1/2}}{2\Delta x} + v_{m,n}^{k+1/2}\, S_{+y}[u_{m,n}^{k+1}] + g\, \frac{\xi_{m+1,n}^{k+1/2} - \xi_{m,n}^{k+1/2}}{\Delta x} - f v_{m,n}^{k+1/2} + \frac{g\, u_{m,n}^{k+1} \sqrt{(u_{m,n}^{k+1/2})^2 + (v_{m,n}^{k+1/2})^2}}{(H^u)_{m,n}^{k+1/2}\, C^2} = 0 \qquad (3.14) \]

solve for v^{k+1} and ξ^{k+1} implicitly:

\[ \frac{v_{m,n}^{k+1} - v_{m,n}^{k+1/2}}{\Delta t/2} + u_{m,n}^{k+1}\, S_{ox}[v_{m,n}^{k+1}] + v_{m,n}^{k+1}\, \frac{v_{m,n+1}^{k+1/2} - v_{m,n-1}^{k+1/2}}{2\Delta y} + g\, \frac{\xi_{m,n+1}^{k+1} - \xi_{m,n}^{k+1}}{\Delta y} + f u_{m,n}^{k+1} + \frac{g\, v_{m,n}^{k+1} \sqrt{(u_{m,n}^{k+1})^2 + (v_{m,n}^{k+1/2})^2}}{(H^v)_{m,n}^{k+1/2}\, C^2} = 0 \qquad (3.15) \]

\[ \frac{\xi_{m,n}^{k+1} - \xi_{m,n}^{k+1/2}}{\Delta t/2} + \frac{(H^u u)_{m,n}^{k+1/2} - (H^u u)_{m-1,n}^{k+1/2}}{\Delta x} + \frac{(H^v v)_{m,n}^{k+1} - (H^v v)_{m,n-1}^{k+1}}{\Delta y} = 0 \qquad (3.16) \]

where S_{+x}[ ] and S_{ox}[ ] are numerical operators for the cross advection terms, defined by

\[ S_{+x}[u_{m,n}] = \begin{cases} \dfrac{3u_{m,n} - 4u_{m,n-1} + u_{m,n-2}}{2\Delta x}, & v_{m,n} \ge 0 \\[6pt] \dfrac{-3u_{m,n} + 4u_{m,n+1} - u_{m,n+2}}{2\Delta x}, & v_{m,n} < 0 \end{cases} \qquad (3.17) \]

\[ S_{ox}[u_{m,n}] = \frac{u_{m,n+2} + 4u_{m,n+1} - 4u_{m,n-1} - u_{m,n-2}}{12\Delta x} \qquad (3.18) \]
and S+y [ ] and Soy [ ] are defined similarly.
For the DCSM, the system of equations above is solved in a curvilinear coordinate system. The conversion of the coordinate system introduces additional terms in the equations.
For the details of the numerical treatment, the readers are referred to Stelling (1984).
A state-space representation of the discretized system of equations above is obtained
by defining an n-vector x, which contains the water level and velocities at all grid points
in the DCSM area. Since the model is nonlinear, we can write the dynamics of the DCSM
as
x(tk ) = M[x(tk−1 )]
(3.19)
where M consists of the coefficients representing the ADI scheme and also the input
forcings.
3.4 Operational setting
The DCSM has been implemented, for example, for performing sea level rise scenario
studies, forecasting storm surges and generating boundary conditions for local coastal models (Gerritsen et al., 1995 and the literature therein).
The DCSM has been implemented at the Royal Netherlands Meteorological Institute
(KNMI) for daily operational forecasting, where it is part of the automatic production line
(APL). KNMI’s APL has been developed to produce regular numerical forecasts with a
minimum of human intervention. The heart of the APL is KNMI's limited area atmospheric model, HIRLAM, which currently provides an analysis of the state of the atmosphere and a forecast for up to 48 h ahead, four times per day. The time interval is 3 h, with a resolution of approximately 22 km. Therefore, the wind products of HIRLAM have to be interpolated before they can be used as input forcing for the DCSM. In time the data is
interpolated linearly. Conversion from the grid of the atmospheric model to the DCSM
grid proceeds by a bilinear interpolation in space. However, since the wind over land is
usually considerably lower than over the sea, near the coasts the interpolation has been
modified to use only sea points of the atmospheric model for sea points in the DCSM.
In the operational system, the DCSM is coupled with a data assimilation based on a
steady-state Kalman filter to improve the forecast accuracy (Heemink and Kloosterhuis,
1990). This filter assimilates the selected water level observations from the tide gauges
along the British and Dutch coasts. This Kalman filter can correct the forecasts up to 12
h ahead if the forecasts deviate from the available water level observations.
The Kalman filter is intended to reduce the forecast uncertainty due to errors in the
wind input forcing. To do this, information about the error covariance of the wind input
forcing is needed as input to the Kalman filter. Here, the wind error variance is explicitly
modelled by assuming spatial homogeneity and isotropy. In reality, these assumptions are likely to be violated. Spatial correlation, for example, is likely to be different in different directions, hence anisotropic. Moreover, the wind error variance is likely to vary
in space. It is known that a Kalman filter will give an optimal state estimate to the extent of
the accuracy of the specified statistical description of the model error. A better description
of the model error statistics will lead to a better filter performance.
In this study, we propose and investigate a method, which avoids explicit modelling
of the error statistics, in the framework of steady-state Kalman filtering. The method is
an iterative algorithm for computing the steady-state Kalman gain of a stochastic system,
based on two samples of the system. Assuming the error process is stationary, the error
covariance is computed by averaging over time.
It is also known that model error actually evolves in time together with the actual
system state (Dee, 1995). Therefore, to improve the filter performance even further, we
have also extended this algorithm to work with more samples. By working with more
samples, it is possible to compute the error covariance by averaging over a shorter time.
This will relax the assumption that model error is stationary and it will describe better the
evolution of error statistics in time. In turn, this should give more accurate analyses.
As described earlier, in the operational system, the DCSM is used to forecast the
meteorological component of the total sea level. On the other hand, the astronomical
component is analyzed and predicted using harmonic analysis, because harmonic analysis
is more accurate than the DCSM in predicting the astronomical tide. However, it is found
that the residual of the harmonic analysis always contains some harmonic components. This fact offers a possibility for improving the astronomical tide prediction. The first direction taken in this research for improving the sea level forecast is to apply a method to analyze and predict the remaining harmonic components in the harmonic analysis residual. These predicted harmonic components are used as a correction to the harmonic analysis forecasts.
This investigation is described in the next chapter.
4
Analysis and prediction of tidal
components within observed surge
using Kalman filter∗
4.1 Introduction
The operational water level prediction in the Netherlands is based on water level decomposition into astronomical and meteorological components. The astronomical components
are analyzed and predicted using harmonic analysis. However, the harmonic analysis
cannot completely predict the astronomical components from the total water level. This is indicated by the fact that the frequency spectrum of the observed surge, defined as the difference between the total water level and the astronomical components predicted by using harmonic analysis, contains a number of spikes centered at astronomical constituent frequencies [see, for example, Figure 4.1]. This serves as a motivation to seek a way of
improving tidal prediction by exploiting the information carried by the spikes.
The objective of the study presented in this chapter is to improve astronomical tides
prediction by extracting the remaining astronomical tidal components within observed
surge. In doing so, a Kalman filter is used to estimate the time-varying tidal components.
The approach taken here is based on the work of Heemink et al. (1991).
Kalman filters have been successfully used in numerous applications. Their application to tidal analysis can be found, for example, in the work of de Meyer (1984), where a Kalman filter is used to investigate possible non-stationarity of a multi input-single output model to study tidal characteristics. Yen et al. (1996) used the Kalman filter to make short-term tidal predictions in the case where only a short period of observation is available. In our study, instead of working with the full tidal signal, we analyze and forecast the tidal
components within observed surge.
The chapter is organized as follows. Section 4.2 describes the basic idea used in our study to predict the periodic components of observed surge. It shows how each spike or narrow-band process can be considered as a periodic function with slowly varying harmonic parameters. It also describes how we obtain a state space representation of observed surge and the associated set of Kalman filter equations. We proceed with a discussion of the characteristics of the overall estimation system in Section 4.3. In Section 4.4 the results using observed surge are presented and discussed, while in Section 4.5 some conclusions of the study are drawn.

∗ A revised version of this paper appears in the proceedings of HydroPredict 2008.

Figure 4.1: Spectral density of observed surge measured at Vlissingen, 00:00 1 January 1999 - 23:00 31 December 1999.
4.2 Approach
In this section, we describe the approach used in this study. We first describe how a
narrow-band process is modelled. This is followed by a description of the state space representation of observed surge. Finally, a Kalman filter formulation for estimating
the time-varying harmonic parameters is given.
4.2.1 Narrow-band Processes
To provide the readers with more complete information, in this section we present the
derivation of how a narrow-band process can be considered as a harmonic oscillation with
slowly varying amplitude, as can be found in Heemink et al. (1991).
Consider a weakly stationary stochastic process γ(t) described by Wong (1971):
\[ \gamma(t) = \int_{-\infty}^{\infty} e^{i\omega t}\, dZ(\omega) \qquad (4.1) \]

where Z(ω) is a complex valued stochastic process with zero mean and with independent increments such that:

\[ E\{dZ(\omega)\,\overline{dZ(\omega)}\} = S(\omega)\,d\omega \qquad (4.2) \]
with S(ω) as the two-sided energy spectral density function of the process γ(t).
Since γ(t) is a real process, its Fourier transform has the following properties (Maybeck 1979):

1. dZ(ω) is complex in general, with dZ(−ω) equal to the complex conjugate of dZ(ω).

2. The real part of dZ(·) is an even function of ω and the imaginary part is odd in ω.
Using these properties we can rewrite Equation (4.1) as:
\[ \gamma(t) = \int_{-\infty}^{\infty} \cos(\omega t)\, dV(\omega) + \int_{-\infty}^{\infty} \sin(\omega t)\, dW(\omega) \qquad (4.3) \]

where dV(ω) = [dZ(ω) + dZ(−ω)]/2 and dW(ω) = i[dZ(ω) − dZ(−ω)]/2 are mutually independent processes, both real and with independent increments such that:

\[ E[dV(\omega)dV(\omega)] = E[dW(\omega)dW(\omega)] = S(\omega)\,d\omega. \qquad (4.4) \]
If γ(t) is a narrow-band process, the spectral density function has a form as shown in
Figure 4.2. Here ∆ω ≪ ωm . In this case Equation (4.3) can be rewritten as follows:
Figure 4.2: Spectral density function of a narrow band process
\[ \gamma(t) = \int_{-\infty}^{\infty} \cos([\omega_m + \mu]t)\, dV(\mu) + \int_{-\infty}^{\infty} \sin([\omega_m + \mu]t)\, dW(\mu), \qquad (4.5) \]

\[ \gamma(t) = \cos(\omega_m t)\left[\int_{-\infty}^{\infty} \cos(\mu t)\, dV(\mu) + \int_{-\infty}^{\infty} \sin(\mu t)\, dW(\mu)\right] + \sin(\omega_m t)\left[-\int_{-\infty}^{\infty} \sin(\mu t)\, dV(\mu) + \int_{-\infty}^{\infty} \cos(\mu t)\, dW(\mu)\right], \qquad (4.6) \]
which eventually can be written as

\[ \gamma(t) = A_m(t)\cos(\omega_m t) + B_m(t)\sin(\omega_m t), \qquad (4.7) \]

where the mutually independent stochastic processes

\[ A_m(t) = \int_{-\infty}^{\infty} \cos(\mu t)\, dV(\mu) + \int_{-\infty}^{\infty} \sin(\mu t)\, dW(\mu), \qquad B_m(t) = -\int_{-\infty}^{\infty} \sin(\mu t)\, dV(\mu) + \int_{-\infty}^{\infty} \cos(\mu t)\, dW(\mu) \qquad (4.8) \]

vary slowly in time since their spectral density

\[ S_m(\mu) = S(\omega - \omega_m) \qquad (4.9) \]

is concentrated near µ = 0.

From this, we conclude that a narrow band process can be considered as a harmonic oscillation with slowly varying amplitude:

\[ A(t) = \sqrt{A_m(t)^2 + B_m(t)^2} \qquad (4.10) \]

and a slowly varying phase:

\[ \phi(t) = \arctan[B_m(t)/A_m(t)] \qquad (4.11) \]
4.2.2 Surge model
The spectral density of a tidal signal consists of a number of narrow band peaks as for
example shown in Figure 4.1. This figure shows the spectral density of the water elevation
in Vlissingen, which was taken from 00:00 January 1, 1999 until 23:00 December 31,
1999 with a time interval of 10 minutes. A similar pattern is observed at other locations as well.
This fact suggests that the water level ξ(t) can be approximated by the sum of a number
of narrow band processes:
\[ \xi(t) \cong \sum_i \gamma_i(t) = \sum_i \left[ A_i(t)\cos(\omega_i t) + B_i(t)\sin(\omega_i t) \right] \qquad (4.12) \]
In order to derive a state space model for ξ(t), the power spectral density function
Si (µ) of Ai (t) and Bi (t) is parameterized according to:
\[ S_i(\mu) = \frac{\sigma_i^2}{a_i^2 + \mu^2} \qquad (4.13) \]
For small values of ai , Si (µ) is a narrow-band spectrum. The autocovariance ri (t) of Ai (t)
and Bi (t) is given by
\[ r_i(t) = \int_{-\infty}^{\infty} e^{i\mu t} S_i(\mu)\, d\mu = \frac{\sigma_i^2}{2a_i}\, e^{-a_i |t|} \qquad (4.14) \]
As shown by Jazwinski, processes Ai (t) and Bi (t) with autocovariance (4.14) can be
modelled by means of stochastic differential equations, which read
\[ dA_i(t) = -a_i A_i(t)\,dt + \sigma_i\, d\alpha(t), \qquad dB_i(t) = -a_i B_i(t)\,dt + \sigma_i\, d\beta(t) \qquad (4.15) \]

where α(t) and β(t) are mutually independent stochastic processes with zero mean and unit variance and with independent increments such that:

\[ E[d\alpha(t)\,d\alpha(t)] = E[d\beta(t)\,d\beta(t)] = dt. \qquad (4.16) \]
Since σi is constant, the stochastic differential equations above are the same in both the
Itô and Stratonovich sense. Using the Euler scheme, these equations can be approximated
by:
\[ \bar{A}_i(t_{k+1}) = (1 - \Delta t\, a_i)\bar{A}_i(t_k) + \sigma_i\, \Delta\alpha_i(t_k), \qquad (4.17) \]

\[ \bar{B}_i(t_{k+1}) = (1 - \Delta t\, a_i)\bar{B}_i(t_k) + \sigma_i\, \Delta\beta_i(t_k), \qquad (4.18) \]

where Āi(tk) and B̄i(tk) represent the numerical approximations of Ai(tk) and Bi(tk) respectively. Furthermore, ∆t is the time step and ∆αi(tk) and ∆βi(tk) are assumed to be mutually independent random variables with zero mean and variance ∆t.
The state space representation of these equations can be written as
X(tk ) = MX(tk−1 ) + Gη(tk−1)
(4.19)
where the state vector X(tk ) is defined as:
\[ X(t_k) = [\bar{A}_1(t_k)\;\; \bar{B}_1(t_k)\; \dots\; \bar{A}_i(t_k)\;\; \bar{B}_i(t_k)\; \dots]^\top, \qquad (4.20) \]
and η(tk) consists of the random variables ∆αi(tk) and ∆βi(tk). Because ∆αi(tk) and ∆βi(tk) are assumed to be mutually independent random variables with zero mean and variance ∆t, the covariance matrix Q of η(tk) is the diagonal matrix ∆t I. M and G are coefficient matrices that consist of the terms (1 − ∆t ai) and σi respectively.
The measurement Y(tk ) of water level ξ(t) is represented by the relation:
Y(tk ) = H(tk )X(tk ) + ν(tk )
(4.21)
where ν(tk ) is the measurement noise, assumed to be Gaussian white noise with variance
R, and H(tk ) is the measurement vector given as
\[ H(t_k) = [\cos(\omega_1 t_k)\;\; \sin(\omega_1 t_k)\; \dots\; \cos(\omega_i t_k)\;\; \sin(\omega_i t_k)\; \dots]. \qquad (4.22) \]
Now that we have both the system and measurement models, we combine them using a Kalman filter to analyze and predict the astronomical components within observed surge.
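As an illustrative sketch (not the code used in this study), the matrices M, G and H(tk) of Equations (4.19)-(4.22) can be assembled for a given set of constituent frequencies as follows; the argument names are assumptions made only for this example.

import numpy as np

def build_tidal_model(omegas, a, sigma, dt, t_k):
    """Assemble M, G and H(t_k) of Equations (4.19)-(4.22).

    omegas : constituent angular frequencies [rad per time unit]
    a, sigma : parameters of Equations (4.17)-(4.18), scalar or per constituent
    dt, t_k : time step and current time
    """
    omegas = np.asarray(omegas, dtype=float)
    a = np.broadcast_to(np.asarray(a, dtype=float), omegas.shape)
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), omegas.shape)
    M = np.diag(np.repeat(1.0 - dt * a, 2))   # each A_i, B_i decays with (1 - dt*a_i)
    G = np.diag(np.repeat(sigma, 2))          # noise coefficient sigma_i per component
    H = np.empty(2 * omegas.size)
    H[0::2] = np.cos(omegas * t_k)            # coefficients of A_i
    H[1::2] = np.sin(omegas * t_k)            # coefficients of B_i
    return M, G, H.reshape(1, -1)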
4.2.3 Kalman filter
The objective of the approach here is to estimate the harmonic parameters X(tk) from all available realizations ξ^o(tk) of the observed surge Y(tk) and to use them to forecast the periodic components of the harmonic analysis residual in the future. The estimation of the time-varying tidal components within observed surge is performed by using a Kalman filter.
Following the description in Chapter 2, the set of Kalman filter equations for the system (4.19)-(4.22) reads:

\[ x^f(t_k) = M x^a(t_{k-1}) \qquad (4.23) \]

\[ P^f(t_k) = M P^a(t_{k-1}) M^\top + G Q G^\top \qquad (4.24) \]

\[ x^a(t_k) = x^f(t_k) + K(t_k)\left(\xi^o(t_k) - H(t_k) x^f(t_k)\right) \qquad (4.25) \]

\[ P^a(t_k) = (I - K(t_k)H(t_k))\, P^f(t_k) \qquad (4.26) \]

\[ K(t_k) = P^f(t_k) H(t_k)^\top \left[H(t_k) P^f(t_k) H(t_k)^\top + R\right]^{-1} \qquad (4.27) \]
with initial conditions
\[ x^a(t_0) = x_0^a \qquad (4.28) \]

\[ P^a(t_0) = P_0^a \qquad (4.29) \]
Here, since we usually do not know the initial condition exactly, we set P^a(t_0) to be very large so that the effect of the inexact initial condition vanishes quickly.
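For illustration, one forecast-analysis cycle of Equations (4.23)-(4.27) can be written compactly as below; this is a generic sketch, with the model matrices taken as given and all names chosen for this example only.

import numpy as np

def kalman_step(x_a, P_a, M, G, Q, H, R, y_obs):
    """One cycle of Equations (4.23)-(4.27) for the tidal state-space model."""
    # forecast step
    x_f = M @ x_a                                    # Eq. (4.23)
    P_f = M @ P_a @ M.T + G @ Q @ G.T                # Eq. (4.24)
    # analysis step
    S = H @ P_f @ H.T + R                            # innovation covariance
    K = P_f @ H.T @ np.linalg.inv(S)                 # Eq. (4.27)
    x_a_new = x_f + K @ (y_obs - H @ x_f)            # Eq. (4.25)
    P_a_new = (np.eye(P_f.shape[0]) - K @ H) @ P_f   # Eq. (4.26)
    return x_a_new, P_a_new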
4.3 System characteristics
We have described the system model as well as the Kalman filter used for estimating
the time-varying harmonic parameters. In this section, we are going to discuss some
characteristics of the estimator scheme. Here, we analyze its frequency response and
examine the possible cross-over between outputs at different tidal frequencies. In doing so, the following set of tidal constituents is used: A0, O1, K1, MU2, M2, S2, 2SM2, 3MS4, M4, MS4, M6, 2MS6, M8, 3MS8, 4MS10. For each tidal constituent, we set the colored noise standard deviation σi = 0.02 m and ai = 2.3 × 10^{-5} min^{-1}, which is equivalent to a correlation time of 1 month. In addition, we also used an AR(1) process as in Equation (4.17) to model the meteorological component, with a = 7 × 10^{-4} min^{-1}, which is equivalent to a correlation time of 1 day, and σ = 0.1 m. The sampling time parameter ∆t is
10 minutes, which is equal to the sampling time of the operational water-level observing
system in the Netherlands.
4.3.1 Bode diagram
To have an idea about how the Kalman filter works for the system described previously,
it is useful to present how it will respond to different frequency components of the input
Analysis and prediction of tidal components within observed surge using Kalman filter
35
signal. To present the frequency response characteristics of the Kalman filter scheme
we used one of the most commonly used methods, the Bode diagram (see, for
example, Ogata 1990).
Since generating a Bode diagram requires the system to be linear and time-invariant, we need to transform the estimation system such that it becomes time-invariant. To do this, we make use of the rotational transformation matrix; the relation between the original state X(tk) and the new state X̃(tk) is written as:

\[ X(t_k) = T_k \tilde{X}(t_k) \qquad (4.30) \]

where for each constituent with frequency ωi, the transformation matrix Tk is given by

\[ T_k = \begin{bmatrix} \cos(\omega_i t_k) & -\sin(\omega_i t_k) \\ \sin(\omega_i t_k) & \cos(\omega_i t_k) \end{bmatrix} \qquad (4.31) \]
The transformation is linear, hence the new system representation is also linear. Under
this transformation, the state elements corresponding to tidal constituent i at time tk read
\[ \tilde{X}(t_k) = \begin{bmatrix} A_i \cos(\omega_i t_k) + B_i \sin(\omega_i t_k) \\ B_i \cos(\omega_i t_k) - A_i \sin(\omega_i t_k) \end{bmatrix} \qquad (4.32) \]

and the new system matrix is given by

\[ \tilde{M} = T_{k+1}^{-1} M T_k = \begin{bmatrix} \alpha\cos(\omega_i \Delta t) & \alpha\sin(\omega_i \Delta t) \\ -\alpha\sin(\omega_i \Delta t) & \alpha\cos(\omega_i \Delta t) \end{bmatrix} \qquad (4.33) \]
where α = 1 − ∆t a. The transformation is also performed on the error term η(tk) and its coefficient matrix G, which gives η̃(tk) and G̃:

\[ \tilde{G} = T_{k+1}^{-1} G T_k = \begin{bmatrix} \sigma\cos(\omega_i \Delta t) & \sigma\sin(\omega_i \Delta t) \\ -\sigma\sin(\omega_i \Delta t) & \sigma\cos(\omega_i \Delta t) \end{bmatrix} \qquad (4.34) \]

and to the measurement matrix H̃:

\[ \tilde{H} = \begin{bmatrix} 1 & 0 & 1 & 0 & \dots & 1 & 0 & 1 & 1 \end{bmatrix} \qquad (4.35) \]
The transformed state space representation now reads

\[ \tilde{X}(t_k) = \tilde{M}\, \tilde{X}(t_{k-1}) + \tilde{G}\, \tilde{\eta}(t_{k-1}) \qquad (4.36) \]

\[ Y(t_k) = \tilde{H}(t_k)\, \tilde{X}(t_k) + \nu(t_k) \qquad (4.37) \]
Since the coefficient matrices as described in Equation (4.33), (4.34), and (4.35) are
time-invariant, it is now possible to perform steady-state analysis. Based on this new
model representation, the corresponding Kalman filter equations can be derived. Expressing the estimate in terms of the analysis state x̃^a, the estimation system now reads

\[ \tilde{x}^a(t_k) = (I - \tilde{K}\tilde{H})\tilde{M}\, \tilde{x}^a(t_{k-1}) + \tilde{K}\, u(t_{k-1}) \qquad (4.38) \]

\[ Y(t_k) = C\, \tilde{x}^a(t_k) \qquad (4.39) \]
where K̃ is the steady state Kalman gain, u(tk) the input to the system (i.e. the observed surge measurement at time tk), C is a diagonal matrix with diagonal elements (1 0 1 0 ... 1 0 1
1) and Y the output of the system. Each non-zero element of Y gives signal contributed
by a tidal constituent. The system of Equation (4.38) - (4.39) shows that the Kalman filter
scheme is a single-input multiple-output system.
Figures 4.3 and 4.4 present the Bode diagram for several components at one time step. The figures show that at each output frequency the Kalman filter has its maximum response at the respective frequency and rapidly decreasing side-lobes. From these diagrams we can learn that in producing the estimates of each component, the system will take the energy at the corresponding tidal frequency and its surroundings. It acts as a bank of band-pass filters centered at the analysis component frequencies. Moreover, the diagram of any one component shows a significant decrease at the other analysis components' frequencies. This
shows that for any single output component the Kalman filter scheme tends to minimize
the effect of other analysis components.
Figure 4.3: Bode diagram, magnitude
It should be noted that the frequency response represented by the Bode diagram is
independent of measurement. It is completely determined by the choice of components
included in the Kalman filter and their statistics. Therefore we can adjust the filter even
before the measurement is available.
Figure 4.4: Bode diagram, phase
4.3.2 Cross Over
From the previous section, we have learned that the estimation system operates as a set
of band-pass filters centered at the frequency of the tidal constituents included in the
estimation system. The bandwidth of the frequency response at each tidal constituent is small enough to expect that the output will mainly consist of each component's energy. We have also performed an experimental study using generated signals to see if cross-over between different tidal constituents occurs. Here, we want to check if the existence of a
tidal constituent within a signal affects the estimate of other constituents.
Two main experiments are performed. The first one uses generated signal with constant harmonic parameters and the other with varying harmonic parameters. The signal is
generated using the measurement model given in Equation (4.21), with the parameters as
described previously. For the case of the signal with constant harmonic parameters we used A = 0.7 for all components, chose B such that the amplitude equals one, and set both ai and σi equal to 0.
The experiment consists of several steps. At each step, a signal is generated using one
tidal constituent. The signal is then used as input to the Kalman filter. The output of the
Kalman filter, which is the estimates of the harmonic parameters of all tidal constituents,
is recorded at all times. Using these harmonic parameters, we reconstruct the signal as
contributed by each output tidal constituent. To obtain an idea about the size of the contribution, we use the root-mean-square (RMS) of the output signal corresponding to each
tidal constituent. The experiment is repeated for all tidal constituents mentioned earlier.
When there is no cross-over, the output will be zero at all frequency constituents, except
at the constituent with the same frequency as the input signal. The signal is generated for
the period of 120 days. The first 40 days are excluded in computing RMS of the amplitude
estimates to eliminate the initialization effect. The results are shown in Table 4.1 for the
signal with constant harmonic parameters and Table 4.2 for the varying one.
Table 4.1: Amplitude RMS with constant harmonic parameters (rows: output constituent; columns: input constituent)

Output \ Input:  O1 K1 MU2 M2 S2 2SM2 3MS4 M4 MS4 M6 2MS6 M8 3MS8 4MS10 A0
O1      0.7389 0.0536 0.0045 0.0042 0.0040 0.0038 0.0016 0.0016 0.0016 0.0010 0.0010 0.0008 0.0008 0.0006 0.0024
K1      0.0543 0.7547 0.0049 0.0046 0.0043 0.0041 0.0017 0.0017 0.0016 0.0010 0.0010 0.0008 0.0008 0.0006 0.0023
MU2     0.0048 0.0052 0.8212 0.0640 0.0321 0.0215 0.0024 0.0023 0.0023 0.0013 0.0012 0.0009 0.0009 0.0007 0.0013
M2      0.0044 0.0047 0.0622 0.7951 0.0621 0.0312 0.0024 0.0023 0.0022 0.0012 0.0012 0.0009 0.0009 0.0007 0.0012
S2      0.0041 0.0044 0.0313 0.0623 0.7979 0.0625 0.0025 0.0024 0.0023 0.0013 0.0012 0.0009 0.0009 0.0007 0.0012
2SM2    0.0041 0.0043 0.0217 0.0324 0.0647 0.8315 0.0027 0.0026 0.0025 0.0013 0.0013 0.0009 0.0009 0.0007 0.0012
3MS4    0.0018 0.0019 0.0025 0.0026 0.0027 0.0028 0.8688 0.0677 0.0340 0.0024 0.0023 0.0012 0.0012 0.0009 0.0007
M4      0.0017 0.0018 0.0023 0.0024 0.0025 0.0026 0.0655 0.8372 0.0655 0.0024 0.0023 0.0012 0.0012 0.0008 0.006
MS4     0.0018 0.0018 0.0024 0.0024 0.0025 0.0026 0.0341 0.0678 0.8706 0.0025 0.0024 0.0013 0.0013 0.0009 0.0007
M6      0.0012 0.0012 0.0013 0.0014 0.0014 0.0014 0.0024 0.0025 0.0026 0.8953 0.0699 0.0025 0.0024 0.0013 0.0005
2MS6    0.0012 0.0012 0.0013 0.0013 0.0014 0.0014 0.0023 0.0024 0.0025 0.0700 0.8958 0.0026 0.0025 0.0013 0.0005
M8      0.0009 0.0009 0.0010 0.0010 0.0010 0.0010 0.0013 0.0013 0.0013 0.0025 0.0026 0.9015 0.0704 0.0024 0.0003
3MS8    0.0009 0.0009 0.0009 0.0009 0.0010 0.0010 0.0013 0.0013 0.0013 0.0024 0.0025 0.0704 0.9018 0.0025 0.0003
4MS10   0.0007 0.0007 0.0008 0.0008 0.0008 0.0008 0.0009 0.0009 0.0010 0.0013 0.0014 0.0026 0.0027 0.9665 0.0003
A0      0.0043 0.0039 0.0022 0.0021 0.0020 0.0019 0.0011 0.0011 0.0010 0.0007 0.0007 0.0005 0.0005 0.0004 0.3041
Weather 0.1576 0.1472 0.0812 0.0782 0.0756 0.0734 0.0403 0.0395 0.0389 0.0265 0.0262 0.0199 0.0197 0.0159 0.3773
Table 4.2: Amplitude RMS with varying harmonic parameters (rows: output constituent; columns: input constituent)

Output \ Input:  O1 K1 MU2 M2 S2 2SM2 3MS4 M4 MS4 M6 2MS6 M8 3MS8 4MS10 A0 Weather
O1      0.5297 0.2056 0.0202 0.0181 0.0175 0.0191 0.0106 0.0092 0.0099 0.0079 0.0072 0.0066 0.0059 0.0060 0.0083 0.0819
K1      0.2058 0.5524 0.0268 0.0207 0.0193 0.0195 0.0100 0.0101 0.0102 0.0078 0.0073 0.0065 0.0062 0.0061 0.0098 0.0876
MU2     0.0215 0.0255 0.6451 0.2802 0.1620 0.1086 0.0165 0.0142 0.0156 0.0100 0.0096 0.0077 0.0071 0.0068 0.0059 0.0661
M2      0.0212 0.0207 0.2602 0.6050 0.2661 0.1561 0.0147 0.0146 0.0144 0.0091 0.0091 0.0070 0.0070 0.0066 0.0049 0.0522
S2      0.0217 0.0203 0.1469 0.2559 0.6099 0.2743 0.0163 0.0145 0.0156 0.0090 0.0089 0.0073 0.0070 0.0067 0.0050 0.0419
2SM2    0.0224 0.0245 0.1139 0.1544 0.2793 0.6652 0.0189 0.0171 0.0168 0.0107 0.0099 0.0076 0.0075 0.0071 0.0054 0.0527
3MS4    0.0100 0.0111 0.0174 0.0185 0.0174 0.0205 0.7167 0.3243 0.1996 0.0203 0.0201 0.0113 0.0109 0.0091 0.0038 0.0369
M4      0.0101 0.0099 0.0148 0.0151 0.0147 0.0170 0.2985 0.6626 0.3112 0.0204 0.0188 0.0105 0.0106 0.0091 0.0035 0.0338
MS4     0.0115 0.0116 0.0155 0.0168 0.0166 0.0181 0.1872 0.3169 0.7305 0.0223 0.0218 0.0115 0.0109 0.0092 0.0036 0.0377
M6      0.0088 0.0086 0.0120 0.0103 0.0105 0.0107 0.0214 0.0216 0.0234 0.7599 0.3644 0.0252 0.0237 0.0143 0.0031 0.0322
2MS6    0.0088 0.0085 0.0101 0.0108 0.0110 0.0110 0.0202 0.0203 0.0222 0.3557 0.7741 0.0264 0.0253 0.0143 0.0032 0.0330
M8      0.0069 0.0071 0.0080 0.0083 0.0082 0.0087 0.0124 0.0113 0.0123 0.0258 0.0270 0.7707 0.3742 0.0289 0.0029 0.0266
3MS8    0.0071 0.0072 0.0077 0.0079 0.0078 0.0083 0.0111 0.0115 0.0128 0.0243 0.0255 0.3643 0.7860 0.0304 0.0028 0.0256
4MS10   0.0073 0.0075 0.0076 0.0072 0.0075 0.0078 0.0108 0.0100 0.0108 0.0159 0.0159 0.0311 0.0330 0.9125 0.0030 0.0290
A0      0.0118 0.0115 0.0080 0.0072 0.0070 0.0075 0.0055 0.0050 0.0054 0.0046 0.0045 0.0038 0.0037 0.0039 0.3486 0.1445
Weather 0.4358 0.4308 0.3043 0.2716 0.2648 0.2891 0.2109 0.1910 0.2060 0.1759 0.1732 0.1469 0.1448 0.1497 0.6457 0.8689
For the case of the signal with constant harmonic parameters, shown in Table 4.1, we see that for each tidal component as input the cross-over is very small, only about 0.1%. The greatest cross-over occurs at the neighboring components, which are very close in frequency to the input signal. This is consistent with the fact that the accuracy of a frequency analysis is determined by the closest two frequency lines within the system. For such output components, the cross-over may reach 2%.
However, for the case of varying harmonic parameters, which is more realistic than the constant parameters case, the cross-over is more pronounced, with the neighboring components being the worst. The cross-over at the neighboring component can reach 40%, as shown in Table 4.2. This occurs because a signal generated using varying amplitude and phase will occupy a certain frequency band instead of a single frequency line, even though it is generated by using a single frequency. When the variation parameters are chosen such that the signal has a rather wide band, a significant amount of energy may extend to the neighboring tidal frequencies. As a result, the output at these neighboring components will be non-zero. It should be noted here that when the input is the meteorological component, modelled by using an AR(1) model, all the output components will yield significant non-zero values. This may suggest that in fact the meteorological component also has energy within tidal frequencies.
4.4 Results and discussion
We have implemented the Kalman filter scheme using the observed surge, the same way
as we performed the experiments using the generated signals described in the previous section. One of the problems of using a real signal is to select the tidal constituents to be included in the analysis. As a result of non-linear processes in the shallow-water area, the water level may contain a lot of tidal constituents. Visual inspection of the spectrum of the surge may reveal that it has many spikes around tidal frequencies. Therefore, we also performed a study using models with different sets of tidal constituents.
Three sets of tidal constituents are used in the study. The first set of components,
called Set A, consists of one component per tidal species: A0, M2, M4, MSN6, M8,
4MS10, 3MNKS12. The second set, Set B, is the same as the one used in the previous section. For the third set, Set C, we use 38 tidal constituents that are identified by visual inspection of the spectrum of observed surge. In addition to the tidal components, an
AR(1) model is used for representing the meteorological component for each set.
Based on the three sets of tidal constituents, we have developed the corresponding
Kalman filter schemes and used them to estimate the harmonic parameters of the tidal
components in the observed surge. For each set of components, the parameters ai and σi are chosen by trial and error to match the theoretical spectrum of a signal generated with
these parameters with the spectrum of the observed surge.
For illustration, Figures 4.5 and 4.6 present the estimates of the harmonic parameters of several components obtained by using the Kalman filter scheme with Set B. We see from
these figures that the harmonic parameters of the tidal components in the surge are varying
in time.
Figure 4.5: Amplitude estimates of several components using Set B, Vlissingen. The y-axis
gives the amplitude measured in meter.
The estimates of the harmonic parameters obtained by using the three Kalman filter
schemes are then used to predict the tidal components within the surge. The performance
of the prediction is measured in terms of the variance reduction percentage, defined as

\[ R = \left(1 - \frac{\mathrm{var}(S - C)}{\mathrm{var}(S)}\right) \times 100\% \qquad (4.40) \]
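The criterion of Equation (4.40) amounts to the following simple computation, where S denotes the observed surge series and C the predicted correction; the function is given only for illustration.

import numpy as np

def variance_reduction(S, C):
    """Variance reduction percentage of Equation (4.40)."""
    return (1.0 - np.var(np.asarray(S) - np.asarray(C)) / np.var(np.asarray(S))) * 100.0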
Figure 4.6: Phase estimates of several components using Set B, Vlissingen. The y-axis gives
the phase measured in degree.
To also get an idea about how the performance changes with respect to the length of the prediction period, we checked the performance for five different prediction periods: 10 minutes, 1, 6, 12, and 24 hours. Besides Vlissingen, we also used data taken at four other locations: Scheveningen, Den Helder, Terschelling, and Euro Platform. The prediction performance using these data is depicted in Figures 4.7 - 4.11.
Figure 4.7: Variance reduction obtained using several correctors with (a) Set A, (b) Set B, and (c) Set C, Vlissingen.

These figures show that the performance of Set C is the best among the three sets used in the experiment, in both the short and longer term prediction. Set A tends to perform
better than Set B in the short term prediction, but worse in the longer term. The findings suggest that the more components included in the analysis, the better the performance of the Kalman filter scheme. This may be due to the fact that a model with more components describes the data more completely.
Figure 4.8: Variance reduction obtained using several correctors with (a) Set A, (b) Set
B, and (c) Set C, Scheveningen
Examining the results obtained using Set C, we see that the best performance is
achieved in the period between May and September. In some locations it also includes
the period of April and October. In this period of time, the weather was relatively calm.
Figure 4.9: Variance reduction obtained using several correctors with (a) Set A, (b) Set B, and (c) Set C, Den Helder.

The results indicate that the variance reduction is relatively large and the performance
degrades relatively slowly in forecast time. Here, the variance reduction achieved using
24-hour prediction ranges between 20% in Scheveningen up to 60% in Europlatform. The
performance difference between analysis locations also suggests differences in local tidal characteristics. It seems that the astronomical tides off-shore are relatively easier to predict than in the coastal area.

Figure 4.10: Variance reduction obtained using several correctors with (a) Set A, (b) Set B, and (c) Set C, Terschelling.
Figure 4.11: Variance reduction obtained using several correctors with (a) Set A, (b) Set B, and (c) Set C, Europlatform.

However, in other periods the short term performance is worse and the prediction performance degrades more quickly with prediction time. As can be seen in Figures 4.8, 4.9, and 4.11, in some locations the 24-hour prediction even yields negative variance reduction. A
visual inspection on the time series shows that within this period the meteorological component tends to have very large amplitude. This unsatisfactory performance is likely to
be due to large cross-over of the meteorological component into the frequency bands of
the astronomical constituents. The experiment described in Section 4.3.2 shows that, with
the meteorological component as the input signal, the Kalman filter gives non-zero estimates of
all other tidal constituents. It indicates that the frequency band-width of a large meteorological component extends to higher frequencies, which in turn affects the estimates
of the harmonic parameters of the other tidal constituents. Moreover, since meteorological components usually last only a short period, such as one day, their energy at the tidal frequencies also lasts only a short time. This means that during this period the harmonic parameters will
vary relatively quickly.
It should be noted that the Kalman filter scheme used in this study has been successful
in extracting the harmonic components of the observed surge. This is confirmed by the
fact that most of the energy at tidal frequencies has been eliminated from the observed surge, as can be seen in Figure 4.12. However, due to cross-over, the energy within the tidal bands may be contributed by several components. It is not possible for this method to separate components having the same frequency. Since the prediction scheme used in this method is based on the decomposition of the signal into its frequency components, prediction using this scheme is only possible if the harmonic parameters of these frequency components are reasonably stationary. However, the results indicate that the data are not stationary, especially during stormy periods. In these periods, the assumption that the harmonic parameters are slowly varying during the prediction period does not hold anymore. The experimental results indicate that, within stormy periods, the harmonic parameters seem to be strongly non-stationary. This fact suggests that the surge-tide interaction within these periods is significant. Therefore, to obtain a better prediction, a study of the nonlinear surge-tide interaction seems necessary.
Another point worth mentioning is the fact that the Kalman filter scheme takes only a relatively short period of observation to yield the estimates. Therefore it is not possible to have well-separated frequency lines in the output. To obtain well-separated frequency lines one needs a long observation period, as in classical harmonic analysis. However, this has the disadvantage that one will miss the short-duration variation in the parameters.
4.5 Conclusion
This chapter presents a method for analysis and prediction of the time varying tidal components from a harmonic analysis residual. The method is based on modelling the residual as the sum of some tidal oscillations with slowly varying harmonic parameters. The
slowly varying parameters are modelled by using stochastic difference equations and estimated by using a Kalman filter. This method has been implemented by using real data
taken from several locations in the North Sea. For the calm period May-October 1999,
it yields improvement ranging from 20% to 60% depending on the location. However,
it does not perform well during the other periods. This may be due to the nonlinear interaction between meteorological and astronomical components. Another possible cause is the cross-over between different components, which cannot be separated without additional independent information. This hampers our approach from producing accurate predictions during stormy periods.

Figure 4.12: Fourier spectrum of observed surge without correction (left panels) and with correction obtained using Set C (right panels).
5
Two-sample Kalman filter for steady
state data assimilation∗
5.1 Introduction
One of the main goals of data assimilation is to improve the forecast performance of
a dynamical model. Many data assimilation algorithms are based on Kalman filtering,
where the state of a system is estimated by combining all the information that is available
about the system in accord with their statistical uncertainty. The main computational
issue in Kalman filter based data assimilation is the propagation of the forecast error
covariance matrix. A number of methods have been proposed to solve this problem,
for example the ensemble Kalman filter (EnKF) proposed by Evensen (1994). In EnKF,
the forecast error covariance matrix is approximated by using a Monte Carlo method.
Another method, which can also be considered as an ensemble method, is the reduced
rank square root (RRSQRT) Kalman filter proposed by Verlaan and Heemink (1997). The
RRSQRT Kalman filter approximates the error covariance by a matrix of a lower rank.
An overview about the ensemble assimilation development can be found for example in
Evensen (2003).
A less computationally demanding assimilation algorithm is based on the steady state
solution of the Kalman filter as suggested by Heemink and Kloosterhuis (1990). This
approach is built on the fact that for systems where the covariance of the forecast error
is almost time invariant, it is possible to compute the Kalman gain off-line until it has
reached approximately a constant solution. Applying the constant Kalman gain in a data
assimilation system reduces the computational demands to almost the same as the standard model execution without data assimilation. However, this approach still requires the
solution of the Riccati equation used subsequently in the Kalman filter formulation or it
requires a more elaborate scheme for the generation of the steady state gain. Heemink
and Kloosterhuis (1990) used simpler dynamics for propagating the covariance matrix
and a Chandrasekhar-type algorithm to compute a steady state Kalman gain. Another
approximation can be introduced by defining the error at a coarser grid size (Fukumori
and Malanotte-Rizzoli 1995). The usage of a simpler dynamical model propagator for
the error propagation is also suggested by Dee (1991). Sørensen et al. (2004a) computed
the steady gain as a long-term average of the Kalman gains estimated by an EnKF and
demonstrated its performance in the North and Baltic sea system. The EnKF, however,
requires a large number of ensemble members to get the ensemble mean statistics.
In this chapter, a new method for computing the steady gain is proposed, which is
based only on two forecast realizations. If the covariance of the forecast error is almost time invariant, we can use a time average instead of an ensemble mean. This has the advantage
of being computationally very cheap and very easy to implement. The two forecast realizations may be generated with two differing but similarly skillful prediction models.
Moreover, it is also possible to use the algorithm for the case where the two realizations
are generated by running the dynamical model twice, driven by two different inputs. In
oceanography, for example, it is becoming increasingly popular to compute the forecast error
statistics by using two different wind fields, produced by two different atmospheric models, as input to an oceanic model under study (Alves and Robert 2005, Leeuwenburgh
2007). In this case no explicit model error specification is required.
The proposed algorithm is based on the assumption that the error process of the system of interest is weakly stationary. That is, the mean and covariance of the forecast error
of the dynamical system being studied are constant in time. However, the algorithm may
also be applicable for a system where the error statistics vary slowly in time. The proposed algorithm makes use of two different forecasts performed for the same period. The
difference between the two forecasts provides information on the forecast error statistics.
In order to estimate the forecast error statistics from the time series of these differences,
it is necessary to assume that the noise process is ergodic. In this case the covariance
estimates can be computed by averaging the realization over time. In practice, the ergodicity assumption may be adopted for a stationary random process with a short correlation time. If we can select samples of the random process at time intervals sufficiently longer than its correlation time, the samples are effectively uncorrelated. This allows us to consider
the series as independent realizations of the same distribution. In this case, averaging over
time is equivalent to averaging over ensembles. The covariance estimate is used to define
a gain matrix for a sequential steady-state analysis system. The proposed algorithm also
consists of an iterative procedure for improving the covariance estimates. The procedure
requires a fixed observing system during the iteration.
The algorithm proposed in this chapter may be applied to systems where a steady state
Kalman filter can be applied. A number of applications of the steady state Kalman filter
can be found for example in shelf sea modelling (e.g. Canizares et al. 2001, Verlaan et al. 2005, Sørensen et al. 2006) and coastal sea modelling (e.g. Heemink 1990, Oke et al.
2002, Serafy and Mynett 2008). The application of an approximate Kalman filter, which
includes the use of stationary error covariance, can also be found in oceanography (e.g.
Fukumori et al. 1993). The effectiveness of the stationary assumption was examined by
Fu et al. (1993) where the resulting solution was found to be statistically indistinguishable
from the one obtained using the full Kalman filtering. It is interesting to mention here that
the utility of a stationary covariance matrix in the Kalman gain was demonstrated even in
the presence of evolving properties of the observation system (Fukumori 1995, Fukumori
and Malanotte-Rizzoli 1995, Hirose et al. 1999). The proposed algorithm is expected to
be applicable for any system similar to those mentioned above.
Some twin experiments are performed to evaluate the performance of the proposed
algorithm. This allows us to evaluate the results by comparing them to the "truth". The
experiments are intended to demonstrate that for a number of applications, the results
produced by the proposed algorithm converge to the optimal ones. For these experiments,
the perturbations are drawn from a Gaussian distribution with mean zero and given covariance matrix. The results are compared to those produced by a classical Kalman filter
algorithm.
The chapter is organized as follows. Section 2 explains the algorithm along with its
mathematical formulations. The first experiment, which uses a simple one-dimensional
wave model, is discussed in Section 3. The second experiment uses the operational model
for storm surge prediction, the Dutch Continental Shelf Model, which is described and
discussed in Section 4. An experiment with the three-variable Lorenz model is given in
Section 5. The chapter concludes in Section 6 with a discussion of the results.
5.2 Algorithm
The algorithm proposed in this chapter belongs to the class of iterative methods for finding a steady-state Kalman gain. A number of such methods have been proposed, for example by Mehra (1970), Mehra (1972), and Rogers (1988), which are based on algebraic manipulations of the Kalman filter equations. However, these methods still require propagating matrices of the same dimension as the covariance matrix through the dynamical system of interest. Therefore, they are not applicable to earth-system problems, which usually have a very large dimension.
Our algorithm is based on estimating the forecast error covariance matrix using two
forecast realizations, which are perturbed by statistically independent perturbations. This
covariance matrix is used to define the Kalman gain matrix for a sequential steady state
analysis system. The algorithm consists of an open-loop step and a closed-loop step. In
the open-loop step, the covariance is computed without making use of any observations
of the process. The closed-loop step consists of an iterative procedure for improving the
covariance estimate by assimilating available observations. The result of the open-loop
step serves as the starting point for the closed-loop iteration.
5.2.1 Statistics from two independent realizations
Consider a discrete random process X^f(t_k) \in \mathbb{R}^n that can be decomposed into deterministic and stochastic components as follows

X^f(t_k) = \bar{x}(t_k) + \tilde{X}(t_k)     (5.1)

where \bar{x}(t_k) \in \mathbb{R}^n is a deterministic process and \tilde{X}(t_k) \in \mathbb{R}^n a weakly stationary random process. Suppose we can draw two realizations of X^f(t_k), x^f_1(t_k) and x^f_2(t_k), obtained from two statistically independent realizations of \tilde{X}(t_k) added to the deterministic part. We define e_{1,2}(t_k) as the difference between these two series:

e_{1,2}(t_k) = x^f_1(t_k) - x^f_2(t_k)     (5.2)

It is easy to see that e_{1,2}(t_k) is a weakly stationary random process with the following properties:

E[e_{1,2}(t_k)] = 0     (5.3)
E[e_{1,2}(t_k) e_{1,2}^\top(t_k)] = 2 P^f(t_k)     (5.4)

where E[\,\cdot\,] is the expectation operator and P^f(t_k) \in \mathbb{R}^{n \times n} is the covariance matrix of the random process X^f(t_k), defined as P^f(t_k) = E[(X^f(t_k) - E[X^f(t_k)])(X^f(t_k) - E[X^f(t_k)])^\top]. Since the random process is weakly stationary, its covariance matrix is constant, P^f(t_k) = P^f.
Given a sample e_{1,2}(t_k), k = 1, ..., N, and assuming that the random process is ergodic, we can calculate the covariance by averaging the realization over time:

P^f = \frac{1}{2} \frac{1}{N-1} \sum_{k=1}^{N} e_{1,2}(t_k) e_{1,2}^\top(t_k)     (5.5)
Note that two samples of Xf (tk ) are required to eliminate the trend within each sample, so that we can use time-averaging for computing the covariance.
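As a concrete illustration of Equation (5.5), the following minimal sketch (in Python with NumPy; the array layout and function name are assumptions for illustration, not part of the original implementation) computes the two-sample covariance estimate from two stored realizations:

```python
import numpy as np

def two_sample_covariance(x1, x2):
    """Estimate P^f from two independent realizations, Eq. (5.5).

    x1, x2 : arrays of shape (N, n), one state vector per time step.
    Returns an (n, n) covariance estimate.
    """
    e = x1 - x2                      # difference series e_{1,2}(t_k)
    N = e.shape[0]
    # 0.5 * 1/(N-1) * sum_k e(t_k) e(t_k)^T, written as one matrix product
    return 0.5 * (e.T @ e) / (N - 1)
```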
5.2.2 Open-loop estimate
We first discuss the open-loop estimate, where the estimation of the covariance matrix of
a random process is done without making use of any observations. Consider a random
process governed by a dynamic model M as follows:

X^f(t_{k+1}) = M X^f(t_k) + B u(t_k) + W(t_k)     (5.6)

where X^f(t_k) \in \mathbb{R}^n denotes the state vector, u(t_k) \in \mathbb{R}^p the deterministic forcing term, B \in \mathbb{R}^{n \times p} the transition matrix between the forcing and the state, and W(t_k) \in \mathbb{R}^n is a white noise vector with covariance Q \in \mathbb{R}^{n \times n}. From the governing equation of X^f(t_k) we can derive the evolution equation of the covariance P^f:

P^f(t_{k+1}) = M P^f(t_k) M^\top + Q     (5.7)
If M is stable, the covariance will reach a steady state where P^f(t_{k+1}) = P^f(t_k) = P^f, satisfying P^f = M P^f M^\top + Q.
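For a small stable system this fixed point can be verified numerically by simply iterating recursion (5.7) until it stops changing. The sketch below is illustrative only; the 2 × 2 matrices are arbitrary examples and are not taken from the thesis:

```python
import numpy as np

def steady_state_covariance(M, Q, tol=1e-10, max_iter=100000):
    """Iterate P <- M P M^T + Q (Eq. 5.7) until it converges.

    Convergence requires the spectral radius of M to be below one.
    """
    P = np.zeros_like(Q)
    for _ in range(max_iter):
        P_new = M @ P @ M.T + Q
        if np.max(np.abs(P_new - P)) < tol:
            return P_new
        P = P_new
    return P

# Example with an arbitrary stable system
M = np.array([[0.9, 0.1], [0.0, 0.8]])
Q = 0.01 * np.eye(2)
P_ss = steady_state_covariance(M, Q)
```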
Now let xf1 (tk ) and xf2 (tk ), k = 1, ..., N, be two independent samples from this process. If the process is ergodic, we can estimate Pf by using Equation (5.5).
5.2.3 Closed-loop estimate
Suppose now that an observation y^o(t_k) \in \mathbb{R}^m of the dynamical system (5.6) is available and that the observation is related to the unknown true state x^t(t_k) through

y^o(t_k) = H x^t(t_k) + V(t_k)     (5.8)

where H \in \mathbb{R}^{m \times n} is a linear observation operator and V(t_k) \in \mathbb{R}^m is a white noise vector with constant covariance R \in \mathbb{R}^{m \times m} representing the observation uncertainty. Combining the observations and the dynamical system, we write the governing equations for X^f(t_k) as follows:

X^f(t_{k+1}) = M X^a(t_k) + B u(t_k) + W(t_k)     (5.9)
X^a(t_{k+1}) = X^f(t_{k+1}) + K (y^o(t_{k+1}) - H X^f(t_{k+1}) - V(t_{k+1}))     (5.10)

It is easy to show that the corresponding covariance matrices evolve in time according to

P^f(t_{k+1}) = M P^a(t_k) M^\top + Q     (5.11)
P^a(t_{k+1}) = (I - KH) P^f(t_{k+1}) (I - KH)^\top + K R K^\top     (5.12)

For a stable and time-invariant system, a steady state will occur, where P^f(t_{k+1}) = P^f(t_k) = P^f and P^a(t_{k+1}) = P^a(t_k) = P^a.
Let x^f_1(t_k) and x^f_2(t_k), k = 1, ..., N, be two independent samples from this process. Then, assuming that the process is ergodic, we can estimate P^f by using Equation (5.5).
The constant gain matrix K \in \mathbb{R}^{n \times m} is chosen such that it minimizes the covariance P^a. It can be shown that such a K is given by

K = P^f H^\top (H P^f H^\top + R)^{-1}     (5.13)
Note that the closed-loop system of equations (5.9)-(5.10) is actually a forecast-analysis
system.
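A minimal sketch of the gain computation in Equation (5.13), assuming P^f, H and R are available as NumPy arrays (for large systems one would work with P^f H^T directly, as discussed below in Section 5.2.4):

```python
import numpy as np

def steady_state_gain(Pf, H, R):
    """Compute K = P^f H^T (H P^f H^T + R)^{-1}, Eq. (5.13)."""
    PfHt = Pf @ H.T
    S = H @ PfHt + R                 # m x m innovation covariance
    # Solve S K^T = (P^f H^T)^T rather than forming an explicit inverse
    return np.linalg.solve(S, PfHt.T).T
```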
5.2.4 Proposed algorithm
The algorithm is initialized by generating two independent realizations of the open-loop
process (5.6). This is done by running the open-loop system twice for the same period
and by perturbing both runs using statistically independent perturbations drawn from a
given distribution. Having two independent realizations, the covariance is computed using Equation (5.5) and subsequently the open-loop gain matrix by using Equation (5.13).
The next step is the closed-loop iteration. In the first iteration, the gain matrix computed in the open-loop step is inserted into Equation (5.10) and the closed-loop system,
Equation (5.9)-(5.10), is run twice for the same period. The forecasts in both runs are
perturbed using statistically independent perturbations drawn from the model error distribution. Moreover, independent perturbations representing observation noise are also
generated from a given distribution. The two independent realizations of the closed-loop
forecasts (5.9) are used to compute the forecast error covariance and the corresponding
gain matrix by using Equation (5.5) and (5.13) respectively. This gain matrix is subsequently inserted into the closed-loop system (5.10) for the next iteration. The closed-loop
estimation is repeated until the covariance does not change anymore. At each iteration,
the gain matrix computed in the previous iteration is inserted into the closed-loop system
(5.10).
To summarize, the proposed algorithm consists of the following steps:
• Open-loop step:
– generate two realizations of open-loop process, Equation (5.6): xf1 (tk ) and
xf2 (tk ), k = 1, ..., N
– estimate Pf using Equation (5.5)
– estimate K using Equation (5.13)
• Closed-loop step:
– insert K into Equation (5.10)
– generate two realizations of closed-loop process, Equation (5.9): xf1 (tk ) and
xf2 (tk ), k = 1, ..., N
– estimate Pf using Equation (5.5)
– estimate K using Equation (5.13)
– repeat closed-loop step until Pf , and therefore also K, have converged
As discussed in the previous section, since K is chosen to minimize the covariance Pa, the covariance Pf at one iteration step will be smaller than at the previous iteration. Hence,
it can be seen that if the iteration process has converged, then K is the optimal gain. Note
that for linear, time-invariant systems, the corresponding steady-state Kalman gain is a
solution to the iteration.
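The whole procedure can be outlined in a few lines of Python. The function run_model below is a hypothetical user-supplied routine that returns one realization of the (closed-loop) forecast process for a given gain; it is not part of the original implementation and only serves to show the structure of the iteration:

```python
import numpy as np

def two_sample_steady_gain(run_model, H, R, n_iter=5):
    """Iteratively estimate a steady-state gain from paired model runs.

    run_model(K, seed) is assumed to return an (N, n) array with one
    realization of the forecast process driven by independent noise;
    K = None corresponds to the open-loop system (no feedback).
    """
    K = None                                      # open-loop step first
    for it in range(n_iter):
        x1 = run_model(K, seed=2 * it)            # two independent realizations
        x2 = run_model(K, seed=2 * it + 1)
        e = x1 - x2
        Pf = 0.5 * (e.T @ e) / (e.shape[0] - 1)   # Eq. (5.5)
        S = H @ Pf @ H.T + R
        K = np.linalg.solve(S, (Pf @ H.T).T).T    # Eq. (5.13)
    return K
```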
This algorithm has much in common with the analysis-ensemble method (Fisher,
2003). The novel aspects of this algorithm are the use of the open-loop estimate as an
initial guess for the closed-loop iterations and the iterative improvement of the estimated
covariances. Our algorithm is also similar to the ensemble Kalman filter (Evensen, 2003).
The difference is that instead of averaging over ensembles, the forecast error covariance
is estimated by averaging over time.
Note that the equations of the closed loop system are the same as in the Kalman filter
formulation. The difference is that in our algorithm K is updated every iteration step,
while in the Kalman filter it is updated every time step. It is a well-known result that for a
stable and time-invariant system the Kalman gain will converge. For the same system the
gain computed by the proposed algorithm is expected to converge as well.
The computational cost of the proposed algorithm depends on the length of the two
samples as well as on the number of observations used for assimilation. It should be noted
here that the computation of the gain matrix K requires only the computation of the two
terms P^f H^\top and H P^f H^\top, which we compute as

P^f H^\top = \frac{1}{2} \frac{1}{N-1} \sum_{k=1}^{N} e_{1,2}(t_k) (H e_{1,2}(t_k))^\top     (5.14)

and

H P^f H^\top = \frac{1}{2} \frac{1}{N-1} \sum_{k=1}^{N} (H e_{1,2}(t_k)) (H e_{1,2}(t_k))^\top     (5.15)
For a limited number of observations per analysis step this is a limited amount of work.
Moreover, since the statistical computations are performed off-line, the computational
cost is not an issue as long as it is feasible for the system of interest. Once the gain
matrix is computed and implemented in a data assimilation system, the computational cost
will be the same as an assimilation system using a steady state Kalman filter or optimal
interpolation. Furthermore, the implementation is simple since only the analysis-equation
(5.10) is added to the code of the model.
From the description above, we can see that the algorithm is applicable to an assimilation system with a fixed observation network. Moreover, it assumes the error process of the system under study to be weakly stationary. The stationarity assumption prevents the algorithm from including flow dependence in the forecast error
statistics. However, in a number of real time forecasting systems, it has been shown that it
is sufficient to use a stationary covariance for data assimilation (e.g. Heemink and Kloosterhuis 1990, Sørensen et al. 2004b). In these applications, a steady state Kalman filter is
found to be sufficiently accurate. In particular, Canizares et al. (2001) used an ensemble
Kalman filter for a shelf sea model and showed that the forecast error covariance matrix in
this specific case tends to a quasi-steady state after a few days of assimilation. Moreover,
Sørensen et al. (2006) studied and compared the properties of EnKF, RRSQRT Kalman
filter and steady state Kalman filter using an idealised bay and demonstrated the effectiveness of the steady state Kalman filter over the other methods for the given system. Hence,
the steady state Kalman filter is recommended due to its low computational cost, especially if an operational setting is considered. A constant forecast error covariance matrix
is also used in other data assimilation systems, which work with for example optimal interpolation (e.g. Demirov et al. 2003, Borovikov et al. 2005, Sun et al. 2007) and 3DVAR
(e.g. Daley and Barker 2001, Devenyi et al. 2004).
Although in this chapter the algorithm is developed using two realizations, it can be extended to a larger number of realizations. With a larger number of realizations, it is
possible to use samples over a shorter time. This may relax the stationarity assumption.
Note that for a sufficiently large number of realizations, it is possible to compute the covariance estimate by averaging over the ensemble at a single simulation time step. In this
case the algorithm reduces to the ensemble Kalman filter. The choice of using only two
realizations is set by a practical consideration. The algorithm is intended to be used for
a future study using an operational storm surge prediction model, where two wind fields
produced by two different meteorological models are available. Assuming that the two
wind fields are generated from the same atmospheric random process, the two storm surge
forecasts are produced by running the storm surge model twice, each run driven by one
of the two wind fields. In this case, no model error statistics have to be specified explicitly.
5.3 Experiment using a simple wave model
To check the validity of the algorithm proposed in this chapter, we performed an experiment using a simple wave model. This model is a simplified version of the de Saint-Venant equations, which can be used to model tidal flow in a long and narrow estuary. The
governing equations of this model are
\frac{\partial \xi}{\partial t} + D \frac{\partial u}{\partial x} = 0     (5.16)
\frac{\partial u}{\partial t} + g \frac{\partial \xi}{\partial x} + \frac{c u}{D} = 0     (5.17)
\xi(x = 0, t) = \xi_b(t) + w_b(t)     (5.18)
\xi(x, t = 0) = 0     (5.19)
u(x, t = 0) = 0     (5.20)
u(x = L, t) = 0     (5.21)
where ξ(x, t) is the water level above a reference plane, u(x, t) is the average current
velocity over a cross section, D is the water depth, g is the acceleration of gravity and c
is the friction constant. This system of equations is solved numerically by using a finite
difference method. A Crank-Nicolson scheme with implicitness factor θ = 0.4 is used
and the discretization in space is based on the use of a staggered grid.
The uncertainty of the model is assumed to be caused by an inaccurate sea-estuary
boundary. A colored noise process, denoted by wb (tk ), modelled as an AR(1) process is
used to model this uncertainty:
w_b(t_k) = \alpha_b w_b(t_{k-1}) + \epsilon(t_k)     (5.22)

where \alpha_b is a correlation parameter defined by \alpha_b = e^{-\Delta t / T_c}, \Delta t is the time step, T_c is the correlation time parameter and \epsilon is a Gaussian white noise process. The correlation time is set to 2 hours and the standard deviation of \epsilon is chosen to be 20 cm for this study.
Moreover, a Gaussian white noise process with standard deviation of 2 cm is used to
represent the observation error.
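A sketch of how such a boundary-noise series can be generated from Equation (5.22), using the values quoted above (2-hour correlation time, 20 cm noise standard deviation, 1-minute time step); the function name and interface are illustrative only:

```python
import numpy as np

def ar1_noise(n_steps, dt, t_corr, sigma_eps, rng=None):
    """Generate w(t_k) = alpha_b * w(t_{k-1}) + eps(t_k), Eq. (5.22)."""
    rng = np.random.default_rng() if rng is None else rng
    alpha_b = np.exp(-dt / t_corr)       # correlation parameter
    w = np.zeros(n_steps)
    for k in range(1, n_steps):
        w[k] = alpha_b * w[k - 1] + sigma_eps * rng.standard_normal()
    return w

# 600 hours at a 1-minute time step, T_c = 2 h, sigma_eps = 0.20 m
wb = ar1_noise(n_steps=600 * 60, dt=60.0, t_corr=2 * 3600.0, sigma_eps=0.20)
```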
For this experiment, the values of the other parameters are D = 10 m, c = 0.0002 s^{-1}, L = 60 km, time step \Delta t = 1 minute, and the number of grid points for the spatial discretization n
= 80, hence \Delta x = L/(n-1). Moreover, at the sea-estuary boundary, x = 0, a periodic water level is generated according to \xi_b(t_k) = 0.5 \sin(2\pi t_k / T), where T = 3 hours.
Using this model, the open- and closed-loop processes described in Section 5.2 are
carried out. At each iteration, the model is run twice, each with a different noise realization, for the same simulation period of 600 hours. An artificial measurement generated
at location x = 0.3L is inserted into the model at each time step in the closed-loop runs.
The gain matrix computed at one iteration is inserted into the model used in the next iteration. The procedure is repeated until no significant change in the gain matrix is apparent.
The resulting water-level gain matrices are presented in Figures 5.1a-5.1d. We observe that there is no significant difference between the gain matrix estimated after the second closed-loop iteration and the one estimated after the third iteration. This indicates
that the algorithm has converged.
Figure 5.1: Estimated gain of the water level ξ: (a) open-loop estimate, (b)-(d) closed-loop estimates at the 1st-3rd iteration steps, (e) computed using RRSQRT with 50 modes. The vertical line shows the location of the assimilation station.
To evaluate the performance of the algorithm, we compare the gain matrix estimated
by using our algorithm to the one obtained by using a Reduced Rank Square-Root (RRSQRT)
Kalman filter (Verlaan and Heemink, 1997). The result, shown in Figure 5.1e, has been
obtained by running the RRSQRT Kalman filter until the Kalman gain matrix has become
constant. The number of modes used to produce this result is 50, which turned out to be
sufficient to estimate the Kalman gain accurately. Comparing the gain matrix estimated
at the third closed-loop iteration with the one obtained using the RRSQRT Kalman filter,
we see that they are very similar to each other. This indicates that for this model the
algorithm proposed in this chapter can reproduce the steady-state Kalman filter gain.
5.4 Experiment using the Dutch Continental Shelf Model
The second experiment is carried out by applying the proposed algorithm to the Dutch
Continental Shelf Model (DCSM). This is an operational model used in the Netherlands
for real-time storm surge prediction. In the operational setup, a steady state Kalman filter
based on the work of Heemink and Kloosterhuis (1990) has been implemented to improve
the forecast’s accuracy of the model. Hence, in this study, it is possible to compare the
gain matrix computed by using our proposed algorithm to the one computed using this
steady state Kalman filter.
The governing equations used in the DCSM are the nonlinear 2-D shallow water equations. However, for the propagation of the forecast error covariance matrix a linearized
system of equations is used instead (see e.g. Verlaan 1998; ten Brummelhuis and Heemink
1990; Heemink and Kloosterhuis 1990):
\frac{\partial \xi}{\partial t} + D \frac{\partial u}{\partial x} + D \frac{\partial v}{\partial y} = 0     (5.23)
\frac{\partial u}{\partial t} + g \frac{\partial \xi}{\partial x} - f v + \frac{c_f u}{D} - \frac{\tau_x}{D} = 0     (5.24)
\frac{\partial v}{\partial t} + g \frac{\partial \xi}{\partial y} + f u + \frac{c_f v}{D} - \frac{\tau_y}{D} = 0     (5.25)
where:
u, v = depth-averaged current in the x and y direction respectively
ξ = water level above the reference plane
D = water depth below the reference plane
g = acceleration of gravity
f = Coriolis coefficient
c_f = coefficient of the linear friction
τ_x, τ_y = wind stress in the x and y direction respectively
These equations are discretized using an Alternating Directions Implicit (ADI) method
and a staggered grid that is based on the method by Leendertse (1967) and Stelling (1984).
In the implementation, a spherical grid is used instead of a rectangular one (e.g. Verboom et al., 1992).
Two sources of uncertainty are usually considered: uncertainty at the open boundary
and uncertainty of the meteorological forcing. However, in this experiment we assume that the only source of uncertainty is an erroneous wind forcing. To account for this uncertainty, noise processes w_u and w_v are added directly to the u and v velocities instead of to the depth-averaged momentum equations. It is generally believed that the noise process is correlated in time, hence an AR(1) process is chosen to represent the uncertainty in each direction. Denoting by w_u(x, y, t_k) and w_v(x, y, t_k) the noise in the zonal and meridional directions respectively, the noise processes are written as
w_u(x, y, t_k) = \alpha_w w_u(x, y, t_{k-1}) + \epsilon_u(x, y, t_k)     (5.26)
w_v(x, y, t_k) = \alpha_w w_v(x, y, t_{k-1}) + \epsilon_v(x, y, t_k)     (5.27)

where \alpha_w is a correlation parameter and \epsilon_u, \epsilon_v are white noise processes. The spatial characteristic is specified by using a spatial correlation given by

\frac{E[w_u(x_1, y_1, t_k) w_u(x_2, y_2, t_k)]}{\sqrt{E[w_u(x_1, y_1, t_k)^2] E[w_u(x_2, y_2, t_k)^2]}} = e^{-\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}/d_w}     (5.28)

\frac{E[w_v(x_1, y_1, t_k) w_v(x_2, y_2, t_k)]}{\sqrt{E[w_v(x_1, y_1, t_k)^2] E[w_v(x_2, y_2, t_k)^2]}} = e^{-\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}/d_w}     (5.29)

\frac{E[w_u(x_1, y_1, t_k) w_v(x_2, y_2, t_k)]}{\sqrt{E[w_u(x_1, y_1, t_k)^2] E[w_v(x_2, y_2, t_k)^2]}} = 0     (5.30)

where

\sqrt{E[\Delta u(x_1, y_1, t_k)^2]} = \sqrt{E[\Delta v(x_2, y_2, t_k)^2]} = \sigma_w     (5.31)
and dw is a characteristic distance for the spatial correlation. The noise term is introduced
on a coarser grid to reduce the number of noise inputs. The noise at the other grid points is subsequently obtained by bilinear interpolation.
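One possible way to generate such a noise field is sketched below: a spatially correlated Gaussian field is drawn on the coarse grid via a Cholesky factor of the exponential covariance of Eqs. (5.28)-(5.29), advanced in time with the AR(1) recursion of Eqs. (5.26)-(5.27), and interpolated bilinearly to the computational grid. The Cholesky-based sampling and the use of scipy.ndimage.zoom are assumptions for illustration, not a description of the operational code:

```python
import numpy as np
from scipy.ndimage import zoom   # order=1 gives bilinear interpolation

def correlated_field(xc, yc, d_w, sigma, rng):
    """Draw one spatially correlated Gaussian field on the coarse grid,
    with correlation exp(-distance / d_w) as in Eqs. (5.28)-(5.29)."""
    X, Y = np.meshgrid(xc, yc, indexing="ij")
    pts = np.column_stack([X.ravel(), Y.ravel()])
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    C = sigma**2 * np.exp(-dist / d_w)
    L = np.linalg.cholesky(C + 1e-10 * np.eye(len(pts)))   # small jitter
    return (L @ rng.standard_normal(len(pts))).reshape(X.shape)

def advance_wind_noise(w_prev, alpha_w, eps_coarse, fine_shape):
    """One AR(1) step (Eq. 5.26 or 5.27) on the coarse grid, followed by
    bilinear interpolation to the computational grid."""
    w = alpha_w * w_prev + eps_coarse
    factors = (fine_shape[0] / w.shape[0], fine_shape[1] / w.shape[1])
    return w, zoom(w, factors, order=1)
```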
The DCSM covers an area of the north-east European continental shelf, i.e. 12°W to 13°E and 48°N to 62°N, as shown in Figure 5.2. The resolution of the spherical grid is 1/8° × 1/12°, which is approximately 8 × 8 km. With this configuration there is a 201 × 173 grid with 19809 computational grid points. The time step is ∆t = 10 minutes and the friction coefficient c_f = 0.0024. For the wind input noise parameters the following values are used: σ_ε = 0.003 m/s per time step, α_w = 0.9 per time step, which corresponds to a correlation time of 1.6 hours, and d_w = 19 grid cells. The noise is defined on a coarser grid, where each coarse grid cell consists of 28 × 28 computational grid cells. Moreover, a white noise process with standard deviation σ_m = 0.10 m is used to represent the
uncertainty of the observations. There are eight observation stations which are included
in the assimilation, five of which are located along the east coast of the UK and the others
along the Dutch coast (see Figure 5.2).
Using the specifications described above, three months of artificial observation data are generated at these assimilation stations. The generated observations are used in the closed-loop iteration to improve the gain matrix estimates. Figure 5.3 shows the gains in terms of
Figure 5.2: DCSM area with 50 m and 200 m depth contours and some water-level observation stations. Assimilation stations: 1. Wick, 2. North Shields, 3. Lowestoft, 4. Sheernes, 5. Dover, 6. Vlissingen, 7. Hoek van Holland, 8. Den Helder. Validation stations: a. Aberdeen, b. Euro Platform, c. Cadzand, d. Roompotsluis, e. IJmuiden, f. Harlingen, g. Huibertgat, h. Delfzijl.
water level computed by using our algorithm as well as by using the operational steady
state Kalman filter. Here we show only the gains for the assimilation station Wick. By
visual inspection we see that the iteration converges. Moreover, the result computed at the fourth closed-loop iteration shows a similar structure and magnitude to the
one computed by using the steady state Kalman filter (Heemink and Kloosterhuis, 1990).
Some minor differences are apparent at points further away from the assimilation station.
This may be due to sampling error, which is a common difficulty in ensemble-based approaches. The limited number of samples may produce spuriously large covariance estimates between widely separated grid points where the covariance is actually small. However, despite these differences it is clear from the pictures that in
general the algorithm that we propose in this chapter can reproduce the gain estimated by
using the steady state Kalman filter.
Figure 5.3: Water-level gain for assimilation station Wick: (a) open-loop estimate, (b)-(e)
1st-4th closed-loop iteration estimates, and (f) computed using steady-state Kalman filter.
Figure 5.4: Water level timeseries at Wick on 12 January 2005: observed data, deterministic model without assimilation, and with assimilation using the open-loop gain.
Besides performing visual checks on the estimated gain, another evaluation was also
carried out. This evaluation is done by implementing the gain matrices computed using
our proposed algorithm in a data assimilation system similar to the operational one, where
the nonlinear version of the system of equations (5.23)-(5.25) has been used (see Stelling,
1984). For this evaluation, the real water level data observed within the period of 1
November 2004 - 1 February 2005 have been assimilated. Figure 5.4 presents three water-level timeseries at Wick for the period 12 January 2005 00:00 - 24:00. These timeseries refer to water levels obtained from observations, forecasts using the deterministic model without data assimilation, and forecasts with data assimilation using the gain matrix obtained
in the open-loop step. This figure demonstrates that the differences between the forecasts
with data assimilation and the observations are always smaller than the differences between the forecasts without data assimilation and the observations. To acquire a better
idea about the filter performance, the root-mean-square of the water level innovation is
calculated over the whole simulation period:
RMS = \sqrt{\frac{1}{N} \sum_{k=1}^{N} (\xi^o(t_k) - \xi^a(t_k))^2}     (5.32)
where ξ o is the observed water level and ξ a is the analyzed water level. To check whether
the data assimilation also works at other locations, the RMS water level innovation is also computed at a number of validation stations, where the observation data have not been used for the assimilation. In this experiment we used 8 assimilation stations and 8 validation stations. Figures 5.5 and 5.6 show the RMS water level innovation at the assimilation
and validation stations respectively. In these figures we also present the RMS innovation
of the water level obtained by using the steady-state Kalman filter. It is clear from these
pictures that using the gain matrices computed at both the open- and closed-loop steps, the
filter reduces the RMS values of the water level innovations at both the assimilation and
validation stations. At most assimilation stations, the largest reduction of the RMS water
level innovation is obtained by the assimilation setup with the gain matrix computed at the
open-loop step. This should be expected as the open-loop gain is bigger in the absolute
sense than the closed-loop ones. As a result the assimilation system with the open-loop
gain forces the states to be closer to the observations. The closed-loop gains however are
spatially smoother than the open-loop one. Moreover, we can also see that the RMS water
level innovation converges to the one obtained using the steady-state Kalman filter.
Figure 5.5: RMS of innovations with respect to water level observations at the assimilation stations, for the deterministic run, the open-loop gain, the 1st-4th closed-loop gains, and the steady-state Kalman filter.
5.5 Experiment using the three-variable Lorenz model
To verify whether the algorithm proposed in this chapter can also be applied to an unstable dynamical system, we performed experiments using a three-variable Lorenz
model. The Lorenz model is a simplified system of the flow equations governing thermal
convection. It was introduced by Lorenz (1963) and has been the subject of extensive
studies, motivated by its chaotic and strongly nonlinear nature combined with its small dimension.
Due to these properties, it is often used as a testbed for examining various data assimilation methods for systems with strongly nonlinear dynamics (e.g. Miller et al. 1994,
Evensen 1996).
Figure 5.6: RMS of innovations with respect to water level observations at the validation stations, for the deterministic run, the open-loop gain, the 1st-4th closed-loop gains, and the steady-state Kalman filter.
The governing equations of the Lorenz model are

\frac{dx}{dt} = \sigma (y - x)     (5.33)
\frac{dy}{dt} = \rho x - y - x z     (5.34)
\frac{dz}{dt} = x y - \beta z     (5.35)
where x, y, and z are the dependent variables, varying over dimensionless time t. For this
experiment, we applied the parameter values, as first used by Lorenz to obtain chaotic
solutions: σ = 10, ρ = 28, and β = 8/3. The model was integrated with a second-order Euler scheme with a time step ∆t = 1/60 time units. To account for model uncertainty, three independent white noise processes are added, one to each variable, at each time step. The white noise processes are drawn from a Gaussian distribution with zero mean and variance equal to 0.5√∆t. Using this system, we computed a true reference solution for a duration of 40 time units with initial values (x_o, y_o, z_o) = (1.508870, −1.531271, 25.46091). A sequence of observations is generated
at intervals of 0.25 time units by adding to the reference solution a generated random number representing the observation error. For this study, the observations are generated for
all variables x, y, and z and the observation noise is generated from an unbiased Gaussian
distribution with variance equal to 2.0 independently for each variable.
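The following sketch reproduces this setup in Python. The Lorenz equations are written with the standard sign conventions of Lorenz (1963), the time stepping uses a Heun (second-order Euler-type) scheme as one possible reading of the scheme mentioned above, and the noise amplitude follows the variance 0.5√∆t quoted in the text. These interpretations, and all names, are assumptions for illustration only:

```python
import numpy as np

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0

def lorenz_rhs(s):
    x, y, z = s
    return np.array([SIGMA * (y - x), RHO * x - y - x * z, x * y - BETA * z])

def integrate(x0, n_steps, dt, noise_std, rng):
    """Heun-type integration with additive Gaussian noise at every step."""
    traj = np.empty((n_steps + 1, 3))
    traj[0] = x0
    for k in range(n_steps):
        f1 = lorenz_rhs(traj[k])
        f2 = lorenz_rhs(traj[k] + dt * f1)
        traj[k + 1] = (traj[k] + 0.5 * dt * (f1 + f2)
                       + noise_std * rng.standard_normal(3))
    return traj

rng = np.random.default_rng(0)
dt = 1.0 / 60.0
noise_std = np.sqrt(0.5 * np.sqrt(dt))          # variance 0.5*sqrt(dt)
truth = integrate(np.array([1.508870, -1.531271, 25.46091]),
                  n_steps=int(40 / dt), dt=dt, noise_std=noise_std, rng=rng)
# Observations of x, y, z every 0.25 time units, observation variance 2.0
step = int(0.25 / dt)
obs = truth[::step] + np.sqrt(2.0) * rng.standard_normal(truth[::step].shape)
```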
We implemented the proposed algorithm with the Lorenz model described above to
estimate the corresponding steady state gain matrix. The two forecasts are generated by
running the system twice for the same period, where each run is perturbed independently
by noise, drawn from a distribution with the same statistics as has been used to generate
the true solution. Since the system is unstable, the two realizations of the open-loop
system follow completely different trajectories. Therefore, the open-loop estimate cannot be computed, since the difference between the two realizations is not stationary. To
solve this problem, the open-loop step is skipped and a reasonable gain matrix should
be used as the starting point for the closed-loop iteration. This will induce a feedback
mechanism, which makes the overall system stable. For the experiment presented in this
chapter we used the gain matrix K = 0.5I as the starting point for the closed-loop iteration,
where I is a 3 × 3 identity matrix. The closed-loop estimation was iterated 15 times and
the results are presented in Figure 5.7. This figure shows the elements of the 3 × 3 gain
matrix K, where the bars grouped per element show the estimates of the respective element
produced from consecutive iterations. It can be seen from the figure that the estimates of
the gain matrix converge after seven iterations. The method to compute the forecast error
covariance matrix in this experiment is similar to the method used in Yang et al. (2006b)
to compute the estimate of the background covariance for a 3DVAR assimilation system.
The difference is that they used the time average of the difference between the estimated and the true solution. In reality, however, the true solution is never known.
The next step is to use the gain matrix estimated above in a data assimilation experiment to examine its performance. Figure 5.8a shows both the true reference as well as
the model solution without any assimilation for variable x, while Figure 5.8b presents the
results when the constant gain matrix obtained from the 15th closed-loop iteration is used
for assimilation. The model solution without any data assimilation follows a completely different trajectory from the true one. On the other hand, Figure 5.8b demonstrates that
in this experiment the assimilation system is able to track the true solution. It is generally
able to reproduce the correct amplitudes in the peaks of the solution. There are a few
points where the solution deviates rather significantly from the true solution, for example
at time t = 17, t = 24, t = 31, and t = 37. However, the solution is quickly recovered
and it starts to trace the true solution again. For comparison we also conducted a data
assimilation experiment using the ensemble Kalman filter (Evensen, 2003). Here the ensemble consists of 1000 members and the results are presented in Figure 5.8c. Qualitative examination of these plots indicates that the EnKF performs better in tracking the true
solution, especially at the times where the solution obtained with assimilation using the
constant gain matrix deviates rather significantly from the true one. To gain better insight into the performance, the RMS error of all variables with respect to the true solution is shown in Figure 5.9. It is clear from this figure that the solution obtained using the
EnKF is more accurate than the results using the algorithm proposed in this chapter. It
seems that for a strongly nonlinear system like the Lorenz model, the assimilation system
should include a flow dependent forecast error covariance estimate for accurate results.
However, the computational cost of the proposed algorithm is much lower than that of the EnKF. In practice, a decision has to be made involving the trade-off between the desired
accuracy and the computational cost required to achieve it.
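For reference, the constant-gain assimilation cycle used in this experiment can be sketched as follows; it reuses lorenz_rhs and the integration style from the previous sketch, assumes that all three variables are observed directly (H = I), and applies the converged 3 × 3 gain K at every observation time. The names and interface are illustrative assumptions:

```python
import numpy as np

def run_constant_gain_filter(x0, obs, K, dt, obs_step, n_steps, noise_std, rng):
    """Forecast with the stochastic Lorenz model and apply
    x^a = x^f + K (y^o - x^f) whenever an observation is available."""
    x = np.array(x0, dtype=float)
    out = [x.copy()]
    for k in range(1, n_steps + 1):
        f1 = lorenz_rhs(x)
        f2 = lorenz_rhs(x + dt * f1)
        x = x + 0.5 * dt * (f1 + f2) + noise_std * rng.standard_normal(3)
        if k % obs_step == 0:                 # analysis step with H = I
            x = x + K @ (obs[k // obs_step] - x)
        out.append(x.copy())
    return np.array(out)
```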
Figure 5.7: The estimated gain matrix K of the Lorenz model. For each element K(i,j), the grouped bars show the estimates from consecutive closed-loop iterations.
Figure 5.8: Timeseries of variable x: (a) without assimilation, (b) with assimilation using the two-sample method, (c) with assimilation using the EnKF. The dashed line shows the true reference solution, the solid line the estimated solution, and the diamonds the simulated observations.
5.6 Conclusion
In this chapter a new filter algorithm is proposed. The algorithm is based on estimating the
error covariance by using two forecast samples. This covariance is used to define a gain
Figure 5.9: RMSE of the forecast of variables x, y, and z without assimilation, with assimilation using the two-sample method, and with assimilation using the EnKF.
matrix for a steady state data assimilation system. Intended for steady-state operation, it is applicable to models where an approximately statistically stationary condition occurs. Its applicability also requires the models to have a short memory, to allow for the ergodicity assumption. The algorithm includes an iterative procedure for improving the covariance estimate.
Twin experiments have been performed to evaluate the performance of the algorithm.
In the first experiment a simple one-dimensional wave model is used, while for the second one, we used the operational storm surge model: the Dutch Continental Shelf Model
(DCSM). The evaluation is done by comparing the gain matrix computed using the proposed algorithm to the one computed by using the exact steady state Kalman filter. Both
experiments show that the results using the proposed algorithm converge to the ones produced by using the classical Kalman filter.
An experiment using the three-variable Lorenz model has also been conducted to explore the possibility of applying the proposed algorithm to an unstable dynamical system. Since
the system is unstable, the open-loop estimate is skipped and the closed-loop iteration
should be initialized with a reasonable gain matrix. Using a diagonal matrix as the starting point, the closed-loop iteration has been shown to converge. Moreover, the data assimilation system with the constant gain matrix estimated using the proposed algorithm
seems to do a reasonably good job in tracking the true solution. This result demonstrates
the potential use of the proposed algorithm also for unstable systems.
In the experiment using the DCSM, the two forecasts were generated by running the
model twice for the same period and by perturbing both runs using statistically independent perturbations drawn from a known distribution. In practice, however, this distribution
is unknown. Therefore, in the next study, we are going to explore the possibility of using two wind fields produced by two different atmospheric models to generate proxies
for the forecast error of the DCSM. The two wind fields are considered as two realiza-
tions of a random atmospheric process. Using the algorithm proposed in this chapter, the
two forecasts will be generated by running the model twice, each run using one of the two wind fields as input, without any artificially generated perturbations or any assumed distribution. With this approach, explicit modelling of the model error covariance is avoided.
6 Storm surge forecasting using a two-sample Kalman filter∗
∗ This chapter is based on Sumihar et al. 2009, submitted to Monthly Weather Review.
6.1 Introduction
In the Netherlands, the storm surge forecasting system has both social and economic importance. It is required for the operation of several movable storm surge barriers for flood prevention. Moreover, the passage towards Rotterdam harbor requires accurate water level predictions, especially for very large ships. The storm surge forecast is performed by using
a numerical model based on shallow water equations called the Dutch Continental Shelf
Model (DCSM).
The main sources of uncertainty in a storm surge forecast model are the meteorological forcing and meteorological effects outside the domain of the storm surge model (Gerritsen et al. 1995). To cope with this error, the operational system assimilates the observed
water level into the model. The data assimilation used in the operational system is based
on a steady state Kalman filter (Heemink 1988; Heemink 1990). Like many other data
assimilation algorithms, the steady state Kalman filter requires specification of the model
error statistics. However, the true error statistics are not known and are difficult to estimate. One usually resorts to parameterizing or modelling the structure of the error covariance. In the operational system, this is done by assuming a certain structure of the model error covariance, such as a spatially uniform error variance and an isotropic spatial correlation.
In this paper, we explore another way of working around the estimation of the model
error covariance for the DCSM. The idea of the approach followed in this study is to
use the difference between two similarly skillful models (realizations) to represent the
unknown true model error. Here the error covariance is computed by averaging the difference between the two realizations over time. In the field of ocean modelling, for example,
the two realizations can be obtained by driving the ocean model under study by using two
wind fields produced independently by two meteorological centers. This approach is also
used, for example, by Alves and Robert (2005) and Leeuwenburgh (2007). However, their
approach is in the framework of ensemble Kalman filtering, while ours is of steady-state
Kalman filtering.
For our study, two wind fields from two different meteorological centers are available as input forcing to the DCSM. These are the Royal Dutch Meteorological Institute (KNMI) version of HIRLAM (Undén et al. 2002) and the UK Meteorological Office
North Atlantic European model (Lorenc et al., 1991). In the remainder of this paper,
these wind fields are called simply HIRLAM and UKMO respectively. Here it is assumed
that each wind field deviates from the true one by a correlated noise, where both the true wind field and the noise are unknown. The difference between the two model results, obtained by driving the DCSM with the HIRLAM and UKMO wind fields, can be used as a proxy for the
unknown true error. The error covariance is estimated from this difference by averaging
over time. The error covariance in turn is used to compute a Kalman gain. The computation of the forecast error covariance, and subsequently the Kalman gain, is done by using
a two-sample Kalman filter, proposed by Sumihar et al. (2008).
The two-sample Kalman filter is an iterative algorithm for computing a steady state
Kalman gain of a stochastic system. At each iteration, the algorithm uses two samples
of the stochastic process of interest to compute the covariance of the process. In turn,
the covariance is used to define a Kalman gain. The covariance estimate is improved
at the next iteration, by feeding back the Kalman gain into the system and regenerating
two independent samples of the system. The iteration is stopped when the estimate has
converged.
The approach described in this paper offers several advantages. The main advantage
of our approach is that it avoids explicit modelling of the error covariance. It avoids, for
example, the necessity to assume isotropic spatial correlation or to parameterize the error
covariance. The second advantage is its ease of implementation. The computer code of
the two-sample Kalman filter is easy to develop and it is separated from the model code.
It only requires the output of the model. Once the Kalman gain is ready, its implementation for data assimilation requires only a few additional lines of code in the existing computer
programs of the model. Another advantage is that it is computationally very cheap. The
Kalman gain is computed off-line and only once. Therefore, running the filter adds only
a small fraction to the running time of the model itself.
The use of a steady state filter has been proven effective in many systems like, for
example, shelf sea systems (e.g. Canizares et al. 2001, Verlaan et al. 2005, Sørensen et al.
2006) and coastal systems (e.g. Heemink 1990, Oke et al. 2002, Serafy and Mynett 2008).
Its application can also be found in oceanography (Fukumori et al. 1993), where the resulting solution was found to be statistically indistinguishable from the one obtained by
using full Kalman filtering (Fu et al. 1993). Moreover, Sørensen et al. (2006) compared
the properties of a steady state Kalman filter, to the more sophisticated methods: EnKF
(Evensen 1994) and RRSQRT Kalman filter (Verlaan and Heemink 1997) and demonstrated the effectiveness of the steady state Kalman filter over the other methods for the
given system. Hence, the steady state Kalman filter is recommended due to its low com-
putational cost, especially if a real time application is considered. A disadvantage is that
the applicability of the steady-state Kalman filter depends on the application. If the error covariances or the observation network are not approximately stationary, the approach breaks
down.
The plan of this chapter is as follows. The approach used in this study is described in
Section 2. It consists of a description of the two-sample Kalman filter algorithm as well as a system transformation. The two-sample Kalman filter is based on the assumption that the model error is uncorrelated in time. However, the error in the wind field is generally believed to be correlated in time. Therefore, a system transformation is required to transform the system representation into one that is driven by uncorrelated noise. Section 3
describes a twin experiment using a simple one-dimensional wave model to demonstrate
the performance of the two-sample Kalman filter in combination with the system transformation. The experiments using the DCSM are described and discussed in Section 4.
Section 5 briefly concludes the paper.
6.2 Approach
6.2.1 Introduction
In this section, we give a description of the approach used in our study. First, we briefly describe the two-sample Kalman filter algorithm. As we will see, like any other Kalman filtering algorithm, the two-sample Kalman filter algorithm requires the system of interest to be driven by a white noise process. However, in certain applications, such as
storm surge prediction, it is generally believed that the noise process is correlated in time.
In fact, it should be considered to be autocorrelated in time and space, because the statistics of many natural processes can be approximated this way (Lermusiaux et al. 2006).
In this case, we need to transform the model representation such that it is driven by white
noise. In our study, like in the operational system at KNMI, we assume that the error can
be represented as an AR(1) process. Therefore, here we describe a system transformation
for the case where the noise is governed by an AR(1) process. The transformation, however, will be similar for other ARMA-type error models as well.
6.2.2 Two-sample Kalman filter algorithm
The two-sample Kalman filter is an iterative procedure for computing a steady-state Kalman
gain for a system with time invariant forecast error covariance (Sumihar et al. 2008). The
idea of this algorithm is to generate two independent forecast samples of the system under
study and use their difference to compute the forecast error covariance by averaging over
time. The covariance is in turn used to compute a gain matrix estimate. The estimate is
then improved iteratively until the solution has converged.
Suppose we have a stochastic process X^f(t_k) governed by

X^f(t_{k+1}) = M X^a(t_k) + B u(t_k) + \eta(t_k)     (6.1)
X^a(t_{k+1}) = X^f(t_{k+1}) + K (y^o(t_{k+1}) - H X^f(t_{k+1}) - \nu(t_{k+1}))     (6.2)

where M is the model dynamics, B the input forcing matrix, u(t_k) the input forcing, \eta(t_k) a white-noise process representing model and/or input forcing error, \nu(t_k) a white-noise process representing observation error, y^o(t_k) the observation, H the observation matrix, X^a(t_k) the analysis state vector, and K a constant gain matrix. For a stable and time-invariant system, a steady state will occur. Since \eta(t_k) and \nu(t_k) are white noise processes, it can be shown that at steady state the covariance matrices of the processes X^f(t_k) and X^a(t_k) are given respectively by

P^f = M P^a M^\top + Q     (6.3)
P^a = (I - KH) P^f (I - KH)^\top + K R K^\top     (6.4)

The constant gain matrix K is defined to minimize the variance of X^a(t_k), which is equivalent to minimizing the trace of P^a:

K = P^f H^\top (H P^f H^\top + R)^{-1}     (6.5)

The objective of the two-sample Kalman filter algorithm is to obtain K by using two samples of the process X^f(t_k), k = 1, ..., N.
Suppose we have two realizations of the random process X^f(t_k): x^f_1(t_k) and x^f_2(t_k), k = 1, ..., N. We define e_{1,2}(t_k) as the difference between these two series:

e_{1,2}(t_k) \equiv x^f_1(t_k) - x^f_2(t_k)     (6.6)

The reader can verify that e_{1,2}(t_k) has the following statistics:

E[e_{1,2}(t_k)] = 0     (6.7)
E[e_{1,2}(t_k) e_{1,2}^\top(t_k)] = 2 P^f     (6.8)

where P^f \in \mathbb{R}^{n \times n} is the covariance of the random process X^f(t_k), defined as P^f = E[(X^f(t_k) - E[X^f(t_k)])(X^f(t_k) - E[X^f(t_k)])^\top].
Given a sample e_{1,2}(t_k), k = 1, ..., N, and assuming that the time interval is sufficiently long that the samples are reasonably independent of each other, we can estimate the covariance P^f by averaging over time:

P^f = \frac{1}{2} \frac{1}{N} \sum_{k=1}^{N} e_{1,2}(t_k) e_{1,2}^\top(t_k)     (6.9)
The two-sample Kalman filter algorithm consists of generating two samples of Xf (tk )
from the dynamics (6.1)-(6.2), utilizing the two samples to compute the forecast error
covariance Pf using equation (6.9), and using Pf to determine the gain matrix K with
equation (6.5). The value of K is fed back to the analysis equation (6.2) and all these
steps are repeated until the estimate of K has converged. The solution of the two-sample
Kalman filter algorithm is the value of K at the final iteration. Nevertheless, for practical
purposes, one can also use the estimate of K from the intermediate iteration steps.
The two-sample Kalman filter algorithm should be started from an initial value of K.
Basically, any value of K that stabilizes the overall dynamics can be chosen. One can use
the gain matrix already used in an existing data assimilation system based on optimum interpolation, for example. However, for the applications that we have worked with, we have found that K = 0 is a good initial guess (Sumihar et al. 2008). In this study, we also use K = 0 as the initial value. In accordance with our previous work, throughout this chapter we call the step with K = 0 the open-loop step and the following ones closed-loop steps.
This algorithm is similar to the analysis-ensemble method (Fisher 2003), where
analyses from different time levels are used to compute the error covariance. The new
aspects of the two-sample Kalman filter algorithm are the use of the open-loop estimate
and the iterative improvement of the estimated covariance. It also shares a similarity with the ensemble Kalman filter (Evensen 2003) in the sense that it uses an ensemble of model state realizations for computing the error covariance. However, the two-sample Kalman filter uses only two samples of the model state, long in time, and the averaging is performed over time. In fact, this algorithm can be extended to more than two samples. Using more samples, the averaging can be done over a shorter time. When the sample size is large enough, it
reduces to the EnKF. The choice of using two samples is set by a practical consideration.
Here, we want to investigate the possibility of using two different wind fields to determine the error covariance of a storm surge forecast model.
The use of two forecast samples is similar to the NMC method (Parrish and Derber 1992). The difference is that the NMC method uses the differences between two forecasts
at different ranges valid at the same time. In practice, however, the NMC method requires
a factor to rescale the estimated error statistics, which is obtained empirically (Yang et al.
2006a). The two-sample Kalman filter removes this requirement, since due to its iterative
update, the resulting covariance estimate is already optimal.
6.2.3 System transformation
The main source of error in a storm surge forecasting system is the uncertain input wind
forcing (Gerritsen et al. 1995). In our study, two wind fields are available. Each wind field has a finite dimension, in contrast to the real atmospheric system, which has an infinite dimension. Only variability on spatial scales greater than a few grid cells can be reproduced reliably. Moreover, there are also errors due to parameterizations and omitted physics. As a result, there is always uncertainty in the wind field. To account for this uncertainty, we consider the wind field here to deviate from the 'true' wind field by a colored noise process, which is assumed to be governed by an AR(1) process. As we have seen in the previous section, the two-sample Kalman filter algorithm requires the system of interest to be driven by a white noise process. This section describes how we transform
the system representation into one driven by a white noise process.
Suppose that we have a dynamic system \Phi driven by an uncertain input forcing \hat{u}(t_k) \in \mathbb{R}^q as follows:

z(t_{k+1}) = \Phi z(t_k) + F \hat{u}(t_k)     (6.10)
\hat{u}(t_k) = \bar{u}(t_k) + \hat{w}(t_k)     (6.11)

where z(t_k) \in \mathbb{R}^p denotes the system state at time t_k and F is the p \times q input forcing matrix. The uncertain input forcing \hat{u}(t_k) is modelled as a stochastic process, which can be written as the sum of the true forcing \bar{u}(t_k) \in \mathbb{R}^q and a noise term \hat{w}(t_k) \in \mathbb{R}^q. Both \bar{u}(t_k) and \hat{w}(t_k) are unknown. We suppose further that the noise vector \hat{w}(t_k) in equation (6.11) is governed by an AR(1) process

\hat{w}(t_k) = A \hat{w}(t_{k-1}) + \hat{\eta}(t_k)     (6.12)

where A is a time-invariant q \times q diagonal matrix whose elements determine the correlation time, and \hat{\eta}(t_k) \in \mathbb{R}^q is a white noise vector with constant covariance \hat{Q} \in \mathbb{R}^{q \times q}.
The objective of the transformation is to obtain another representation of the dynamics (6.10)-(6.11) such that it is driven by a white noise process. The standard approach for working with a system driven by colored noise is to augment the state vector z(t_k) with the noise vector \hat{w}(t_k). However, in this case it is not possible to use this approach directly, since we only have the realizations of \hat{u}(t_k), without any access to the realizations of \hat{w}(t_k) separately from those of \hat{u}(t_k). In the following, we propose a method to find an approximation of \hat{w}(t_k) by utilizing two samples of \hat{u}(t_k). These, in turn, will be used in the state augmentation procedure.
Suppose we have two independent realizations of the input forcing $\hat{u}(t_k)$, namely $\hat{u}_1(t_k)$ and $\hat{u}_2(t_k)$. We introduce
$$u(t_k) \equiv \tfrac{1}{2}\left(\hat{u}_1(t_k) + \hat{u}_2(t_k)\right) \qquad (6.13)$$
$$\tilde{w}_1(t_k) \equiv \hat{u}_1(t_k) - u(t_k) \qquad (6.14)$$
$$\tilde{w}_2(t_k) \equiv \hat{u}_2(t_k) - u(t_k) \qquad (6.15)$$
The term $u$ represents the average of the two input realizations. On the other hand, $\tilde{w}_1$ and $\tilde{w}_2$ can be shown to evolve in time according to
$$\tilde{w}_1(t_k) = A\tilde{w}_1(t_{k-1}) + \tilde{\eta}_1(t_k) \qquad (6.16)$$
$$\tilde{w}_2(t_k) = A\tilde{w}_2(t_{k-1}) + \tilde{\eta}_2(t_k) \qquad (6.17)$$
Due to the symmetry with the noise process (6.12), we call $\tilde{w}_1$ and $\tilde{w}_2$ the transformed colored noise and $\tilde{\eta}_1(t_k)$ and $\tilde{\eta}_2(t_k)$ the transformed white noise processes. The transformed white noise processes can be shown to be combinations of the unknown original $\hat{\eta}_1(t_k)$ and $\hat{\eta}_2(t_k)$:
$$\tilde{\eta}_1(t_k) = \tfrac{1}{2}\left(\hat{\eta}_1(t_k) - \hat{\eta}_2(t_k)\right) \qquad (6.18)$$
$$\tilde{\eta}_2(t_k) = \tfrac{1}{2}\left(\hat{\eta}_2(t_k) - \hat{\eta}_1(t_k)\right) \qquad (6.19)$$
Using these new variables, we define the transformed input forcings as
$$\tilde{u}_1(t_k) = u(t_k) + \tilde{w}_1(t_k) \qquad (6.20)$$
$$\tilde{u}_2(t_k) = u(t_k) + \tilde{w}_2(t_k) \qquad (6.21)$$
Notice the symmetry between the transformed input forcing and the original one (6.11). We note here that with this transformation we actually retain the original input forcings, but now we have the components $u(t_k)$ and $\tilde{w}(t_k)$ separately.
Since with this transformation the realizations of $u(t_k)$ and $\tilde{w}(t_k)$ are available separately, we can now use them in a state augmentation procedure. Denoting
$$x(t_k) \equiv \begin{bmatrix} z(t_k) \\ \tilde{w}(t_k) \end{bmatrix}, \qquad \eta(t_k) \equiv \begin{bmatrix} 0 \\ \tilde{\eta}(t_k) \end{bmatrix}, \qquad M \equiv \begin{bmatrix} \Phi & F \\ 0 & A \end{bmatrix}, \qquad B \equiv \begin{bmatrix} F \\ 0 \end{bmatrix},$$
we can rewrite the dynamical system (6.10)-(6.11) as
$$x(t_{k+1}) = Mx(t_k) + Bu(t_k) + \eta(t_k) \qquad (6.22)$$
Equation (6.22) is the transformed dynamical system representation, which is driven by a white noise process. It is now similar to the system representation (6.1), on which the two-sample Kalman filter is based.
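As an illustration of the transformation and the state augmentation, the sketch below (Python/NumPy) forms the quantities of equations (6.13)-(6.17) from two forcing realizations and assembles the augmented matrices M and B of equation (6.22). The operators Phi, F and A and all dimensions are small placeholders chosen for illustration only; they are not the storm surge model.

import numpy as np

rng = np.random.default_rng(1)
p, q, n_steps = 6, 3, 400                      # hypothetical state size, forcing size, sample length

Phi = 0.95 * np.eye(p)                         # placeholder state-transition operator
F = 0.1 * rng.normal(size=(p, q))              # placeholder input forcing matrix
A = 0.9 * np.eye(q)                            # diagonal AR(1) coefficient matrix

# Two synthetic forcing realizations, standing in for the two wind fields
u1 = rng.normal(size=(n_steps, q))
u2 = rng.normal(size=(n_steps, q))

# Transformation (6.13)-(6.15): average forcing and transformed colored noise
u_bar = 0.5 * (u1 + u2)
w1, w2 = u1 - u_bar, u2 - u_bar

# Approximate transformed white noise from (6.16)-(6.17): eta_i(t_k) = w_i(t_k) - A w_i(t_{k-1})
eta1 = w1[1:] - w1[:-1] @ A.T
eta2 = w2[1:] - w2[:-1] @ A.T

# Augmented system matrices of equation (6.22)
M = np.block([[Phi, F], [np.zeros((q, p)), A]])
B = np.vstack([F, np.zeros((q, q))])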
6.2.4 Summary
The two-sample Kalman filter algorithm is used in this study in combination with the system transformation described previously. In short, the two-sample Kalman filter algorithm combined with the system transformation above proceeds on a step-by-step basis (a code sketch of this loop is given after the list):

• Initialization:
  – compute and store the input forcing average using equation (6.13): $u(t_k)$, $k = 1, \ldots, N$
  – compute and store the approximate white noise processes using equations (6.14) and (6.16) as well as equations (6.15) and (6.17): $\tilde{\eta}_1(t_k)$ and $\tilde{\eta}_2(t_k)$, $k = 1, \ldots, N$
  – specify an initial guess of the Kalman gain $K$

• Closed-loop step:
  – insert $K$ into equation (6.2)
  – generate two realizations $x^f_1(t_k)$ and $x^f_2(t_k)$, $k = 1, \ldots, N$, of the dynamics (6.1) using $u(t_k)$, $\tilde{\eta}_1(t_k)$ and $\tilde{\eta}_2(t_k)$
  – estimate $P^f$ using equation (6.9)
  – estimate $K$ using equation (6.5)
  – repeat the closed-loop step until $P^f$, and therefore also $K$, have converged
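The following Python sketch illustrates the iterative loop above for a small linear toy system. Since equations (6.1), (6.2), (6.5) and (6.9) are not repeated here, the analysis step is written as the standard Kalman update x + K(y - Hx), the gain as the standard steady-state formula P H' (H P H' + R)^(-1), and the forecast covariance as half the time-averaged outer product of the difference of the two samples (assuming the two samples are statistically independent). These specific formulas, as well as the model operator, observation operator and noise levels, are assumptions made for the sketch.

import numpy as np

rng = np.random.default_rng(2)
n_x, n_y, n_steps, n_iter = 8, 2, 2000, 5      # hypothetical sizes and number of iterations

M = 0.9 * np.eye(n_x)                          # placeholder augmented model operator (cf. eq. 6.22)
H = np.zeros((n_y, n_x)); H[0, 0] = 1.0; H[1, 3] = 1.0   # placeholder observation operator
R = (0.1 ** 2) * np.eye(n_y)                   # observation error covariance
y_obs = rng.normal(0.0, 0.1, size=(n_steps, n_y))        # synthetic observations

eta1 = rng.normal(0.0, 0.05, size=(n_steps, n_x))        # transformed white noise, sample 1
eta2 = rng.normal(0.0, 0.05, size=(n_steps, n_x))        # transformed white noise, sample 2

def run_sample(K, eta, y):
    # One closed-loop run: store the forecast state, then update with the current gain K.
    x = np.zeros(n_x)
    xf = np.zeros((len(y), n_x))
    for k in range(len(y)):
        x = M @ x + eta[k]                     # forecast step of the white-noise driven system
        xf[k] = x
        x = x + K @ (y[k] - H @ x)             # analysis step with the current gain
    return xf

K = np.zeros((n_x, n_y))                       # zero gain: the first pass is the open-loop run
for it in range(n_iter):
    x1 = run_sample(K, eta1, y_obs)
    x2 = run_sample(K, eta2, y_obs)
    d = x1 - x2                                # difference of the two forecast samples
    Pf = 0.5 * (d.T @ d) / len(d)              # time-averaged two-sample covariance estimate
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)   # steady-state Kalman gain update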
It is important to note here that under the transformation explained in Section (6.2.3), we retain the original open-loop realizations. The closed-loop realizations, however, will be different from the original ones. Nevertheless, it can be shown that the difference between the two transformed samples is the same as the difference between the original ones. Therefore, using the two-sample Kalman filter, we will obtain the same forecast error covariance estimate and hence the same Kalman gain estimate.
6.3 Experiment with a simple wave model
Before we apply our method to the DCSM, we first test whether the combination of the sample transformation and the two-sample Kalman filter produces a correct solution. This check is performed by a twin experiment with a simple one-dimensional wave model. It is a simplified version of the De St. Venant equations, which can be used for modelling tidal flow in a long and narrow estuary. The governing equations of the one-dimensional wave model are as follows:
$$\frac{\partial \xi}{\partial t} + D\frac{\partial u}{\partial x} = 0 \qquad (6.23)$$
$$\frac{\partial u}{\partial t} + g\frac{\partial \xi}{\partial x} + \frac{cu}{D} = 0 \qquad (6.24)$$
$$\xi(x = 0, t) = \xi_b(t) \qquad (6.25)$$
$$\xi(x, t = 0) = 0 \qquad (6.26)$$
$$u(x, t = 0) = 0 \qquad (6.27)$$
$$u(x = L, t) = 0 \qquad (6.28)$$
where $\xi(x,t)$ is the water level above the reference plane, $u(x,t)$ is the average current velocity over a cross section, $t$ is the time, $x$ is the position along the estuary, $D$ is the water depth, $g$ is the gravitational acceleration and $c$ is a friction constant. For this study, the system of equations (6.23)-(6.28) is solved numerically using a Crank-Nicolson scheme with implicitness factor $\theta = 0.4$, and the discretization in space is based on a staggered grid. In this experiment, the following parameter values are used: $D = 10$ m, $c = 0.0002\ \mathrm{s}^{-1}$, $L = 60$ km, time step $\Delta t = 1$ minute, and number of grid points $N = 80$ for the spatial discretization, hence $\Delta x = L/(N-1)$. Moreover, at the sea-estuary boundary we assume that the water level is periodic according to $\xi_b(t_k) = 0.5 \sin(2\pi t_k / T)$, where $T = 3$ hours.
The uncertainty of the model is assumed to be due to the inaccuracy of the set-up at the sea-estuary boundary. An additive colored noise modelled as an AR(1) process is used to represent this uncertainty:
$$w_b(t_k) = \alpha_b w_b(t_{k-1}) + \epsilon(t_k) \qquad (6.29)$$
where $\alpha_b$ determines the correlation time, which in this experiment is set to 2 hours, and $\epsilon$ is a Gaussian white noise process with standard deviation 20 cm.
The experiment setup is as follows. An artificial water-level measurement is generated at each time step at location x = 0.3L. This is done by running the wave model and, at each time step, adding to the simulated water level at this location a random number drawn independently from a normal distribution with zero mean and standard deviation 2 cm, to simulate observation noise. The artificial observations are generated for a 600 h simulation period.
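A minimal Python sketch of the noise and synthetic observations used in this twin experiment is given below. The mapping of the 2-hour correlation time onto the AR(1) coefficient (alpha_b = exp(-dt/tau)) is an assumption, and the wave-model integration itself is represented by a placeholder function rather than the Crank-Nicolson scheme.

import numpy as np

rng = np.random.default_rng(3)

dt = 60.0                                     # time step: 1 minute [s]
tau = 2.0 * 3600.0                            # boundary-noise correlation time: 2 hours [s]
alpha_b = np.exp(-dt / tau)                   # assumed AR(1) coefficient for this correlation time
sigma_eps = 0.20                              # std of the driving white noise: 20 cm [m]
sigma_obs = 0.02                              # observation noise std: 2 cm [m]

n_steps = 600 * 60                            # 600 h of 1-minute steps
t = np.arange(n_steps) * dt
xi_b = 0.5 * np.sin(2.0 * np.pi * t / (3.0 * 3600.0))   # periodic sea boundary, T = 3 h

# AR(1) boundary perturbation, equation (6.29)
w_b = np.zeros(n_steps)
for k in range(1, n_steps):
    w_b[k] = alpha_b * w_b[k - 1] + rng.normal(0.0, sigma_eps)

def water_level_at(x_frac, boundary):
    # Placeholder for the 1-D wave model evaluated at x = x_frac * L; the real run uses
    # the Crank-Nicolson discretization of equations (6.23)-(6.28).
    return np.zeros_like(boundary)

# Synthetic observations at x = 0.3 L: "true" run plus 2 cm Gaussian observation noise
y_obs = water_level_at(0.3, xi_b + w_b) + rng.normal(0.0, sigma_obs, size=n_steps)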
The discretized version of the system of equations above can be written in different
ways. For our purpose, two model representations are considered. The first representation
is where the noise process wb is available separately and augmented to the state vector.
On the other hand, in the second representation, the noise process wb is considered to be
added directly to the input forcing ξb , hence they are not available separately. The first
representation is driven by white noise, while the second by colored noise.
The two-sample Kalman filter algorithm is then applied to each of these model representations. At each iteration, two independent samples of each representation are generated for the same simulation period. Since in the first representation the model is driven by white noise, we can apply the two-sample Kalman filter algorithm directly. This will provide the 'true' solution. We call the samples of the first representation the original samples. For the samples of the second representation, we apply the system transformation described in Section (6.2.3) prior to implementing the two-sample Kalman filter; we call these the transformed samples. In this experiment, both the original and the transformed samples are generated using the same noise process realizations. Therefore, the solutions based on the original and transformed samples should be equal if the transformation is correct.
The gain matrices computed in this experiment are depicted in Figure 6.1. The pictures in the left-hand panels show the gain matrices associated with the water level, while those in the right-hand panels show the ones associated with the depth-averaged current velocity. Visual inspection of these pictures indicates that at each iteration step the gain matrices obtained using the original and the transformed samples are very similar to each other. The two-sample Kalman filter uses the difference between the two samples of each representation. Since the model is linear, the difference between the transformed samples results in the same error propagation as obtained with the original samples. Hence, the solutions computed using the transformed and the original samples are the same. Note that the gain estimates resulting from the open-loop step using the transformed samples are exactly equal to the ones obtained using the original samples. However, there is a minor difference in the closed-loop solutions, which may be due to rounding error and the fact that we used different realizations of the observation noise for all samples. This result confirms that the transformation is correct.
To make the presentation complete, we compare the gain estimated by using our algorithm to the one obtained by using the RRSQRT Kalman filter (Verlaan and Heemink 1997), as shown in Figure 6.1e. The gain is obtained by running the RRSQRT Kalman filter until it reaches the steady state. The RRSQRT Kalman filter in this experiment uses 50 modes, which is sufficient to estimate the steady-state Kalman gain accurately. It can be seen here that the solutions of the two-sample Kalman filter converge to the ones obtained by using the RRSQRT Kalman filter. This shows that for this model, using the transformed samples and the algorithm proposed in this paper, we can reproduce the optimal gain estimated by using the steady-state Kalman filter.

Figure 6.1: Estimated gain of water level ξ (left-hand panels) and velocity u (right-hand panels): (a) open-loop, (b)-(d) closed-loop estimates, 1st-3rd iteration steps, (e) RRSQRT with 50 modes. The full line represents the gain computed by using the transformed samples, the dashed line the gain computed by using the original samples, and the vertical line shows the location of the assimilation station.
6.4 Experiments with the Dutch Continental Shelf Model
6.4.1 Introduction
In this section we present and discuss some experiments using the DCSM (Gerritsen et al. 1995). We first describe briefly the DCSM and the two wind fields used in this study. The exposition continues by presenting some estimates of the wind noise statistics. The solution of the two-sample Kalman filter is given afterwards. We implemented the resulting Kalman gain estimates in a data assimilation system, and the performance of this data assimilation system is evaluated. The ultimate goal of the DCSM is to make storm surge forecasts; hence, we also show and discuss the forecast performance of the DCSM in this section. In the end, a comparison is made between the performance of our system and that of one based on the assumption that the system noise is isotropic.
6.4.2 The Dutch Continental Shelf Model
The DCSM is a large-scale water level and storm surge forecasting model that covers the northwest European shelf, including the British Isles (Figure 3.1). It was developed as a joint activity of the National Institute of Coastal and Marine Management (RIKZ), Delft Hydraulics and KNMI. This model is used to produce six-hourly surge forecasts in an automatic production line at KNMI.
The error in the DCSM is assumed to come from the uncertain wind input. To account for this, an error term should be added to the governing equations. There are three possible ways of doing this: we may add an error term to the wind velocity, to the wind stress, or to the depth-averaged current. The operational system represents this uncertainty as an error term added to the depth-averaged current. The temporal correlation of the wind noise is accounted for by using an AR(1) model. Moreover, the noise is assumed to be isotropic, that is, the spatial correlation length is independent of direction.
In this study, an AR(1) model is also used to represent the time correlation of the wind noise. However, we model the uncertain wind input as an error term added to the wind stress. Moreover, we do not assume anything about the structure of the spatial correlation. We do assume, however, that the difference between the two wind fields described below is characteristic of the true error. This assumption is validated in the next subsection.
6.4.3 Wind stress error statistics
The operational DCSM uses the forecasts of the Royal Netherlands Meteorological Institute (KNMI) high-resolution limited area model (HIRLAM) as input forcing. The
HIRLAM system is a complete numerical weather prediction system including data assimilation with analysis of conventional and non-conventional observations and a limited
area forecasting model with a comprehensive set of physical parameterizations. The system is owned and developed jointly by the National Meteorological Services in Denmark, Finland, Iceland, Ireland, the Netherlands, Norway, Spain and Sweden. The complete scientific documentation of the HIRLAM system can be downloaded from http://hirlam.org. The Dutch version of HIRLAM used in this study covers an area between 12°W and 13°E and between 48°N and 62.33°N. This wind product was used to calibrate the DCSM (Gerritsen et al. 1995). In fact, this is a dedicated version of HIRLAM used for driving the DCSM.
Another wind field available through KNMI for our study as input to the DCSM is the one produced by the UK Met Office (UKMO) North Atlantic and European model. This is an operational model producing forecasts in the 6- to 48-hour range. It covers an area between 70°W and 35°E and between 32.5°N and 75°N, with a grid size of 0.11° × 0.11°. The wind data is projected onto the DCSM area by using bilinear interpolation.
In this study we used the wind fields from both sources and applied them to the algorithm explained in Section (6.2.2). Both wind fields are available within the period of 01 November 2004 until 28 January 2005 with a time step of three hours.
An inspection of the two wind fields indicates that the magnitude of the UKMO wind stress tends to be smaller than that of HIRLAM. Figure (6.2) illustrates this. The picture on the left-hand side shows a typical scatter plot of the wind stress of the two wind fields at a location above the sea area of the DCSM. This scatter plot clearly shows that there is a systematic difference between the UKMO and HIRLAM wind stresses.

Figure 6.2: Scatter plot of wind stress, HiRLAM vs UKMO, at Hoek van Holland: (left) original and (right) after correction. The full line shows the best fit to the dots, while the dashed line gives the reference of perfect positive correlation.
Before the two wind fields can be used in the experiment, we need to make some corrections to reduce the systematic difference between them. In our study, this is done by adjusting the UKMO wind stress to the same scale as that of HIRLAM. The reason is that the HIRLAM wind field was used in the calibration of the DCSM and is therefore expected to give better results. The adjustment of the UKMO wind stress is performed by using linear regression, and the correction is imposed on the wind stress data at all grid points in the wet area of the DCSM. The picture on the right-hand side of Figure (6.2) presents the scatter plot of the wind stress at the same location as on the left-hand side, after correction.
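A sketch of such a per-grid-point regression correction in Python is given below. The array names and shapes are hypothetical, and whether an intercept is included in addition to the slope is an assumption of the sketch.

import numpy as np

def fit_correction(tau_ukmo, tau_hirlam):
    # Per-grid-point linear regression tau_hirlam ~ a * tau_ukmo + b for one wind-stress
    # component; inputs have shape (n_times, n_wet_points).
    n_points = tau_ukmo.shape[1]
    a = np.empty(n_points)
    b = np.empty(n_points)
    for j in range(n_points):
        a[j], b[j] = np.polyfit(tau_ukmo[:, j], tau_hirlam[:, j], deg=1)
    return a, b

def apply_correction(tau_ukmo, a, b):
    # Rescale the UKMO wind stress to the HIRLAM scale.
    return a * tau_ukmo + b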
Figure (6.3) presents the contours of the slope of the line of best fit used for correcting the UKMO wind stresses. The contours of the slope, in both the zonal and meridional directions, show that the largest slopes and the steepest transitions reside around the land-sea boundaries. This may be due to the difference in the schemes used for interpolating the wind fields onto the grid points of the DCSM. The UKMO wind fields are available on a coarser grid, and in our study the interpolation is done without taking the land-sea difference into consideration. On the other hand, for the HIRLAM wind fields, the interpolation near the coasts has been modified to use only sea points of the atmospheric model for sea points in the storm surge model (Gerritsen et al. 1995; Verlaan et al. 2005).

Figure 6.3: Contours of the slope for the (a) u- and (b) v-direction, respectively, used for correcting the UKMO wind stresses. Contour interval is 0.2.
In this study, each of the two wind fields is assumed to deviate from the true one by a time-correlated noise term as in equation (6.11). Using the HIRLAM and corrected UKMO wind fields, the average wind field is computed using (6.13) and the approximate colored-noise realizations are computed using equations (6.14) and (6.15). Figure (6.4) presents some typical autocorrelation plots of the computed colored noise obtained in this experiment. By curve fitting to a theoretical exponential autocorrelation function, the correlation time of the noise is found to be about 5 hours.
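The correlation-time estimate can be reproduced with a small Python sketch like the one below, which computes the empirical autocorrelation of a noise time series and fits an exponential exp(-lag/tau) to it. The synthetic test series and the use of scipy's curve_fit are illustrative choices, not the thesis implementation.

import numpy as np
from scipy.optimize import curve_fit

def autocorrelation(x, max_lag):
    # Empirical (biased) autocorrelation of a 1-D series up to max_lag samples.
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - l], x[l:]) / denom for l in range(max_lag + 1)])

def fit_correlation_time(noise, dt_hours, max_lag=20):
    # Fit rho(lag) = exp(-lag / tau) to the empirical autocorrelation; returns tau in hours.
    rho = autocorrelation(noise, max_lag)
    lags = np.arange(max_lag + 1) * dt_hours
    (tau,), _ = curve_fit(lambda l, tau: np.exp(-l / tau), lags, rho, p0=[5.0])
    return tau

# Synthetic AR(1) series standing in for the wind-stress noise proxy at one grid point
rng = np.random.default_rng(4)
alpha = np.exp(-3.0 / 5.0)                    # 3-hourly data with a 5 h correlation time
noise = np.zeros(720)
for k in range(1, len(noise)):
    noise[k] = alpha * noise[k - 1] + rng.normal()
print(fit_correlation_time(noise, dt_hours=3.0))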
Figure 6.4: Autocorrelation function of the wind noise at Hoek van Holland, (a) u-direction and (b) v-direction. The full line is the empirical autocorrelation function obtained from the difference between UKMO and HiRLAM, while the dashed line is the autocorrelation function of a theoretical AR(1) process with a correlation time of about 5 hours.
Figures (6.5a)-(6.5b) show the estimates of the spatial correlation of the wind stress error in the zonal and meridional directions, respectively, associated with the point located at Hoek van Holland. The spatial correlation varies smoothly over the model domain. Moreover, the structure shows a local characteristic: the highest correlation is at the point of reference and it decreases for locations further away. We also observe that the correlation profile in the zonal direction tends to stretch in the east-west direction, while that in the meridional direction spreads more in the south-north direction. This shows that the noise has different characteristics from the ones used in the operational system, where the spatial correlation is assumed to be isotropic. We also show in Figures (6.5c)-(6.5d) the estimates of the spatial cross-correlation of the wind stress error in the two directions, with respect to the error at Hoek van Holland. The figures show that the cross-correlations are effectively zero everywhere. This result validates the assumption that the wind stress errors in the two directions are uncorrelated with each other.
We have also tried to validate the correlation structure against observations. A set of observed 10 m wind speed data is available in our study to validate the correlation structure. Figure (6.6) depicts the locations of the wind observing stations. The wind data are available for the period of November 2004 until January 2005, with a time interval of 1 h. To match the UKMO and HIRLAM wind data, which are available with a time interval of 3 h, we smooth the observed wind data by using a running-average technique with a time window of 6 h. In performing the validation, we assume that the observed data are generated from the same distribution from which the UKMO and HIRLAM wind data are drawn. Therefore, we can compute the observed correlation in the same way as used to compute the correlation from the UKMO-HIRLAM difference.

Figure 6.5: Spatial correlation of the wind noise associated with Hoek van Holland, (a) u-direction and (b) v-direction, and spatial cross-correlation (c) vuref and (d) uvref. Contour interval is 0.1.
An ideal validation of the wind error correlation would be to compare the correlation structure estimate to the ones obtained from observations with the technique described by e.g. Hollingsworth and Lönnberg (1986), Daley (1991) and Xu and Wei (2001). However, this requires observation data from many different locations. The fact that we only have observed wind data from a few locations prevents us from using this technique. Therefore, we set the objective of the validation here as checking whether the resulting correlation structure is better supported by the available data than the one based on the isotropic assumption as used in the operational system.

Figure 6.6: Locations of wind observing stations: 1. K13, 2. Meetpost Noordwijk, 3. Huibertgat, 4. Cadzand, 5. Vlissingen, 6. Oosterschelde, 7. Vlakte van de Raan, 8. Schaar, 9. L.E. Goeree, 10. Europlatform, 11. Hoek van Holland.
As pointed out earlier, unlike the one based on the isotropic assumption, the spatial correlation structure estimated by using the UKMO-HIRLAM difference in the zonal direction differs from the one in the meridional direction. In this validation, we would like to check if the available data support this. To produce a clear illustration, we should choose a reference station whose relative location with respect to the other stations is such that the error correlation in one direction is significantly different from that in the other direction. Based on this idea, we select station K13 as the reference station for computing the correlation of the wind stress error at different locations. Figure (6.7) shows the spatial correlation estimates computed from the observation-HIRLAM difference, the UKMO-HIRLAM difference, and from a system based on an isotropic assumption. For the isotropic-assumption-based system, the correlation length is set to 19 computational grid cells. In these pictures, the correlation of the wind stress error is represented as a vector composed of the correlations in the eastward and northward directions.

Figure 6.7: Correlation of wind stress error with respect to the wind stress error at station K13: (a) uu+vv spatial auto-correlation, (b) zoomed version of (a), (c) vuref+uvref spatial cross-correlation, (d) zoomed version of (c). Each vector represents the resultant of the correlation vectors in the u- and v-directions, where the u component is defined as the eastward direction and the v component as the northward direction. The north-eastward vector near the bottom-right corner of each panel shows the vector of perfect correlation, i.e. the correlation in both the u- and v-directions is equal to one.
Figures (6.7a)-(6.7b) show the spatial auto-correlation of the wind stress error. The figures clearly show that, at most locations, the observed correlation in the meridional direction is larger than in the zonal one. An exception is the correlation with the error at station Huibertgat, where the correlation in the zonal direction is larger than in the meridional one. This may be due to the relative location of Huibertgat with respect to K13. Nevertheless, the figures show that the wind stress error spatial correlation is indeed direction dependent. As can be seen from the figures, this feature is better represented by the correlation structure computed from the UKMO-HIRLAM difference than by the one based on the isotropic assumption.
Figures (6.7c)-(6.7d) show the spatial cross-correlation of the wind stress error. Here we show only the ones computed from the observation-HIRLAM and UKMO-HIRLAM differences; the cross-correlation based on the isotropic assumption is equal to zero. From these figures we see that the observed correlation is small, although it is non-zero at most of the locations. Note that at the reference location (K13), the cross-correlation is very close to zero. The same characteristic can be observed for the values computed from the UKMO-HIRLAM difference, but with relatively smaller magnitude. We do not yet understand why the cross-correlation at the other locations is non-zero. This may be due to the fact that we use a limited sample size for estimating a small correlation. Nevertheless, since the cross-correlations are small in practice, we may expect that this will not deteriorate the resulting analysis system, which is based on the assumption that the cross-correlation of the wind stress error is zero.
6.4.4 Water level observation data
The observation data available for data assimilation experiments in this study is the observed water level obtained from sixteen observing stations located along the British and
Dutch coast as shown in Figure (3.1). Eight of these stations are used as assimilation
stations; these are the stations where observed data is assimilated into the DCSM: five
stations are located along the east coast of the UK and three others are along the Dutch
coast. These are the same observation stations used in the operational data assimilation
system. The other stations are used as validation stations, where observed data are used
only for validation of the model results but not used in data assimilation. The water level
at all of the locations is measured every 10 minutes. The observed water level data from
all assimilation stations are used both for computing the steady-state Kalman gain using
the two-sample Kalman filter as well as for data assimilation experiments.
The water level data contains two components: surge and astronomical tide. In the
operational system, the astronomical tide is predicted using harmonic analysis at each observation location using hundreds of tidal constituents. The harmonic analysis prediction
for astronomical tide is more accurate than using the DCSM. Since the data assimilation
is intended for correcting the storm surge error, in the operational system the astronomical
tide component of the observed data is replaced by the one predicted using the DCSM,
obtained by running the DCSM without any wind input forcing (Gerritsen et al. 1995). In
this study, this adjusted observation is used.
6.4.5 Computing Kalman gain using the two-sample Kalman filter
The DCSM and wind data described previously have been used to compute the estimate of the Kalman gain using the approach described in Section (6.2). The experiment setup in this study is as follows. At each iteration step, the model is run twice for the same simulation period. In each run, the DCSM is driven by one of the two wind fields. Independent random numbers drawn from a Gaussian white noise process with standard deviation σobs = 10 cm are added at each time step to the observed water level at each assimilation station to represent observation noise. The simulation is run for the period of 01 November 2004 until 28 January 2005. To eliminate the spin-up effect, the first four days of simulation results are excluded when computing the covariance.
Figure (6.8) presents the contour plots of the water-level gain at each iteration step associated with the assimilation station Hoek van Holland. The pictures show that the estimate of the gain profile is smooth and confined to a limited area around the assimilation station. Near the assimilation station the gain has the largest value, and it decreases with distance from the assimilation station. Moreover, the gain at locations far from the assimilation station is practically zero. This means that the correction associated with a certain assimilation station is propagated only to nearby locations and that no correction will be imposed at locations further away from the assimilation station. The pictures also show that, before convergence, the gain computed at one iteration step becomes smoother and more localized than the ones computed at the previous steps. The open-loop gain has the largest magnitude compared to the ones computed in the closed-loop steps. In the open-loop run, the difference between the two samples is largest. The inclusion of observation data in the closed-loop runs, on the other hand, brings both samples closer to the data and hence to each other. This in turn reduces the covariance of the difference between the two samples, and hence the gain becomes smaller. The gain profile does not change significantly from the third closed-loop step onwards, which indicates that the filter converges.
The structure of the Kalman gain with respect to the other variables is more complex than the one for the water level (not shown). This is due to the fact that the other variables are related nonlinearly to the water level at the assimilation stations through the nonlinear model dynamics. We observed that the elements of the Kalman gain associated with the flow velocities u and v are confined to a more limited area around the respective assimilation stations than the ones associated with the water level. The biggest change in the Kalman gain estimate occurs from the open-loop to the first closed-loop iteration. At the next iterations, only slight differences in the solution can be noticed. The structure of the gain remains the same from one iteration to the next; only the magnitude changes with iteration.
The elements of the Kalman gain estimates associated with the noise terms also have a smooth structure around the respective assimilation stations (not shown). Starting from the first closed-loop iteration, the structure of these elements remains the same, but the magnitude slightly increases. This indicates that, for producing better model results, more correction of the erroneous wind input is needed.

Figure 6.8: Water-level gain for assimilation station Hoek van Holland: (a) open-loop estimate, (b)-(f) closed-loop estimates, 1st-5th iteration steps. Contour interval is 0.01.

The magnitude of the increment itself decreases with iteration, indicating
that the iteration is convergent from below. However, it also shows significantly non-zero
values at locations further away from the assimilation stations. It is not clear if it is the
true structure or if it is a spurious correlation due to sampling error, which is a common
problem in an ensemble-based data assimilation system. Some occurrences may be related to spurious correlation, for example the anomalously large values at locations around the land-sea boundaries. However, despite this, in this experiment there is no indication that
the iteration will blow up. Moreover, our previous work (Sumihar et al. 2008), which used
the same length of samples as used here, showed no indication of any serious spurious
correlation problem. Hence, for this study we retain the estimates of Kalman gain as they
are and use them in the subsequent data assimilation and forecast experiments.
6.4.6 Hindcast experiment
We have computed the estimates of Kalman gain using the two-sample Kalman filter. We
proceed now with implementing the Kalman gain for data assimilation. For this experiment, we use the HiRLAM wind fields as input forcing to the DCSM. As the first test
of the performance of the data assimilation system, we use the water level data from the
same period as is used for computing the Kalman gain. The observation data is assimilated
every time step (10 min). For this study, the only objective information available for evaluation is the observed water-level data. The test is performed by comparing the resulting
analyses to the observed data obtained at all available locations. The performance of the
data assimilation system is evaluated in terms of the rms of the water-level residual computed over the whole period of the experiment.
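Written out, this evaluation metric is simply the root-mean-square of the model-minus-observation water level over the experiment period, as in the short sketch below (array names are hypothetical).

import numpy as np

def rms_residual(h_model, h_obs):
    # RMS of the water-level residual (model minus observation) over the whole period.
    r = np.asarray(h_model) - np.asarray(h_obs)
    return np.sqrt(np.mean(r ** 2))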
The results for the assimilation stations are shown in Figure 6.9. This figure shows the rms of the water-level residual of the analyses obtained by data assimilation using the Kalman gain estimated at each iteration. Also shown in this figure are the results of the deterministic forecast, which are obtained without data assimilation. Here we see that the data assimilation system has successfully brought the model closer to the observed data. It is found that the data assimilation has reduced the water-level residual by approximately 40%-70%, depending on the location. For most of the stations, the rms water-level residual obtained by data assimilation using the open-loop gain is found to be the smallest. This is because in the open-loop computation the two realizations of the storm surge system are still relatively different from each other. Therefore, the variance of their difference is large, and so are the values of the gain matrix elements corresponding to these locations. In the closed-loop computations, on the other hand, since observations are assimilated, the two realizations become similar and the variance decreases. As a consequence, the open-loop gain pulls the model towards the observations more than the closed-loop ones do. The rms increases at the next iterations and converges after the 3rd closed-loop step.
Figure 6.9: RMS analysis residual of water level at assimilation stations, with σobs = 0.10 m.

Figure 6.10 shows the results for the validation stations. Here we see that the data assimilation has steered the model closer to the observed data, even at the locations where data is not used in the assimilation. At these locations, the data assimilation has reduced
the water-level residual by approximately 30%-70%. The fact that data assimilation has
improved the model results at validation stations shows that the correction imposed by
the steady-state Kalman filter is not limited to the assimilation stations only, but extends to other locations. This may indicate that the forecast error covariance matrix computed here has a consistent structure, so that it propagates the correction properly to other parts of the model area.
6.4.7 Forecast experiment
The DCSM is intended for making forecasts of storm surge. It is therefore important to test the performance of the data assimilation system by checking the resulting forecast quality. In fact, the forecast residual is a more important measure of the quality of the data assimilation system than the fit to the data in the analysis itself: it is a direct measure against independent observation data.
Here, the data assimilation system is used to initialize the DCSM before generating forecasts. The model is initialized by driving the DCSM with HIRLAM analysis winds and assimilating the water level data continuously, every time step, for the whole period of the experiment. The forecast experiment is performed by using HIRLAM forecast winds from the period of September 2007 until January 2008. HIRLAM forecast winds are available every 6 hours and each forecast extends to a period of 48 hours. Accounting for some missing wind data, in total we have 600 forecast runs.

Figure 6.10: RMS analysis residual of water level at validation stations, with σobs = 0.10 m.
Figure 6.11 shows the rms of forecast water-level residual versus forecast time, for the
observing station located at Hoek van Holland. The results shown here are obtained by
applying a 6 h running average on the time series of the rms residual. In this picture, we
show the forecast performance initialized by analyses using Kalman gain obtained from
each iteration of the two-sample Kalman filter. For comparison, we show also the deterministic forecast, which is obtained without data assimilation. The forecast performance
in general decreases in forecast time. However, an obvious improvement due to data assimilation can be seen for the forecast horizon up to 15 hours. The performance becomes
worse than the deterministic forecast at forecast time 15-24 hours. The performance with
data assimilation is expected to converge to the deterministic forecast if we extend the
forecast time further.
The degrading performance after 15 hours can be explained as follows. The data
assimilation system is supposed to correct the error due to the uncertain wind input. However, there is also another source of error, namely the surge from outside the model area.
The data assimilation may have incorrectly updated the model state in this area where
there is no assimilation station. When this incorrectly updated wave arrives at the forecast
station, it degrades the forecast performance (Gerritsen et al. 1995). The performance
becomes worse than the deterministic forecast after this time. After some time, however, the effect of the initial condition vanishes and the performance is expected to converge to the
deterministic one.
Figure 6.11: Forecast performance at Hoek van Holland.

We also observe in Figure 6.11 that the forecast performance is improved from one
iteration to the next. With the open-loop gain, although the rms residual at the initial condition is smaller than with the gains from later iterations, the forecast performance deteriorates quickly with forecast time. As a result, the running average of the rms residual produces larger values than for the later iterations. The largest improvement occurs in the first two iterations; however, the difference is no longer significant after the 2nd closed-loop iteration.
Figure 6.12 shows the forecast performance at one of the validation stations, where data is not used for assimilation. Here we see that in general it has a similar pattern to the one at Hoek van Holland. This confirms once again that the data assimilation using our approach has successfully improved the short-term forecast of the DCSM, at both assimilation and validation stations.

Figure 6.12: Forecast performance at IJmuiden.

The time interval over which the forecast performance is improved over the deterministic forecast is limited by the physics of wave propagation in the model area and depends on the location. In principle, it is determined by the time required by a gravity wave to travel from the open ocean to the location of interest. For the stations along the Dutch coast, the travel time is about 12 hours. For the operational system, in practice, one uses the forecast with data assimilation up to 15 hours and the deterministic forecast afterwards (Gerritsen et al. 1995). It should be noted here that this limiting characteristic time is actually not prohibitive for operational purposes. The forecast system is used, for example, to give accurate forecasts at least 6 hours ahead for the proper closure of the moveable
storm surge barriers in the Eastern Scheldt and the New Waterway. It is also used as an
aid to decide whether a warning should be issued, while warnings are issued at least 6
hours ahead to provide time for the necessary preparations (Verlaan et al. 2005).
6.4.8 Comparison with isotropic-assumption-based steady-state Kalman filter
In this section, we compare the performance of the data assimilation described above (hereafter, UKMHIR) to the one based on an isotropic assumption (hereafter, ISO). For a fair comparison, the statistics of the model error for the ISO system are set to resemble those obtained by using the HIRLAM-UKMO wind stress difference, except that now the spatial correlation of the error is set to be isotropic. Here, the spatial correlation length is set to 19 computational grid cells (about 150 km) for both the zonal and meridional directions. The standard deviation of the error is set to be spatially homogeneous and equal to the spatial average of the values obtained by using the HiRLAM-UKMO wind field difference.
Like for the UKMHIR, the steady-state Kalman gain for the ISO system is obtained by using the two-sample Kalman filter. At each iteration, the DCSM is run twice for the same period, and in each run the model is perturbed by independent random numbers drawn from a normal distribution having the statistics specified above. The random field is generated by using a covariance matrix decomposition technique (e.g. Fenton and Griffiths 2007). Due to computational resource limitations, the random field is generated on a coarser grid, which is five times the computational grid. A test has been performed by estimating the spatial correlation from the resulting random field, which confirmed that it has the desired statistics. Note that in the operational system, the random field is defined on a coarser grid, which is 28 times as large as the computational grid. To simulate the observation error, independent random numbers are added to the observed water level with the same statistics as used in the UKMHIR. As in the previous experiment, the two-sample Kalman filter is run until the fifth closed-loop step. It is found that the iteration converges after the third loop.
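For illustration, a spatially correlated Gaussian field with a prescribed correlation length can be drawn via a covariance matrix decomposition as in the Python sketch below. The Gaussian (squared-exponential) shape of the covariance kernel, the example grid, and the use of a Cholesky factor are assumptions made for the sketch; the text above only states that a covariance matrix decomposition technique on a coarse grid is used.

import numpy as np

def correlated_field_sampler(xy_km, corr_length_km=150.0, sigma=1.0, seed=0):
    # Build a sampler for an isotropic, spatially correlated Gaussian random field on the
    # coarse-grid points xy_km (shape (n_points, 2), coordinates in km).
    d = np.linalg.norm(xy_km[:, None, :] - xy_km[None, :, :], axis=-1)
    C = sigma ** 2 * np.exp(-0.5 * (d / corr_length_km) ** 2)      # assumed Gaussian kernel
    L = np.linalg.cholesky(C + 1e-10 * np.eye(len(xy_km)))         # covariance matrix decomposition
    rng = np.random.default_rng(seed)
    return lambda: L @ rng.normal(size=len(xy_km))

# Example: a 20 x 20 coarse grid with 25 km spacing; one realization of the field,
# to be interpolated to the computational grid afterwards.
xs, ys = np.meshgrid(np.arange(20) * 25.0, np.arange(20) * 25.0)
sample = correlated_field_sampler(np.column_stack([xs.ravel(), ys.ravel()]))
field = sample()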
For this comparison, the gain estimates obtained at the fifth closed-loop iteration of both systems are implemented for data assimilation. The comparison is performed by evaluating the forecast quality of the DCSM, initialized by the UKMHIR and ISO data assimilation systems. The evaluation is performed by using the observed data from the same period as used in the experiment described in Section (6.4.7). In this way, we ensure that the data used for comparison are independent of those used to compute the Kalman gain.
We found that, in general, the UKMHIR performs better than the ISO for short-term forecasts. Figure 6.13 shows a result that illustrates this. The forecast based on the UKMHIR system is better than ISO for short-range forecasts up to 15 hours. Afterwards, it becomes worse up to 24 hours and then converges to that of ISO. The overshoot of the UKMHIR in the period between 15-24 hours is bigger than that of ISO. This may indicate that the UKMHIR data assimilation system makes larger improper adjustments to the model variables further away from the assimilation stations than the ISO does. As mentioned earlier, the performance degradation after 15 hours is likely to be caused by unnatural correction of the external surge by the data assimilation. It seems that the unnatural correction imposed by the UKMHIR is larger than that imposed by the ISO. Nevertheless, as far as operational purposes are concerned, this overshoot should not really be an issue. Moreover, the effect of the initial condition vanishes after some time and the performances of the two systems converge afterwards.

Figure 6.13: Forecast performance at Hoek van Holland, initialized by data assimilation systems based on the UKMO-HiRLAM difference and on the isotropic assumption. For each system, the steady-state Kalman gain is obtained from the 5th closed-loop iteration of the two-sample Kalman filter. Shown also for comparison is the deterministic forecast without data assimilation.
6.4.9 Tuning of observation error statistics
A question that occurred to us in the course of the research is whether the specified observation error variance is optimal. To check this, we evaluate the statistics of the innovation realizations (Maybeck 1979). Ideally, when a Kalman filter has been well tuned, the fraction of innovation realizations lying outside the theoretical ±1σ band should be about 32%.
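This consistency check amounts to counting the fraction of innovations whose magnitude exceeds the predicted one-sigma band, as in the small Python sketch below (for Gaussian innovations the expected fraction is roughly 1 - 0.68, i.e. about 32%).

import numpy as np

def fraction_outside_one_sigma(innovations, sigma_band):
    # Fraction of innovation realizations lying outside the +/- 1 sigma band; a value much
    # below ~32% suggests the assumed observation error variance is too large.
    return float(np.mean(np.abs(np.asarray(innovations)) > np.asarray(sigma_band)))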
Figure 6.14a illustrates the innovation sequence associated with the setup with observation error standard deviation σobs = 10 cm. In this figure, we see that with σobs = 10 cm, only a small fraction of the innovation sequence lies outside the ±1σ band. This indicates that the specified observation error variance is too large and that the filter setup requires additional tuning. As a follow-up to this finding, we have set the observation error standard deviation to σobs = 5 cm and checked the statistics of the innovation realizations as shown
in Figure 6.14b. Here, we see that the fraction of the innovation sequence lying outside the ±1σ band is closer to the ideal value. This result indicates that we might retune the observation error variance to an even smaller value to obtain an optimal setup.
We have also performed additional hindcast and forecast experiments as above, with observation error standard deviation σobs = 5 cm. Here, the computation of the Kalman gain estimate as well as the hindcast experiment are performed for the period of November 2004 until January 2005. The rms residuals obtained from the hindcast experiment at the assimilation and validation stations are shown in Figures 6.15 and 6.16, respectively. Here, we see that the two-sample Kalman filter is also convergent and that the water level residual converges to values smaller than those obtained using σobs = 10 cm, as shown in Figures 6.9 and 6.10. However, it produces larger rms water-level residuals in the first few iterations, as can be seen at the stations Aberdeen, Harlingen and Delfzijl. This may be due to the fact that, because more weight is now given to the observations, the correction imposed at these locations is larger, while the gain estimates are not yet close to their optimal values.
Figure 6.14: Sequence of water level innovations and the ±1σ interval for Hoek van Holland with (a) σobs = 0.10 m and (b) σobs = 0.05 m. Data assimilation is performed by using the Kalman gains from the 5th closed-loop iterations, computed for each setup with the corresponding observation noise standard deviation σobs. (The percentage of innovations lying outside one sigma is 11.9% in panel (a) and 26.8% in panel (b).)

The forecast experiment is performed in the period of August 2007 until January 2008,
which is different from the period used in preparing the Kalman gain and the hindcast experiments. This gives the opportunity to validate the forecast system against data which are independent of those used to prepare the Kalman gain. Figure 6.17 shows the rms of the forecast water-level residual versus forecast horizon. Comparing this result to the one obtained using σobs = 10 cm, shown in Figure 6.11, we see that here the performance difference between the data assimilation systems using the gain estimates from the first few iterations is more significant. Together with the hindcast results, this indicates that with a smaller observation error standard deviation the two-sample algorithm requires more iterations to give satisfactory results. It also produces a slightly larger overshoot at forecast times around 15 h. This may be due to a larger unnatural correction being imposed on the area near the open ocean, because the data assimilation system puts more weight on the observations. As a result, it deteriorates the forecast at this lead time more than the setup with σobs = 10 cm does.

Figure 6.15: RMS analysis residual of water level at assimilation stations, with σobs = 0.05 m.

Figure 6.16: RMS analysis residual of water level at validation stations, with σobs = 0.05 m.
Figure 6.17: Forecast performance at Hoek van Holland, with σobs = 5 cm.
We conclude this section by stating that setting the observation error to σobs =5 cm
causes the innovation statistics to be more consistent with the ideal one. It also slightly
improves the analysis results at locations further away from assimilation stations. However, the impact on forecast errors is small.
6.4.10 Discussion
An issue of ensemble-based data assimilation is the possible spurious correlation due to sampling error. This problem can be eliminated by using a large number of samples. In the two-sample Kalman filter, this means using very long time series of the two model realizations. In principle, since the algorithm is intended to be run off-line and only once, there should not be any computational problem. Nevertheless, if the samples are not sufficiently long and the spurious correlation problem occurs, we may combine this algorithm with the covariance localization technique proposed by Gaspari and Cohn (1999), which has been examined by e.g. Hamill and Whitaker (2001), Mitchell and Houtekamer (2000), Houtekamer and Mitchell (1998, 2001), Sørensen et al. (2004a), and Buehner (2004). The
main idea of this method is to use a cut-off radius, beyond which observations are not used,
to avoid having to estimate the small correlations associated with remote observations.
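For reference, the compactly supported fifth-order correlation function of Gaspari and Cohn (1999), commonly used as a localization taper, can be written as in the Python sketch below; applying it amounts to an element-wise (Schur) product of the taper with the sampled covariance.

import numpy as np

def gaspari_cohn(r, c):
    # Gaspari and Cohn (1999) fifth-order compactly supported correlation function.
    # r: distances between state points and an observation; c: half-width, so the
    # function is exactly zero beyond the cut-off radius 2*c.
    z = np.atleast_1d(np.abs(np.asarray(r, dtype=float))) / c
    f = np.zeros_like(z)
    near = z <= 1.0
    mid = (z > 1.0) & (z <= 2.0)
    f[near] = (-0.25 * z[near] ** 5 + 0.5 * z[near] ** 4 + 0.625 * z[near] ** 3
               - 5.0 / 3.0 * z[near] ** 2 + 1.0)
    f[mid] = (z[mid] ** 5 / 12.0 - 0.5 * z[mid] ** 4 + 0.625 * z[mid] ** 3
              + 5.0 / 3.0 * z[mid] ** 2 - 5.0 * z[mid] + 4.0 - 2.0 / (3.0 * z[mid]))
    return f

# Localized covariance: element-wise product of the taper with a sampled covariance,
# P_loc = gaspari_cohn(distance_matrix, c) * P_sampled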
Another point of discussion here is whether the solution is in fact optimal. It should be noted that the two-sample Kalman filter will give an estimate of a steady-state Kalman gain which is optimal for a given model error covariance. A different model error covariance specification will give a different solution. The UKMHIR and ISO systems illustrate this. For both systems, the Kalman gain is taken from the fifth closed-loop step, that is, after the iteration has converged; hence, it is the optimal solution for the respective error statistics specification. The example in Section (6.4.8) illustrates how the model error specification impacts the performance of the resulting data assimilation system. In this case, it is found that the UKMHIR approach performs better than the ISO. However, this does not necessarily mean that the UKMHIR gives the optimal solution for the DCSM. The optimal solution will be achieved only if the true error covariance is used. In practice, this is not possible to achieve, since the true state is never known. Moreover, the available observation data are usually far fewer than the model variables. One can only try to approximate the unknown true error covariance.
The specification of the error covariance is important for the data assimilation. A wrong specification of the model error covariance may have a negative impact on the analysis, as illustrated in Dee (1995). For the experiments performed in this study, however, there is no indication of a negative impact for either data assimilation system. This may indicate that the error covariance used in this study is reasonably good for the DCSM.
A complicating factor is that the model error covariance may vary in time. It is already
understood, at least conceptually, that model error at each time is dependent on the actual
system state (e.g. Dee 1995) in an unknown manner. Hence, approximating the error
covariance by a constant one may never result in an optimal solution. This motivates a further study to extend the approach described here to work with more samples. By using more samples, it is possible to compute the covariance by averaging over a shorter time. The covariance is then recomputed using data from different analysis time windows. In this way, the covariance estimate will vary depending on the time used
for the computation.
6.5 Conclusion
We have presented in this paper an application of the two-sample Kalman filter. The implementation is done with an operational storm surge prediction model, the Dutch Continental Shelf Model (DCSM). Two wind fields are available for driving the model, namely
HIRLAM and UKMO. The two forecast samples required for computing the Kalman gain
are generated by running the model twice, each run driven by one of the two wind fields. The basic assumption in this work is that the difference between the two wind fields is characteristic of the unknown true error. In this paper we suggest using a surrogate
quantity for the unknown true error in order to estimate the forecast error covariance used
to define a steady-state Kalman gain. The main advantage of this approach is that we can
avoid explicit modelling of the error covariance.
It is generally believed that the error in the wind fields is correlated in time. Since
the two-sample Kalman filter is developed based on the assumption that the system is driven by white noise, a modification of the system representation is required before we can implement the algorithm. The objective of the transformation is to construct a new
model representation, in which the driving error process becomes white noise. In this
paper we present a transformation for the case where the source of uncertainty is in the
input forcing. The transformation is done by making use of two samples of the uncertain
input forcing. The method for transformation is presented for the case where noise can
be represented by an AR(1) process. However, similar technique can be used for other
ARMA type model error as well.
The system transformation in combination with the two-sample Kalman filter algorithm is verified by means of a twin experiment with a simple one-dimensional wave model.
Here it is possible to generate samples for both the transformed as well as the original
model representation. The verification is done by comparing the results to a reference,
obtained using the original representation. The experiments show that the results are consistent with the reference.
Finally, the algorithm is implemented with the DCSM. Some care must be taken beforehand because there is a systematic difference between the HIRLAM and UKMO wind
fields. We perform a correction on the UKMO wind fields to eliminate this systematic
difference. Using HIRLAM and corrected UKMO wind fields, we compute the estimate
of the steady-state Kalman gain. It turns out that the algorithm converged after three iterations. An evaluation is performed by implementing the gain estimates in a data assimilation system. We checked the performance of the assimilation system on both assimilation
stations, where data is used for the assimilation, and validation stations, where data is
not used for the assimilation. The experiments show that the assimilation system with
the estimated gain is successful in bringing the model closer to the data at all observation stations. A further evaluation is performed using independent data, which is not used in the computation of the Kalman gain. Here, the data assimilation system is used to give a better estimate of the initial condition for making forecasts. It is found that the forecast initialized by using this data assimilation performs better than the one without data assimilation. In the end, a comparison is made between the data assimilation system proposed here and the one based on an isotropic assumption. The short-run forecast based on the approach proposed here is found to perform better than the one with the isotropic assumption. An additional experiment is performed on tuning the observation error standard deviation. It shows that, for the setup used in this study, a more accurate analysis can be obtained by reducing the standard deviation of the observation error. It also indicates the significance of later iterations of the two-sample Kalman filter algorithm in producing more accurate analyses.
It is important to note here that the two-sample Kalman filter will produce a gain matrix that is optimal for the given specified model error covariance. Therefore, it will be optimal only if the true error is indeed statistically stationary and the true error covariance is known. Since it is not possible to obtain information about the true error, we have to approximate it. It is known that the model error changes with time along with the actual system state. For our next study, we are going to extend our approach to work with more than two samples. By using more samples, it is possible to compute the covariance by averaging over a shorter time. In this way we can relax the stationarity assumption.
7 Ensemble storm surge data assimilation and forecast
7.1 Introduction
Probabilistic forecast systems have been used operationally in the atmospheric sciences for the past decade. Some examples of meteorological centers where such systems are operational are the National Centers for Environmental Prediction (NCEP), the Canadian Meteorological Center (CMC), and the European Centre for Medium-Range Weather Forecasts (ECMWF). The development of a probabilistic forecast system is motivated by the fact that a forecast is always uncertain due to inaccurate initial conditions and the inevitable approximations made to describe the physical system. The significance of a probabilistic forecast is that it can give users some idea about the forecast uncertainty and the predictability of the system. However, a full probabilistic forecast, which evolves the full probability density function of the system state in time, is computationally not feasible.
A common approach to work around this problem is to use a Monte Carlo method. The idea of this approach is to represent the probability distribution of the system state forecasts by using an ensemble of possible forecasts drawn from that probability distribution. The forecast ensemble is generated by perturbing the uncertain components of the system, such as initial and boundary conditions, and propagating each sample in time using the forecast model. Two advantages can be expected from an ensemble forecast system. First, the mean of the ensemble forecast provides a net improvement in average forecast skill. Leith (1974) showed that the mean of the ensemble forecast is statistically better than any individual forecast. Second, the spread of the forecast ensemble at a single time offers information about the uncertainty of the forecast at that time. It may give a qualified estimate of the confidence in the forecast (Frogner and Iversen 2001). This may support the decision making process related to the forecast products.
At the Norwegian Meteorological Institute, an ensemble prediction system called LAMEPS has been operational for a number of years (Frogner and Iversen 2002; Frogner et al. 2006). Studies of the LAMEPS performance in predicting geopotential height, MSL pressure, and precipitation events can be found in Frogner and Iversen (2002) and Frogner et al. (2006). Here, we perform a study using the 10 m wind speed ensemble generated by LAMEPS and use it as meteorological forcing for a storm surge forecasting model.
In this paper, we present our work with a storm surge forecast model, called the Dutch
Continental Shelf Model (DCSM), which is operational in the Netherlands. It is known
that one of the main sources of uncertainty in operational water level forecasts is uncertainty in the meteorological forcing (Gerritsen et al. 1995). Here, we explore the
possibility of representing this uncertainty by using an ensemble of meteorological forecasts produced by LAMEPS. The idea is to generate a storm surge forecast ensemble by forcing the DCSM with each member of the 10 m wind speed LAMEPS ensemble. The main objective of this study is to investigate the possible use of the resulting forecast ensemble to estimate the forecast error covariance for data assimilation. Here, the data assimilation is carried out in the framework of ensemble Kalman filtering. In this framework, the forecast error covariance is used to define a Kalman gain, and the Kalman gain is used to update each member of the forecast ensemble by assimilating observational data.
The main topic of this study is forecast error covariance modelling. In the operational storm surge forecast system, the data assimilation coupled to the DCSM is based on steady-state Kalman filtering and the forecast error covariance is assumed to be isotropic. In a previous study by Sumihar et al. (2009) the isotropic assumption is relaxed by computing the error covariance from the difference between the wind products of two meteorological centers: the UK Met Office and the Royal Dutch Meteorological Institute (KNMI). This method is similar to the multi-model ensemble approach (Hagedorn et al. 2005), but only two models are used and the forecast error covariance is computed by averaging the ensemble difference over time. In this paper, we extend this work and use more ensemble members for computing the covariance. By using more ensemble members, the time averaging can be done over a shorter time. This offers the additional advantage of generating flow-dependent error covariances. Like the previous work, the main
advantage of this approach is that explicit modelling of the forecast error covariance is
avoided, because information about the forecast error covariance is contained within the
ensemble members.
We shall first describe the data assimilation procedure used in this study and the error
modelling approach in Section 7.2. The set-up of the experiments is given in Section 7.3.
Section 7.4 presents the estimates of the statistics of the wind stress error. The results of
hindcast experiments are given and evaluated in Section 7.5, while Section 7.6 presents
a number of forecast experiments. Section 7.7 describes and discusses the comparison
of the ensemble based data assimilation with a steady-state Kalman filter. The chapter
concludes in Section 7.8.
7.2 Data assimilation algorithm
The data assimilation algorithm used in this study is an extension of the two-sample
Kalman filter algorithm proposed by Sumihar et al. (2008). The two-sample Kalman
filter is an iterative procedure for computing the steady-state Kalman gain of a stochastic
process by using two samples of the process. The computation of the steady-state Kalman
gain is performed by making use of the difference between the two samples to compute
the forecast error covariance. The two-sample Kalman filter is based on the assumption
that the error process is stationary and ergodic. Therefore, it is possible to compute the
covariance by averaging over time. Here, the algorithm is extended to work with n samples, where n > 2, so that the averaging can be done over a shorter time. Moreover,
the computation of the Kalman gain is done for each analysis period. Hence, we can relax the assumption that the error process is stationary over a long time. Nevertheless, we
assume that within each analysis window, the error covariance is constant.
It is known that the error in wind forcing is correlated in time. As in Sumihar et al. (2009), we assume that each member of the wind ensemble deviates from the 'true' wind field by a colored noise term, represented as an AR(1) process. Since in the framework of Kalman filtering the error process that perturbs the system should be white noise, we need to transform the representation of the wind forcing ensemble in such a way that it is driven by white noise.
In this section, we shall first introduce a generalization of the sample transformation
used in Sumihar et al. (2009). Subsequently, we describe the analysis algorithm, which is
an extension of the two-sample Kalman filter. The section concludes with a summary of
the overall algorithm used in this study.
7.2.1 System representation transformation
One of the main sources of error in a storm surge forecasting system is the uncertain
input wind forcing (Gerritsen et al. 1995). To account for this uncertainty, we use an
ensemble of wind fields and consider each wind field as deviating from the 'true' wind field by a colored noise process, governed by an AR(1) process. As we have stated earlier, a Kalman filter algorithm requires the system of interest to be driven by a white noise process. This section describes how we transform the system representation into one driven by a white noise process.
Suppose that we have a dynamic system $\Phi$ driven by an uncertain input forcing $\hat{u}(t_k) \in \mathbb{R}^q$ as follows:
$$z(t_{k+1}) = \Phi\, z(t_k) + F\, \hat{u}(t_k) \qquad (7.1)$$
$$\hat{u}(t_k) = \bar{u}(t_k) + \hat{w}(t_k) \qquad (7.2)$$
where $z(t_k) \in \mathbb{R}^p$ denotes the system state at time $t_k$ and $F$ is the $p \times q$ input forcing matrix. The uncertain input forcing $\hat{u}(t_k)$ is modelled as a stochastic process, which can be written as the sum of the true forcing $\bar{u}(t_k) \in \mathbb{R}^q$ and a noise term $\hat{w}(t_k) \in \mathbb{R}^q$. Both $\bar{u}(t_k)$ and $\hat{w}(t_k)$ are unknown. We suppose further that the noise vector $\hat{w}(t_k)$ in equation (7.2) is governed by an AR(1) process that reads
$$\hat{w}(t_k) = A\, \hat{w}(t_{k-1}) + \hat{\eta}(t_k) \qquad (7.3)$$
where $A$ is a time-invariant $q \times q$ diagonal matrix whose elements determine the correlation time, and $\hat{\eta}(t_k) \in \mathbb{R}^q$ is a white noise vector with covariance $\hat{Q}(t_k) \in \mathbb{R}^{q \times q}$.
The objective of the transformation is to obtain another representation of the dynamics (7.1)-(7.2) such that it is driven by a white noise process. The standard approach for working with a system driven by colored noise is to augment the state vector $z(t_k)$ with the noise vector $\hat{w}(t_k)$. However, in this case it is not possible to use this approach directly, since we only have the realizations of $\hat{u}(t_k)$, without any access to the realizations of $\hat{w}(t_k)$ separately from those of $\hat{u}(t_k)$. In the following, we propose a method to find an approximation of $\hat{w}(t_k)$ by utilizing $n$ samples of $\hat{u}(t_k)$. These, in turn, will be used in the state augmentation procedure.
Suppose we have $n$ independent members of the input forcing ensemble, $\hat{u}_i(t_k)$, $i = 1, \ldots, n$. We introduce
$$u(t_k) \equiv \frac{1}{n} \sum_{i=1}^{n} \hat{u}_i(t_k) \qquad (7.4)$$
$$\tilde{w}_i(t_k) \equiv \hat{u}_i(t_k) - u(t_k) \qquad (7.5)$$
The term $u$ represents the average of the $n$ input realizations. On the other hand, $\tilde{w}_i$ can be shown to evolve in time according to
$$\tilde{w}_i(t_k) = A\, \tilde{w}_i(t_{k-1}) + \tilde{\eta}_i(t_{k-1}) \qquad (7.6)$$
Notice the symmetry between Equation (7.6) and the noise process (7.3). Due to this symmetry, we call $\tilde{w}_i$ the transformed colored noise and $\tilde{\eta}_i$ the transformed white noise processes. Using these new variables, we define the transformed input forcings as
$$\tilde{u}_i(t_k) = u(t_k) + \tilde{w}_i(t_k) \qquad (7.7)$$
Equation (7.7) shows that the original set of input forcings is retained; that is, $\tilde{u}_i(t_k) = \hat{u}_i(t_k)$, $i = 1, \ldots, n$. However, under this transformation, we can work with the average and colored-noise terms separately. Note that if we have $n$ independent realizations of $\hat{u}_i(t_k)$, by using Equations (7.4)-(7.5) we can compute the input forcing ensemble average $u$ and the approximate colored noise for each member of the ensemble, $\tilde{w}_i$. Moreover, by using Equation (7.6), we can compute the approximate white-noise realizations $\tilde{\eta}_i$ of the respective ensemble member.
We can now use the transformed input forcing in a state augmentation procedure. For each ensemble member, by denoting
$$x_i(t_k) \equiv \begin{pmatrix} z_i(t_k) \\ \tilde{w}_i(t_k) \end{pmatrix}, \quad \eta_i(t_k) \equiv \begin{pmatrix} 0 \\ \tilde{\eta}_i(t_k) \end{pmatrix}, \quad M \equiv \begin{pmatrix} \Phi & F \\ 0 & A \end{pmatrix}, \quad B \equiv \begin{pmatrix} F \\ 0 \end{pmatrix},$$
we can rewrite the dynamical system (7.1)-(7.2) as
$$x_i(t_{k+1}) = M\, x_i(t_k) + B\, u(t_k) + \eta_i(t_k) \qquad (7.8)$$
We now have a system representation which is driven by white noise.
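To make this transformation concrete, the following is a minimal sketch in Python/NumPy; the array shapes, function names, and the representation of $A$ by its diagonal are our own illustrative assumptions, not the implementation used in this thesis. It computes the ensemble-average forcing and the approximate colored- and white-noise realizations (Equations (7.4)-(7.6)), and propagates one augmented member according to Equation (7.8).

```python
import numpy as np

def transform_forcing_ensemble(u_hat, A_diag):
    """Split an input-forcing ensemble into its mean, approximate colored noise,
    and approximate white noise (Equations 7.4-7.6).

    u_hat  : array (n, N, q) -- n ensemble members, N time steps, q forcing components
    A_diag : array (q,)      -- diagonal of the AR(1) transition matrix A
    """
    u_bar = u_hat.mean(axis=0)                      # Eq. (7.4): ensemble-average forcing
    w_tilde = u_hat - u_bar                         # Eq. (7.5): approximate colored noise
    # Eq. (7.6): eta_i(t_{k-1}) = w_i(t_k) - A w_i(t_{k-1})
    eta_tilde = w_tilde[:, 1:, :] - A_diag * w_tilde[:, :-1, :]
    return u_bar, w_tilde, eta_tilde

def propagate_augmented_member(Phi, F, A_diag, z0, w0, u_bar, eta):
    """Propagate one augmented member x = [z; w] according to Eq. (7.8):
    z(t_{k+1}) = Phi z(t_k) + F (u_bar(t_k) + w(t_k)),  w(t_{k+1}) = A w(t_k) + eta(t_k)."""
    z, w = z0.copy(), w0.copy()
    trajectory = [np.concatenate([z, w])]
    for k in range(eta.shape[0]):
        z = Phi @ z + F @ (u_bar[k] + w)            # state update: mean forcing plus colored noise
        w = A_diag * w + eta[k]                     # colored-noise update driven by white noise
        trajectory.append(np.concatenate([z, w]))
    return np.array(trajectory)
```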
7.2.2 Analysis algorithm
The algorithm described here is an extension of the two-sample Kalman filter algorithm (Sumihar et al. 2009). To maintain the similarity with our previous works, we start the description by supposing that we have a stochastic process $X^f(t_k)$ governed by
$$X^f(t_{k+1}) = M\, X^a(t_k) + B\, u(t_k) + \eta(t_k) \qquad (7.9)$$
$$X^a(t_{k+1}) = X^f(t_{k+1}) + K\left(y^o(t_{k+1}) - H X^f(t_{k+1}) - \nu(t_{k+1})\right) \qquad (7.10)$$
where $M$ is the model dynamics, $B$ the input forcing matrix, $u(t_k)$ the input forcing, $\eta(t_k)$ a white-noise process representing model and/or input forcing error, $\nu(t_k)$ a white-noise process representing observation error, $y^o(t_k)$ the observation, $H$ the observation matrix, $X^a(t_k)$ the analysis state vector, and $K$ a constant gain matrix. For a stable system with slowly varying error statistics, we can assume that within any time window of sufficiently short length, the error covariance is constant. Within each time window, the covariance matrices of the processes $X^f(t_k)$ and $X^a(t_k)$ are given respectively by
$$P^f = M P^a M^\top + Q \qquad (7.11)$$
$$P^a = (I - KH) P^f (I - KH)^\top + K R K^\top \qquad (7.12)$$
The objective of the algorithm described here is to find such $K$ that minimizes the error variance within the corresponding time window. The algorithm consists of an iterative procedure for improving the estimate of $K$. Starting from an initial value of the gain matrix, $K_0$, the solutions of Equations (7.11)-(7.12) are computed, so we have $P^f_0$ and $P^a_0$. In turn, $P^f_0$ is used to compute the gain matrix for the next iteration, $K_1$, using an analysis-variance-minimizing equation:
$$K = P^f H^\top \left(H P^f H^\top + R\right)^{-1} \qquad (7.13)$$
The new gain matrix is inserted into the system of equations (7.11)-(7.12) and new solutions are computed. The procedure is repeated until the iteration has converged. Since $K$ is defined to minimize the analysis error variance of the next iteration, we have $P^a_0 \geq P^a_1 \geq \ldots \geq P^a_j$. Because $P^a > 0$, the iteration will converge. When the iteration has converged, the solution of the algorithm is an optimal gain matrix, that is, the one which gives minimum analysis error variance. Using this gain matrix in the analysis equation above, we will obtain an analysis state with minimum error variance.
In this paper, we use the idea above with $n$ realizations of the process $X^f(t_k)$ to approximate the solutions of equations (7.11)-(7.12) for obtaining $P^f$. Suppose we have $n$ realizations of the random process $X^f(t_k)$: $x^f_1(t_k), \ldots, x^f_n(t_k)$, within a time window $k = 1, \ldots, N$. An unbiased estimator of the forecast error covariance using these samples can be obtained by averaging over time and ensemble members:
$$P^f = \frac{1}{N} \sum_{k=1}^{N} \frac{1}{n-1} \sum_{i=1}^{n} \left(x^f_i(t_k) - \bar{x}^f(t_k)\right)\left(x^f_i(t_k) - \bar{x}^f(t_k)\right)^\top \qquad (7.14)$$
where $\bar{x}^f(t_k)$ is the forecast ensemble mean, given by
$$\bar{x}^f(t_k) = \frac{1}{n} \sum_{i=1}^{n} x^f_i(t_k) \qquad (7.15)$$
Note that in practice we do not need to compute the full covariance matrix $P^f$. Instead, it is sufficient to compute $P^f H^\top$ and $H P^f H^\top$. For a reasonable number of observations, this can reduce the computational burden considerably.
A common problem when working with a small number of ensemble members is spurious correlation at distant locations due to sampling error. To work around this problem, covariance localization should be applied. This can be done by taking the Schur product of the empirical covariances with an analytical correlation function $\rho$ with local support (Gaspari and Cohn 1999). The Schur product is defined by
$$(\rho \circ P^f)_{ij} = \rho(i, j)\, P^f_{ij} \qquad (7.16)$$
and the analytical correlation function used in this study is the Gaussian function
$$\rho(z, L) = \exp\left(-\frac{z^2}{2L^2}\right) \qquad (7.17)$$
where $z$ is the distance between a location and a point of reference and $L$ is the decorrelation length.
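To illustrate how Equations (7.13)-(7.17) can be combined in practice, the following is a minimal Python/NumPy sketch, assuming (as noted above) that only $P^f H^\top$ and $H P^f H^\top$ are formed. The array layout, the function names, and the way the distances are supplied are our own assumptions, not the operational implementation.

```python
import numpy as np

def gaussian_localization(dist, L):
    """Eq. (7.17): rho(z, L) = exp(-z^2 / (2 L^2)), applied elementwise to a distance array."""
    return np.exp(-dist**2 / (2.0 * L**2))

def estimate_gain(xf, H, R, dist_state_obs, dist_obs_obs, L):
    """Estimate the gain for one time window (Eqs. 7.13-7.17).

    xf             : array (N, n, p) -- forecasts: N time steps, n members, p state variables
    H              : array (m, p)    -- observation operator
    R              : array (m, m)    -- observation error covariance
    dist_state_obs : array (p, m)    -- distances between state points and observation locations
    dist_obs_obs   : array (m, m)    -- distances between observation locations
    L              : float           -- decorrelation length
    """
    N, n, p = xf.shape
    m = H.shape[0]
    PfHt = np.zeros((p, m))
    HPfHt = np.zeros((m, m))
    for k in range(N):                            # averaging over time, Eq. (7.14)
        dev = xf[k] - xf[k].mean(axis=0)          # (n, p) deviations from the ensemble mean
        devH = dev @ H.T                          # (n, m) deviations mapped to observation space
        PfHt += dev.T @ devH / (n - 1)
        HPfHt += devH.T @ devH / (n - 1)
    PfHt /= N
    HPfHt /= N
    # Schur-product localization, Eq. (7.16), applied in observation space
    PfHt *= gaussian_localization(dist_state_obs, L)
    HPfHt *= gaussian_localization(dist_obs_obs, L)
    # Analysis-variance-minimizing gain, Eq. (7.13)
    return PfHt @ np.linalg.inv(HPfHt + R)
```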
7.2.3 Summary
The analysis procedure consists of splitting the time period into several short time windows, where at each time window, the forecast error covariance can be assumed to be
constant. At each time window, the analysis is performed on each ensemble member using the algorithm above. After the analysis, the analysis ensemble at the end of each time
window is used as the initial conditions for the model at the next time window. To summarize, for each time window, the data assimilation procedure can be described step by step as follows:
• Initialization:
– compute and store the input forcing average using Equation (7.4): $u(t_k)$, $k = 1, \ldots, N$
– compute and store the approximate white noise processes for each ensemble member using Equation (7.6): $\tilde{\eta}_i(t_k)$, $i = 1, \ldots, n$, $k = 1, \ldots, N$
– specify an initial guess of the Kalman gain $K$
• Closed-loop step:
– insert $K$ into Equation (7.10)
– generate $n$ realizations of $X^f(t_k)$ of the dynamics (7.9)-(7.10) using $u(t_k)$ and $\tilde{\eta}_i(t_k)$: $x^f_i(t_k)$, $i = 1, \ldots, n$, $k = 1, \ldots, N$
– estimate $P^f$ using Equation (7.14)
– estimate $K$ using Equation (7.13)
– repeat the closed-loop step until $P^f$, and therefore also $K$, have converged
In our experience, the iteration converges after three or four steps. But in practice, it
may not be necessary to iterate the procedure until convergence is reached. The solutions
from intermediate iteration steps may also give satisfactory results.
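A possible structure for the closed-loop iteration is sketched below; `run_analysis_cycle` and `estimate_gain_from_forecasts` stand for user-supplied routines (for example, built on the sketches above) and are not part of the thesis code.

```python
import numpy as np

def iterate_gain(run_analysis_cycle, estimate_gain_from_forecasts, K0, max_iter=5, tol=1e-3):
    """Closed-loop iteration: run the window with the current gain, re-estimate the gain
    from the resulting forecast ensemble, and repeat until the gain has converged."""
    K = K0
    for _ in range(max_iter):
        xf = run_analysis_cycle(K)                # n realizations of X^f over the window, Eqs. (7.9)-(7.10)
        K_new = estimate_gain_from_forecasts(xf)  # Eqs. (7.13)-(7.14)
        if np.max(np.abs(K_new - K)) < tol:       # simple convergence check on the gain
            return K_new
        K = K_new
    return K
```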
The analysis system described above can be seen as a generalization of the ensemble
Kalman filter (Evensen 1994). When the ensemble size is sufficiently large, the averaging
can be performed over the ensemble at a single time. Hence, it reduces to the EnKF. This
procedure also shares similarities with the method proposed by Xu et al. (2008a, 2008b), in the sense that it uses an ensemble collected over an extended time to compute the forecast error covariance.
The data assimilation procedure described here is also similar to the one described
by Sørensen et al. (2004b), in that both procedures are based on the assumption that the covariance is only slowly varying in time, and hence so is the gain matrix. The difference is that, in their work, the (weighted) averaging is done in terms of the gain matrix, between the instantaneous gain matrix and the smoothed gain matrix corresponding to the previous time step (time regularization). In our algorithm, the averaging is done over the covariance estimates within a time window.
7.3 Setup of experiment
The experiment setup used in this study consists of the DCSM as storm surge forecast
model, an ensemble of 10 m wind speed forecasts produced by LAMEPS as input forcing,
and in-situ observations of water level for both data assimilation and validation of the
resulting storm surge forecast ensemble. The period of the experiment is from August 2007 until January 2008. This section briefly describes each of these components.
7.3.1 The Model
The results presented in this paper are obtained by using the Dutch Continental Shelf
Model (DCSM). This model is used as one of the main components of the storm surge
forecasting system, which is operational at the Dutch Royal Meteorological Institute,
KNMI. The development of the model was started in the late 1980s (Verboom et al. 1992;
Gerritsen et al. 1995). The DCSM is based on the two dimensional shallow water equations, which describe the large-scale motion of water level and depth-integrated horizontal
flow. The time step for numerical integration of the shallow water equations is 10 minutes.
The implementation of the DCSM is based on the software package WAQUA, which is developed based on the works of Leendertse (1967) and Stelling (1984). The DCSM covers the area of the northwest European continental shelf to at least the 200 m depth contour, i.e. 12°W to 13°E and 48°N to 62.33°N (Figure 3.1). The resolution of the spherical grid is 1/8° × 1/12°.
The DCSM is driven by two input forcings: one at the open boundaries and the other
on the sea surface. At the open boundaries, the tidal constants for ten tidal constituents,
M2, S2, N2, K2, O1, K1, Q1, P1, ν2, and L2 are specified. These tidal components were
estimated on the basis of results from encompassing models, matched with nearby coastal
and pelagic tidal data. The prescription of ten tidal constituents along the sea boundary
gives a sufficiently detailed tidal input for tidal simulations in the North Sea, including
the nonlinear Southern part (Gerritsen et al. 1995).
The input forcing on the sea surface is expressed in terms of wind stress and computed
from 10 m wind speed data. In this study, we use an ensemble of 10 m wind speed data,
generated by the Norwegian Meteorological Institute LAMEPS, as input forcing to the
DCSM.
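For reference, wind stress is commonly related to the 10 m wind through a quadratic drag law of the form
$$\boldsymbol{\tau} = \rho_a \, C_d \, |\mathbf{U}_{10}| \, \mathbf{U}_{10},$$
where $\rho_a$ is the air density, $C_d$ a (possibly wind-speed dependent) drag coefficient, and $\mathbf{U}_{10}$ the 10 m wind vector. The specific drag coefficient formulation used for the DCSM is not detailed here, so this expression should be read only as a generic illustration.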
7.3.2 10 m wind speed forecasts ensemble
The ensemble of wind input used in this study is obtained from the ensemble weather forecasting system, called LAMEPS, which is operational at the Norwegian Meteorological
Institute. It is an ensemble of the Norwegian version of the limited-area model HIRLAM.
LAMEPS consists of 20 perturbed forecast members and one unperturbed (control) forecast. The horizontal resolution of the model is 0.25 degrees or approximately 28 km, and
the model has 31 vertical levels. The area covered by LAMEPS is depicted in Figure 7.1.
LAMEPS is the result of continuing and improving the targeted ensemble prediction system, TEPS (Frogner and Iversen 2001). In TEPS, the ensemble members are generated by maximizing the total energy norm of perturbations in a specific targeted area
of interest: northern Europe and parts of the north Atlantic Ocean (Frogner and Iversen
2001). It uses a singular vector (SV)-based method to identify the directions of fastest
growth (Buizza and Palmer 1995). Singular vectors maximize growth over a finite time
interval and are consequently expected to dominate forecast errors at the end of that interval and possibly beyond. Here the optimisation interval is 48 h (Frogner and Iversen
2001). LAMEPS uses the same perturbations as TEPS, but it uses a forecast model with
finer spatial resolution. The reason for using a finer-resolution model is to obtain better
descriptions of ground-surface properties and dynamical features responsible for significant weather (Frogner and Iversen 2002). It is hoped that a mesoscale LAMEPS utilizing
global perturbations targeted on northern Europe would enable better forecasts of possible extreme events compared to TEPS or pure deterministic forecasts. The development
of LAMEPS was encouraged by a preliminary feasibility study by Frogner and Iversen
(2002).
Figure 7.1: LAMEPS and DCSM area.
Each LAMEPS member is generated by perturbing only the initial and lateral boundary fields of the HIRLAM analysis. The perturbations are taken from the difference of
each TEPS member relative to the TEPS control. These are provided once every day at
12 UTC. However, LAMEPS is started with a time-lag of 6 hours from these data due to a
delay in producing and disseminating them. The run at 18 UTC has a forecast length of 60
hours with lateral boundary conditions from TEPS used every third hour (Frogner 2008,
personal communication).
Some studies have been done to evaluate the quality of the ensemble in terms of
precipitation rates, mean-sea-level pressure, geopotential height, two-metre temperature
(Frogner and Iversen 2002; Frogner et al. 2006). In our study, we are interested especially
in the 10 m wind speed output of LAMEPS to be used as input to the DCSM. Although no study has been published yet regarding the verification of LAMEPS with respect to wind, LAMEPS has been operational and routine verification of the wind is done every 3 months.
Shown also in Figure 7.1 is the DCSM area. To use the ensemble wind output of the
LAMEPS, the wind field is projected to the DCSM area using a routine available at the
Norwegian Meteorological Institute. For the part of the DCSM area that is covered by the LAMEPS area, the wind speed is interpolated linearly to the DCSM grid. For the part of the DCSM area outside the LAMEPS area, which resides in the lower left and right corners of the DCSM area, the wind field is determined by Archimedean extrapolation. It is an iterative
method, where the wind field in the area of extrapolation is initialized by the average
value of the wind field in the active area. The estimate is then improved by an iterative
procedure by making use of the value of the wind field nearby the respective points in the
extrapolation area. The extrapolation is likely to introduce more uncertainty in the wind
field. However, it is not expected to have a significant influence on the area of our interest,
which is around the Dutch coast.
LAMEPS data is available with a time interval of six hours, while the DCSM is integrated with a time step of 10 min. To provide wind data with a time interval of 10 min, the resulting wind stress is interpolated linearly in time. The interpolation is performed internally by the simulation software used to run the DCSM.
7.3.3 Water-level data
Observed water level for this study is available from 16 stations. Some stations are located
along the British coast, others along the Dutch coast, and some stations are located offshore. The water level is measured every 10 minutes at all locations. The observed water
level data is preprocessed to eliminate erroneous data before being used for evaluating the
model performance as well as for data assimilation. For the experiments described later,
eight stations are used as assimilation stations, where data is used in data assimilation.
The other stations are used as validation stations, where data is not assimilated but used
only for validation (Figure 5.2).
We note here that the aim of the model is to give an accurate prediction of the storm surge. The observed water level, however, gives the total water level, which is the combination of the tidal component and the storm surge. If the observations are used as they
are, the data assimilation scheme will also correct for errors in the astronomical tide. In
turn, this correction will disturb the surge. In order to reduce the effect of the tidal component on the data assimilation and evaluation, we remove the tidal component from the
observations by replacing the harmonic analysis tide in the measurement locations with
the astronomical tide as calculated by the DCSM (Gerritsen et al. 1995). By astronomical
tide calculated by the DCSM, we mean the periodic simulated water level produced by
running the DCSM without any wind forcing. These adjusted water level observations
are used for data assimilation as well as evaluation of the resulting forecasts ensemble
throughout this study.
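A minimal sketch of this adjustment, assuming it amounts to subtracting the harmonically analysed tide from the observed total water level and adding back the DCSM-computed astronomical tide (the function and array names are illustrative only):

```python
import numpy as np

def adjust_observed_water_level(obs_total, harmonic_tide, dcsm_astro_tide):
    """Replace the harmonic-analysis tide in the observations by the astronomical tide
    as computed by the DCSM (run without wind forcing), so that the adjusted
    observations are consistent with the model's own tidal signal.

    All inputs are water-level time series (1-D arrays) at a single station."""
    return np.asarray(obs_total) - np.asarray(harmonic_tide) + np.asarray(dcsm_astro_tide)
```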
7.4 Wind stress error statistics
We would like to use the LAMEPS 10 m wind speed to determine the forecast error
statistics of the DCSM. The 10 m wind speed is used to compute the wind stress as input
forcing to the DCSM. This section presents the resulting estimates of these statistics. The 10 m wind speed ensemble used in this study is the one for the period August 2007 - January 2008.
In this study we focus on determining the error covariance by using the resulting storm surge forecast ensemble. Nevertheless, we also need to estimate the correlation-time parameter of the wind error from the wind stress ensemble. Ideally, the correlation-time parameter should be determined at each grid point in the model domain and recomputed at different times according to the flow. However, in this study we assume that the correlation-time parameter is spatially uniform and constant in time. By using the algorithm described in Section 7.2.1, we have computed the estimated colored-noise realizations for the whole period where data is available in this experiment. It is now possible to compute the empirical correlation function by using these realizations. The correlation-time parameter is estimated by curve-fitting the correlation function of a theoretical AR(1) process to the empirical one. The procedure is repeated for each grid point in the model domain. The spatial average of the correlation-time parameter is found to be about five hours and will be used in the subsequent experiments. Figure 7.2 shows a typical autocorrelation function at one location above the sea area of the DCSM.
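A minimal sketch of this curve fit, assuming the AR(1) autocorrelation can be written as $\exp(-|\tau|/T)$ with correlation time $T$, and using SciPy's `curve_fit` (our own choice of tooling, not necessarily the routine used in this study):

```python
import numpy as np
from scipy.optimize import curve_fit

def empirical_autocorrelation(w, max_lag):
    """Empirical autocorrelation of a noise time series w for lags 0..max_lag."""
    w = np.asarray(w, dtype=float)
    w = w - w.mean()
    var = np.mean(w * w)
    return np.array([np.mean(w[:len(w) - lag] * w[lag:]) / var for lag in range(max_lag + 1)])

def fit_correlation_time(w, dt, max_lag):
    """Fit exp(-lag*dt / T) to the empirical autocorrelation and return the
    correlation time T, in the same units as dt."""
    lags = np.arange(max_lag + 1) * dt
    rho_emp = empirical_autocorrelation(w, max_lag)
    model = lambda tau, T: np.exp(-tau / T)
    (T_hat,), _ = curve_fit(model, lags, rho_emp, p0=[5.0])  # e.g. initial guess of 5 h if dt is in hours
    return T_hat
```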
Figure 7.2: Autocorrelation function of the wind noise at one location above the sea area
of DCSM, (a) u-direction and (b) v-direction. The full-line is the empirical autocorrelation
function, while the dashed-line is the autocorrelation function of an AR(1) process with
correlation-time about 5 hours.
The error covariance estimate is computed using the ensemble within a shorter time window, which corresponds to the data assimilation period, and it is recomputed for different time windows. One question to be answered in this step is how long the time window should be for computing such an estimate. Our interest is to produce a covariance estimate which reflects the true error covariance. While the true statistic is unknown, it is known to be dependent on the actual system state (Dee 1995). Ideally, we would like to use the ensemble only at a single time to estimate the error covariance at that time, as in the EnKF (Evensen 1994). However, for a small number of ensemble members like in our case, this
(Evensen 1994). However, for small number of ensemble members like in our case, this
approach suffers from sampling error. Basically, the time-window should be sufficiently
short to capture the actual change in the forecast error covariance, but should be long
enough to reduce sampling error.
Figure 7.3 illustrates the effect of using an ensemble within an extended time on the
spatial correlation estimate of the wind stress error. It shows some examples of wind stress
error covariance estimates obtained using a single-time ensemble and the ones obtained
using the wind stress ensemble in a 24 h time window. The estimates obtained using a single-time ensemble have a complex structure, where the isolines are tightly packed. Moreover, they show significantly non-zero correlations at distant locations away from the reference point, which are probably spurious correlations due to sampling error. On the other hand, the estimates obtained using the ensemble from a 24 h interval have a much smoother structure. The area with non-zero correlation is more localized around the reference point and the correlation decreases with distance from the reference point. Moreover, the short-duration structures at distant locations have largely been removed as well. Nevertheless, for this example, there are still non-zero correlations at locations distant from the reference point for the meridional direction. This indicates the need for a localization technique to confine the area with non-zero correlations to a limited area around the
reference point.
Figure 7.3 also illustrates that the spatial correlation of the wind stress error has a different structure for the zonal and meridional directions. Moreover, we have estimated the wind stress error covariance at different time windows and observed that the estimates evolve in time (see, for example, Figure 7.4). All of this differs from the operational system, where the error covariance is assumed to be isotropic and constant in time. The question remains whether it will give more accurate analyses and, in turn, more accurate forecasts. We will examine this in the next sections.
7.5 Hindcast experiment
We have performed a hindcast experiment using the procedure and data described earlier.
This section presents and discusses the results of the experiment. The period of the experiment is from 01 August 2007 until 31 January 2008. The DCSM is run starting from a rest situation. To eliminate the spin-up effect, the DCSM is run for 17 days before the hindcast experiment is started. The model is run by forcing the DCSM with each member of the 10 m wind speed ensemble. Therefore, at the end of day 17, there are 21 model states, one for each ensemble member, serving as the starting points for the hindcast experiments. Note that, except for the control run, the perturbed members of LAMEPS are actually disconnected in time. Each perturbed member of the wind ensemble for driving
Figure 7.3: Spatial correlation of the wind noise with respect to a location at Hoek
van Holland. Figure (a) and (b) are obtained using a single time ensemble at 18:00, 07
September 2007, while (c) and (d) obtained using ensemble between 18:00, 07 September
2007 - 18:00, 08 September 2007. Figure (a) and (c) are for zonal direction, while (b) and
(d) meridional. Contour interval is 0.2.
the DCSM from rest is obtained by connecting the first 18 h segment, starting from the respective analysis time, of that member over the whole 17-day initialization period. This procedure potentially produces unrealistic jumps in the resulting wind time series at the connecting times. However, since the length of each segment is reasonably short, we expect that this discontinuity will not be too large. In fact, we have observed that this procedure produces a smooth ensemble of DCSM outputs. Moreover, connecting the wind field at 18 h to the initial wind ensemble of the next forecast period is found to reduce the bias in the wind time series significantly, which grows larger otherwise.
The water level observational error standard deviation is set to 0.10 m, which is equal
Figure 7.4: Spatial correlation of the wind noise with respect to a location at Hoek
van Holland. Figure (a) and (b) are obtained using a single time ensemble at 18:00, 12
September 2007, while (c) and (d) obtained using ensemble between 18:00, 12 September
2007 - 18:00, 13 September 2007. Figure (a) and (c) are for zonal direction, while (b) and
(d) meridional. Contour interval is 0.2.
to the one used in the operational system. The time window for computing the forecast
error covariance is chosen to be 24 h. Because the LAMEPS is analyzed once per day
at 18:00, the time window is chosen to run from 18:00 on one day until 18:00 on the
next day. At each time window, the observation data is assimilated at each time step. For
covariance localization, the decorrelation length is set to 200 km. This value is chosen to
leave the error covariance structure, obtained empirically, around the respective reference
point unchanged, while the covariance at distant locations is reduced to practically zero.
Ideally, the data assimilation system should produce an analysis ensemble with minimum analysis ensemble mean error and an ensemble spread that can represent the statistics of this error. However, since we are working with real data, we do not have any access
to the truth. Therefore, it is impossible to check the error statistics. An alternative way
for ensemble verification, which is common in practice, is to verify it against analysis
data. In our study, however, we choose to verify the ensemble against observation data.
Verification against observation has several advantages. Unlike the verification against
analysis, verification against observation avoids being affected by the local systematic
errors of the analysis. It also provides a validation that is more independent from the
forecast and analysis systems.
Different procedures are performed for evaluating the properties of the analysis ensemble. The first one is to evaluate the resulting ensemble spread time series, to check the effect of the iteration in computing the Kalman gain and to see whether there is any indication that the ensemble would collapse. The second evaluation is a standard check of Kalman filter performance, by checking whether the water level innovations have the same statistics as the theoretical ones (Maybeck 1979). The other checks evaluate the two most basic quantities of an ensemble system: spread and skill (Buizza and Palmer 1998). The ensemble skill is defined in terms of the root-mean-square (rms) water-level residual. Similarly, the ensemble spread is measured as the rms of the difference between the ensemble members and the ensemble mean. We check the reliability of the analysis ensemble by using rank histograms (Hamill 2001). The final check is performed in terms of the root mean square (rms) of the water level residual.
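As a minimal sketch, these two quantities could be computed from an ensemble of water-level time series at one station as follows (the array layout is our own assumption):

```python
import numpy as np

def ensemble_skill_and_spread(ensemble, observations):
    """Skill: rms of the difference between the ensemble mean and the observations.
    Spread: rms of the difference between the members and the ensemble mean.

    ensemble     : array (T, n) -- n members at T verification times
    observations : array (T,)"""
    mean = ensemble.mean(axis=1)
    skill = np.sqrt(np.mean((mean - observations) ** 2))
    spread = np.sqrt(np.mean((ensemble - mean[:, None]) ** 2))
    return skill, spread
```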
To check the effect of the observation error standard deviation on the resulting analysis ensemble, the experiment and evaluation are performed twice, each with a different value of the observation error standard deviation. The values used here are σobs = 10 cm and σobs = 5 cm. In each experiment, the observation error standard deviation is set the same for all assimilation stations.
7.5.1 Time series of ensemble spread
Before we perform any data assimilation experiment, we first check the characteristics of the resulting storm surge ensemble obtained by forcing the DCSM with each member of the LAMEPS wind. Figure 7.5 shows the evolution of the water level ensemble spread in time on top of a plot of the surge observed at Hoek van Holland. The observed surge is obtained by subtracting the astronomical tide component from the observed total water level. For this presentation, the observed surge is segmented into 24 h time intervals as used in the hindcast experiment and the average over each 24 h interval is shown. To give better visibility of the ensemble spread variation, the observed surge is scaled by a factor of 0.2. The first characteristic that can be seen from this figure is that the resulting storm surge ensemble has a spread which varies in time. As expected, this is in contrast with the operational system, where the error variance is assumed to be constant in time. The second thing that this figure suggests is that the ensemble spread is large at times when the observed surge is large and positive. The correlation between the ensemble spread and the observed surge at different locations is found to be around 0.6. This indicates that large
spread tends to occur at times of large positive surge.
Figure 7.5: Timeseries of water-level ensemble spread (std) without data assimilation
averaged over all observation stations and observed surge at Hoek van Holland scaled by
a factor of 0.2.
The assimilation of observation data should reduce the uncertainty of model results,
as suggested by the Kalman filter equation. For our case, this should be reflected by the
reduction of the ensemble spread after data assimilation. Figure 7.6 shows the time series of ensemble spread before and after data assimilation using Kalman gains estimated from different iterations. This figure clearly shows that the ensemble spread is reduced after data assimilation consistently throughout the run. The reduction is more significant at times when the ensemble spread is large. It suggests that the more uncertain the model result, the more significant the effect of the assimilated observations is in reducing the analysis uncertainty. As expected from theory, the Kalman filter will put more weight on the observations if the model result is more uncertain. This will result in a larger correction to
observation if the model result is more uncertain. This will result in larger correction on
the model result for producing the analysis ensemble. Moreover, the ensemble spread
is reduced further by data assimilation using the Kalman gain from the first and second
closed-loop steps. This suggests that iterating the estimation of the forecast error covariance, and hence also the Kalman gain, helps reduce the uncertainty of the analysis. Further, Figure 7.6 also shows that there is no indication that the ensemble would collapse. This suggests that the LAMEPS wind stresses are effective in maintaining the DCSM forecast uncertainty at a reasonable level.
Figure 7.6: Time series of water-level ensemble spread averaged over all observation
stations. Each line, with decreasing line-width, represents the ensemble spread without
data assimilation, with data assimilation using open-loop gain, 1st closed-loop gain and
2nd closed-loop gain, respectively.
7.5.2 Innovations statistics
One way of checking if a Kalman filter performs well is to check if the innovation realizations have the same statistical descriptions as the theoretical ones. This is sometimes
called residual monitoring (Maybeck 1979). From the Kalman filter equations, it can be
shown that the sequence of innovations has zero mean and variance $\sigma^2 = H P^f H^\top + R$.
Since the Kalman filter is based on the assumption that all errors are normally distributed, we can check the Kalman filter performance by simply checking the sequence of the innovations: about 68% of the samples should lie within the 1σ bound, 95% within the 2σ bound,
etc.
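A minimal sketch of this residual-monitoring check (the array names are illustrative; the innovations and their theoretical standard deviation are assumed to be available per time step):

```python
import numpy as np

def innovation_coverage(innovations, sigma, n_sigma=1):
    """Fraction of innovations lying within +/- n_sigma * sigma. For a well-tuned filter
    with Gaussian errors this should be roughly 68% for n_sigma=1 and 95% for n_sigma=2."""
    innovations = np.asarray(innovations)
    sigma = np.asarray(sigma)
    return np.mean(np.abs(innovations) <= n_sigma * sigma)
```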
We have computed the innovation realizations as the difference between ensemble
mean and observation. Figure 7.7 shows the sequence of the water level innovations and the 1σ bound within a small portion of the experiment period for one of the assimilation stations, obtained with σobs = 10 cm. Also shown in the picture is the percentage of the residuals lying within the 1σ bound over the whole period of the experiment. The effect of data assimilation is seen to reduce the bias in the innovation sequence and to bring more of the innovations within the 1σ bound. After data assimilation, the percentage of innovation samples within the 1σ bound reaches more than 90%, which is larger than it should be. This indicates that some parameters may not have been optimally tuned.
Figure 7.8 shows the innovation sequence and the 1σ bound obtained with σobs = 5
cm. Here, we see that the realizations of the innovation agree better with the
theoretical statistics. This indicates that in the first experiment, the observation error
standard deviation σobs was too large and that the performance of the Kalman filter can be
improved by reducing σobs .
7.5.3 Rank histograms
Another diagnostic for ensemble reliability is the rank histogram (Hamill 2001). Rank
histograms are generated by repeatedly tallying the rank of the verifying observation, at
each time step, relative to values from an ensemble sorted from the lowest to the highest.
The rank histogram permits a quick examination of some qualities of the ensemble. A
reliable ensemble will show up as a flat rank histogram. Consistent biases in the ensemble
will show up as a sloped rank histogram, while a lack of variability in the ensemble
will show up as a U-shaped, or concave, population of the ranks. On the other hand, if
the ensemble dispersion is too large, then the observations will populate the middle ranks more than the extremes. Non-uniform rank histograms imply that the ensemble relative
frequency will be a poor approximation to probability (Wilks 2006).
As pointed out by Hamill (2001), Sætra et al. (2004), and Candille et al. (2007), in general, verification against uncertain observation data requires the inclusion of the observation uncertainty. Hamill (2001) specifically indicated that the shape of the rank histogram
is affected by the observation noise as well. In practice, imperfect observations, contaminated by instrumentation error for example, will be used for verification. Moreover,
even if the instrumentation error is small, there is always representativeness error, which
causes the model results to deviate from the observation. Hamill (2001) suggested that
if the observational errors are a significant fraction of the spread in the ensemble, then
rank histograms should not be generated by ranking the observation relative to the sorted
ensemble. Instead, the rank histogram should be generated by ranking the observation relative to a sorted ensemble with random observational noise added to each member. The
Figure 7.7: Sequence of water level innovation and the +/- 1σ interval for assimilation
station at Hoek van Holland, with σobs = 10 cm: (a) without data assimilation, with
data assimilation using (b) open-loop, (c) 1st closed-loop and (d) 2nd closed-loop Kalman
gains.
specification of the observation noise standard deviation should not be a problem, since it
is already specified and used for the data assimilation system anyway. Following this suggestion, we perturbed each ensemble member with random noise drawn from observation
error distribution.
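A minimal sketch of the rank-histogram computation with observation noise added to each member, following this suggestion (the array layout and random-number handling are our own illustrative choices):

```python
import numpy as np

def rank_histogram(ensemble, observations, sigma_obs, rng=None):
    """Tally the rank of each observation within the sorted, noise-perturbed ensemble.

    ensemble     : array (T, n) -- n members at T verification times
    observations : array (T,)
    sigma_obs    : float        -- observation error standard deviation
    Returns relative frequencies for the n+1 possible ranks."""
    rng = np.random.default_rng() if rng is None else rng
    perturbed = ensemble + rng.normal(0.0, sigma_obs, size=ensemble.shape)
    ranks = np.sum(perturbed < observations[:, None], axis=1)   # rank of each observation, 0..n
    counts = np.bincount(ranks, minlength=ensemble.shape[1] + 1)
    return counts / counts.sum()
```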
The resulting rank histograms obtained with σobs = 10 cm are presented in Figure 7.9. The rank histograms show that most of the observations are within the span of the ensemble. Without data assimilation, however, the observations populate the lower ranks more. For the period after November 2007, the rank histogram clearly indicates a negative bias, where the left extreme is overpopulated and the population decreases with increasing rank. The rank histograms for the ensemble after data assimilation suggest that the effect
Figure 7.8: As Fig. 7.7, with σobs = 5 cm.
of data assimilation is to reduce the bias between observation and model results. This
is clearly illustrated for the histograms associated with data after November 2007. The
overpopulated extreme ranks are shifted to populate the middle ranks after data assimilation. There is not much difference between the patterns of the rank histograms after data assimilation using the Kalman gain obtained from the open-loop and 1st closed-loop iterations. After data assimilation, the rank histograms have a bell-like shape, which suggests that the ensemble spread is too large.
The rank histograms of the ensemble analysis with σobs = 5 cm have a different shape from those with σobs = 10 cm, as shown in Figure 7.10. Without data assimilation, more observations now fall in the extreme ranks, the extreme lower rank being more populated than the upper rank. The effect of data assimilation can again be seen to reduce the bias by bringing more observations to the middle ranks. The figures also show that implementing gain estimates from later iterations produces rank histograms,
Figure 7.9: Rank histograms for water level aggregated over all observations, with
σobs = 10 cm: (a) without data assimilation, with data assimilation using (b) open-loop,
(c) 1st closed-loop, and (d) 2nd closed-loop Kalman gains.
which are closer to flat. However, the rank histograms still contain overpopulated extreme ranks, which indicate a possible bias in the model. Some evaluations indicate that this may be due to the fact that the performance of the DCSM differs between locations. The area around the northern part of the Dutch coast, for example, is known to be difficult to reconstruct with the present grid size used in the DCSM. This may have caused the remaining bias indicated by the rank histograms.
7.5.4 Water level residual rms
A final check of the performance of the analysis system is to check the actual water level
residual, computed as the difference between observed data and the ensemble mean. A
good analysis system should produce an ensemble analysis with rms residual, which is
smaller than the one without data assimilation, at all locations. Here, we also check the
effect of data assimilation using the gain matrix estimated from different iterations. To
gain more insight about the performance of the analysis system, we split the presentation
Figure 7.10: As Fig. 7.9, with σobs = 5 cm. Note that the figure in the first panel has a different scale from the others.
into two groups: assimilation stations and validation stations.
Figure 7.11 shows the rms residual at the assimilation stations obtained with σobs = 10 cm. Also shown in the figure is the rms residual of the deterministic forecast, obtained by running the DCSM with the control run of LAMEPS. From this figure we see that there is no significant difference between the analyses based on the deterministic and the ensemble system. The advantage of using an ensemble system compared to a deterministic one, as pointed out by Leith (1974), is not found here. A discussion of this is given in Section 7.7.
After data assimilation, the rms residual decreases significantly, showing that the assimilation system has successfully steered the model closer to the observed data. The figure also shows that the smallest rms residual is obtained by the assimilation system using the open-loop gain. The rms residual increases, with smaller increments, for the assimilation system using gains from later iterations. The analysis algorithm in Section 7.2 shows that the uncertainty of the model results is reduced after each iteration. After each iteration, this results in an assimilation system that puts less and less weight on the observed data.
Therefore, the rms water-level residual increases with iterations.
Figure 7.11: RMS of water-level residual at assimilation stations, with σobs = 10 cm.
Figure 7.12: RMS of water-level residual at validation stations, with σobs = 10 cm.
The rms residual at the validation stations obtained with σobs = 10 cm, presented in Figure 7.12, shows that the data assimilation system has also successfully forced the model to be closer to the observed data at the validation stations. This indicates that the covariance
structure estimated by using the procedure described in this paper is spatially consistent
so that it distributes the innovations at assimilation stations consistently to other locations
as well. There is not much difference in the rms residual obtained using data assimilation
systems with the gain matrix from different iterations, except at the stations located in the
northern part of the Dutch coast, i.e. Harlingen, Huibertgat, and Delfzijl.
The results obtained with σobs = 5 cm are shown in Figures (7.13) and (7.14) for assimilation and validation stations, respectively. Here, we see again that the analysis system has successfully steered the model closer to the observations at all stations. At the assimilation stations, the rms residual with data assimilation is smaller than that obtained with σobs = 10 cm. The reason for this is that, with a smaller standard deviation of the observation error, the analysis system puts more weight on the observations. As a result, it pulls the model even closer to the data and therefore produces a smaller rms residual. However, at the validation stations, the rms residual obtained with assimilation using the open-loop gain is larger than that with σobs = 10 cm. The rms residual decreases as gain estimates from later iterations are used for assimilation. In the end, using the Kalman gain from the 2nd closed-loop step for data assimilation, there is not much difference between the system with σobs = 10 cm and the one with σobs = 5 cm. This indicates that for a smaller observation error standard deviation, the iterations become important for obtaining a better analysis accuracy.
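The effect of σobs on the analysis can likewise be seen from a scalar update x_a = x_f + K (y - x_f) with K = σf²/(σf² + σobs²): a smaller observation error variance yields a larger gain and an analysis closer to the data. All numbers below are hypothetical.

x_f, y = 1.00, 1.20            # hypothetical forecast and observation [m]
sigma_f = 0.08                 # hypothetical forecast error std [m]
for sigma_obs in (0.10, 0.05):
    K = sigma_f**2 / (sigma_f**2 + sigma_obs**2)
    x_a = x_f + K * (y - x_f)
    print(f"sigma_obs={sigma_obs:.2f}  K={K:.2f}  analysis={x_a:.3f} m")
# sigma_obs=0.10 gives K=0.39 and an analysis of about 1.08 m;
# sigma_obs=0.05 gives K=0.72 and an analysis of about 1.14 m, closer to the observation.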
Figure 7.13: RMS of water-level residual at assimilation stations, with σobs = 0.05 m.
7.5.5 Discussion
For the system used in our study, it is traditional to set the observational error standard
deviation to σobs = 10 cm. This value has been used for the operational data assimilation
system with DCSM. It covers both the instrumentation error and the representativeness error. Moreover, it was also chosen to improve the computational conditioning of the
data assimilation scheme. Perturbation of each ensemble member with observation error
is also required for ensemble data assimilation to avoid underestimation of the analysis
error covariance matrix.

Figure 7.14: RMS of water-level residual at validation stations, with σobs = 0.05 m.
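The role of this perturbation can be made explicit in a generic perturbed-observation ensemble analysis step. The sketch below is a standard textbook formulation with assumed array shapes, not the exact implementation used with the DCSM; without the perturbation of the observations (or, equivalently, of the members), the analysis ensemble spread would be systematically too small.

import numpy as np

def perturbed_obs_analysis(X_f, y, H, sigma_obs, rng):
    # X_f : forecast ensemble, shape (n_state, n_members)
    # y   : observations, shape (n_obs,)
    # H   : observation operator, shape (n_obs, n_state)
    n_state, n_members = X_f.shape
    n_obs = y.size
    R = sigma_obs**2 * np.eye(n_obs)

    A = X_f - X_f.mean(axis=1, keepdims=True)             # ensemble anomalies
    P_f = A @ A.T / (n_members - 1)                       # sample forecast covariance
    K = P_f @ H.T @ np.linalg.inv(H @ P_f @ H.T + R)      # Kalman gain

    # each member is updated with its own perturbed observation
    Y = y[:, None] + sigma_obs * rng.standard_normal((n_obs, n_members))
    return X_f + K @ (Y - H @ X_f)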
As pointed out earlier, a good analysis system is one that produces an analysis ensemble with a small ensemble-mean rms residual and with a spread that represents the statistics of the residual. We have seen that the analysis systems implemented in this study have successfully reduced the rms residual at all locations. The analysis systems have also been shown to reduce the bias between model and observations. However, the diagnostics used to check the reliability of the analysis ensemble indicate that the ensemble is overdispersive. On the other hand, as also argued by Leeuwenburgh (2007), this finding may also indicate that the specified observation noise variance is too large. In fact, considering the size of the ensemble spread in Figure 7.6, it can be seen that the observation noise is the dominant contribution to the total spread of the ensemble used in the verification.

We have therefore also performed an experiment with σobs = 5 cm. The resulting ensemble is shown to have a better reliability property, indicated by a flat rank histogram. However, the rank histograms also indicate that the ensemble is still biased. Nevertheless, the results presented in the remainder of this paper are the ones obtained using σobs = 5 cm.
7.6 Forecast experiment
We have also performed forecast experiments using the DCSM as the storm surge forecast
model and the forecast wind field of LAMEPS as input forcing ensemble. Here we are
interested in checking the effect of data assimilation on the forecast performance. Like
in the operational system, the data assimilation is used to provide more accurate initial
conditions for forecast runs. A more accurate initial condition is naturally expected to produce a more accurate prediction, at least for short-term forecasts. This section describes and discusses the forecast performance of the ensemble system studied in this paper, in terms of ensemble spread and skill.

Following the fact that LAMEPS is analyzed once per day at 18:00, the forecast runs in this experiment are also performed once a day, starting from 18:00 and extending 60 h ahead. A 24 h hindcast run is performed for each ensemble member to provide an initial ensemble, which represents the uncertainty of the initial condition, for each forecast run. That is, the ensemble at the final time of a hindcast period is used as the initial ensemble for the next forecast run.
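Schematically, the daily cycling can be summarized as follows. The one-hour model step below is a trivial placeholder standing in for a DCSM run per LAMEPS member; only the structure of a 24 h hindcast followed by a 60 h forecast reflects the setup described above.

import numpy as np

rng = np.random.default_rng(0)
n_members, n_state = 21, 4                                # 20 perturbed members + control

def step_model(state, wind):
    # placeholder one-hour model step (illustration only)
    return 0.95 * state + 0.05 * wind

ensemble = rng.normal(size=(n_members, n_state))          # ensemble at the previous 18:00
wind = rng.normal(size=(n_members, 24 + 60, n_state))     # per-member wind forcing

for hour in range(24):                                    # 24 h hindcast window
    ensemble = step_model(ensemble, wind[:, hour])
    # (the analysis step with the estimated Kalman gain would be applied here)

initial_forecast_ensemble = ensemble.copy()               # valid at 18:00
forecast = [initial_forecast_ensemble]
for hour in range(24, 24 + 60):                           # 60 h forecast
    forecast.append(step_model(forecast[-1], wind[:, hour]))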
Figure 7.15 presents the rank histograms of the forecast ensemble obtained with σobs = 0.05 m, aggregated over all observation locations and over all forecast runs in the interval 01 August 2007-31 January 2008. For the intervals 0-6 h and 6-12 h, the rank histograms show a U-shape indicating underdispersion. This is consistent with the rank histograms of the analysis ensemble. After 12 h, the lower ranks of the histograms become more populated and the rank histograms become sloped. The sloped rank histograms indicate that after some time the model results are biased. The same pattern is found with the setup using σobs = 0.10 m. However, instead of showing underdispersion, the rank histograms obtained with σobs = 0.10 m show overdispersion in the short-term forecasts. The rank histograms become sloped for longer forecast times.

We also note here that there is no visible difference between the rank histograms of forecast ensembles initialized from hindcasts using gain estimates obtained from different iterations, nor from the one without data assimilation. As pointed out earlier, connecting the wind ensemble timeseries at 18 h to the initial ensemble of the next forecast run reduces the bias significantly, resulting in a more reliable initial ensemble. For longer term forecasts, the reliability is expected to be similar, since the effect of the initial condition will vanish after some time. This indicates that evaluating ensemble reliability alone is not sufficient for checking the impact of data assimilation on forecast performance.

Figure 7.15: Rank histograms of the forecast ensemble for forecast intervals from 0-6 h up to 54-60 h, initialized with hindcast using the 2nd closed-loop gain, with σobs = 0.05 m.
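The rank histograms themselves are simple to compute. The sketch below illustrates the counting for a 21-member ensemble (giving 22 ranks); ties and the observation-noise perturbation used in the actual verification are ignored for brevity, and the data are hypothetical.

import numpy as np

def rank_histogram(obs, ensemble):
    # obs      : shape (n_cases,), verifying observations
    # ensemble : shape (n_members, n_cases), ensemble values for the same cases
    n_members = ensemble.shape[0]
    ranks = (ensemble < obs).sum(axis=0)                  # rank 0 .. n_members
    return np.bincount(ranks, minlength=n_members + 1)

rng = np.random.default_rng(0)
obs = rng.normal(size=5000)
ens = rng.normal(size=(21, 5000))
print(rank_histogram(obs, ens))                           # roughly flat for a reliable ensemble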
The other property of the forecast ensemble that we check is the ensemble forecast skill. The measure of ensemble forecast skill used here is the rms of the ensemble-mean forecast water-level residual. For illustration, Figure (7.16) presents the results for an assimilation station located at Vlissingen and a validation station at IJmuiden. The results are averaged over all forecast runs within the period 01 August 2007 to 31 January 2008, and a 6 h running average is applied to the resulting rms forecast residual timeseries. Shown in the figures are the rms residuals of the ensemble-mean forecasts, initialized from hindcasts using the gain estimates from different iterations and from hindcasts without data assimilation. Also shown in the figures is the rms residual of the deterministic forecast, obtained by forcing the DCSM with the control-run member of the LAMEPS wind, without any data assimilation.
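The 6 h running average is a simple smoothing of the rms-residual curve along the forecast-horizon axis; a minimal sketch, assuming hourly output and hypothetical data, is:

import numpy as np

def running_mean(x, window):
    # centered running mean over `window` samples (6 for a 6 h window at hourly output)
    return np.convolve(x, np.ones(window) / window, mode="same")

rng = np.random.default_rng(1)
rms_curve = 0.08 + 0.001 * np.arange(61) + 0.01 * rng.standard_normal(61)   # hypothetical curve, 0..60 h
smoothed = running_mean(rms_curve, 6)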
In general, the rms residual increases with forecast time, indicating that the forecast performance degrades in time. This can be explained by the fact that after some time the model results become negatively biased, as indicated by the rank histograms earlier. The figures also suggest that the forecast residual contains a periodic component. The periodic component is likely due to the remaining astronomical tide components of the water level, which cannot be predicted perfectly by the DCSM.
The performance of short-term forecasts of the deterministic system is similar to that of the ensemble-mean forecast without hindcast. However, after some time they diverge from each other and the deterministic forecast performs better than the ensemble-mean forecast without hindcast. This shows that, contrary to what is expected, the ensemble prediction system degrades the average performance of the corresponding deterministic single-forecast system, instead of improving it. A possible explanation of why this occurs is discussed in the next section.

Figure 7.16: RMS of forecast water-level residual vs forecast horizon, for the observation stations located at Vlissingen (vlissgn) and IJmuiden (ijmdbthvn).
The figures also suggest that applying data assimilation improves the performance of
short-term forecasts. The improvement occurs at all stations used in this study, at both
the assimilation and validation stations. This may indicate that all gain estimates are
reasonably consistent in spreading the correction to other model variables and locations.
As suggested by the figures, on average, the improvement obtained with the gain estimate from the open-loop step is less than that obtained from later iterations. Even though the analysis residuals, as shown in the previous section, suggest that the hindcast using the open-loop gain gives initial conditions that are closest to the observations, the performance of forecasts starting from these initial conditions is found to deteriorate more
quickly than the ones initialized with hindcast using gain estimates from later iterations.
This shows that the iterations of the gain-estimating algorithm proposed in this study make the gain estimates more effective in spreading the corrections to other model
variables and locations. Based on these figures, however, we see that there is hardly any
difference between forecasts initialized from hindcast using the 1st and 2nd closed-loop
gain estimates.
The length of the forecast period over which the performance is improved relative to that without data assimilation depends on the location of the stations. For the locations along the Dutch coast, the improvement occurs up to about 12 h. This period is approximately equal to the time required by a gravity wave to travel from the open ocean, near the north-west open boundary of the DCSM, to the Dutch coast. The data assimilation may have incorrectly updated the model states in this area. In turn, the incorrectly updated state propagates as a gravity wave along the British coast to the Dutch coast and deteriorates the forecast performance when it arrives at these locations. The effect of the initial condition will vanish after some time, and the forecast performance eventually converges to that of the ensemble-mean forecast without data assimilation.
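A rough back-of-the-envelope check of this travel time uses the shallow-water wave speed c = sqrt(g h); the depth and distance below are assumed round numbers, not values taken from the model bathymetry.

import math

g = 9.81           # gravitational acceleration [m/s^2]
depth = 80.0       # assumed representative depth along the path [m]
distance = 1.2e6   # assumed distance from the north-west open boundary to the Dutch coast [m]

c = math.sqrt(g * depth)                  # shallow-water gravity wave speed [m/s]
print(distance / c / 3600.0)              # travel time of roughly 12 h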
7.7 Comparison with a steady-state Kalman filter
In the previous sections, we have shown that the analysis systems based on the ensemble forecast have successfully brought the DCSM closer to observation data. Moreover,
short term forecasts initialized from those analyses have also been shown to have better
performance compared to the ones without data assimilation. Since the operational system couples the DCSM with a steady-state Kalman filter, a question remains as to whether
the ensemble based analysis system performs better than the one based on a steady-state
Kalman filter. This section presents and discusses some results of the comparison.
For this comparison, we use a steady-state Kalman filter prepared in our earlier work
(Sumihar et al. 2009). Here, the Kalman gain is computed by using a two-sample Kalman
filter implemented on the DCSM with two different wind fields as input forcing. The two
wind fields are obtained from two different meteorological centers, namely the KNMI and
UK Met Office. The steady-state Kalman gain used for this comparison is obtained from
the fifth closed-loop iteration. At this step, the two-sample Kalman filter iteration has
converged, and it therefore provides an optimal gain for that steady-state Kalman filter
setup. For the experiments with the steady-state Kalman filter, we use the control run
of LAMEPS wind as input forcing to the DCSM. The period of experiment is from 01
August 2007 until 31 January 2008.
The comparison between the ensemble based analysis system and the one based on the
steady-state Kalman filter is performed in terms of their skill, defined by the rms water-level residual. Figure (7.17) gives an example of the evolution of the rms hindcast water-level residual obtained using the ensemble based analysis system with open-loop gain and the steady-state Kalman filter. This figure shows that the hindcast performance of the two analysis
systems varies in time: one performs better than the other during some time intervals and worse during others. Also shown in Figure (7.17) is the observed surge, scaled by a factor of 0.2. The figure suggests that the performance variation of the two analysis systems is related to the size of the observed surge. Therefore, in the following we stratify the data into three groups according to the size of the observed surge: positive surge, defined as the times when the observed surge is larger than 20 cm; negative surge, when the surge is smaller than -20 cm; and small surge, when the surge is between -20 and 20 cm. The comparison between the analysis systems is performed for each category.
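The stratification itself amounts to three threshold masks on the observed surge; the sketch below, with hypothetical data, shows how the categories and the per-category rms residual can be obtained.

import numpy as np

def stratify_surge(obs_surge, threshold=0.20):
    # obs_surge : shape (n_times,), observed surge [m]
    positive = obs_surge > threshold
    negative = obs_surge < -threshold
    small = ~positive & ~negative
    return positive, negative, small

rng = np.random.default_rng(0)
surge = 0.3 * rng.standard_normal(1000)                   # hypothetical observed surge
residual = 0.08 * rng.standard_normal(1000)               # hypothetical hindcast residual
for name, mask in zip(("positive", "negative", "small"), stratify_surge(surge)):
    print(name, np.sqrt(np.mean(residual[mask] ** 2)))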
Figure 7.17: Timeseries of the rms of hindcast water-level residual at Hoek van Holland (hoekvhld), obtained using the ensemble based analysis system with open-loop gain and using the steady-state Kalman filter, together with the observed surge scaled by a factor of 0.2.
Figure (7.18) presents the resulting rms hindcast water-level residual averaged over all
assimilation stations. To show the impact of data assimilation, we show also in the figure
the rms water-level residual associated with the deterministic run, where the DCSM is forced
by using the control run of LAMEPS wind. Moreover, shown also in the figure is the rms
water level residual associated with the ensemble system without data assimilation. From
the results of the control run and the ensemble run without data assimilation, we can see that both systems are similarly skillful in reproducing both positive and negative surge, and both give their smallest residual during times of small surge. The rms residual associated with the ensemble system without data assimilation is slightly smaller than that of the control run. However, the opposite is observed for negative surge.
Figure 7.18: RMS of hindcast water-level residual averaged over all assimilation stations, stratified into positive surge (surge larger than +20 cm), negative surge (surge less than -20 cm), and small surge (surge between -20 and 20 cm).
With data assimilation, for the ensemble based analysis systems, we can see that the rms residual increases as gain estimates from later iterations are used, as expected from theory. For positive surge, the rms residual associated with the ensemble based analysis systems is consistently smaller than that of the steady-state Kalman filter. The figure also suggests that both the ensemble based systems and the steady-state Kalman filter perform equally well during small surge. However, the ensemble based analysis systems perform consistently worse
than the steady-state Kalman filter in reproducing negative surge. This may indicate that
the resulting storm surge ensemble during negative surge produces a worse estimate of
the forecast error covariance than what is used in the steady-state Kalman filter.
To gain more insight into the ensemble characteristics during the three surge types, we show in Figure (7.19) the resulting water level ensemble spread averaged over all assimilation stations. Here, we show the ensemble spread at the 2nd closed-loop iteration, which gives the best estimate relative to the earlier iterations. We also show in
the same picture the water-level forecast error standard deviation averaged over all assimilation stations, as used in the steady-state Kalman filter. The figure clearly shows that the ensemble spread is smallest during negative surge and largest during positive surge. Moreover, the figure also suggests that, relative to the value assumed in the steady-state Kalman filter, the resulting ensemble underestimates the forecast error during negative surge. This results in an analysis system with a poorer performance than the steady-state Kalman filter.
Figure 7.19: Water level ensemble spread averaged over all assimilation stations, for negative, small, and positive surge, together with the corresponding standard deviation assumed in the SSKF.
Note that the ensemble spread is just one of the factors determining the analysis performance. Another factor influencing the analysis performance is the correlation structure. When these factors are incorrectly specified, the analysis performance will degrade quickly. Nevertheless, this figure suggests that there is a systematic underestimation of the ensemble spread in times of negative surge, which is also responsible for the poorer analysis performance.

To complete the presentation, we show in Figure (7.20) the rms hindcast water-level residual averaged over all validation stations. The general pattern observed at the assimilation stations can also be seen here, at least for the performance of the analysis system using the 2nd closed-loop Kalman gain estimate: the ensemble based analysis system performs slightly better during positive surge, equally well at small surge, and worse at negative surge than the steady-state Kalman filter. This figure also suggests that, without data assimilation, the DCSM performs worse during negative surge than during positive surge.
However, we have checked that this is due to the fact that the skill of the DCSM differs between locations. The DCSM is known to perform inadequately in reproducing the water level in the northern part of the Dutch coast. The detailed geometrical structure in that area cannot be well resolved with the present grid size of the DCSM, and the area is shallow. It is even more difficult to predict the water level at times of negative surge.
Figure 7.20: As Fig. (7.18), for validation stations.
We have also performed some forecast runs initialized from hindcast runs using the steady-state Kalman filter, with the control-run member of the LAMEPS wind as input forcing. For comparing the forecast performance, we group the forecast runs, based on the initial time of each forecast, into the three categories described earlier. For illustration, we show in Figure (7.21) the rms of forecast water-level residual at Vlissingen, averaged over the number of times each category occurs within the whole period of the experiment. To improve the readability of the graphs, we limit the presentation to forecast times up to 20 h.

Also shown in the figures are the results associated with the deterministic forecast, obtained by running the DCSM with the control run of LAMEPS, and with the ensemble forecast without data assimilation. These figures suggest that during positive surge, the ensemble forecast performs slightly better than the single-member control forecast. During small and negative surge, however, the performance of the ensemble forecast is slightly worse than that of the control run. The advantage of an ensemble system over a single deterministic forecast (Leith 1974) is found only at times of positive surge. Note that all these facts are observed at most other locations as well.
Figure 7.21: RMS of forecast water-level residual vs forecast horizon at Vlissingen (vlissgn), stratified into positive surge (surge larger than +20 cm), negative surge (surge less than -20 cm), and small surge (surge between -20 and 20 cm).
For the forecasts initialized from hindcasts, the figures indicate that, as shown earlier,
all analysis systems produce improvements in short term forecasts over the control run as
well as over the ensemble forecast without data assimilation. The figures also suggest that,
in general, the forecasts initialized from hindcasts using the ensemble based analysis systems do not perform better than those using the steady-state Kalman filter. At some locations, like Vlissingen, the performance of some of the ensemble based systems is slightly better than that of the steady-state Kalman filter. At other locations, their performance in forecasting positive and small surges is comparable to the steady-state Kalman filter. However, the steady-state Kalman filter is found to perform better than the ensemble-based systems in predicting negative surge. This result is found consistently at the stations located along the Dutch coast. A reason for this may be the difficulty the atmospheric model has in reconstructing the seaward wind near the land-sea interface. It is known, for example, that the Dutch version of HIRLAM systematically underestimates the seaward wind above the sea near the land-sea area (de Vries, 2009, personal communication). Another possible reason may be that this area is located relatively close to the south-east boundary of the LAMEPS domain. The wind initiated from this boundary is responsible for simulating the negative surge along the Dutch coast as well as along the British east coast. Since this area is located very close to the south-east open boundary, the wind ensemble has not traveled far when it arrives there. Hence, the ensemble has not yet spread sufficiently and underestimates the model uncertainty. As a consequence, our method performs worse than the steady-state Kalman filter in reconstructing negative surges.
The resulting ensemble-based analysis performance may also be associated with how the wind ensemble is generated. The ensemble is generated by perturbing the initial and boundary conditions. The perturbations are expected to grow into different trajectories, especially when the weather system is unstable. The seaward wind in this area, that is the southerly wind, is probably associated with a stable weather system. Thus, in this period, the ensemble members do not grow apart sufficiently and the ensemble spread becomes too narrow. Moreover, the wind ensemble is generated with an optimization interval of 48 h (Frogner and Iversen 2001). That is, the ensemble spread is expected to dominate the forecast error at the end of the 48 h interval and possibly beyond. In our study, only the ensemble from the first 24 h is used to represent the error covariance. This will naturally result in a wind ensemble with a spread smaller than its optimal value. We might try to use the ensemble at later forecast times to compute the error covariance estimate. However, care must be taken with the bias, which appears in the model forecasts already after a forecast time of 12 h, as indicated by the rank histograms of the forecast ensemble in Figure (7.15).
The LAMEPS wind ensemble is generated by perturbing the initial and boundary conditions of the Norwegian version of HIRLAM. This implicitly assumes that the forecast
model is perfect. In fact, a numerical forecast model always contains model error. An
approach to generating forecast ensembles that takes the model error into account is the multi-model approach (e.g. Palmer et al. 2004; Hagedorn et al. 2005). A multi-model ensemble may consist of models from different centers or of different versions of the same model (e.g. Houtekamer et al. 1996). The differences between the forecast models may simulate some of the flow-dependent, unexplained, intermittent sources of model error. Some studies have shown that the multi-model approach performs better than a single-model ensemble system (Atger 1999; Ziehmann 2000). A further study using the multi-model ensemble approach is required to see whether it yields a better performance, both for the atmospheric models producing the wind ensemble and for the storm surge forecasting models.
It should be noted here that in most cases and locations, the performance of ensemble forecasts initialized from hindcasts using the 1st closed-loop Kalman gain estimate is better than that of those using the open-loop Kalman gain estimate, and it improves slightly further using the 2nd closed-loop Kalman gain. This indicates the importance of the iterative procedure for improving the Kalman gain estimate. However, this is not true for the stations located in the northern part of the Dutch coast during negative surge. At these locations, the forecast performance deteriorates as Kalman gains from later iterations are used for the analysis. This is likely because this area is very shallow, so that it is difficult to predict the water level; it is even more difficult to do so during negative surge.
7.8 Conclusion
We have presented in this paper our work with the Dutch Continental Shelf Model (DCSM), an operational model used for storm surge forecasting in the Netherlands. The uncertainty of this model is known to be mainly due to the uncertain wind input. To correct the model for such errors, in the operational system the model is coupled with a data assimilation system based on a steady-state Kalman filter. The steady-state Kalman filter is developed based on the assumptions that the forecast error is stationary in time and homogeneous in space. In this work, we have relaxed these assumptions by representing the forecast uncertainty by an ensemble of storm surge forecasts. The ensemble of storm surge forecasts is generated by forcing the DCSM with each member of the 10 m wind ensemble of LAMEPS, which is an operational ensemble weather prediction system at the Norwegian Meteorological Institute.
To account for the fact that the wind error is correlated in time, each member of the wind ensemble is considered to deviate from the truth by a colored-noise term, modelled by an AR(1) process. The temporal correlation parameter is determined empirically from the wind ensemble.
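For illustration, a discrete AR(1) colored-noise term of this kind can be generated as follows; the correlation time scale and error standard deviation are assumed values, not the parameters estimated from the LAMEPS ensemble.

import numpy as np

dt = 1.0                                   # time step [h]
tau = 12.0                                 # assumed temporal correlation time scale [h]
alpha = np.exp(-dt / tau)                  # AR(1) coefficient
sigma = 2.0                                # assumed wind error standard deviation [m/s]

rng = np.random.default_rng(0)
eps = np.zeros(240)
for k in range(1, eps.size):
    # stationary AR(1): previous value plus white noise with matched variance
    eps[k] = alpha * eps[k - 1] + np.sqrt(1.0 - alpha**2) * sigma * rng.normal()

print(np.corrcoef(eps[:-1], eps[1:])[0, 1])   # lag-1 correlation, close to alpha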
The data assimilation used in this study is an extension of the two-sample Kalman filter published earlier (Sumihar et al. 2008). The two-sample Kalman filter is an iterative procedure for computing the steady-state forecast error covariance, and hence a steady-state Kalman gain, using two samples of the system of interest, by averaging the difference of the two samples over time. In this work, the algorithm is generalized to work with n samples, where n > 2. By using more ensemble members, the averaging can be done over a shorter time. The experiment is done by dividing the experiment period into shorter time windows, and the data assimilation procedure is carried out for each time window. As a first step, we have chosen the time window length to be 24 h.
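As a minimal sketch of the idea of averaging over both the members and a short time window, a per-window forecast error covariance could be computed as follows; this only illustrates the averaging and is not the full iterative algorithm of the chapter.

import numpy as np

def windowed_forecast_covariance(ensemble, window):
    # ensemble : shape (n_members, n_times, n_state)
    # window   : time steps per window (e.g. 24 for hourly output and 24 h windows)
    n_members, n_times, n_state = ensemble.shape
    n_windows = n_times // window
    covs = np.zeros((n_windows, n_state, n_state))
    for w in range(n_windows):
        block = ensemble[:, w * window:(w + 1) * window, :]
        anom = block - block.mean(axis=0, keepdims=True)   # deviation from the ensemble mean at each time
        anom = anom.reshape(-1, n_state)
        covs[w] = anom.T @ anom / (anom.shape[0] - 1)
    return covs

rng = np.random.default_rng(0)
ens = rng.normal(size=(21, 96, 3))                         # hypothetical: 21 members, 96 h, 3 state variables
P_per_window = windowed_forecast_covariance(ens, window=24)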
The resulting analysis ensemble has been shown to have a smaller rms water-level residual than the one without data assimilation, at both assimilation and validation stations. This indicates that the estimated covariances have a consistent structure, so that the corrections are distributed properly to locations other than the assimilation stations. The forecasts initialized from these analyses are more accurate than the ones without data assimilation, at least for short-term forecasts.
A comparison with a steady-state Kalman filter has been performed. For the available data, the results do not suggest that the ensemble-based assimilation systems yield a much better performance than the steady-state Kalman filter. For large positive surges the ensemble based predictions are a little better than the steady-state Kalman filter. On the other hand, there is an indication that the steady-state Kalman filter performs better than the ensemble-based systems for predicting negative surges. The ensemble spread during negative surge is found to be too narrow. Further effort is required to develop methods for generating a better ensemble.
8
Conclusion
This thesis presents several methods to improve the accuracy of water level prediction. As the first method, we used the approach developed by Heemink et al. (1991) to predict periodic components of a harmonic analysis residual. The residual of a harmonic analysis usually contains some periodic components with slowly varying harmonic parameters; each periodic component oscillates with the frequency of a tidal constituent. The idea of this method is to model the harmonic analysis residual, called the observed surge in this thesis, as the sum of several such periodic components. A state-space representation and a measurement model have been developed for this summation, the harmonic parameters being the state of the model. The time-varying harmonic parameters of the periodic components within an observed surge timeseries are estimated by using a Kalman filter. The estimated harmonic parameters are used to forecast the periodic components, which in turn are used as a correction to future forecasts made by harmonic analysis. This method has been shown to be able to extract the periodic components from within the observed surge, and it can be used to correct the forecast based on harmonic analysis. It has been found to work reasonably well during calm periods. However, during stormy periods, the method may produce worse predictions at some locations. This may be due to the nonlinear interaction between the tides and the meteorological component, which prevents the method from increasing the overall accuracy of the forecast based on harmonic analysis.
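A minimal sketch of such a filter is given below; the tidal frequencies, noise levels, and synthetic data are illustrative assumptions, not the configuration of Heemink et al. (1991) used in this study. The harmonic parameters form the state, follow a random walk, and are updated from hourly surge observations.

import numpy as np

omegas = 2 * np.pi / np.array([12.42, 12.0, 6.21])   # example tidal periods [h] (M2, S2, M4)
n = 2 * len(omegas)                                  # state: (a_i, b_i) per constituent

q, r = 1e-6, 0.05**2        # assumed random-walk and observation noise variances
x = np.zeros(n)             # harmonic parameters
P = 0.1 * np.eye(n)         # initial parameter uncertainty
Q = q * np.eye(n)

def h_row(t):
    # measurement operator at time t [h]: y = sum_i a_i cos(w_i t) + b_i sin(w_i t)
    return np.concatenate([[np.cos(w * t), np.sin(w * t)] for w in omegas])

rng = np.random.default_rng(0)
for t in range(500):                                  # hourly pseudo-observations of the surge residual
    y = 0.3 * np.cos(omegas[0] * t) + rng.normal(scale=0.05)   # synthetic observed surge
    P = P + Q                                         # time update (random walk)
    H = h_row(t)
    K = P @ H / (H @ P @ H + r)                       # Kalman gain
    x = x + K * (y - H @ x)                           # measurement update
    P = P - np.outer(K, H) @ P

print(x[:2])   # approximately recovers (a_1, b_1) = (0.3, 0.0)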
The other methods that we explored in this research fall into the category of data assimilation for model reinitialization. Almost all data assimilation methods require the specification of forecast error statistics. In reality, however, it is difficult to estimate such statistics, since the true state is never known exactly. One common approach to estimating the error statistics is to use the difference between several similarly skillful models as a proxy for the unknown true error.

To accommodate the study, we first developed a method for computing a steady-state Kalman filter for the case when only two samples of the system are available. We call this method the two-sample Kalman filter. Using idealized experiments, where all statistics are known and errors are generated with a random number generator, we have been able to show that our method can reproduce the solution obtained by a Kalman filter.
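The underlying idea can be written down compactly: if the two runs have independent errors with the same covariance P, the covariance of their difference equals 2P, so P can be estimated from the time-averaged, bias-corrected difference. The sketch below illustrates only this estimator, not the iterative two-sample Kalman filter itself.

import numpy as np

def error_covariance_from_two_samples(x1, x2):
    # x1, x2 : shape (n_times, n_state), the two model runs
    d = x1 - x2
    d = d - d.mean(axis=0)                    # remove the systematic difference (bias)
    return 0.5 * (d.T @ d) / (d.shape[0] - 1)

# hypothetical check with a known error covariance
rng = np.random.default_rng(0)
truth = rng.normal(size=(1000, 3))
L = np.linalg.cholesky(np.diag([0.04, 0.09, 0.01]))
x1 = truth + rng.normal(size=(1000, 3)) @ L.T
x2 = truth + rng.normal(size=(1000, 3)) @ L.T
print(error_covariance_from_two_samples(x1, x2).round(3))   # close to diag(0.04, 0.09, 0.01)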
We have implemented the two-sample Kalman filter with the Dutch Continental Shelf
Model (DCSM), a numerical model for storm surge prediction, which is operational in the
Netherlands. Here, the two samples of the DCSM are generated by running the DCSM
twice using two different wind fields produced by two meteorological centers. The first
wind field is produced by the Dutch version of High Resolution Limited Area Model
(HIRLAM) operational at the Royal Dutch Meteorological Institute (KNMI), while the
second one is from the UK Met Office (UKMO). The two wind fields are found to contain a systematic difference. Some care must be taken to eliminate this systematic difference before using them in the two-sample Kalman filter. The data assimilation using
Kalman gain estimated from this difference has successfully forced the DCSM closer to
measurements, at all assimilation and validation stations used in this study. Forecasts
initialized from these analyses are found to have better accuracy than the ones without
data assimilation, at least in the short term. Due to the physics of gravity wave propagation
in the area of DCSM, the period of improved accuracy is limited to about 12 h for the
locations along the Dutch coast. We have also compared the forecast performance to that obtained using a steady-state Kalman filter based on an isotropic assumption. The system based on the UKMO-HIRLAM difference is found to outperform the isotropic-based steady-state
Kalman filter.
The two-sample Kalman filter is based on the assumption that the model error is stationary, so that the forecast error covariance can be estimated by averaging over time. In fact, the model error varies in time along with the actual state. To further improve the accuracy, we have extended the two-sample Kalman filter algorithm to work with more than two samples. By working with more samples, the averaging can be done over a shorter time, which relaxes the stationarity assumption. By doing this, the performance of the analysis system can be improved further.
We have implemented the extended version of the two-sample Kalman filter using the wind ensemble of LAMEPS. This is an ensemble weather forecast system operational at the Norwegian Meteorological Institute; it consists of 20 perturbed members and one control run. Here, following the fact that LAMEPS is analysed once a day at 18:00, the computation of the forecast error covariance is performed over a 24 h window and over all ensemble members. Like the steady-state Kalman filter, the ensemble-based analysis system has successfully steered the DCSM closer to the measurements, and short-term forecasts are improved relative to the ones without data assimilation. We have also compared the performance to that of the steady-state Kalman filter based on the UKMO-HIRLAM difference. The results of the experiments do not give clear evidence that the ensemble-based method is better than the steady-state Kalman filter. In contrast, there is an indication that the steady-state Kalman filter performs better than the ensemble-based system in predicting negative surge, at least for the locations along the Dutch coast. The resulting ensemble spread during negative surge is found to be narrower than the standard deviation assumed by the steady-state Kalman filter. Further effort is required to develop methods for generating a better ensemble.
The LAMEPS wind ensemble is generated by perturbing the initial and boundary conditions of the Norwegian version of HIRLAM. This implicitly assumes that the forecast
model is perfect. For future research, we recommend studying the use of a multi-model wind ensemble as input forcing to the DCSM. By using different forecast models, some of the unavoidable model error may be better simulated. This may avoid the underestimation of the forecast error covariance. Moreover, the forecast models should cover the DCSM area in such a way that the DCSM area lies at a sufficiently large distance from the model boundaries. This may help improve the performance in predicting negative surge.
All the methods explored in this study focus on improving forecast accuracy by correcting the random component of the error. In fact, we have seen in this study that bias is also a pronounced error component. Therefore, we recommend that future studies also focus on correcting the systematic error component.
Bibliography
Alves, O. and Robert, C. (2005) Tropical pacific ocean model error covariances from
monte carlo simulations, Q.J.R.Meteorol.Soc., 131, pp. 3643–3658.
Anderson, J. L. and Anderson, S. L. (1999) A monte carlo implementation of the nonlinear
filtering problem to produce ensemble assimilations and forecasts, Monthly Weather
Review, 127, pp. 2741–2758.
Atger, F. (1999) The skill of ensemble prediction systems, Monthly Weather Review, 127,
pp. 1941–1953.
Bertino, L., Evensen, G., and Wackernagel, H. (2002) Combining geostatistics and
kalman filtering for data assimilation in an estuarine system, Inverse Problems, 18,
pp. 1–23.
Borovikov, A., Rienecker, M., Keppenne, C., and G.C.Johnson (2005) Multivariate error
covariance estimates by monte carlo simulation for assimilation studies in the pacific
ocean, Monthly Weather Review, 133, pp. 2310–2334.
Buehner, M. (2004) Ensemble-derived stationary and flow-dependent background error
covariances: Evaluation in a quasi-operational nwp setting, Q.J.R.Meteorol.Soc., 128,
pp. 1–30.
Buizza, R. and Palmer, T. N. (1995) The singular-vector structure of the atmospheric
global circulation, Journal of The Atmospheric Sciences, 52, 9, pp. 1434–1455.
— (1998) Impact of ensemble size on ensemble prediction, Monthly Weather Review, 126,
pp. 2503–2518.
Candille, G., Cote, C., Houtekamer, P., and Pellerin, G. (2007) Verification of an ensemble
prediction system against observations, Monthly Weather Review, 135, pp. 2688–2699.
Canizares, R., Madsen, H., Jensen, H. R., and Vested, H. J. (2001) Developments in operational shelf sea modelling in danish waters, Estuarine, Coastal and Shelf Science, 53,
pp. 595–605.
Daley, R. (1991) Atmospheric Data Analysis, Cambridge University Press.
Daley, R. and Barker, E. (2001) Navdas: formulation and diagnostics, Monthly Weather
Review, 129, pp. 869–883.
de Meyer, F. (1984) Time-variant tidal analysis and the kalman filter, Annales Geophysicae, 2, 5, pp. 513–528.
Dee, D. P. (1991) Simplification of the kalman filter for meteorological data assimilation,
Q.J.R.Meteorol. Soc., 117, pp. 365–384.
— (1995) On-line estimation of error covariance parameters for atmospheric data assimilation, Monthly Weather Review, 123, pp. 1128–1145.
Demirov, E., Pinardi, N., Fratianni, C., Tonani, M., Giacomelli, L., and de Mey, P. (2003)
Assimilation scheme of the mediterranean forecasting system: operational implementation, Annales Geophysicae, 21, pp. 189–204.
Devenyi, D., Benjamin, S. G., and Weygandt, S. S. (2004) The ruc 3dvar: operational
performance and recent improvements, in: 20th Conf. on Weather Analysis and Forecasting, Seattle,WA, Amer. Meteoc. Soc., cD-ROM,P1.20.
Evensen, G. (1994) Sequential data assimilation with a nonlinear quasi-geostrophic
model using monte carlo methods to forecast error statistics, Journal of Geophysical
Research, 99, C5, pp. 10143–10162.
— (1996) Advanced data assimilation for strongly nonlinear dynamics, Monthly Weather
Review, 125, pp. 1342–1354.
— (2003) The ensemble kalman filter: theoretical formulation and practical implementation, Ocean Dynamics, 53, pp. 343–367.
Fenton, G. A. and Griffiths, D. V. (2007) Random field generation and the local average subdivision method, in: Probabilistic Method in Geotechnical Engineering (D. V.
Griffiths and G. A. Fenton, eds.), Springer Vienna, pp. 201–223.
Fisher, M. (2003) Background error covariance modelling, ECMWF Seminar.
Frogner, I.-L., Haakenstad, H., and Iversen, T. (2006) Limited-area ensemble predictions
at the norwegian meteorological institute, Q.J.R.Meteorol.Soc., 132, pp. 2785–2808.
Frogner, I.-L. and Iversen, T. (2001) Targeted ensemble prediction for northern europe
and parts of the northern atlantic ocean, Tellus, 53A, pp. 35–55.
— (2002) High-resolution limited-area ensemble predictions based on low-resolution targeted singular vectors, Q.J.R.Meteorol.Soc., 128, pp. 1321–1341.
Fu, L.-L., Fukumori, I., and Miller, R. N. (1993) Fitting dynamic models to the geosat sea
level observations in the tropical pacific ocean. part ii: A linear wind-driven model,
Journal of Physical Oceanography, 23, pp. 2162–2181.
Fukumori, I. (1995) Assimilation of topex sea level measurements with a reduced-gravity
shallow water model of the tropical pacific ocean, Journal of Geophysical Research,
100, C12, pp. 25027–25039.
Fukumori, I., Benveniste, J., Wunsch, C., and Haidvogel, D. B. (1993) Assimilation of
sea surface topography into an ocean circulation model using a steady-state smoother,
Journal of Physical Oceanography, 23, pp. 1831–1855.
Fukumori, I. and Malanotte-Rizzoli, P. (1995) An approximate kalman filter for ocean
data assimilation; an example with an idealized gulf stream model, Journal of Geophysical Research, 100, C12, pp. 6777–6793.
Gaspari, G. and Cohn, S. (1999) Construction of correlation functions in two and three
dimensions, Q.J.R.Meteorol.Soc., 125, pp. 723–757.
Gerritsen, H. (2005) What happened in 1953? the big flood in the netherlands in retrospect, Phil. Trans. R. Soc. A, 363, pp. 1271–1291.
Gerritsen, H., de Vries, J., and Philippart, M. (1995) The dutch continental shelf model,
in: Quantitative Skill Assessment for Coastal Ocean Models. Coastal and Estuarine
Studies (D. Lynch and A. Davies, eds.), American Geophysical Union, vol. 47, pp.
425–467.
Hagedorn, R., Doblas-Reyes, F. J., and Palmer, T. N. (2005) The rationale behind the
success of multi-model ensembles in seasonal forecasting - i. basic concept, Tellus,
57A, pp. 219–233.
Hamill, T. M. (2001) Interpretation of rank histograms for verifying ensemble forecasts,
Monthly Weather Review, 129, pp. 550–560.
Hamill, T. M. and Withaker, J. S. (2001) Distance-dependent filtering of background error covariance estimates in an ensemble kalman filter, Monthly Weather Review, 129,
pp. 2776–2790.
Heemink, A. W. (1986) Storm surge prediction using Kalman filtering, Ph.D. thesis,
Twente University of Technology, The Netherlands.
— (1988) Two-dimensional shallow water flow identification, Appl. Math. Modelling, 12,
pp. 109–118.
— (1990) Identification of wind stress on shallow water surfaces by optimal smoothing,
Stochastic Hydrology and Hydraulics, 4, pp. 105–119.
Heemink, A. W., de Jong, B., and Prins, H. (1991) Adaptive harmonic analysis, Deutsche
Hydrographische Zeitschrift, 44, 2, pp. 91–106.
Heemink, A. W. and Kloosterhuis, H. (1990) Data assimilation for non-linear tidal models, International Journal for Numerical Methods in Fluids, 11, 2, pp. 1097–1112.
Hirose, N., Fukumori, I., and Yoon, J. H. (1999) Assimilation of topex/poseidon altimeter data with a reduced gravity model of the japan sea, Journal of Oceanography, 55,
pp. 53–64.
Hollingsworth, A. and Lönnberg, P. (1986) The statistical structure of short-range forecast errors as determined from radiosonde data. part i: the wind field, Tellus, 38A,
pp. 111–136.
Houtekamer, P. L., Lefaivre, L., Derome, J., Ritchie, H., and Mitchell, H. L. (1996) A
system simulation approach to ensemble prediction, Monthly Weather Review, 124,
pp. 1225–1242.
Houtekamer, P. L. and Mitchell, H. L. (1998) Data assimilation using an ensemble kalman
filter technique, Monthly Weather Review, 126, pp. 796–811.
— (2001) A sequential ensemble kalman filter for atmospheric data assimilation, Monthly
Weather Review, 129, pp. 123–137.
Jazwinski, A. H. (1970) Stochastic processes and filtering theory, Academic Press Inc.
Kalman, R. E. (1960) A new approach to linear filtering and prediction problems, Journal
of Basic Engineering, 82, 1, pp. 35–45.
Kantha, L. H. and Clayson, C. A. (1999) Numerical Models of Oceans and Oceanic Processes, Academic Press.
Leendertse, J. J. (1967) Aspects of a computational model for long-period free-surface
flow modelling, no. Report RM-5294-RR, Rand Corporation, Santa Monica, California,
USA.
Leeuwenburgh, O. (2007) Validation of an enkf system for ogcm initialization assimilating
temperature, salinity, and surface height measurements, Monthly Weather Review, 135,
pp. 125–139.
Leith, C. E. (1974) Theoretical skill of monte carlo forecasts, Monthly Weather Review,
102, pp. 409–418.
Lermusiaux, P. F. J., Chiu, C.-S., Gawarkiewicz, G., Abbot, P., Robinson, A., Miller,
R., Haley, P., Leslie, W., Majumdar, S., Pang, A., and Lekien, F. (2006) Quantifying
uncertainties in ocean predictions, Oceanography, 19, 1, pp. 81–93.
Lorenc, A. C., Bell, R. S., and Macpherson, B. (1991) The meteorological office analysis
correction data assimilation scheme, Q.J.R.Meteorol.Soc., 117, pp. 59–89.
Lorenz, E. N. (1963) Deterministic non-periodic flow, Journal of the Atmospheric Sciences, 20, pp. 130–141.
Madsen, H. and Canizares, R. (1999) Comparison of extended and ensemble kalman filters
for data assimilation in coastal area modelling, International Journal for Numerical
Methods in Fluids, 31, pp. 961–981.
Maybeck, P. S. (1979) Stochastic Model Estimation, and Control Volume 1, Academic
Press Inc.
Mehra, R. K. (1970) On the identification of variances and adaptive kalman filtering,
IEEE Transaction on Automatic Control, AC-15, 2, pp. 175–184.
— (1972) Approaches to adaptive filtering, IEEE Transaction on Automatic Control,
pp. 693–698.
Miller, R. N., Ghil, M., and Gauthiez, F. (1994) Advanced data assimilation in strongly
nonlinear dynamical systems, Journal of the Atmospheric Sciences, 51, 8, pp. 1037–
1056.
Mitchell, H. L. and Houtekamer, P. L. (2000) An adaptive ensemble kalman filter, Monthly
Weather Review, 128, pp. 416–433.
Mitchell, H. L., Houtekamer, P. L., and Pellerin, G. (2002) Ensemble size, balance, and
mode-error representation in an ensemble kalman filter, Monthly Weather Review, 130,
pp. 2791–2808.
Ogata, K. (1990) Modern Control Engineering, Prantice-Hall, Inc.
Oke, P. R., Allen, J. S., Miller, R. N., Egbert, G. D., and Kosro, P. M. (2002) Assimilation of surface velocity data into a primitive equation coastal ocean model, Journal of
Geophysical Research, 107, 3122, doi: 10.1029/2000JC000511.
Palmer, T., Alessandri, A., Andersen, U., Cantelaube, P., Davey, M., Délécluse, P., Déqué, M., Díez, E., Doblas-Reyes, F., Feddersen, H., Graham, R., Gualdi, S., Guérémy, J., Hagedorn, R., Hoshen, M., Keenlyside, N., Latif, M., Lazar, A., Maisonnave, E., Marletto, V., Morse, A., Orfila, B., Rogel, P., Terres, J., and Thomson, M. (2004) Development of a european multimodel ensemble system for seasonal-to-interannual prediction (demeter), Bull. Amer. Meteor. Soc., 85, pp. 853–872.
Parish, D. F. and Derber, J. C. (1992) The national meteorological center’s spectral
statistical-interpolation analysis system, Monthly Weather Review, 120, pp. 1747–
1763.
Rogers, S. R. (1988) Efficient numerical algorithm for steady-state kalman covariance,
IEEE Transactions on Aerospace and Electronic Systems, 24, 6, pp. 815–817.
Sætra, O., Hersbach, H., Bidlot, J.-R., and Richardson, D. (2004) Effects of observations
errors on the statistics for ensemble spread and reliability, Monthly Weather Review,
132, pp. 1487–1501.
Sakov, P. and Oke, P. R. (2008) A deterministic formulation of the ensemble kalman filter:
an alternative to ensemble square root filters, Tellus-A, 60A, pp. 361–371.
Schiereck, G. J. and Eisma, D. (2002) The rotterdam harbour: the connection with the
north sea and europoort, in: Engineered Coasts (J. Chen, K. Hotta, and H. J. Walker,
eds.), Springer, pp. 269–290.
Segers, A. J., Heemink, A. W., Verlaan, M., and van Loon, M. (2000) A modified rrsqrtfilter for assimilating data in atmospheric chemistry models, Environmental Modelling
and Software, 15, pp. 663–671.
Serafy, G. Y. H. E. and Mynett, A. E. (2008) Improving the operational forecasting
system of the stratified flow in osaka bay using an ensemble kalman filter-based
steady state kalman filter, Water Resources Research, 44, W06416, pp. 1–192, doi:
10.1029/2006WR005412.
Sørensen, J. V. T., Madsen, H., and Madsen, H. (2004a) Efficient kalman filter techniques
for the assimilation of tide gauge data in three-dimenstional modeling of the north sea
and baltic sea system, Journal of Geophysical Research, 109, C03017, pp. 1–14.
— (2004b) Operational data assimilation in marine modeling: water level forecasting with regularisation techniques, in: Hydroinformatics, World Scientific Publishing
Company, vol. 2, pp. 1197–1204.
— (2006) Parameter sensitivity of three kalman filter schemes for assimilation of water
levels in shelf sea models, Ocean Modelling, 11, pp. 441–463.
Stelling, G. S. (1984) On the construction of computational methods for shallow water
flow problems, Ph.D. thesis, Delft University of Technology, The Netherlands.
Sumihar, J. H., Verlaan, M., and Heemink, A. W. (2008) Two-sample kalman filter for
steady-state data assimilation, Monthly Weather Review, 136, pp. 4503–4516.
— (2009) Storm surge forecast with two-sample kalman filter, submitted.
Sun, C., Rienecker, M. M., Rosaty, A., Harrison, M., Wittenberg, A., Keppenne, C. L.,
Jacob, J. P., and Kovach, R. M. (2007) Comparison and sensitivity of odasi ocean
analyses in the tropical pacific, Monthly Weather Review, 135, pp. 2242–2264.
Talagrand, O. (1997) Assimilation of observations, an introduction, Journal of the Meteorological Society of Japan, 75, pp. 191–209.
ten Brummelhuis, P. G. J. and Heemink, A. W. (1990) Parameter identification in tidal
models with uncertain boundary conditions, Stochastic Hydrology and Hydraulycs, 4,
pp. 193–208.
Tippett, M. K., Anderson, J. L., Bishop, C. H., Hamill, T. M., and Whitaker, J. S. (2003)
Ensemble square root filters, Monthly Weather Review, 131, pp. 1485–1490.
Treebushny, D. and Madsen, H. (2005) On the construction of a reduced rank squareroot kalman filter for efficient uncertainty propagation, Future Generation Computer
Systems, 21, pp. 1047–1055.
Undén, P., Rontu, L., Järvinen, H., Lynch, P., Calvo, J., Cats, G., Cuxart, J., Eerola, K., Fortelius, C., Garcia-Moya, J. A., Jones, C., Lenderlink, G., McDonald, A., McGrath, R., Navascues, B., Nielsen, N., Ødegaard, V., Rodriquez, E., Rummukainen, M., Rõõm, R., Sattler, K., Sass, B., Savijärvi, H., Schreur, B. W., Sigg, R., The, H., and Tijm, A.
(2002) Hirlam-5 scientific documentation, Available online from http://hirlam.org/.
Verboom, G. K., de Ronde, J. G., and van Dijk, R. P. (1992) A fine grid tidal flow and
storm surge model of the north sea, Continental Shelf Research, 12, 2/3, pp. 213–233.
Verlaan, M. (1998) Efficient Kalman Filtering Algorithms for Hydrodynamics models,
Ph.D. thesis, Delft University of Technology, The Netherlands.
Verlaan, M. and Heemink, A. W. (1997) Tidal flow forecasting using reduced rank square
root filters, Stochastic Hydrology and Hydraulics, 11, pp. 349–368.
Verlaan, M., Zijderveld, A., de Vries, H., and Kroos, J. (2005) Operational storm surge
forecasting in the netherlands: developments in the last decade, Phil. Trans. R. Soc. A,
363, pp. 1441–1453.
von Storch, H. and Woth, K. (2008) Storm surges: perspectives and options, Sustain Sci,
3, pp. 33–43.
Whitaker, J. S. and Hamill, T. M. (2002) Ensemble data assimilation without perturbed
observations, Monthly Weather Review, 130, pp. 1913–1924.
Wilks, D. S. (2006) Statistical methods in the atmospheric sciences, Academic Press.
Wong, E. (1971) Stochastic processes in information and dynamical system, McGraw-Hill Book Company.
Xu, Q., Lu, H., Gao, S., Xue, M., and Tong, M. (2008a) Time-expanded sampling for
ensemble kalman filter: assimilation experiments with simulated radar observations,
Monthly Weather Review, 136, pp. 2651–2667.
Xu, Q. and Wei, L. (2001) Estimation of three-dimensional error covariances. part ii:
analysis of wind innovation vectors, Monthly Weather Review, 129, pp. 2939–2954.
Xu, Q., Wei, L., Lu, H., Qiu, C., and Zhao, Q. (2008b) Time-expanded sampling for
ensemble-based filters: assimilation experiments with a shallow-water equation model,
Journal of Geophysical Research, 113, D02114, pp. 1–12.
Yang, R., Guo, J., and Riishøjgaard, L. P. (2006a) Application of an error statistics estimation method to the psas forecast error covariance model, Advances In Atmospheric
Sciences, 23, 1, pp. 33–44.
Yang, S.-C., Baker, D., Li, H., Cordes, K., Huff, M., Nagpal, G., Okereke, E., Villafane, J.,
Kalnay, E., and Duane, G. S. (2006b) Data assimilation as synchronization of truth and
model: experiments with the three-variable lorenz system, Journal of the Atmospheric
Sciences, 63, pp. 2340–2354.
Yen, P. H., Jan, C. D., Lee, Y. P., and Lee, H. F. (1996) Application of kalman filter to
short-term tide level prediction, Journal of Waterway, Port, Coastal, and Ocean Engineering, 122, 5, pp. 226–231.
Ziehmann, C. (2000) Comparison of a single-model eps with a multi-model ensemble
consisting of a few operational models, Tellus, 52A, pp. 280–299.
Samenvatting (Summary)

Two approaches for improving the accuracy of water level predictions are investigated in this study. The first approach concentrates on improving the forecast accuracy of the astronomical tide component. For this purpose, a method is applied for analysing and predicting the periodic components of the residual of the harmonic analysis. This method is reasonably accurate during calm weather conditions, but not during stormy periods. This observation has led to a continuation of the study with an entirely different approach, in which a new data assimilation scheme is implemented in the operational two-dimensional storm surge forecasting model.

The operational storm surge forecasting model in the Netherlands uses a steady-state Kalman filter to compute accurate initial conditions for the forecast simulation of the model. An important factor determining the success of the Kalman filter is the specification of the model error covariance. In the operational system, the model error is modelled on the basis of the assumptions of isotropy and homogeneity. In this study we investigate the use of the difference between the forecasts of two comparable atmospheric models to represent the unknown error of the model. In addition, a new method for computing a steady-state Kalman gain, the so-called two-sample Kalman filter, is developed. This is an iterative procedure for computing the steady-state Kalman gain using two realizations of the process. A large number of experiments have been carried out to demonstrate that this algorithm produces the correct solutions.

The two-sample Kalman filter algorithm has been applied using the wind forecasts of two meteorological centres: the Royal Netherlands Meteorological Institute (KNMI) and the UK Met Office (UKMO). The bias, or systematic error, is first eliminated before the wind forecasts are used in the two-sample Kalman filter. The spatial correlation of the system error is estimated from these two wind forecasts and turns out, in contrast to the assumption in the current operational system, to be strongly anisotropic. The steady-state Kalman filter obtained on the basis of this error covariance results in more accurate predictions. For the measurement stations along the Dutch coast, the accuracy of the prediction is improved up to about 12 hours ahead.

To further improve the data assimilation system, the two-sample Kalman filter is extended to the use of multiple realizations. By using more realizations, the computation of the error covariance can be carried out by averaging over a shorter period. Stationarity then needs to be assumed only over a shorter period, and the model error is expected to be represented even more accurately. In this study this algorithm is applied using a wind ensemble from LAMEPS, the system that is operationally used at the Norwegian Meteorological Institute. The results are comparable to those of the steady-state Kalman filter during large positive storm surges. However, the steady-state Kalman filter turns out to perform better than the ensemble system in predicting negative surges. Further research into the wind ensemble approach is therefore required.
Curriculum Vitae
Julius H. Sumihar was born on 28th July 1975 and grew up in Bandung, Indonesia.
He attended the high school Sekolah Menengah Atas Negeri 3 Bandung in 1990. In 1993,
he started his bachelor study at the Department of Engineering Physics, Institut Teknologi
Bandung, Indonesia. He obtained his engineer degree in 1998. During the next three
years, he explored different possibilities for his career. In this period, he became interested
in the application of mathematics in the field of environmental system modelling and
prediction. To pursue this interest, in 2001 he left his country to study at the MSc program
of Mathematics in Risk and Environmental Modelling, at Technische Universiteit Delft,
The Netherlands. This graduate program was supervised by Prof. Roger Cooke and Prof.
Arnold Heemink. He obtained his MSc degree in 2003, with a thesis about adaptive
harmonic analysis using a Kalman filter. In October 2004, he began his PhD study at the Delft
Institute of Applied Mathematics, under the supervision of Dr. Martin Verlaan and Prof.
Arnold Heemink. His research was about data assimilation and its application for storm
surge forecasting, whose results are written in this thesis. Since March 2009 he has been
working at Deltares/TNO as a researcher.