Two-sample Kalman filter and system error modelling for storm surge forecasting

Proefschrift ter verkrijging van de graad van doctor aan de Technische Universiteit Delft, op gezag van de Rector Magnificus Prof. dr. ir. J.T. Fokkema, voorzitter van het College voor Promoties, in het openbaar te verdedigen op maandag 19 oktober 2009 om 12.30 uur door Julius Harry SUMIHAR, Wiskundig ingenieur, geboren te Bandung, Indonesië.

Dit proefschrift is goedgekeurd door de promotor: Prof. dr. A.W. Heemink
Copromotor: Dr. M. Verlaan

Samenstelling Promotiecommissie:
Rector Magnificus, voorzitter
Prof. dr. A.W. Heemink, Technische Universiteit Delft, promotor
Dr. M. Verlaan, Technische Universiteit Delft, copromotor
Prof. dr. M.F.P. Bierkens, Utrecht University
Prof. dr. P.J. van Leeuwen, University of Reading, United Kingdom
Prof. dr. A. Mynett, UNESCO-IHE/Technische Universiteit Delft
Prof. dr. K. Ponnambalam, University of Waterloo, Canada
Dr. H. Madsen, DHI Group, Denmark

ISBN 978-90-8570-412-6

Copyright © 2009 by J.H. Sumihar, Delft Institute of Applied Mathematics, Delft University of Technology, The Netherlands. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the author.

Typesetting system: LaTeX 2ε
Printed in The Netherlands by: Wöhrmann Print Service

Acknowledgements

Many people have contributed to the completion of this thesis, directly or indirectly. Therefore, here I would like to extend my gratitude to those who somehow have made it possible for me to finish this thesis.

I would like to extend a deep gratitude to Martin Verlaan, with whom I collaborated most closely in carrying out this research. His enthusiasm has helped me to endure and to enjoy the research process at the same time. He has guided me patiently throughout these years.

I would also like to thank Arnold Heemink for the opportunity he has given me to pursue this PhD study. I thank him for his support. He has the patience to listen to and discuss any issues that I raised. It is very encouraging that even during his busy time, he remains accessible. I thank him as well for the careful reading of and constructive comments on the manuscript.

I would like to acknowledge Bruce Hackett from the Norwegian Meteorological Institute for introducing one of the ideas explored in this research. I thank Øyvind Sætra and Inger-Lisse Frogner for providing me with information about LAMEPS. I also acknowledge Hans de Vries, Gerrit Burgers, and Niels Zweers from KNMI for their help and discussion about HIRLAM.

I have also enjoyed technical support from Kees Lemmens, Eef Hartman, Carl Schneider, and Wim Tiwon. I thank them for their help with the computer systems. I thank Kees Lemmens also for his initiative of organizing lectures about programming and computational techniques. I would also like to thank Bas van 't Hof from VORtech for his encouragement and for helping me to solve some technical difficulties with WAQUA.

There was a lot of administrative work related to a PhD study. I have received significant help from Evelyn Sharabi and Dorothee Engering in handling many administrative matters. They have always been ready to help and I appreciate them for that.
I would also like to thank Theda Olsder, Manon Post, Veronique van der Varst, Rene Tamboer, and Paul Althaus from CICAT for their help. All their help has contributed to my well-being and kept me from being distracted from my research.

I would like to thank my former officemates: Loeky Haryanto, Nils van Velzen, Gosia Kaleta and Remus Hanea for their help, the nice talks, and the good music. I thank Nils also for being my main guide in understanding the details of the WAQUA code. It is mainly thanks to his assistance that I was able to use and modify the WAQUA code for my experiments. I thank him as well for translating the summary and my propositions into Dutch (later on I learned that in doing the translation, he was helped by his colleagues at VORtech: Jeroen and Erwin, so I thank them as well for their help). I thank all other former and present PhD colleagues for being part of the nice atmosphere. I would also like to extend special thanks to Eka Budiarto, Umer Althaf, Nils van Velzen and Ivan Garcia for proofreading parts of this thesis. I thank all participants of the weekly literature meeting for having shared their knowledge and for the meaningful and fun discussions.

The community of the International Student Chaplaincy has been a big support for me. Here, I would like to thank Ben Engelbertink, Waltraut Stroh, Avin Kunnekkadan, Ruben Abellon, Henk van der Vaart, and Ton. I extend a big thank you to all the ISC choir members, the biggest support group for me since the first month of my stay in Delft. I thank especially Mieke and Reini Knoppers, the main pillars of the choir. I would also like to thank the ISC Rotterdam community, which at a later stage of my stay in the Netherlands became an important group for me as well.

I thank all other friends of mine, especially Dwi, Xander, Eka, Dela, Iwa, Sinar, I Nengah Suparta, Loeky, and Ferry. I thank Xander also for translating my summary into Dutch. I would also like to thank all my former housemates for their cooperation and the good talks. I thank Mim, Carol Yu, Sarah Los, Alex Peschanskyy and Umer Althaf (with his gesture) for their constructive comments on the earlier design of the cover of this thesis. I would also like to thank all my friends in Indonesia. I am fortunate to have had the opportunity to meet them at least once a year during this period. Certainly, all these people have helped me enjoy my life and, in turn, helped maintain my stamina to carry out my research.

I thank Sandra Singgih for her courage, love, and for many other good reasons.

I would like to thank my family for the constant support they have been giving me. I thank especially my mother, who has taught me the value of work. It is she who initially opened up the ways and opportunities that I find in life. I extend my deepest gratitude to her for what she has done.

Delft, September 2009, Julius H. Sumihar

Summary

Two directions for improving the accuracy of sea level forecasts are investigated in this study. The first direction seeks to improve the forecast accuracy of the astronomical tide component. Here, a method is applied to analyze and forecast the remaining periodic components in the harmonic analysis residual. This method is found to work reasonably well during calm weather, but poorly during stormy periods. This finding has led us to continue the study in the second direction, which concerns data assimilation implemented in the operational two-dimensional storm surge forecast model.
The operational storm surge forecast system in the Netherlands uses a steady-state Kalman filter to provide more accurate initial conditions for forecast runs. An important factor, which determines the success of a Kalman filter, is the specification of the system error covariance. In the operational system, the system error covariance is modelled explicitly by assuming isotropy and homogeneity. In this study, we investigate the use of the difference between the wind products of two similarly skillful atmospheric models as a proxy for the unknown error of the storm surge forecast model. To accommodate this investigation, a new method for computing a steady-state Kalman gain, called the two-sample Kalman filter, is developed in this study. It is an iterative procedure for computing the steady-state Kalman gain of a stochastic process by using two samples of the process. A number of experiments have been performed to demonstrate that this algorithm produces correct solutions and is potentially applicable to different models.

The two-sample Kalman filter algorithm is implemented by using the wind products from two meteorological centers: the Royal Dutch Meteorological Institute (KNMI) and the UK Met Office (UKMO). Here, the investigation is focused on the random component of the system error. Therefore, bias or systematic error is eliminated before the wind products are used in the two-sample Kalman filter. The system error spatial correlation estimated from these two wind products is found to be anisotropic, in contrast to the one assumed in the operational system. The steady-state Kalman filter based on this error covariance estimate is found to work well in steering the model closer to the observation data. For the stations along the Dutch coast, the data assimilation is found to improve the forecast accuracy up to about 12 hours ahead. Moreover, it is also demonstrated that this data assimilation system outperforms a steady-state Kalman filter based on the isotropy assumption.

To further improve the data assimilation system, the two-sample Kalman filter is extended to work with more samples. By using more samples, the computation of the error covariance can be done by averaging over a shorter time. This relaxes the stationarity assumption and is expected to better simulate the state-dependent model error. In this study, this algorithm is implemented by using the wind ensemble of LAMEPS, which is operational at the Norwegian Meteorological Institute. This setup is found to perform as well as the steady-state Kalman filter during large positive surges. However, the steady-state Kalman filter is found to perform better than the ensemble system in forecasting negative surges. The resulting ensemble spread during negative surges is found to be narrower than the standard deviation assumed by the steady-state Kalman filter. Further investigation of the wind ensemble is required.

Contents

Acknowledgements i
Summary iii
1 Introduction 1
1.1 Why sea level prediction? 1
1.2 Operational sea level forecast system 3
1.2.1 Astronomical tides 4
1.2.2 Storm surge 5
1.3 Data assimilation 5
1.4 Research objectives 7
1.5 Overview 10
2 Kalman filter 13
2.1 Kalman filter 13
2.2 Steady-state Kalman filter 15
2.3 Ensemble Kalman filter 17
2.4 Reduced Rank Square Root Kalman filter 18
2.5 Error statistics 20
3 The Dutch Continental Shelf Model 21
3.1 Model area 21
3.2 Model equations 21
3.3 Numerical approximations 24
3.4 Operational setting 27
4 Analysis and prediction of tidal components within observed surge using Kalman filter 29
4.1 Introduction 29
4.2 Approach 30
4.2.1 Narrow-band Processes 30
4.2.2 Surge model 32
4.2.3 Kalman filter 34
4.3 System characteristics 34
4.3.1 Bode diagram 34
4.3.2 Cross Over 37
4.4 Results and discussion 41
4.5 Conclusion 49
5 Two-sample Kalman filter for steady state data assimilation 51
5.1 Introduction 51
5.2 Algorithm 53
5.2.1 Statistics from two independent realizations 54
5.2.2 Open-loop estimate 54
5.2.3 Closed-loop estimate 55
5.2.4 Proposed algorithm 55
5.3 Experiment using a simple wave model 58
5.4 Experiment using the Dutch Continental Shelf Model 60
5.5 Experiment using the three-variable Lorenz model 65
5.6 Conclusion 68
6 Storm surge forecasting using a two-sample Kalman filter 71
6.1 Introduction 71
6.2 Approach 73
6.2.1 Introduction 73
6.2.2 Two-sample Kalman filter algorithm 73
6.2.3 System transformation 75
6.2.4 Summary 77
6.3 Experiment with a simple wave model 78
6.4 Experiments with the Dutch Continental Shelf Model 81
6.4.1 Introduction 81
6.4.2 The Dutch Continental Shelf Model 81
6.4.3 Wind stress error statistics 81
6.4.4 Water level observation data 89
6.4.5 Computing Kalman gain using the two-sample Kalman filter 90
6.4.6 Hindcast experiment 92
6.4.7 Forecast experiment 93
6.4.8 Comparison with isotropic assumption based steady-state Kalman filter 96
6.4.9 Tuning of observation error statistics 97
6.4.10 Discussion 101
6.5 Conclusion 102
7 Ensemble storm surge data assimilation and forecast 105
7.1 Introduction 105
7.2 Data assimilation algorithm 106
7.2.1 System representation transformation 107
7.2.2 Analysis algorithm 109
7.2.3 Summary 110
7.3 Setup of experiment 111
7.3.1 The Model 111
7.3.2 10 m wind speed forecasts ensemble 112
7.3.3 Water-level data 114
7.4 Wind stress error statistics 114
7.5 Hindcast experiment 116
7.5.1 Time series of ensemble spread 119
7.5.2 Innovations statistics 121
7.5.3 Rank histograms 122
7.5.4 Water level residual rms 125
7.5.5 Discussion 128
7.6 Forecast experiment 129
7.7 Comparison with a steady-state Kalman filter 133
7.8 Conclusion 140
8 Conclusion 143
Bibliography 146
Samenvatting 155
Curriculum Vitae 157

1 Introduction

1.1 Why sea level prediction?

The work presented in this dissertation is about sea water level prediction. In particular, the research is focused on methods to improve the accuracy of water level prediction in the North Sea, especially along the Dutch coast. In this section, we describe why it is important to have an accurate sea level prediction system.

Predicting the water level is one of the water management activities in the Netherlands. There are different interests involved, which together motivate the effort to predict the water level accurately. These interests include safety, ecology and economy. All of these interests should be accommodated for successful water management.

The safety issue concerns the risk of storm surge flooding. For coastal areas, storm surges are the main contributor to flood risk. In fact, the risk deriving from storm surges is a global, albeit regionalized, phenomenon affecting a large percentage of the world population and many centers of commerce and trade (von Storch and Woth, 2008).
In the Netherlands, since parts of its area lie below mean sea level, the risk is even greater. In fact, a number of occurrences have taken place in the past, where the storm surge flooded the area near the coast. Throughout history, the populations of the Dutch coastal provinces have been regularly afflicted by devastating storm surges. The last occurrence happened on 1st February 1953. It claimed 1836 lives and approximately 100000 people were evacuated. It inundated about 136500 ha of land and killed tens of thousands of livestock. The damage to buildings, dikes and other infrastructure was enormous (Gerritsen, 2005).

Figure 1.1: The 1953 storm surge flooding at Scharendijke, a village in Zeeland. B.Th. Boot / Rijkswaterstaat

Immediately after the devastating storm surge of 1953, a Delta Commission was installed, with the task of developing measures to prevent such a disaster from recurring. The key part of the plans recommended by the commission was the full closure of the Eastern Scheldt with a regular dam to create a closed coastline. In 1975, however, concerns about economic and ecological issues were raised by different social groups. If the sea is closed off with dams, the salt water behind the dams will gradually change into fresh water. This in turn will affect the ecosystem in that area. A great variety of life lives in this ecosystem: more than 70 types of fish, 140 types of water plants and algae, 350 types of animals living in the water and between 500 and 600 types on the land. The Eastern Scheldt is also an important area for birds, as a place for food, brooding, as well as hibernation. Closing the inlets with dams would eventually ruin this habitat, the impact of which is irreversible.

This would also have had severe economic consequences. Fishery has always been the largest source of income for the traditional fishing villages in this area. People have been farming oysters in the Eastern Scheldt since 1870. Although safety remains an important concern, other aspects like fishery and the environment should also be considered (www.deltawerken.com).

Through many discussions, the parliament eventually agreed to replace the design of the closed dams with moveable storm surge barriers. The barrier is normally open, so the water bodies on the two sides of the barrier remain connected to each other. When the water level is higher than a safety threshold, the barrier is closed to protect the land from the water.

Another moveable storm surge barrier is located in the New Waterway. This channel connects the North Sea with the Rotterdam port. It protects around 1 million people in Rotterdam, Dordrecht and the surrounding towns from storm surges from the North Sea. Rotterdam is the largest port in Europe, with an annual throughput of more than 400 million tonnes of goods, and is still expected to grow (www.portofrotterdam.com). It is a gateway to a European market of more than 500 million consumers. A lot of ships, big and small, come in and out of the port continuously via this channel. Considering the economic value at stake, the barrier should be closed only if it is really necessary. Using a moveable storm surge barrier, safety, economic and ecological demands are all satisfied. Nonetheless, for successful operation of a storm surge barrier, information about the water level some time ahead is required to be able to close the barrier in time. The procedure of closing the barrier at the New Waterway, for example, takes six hours.
Therefore, an accurate water level prediction is required, at least six hours ahead.

Besides being required for the closure of the storm surge barriers, water level prediction is also required for raising alarms. To ensure that people have sufficient time to flee the area should a flood be indicated, the alarm is raised at least six hours ahead of when the storm surge is expected to arrive. This can be done only if an accurate water level prediction system is employed.

Accurate water level prediction is also required for port management. At times of negative surges at low tide, it may be dangerous for supertankers and bulk carriers to leave or enter the ports of Rotterdam, IJmuiden and the Westerschelde. The largest ships that can enter the port of Rotterdam are confined to the Euro channel, with a depth of 25.4 m, and to the Meuse channel, with a depth of 23.75 m (Schiereck and Eisma, 2002). One of the biggest ships calling at Rotterdam is the Berge Stahl, which carries iron ore and sails between Brazil and Rotterdam. It is a ship of 343 m length, 65 m width, and a draft, or depth in the water, of 23 m. Passage must be timed to coincide with the time window when the water is high, to prevent the ship from running aground.

Figure 1.2: Storm surge barrier at the Eastern Scheldt. Rijkswaterstaat-AGI

1.2 Operational sea level forecast system

The two most dominant forces that determine the sea water level are the gravitational forces of the moon and sun on the earth and the wind stresses exerted by the atmospheric flow just above the sea surface. The operational water level forecast system in the Netherlands is based on the decomposition of the water level into components according to these forces. Each component is predicted separately and the predicted total water level is defined as the superposition of both components (Heemink, 1986).

Figure 1.3: Storm surge barrier at the Nieuwe Waterweg. The picture is taken from "Flood Control," Microsoft Encarta Online Encyclopedia 2008.

1.2.1 Astronomical tides

The water level component that is due to the gravitational forces of the moon and sun is called the astronomical component. The gravitational forces of the moon and the sun on the ocean deform the water body and induce variations of the water level in the sea. At the same time, this deformation propagates as a tidal wave. Since the relative motion of the earth, moon and sun is harmonic in time, the astronomical component is also harmonic. Hence, the astronomical component can be expressed as the sum of several sinusoidal functions with different frequencies, where each sinusoid is referred to as a tidal constituent:

\xi(t) = \sum_{j=1}^{J} H_j \cos(\omega_j t + G_j)    (1.1)

where \xi(t) is the astronomical water level component at time t, while H_j, G_j and \omega_j are the amplitude, phase and frequency of tidal constituent j, respectively. The frequency of each tidal constituent corresponds to one of the characteristic frequencies of the relative motion of the moon-earth-sun system. For example, the tidal constituent M2 corresponds to the motion of the moon around the earth and has a period of 12.421 h. The frequencies of the oceanic tidal constituents can be derived from the gravitational potential on the earth's surface due to the moon and the sun (see, for example, Kantha and Clayson 1999). When the tidal waves propagate into a shallow water area, higher harmonics of the tides are formed due to the nonlinearity of the shallow water system. The frequencies of these shallow water tides are linear combinations of the fundamental frequencies. For example, MS4 has a frequency that is the sum of the frequencies of M2 and S2 and is generated by the interaction of M2 and S2.

Due to its periodic nature, the analysis of the astronomical component is performed by using harmonic analysis. Here, using a long sequence of observed water level data, usually one year long, Equation (1.1) is solved with a least squares estimator to determine the amplitude and phase of each tidal constituent. During its propagation, the tidal wave is deformed by the structure and local characteristics of the sea, such as sea bed configuration, coastal shape, etc. As a result, the astronomical tide has a local character. Hence, the number of tidal constituents required to provide accurate predictions differs from one location to another. For the Dutch coast, typically more than one hundred tidal constituents are needed. The prediction of the astronomical component is done by using the same equation, with the estimated amplitude and phase of each tidal constituent. The prediction of the astronomical tide at a location can be done fairly accurately over a period of years in advance, provided that there is no structural change in the vicinity of that location, such as the building of dams or other constructions.
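To make Equation (1.1) and the harmonic analysis procedure concrete, the following minimal sketch synthesizes a water level signal from three constituents and then re-estimates their amplitudes and phases by least squares. The constituent amplitudes, phases and noise level are invented for the illustration and are not the values used along the Dutch coast.

import numpy as np

# Illustrative constituents: angular frequency (rad/h), amplitude (m), phase (rad).
# The amplitudes and phases are made-up example values, not operational ones.
constituents = {
    "M2": (2 * np.pi / 12.421, 0.75, 0.3),
    "S2": (2 * np.pi / 12.000, 0.25, 1.1),
    "O1": (2 * np.pi / 25.819, 0.10, 2.0),
}

t = np.arange(0.0, 30 * 24, 1.0)            # one month of hourly samples (h)

# Equation (1.1): astronomical tide as a sum of sinusoids, plus some noise.
xi = sum(H * np.cos(w * t + G) for w, H, G in constituents.values())
xi_obs = xi + 0.05 * np.random.default_rng(0).standard_normal(t.size)

# Harmonic analysis: write H*cos(w t + G) = A*cos(w t) + B*sin(w t) and solve
# a linear least-squares problem for A and B of every constituent.
cols = []
for w, _, _ in constituents.values():
    cols += [np.cos(w * t), np.sin(w * t)]
design = np.column_stack(cols)
coef, *_ = np.linalg.lstsq(design, xi_obs, rcond=None)

for (name, (w, H, G)), (A, B) in zip(constituents.items(), coef.reshape(-1, 2)):
    H_est = np.hypot(A, B)                  # recovered amplitude
    G_est = np.arctan2(-B, A)               # recovered phase
    print(f"{name}: H = {H_est:.3f} m (true {H}), G = {G_est:.3f} rad (true {G})")

Rewriting each term as a cosine plus a sine turns the fit into a linear least squares problem, which is the same device used when more than one hundred constituents are estimated from a year of observed data.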
1.2.2 Storm surge

Another factor which affects the water level is the wind stress exerted on the sea surface. When the wind blows in one direction, it pushes against the water and causes the water to pile up higher than the normal level of the sea. This pile of water is pushed and propagated in the direction of the wind. The growth of the water level due to the wind stress depends on the wind strength and on how long the wind blows in one direction. The meteorological component of the sea water level is often called a storm surge.

The operational storm surge prediction is performed by using a numerical model called the Dutch Continental Shelf Model (DCSM). This model is based on the shallow water equations. It describes how the water level and velocities are related and evolve in time in response to the wind forcing exerted on the sea surface as well as to the tidal waves coming in from the Atlantic Ocean. A more detailed description of this model is given in Chapter 3. Although the DCSM also includes the astronomical component as input forcing, its prediction of this component is found to be less accurate than the one obtained using harmonic analysis. Therefore, in the operational system it is used only for predicting the meteorological component. The meteorological component is obtained by subtracting the periodic component from the total predicted water level. The DCSM periodic component is obtained by running the DCSM without wind input (Figure 1.4).

Figure 1.4: Scheme of the operational sea level forecast system in the Netherlands

1.3 Data assimilation

A mathematical model is developed to predict the actual behavior of the system of interest. However, a model is just an approximate representation of the actual system. Therefore, predictions made by the model will always contain errors. The errors may be due, for example, to neglected physical processes in the model equations or to uncertain model parameters.
The DCSM consists of a set of partial differential equations, which describe how the state variables evolve in time. Solving the equations requires discretization in time and space. This discretization entails that only processes with scales larger than several grid sizes can be reproduced reliably. The model also contains model parameters, such as bathymetry and bottom friction coefficients, which are not known exactly and must be determined empirically. Errors in the prediction can also be due to uncertain initial and boundary conditions. A major source of errors in a storm surge model is the uncertain wind input forcing. The wind data that is used as input forcing for a storm surge model is usually the product of a meteorological model, which itself contains errors. This multitude of error sources generates uncertainty in the model prediction.

Another source of information about the actual system is observations. In the Netherlands, for example, a network of tide gauges is available along the coast and offshore, which provides on-line information about the water level every ten minutes. Observation data are also available from satellite altimetry or from measuring ships. The information from observations is usually reasonably accurate. However, observed data usually only give information at a few locations. Moreover, observation data alone cannot provide predictions about the future state of the system.

In order to improve the prediction, one may use a data assimilation technique. Data assimilation is the incorporation of observation data into a model to improve the forecast made by the model. The search for data assimilation algorithms was started by the meteorological science community. Nowadays, data assimilation has become one of the essential components of forecast systems in many earth system applications. In operational forecast systems, the state obtained at the end of the assimilation period is taken as the initial state for the ensuing forecast.

There are two main classes of data assimilation algorithms: variational and sequential methods. Variational data assimilation aims at globally adjusting a model solution to all the observations available over the assimilation period (Talagrand, 1997). In this method, the incorporation of observed data is done by first defining a scalar function, which measures the deviation of the model results from the observations. The initial condition estimate is computed by minimizing this so-called cost function with respect to the initial condition. For an efficient computation, the minimization of the cost function requires the implementation of the adjoint of the linearized model in question. In some cases, however, it is difficult to build an adjoint model, which makes the variational method impractical.

In the sequential data assimilation method, observation data are assimilated every time they become available. The most popular method in this class is the Kalman filter. In the Kalman filter, the model is not only used to propagate the model state, but also to determine the uncertainty of the estimate, expressed in terms of the error covariance. The estimates of the uncertain parameters or model states are obtained by a weighted average between observation and model prediction, where the weights are determined based on the covariances of the model prediction and observation errors.
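For a single water level value that is observed directly, this weighted average reduces to a scalar formula. The small sketch below only illustrates how the weight shifts towards the more accurate of the two sources; the numbers are arbitrary and do not correspond to any station.

# Scalar illustration of the weighted average: the analysis lies between the
# model forecast and the observation, pulled towards whichever has the smaller
# error variance. All numbers are arbitrary example values.
x_f, var_f = 1.80, 0.30 ** 2   # forecast water level (m) and its error variance
y_o, var_o = 2.10, 0.10 ** 2   # observed water level (m) and its error variance

k = var_f / (var_f + var_o)           # weight given to the observation
x_a = x_f + k * (y_o - x_f)           # analysis estimate
var_a = (1.0 - k) * var_f             # analysis error variance

print(f"gain k = {k:.2f}, analysis = {x_a:.2f} m, std = {var_a ** 0.5:.2f} m")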
The implementation of a Kalman filter does not require an adjoint model. However, its implementation also has difficulties, namely the computational cost of the error covariance propagation and the nonlinearity. A more complete description of Kalman filtering is given in Chapter 2.

The data assimilation used in the operational storm surge forecast system in the Netherlands is based on the so-called steady-state Kalman filter, which is also described in more detail in Chapter 2. Here it suffices to say that the steady-state Kalman filter is a Kalman filter with constant error covariances. For stable systems with constant error covariances, the weighting matrix, usually called the Kalman gain, will also be constant. Hence, it is possible to compute the Kalman gain prior to using the Kalman filter in an operational setting. Once the Kalman gain is available, it is simply inserted into the forecast system as shown in Figure 1.5.

Figure 1.5: Scheme of the operational data assimilation system; H: observation operator, K: Kalman gain.

Experience has shown that assimilating total sea level observations into the DCSM may have a negative effect on the forecast performance. This has to do with the fact that the DCSM cannot reconstruct the astronomical components accurately, as mentioned in Section 1.2.2. To avoid assimilating astronomical tide information into the DCSM, the adjusted observation described in Figure 1.6 is used instead of the total sea level observation. Here, the astronomical components as predicted by harmonic analysis are replaced by those obtained using the DCSM without wind input.

Figure 1.6: Block diagram of the adjusted observation

1.4 Research objectives

The general objective of this research is to develop and implement methods for improving sea level forecasts. This objective is achieved by following several directions. From the scheme of the operational forecast system in the Netherlands, we can identify various directions which we can take to improve the sea level forecast accuracy. Two directions are explored in this study.

The first direction is to improve the accuracy of the astronomical tide prediction. The decision to follow this direction is motivated by the fact that periodic components can still be found in the residual of the harmonic analysis. The idea is to use the method introduced by Heemink et al. (1991) to predict the remaining harmonic components and use them as a correction to the astronomical tide forecasts obtained using harmonic analysis.

Another direction is to improve the data assimilation scheme, which is coupled to the DCSM. As mentioned earlier, the operational data assimilation system is based on a Kalman filter. The Kalman filter could in theory be used for optimal estimation of a system state. However, in practice it requires information about the error covariances of the forecast model, which is crucial but difficult to estimate (Dee, 1995). In fact, the issue of error covariance estimation applies to most data assimilation algorithms, which have their roots in statistical estimation theory. A problem in specifying the forecast/model error covariance matrix is that the error covariance matrix is usually enormous. We are forced to simplify it just to fit it into the computer.
Even if we could fit it into the computer, we do not have sufficient statistical information to determine its elements. The second problem is that we do not know what the errors are since we do not know the truth. We cannot produce samples of model or forecast errors since we do not know the true state (Fisher, 2003).

There are three possible ways to approach this problem. The first way is to explicitly model the structure of the error covariance. In the operational system, for example, the model error variance is assumed to be homogeneous in space. Moreover, the spatial correlation is assumed to be isotropic. This means that the correlation between errors at two locations depends only on the distance between the two locations and is independent of direction. Here, the error covariance is parameterized by a few parameters, such as a correlation length and a standard deviation (Verlaan, 1998). Parameterizing the error covariance by a few calibrating parameters is also proposed by Dee (1995) and used by Mitchell and Houtekamer (2000).

The second possible way is to disentangle information about the error statistics from the difference between the model results and observations (Hollingsworth and Lönnberg, 1986; Xu and Wei, 2001). This method is likely to give the most accurate estimate of the error covariance. However, it requires a lot of observation points within the model area. It gives less accurate information for areas with only a few or no observations.

The third method is to find a surrogate quantity whose error statistics are expected to be similar to the unknown system error statistics (Fisher, 2003). The most popular choice for a surrogate of samples of the error is to use differences between forecasts of different length that verify at the same time, frequently referred to as the NMC method (Parish and Derber, 1992; Yang et al., 2006a). The practical advantage of this approach is that once such forecast samples have been generated, it is straightforward to determine covariances of the error for the entire model domain, in terms of the model variables. Moreover, the main advantage of such a method is that in an operational NWP environment, the forecasts required to calculate the statistics are already available in the operational archives. No computationally expensive running of the forecast or analysis system is required (Fisher, 2003). The difficulty with the approach is that the statistics of any chosen surrogate quantity are likely to differ in some respects from the true statistics of the error.

In this research, we develop a method for estimating the error covariance using a surrogate quantity. For the storm surge forecast system used in our study, wind field data produced by two similarly skillful atmospheric models are available. Here, we explore the possibility of representing the model error by the difference between the model results obtained by running the model with these two wind fields. This approach is similar to Alves and Robert (2005) and Leeuwenburgh (2007). However, their work is in the framework of ensemble Kalman filtering. In our research, like in the operational system, the method is developed in the framework of steady-state Kalman filtering. To accommodate this study we first develop a method for computing a steady-state Kalman gain from two
It is a recursive algorithm for computing a steady-state Kalman gain of a stochastic system based on two samples of the system. It is based on the assumption that the error is stationary. The error covariance is computed by averaging over time. It is also known that model error actually vary in time along with the actual system state (Dee, 1995). To improve the data assimilation performance even further, we extend the two-sample Kalman filter algorithm to work with more samples. By using more samples, it is possible to compute the error covariance by averaging over shorter time. In this way we can relax the stationary assumption. This will in principle result in a more accurate estimate. 1.5 Overview The remainder of this book is organized as follows. The next two chapters give a general introduction related to the research, followed by four chapters describing the content of the research works. Finally, the last chapter concludes this book with summary and recommendations. Chapter 2 is devoted to a brief description about Kalman filtering. It presents the basic equations of the Kalman filter algorithm. Some difficulties in practical implementation of a Kalman filter are mentioned and some variants of Kalman filter algorithm to cope with these difficulties, which are found in literature, are presented. Chapter 3 gives the description of the DCSM, the numerical storm surge model used throughout this study. A brief explanation about the model area, system of equations and the numerical method is presented. The forecast residual using harmonic analysis is found to still contain some astronomical components. This has motivated us to improve the forecasts, by extracting the astronomical components from the harmonic analysis residual. We call hereafter the residual of harmonic analysis as observed surge. In Chapter 4, the results of the study about analysis and forecast of astronomical components within observed surge are presented. It describes how each tidal constituent is modelled and forecast. Chapter 5 presents our proposed algorithm, called the two-sample Kalman filter, for computing a steady state Kalman gain. The development of this algorithm was motivated to provide a framework for estimating forecast error covariance by simply using two realizations of a dynamic system. Here, results of twin experiments using different models are presented to demonstrate the validity of the algorithm. In Chapter 6, we implement the two-sample Kalman filter on the DCSM. Two wind fields from two different meteorological centers are used to force the DCSM for generating the two samples required for computing the steady state Kalman gain. It is shown that the algorithm performs well with the DCSM. We have extended the two-sample Kalman filter algorithm for working with more than two samples. Chapter 7 describes both the modified algorithm as well as its implementa- Introduction tion with the DCSM. In this experiment, we use an ensemble of wind field forecasts, to generate the ensemble of DCSM forecasts. Both the analysis and forecast ensembles are evaluated. Chapter 8 concludes the disertation, by summarizing the research works that have been done as well as giving recommendations for future research. 11 12 1.5 Overview 2 Kalman filter Kalman filter is a recursive algorithm for estimating the state of a dynamic system. It combines sequentially in time the state of a dynamic system with observation to produce optimal estimate of the system state. 
Since its first publication (Kalman, 1960), the Kalman filter has received a lot of attention and found many applications. Although it originated in the field of control theory, it has found many applications in other fields as well, including earth system applications.

The purpose of this chapter is to provide the reader with some basic equations of the Kalman filter and its variants pertaining to this study. We do not intend to give a complete description of Kalman filter theory. The exposition focuses only on certain issues related to the Kalman filter algorithm and on how certain algorithms found in the literature solve these issues. For a complete description of Kalman filter theory, the reader is referred to, for example, Jazwinski (1970) and Maybeck (1979).

We first give a brief description of the classical Kalman filter and mention some issues related to it. It is followed by a short description of some variants of the classical Kalman filter. Only three variants pertaining to this study are described: the steady-state Kalman filter, the Ensemble Kalman filter (EnKF), and the Reduced Rank Square Root Kalman (RRSQRT) filter. In the last section we give a brief discussion of a practical problem of implementing a Kalman filter, which we deal with in this research.

2.1 Kalman filter

The Kalman filter is an algorithm which solves the general problem of estimating the process state X(t_k) ∈ R^n at time t_k that is governed by the linear stochastic difference equation

X(t_k) = M(t_k) X(t_{k-1}) + B(t_k) u(t_{k-1}) + G(t_k) \eta(t_{k-1})    (2.1)

with a measurement Y(t_k) ∈ R^m given by

Y(t_k) = H(t_k) X(t_k) + \nu(t_k)    (2.2)

where M(t_k) is the model operator, u(t_k) the input forcing, B(t_k) the input forcing matrix, G(t_k) the model error matrix and H(t_k) the observation operator. The random variable \eta(t_k) represents the noise of the process due to model and/or input forcing error, while \nu(t_k) represents observation noise. They are assumed to be Gaussian white noise and independent of each other, with covariances Q(t_k) ∈ R^{n×n} and R(t_k) ∈ R^{m×m}, respectively.

Due to the linearity of the system, the Gaussianity of the model state is preserved in time. It is known that the statistics of a normally distributed random variable can be described completely by its mean and variance. Therefore, to obtain a complete statistical description of the system state X(t_k) above, it is sufficient to propagate both the mean and the covariance of the system state. This is what is done in the Kalman filter: it propagates in time the mean as well as the covariance of the system state. The Kalman filter uses the mean as the estimate of the system state. For a normally distributed random variable, the mean value is the optimal estimate that satisfies many different criteria of optimality, such as the minimum variance, least squares, and maximum likelihood estimates (Maybeck, 1979). Moreover, by using the mean value as the state estimate, the state covariance is in fact also the covariance of the error committed by that estimate of the state value. When observations are available, the filter uses the covariance to define how the observation should be used to update the system state estimate.

The Kalman filter solves the estimation problem above in two steps: the forecast and analysis steps. Therefore, for clarity, we denote the estimates of the system state computed at the forecast and analysis steps as x^f(t_k) and x^a(t_k), respectively. Their corresponding error covariances are denoted by P^f(t_k) and P^a(t_k).
In the forecast step, the mean and covariance of the process are propagated forward in time by:

x^f(t_k) = M(t_{k-1}) x^a(t_{k-1}) + B(t_{k-1}) u(t_{k-1})    (2.3)
P^f(t_k) = M(t_{k-1}) P^a(t_{k-1}) M(t_{k-1})^T + G(t_{k-1}) Q(t_{k-1}) G(t_{k-1})^T    (2.4)

while at analysis time, that is, when an observation y^o is available, the forecast state is combined linearly with the observation to obtain an updated system state estimate:

x^a(t_k) = x^f(t_k) + K(t_k) (y^o(t_k) - H(t_k) x^f(t_k))    (2.5)
P^a(t_k) = (I - K(t_k) H(t_k)) P^f(t_k)    (2.6)

where K(t_k) is called the Kalman gain, given by:

K(t_k) = P^f(t_k) H(t_k)^T [H(t_k) P^f(t_k) H(t_k)^T + R(t_k)]^{-1}    (2.7)

The algorithm is run sequentially in time and started from a specified initial condition:

x^a(t_0) = x_0^a    (2.8)
P^a(t_0) = P_0^a    (2.9)

In short, to implement a Kalman filter, we need to supply x^a(t_0) and P^a(t_0) as the initial guess and its covariance, as well as the covariance of the model error Q(t_k) and of the measurement error R(t_k) at each time t_k as input parameters.

In principle, the optimal estimator for linear systems with Gaussian error processes is the Kalman filter. However, there are some difficulties in its implementation, especially for earth system applications. One difficulty, which hampers the implementation of the classical Kalman filter for large-scale models, is the computational cost due to the propagation of the error covariance. For a system with dimension n, the covariance matrix has a dimension of n × n. In earth system applications, the system state is usually of large dimension. For the storm surge forecast system that we are working with here, for example, the dimension of the system state is about 60000. Working directly with matrices whose dimension is the square of this size would take too much computational time. Another difficulty is related to the linearity assumption. Most real-life systems are nonlinear. Many approximations to the Kalman filter have been proposed to solve these problems. In the remainder of this chapter, we give a brief explanation of three algorithms pertaining to our study.

2.2 Steady-state Kalman filter

It should be noted here that the Kalman gain K(t_k) only depends on the forecast error covariance P^f(t_k) and the measurement error covariance R(t_k); it is independent of the measurement realizations themselves. Therefore, it is possible to prepare the Kalman gain before using the Kalman filter for data assimilation. Moreover, if the model is time-invariant and stable, the Kalman gain K(t_k) will converge to a limiting value K. When this steady-state Kalman gain is used for all measurement times, the estimate converges to the optimal estimate for large times t_k. This fact can be exploited to reduce the computational cost of running the Kalman filter.

A time-invariant model is a special system described by Equations 2.1-2.2, where the system matrices M, B, G, H, Q and R are constant. A model is exponentially stable if the modulus of every eigenvalue of M is smaller than one. The stability of a linear and time-invariant system can also be evaluated by using the concepts of controllability and observability. A system is controllable if all components, or linear combinations of these components, are affected by the system noise. The result is that the forecast error covariance matrix will be positive definite. Controllability can be verified by computing the rank of the controllability matrix C:

C = [G, MG, M^2 G, ..., M^{n-1} G]    (2.10)
The system is controllable if the rank of C is equal to n, where n is the dimension of the state vector.

The concept of observability is related to the measurement information that is available. A system is observable if all variations of the system state can be observed in the measurements. The result of observability is that the covariance matrix of the analysis error is bounded from above. Observability can be checked by computing the rank of the observability matrix O:

O = [H^T, M H^T, M^2 H^T, ..., M^{n-1} H^T]    (2.11)

The system is observable if the rank of O is equal to n. If a system is observable and controllable, then it is also exponentially stable. For time-invariant systems, controllability and observability also imply that the forecast error statistics converge to a steady state and that the Kalman filter for large time t_k becomes time-invariant as well.

For a linear, time-invariant and stable system, the steady-state Kalman gain can be computed by running only the covariance propagation and using the forecast error covariance to compute the Kalman gain, i.e. Equations 2.4, 2.6 and 2.7. Once the solution of these equations has reached a constant value, the Kalman gain can be used for data assimilation. This computation is performed off-line and only once. Hence, the computational problem of the error covariance propagation is eliminated. Once the Kalman gain is available, only the equations for the state forecast and analysis have to be solved on-line:

x^f(t_k) = M x^a(t_{k-1}) + B u(t_{k-1})    (2.12)
x^a(t_k) = x^f(t_k) + K (y^o(t_k) - H x^f(t_k))    (2.13)

The steady-state Kalman filter is computationally efficient. It adds only a small fraction to the running time of the model itself. Moreover, the implementation of the analysis equation requires only a small amount of additional code in the model program.

For many environmental systems, it is computationally expensive to solve Equations 2.4, 2.6 and 2.7 directly, due to the large dimension of the matrices involved. To compute the steady-state Kalman gain efficiently, different methods can be used. Heemink (1986), for example, used a discrete form of the Chandrasekhar-type algorithm to compute a steady-state Kalman filter for a storm surge forecast system. This algorithm was first proposed by Morf, Sidhu and Kailath (1974) and is based on the recursion of the incremental covariance ΔP(t_k) = P^a(t_k) - P^a(t_{k-1}). The advantage of working with the incremental covariance is that the rank of this matrix is equal to p, where p is the dimension of the system noise process and usually much smaller than the dimension of the system state n, i.e. p << n. This reduces the computational time significantly.

Although in reality the error statistics may vary in time with the actual system state (Dee, 1995), the steady-state Kalman filter algorithm has proven efficient in many applications (e.g. Heemink, 1990; Canizares et al., 2001; Oke et al., 2002; Verlaan et al., 2005; Sørensen et al., 2006; Serafy and Mynett, 2008). The limitation of this approach is that it requires a fixed observing network; that is, the observation matrix H is constant. Moreover, this algorithm does not solve the nonlinearity issue. The approach usually taken for dealing with the nonlinearity issue is to propagate the error covariance matrix using a linearized model and to compute the steady-state Kalman gain with this simpler model. Sometimes, this approach is combined with using a coarser grid to solve the computational cost problem in propagating the error covariance (Heemink, 1990).
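For a small linear, time-invariant system this procedure is easy to write down explicitly. The sketch below iterates Equations (2.4), (2.6) and (2.7) until the gain no longer changes and then uses the constant gain in the on-line Equations (2.12) and (2.13). The two-dimensional model, the noise levels and the convergence tolerance are invented for the illustration and are unrelated to the DCSM setup.

import numpy as np

# A small made-up time-invariant system: a damped rotation observed in its
# first component (no input forcing).
M = 0.95 * np.array([[np.cos(0.3), -np.sin(0.3)],
                     [np.sin(0.3),  np.cos(0.3)]])
G = np.eye(2)
Q = 0.02 * np.eye(2)          # model error covariance
H = np.array([[1.0, 0.0]])    # observe the first state variable
R = np.array([[0.05]])        # observation error covariance

# Iterate Equations (2.4), (2.6) and (2.7) until the gain converges.
P_a = np.eye(2)
K_old = np.zeros((2, 1))
for _ in range(1000):
    P_f = M @ P_a @ M.T + G @ Q @ G.T                      # (2.4)
    K = P_f @ H.T @ np.linalg.inv(H @ P_f @ H.T + R)       # (2.7)
    P_a = (np.eye(2) - K @ H) @ P_f                        # (2.6)
    if np.max(np.abs(K - K_old)) < 1e-10:
        break
    K_old = K
print("steady-state Kalman gain:", K.ravel())

# On-line use of the constant gain, Equations (2.12)-(2.13).
def assimilate(x_a, y_obs):
    x_f = M @ x_a                        # forecast step
    return x_f + K @ (y_obs - H @ x_f)   # analysis step

x = assimilate(np.array([1.0, 0.0]), np.array([0.8]))
print("analysis state:", x)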
2.3 Ensemble Kalman filter

A popular method to solve the nonlinearity and computational cost issues in implementing the Kalman filter is the ensemble Kalman filter (EnKF), first introduced by Evensen (1994). It is based on the Monte Carlo method, where the uncertainty of the model state is represented by an ensemble of model state realizations. Here, the propagation of the error covariance is replaced by the propagation of each member of the ensemble. The number of ensemble members q is chosen to be much smaller than the number of columns of the error covariance matrix, q << n. Therefore, it reduces the computational cost considerably. Moreover, it is relatively simple to implement. Due to these nice properties, it has been the subject of much research and many applications. A complete description of the derivation of the EnKF can be found in Evensen (2003). In this section, we give only a basic description of the algorithm.

The equations for the EnKF are as follows. Suppose we have an N-member ensemble of the model state, ξ_1, ξ_2, ..., ξ_N. At the forecast step, each ensemble member is propagated in time and the forecast perturbation matrix E^f is formed:

ξ_i^f(t_k) = M(t_{k-1}) ξ_i^a(t_{k-1}) + B(t_{k-1}) u(t_{k-1}) + G(t_{k-1}) η_i(t_{k-1}),  i = 1, ..., N    (2.14)
x^f(t_k) = (1/N) \sum_{i=1}^{N} ξ_i^f(t_k)    (2.15)
E^f(t_k) = (1/\sqrt{N-1}) [ξ_1^f(t_k) - x^f(t_k), ..., ξ_N^f(t_k) - x^f(t_k)]    (2.16)

At the analysis step, each ensemble member is updated using a perturbed observation and the analysis perturbation matrix is defined:

ξ_i^a(t_k) = ξ_i^f(t_k) + K(t_k) (y^o(t_k) - H(t_k) ξ_i^f(t_k) + ν_i(t_k))    (2.17)
E^a(t_k) = (1/\sqrt{N-1}) [ξ_1^a(t_k) - x^a(t_k), ..., ξ_N^a(t_k) - x^a(t_k)]    (2.18)

where the Kalman gain is computed by

K(t_k) = E^f(t_k) E^f(t_k)^T H(t_k)^T [H(t_k) E^f(t_k) E^f(t_k)^T H(t_k)^T + R(t_k)]^{-1}    (2.19)

As can be seen from the equations above, instead of working directly with the error covariance matrices P^f and P^a, the EnKF works with the perturbation matrices E^f and E^a, which are of lower rank. Therefore, the computational expense is reduced. When needed, the covariance matrices can be computed from P^f = E^f (E^f)^T and P^a = E^a (E^a)^T.

Note that at each analysis step, a random vector drawn from the observation error distribution is used to perturb the observation, as in Equation (2.17). This is necessary in order to prevent systematic underestimation of the analysis error covariance (Evensen, 2003). While it helps to produce the correct analysis error covariance, the addition of perturbed observations adds an additional source of sampling error related to the estimation of the observation error covariances. Some modifications of this approach have been proposed to avoid having to perturb the observations randomly (Whitaker and Hamill, 2002; Tippett et al., 2003; Evensen, 2003; Sakov and Oke, 2008). Reformulating the analysis deterministically reduces the sampling error in the analysis step.

Because the error covariance matrix is represented as the product of a square root matrix with its transpose, it will always be positive semi-definite. The algorithm can also be implemented for nonlinear models without any conceptual difficulty. Here, as for linear models, the propagation of the error covariance for nonlinear models is simply done by propagating each ensemble member with the nonlinear model.
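The analysis step of Equations (2.15)-(2.19) can be illustrated with a toy ensemble as in the sketch below. The state dimension, ensemble size, observation operator and error levels are arbitrary choices made only for the example.

import numpy as np

rng = np.random.default_rng(1)
n, m, N = 8, 2, 20            # state size, number of observations, ensemble size

# Toy forecast ensemble, observation operator and observation error covariance.
xi_f = rng.standard_normal((n, N)) + 1.0
H = np.zeros((m, n)); H[0, 0] = 1.0; H[1, 4] = 1.0     # observe components 0 and 4
R = 0.1 * np.eye(m)
y_obs = np.array([1.5, 0.5])

# Equations (2.15)-(2.16): ensemble mean and forecast perturbation matrix.
x_f = xi_f.mean(axis=1, keepdims=True)
E_f = (xi_f - x_f) / np.sqrt(N - 1)

# Equation (2.19): Kalman gain from the low-rank covariance E_f E_f^T.
S = H @ E_f
K = E_f @ S.T @ np.linalg.inv(S @ S.T + R)

# Equation (2.17): update every member with its own perturbed observation.
xi_a = np.empty_like(xi_f)
for i in range(N):
    y_pert = y_obs + rng.multivariate_normal(np.zeros(m), R)
    xi_a[:, i] = xi_f[:, i] + K @ (y_pert - H @ xi_f[:, i])

print("forecast mean (first 3):", x_f.ravel()[:3])
print("analysis mean (first 3):", xi_a.mean(axis=1)[:3])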
A limitation of the EnKF is that the standard deviation of the errors in the state estimate is of a statistical nature and converges slowly with the ensemble size. In fact, due to the limitation of computational resources, application of the EnKF to large-dimensional systems requires one to use a small ensemble. This in turn causes a problem of spurious correlations, due to sampling error. A popular ad hoc technique to solve this problem is the covariance localization method (Houtekamer and Mitchell, 2001). This method involves the Schur product of the empirical covariances with an analytical covariance function (Gaspari and Cohn, 1999), which sets the spurious correlations at locations distant from a reference point to zero. Covariance localization, however, has certain disadvantages. It induces, for example, the issue of imbalance (e.g. Mitchell et al., 2002), which may be important for certain applications such as large-scale atmospheric systems. It also adds an additional parameter, which requires tuning. Another problem arising from the small ensemble size is that the error covariance tends to be underestimated. A popular method for solving this problem is the covariance inflation algorithm (Anderson and Anderson, 1999). Here, the underestimated prior ensemble state covariance is increased by linearly inflating each ensemble member with respect to the ensemble mean.

2.4 Reduced Rank Square Root Kalman filter

Another variant of the Kalman filter algorithm, which solves the problem of computational cost, is the Reduced Rank Square Root (RRSQRT) Kalman filter (Verlaan and Heemink, 1997). It has been applied in data assimilation research for estuarine systems (Bertino et al., 2002), coastal modelling (Madsen and Canizares, 1999) and atmospheric chemistry models (Segers et al., 2000). The RRSQRT method can be considered as an ensemble-based filter as well. The difference is that in the RRSQRT algorithm the ensemble members are not selected randomly, but in the direction of the most important eigenvectors. Here, a square root decomposition of the error covariance is used to ensure that the error covariance remains positive at all times. Instead of working with the full covariance matrix, the RRSQRT propagates the square root of the covariance with lower rank. The forecast step of the RRSQRT can be written as

x^f(t_k) = M(t_{k-1}) x^a(t_{k-1}) + B(t_{k-1}) u(t_{k-1})    (2.20)
L_c^f(t_k) = [M(t_{k-1}) L^a(t_{k-1}), G(t_{k-1}) Q(t_{k-1})^{1/2}]    (2.21)
L^f(t_k) = \Pi^f(t_k) L_c^f(t_k)    (2.22)

while the analysis step reads

K(t_k) = L^f(t_k) L^f(t_k)^T H(t_k)^T [H(t_k) L^f(t_k) L^f(t_k)^T H(t_k)^T + R(t_k)]^{-1}    (2.23)
x^a(t_k) = x^f(t_k) + K(t_k) (y^o(t_k) - H(t_k) x^f(t_k))    (2.24)
L_c^a(t_k) = [(I - K(t_k) H(t_k)) L^f(t_k), K(t_k) R(t_k)^{1/2}]    (2.25)
L^a(t_k) = \Pi^a(t_k) L_c^a(t_k)    (2.26)

where L^f(t_k) and L^a(t_k) are the n × q approximate square roots of the error covariances P^f(t_k) and P^a(t_k), respectively. The notation [ , ] denotes that the matrix is built from the two block matrices on the left- and right-hand sides of the comma. Note that the propagation M(t_{k-1}) L^a(t_{k-1}) is much faster than that of the original Kalman filter, because in the RRSQRT the matrix L^f(t_k) contains only q columns, with q << n. The addition of the square root of the model error covariance in Equation (2.21), however, results in a matrix with a larger dimension. Hence, a reduction step must follow to reduce the number of columns back to q. The symbol Π denotes the reduction operator. The basic idea of this approximation is to use only the first q leading eigenvalues and eigenvectors of the approximate error covariance matrix P_c^f(t_k) = L_c^f(t_k) L_c^f(t_k)^T. The covariance matrix P_c^f(t_k) itself is never actually computed. The computations are based on the eigendecomposition of the smaller matrix L_c^f(t_k)^T L_c^f(t_k), hence it is computationally cheaper than the full Kalman filter. A similar procedure is performed for the analysis error covariance.

Besides being computationally cheaper than the original full Kalman filter, the RRSQRT can be extended to work with nonlinear models. The changes needed are conceptually not very difficult. For nonlinear models, the time propagation of the estimate is performed using the nonlinear model M, while the time propagation of the error covariance estimate is done with the tangent linear model, as follows:

x^f(t_k) = M[x^a(t_{k-1}), u(t_{k-1})]    (2.27)
l_i^f = (1/ε) {M[x^a(t_{k-1}) + ε l_i^a(t_{k-1}), u(t_{k-1})] - M[x^a(t_{k-1}), u(t_{k-1})]}    (2.28)

where l_i^f is the i-th column (mode) of the matrix L^f. The weakest point of the RRSQRT Kalman filter is the reduction step [Equations (2.22) and (2.26)]. This step can be computationally expensive and it is also not invariant under a scaling of the state variables. Some modifications have been proposed to cope with these problems, for example by Treebushny and Madsen (2005) and Segers et al. (2000).
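The reduction operator Π can be illustrated as follows: given an extended square-root factor with more columns than the target rank q, the eigendecomposition of the small matrix L_c^T L_c selects the q leading modes, as described above. The dimensions and random numbers in the sketch are arbitrary.

import numpy as np

def reduce_rank(L_c, q):
    # Eigendecomposition of the small matrix L_c^T L_c; the full n-by-n
    # covariance L_c L_c^T is never formed.
    vals, vecs = np.linalg.eigh(L_c.T @ L_c)
    order = np.argsort(vals)[::-1][:q]          # q largest eigenvalues
    return L_c @ vecs[:, order]                 # new square root with q columns

rng = np.random.default_rng(2)
n, q = 50, 5
L_a = rng.standard_normal((n, q))               # previous analysis square root
Q_half = 0.1 * rng.standard_normal((n, 3))      # square root of model error cov.

L_c = np.hstack([L_a, Q_half])                  # as in Equation (2.21): q + 3 columns
L_f = reduce_rank(L_c, q)                       # as in Equation (2.22): back to q columns

# The reduced factor approximates the same covariance:
P_full = L_c @ L_c.T
P_red = L_f @ L_f.T
print("relative approximation error:",
      np.linalg.norm(P_full - P_red) / np.linalg.norm(P_full))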
The basic idea of this approximation is to use only the first q leading eigenvalues and eigenvectors of the approximate error covariance matrix P_c^f(t_k) = L_c^f(t_k)L_c^f(t_k)^\top. The covariance matrix P_c^f(t_k) itself is actually never computed. The computations are based on the eigendecomposition of the smaller matrix L_c^f(t_k)^\top L_c^f(t_k); hence the method is computationally cheaper than the full Kalman filter. A similar procedure is performed for the analysis error covariance.

Besides being computationally cheaper than the original full Kalman filter, the RRSQRT can be extended to work with nonlinear models. The changes needed are conceptually not very difficult. For nonlinear models, the time propagation of the estimate is performed using the nonlinear model M, while the time propagation of the error covariance estimate is done by a finite-difference approximation of the tangent linear model, as follows:

x^f(t_k) = M[x^a(t_{k-1}), u(t_{k-1})] \qquad (2.27)

l_i^f = \frac{1}{\epsilon}\left\{M[x^a(t_{k-1}) + \epsilon\, l_i^a(t_{k-1}), u(t_{k-1})] - M[x^a(t_{k-1}), u(t_{k-1})]\right\} \qquad (2.28)

where l_i^f is the i-th column (mode) of the matrix L^f. The weakest point of the RRSQRT Kalman filter is the reduction step [Equations (2.22) and (2.26)]. This step can be computationally expensive and it is also not invariant under a scaling of the state variables. Some modifications have been proposed to cope with these problems, for example by Treebushny and Madsen (2005) and Segers et al. (2000).

2.5 Error statistics

In the previous sections, we have described three approximate algorithms that address the computational and nonlinearity issues of Kalman filter implementation. Another important issue related to the implementation of the Kalman filter is the specification of error statistics. Although a Kalman filter computes a forecast error covariance internally, this is a valid depiction of the true errors committed by the filter only to the extent that the filter's own system model adequately portrays true system behavior (Maybeck, 1979). In other words, the Kalman filter will give an optimal estimate only if the specified error statistics exactly represent the true measurement and model error statistics. In practice, however, it is impossible to obtain the true model error statistics, since we do not know the true state. Note that the issue of specifying error covariances applies to most other data assimilation methods as well. To solve this problem, one usually resorts to explicitly modelling the error covariance and representing it as a simple function of a few calibration parameters. As mentioned earlier, another way to approach this issue is to find a surrogate quantity whose statistics are expected to be similar to the unknown true statistics.

In this research project, we propose a method based on such a surrogate quantity to represent the model error statistics. The method computes error statistics using two system realizations. The two system realizations are generated, for example, by using two different but similarly skillful models. Here, the difference between the two models is assumed to have statistics similar to those of the unknown true error. The algorithm is developed in the framework of steady-state Kalman filtering. We first develop an algorithm to accommodate this idea and describe it in Chapter 5. In Chapter 6 we implement this algorithm for the operational storm surge forecast system in the Netherlands.
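Returning briefly to the RRSQRT reduction operator \Pi of Section 2.4, the sketch below shows one common way to realize it, using the eigendecomposition of the small matrix L_c^\top L_c as described at the start of this section. The dimensions are illustrative; this is a sketch of the idea, not the WAQUA implementation.

```python
import numpy as np

def reduce_rank(L_c, q):
    """Reduction operator Pi: keep the q leading modes of P = L_c L_c^T by
    eigendecomposing the small matrix L_c^T L_c instead of P itself."""
    small = L_c.T @ L_c                      # (q' x q'), cheap compared to n x n
    vals, vecs = np.linalg.eigh(small)       # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:q]         # q leading eigenpairs
    return L_c @ vecs[:, idx]                # columns span the leading subspace

# Toy check: the reduced square root reproduces the leading rank-q part of P
rng = np.random.default_rng(2)
n, q_big, q = 30, 12, 6
L_c = rng.normal(size=(n, q_big))
L_red = reduce_rank(L_c, q)
P_full = L_c @ L_c.T
P_red = L_red @ L_red.T       # rank q, matching the q leading eigen-directions
```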
To gain further improvement, we have extended the algorithm to work with more realizations, resulting in an ensemble-based Kalman filter algorithm. This extended algorithm is presented in Chapter 7.

3 The Dutch Continental Shelf Model

To provide the reader with information about the numerical model used in this study, this chapter presents a description of the Dutch Continental Shelf Model (DCSM). This model has been used operationally in the Netherlands since the mid-1980s for water level and storm surge forecasting. Several developments have taken place on the model since then and new developments are still ongoing (Verlaan et al., 2005). The DCSM is based on the WAQUA simulation software package developed by the Department of Public Works (Rijkswaterstaat). The exposition below is mainly derived from Gerritsen et al. (1995), Verlaan et al. (2005) and the technical documentation of WAQUA.

3.1 Model area

The decision about the area to be covered by the DCSM was taken in the beginning of the 1980s. The area was chosen to model explicitly the nonlinearities of the surge-tide interaction and to give a better response to approaching oceanic depression systems by including their effect in the boundary condition. These requirements imply a model extent beyond the 200 m depth contour and therefore the model area includes the British Isles. This results in one open-sea model boundary along deep water (Figure 3.1).

Figure 3.1: DCSM area with the 50 m and 200 m depth contour lines (axes: longitude (°W/E), latitude (°N)).

3.2 Model equations

The basic equations of the DCSM are the 2D shallow water equations, which describe the large-scale motion of the water level and the depth-integrated horizontal flow. They are well suited for modelling tidal and meteorologically induced water levels and flows. The equations are derived from the principles of mass and momentum conservation, which in a Cartesian coordinate system read:

\frac{\partial \xi}{\partial t} + \frac{\partial Hu}{\partial x} + \frac{\partial Hv}{\partial y} = 0 \qquad (3.1)

\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} + v\frac{\partial u}{\partial y} + g\frac{\partial \xi}{\partial x} - fv + \frac{gu\sqrt{u^2+v^2}}{C^2 H} - \nu\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right) = \frac{1}{\rho_w H}\tau_x - \frac{1}{\rho_w}\frac{\partial p_a}{\partial x} \qquad (3.2)

\frac{\partial v}{\partial t} + u\frac{\partial v}{\partial x} + v\frac{\partial v}{\partial y} + g\frac{\partial \xi}{\partial y} + fu + \frac{gv\sqrt{u^2+v^2}}{C^2 H} - \nu\left(\frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2}\right) = \frac{1}{\rho_w H}\tau_y - \frac{1}{\rho_w}\frac{\partial p_a}{\partial y} \qquad (3.3)

where:
x, y = Cartesian coordinates in the horizontal plane
u, v = depth-averaged current in x and y direction, respectively
ξ = water level above a reference plane
D = water depth below a reference plane
H = total water depth (H = D + ξ)
g = gravitational acceleration
f = coefficient for the Coriolis force
C = Chezy coefficient
τx, τy = wind stress components in x and y direction, respectively
ρw = density of sea water
pa = air pressure at the surface
ν = viscosity coefficient

Further, it is noted that the North West European continental shelf is too small and too shallow to generate tides internally. It features a tide that is driven by and co-oscillates with that in the Atlantic Ocean. Therefore, the 2D shallow water equations do not include a tide-generating volume force. The effect of tidal forcing is included by specifying a tidal signal along the open boundaries.
The Chezy coefficient for the bottom friction is a largely empirical parameter and is assumed to be a function of depth:

C = \begin{cases} 65, & D \le 40\,\mathrm{m} \\ 65 + (D - 40), & 40\,\mathrm{m} < D \le 65\,\mathrm{m} \\ 90, & D > 65\,\mathrm{m} \end{cases} \qquad (3.4)

On the other hand, the shear stress at the surface depends on the wind velocity. Without wind forcing, the shear stress at the free surface is zero. With wind, the wind stress is computed from the wind velocity at 10 m above the sea surface by

\tau_x = C_d\,\rho_a\, u_{10}\sqrt{u_{10}^2 + v_{10}^2} \qquad (3.5)

\tau_y = C_d\,\rho_a\, v_{10}\sqrt{u_{10}^2 + v_{10}^2} \qquad (3.6)

where \rho_a is the air density, C_d the air-sea drag coefficient, and u_{10} and v_{10} are the 10 m wind velocity components in the x and y directions, respectively.

To make the mathematical problem well-posed, the equations are supplemented by boundary conditions. At closed boundaries, such as coastlines and dams, the boundary condition is specified as

v_\perp = 0 \qquad (3.7)

which means that no inflow or outflow can occur through these boundaries. At open sea boundaries, two boundary conditions are specified. The first is given by

v_\parallel = 0 \qquad (3.8)

while the second is the tidal water level, which in its simplest form is given by

\xi_b(t) = \sum_{j=1}^{J} A_j \cos(\omega_j t + G_j) \qquad (3.9)

where:
ξb(t) = water level elevation at time t
Aj = amplitude of tidal constituent j
ωj = angular velocity of harmonic constituent j
Gj = phase of harmonic constituent j

Ten tidal constituents are used: M2, S2, N2, K2, O1, K1, Q1, P1, ν2, and L2. The tidal constants (Aj, Gj) were estimated by using results from encompassing models, matched with nearby coastal and pelagic tidal data.

For simulations that include meteorological effects, the effect of atmospheric pressure is parameterized by a correction term at the boundaries. This correction is a function of the deviation from the average pressure Pavg:

\Delta\xi = -\frac{1}{\rho g}\left(P - P_{avg}\right) \qquad (3.10)

with Pavg = 1012 hPa. This term is responsible for initializing the model response to approaching oceanic depression systems.

3.3 Numerical approximations

The above system of equations has to be solved numerically. The numerical methods were chosen to satisfy the following demands: unconditional stability, at least second-order consistency, suitability for both time-dependent and steady-state problems, and computational efficiency (Stelling, 1984).

The grid sizes were selected to ensure the maximum resolution allowed by the available computing power. This resulted in a spherical grid size of 1/8° × 1/12°. Covering the area of the North-East European continental shelf from 12°W to 13°E and 48°N to 62.33°N, the discretization results in 173 × 201 computational grid cells. The standard staggered C-grid is used for the spatial discretization. The time step used for the integration is 10 minutes.

To ensure numerical stability, an implicit method is needed for the temporal integration. However, a fully implicit method requires a lot of computational effort. Here, the alternating direction implicit (ADI) method, introduced by Leendertse (1967) and later extended by Stelling (1984), is used. The advantage of the ADI method is that it is computationally cheap, which was required at the time this simulation package was developed. The ADI method splits one time step into two stages. Each stage consists of half a time step. In both stages, the terms of the model equations are solved in a consistent way. In the first half time step, the x-derivatives are solved implicitly and in the second half time step the y-derivatives are solved implicitly.
The v-momentum equation is solved first explicitly in the first half time step, so that the new v velocity components are available for the cross terms in the u-momentum equation. The u-momentum equation is then coupled with the mass conservation equation and solved implicitly. Similar way is performed on the second half time step, but first for the umomentum explicitly, followed by v-momentum and conservation of mass implicitly. For a complete time step, each separate term of the equations is still a second-order consistent The Dutch Continental Shelf Model 25 approximation to the differential equations. ∆x k vm,n ukm,n+1 k ξm+1,n+1 Dm,n k vm+1,n ukm,n k ξm+1,n ∆y k ξm,n n Dm,n−1 k vm+1,n−1 m Figure 3.2: The computational grid. As an illustration, we give below the finite difference equations to approximate the homogeneous part of the system of equations described in the previous section, excluding the horizontal viscos terms. Here, we denote the index of a grid cell in the (x,y) coordinate by (m, n) and the time index by k, as illustrated by Figure (3.2). step 1: from time k to k + 1 2 1 solve for v k+ 2 explicitly k+ 1 k+ 1 k+ 1 k k k 2 2 − vm,n−1 vm,n2 − vm,n vm,n+1 ξm,n+1 − ξm,n k+ 1 k + ukm,n S+x [vm,n2 ] + vm,n +g + (3.11) ∆t/2 2∆y ∆x q k+ 1 k )2 gvm,n2 (ukm,n )2 + (vm,n k =0 f um,n + (H v )km,n C 2 3.3 Numerical approximations 26 1 1 solve for uk+ 2 and ξ k+ 2 implicitly k+ 1 k+ 1 k+ 1 2 ξm+1,n um,n2 − ukm,n uk − ukm−1,n − ξm,n2 k+ 21 k k+1 m+1,n + um,n + vm,n Soy [um,n ] + g − (3.12) ∆t/2 2∆x ∆y q k+ 1 k+1 1 (ukm,n )2 + (vm,n2 )2 gu k+ 2 m,n =0 f vm,n + (H u )km,n C 2 k+ 1 k+ 1 k+ 1 k 2 ((H u u)m,n2 − (H u u)m−1,n ξm,n2 − ξm,n ) ((H v v)km,n − (H v v)km,n−1 ) + + = 0 (3.13) ∆t/2 ∆x ∆y step 2: from time k + 21 to k + 1 solve for uk+1 explicitly k+ 1 k+ 1 k+ 1 k+1 k+1 2 2 − ξm,n2 uk+1 ξm+1,n k+ 1 um+1,n − um−1,n k+ 1 m,n − um,n + um,n2 + vm,n2 S+y [uk+1 − (3.14) ] + g m,n ∆t/2 2∆x ∆x q k+ 1 k+ 1 k+1 gum,n (um,n2 )2 + (vm,n2 )2 k+ 21 =0 f vm,n + k+ 1 (H u )m,n2 C 2 solve for v k+1 and ξ k+1 implicitly k+ 1 k+ 1 k+ 1 k+1 k+1 k+1 2 vm,n − vm,n2 ξm,n+1 − ξm,n v 2 − vm,n−1 k+ 12 k+1 k+1 m,n+1 + um,n Sox [vm,n ] + vm,n +g + (3.15) ∆t/2 2∆y ∆x q 1 k+1 k+1 )2 + (v k+ 2 )2 gv (u m,n m,n m,n =0 f uk+1 m,n + k+ 1 (H v )m,n2 C 2 k+ 1 k+ 1 k+ 1 k+1 v k+1 2 ξm,n − ξm,n2 ((H u u)m,n2 − (H u u)m−1,n ) ((H v v)k+1 m,n − (H v)m,n−1 ) + + = 0 (3.16) ∆t/2 ∆x ∆y where S+x [ ] and Sox [ ] are numerical operators for the cross advection terms, defined by 3um,n −4um,n−1 +um,n−2 , vm,n ≥ 0 2∆x (3.17) S+x [um,n ] = −3um,n +4um,n+1 −um,n+2 , vm,n < 0 2∆x um,n+2 + 4um,n+1 − 4um,n−1 − um,n−2 Sox [um,n ] = (3.18) 12∆x and S+y [ ] and Soy [ ] are defined similarly. For the DCSM, the system of equations above is solved in a curvilinear coordinate system. The conversion of the coordinate system introduces additional terms in the equations. For the details of the numerical treatment, the readers are referred to Stelling (1984). A state-space representation of the discretized system of equations above is obtained by defining an n-vector x, which contains the water level and velocities at all grid points in the DCSM area. Since the model is nonlinear, we can write the dynamics of the DCSM as x(tk ) = M[x(tk−1 )] (3.19) where M consists of the coefficients representing the ADI scheme and also the input forcings. 
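As a small illustration of how the input forcings entering M can be evaluated in practice, the sketch below implements the depth-dependent Chezy coefficient of Eq. (3.4), the wind stress of Eqs. (3.5)-(3.6) and the tidal boundary signal of Eq. (3.9) in NumPy. The drag coefficient and air density values are illustrative assumptions, not the operational DCSM settings.

```python
import numpy as np

RHO_AIR = 1.25        # air density [kg/m3] (illustrative value)
CD = 0.0026           # air-sea drag coefficient (illustrative value)

def chezy(D):
    """Depth-dependent Chezy coefficient, Eq. (3.4)."""
    D = np.asarray(D, dtype=float)
    return np.where(D <= 40.0, 65.0,
           np.where(D <= 65.0, 65.0 + (D - 40.0), 90.0))

def wind_stress(u10, v10, cd=CD, rho_a=RHO_AIR):
    """Surface wind stress components from the 10 m wind, Eqs. (3.5)-(3.6)."""
    speed = np.hypot(u10, v10)
    return cd * rho_a * u10 * speed, cd * rho_a * v10 * speed

def tidal_boundary(t, amplitudes, omegas, phases):
    """Tidal water level at an open boundary, Eq. (3.9); t in the same time
    unit as the angular velocities, phases in radians."""
    t = np.asarray(t, dtype=float)
    return sum(A * np.cos(w * t + G)
               for A, w, G in zip(amplitudes, omegas, phases))

# Example: friction and stress on a small depth/wind field
print(chezy([20.0, 50.0, 80.0]))            # -> [65. 75. 90.]
print(wind_stress(10.0, 5.0))               # (tau_x, tau_y) in N/m2
```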
3.4 Operational setting

The DCSM has been used, for example, for performing sea level rise scenario studies, forecasting storm surges and generating boundary conditions for local coastal models (Gerritsen et al., 1995 and the references therein). The DCSM has been implemented at the Royal Netherlands Meteorological Institute (KNMI) for daily operational forecasting, where it is part of the automatic production line (APL). KNMI's APL has been developed to produce regular numerical forecasts with a minimum of human intervention. The heart of the APL is KNMI's limited area atmospheric model, HIRLAM, which currently provides an analysis of the state of the atmosphere and a forecast for up to 48 h ahead, four times per day. The time interval is 3 h, with a resolution of approximately 22 km. Therefore, the wind products of HIRLAM have to be interpolated before they can be used as input forcing for the DCSM. In time, the data are interpolated linearly. Conversion from the grid of the atmospheric model to the DCSM grid proceeds by bilinear interpolation in space. However, since the wind over land is usually considerably lower than over the sea, near the coasts the interpolation has been modified to use only sea points of the atmospheric model for sea points in the DCSM.

In the operational system, the DCSM is coupled with a data assimilation scheme based on a steady-state Kalman filter to improve the forecast accuracy (Heemink and Kloosterhuis, 1990). This filter assimilates selected water level observations from the tide gauges along the British and Dutch coasts. The Kalman filter can correct the forecasts up to 12 h ahead if the forecasts deviate from the available water level observations. The Kalman filter is intended to reduce the forecast uncertainty due to errors in the wind input forcing. To do this, information about the error covariance of the wind input forcing is needed as input to the Kalman filter. Here, the wind error variance is explicitly modelled by assuming spatial homogeneity and isotropy. In reality, these assumptions are likely to be violated. The spatial correlation, for example, is likely to be different in different directions, hence anisotropic. Moreover, the wind error variance is likely to vary in space.

It is known that a Kalman filter will give an optimal state estimate only to the extent that the specified statistical description of the model error is accurate. A better description of the model error statistics will lead to a better filter performance. In this study, we propose and investigate a method that avoids explicit modelling of the error statistics, in the framework of steady-state Kalman filtering. The method is an iterative algorithm for computing the steady-state Kalman gain of a stochastic system, based on two samples of the system. Assuming the error process is stationary, the error covariance is computed by averaging over time. It is also known that the model error actually evolves in time together with the actual system state (Dee, 1995). Therefore, to improve the filter performance even further, we have also extended this algorithm to work with more samples. By working with more samples, it is possible to compute the error covariance by averaging over a shorter time. This relaxes the assumption that the model error is stationary and describes the evolution of the error statistics in time better. In turn, this should give more accurate analyses.
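To make the role of a precomputed steady-state gain concrete, the following sketch applies a fixed gain sequentially, in the spirit of the operational setup described above. The toy model, observation operator and gain are illustrative placeholders, not the operational DCSM/WAQUA configuration.

```python
import numpy as np

def steady_state_filter(model_step, H, K, x0, observations):
    """Sequential analysis with a precomputed, time-invariant Kalman gain K.
    Missing observations may be passed as None, in which case the model runs
    freely (forecast mode)."""
    x = np.array(x0, dtype=float)
    trajectory = []
    for y in observations:
        x = model_step(x)                      # forecast step
        if y is not None:                      # analysis step with constant gain
            x = x + K @ (y - H @ x)
        trajectory.append(x.copy())
    return np.array(trajectory)

# Toy usage with an illustrative damped linear model
rng = np.random.default_rng(3)
n, m = 8, 2
A = 0.9 * np.eye(n)
H = np.zeros((m, n)); H[0, 0] = 1.0; H[1, 4] = 1.0
K = 0.5 * H.T                                  # placeholder gain for the demo
obs = [rng.normal(size=m) if k % 3 == 0 else None for k in range(12)]
states = steady_state_filter(lambda x: A @ x, H, K, np.zeros(n), obs)
```

The computational overhead relative to a free model run is a single matrix-vector product per analysis time, which is what makes the steady-state approach attractive operationally.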
As described earlier, in the operational system, the DCSM is used to forecast the meteorological component of the total sea level. The astronomical component, on the other hand, is analyzed and predicted using harmonic analysis, because harmonic analysis is more accurate than the DCSM in predicting the astronomical tide. However, it is found that the residual of the harmonic analysis always contains some harmonic components. This fact offers a possibility for improving the astronomical tide prediction. The first direction taken in this research for improving the sea level forecast is to apply a method to analyze and predict the remaining harmonic components in the harmonic analysis residual. These predicted harmonic components are used as a correction to the harmonic analysis forecasts. This investigation is described in the next chapter.

4 Analysis and prediction of tidal components within observed surge using Kalman filter∗

∗ Paper in revised version appears in the proceedings of HydroPredict 2008.

4.1 Introduction

The operational water level prediction in the Netherlands is based on the decomposition of the water level into astronomical and meteorological components. The astronomical components are analyzed and predicted using harmonic analysis. However, harmonic analysis cannot completely predict the astronomical components of the total water level. This is indicated by the fact that the frequency spectrum of the observed surge, defined as the difference between the total water level and the astronomical components predicted by harmonic analysis, contains a number of spikes centered at astronomical constituent frequencies [see, for example, Figure (4.1)]. This serves as a motivation to seek a way of improving the tidal prediction by exploiting the information carried by these spikes.

Figure 4.1: Spectral density of observed surge measured at Vlissingen, 00:00 1 January 1999 - 23:00 31 December 1999 (amplitude [m] versus frequency [deg/hour]).

The objective of the study presented in this chapter is to improve the astronomical tide prediction by extracting the remaining astronomical tidal components within the observed surge. In doing so, a Kalman filter is used to estimate the time-varying tidal components. The approach taken here is based on the work of Heemink et al. (1991). Kalman filters have been successfully used in numerous applications. Their application to tidal analysis can be found, for example, in the work of de Meyer (1984), where a Kalman filter is used to investigate possible non-stationarity of a multi input-single output model to study tidal characteristics. Yen et al. (1996) used the Kalman filter to make short-term tidal predictions in the case where only a short period of observation is available. In our study, instead of working with the full tidal signal, we analyze and forecast the tidal components within the observed surge.

The chapter is organized as follows. Section 4.2 describes the basic idea used in our study to predict the periodic components of the observed surge. It shows how each spike or narrow-band process can be considered as a periodic function with slowly varying harmonic parameters. It also describes how we obtain a state space representation of the observed surge and the associated set of Kalman filter equations. We proceed with a discussion of the characteristics of the overall estimation system in Section 4.3.
In Section 4.4 the results obtained using the observed surge are presented and discussed, while in Section 4.5 some conclusions of the study are drawn.

4.2 Approach

In this section, we describe the approach used in this study. We first describe how a narrow-band process is modelled. This is followed by a description of the state space representation of the observed surge. Finally, a Kalman filter formulation for estimating the time-varying harmonic parameters is given.

4.2.1 Narrow-band processes

To provide the reader with more complete information, in this section we present the derivation of how a narrow-band process can be considered as a harmonic oscillation with slowly varying amplitude, as can be found in Heemink et al. (1991). Consider a weakly stationary stochastic process γ(t) described by Wong (1971):

\gamma(t) = \int_{-\infty}^{\infty} e^{i\omega t}\, dZ(\omega) \qquad (4.1)

where Z(ω) is a complex-valued stochastic process with zero mean and with independent increments such that:

E\{dZ(\omega)\,\overline{dZ(\omega)}\} = S(\omega)\, d\omega \qquad (4.2)

with S(ω) the two-sided energy spectral density function of the process γ(t). Since γ(t) is a real process, its Fourier transform has the following properties (Maybeck 1979):

1. dZ(ω) is complex in general, with dZ(-\omega) = \overline{dZ(\omega)}, where \overline{dZ(\omega)} denotes the complex conjugate.
2. The real part of dZ(.) is an even function of ω and the imaginary part is odd in ω.

Using these properties we can rewrite Equation (4.1) as:

\gamma(t) = \int_{-\infty}^{\infty} \cos(\omega t)\, dV(\omega) + \int_{-\infty}^{\infty} \sin(\omega t)\, dW(\omega) \qquad (4.3)

where dV(\omega) = \frac{dZ(\omega)+dZ(-\omega)}{2} and dW(\omega) = i\,\frac{dZ(\omega)-dZ(-\omega)}{2} are mutually independent processes, both real and with independent increments such that:

E[dV(\omega)dV(\omega)] = E[dW(\omega)dW(\omega)] = S(\omega)\, d\omega. \qquad (4.4)

If γ(t) is a narrow-band process, the spectral density function has a form as shown in Figure 4.2. Here ∆ω ≪ ωm.

Figure 4.2: Spectral density function of a narrow-band process (two peaks of width Δω centred at ±ωm on the ω axis).

In this case Equation (4.3) can be rewritten as follows:

\gamma(t) = \int_{-\infty}^{\infty} \cos([\omega_m + \mu]t)\, dV(\mu) + \int_{-\infty}^{\infty} \sin([\omega_m + \mu]t)\, dW(\mu), \qquad (4.5)

\gamma(t) = \cos(\omega_m t)\left[\int_{-\infty}^{\infty} \cos(\mu t)\, dV(\mu) + \int_{-\infty}^{\infty} \sin(\mu t)\, dW(\mu)\right] + \sin(\omega_m t)\left[-\int_{-\infty}^{\infty} \sin(\mu t)\, dV(\mu) + \int_{-\infty}^{\infty} \cos(\mu t)\, dW(\mu)\right], \qquad (4.6)

which eventually can be written as

\gamma(t) = A_m(t)\cos(\omega_m t) + B_m(t)\sin(\omega_m t), \qquad (4.7)

where the mutually independent stochastic processes

A_m(t) = \int_{-\infty}^{\infty} \cos(\mu t)\, dV(\mu) + \int_{-\infty}^{\infty} \sin(\mu t)\, dW(\mu), \qquad B_m(t) = -\int_{-\infty}^{\infty} \sin(\mu t)\, dV(\mu) + \int_{-\infty}^{\infty} \cos(\mu t)\, dW(\mu) \qquad (4.8)

vary slowly in time, since their spectral density

S_m(\mu) = S(\omega_m + \mu) \qquad (4.9)

is concentrated near µ = 0. From this, we conclude that a narrow-band process can be considered as a harmonic oscillation with slowly varying amplitude

A(t) = \sqrt{A_m(t)^2 + B_m(t)^2} \qquad (4.10)

and a slowly varying phase

\phi(t) = \arctan[B_m(t)/A_m(t)] \qquad (4.11)

4.2.2 Surge model

The spectral density of a tidal signal consists of a number of narrow-band peaks, as shown for example in Figure 4.1. This figure shows the spectral density of the water elevation at Vlissingen, taken from 00:00 January 1, 1999 until 23:00 December 31, 1999 with a time interval of 10 minutes. A similar pattern is observed at other locations as well.
This fact suggests that the water level ξ(t) can be approximated by the sum of a number of narrow-band processes:

\xi(t) \approx \sum_i \gamma_i(t) = \sum_i \left[A_i(t)\cos(\omega_i t) + B_i(t)\sin(\omega_i t)\right] \qquad (4.12)

In order to derive a state space model for ξ(t), the power spectral density function S_i(µ) of A_i(t) and B_i(t) is parameterized according to:

S_i(\mu) = \frac{\sigma_i^2}{a_i^2 + \mu^2} \qquad (4.13)

For small values of a_i, S_i(µ) is a narrow-band spectrum. The autocovariance r_i(t) of A_i(t) and B_i(t) is given by

r_i(t) = \int_{-\infty}^{\infty} e^{i\mu t} S_i(\mu)\, d\mu = \frac{\sigma_i^2}{2a_i} e^{-a_i|t|} \qquad (4.14)

As shown by Jazwinski, processes A_i(t) and B_i(t) with autocovariance (4.14) can be modelled by means of stochastic differential equations, which read

dA_i(t) = -a_i A_i(t)\,dt + \sigma_i\, d\alpha(t), \qquad dB_i(t) = -a_i B_i(t)\,dt + \sigma_i\, d\beta(t) \qquad (4.15)

where α(t) and β(t) are mutually independent stochastic processes with zero mean and unit variance and with independent increments such that:

E[d\alpha(t)d\alpha(t)] = E[d\beta(t)d\beta(t)] = dt. \qquad (4.16)

Since σ_i is constant, the stochastic differential equations above are the same in both the Itô and Stratonovich sense. Using the Euler scheme, these equations can be approximated by:

\bar{A}_i(t_{k+1}) = (1 - \Delta t\, a_i)\bar{A}_i(t_k) + \sigma_i\,\Delta\alpha_i(t_k), \qquad (4.17)

\bar{B}_i(t_{k+1}) = (1 - \Delta t\, a_i)\bar{B}_i(t_k) + \sigma_i\,\Delta\beta_i(t_k), \qquad (4.18)

where \bar{A}_i(t_k) and \bar{B}_i(t_k) represent the numerical approximations of A_i(t_k) and B_i(t_k), respectively. Furthermore, Δt is the time step and Δα_i(t_k) and Δβ_i(t_k) are assumed to be mutually independent random variables with zero mean and variance Δt. The state space representation of these equations can be written as

X(t_k) = M\,X(t_{k-1}) + G\,\eta(t_{k-1}) \qquad (4.19)

where the state vector X(t_k) is defined as:

X(t_k) = [\bar{A}_1(t_k)\ \bar{B}_1(t_k)\ \ldots\ \bar{A}_i(t_k)\ \bar{B}_i(t_k)\ \ldots]^\top, \qquad (4.20)

and η(t_k) consists of the random variables Δα_i(t_k) and Δβ_i(t_k). Because Δα_i(t_k) and Δβ_i(t_k) are assumed to be mutually independent random variables with zero mean and variance Δt, the covariance matrix Q of η(t_k) can be shown to be the diagonal matrix I\,Δt. M and G are coefficient matrices that consist of the terms (1 − Δt a_i) and σ_i, respectively.

The measurement Y(t_k) of the water level ξ(t) is represented by the relation:

Y(t_k) = H(t_k)X(t_k) + \nu(t_k) \qquad (4.21)

where ν(t_k) is the measurement noise, assumed to be Gaussian white noise with variance R, and H(t_k) is the measurement vector given by

H(t_k) = [\cos(\omega_1 t_k)\ \sin(\omega_1 t_k)\ \ldots\ \cos(\omega_i t_k)\ \sin(\omega_i t_k)\ \ldots]. \qquad (4.22)

Now that we have both the system and measurement models, we combine them using a Kalman filter to analyze and predict the astronomical components within the observed surge.

4.2.3 Kalman filter

The objective of the approach here is to estimate the harmonic parameters X(t_k) from all available realizations ξ^o(t_k) of the observed surge Y(t_k) and to use them to forecast the periodic components of the harmonic analysis residual in the future. The estimation of the time-varying tidal components within the observed surge is performed by using a Kalman filter.
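As an illustration of the surge model just described, the following sketch simulates Eqs. (4.17)-(4.22) for two constituents. The AR(1) parameters follow the values quoted in Section 4.3; the observation noise level and the choice of constituents are assumptions made for this example.

```python
import numpy as np

def simulate_surge(omegas, a, sigma, dt, n_steps, r_std, seed=0):
    """Simulate the state-space surge model: slowly varying harmonic
    parameters (A_i, B_i) driven by AR(1) dynamics, Eqs. (4.17)-(4.18),
    observed through the time-varying measurement vector of Eq. (4.22)."""
    rng = np.random.default_rng(seed)
    n_c = len(omegas)
    A = np.zeros(n_c)
    B = np.zeros(n_c)
    t = np.arange(n_steps) * dt
    y = np.empty(n_steps)
    for k in range(n_steps):
        # AR(1) update with increments of variance dt
        A = (1.0 - dt * a) * A + sigma * rng.normal(scale=np.sqrt(dt), size=n_c)
        B = (1.0 - dt * a) * B + sigma * rng.normal(scale=np.sqrt(dt), size=n_c)
        # Measurement, Eqs. (4.21)-(4.22), plus white observation noise
        y[k] = np.sum(A * np.cos(omegas * t[k]) + B * np.sin(omegas * t[k]))
        y[k] += rng.normal(scale=r_std)
    return t, y

# Example with M2- and S2-like frequencies (deg/hour converted to rad/min)
omegas = np.deg2rad(np.array([28.98, 30.0])) / 60.0
t, y = simulate_surge(omegas, a=np.array([2.3e-5, 2.3e-5]),
                      sigma=np.array([0.02, 0.02]), dt=10.0,
                      n_steps=5000, r_std=0.05)
```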
Following the description in Chapter 2, the set of Kalman filter equations for the system (4.19)-(4.22) reads:

x^f(t_k) = M\,x^a(t_{k-1}) \qquad (4.23)

P^f(t_k) = M\,P^a(t_{k-1})\,M^\top + G\,Q\,G^\top \qquad (4.24)

x^a(t_k) = x^f(t_k) + K(t_k)\left(\xi^o(t_k) - H(t_k)x^f(t_k)\right) \qquad (4.25)

P^a(t_k) = (I - K(t_k)H(t_k))\,P^f(t_k) \qquad (4.26)

K(t_k) = P^f(t_k)H(t_k)^\top\left[H(t_k)P^f(t_k)H(t_k)^\top + R\right]^{-1} \qquad (4.27)

with initial conditions

x^a(t_0) = x_0^a \qquad (4.28)

P^a(t_0) = P_0^a \qquad (4.29)

Here, since we usually do not know the initial condition exactly, we set P^a(t_0) to be very large so that the effect of the inexact initial condition vanishes quickly.

4.3 System characteristics

We have described the system model as well as the Kalman filter used for estimating the time-varying harmonic parameters. In this section, we discuss some characteristics of the estimation scheme. Here, we analyze its frequency response and examine the possible cross-over between outputs at different tidal frequencies. In doing so, the following set of tidal constituents is used: A0, O1, K1, MU2, M2, S2, 2SM2, 3MS4, M4, MS4, M6, 2MS6, M8, 3MS8, 4MS10. For each tidal constituent, we set the colored noise standard deviation σi = 0.02 m and ai = 2.3×10−5 min−1, which is equivalent to a correlation time of 1 month. In addition, we also used an AR(1) process as in Equation (4.17) to model the meteorological component, with a = 7 × 10−4 min−1, which is equivalent to a correlation time of 1 day, and σ = 0.1 m. The sampling time parameter ∆t is 10 minutes, which is equal to the sampling time of the operational water-level observing system in the Netherlands.

4.3.1 Bode diagram

To get an idea of how the Kalman filter works for the system described previously, it is useful to present how it responds to different frequency components of the input signal. To present the frequency response characteristics of the Kalman filter scheme we used one of the most commonly used methods, the Bode diagram (see, for example, Ogata 1990).

Since generating a Bode diagram requires the system to be linear and time-invariant, we need to transform the estimation system such that it becomes time-invariant. To do this, we make use of the rotational transformation matrix and the relation between the original states X(t_k) and the new states \tilde{X}(t_k), written as:

X(t_k) = T_k\,\tilde{X}(t_k) \qquad (4.30)

where for each constituent with frequency ω_i, the transformation matrix T_k is given by

T_k = \begin{pmatrix} \cos(\omega_i t_k) & -\sin(\omega_i t_k) \\ \sin(\omega_i t_k) & \cos(\omega_i t_k) \end{pmatrix} \qquad (4.31)

The transformation is linear, hence the new system representation is also linear. Under this transformation, the state elements corresponding to tidal constituent i at time t_k read

\tilde{X}(t_k) = \begin{pmatrix} A_i\cos(\omega_i t_k) + B_i\sin(\omega_i t_k) \\ B_i\cos(\omega_i t_k) - A_i\sin(\omega_i t_k) \end{pmatrix} \qquad (4.32)

and the new system matrix is given by

\tilde{M} = T(k+1)^{-1}\,M\,T(k) = \begin{pmatrix} \alpha\cos(\omega_i\Delta t) & \alpha\sin(\omega_i\Delta t) \\ -\alpha\sin(\omega_i\Delta t) & \alpha\cos(\omega_i\Delta t) \end{pmatrix} \qquad (4.33)

where α = 1 − Δt a. The transformation is also performed on the error term η(t_k) and its coefficient matrix G, which gives \tilde{\eta}(t_k) and \tilde{G}:

\tilde{G} = T(k+1)^{-1}\,G\,T(k) = \begin{pmatrix} \sigma\cos(\omega_i\Delta t) & \sigma\sin(\omega_i\Delta t) \\ -\sigma\sin(\omega_i\Delta t) & \sigma\cos(\omega_i\Delta t) \end{pmatrix} \qquad (4.34)

and to the measurement matrix H:

\tilde{H} = [1\ 0\ 1\ 0\ \ldots\ 1\ 0\ 1\ 1] \qquad (4.35)
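Before moving on to the transformed representation, the following small numerical check illustrates the rotational transformation of Eqs. (4.30)-(4.34) for a single constituent and confirms that the transformed system matrix depends only on ω∆t, not on t_k. The parameter values follow Section 4.3; the specific frequency and time index are arbitrary choices for the example.

```python
import numpy as np

def rot(theta):
    """Rotation matrix; T_k = rot(omega * t_k), Eq. (4.31)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Parameters for one constituent (values follow Section 4.3)
dt, a, sigma = 10.0, 2.3e-5, 0.02           # minutes, 1/min, metres
omega = np.deg2rad(28.98) / 60.0            # M2-like frequency in rad/min
alpha = 1.0 - dt * a

M = alpha * np.eye(2)                       # original system matrix for (A, B)
G = sigma * np.eye(2)

t_k = 123.0 * dt                            # an arbitrary time
T_k, T_k1 = rot(omega * t_k), rot(omega * (t_k + dt))

# Transformed, time-invariant matrices, Eqs. (4.33)-(4.34)
M_tilde = np.linalg.inv(T_k1) @ M @ T_k
G_tilde = np.linalg.inv(T_k1) @ G @ T_k

# They depend only on omega * dt, as stated in the text:
M_expected = alpha * np.array([[np.cos(omega * dt), np.sin(omega * dt)],
                               [-np.sin(omega * dt), np.cos(omega * dt)]])
assert np.allclose(M_tilde, M_expected)
```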
The transformed state space representation now reads

\tilde{X}(t_k) = \tilde{M}\,\tilde{X}(t_{k-1}) + \tilde{G}\,\tilde{\eta}(t_{k-1}) \qquad (4.36)

Y(t_k) = \tilde{H}\,\tilde{X}(t_k) + \nu(t_k) \qquad (4.37)

Since the coefficient matrices described in Equations (4.33), (4.34), and (4.35) are time-invariant, it is now possible to perform a steady-state analysis. Based on this new model representation, the corresponding Kalman filter equations can be derived. Expressing the estimate in terms of the analysis state \tilde{x}^a, the estimation system now reads

\tilde{x}^a(t_k) = (I - \tilde{K}\tilde{H})\tilde{M}\,\tilde{x}^a(t_{k-1}) + \tilde{K}\,u(t_{k-1}) \qquad (4.38)

Y(t_k) = C\,\tilde{x}^a(t_k) \qquad (4.39)

where \tilde{K} is the steady-state Kalman gain, u(t_k) the input to the system (i.e. the observed surge measurement at time t_k), C a diagonal matrix with diagonal elements (1 0 1 0 ... 1 0 1 1) and Y the output of the system. Each non-zero element of Y gives the signal contributed by a tidal constituent. The system of Equations (4.38)-(4.39) shows that the Kalman filter scheme is a single-input multiple-output system.

Figures 4.3 and 4.4 present the Bode diagram for several components at one time step. The figures show that at each output frequency the Kalman filter has a maximum response at the respective frequency and rapidly decreasing side-lobes. From these diagrams we learn that, in producing the estimate of each component, the system takes the energy at the corresponding tidal frequency and its surroundings. It acts as a bank of band-pass filters centered at the analysis component frequencies. Moreover, the diagram of any one component shows a significant decrease at the other analysis components' frequencies. This shows that for any single output component the Kalman filter scheme tends to minimize the effect of the other analysis components.

Figure 4.3: Bode diagram, magnitude, for the O1, K1, MU2, M2 and S2 output components as a function of frequency [deg/hour].

It should be noted that the frequency response represented by the Bode diagram is independent of the measurements. It is completely determined by the choice of components included in the Kalman filter and their statistics. Therefore we can adjust the filter even before measurements are available.

Figure 4.4: Bode diagram, phase, for the O1, K1, MU2, M2 and S2 output components as a function of frequency [deg/hour].

4.3.2 Cross-over

From the previous section, we have learned that the estimation system operates as a set of band-pass filters centered at the frequencies of the tidal constituents included in the estimation system. The bandwidth of the frequency response at each tidal constituent is small enough to expect that the output will mainly consist of each component's energy. We have also performed an experimental study using generated signals to see if cross-over between different tidal constituents occurs. Here, we want to check if the existence of a tidal constituent within a signal affects the estimates of other constituents. Two main experiments are performed. The first uses a generated signal with constant harmonic parameters and the other a signal with varying harmonic parameters.
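Before turning to the cross-over experiments, the sketch below shows a generic way to compute the kind of frequency response plotted in Figures 4.3 and 4.4 for a discrete-time linear time-invariant system; with A = (I − K̃H̃)M̃, B = K̃ and C as in Eqs. (4.38)-(4.39) it would produce diagrams of the same type. The single-mode toy system used in the usage example is purely illustrative and not the thesis configuration.

```python
import numpy as np

def frequency_response(A, B, C, omegas, dt):
    """Magnitude of the transfer function C (zI - A)^(-1) B of the system
    x(k) = A x(k-1) + B u(k-1), y(k) = C x(k), at z = exp(i*omega*dt)."""
    n = A.shape[0]
    mags = []
    for w in omegas:
        z = np.exp(1j * w * dt)
        Hz = C @ np.linalg.solve(z * np.eye(n) - A, B)
        mags.append(np.abs(Hz))
    return np.squeeze(np.array(mags))

# Toy usage: a single lightly damped oscillator mode observed directly
dt, w0, alpha = 10.0, np.deg2rad(28.98) / 60.0, 0.999
A = alpha * np.array([[np.cos(w0 * dt), np.sin(w0 * dt)],
                      [-np.sin(w0 * dt), np.cos(w0 * dt)]])
B = np.array([[0.05], [0.0]])
C = np.array([[1.0, 0.0]])
omegas = np.deg2rad(np.linspace(0.0, 100.0, 500)) / 60.0   # deg/hour -> rad/min
gain = frequency_response(A, B, C, omegas, dt)             # peaks near w0
```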
The signal is generated using the measurement model given in Equation (4.21), with the parameters as described previously. For the case of signal with constant harmonic parameters we used for all components A = 0.7 and B is chosen such that the amplitude equals one and set both ai and σi equal 0. The experiment consists of several steps. At each step, a signal is generated using one tidal constituent. The signal is then used as input to the Kalman filter. The output of the Kalman filter, which is the estimates of the harmonic parameters of all tidal constituents, 37 38 4.3 System characteristics is recorded at all time. Using these harmonic parameters, we reconstruct the signal as contributed by each output tidal constituent. To obtain an idea about the size of the contribution, we use the root-mean-square (RMS) of the output signal corresponding to each tidal constituent. The experiment is repeated for all tidal constituents mentioned earlier. When there is no cross-over, the output will be zero at all frequency constituents, except at the constituent with the same frequency as the input signal. The signal is generated for the period of 120 days. The first 40 days are excluded in computing RMS of the amplitude estimates to eliminate the initialization effect. The results are shown in Table 4.1 for the signal with constant harmonic parameters and Table 4.2 for the varying one. Output O1 K1 MU2 M2 S2 2SM2 3MS4 M4 MS4 M6 2MS6 M8 3MS8 4MS10 A0 Weather Input O1 K1 MU2 M2 S2 2SM2 3MS4 M4 MS4 M6 2MS6 M8 3MS8 4MS10 A0 0.7389 0.0536 0.0045 0.0042 0.0040 0.0038 0.0016 0.0016 0.0016 0.0010 0.0010 0.0008 0.0008 0.0006 0.0024 0.0543 0.7547 0.0049 0.0046 0.0043 0.0041 0.0017 0.0017 0.0016 0.0010 0.0010 0.0008 0.0008 0.0006 0.0023 0.0048 0.0052 0.8212 0.0640 0.0321 0.0215 0.0024 0.0023 0.0023 0.0013 0.0012 0.0009 0.0009 0.0007 0.0013 0.0044 0.0047 0.0622 0.7951 0.0621 0.0312 0.0024 0.0023 0.0022 0.0012 0.0012 0.0009 0.0009 0.0007 0.0012 0.0041 0.0044 0.0313 0.0623 0.7979 0.0625 0.0025 0.0024 0.0023 0.0013 0.0012 0.0009 0.0009 0.0007 0.0012 0.0041 0.0043 0.0217 0.0324 0.0647 0.8315 0.0027 0.0026 0.0025 0.0013 0.0013 0.0009 0.0009 0.0007 0.0012 0.0018 0.0019 0.0025 0.0026 0.0027 0.0028 0.8688 0.0677 0.0340 0.0024 0.0023 0.0012 0.0012 0.0009 0.0007 0.0017 0.0018 0.0023 0.0024 0.0025 0.0026 0.0655 0.8372 0.0655 0.0024 0.0023 0.0012 0.0012 0.0008 0.006 0.0018 0.0018 0.0024 0.0024 0.0025 0.0026 0.0341 0.0678 0.8706 0.0025 0.0024 0.0013 0.0013 0.0009 0.0007 0.0012 0.0012 0.0013 0.0014 0.0014 0.0014 0.0024 0.0025 0.0026 0.8953 0.0699 0.0025 0.0024 0.0013 0.0005 0.0012 0.0012 0.0013 0.0013 0.0014 0.0014 0.0023 0.0024 0.0025 0.0700 0.8958 0.0026 0.0025 0.0013 0.0005 0.0009 0.0009 0.0010 0.0010 0.0010 0.0010 0.0013 0.0013 0.0013 0.0025 0.0026 0.9015 0.0704 0.0024 0.0003 0.0009 0.0009 0.0009 0.0009 0.0010 0.0010 0.0013 0.0013 0.0013 0.0024 0.0025 0.0704 0.9018 0.0025 0.0003 0.0007 0.0007 0.0008 0.0008 0.0008 0.0008 0.0009 0.0009 0.0010 0.0013 0.0014 0.0026 0.0027 0.9665 0.0003 0.0043 0.0039 0.0022 0.0021 0.0020 0.0019 0.0011 0.0011 0.0010 0.0007 0.0007 0.0005 0.0005 0.0004 0.3041 0.1576 0.1472 0.0812 0.0782 0.0756 0.0734 0.0403 0.0395 0.0389 0.0265 0.0262 0.0199 0.0197 0.0159 0.3773 Analysis and prediction of tidal components within observed surge using Kalman filter Table 4.1: Amplitude RMS with constant harmonic parameters 39 40 Table 4.2: Amplitude RMS with varying harmonic parameters Output 4.3 System characteristics O1 K1 MU2 M2 S2 2MS2 3MS4 M4 MS4 M6 2MS6 M8 3MS8 4MS10 A0 Weather Input O1 K1 MU2 M2 S2 2SM2 
3MS4 M4 MS4 M6 2MS6 M8 3MS8 4MS10 A0 Weather 0.5297 0.2056 0.0202 0.0181 0.0175 0.0191 0.0106 0.0092 0.0099 0.0079 0.0072 0.0066 0.0059 0.0060 0.0083 0.0819 0.2058 0.5524 0.0268 0.0207 0.0193 0.0195 0.0100 0.0101 0.0102 0.0078 0.0073 0.0065 0.0062 0.0061 0.0098 0.0876 0.0215 0.0255 0.6451 0.2802 0.1620 0.1086 0.0165 0.0142 0.0156 0.0100 0.0096 0.0077 0.0071 0.0068 0.0059 0.0661 0.0212 0.0207 0.2602 0.6050 0.2661 0.1561 0.0147 0.0146 0.0144 0.0091 0.0091 0.0070 0.0070 0.0066 0.0049 0.0522 0.0217 0.0203 0.1469 0.2559 0.6099 0.2743 0.0163 0.0145 0.0156 0.0090 0.0089 0.0073 0.0070 0.0067 0.0050 0.0419 0.0224 0.0245 0.1139 0.1544 0.2793 0.6652 0.0189 0.0171 0.0168 0.0107 0.0099 0.0076 0.0075 0.0071 0.0054 0.0527 0.0100 0.0111 0.0174 0.0185 0.0174 0.0205 0.7167 0.3243 0.1996 0.0203 0.0201 0.0113 0.0109 0.0091 0.0038 0.0369 0.0101 0.0099 0.0148 0.0151 0.0147 0.0170 0.2985 0.6626 0.3112 0.0204 0.0188 0.0105 0.0106 0.0091 0.0035 0.0338 0.0115 0.0116 0.0155 0.0168 0.0166 0.0181 0.1872 0.3169 0.7305 0.0223 0.0218 0.0115 0.0109 0.0092 0.0036 0.0377 0.0088 0.0086 0.0120 0.0103 0.0105 0.0107 0.0214 0.0216 0.0234 0.7599 0.3644 0.0252 0.0237 0.0143 0.0031 0.0322 0.0088 0.0085 0.0101 0.0108 0.0110 0.0110 0.0202 0.0203 0.0222 0.3557 0.7741 0.0264 0.0253 0.0143 0.0032 0.0330 0.0069 0.0071 0.0080 0.0083 0.0082 0.0087 0.0124 0.0113 0.0123 0.0258 0.0270 0.7707 0.3742 0.0289 0.0029 0.0266 0.0071 0.0072 0.0077 0.0079 0.0078 0.0083 0.0111 0.0115 0.0128 0.0243 0.0255 0.3643 0.7860 0.0304 0.0028 0.0256 0.0073 0.0075 0.0076 0.0072 0.0075 0.0078 0.0108 0.0100 0.0108 0.0159 0.0159 0.0311 0.0330 0.9125 0.0030 0.0290 0.0118 0.0115 0.0080 0.0072 0.0070 0.0075 0.0055 0.0050 0.0054 0.0046 0.0045 0.0038 0.0037 0.0039 0.3486 0.1445 0.4358 0.4308 0.3043 0.2716 0.2648 0.2891 0.2109 0.1910 0.2060 0.1759 0.1732 0.1469 0.1448 0.1497 0.6457 0.8689 Analysis and prediction of tidal components within observed surge using Kalman filter For the case of signal with constant harmonic parameters as shown in Table 4.1 we see that for each tidal component as input the cross-over is of very small percentage. They are only about 0.1%. The greatest cross-over occur at the neighboring components, which are very close in frequency to the input signal. This is consistent with the fact that the accuracy of a frequency analysis is determined by the closest two frequency lines within the system. For such output components, the cross-over may reach 2%. However for the case of varying harmonic parameters, which is more realistic than the constant parameters case, the cross-over is more pronounced with the neighboring components being the worst. The cross-over at the neighboring component can reach 40% as shown in Table 4.2. This occurs because a signal generated using varying amplitude and phase will occupy a certain frequency band instead of a single frequency line, eventhough it is generated by using a single frequency. When the variation parameters are chosen such that the signal has a rather wide band, a significant amount of energy may extend until the neighboring tidal frequencies. As the result, the output at these neighboring components will be non-zero. It should be noted here that when the input is the meteorological component, modelled by using AR(1) model, all the output components will yield significant non-zero values. This may suggest that in fact the meteorological component also has energy within tidal frequencies. 
4.4 Results and discussion

We have implemented the Kalman filter scheme using the observed surge, in the same way as we performed the experiments using generated signals described in the previous section. One of the problems of using a real signal is the selection of the tidal constituents to be included in the analysis. As a result of non-linear processes in the shallow-water area, the water level may contain a lot of tidal constituents. Visual inspection of the spectrum of the surge reveals that it has many spikes around tidal frequencies. Therefore, we also performed a study using models with different sets of tidal constituents.

Three sets of tidal constituents are used in the study. The first set of components, called Set A, consists of one component per tidal species: A0, M2, M4, MSN6, M8, 4MS10, 3MNKS12. The second set, Set B, is the same as the one used in the previous section. For the third set, Set C, we use 38 tidal constituents identified by visual inspection of the spectrum of the observed surge. In addition to the tidal components, an AR(1) model is used for representing the meteorological component in each set.

Based on the three sets of tidal constituents, we have developed the corresponding Kalman filter schemes and used them to estimate the harmonic parameters of the tidal components in the observed surge. For each set of components, the parameters a_i and σ_i are chosen by trial and error to match the theoretical spectrum of a signal generated with these parameters with the spectrum of the observed surge. For illustration, Figures 4.5 and 4.6 present the estimates of the harmonic parameters of several components obtained by using the Kalman filter scheme with Set B. We see from these figures that the harmonic parameters of the tidal components in the surge vary in time.

Figure 4.5: Amplitude estimates of several components using Set B, Vlissingen, as a function of time since 1 January 1999 [day]. The y-axis gives the amplitude measured in meter.

Figure 4.6: Phase estimates of several components using Set B, Vlissingen, as a function of time since 1 January 1999 [day]. The y-axis gives the phase measured in degree.

The estimates of the harmonic parameters obtained by using the three Kalman filter schemes are then used to predict the tidal components within the surge. The performance of the prediction is measured in terms of the variance reduction percentage, defined as

R = \left(1 - \frac{\mathrm{var}(S - C)}{\mathrm{var}(S)}\right) \times 100\% \qquad (4.40)

where S denotes the observed surge and C the predicted correction. To also get an idea of how the performance changes with respect to the length of the prediction period, we checked the performance for five different prediction periods: 10 minutes, 1, 6, 12, and 24 hours. Besides Vlissingen, we also used data taken at four other locations: Scheveningen, Den Helder, Terschelling, and Euro Platform. The prediction performance obtained using these data is depicted in Figures 4.7-4.11.
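For reference, the skill measure of Eq. (4.40) can be written as a small helper function; the toy check below only illustrates the definition, not the actual experiment or its data.

```python
import numpy as np

def variance_reduction(surge, correction):
    """Percentage variance reduction of Eq. (4.40): how much of the observed
    surge variance is removed by subtracting the predicted tidal correction."""
    surge = np.asarray(surge, dtype=float)
    correction = np.asarray(correction, dtype=float)
    return (1.0 - np.var(surge - correction) / np.var(surge)) * 100.0

# Toy check: a correction equal to half of the surge removes 75% of its variance
rng = np.random.default_rng(4)
s = rng.normal(size=1000)
print(variance_reduction(s, 0.5 * s))   # ~75.0
```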
These figures show that the performance of Set C is the best among the three sets used in the experiment, in both the short and the longer term predictions. Set A tends to perform better than Set B in the short term prediction, but worse in the longer term. The findings suggest that the more components are included in the analysis, the better the performance of the Kalman filter scheme. This may be due to the fact that a model with more components describes the data more completely.

Figure 4.7: Variance reduction obtained using several correctors (10 min, 1 h, 6 h, 12 h, 24 h) with (a) Set A, (b) Set B, and (c) Set C, Vlissingen (variance reduction [%] per month, February-December).

Figure 4.8: Variance reduction obtained using several correctors (10 min, 1 h, 6 h, 12 h, 24 h) with (a) Set A, (b) Set B, and (c) Set C, Scheveningen (variance reduction [%] per month, February-December).

Examining the results obtained using Set C, we see that the best performance is achieved in the period between May and September. In some locations it also includes the periods of April and October. In this period of time, the weather was relatively calm. The results indicate that the variance reduction is relatively large and the performance degrades relatively slowly with forecast time. Here, the variance reduction achieved using the 24-hour prediction ranges from 20% in Scheveningen up to 60% at Europlatform. The performance differences between the analysis locations also suggest differences in local tidal characteristics.

Figure 4.9: Variance reduction obtained using several correctors (10 min, 1 h, 6 h, 12 h, 24 h) with (a) Set A, (b) Set B, and (c) Set C, Den Helder (variance reduction [%] per month, February-December).
It seems that the astronomical tides off-shore are relatively easier to predict than in the coastal area. However, in the other periods the short term performance is worse and the prediction performance degrades more quickly with prediction time. As can be seen in Figures 4.8, 4.9, and 4.11, in some locations the 24-hour prediction even yields negative variance reduction.

Figure 4.10: Variance reduction obtained using several correctors (10 min, 1 h, 6 h, 12 h, 24 h) with (a) Set A, (b) Set B, and (c) Set C, Terschelling (variance reduction [%] per month, February-December).

Figure 4.11: Variance reduction obtained using several correctors (10 min, 1 h, 6 h, 12 h, 24 h) with (a) Set A, (b) Set B, and (c) Set C, Europlatform (variance reduction [%] per month, February-December).

A visual inspection of the time series shows that within this period the meteorological component tends to have a very large amplitude. This unsatisfactory performance is likely to be due to large cross-over of the meteorological component into the frequency bands of the astronomical constituents. The experiment described in Section 4.3.2 shows that, with the
The experiment results indicate that, within stormy period, the harmonic parameters seem to be strongly non-stationary. This fact suggests that the surge-tide interaction within this period is significant. Therefore, to obtain a better prediction the study about the nonlinear surge-tide interaction seems to be a must. Another point worth mentioning is the fact the Kalman filter scheme takes only a relatively short period of observation to yield the estimates. Therefore it is not possible to have well separated frequency lines in the output. To obtain well-separated frequency lines one needs a long observation period as in classical harmonic analysis. However this has disadvantage that one will miss the short-duration variation in the parameters. 4.5 Conclusion This chapter presents a method for analysis and prediction of the time varying tidal components from a harmonic analysis residual. The method is based on modelling the residual as the sum of some tidal oscillations with slowly varying harmonic parameters. The slowly varying parameters are modelled by using stochastic difference equations and estimated by using a Kalman filter. This method has been implemented by using real data taken from several locations in the North Sea. For the calm period May-October 1999, it yields improvement ranging from 20% to 60% depending on the location. However, it does not perform well during the other period. This may be due to the nonlinear interaction between meteorological and astronomical components. Another possible cause is the cross-over between different components, which can not be separated without any additional independent information. This hampers our approach from producing accurate 49 4.5 Conclusion 50 Vlissingen Amp [m] Amp [m] Vlissingen 0.04 0.02 0 0.04 0.02 0 Scheveningen Amp [m] Amp [m] Scheveningen 0.04 0.02 0 0.04 0.02 0 Den Helder Amp [m] Amp [m] Den Helder 0.04 0.02 0 0.04 0.02 0 Terschelling N Amp [m] Amp [m] Terschelling N 0.04 0.02 0 0.04 0.02 0 Europlatform Amp [m] Amp [m] Europlatform 0.04 0.02 0 0 20 40 60 80 100 120 Frequency [deg/hour] 140 160 180 200 0.04 0.02 0 0 20 40 60 80 100 120 Frequency [deg/hour] 140 160 180 200 Figure 4.12: Fourier spectrum of observed surge without correction (left panels) and with correction obtained using Set C (right panels) prediction during stormy periods. 5 Two-sample Kalman filter for steady state data assimilation∗ 5.1 Introduction One of the main goals of data assimilation is to improve the forecast performance of a dynamical model. Many data assimilation algorithms are based on Kalman filtering, where the state of a system is estimated by combining all the information that is available about the system in accord with their statistical uncertainty. The main computational issue in Kalman filter based data assimilation is the propagation of the forecast error covariance matrix. A number of methods have been proposed to solve this problem, for example the ensemble Kalman filter (EnKF) proposed by Evensen (1994). In EnKF, the forecast error covariance matrix is approximated by using a Monte Carlo method. Another method, which can also be considered as an ensemble method, is the reduced rank square root (RRSQRT) Kalman filter proposed by Verlaan and Heemink (1997). The RRSQRT Kalman filter approximates the error covariance by a matrix of a lower rank. An overview about the ensemble assimilation development can be found for example in Evensen (2003). 
A less computationally demanding assimilation algorithm is based on the steady state solution of the Kalman filter, as suggested by Heemink and Kloosterhuis (1990). This approach is built on the fact that for systems where the covariance of the forecast error is almost time invariant, it is possible to compute the Kalman gain off-line until it has reached approximately a constant solution. Applying the constant Kalman gain in a data assimilation system reduces the computational demands to almost the same as the standard model execution without data assimilation. However, this approach still requires the solution of the Riccati equation used subsequently in the Kalman filter formulation, or it requires a more elaborate scheme for the generation of the steady state gain. Heemink and Kloosterhuis (1990) used simpler dynamics for propagating the covariance matrix and a Chandrasekhar-type algorithm to compute a steady state Kalman gain. Another approximation can be introduced by defining the error at a coarser grid size (Fukumori and Malanotte-Rizzoli 1995). The usage of a simpler dynamical model propagator for the error propagation is also suggested by Dee (1991). Sørensen et al. (2004a) computed the steady gain as a long-term average of the Kalman gains estimated by an EnKF and demonstrated its performance in the North and Baltic sea system. The EnKF, however, requires a large number of ensemble members to obtain the ensemble mean statistics.

In this chapter, a new method for computing the steady gain is proposed, which is based only on two forecast realizations. If the covariance of the forecast error is almost time invariant, we can use a time average instead of an ensemble mean. This has the advantages of being computationally very cheap and very easy to implement. The two forecast realizations may be generated with two differing but similarly skillful prediction models. Moreover, it is also possible to use the algorithm for the case where the two realizations are generated by running the dynamical model twice, driven by two different inputs. In oceanography, for example, it is becoming increasingly popular to compute the forecast error statistics by using two different wind fields, produced by two different atmospheric models, as input to the oceanic model under study (Alves and Robert 2005, Leeuwenburgh 2007). In this case no explicit model error specification is required.

The proposed algorithm is based on the assumption that the error process of the system of interest is weakly stationary. That is, the mean and covariance of the forecast error of the dynamical system being studied are constant in time. However, the algorithm may also be applicable for a system where the error statistics vary slowly in time. The proposed algorithm makes use of two different forecasts performed for the same period. The difference between the two forecasts provides information on the forecast error statistics. In order to estimate the forecast error statistics from the time series of these differences, it is necessary to assume that the noise process is ergodic. In this case the covariance estimates can be computed by averaging the realization over time. In practice, the ergodicity assumption may be adopted for a stationary random process with a short correlation time.
If we can select some samples of the random process at time interval sufficiently longer than its correlation time, the samples are effectively uncorrelated. This allows us to consider the series as independent realizations of the same distribution. In this case, averaging over time is equivalent to averaging over ensembles. The covariance estimate is used to define a gain matrix for a sequential steady-state analysis system. The proposed algorithm also consists of an iterative procedure for improving the covariance estimates. The procedure requires a fixed observing system during the iteration. The algorithm proposed in this chapter may be applied to systems where a steady state Kalman filter can be applied. A number of applications of the steady state Kalman filter can be found for example in shelf sea modellings (e.g. Canizares et al. 2001, Verlaan et al. 2005, Sørensen et al. 2006) and coastal sea modellings (e.g. Heemink 1990, Oke et al. 2002, Serafy and Mynett 2008). The application of an approximate Kalman filter, which includes the use of stationary error covariance, can also be found in oceanography (e.g. Fukumori et al. 1993). The effectiveness of the stationary assumption was examined by Fu et al. (1993) where the resulting solution was found to be statistically indistinguishable Two-sample Kalman filter for steady state data assimilation from the one obtained using the full Kalman filtering. It is interesting to mention here that the utility of a stationary covariance matrix in the Kalman gain was demonstrated even in the presence of evolving properties of the observation system (Fukumori 1995, Fukumori and Malanotte-Rizzoli 1995, Hirose et al. 1999). The proposed algorithm is expected to be applicable for any system similar to those mentioned above. Some twin experiments are performed to evaluate the performance of the proposed algorithm. This allows us to evaluate the results by comparing them to the ”truth”. The experiments are intended to demonstrate that for a number of applications, the results produced by the proposed algorithm converge to the optimal ones. For these experiments, the perturbations are drawn from a Gaussian distribution with mean zero and given covariance matrix. The results are compared to those produced by a classical Kalman filter algorithm. The chapter is organized as follows. Section 2 explains the algorithm along with its mathematical formulations. The first experiment, which uses a simple one-dimensional wave model, is discussed in Section 3. The second experiment uses the operational model for storm surge prediction, the Dutch Continental Shelf Model, which is described and discussed in Section 4. An experiment with the three-variable Lorenz model is given in Section 5. The chapter concludes in Section 6 with a discussion of the results. 5.2 Algorithm The algorithm proposed in this chapter is belonging to the class of iterative methods for finding steady-state Kalman gain. A number of such methods have been proposed, for example by Mehra (1970), Mehra (1972), and Rogers (1988), which are based on some algebraic manipulations on the Kalman filter equations. However, these methods still require propagating matrices having the same dimension as the covariance matrix into the dynamical system of interest. Therefore, they are not applicable for earth systems problems, which usually have big dimension. 
Our algorithm is based on estimating the forecast error covariance matrix from two forecast realizations, which are perturbed by statistically independent perturbations. This covariance matrix is used to define the Kalman gain matrix for a sequential steady-state analysis system. The algorithm consists of an open-loop step and a closed-loop step. In the open-loop step, the covariance is computed without making use of any observations of the process. The closed-loop step consists of an iterative procedure for improving the covariance estimate by assimilating the available observations. The result of the open-loop step serves as the starting point for the closed-loop iteration.

5.2.1 Statistics from two independent realizations

Consider a discrete random process $X^f(t_k) \in \mathbb{R}^n$ that can be decomposed into deterministic and stochastic components as follows

$$X^f(t_k) = \bar{x}(t_k) + \tilde{X}(t_k) \quad (5.1)$$

where $\bar{x}(t_k) \in \mathbb{R}^n$ is a deterministic process and $\tilde{X}(t_k) \in \mathbb{R}^n$ a weakly stationary random process. Suppose we can draw two realizations of $X^f(t_k)$, $x^f_1(t_k)$ and $x^f_2(t_k)$, obtained from two statistically independent realizations of $\tilde{X}(t_k)$ added to the deterministic part. We define $e_{1,2}(t_k)$ as the difference between these two series:

$$e_{1,2}(t_k) = x^f_1(t_k) - x^f_2(t_k) \quad (5.2)$$

It is easy to see that $e_{1,2}(t_k)$ is a weakly stationary random process with the following properties:

$$E[e_{1,2}(t_k)] = 0 \quad (5.3)$$
$$E[e_{1,2}(t_k)\,e_{1,2}^{\top}(t_k)] = 2P^f(t_k) \quad (5.4)$$

where $E[\,\cdot\,]$ is the expectation operator and $P^f(t_k) \in \mathbb{R}^{n \times n}$ is the covariance matrix of the random process $X^f(t_k)$, defined as $P^f(t_k) = E[(X^f(t_k) - E[X^f(t_k)])(X^f(t_k) - E[X^f(t_k)])^{\top}]$. Since the random process is weakly stationary, its covariance matrix is constant, $P^f(t_k) = P^f$. Given a sample $e_{1,2}(t_k)$, $k = 1, \dots, N$, and assuming that the random process is ergodic, we can calculate the covariance by averaging the realization over time:

$$P^f = \frac{1}{2}\,\frac{1}{N-1}\sum_{k=1}^{N} e_{1,2}(t_k)\,e_{1,2}^{\top}(t_k) \quad (5.5)$$

Note that two samples of $X^f(t_k)$ are required to eliminate the trend within each sample, so that time averaging can be used to compute the covariance.

5.2.2 Open-loop estimate

We first discuss the open-loop estimate, where the covariance matrix of a random process is estimated without making use of any observations. Consider a random process governed by a dynamic model $M$ as follows:

$$X^f(t_{k+1}) = M X^f(t_k) + B u(t_k) + W(t_k) \quad (5.6)$$

where $X^f(t_k) \in \mathbb{R}^n$ denotes the state vector, $u(t_k) \in \mathbb{R}^p$ the deterministic forcing term, $B \in \mathbb{R}^{n \times p}$ the transition matrix between the forcing and the state, and $W(t_k) \in \mathbb{R}^n$ a white noise vector with covariance $Q \in \mathbb{R}^{n \times n}$. From the governing equation of $X^f(t_k)$ we can derive the evolution equation of the covariance $P^f$:

$$P^f(t_{k+1}) = M P^f(t_k) M^{\top} + Q \quad (5.7)$$

If $M$ is stable, a steady state will be reached where $P^f(t_{k+1}) = P^f(t_k) = P^f$ according to $P^f = M P^f M^{\top} + Q$. Now let $x^f_1(t_k)$ and $x^f_2(t_k)$, $k = 1, \dots, N$, be two independent samples from this process. If the process is ergodic, we can estimate $P^f$ by using Equation (5.5).

5.2.3 Closed-loop estimate

Suppose now that an observation $y^o(t_k) \in \mathbb{R}^m$ of the dynamical system (5.6) is available, and assume that the observation is related to the unknown true state $x^t(t_k)$ through

$$y^o(t_k) = H x^t(t_k) + V(t_k) \quad (5.8)$$

where $H \in \mathbb{R}^{m \times n}$ is a linear observation operator and $V(t_k) \in \mathbb{R}^m$ a white noise vector with a constant covariance $R \in \mathbb{R}^{m \times m}$ representing observation uncertainty.
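Before combining the observations with the dynamics, the two-sample, time-averaged estimate of Sections 5.2.1-5.2.2 can be illustrated with a minimal Python/NumPy sketch. This is not part of the thesis software: the matrices below are invented for the example, the deterministic forcing is set to zero, and the result is checked against the steady-state solution of $P^f = M P^f M^{\top} + Q$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
M = 0.9 * np.linalg.qr(rng.standard_normal((n, n)))[0]   # stable dynamics (spectral radius 0.9)
Q = 0.02 * np.eye(n)                                     # model error covariance
L = np.linalg.cholesky(Q)

def open_loop_run(n_steps, spinup=500):
    """One realization of X(t_{k+1}) = M X(t_k) + W(t_k), W ~ N(0, Q)."""
    x = np.zeros(n)
    traj = np.empty((n_steps, n))
    for k in range(n_steps):
        x = M @ x + L @ rng.standard_normal(n)
        traj[k] = x
    return traj[spinup:]          # discard spin-up so the series is approximately stationary

N = 200_000
e12 = open_loop_run(N) - open_loop_run(N)          # difference of two independent runs
Pf_hat = 0.5 * (e12.T @ e12) / (len(e12) - 1)      # time average, Equation (5.5)

# Reference: steady-state covariance obtained by iterating P = M P M^T + Q
P_ref = np.zeros((n, n))
for _ in range(10_000):
    P_ref = M @ P_ref @ M.T + Q

print(np.max(np.abs(Pf_hat - P_ref)))              # small sampling error for large N
```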
Combining the observations and the dynamical system, we write the governing equations for $X^f(t_k)$ as follows

$$X^f(t_{k+1}) = M X^a(t_k) + B u(t_k) + W(t_k) \quad (5.9)$$
$$X^a(t_{k+1}) = X^f(t_{k+1}) + K\left(y^o(t_{k+1}) - H X^f(t_{k+1}) - V(t_{k+1})\right) \quad (5.10)$$

It is easy to show that the corresponding covariance matrices evolve in time according to

$$P^f(t_{k+1}) = M P^a(t_k) M^{\top} + Q \quad (5.11)$$
$$P^a(t_{k+1}) = (I - KH)\, P^f(t_{k+1})\, (I - KH)^{\top} + K R K^{\top} \quad (5.12)$$

For a stable and time-invariant system, a steady state will occur, where $P^f(t_{k+1}) = P^f(t_k) = P^f$ and $P^a(t_{k+1}) = P^a(t_k) = P^a$. Let $x^f_1(t_k)$ and $x^f_2(t_k)$, $k = 1, \dots, N$, be two independent samples from this process. If we assume that the process is ergodic, we can estimate $P^f$ by using Equation (5.5). The constant gain matrix $K \in \mathbb{R}^{n \times m}$ is chosen such that it minimizes the covariance $P^a$. It can be shown that such a $K$ is given by

$$K = P^f H^{\top} \left(H P^f H^{\top} + R\right)^{-1} \quad (5.13)$$

Note that the closed-loop system of equations (5.9)-(5.10) is in fact a forecast-analysis system.

5.2.4 Proposed algorithm

The algorithm is initialized by generating two independent realizations of the open-loop process (5.6). This is done by running the open-loop system twice for the same period and perturbing both runs with statistically independent perturbations drawn from a given distribution. With these two independent realizations, the covariance is computed using Equation (5.5) and, subsequently, the open-loop gain matrix using Equation (5.13).

The next step is the closed-loop iteration. In the first iteration, the gain matrix computed in the open-loop step is inserted into Equation (5.10) and the closed-loop system, Equations (5.9)-(5.10), is run twice for the same period. The forecasts in both runs are perturbed with statistically independent perturbations drawn from the model error distribution. In addition, independent perturbations representing observation noise are generated from a given distribution. The two independent realizations of the closed-loop forecasts (5.9) are used to compute the forecast error covariance and the corresponding gain matrix using Equations (5.5) and (5.13), respectively. This gain matrix is subsequently inserted into the closed-loop system (5.10) for the next iteration. The closed-loop estimation is repeated until the covariance no longer changes. At each iteration, the gain matrix computed in the previous iteration is inserted into the closed-loop system (5.10). To summarize, the proposed algorithm consists of the following steps:

• Open-loop step:
  – generate two realizations of the open-loop process, Equation (5.6): $x^f_1(t_k)$ and $x^f_2(t_k)$, $k = 1, \dots, N$
  – estimate $P^f$ using Equation (5.5)
  – estimate $K$ using Equation (5.13)

• Closed-loop step:
  – insert $K$ into Equation (5.10)
  – generate two realizations of the closed-loop process, Equation (5.9): $x^f_1(t_k)$ and $x^f_2(t_k)$, $k = 1, \dots, N$
  – estimate $P^f$ using Equation (5.5)
  – estimate $K$ using Equation (5.13)
  – repeat the closed-loop step until $P^f$, and therefore also $K$, have converged

Since $K$ is chosen to minimize the covariance $P^a$, the covariance $P^f$ at one iteration step will be smaller than the one at the previous iteration. Hence, if the iteration process has converged, $K$ is the optimal gain. Note that for linear, time-invariant systems, the corresponding steady-state Kalman gain is a solution to the iteration. This algorithm has much in common with the analysis-ensemble method (Fisher, 2003).
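The procedure summarized above can be condensed into a short driver sketch. This is only an illustration, not the thesis implementation: `run_twice` is a hypothetical user-supplied routine that runs the (open- or closed-loop) system twice for the same period with independent perturbations and returns the two forecast trajectories as arrays of shape (N, n).

```python
import numpy as np

def two_sample_gain(x1, x2, H, R):
    """Estimate P^f by time-averaging the difference of two trajectories (Eq. 5.5)
    and compute the gain K = P^f H^T (H P^f H^T + R)^{-1} (Eq. 5.13)."""
    e12 = x1 - x2
    Pf = 0.5 * (e12.T @ e12) / (len(e12) - 1)
    return Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)

def two_sample_filter(run_twice, H, R, max_iter=10, tol=1e-3):
    """Open-loop step followed by closed-loop iterations (Section 5.2.4).
    Passing K=None means the analysis step is switched off, i.e. the open-loop system (5.6)."""
    x1, x2 = run_twice(K=None)                 # open-loop step
    K = two_sample_gain(x1, x2, H, R)
    for _ in range(max_iter):                  # closed-loop iterations
        x1, x2 = run_twice(K=K)                # forecasts of the closed-loop system (5.9)-(5.10)
        K_new = two_sample_gain(x1, x2, H, R)
        converged = np.max(np.abs(K_new - K)) < tol
        K = K_new
        if converged:
            break
    return K
```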
The novel aspects of this algorithm are the use of the open-loop estimate as an initial guess for the closed-loop iterations and the iterative improvement of the estimated covariances. Our algorithm is also similar to the ensemble Kalman filter (Evensen, 2003). The difference is that instead of averaging over ensembles, the forecast error covariance is estimated by averaging over time. Note that the equations of the closed-loop system are the same as in the Kalman filter formulation. The difference is that in our algorithm K is updated at every iteration step, while in the Kalman filter it is updated at every time step. It is a well-known result that for a stable and time-invariant system the Kalman gain converges. For the same system, the gain computed by the proposed algorithm is expected to converge as well.

The computational cost of the proposed algorithm depends on the length of the two samples as well as on the number of observations used for assimilation. It should be noted that the computation of the gain matrix $K$ requires only the two terms $P^f H^{\top}$ and $H P^f H^{\top}$, which we compute as

$$P^f H^{\top} = \frac{1}{2}\,\frac{1}{N-1}\sum_{k=1}^{N} e_{1,2}(t_k)\,\bigl(H e_{1,2}(t_k)\bigr)^{\top} \quad (5.14)$$

and

$$H P^f H^{\top} = \frac{1}{2}\,\frac{1}{N-1}\sum_{k=1}^{N} \bigl(H e_{1,2}(t_k)\bigr)\bigl(H e_{1,2}(t_k)\bigr)^{\top} \quad (5.15)$$

For a limited number of observations per analysis step this is a limited amount of work. Moreover, since the statistical computations are performed off-line, the computational cost is not an issue as long as it is feasible for the system of interest. Once the gain matrix is computed and implemented in a data assimilation system, the computational cost is the same as for an assimilation system using a steady-state Kalman filter or optimal interpolation. Furthermore, the implementation is simple, since only the analysis equation (5.10) is added to the code of the model.

From the description above, we can see that the algorithm is applicable to an assimilation system with a fixed observation network. Moreover, it assumes the error process of the system under study to be weakly stationary. The stationarity assumption prevents the algorithm from including flow dependence in the forecast error statistics. However, in a number of real-time forecasting systems it has been shown that a stationary covariance is sufficient for data assimilation (e.g. Heemink and Kloosterhuis 1990, Sørensen et al. 2004b). In these applications, a steady-state Kalman filter is found to be sufficiently accurate. In particular, Canizares et al. (2001) used an ensemble Kalman filter for a shelf sea model and showed that the forecast error covariance matrix in this specific case tends to a quasi-steady state after a few days of assimilation. Moreover, Sørensen et al. (2006) studied and compared the properties of the EnKF, the RRSQRT Kalman filter and the steady-state Kalman filter using an idealised bay, and demonstrated the effectiveness of the steady-state Kalman filter over the other methods for the given system. Hence, the steady-state Kalman filter is recommended because of its low computational cost, especially in an operational setting. A constant forecast error covariance matrix is also used in other data assimilation systems, which work with, for example, optimal interpolation (e.g. Demirov et al. 2003, Borovikov et al. 2005, Sun et al. 2007) and 3DVAR (e.g. Daley and Barker 2001, Devenyi et al. 2004).
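Returning to the computational remark above: for large state dimensions it would be wasteful, or impossible, to form the full n-by-n matrix $P^f$, and Equations (5.14)-(5.15) show that only $P^f H^{\top}$ and $H P^f H^{\top}$ are needed. The following sketch illustrates this reduced computation. It is not taken from the thesis code; for simplicity it assumes that $H$ selects a few state entries (the indices in `obs_idx`), and the trajectories are synthetic placeholders.

```python
import numpy as np

def two_sample_gain_reduced(x1, x2, obs_idx, R):
    """Compute K without forming the full covariance, using only P^f H^T and
    H P^f H^T as in Eqs. (5.14)-(5.15); H is assumed to be a selection operator."""
    e12 = x1 - x2                    # shape (N, n): differences of the two trajectories
    He = e12[:, obs_idx]             # shape (N, m): H e_{1,2}(t_k)
    c = 0.5 / (len(e12) - 1)
    PfHt = c * (e12.T @ He)          # n x m, Eq. (5.14)
    HPfHt = c * (He.T @ He)          # m x m, Eq. (5.15)
    return PfHt @ np.linalg.inv(HPfHt + R)   # Eq. (5.13)

# Purely illustrative call with synthetic trajectories:
rng = np.random.default_rng(1)
x1 = rng.standard_normal((5000, 200))
x2 = rng.standard_normal((5000, 200))
K = two_sample_gain_reduced(x1, x2, obs_idx=[10, 50, 120], R=0.01 * np.eye(3))
print(K.shape)    # (200, 3): one column per observation station
```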
Although in this chapter the algorithm is developed for using two realizations, it can be extended for a larger number of realizations. With a larger number of realizations, it is possible to use samples over a shorter time. This may relax the stationarity assumption. 58 5.3 Experiment using a simple wave model Note that for a sufficiently large number of realizations, it is possible to compute the covariance estimate by averaging over ensembles over one simulation time step. In this case the algorithm reduces to the ensemble Kalman filter. The choice of using only two realizations is set by a practical consideration. The algorithm is intended to be used for a future study using an operational storm surge prediction model, where two wind fields produced by two different meteorological models are available. Assuming that the two wind fields are generated from the same atmospheric random process, the two storm surge forecasts are produced by running the storm surge model twice, each run is driven by one of the two wind fields. In this case, no model error statistics have to be specified explicitly. 5.3 Experiment using a simple wave model To check the validity of the algorithm proposed in this chapter we performed an experiment using a simple wave model. This model is a simplified version of the De St. Venant equation, which can be applied for modelling tidal flow in a long and narrow estuary. The governing equations of this model are ∂ξ ∂u +D =0 ∂t ∂x ∂ξ cu ∂u +g + =0 ∂t ∂x D ξ(x = 0, t) = ξb (t) + wb (t) (5.16) (5.17) (5.18) ξ(x, t = 0) = 0 (5.19) u(x, t = 0) = 0 (5.20) u(x = L, t) = 0 (5.21) where ξ(x, t) is the water level above a reference plane, u(x, t) is the average current velocity over a cross section, D is the water depth, g is the acceleration of gravity and c is the friction constant. This system of equations is solved numerically by using a finite difference method. A Crank-Nicholson scheme with the implicity factor θ = 0.4 is used and the discretization in space is based on the use of a staggered grid. The uncertainty of the model is assumed to be caused by an inaccurate sea-estuary boundary. A colored noise process, denoted by wb (tk ), modelled as an AR(1) process is used to model this uncertainty: wb (tk ) = αb wb (tk−1 ) + ǫ(tk ) (5.22) ∆t where αb is a correlation parameter defined by αb = e− Tc , ∆t is the time step, Tc is the correlation time parameter and ǫ is a Gaussian white noise process. The correlation time is set to 2 hours and the standard deviation of ǫ is chosen to be 20 cm for this study. Moreover, a Gaussian white noise process with standard deviation of 2 cm is used to represent the observation error. For this experiment, the values of the other parameters are D = 10 meter, c = 0.0002s−1, L=60 km, time step ∆t=1 minutes, and number of grid points for spatial discretization n Two-sample Kalman filter for steady state data assimilation 59 = 80, hence ∆x = L/(n − 1). Moreover, at the sea-estuary boundary, x = 0, a periodic water level is generated according to ξb (tk ) = 0.5sin(2πtk /T ), where T =3 hours. Using this model, the open- and closed-loop processes described in Section 5.2 are carried out. At each iteration, the model is run twice, each with a different noise realization, for the same simulation period of 600 hours. An artificial measurement generated at location x = 0.3L is inserted into the model at each time step in the closed-loop runs. The gain matrix computed at one iteration is inserted into the model used in the next iteration. 
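As an aside, the AR(1) boundary noise of Equation (5.22) with the parameter values quoted above (time step 1 minute, correlation time 2 hours, white-noise standard deviation 20 cm) can be generated in a few lines. This is only a sketch of the noise model, not the WAQUA implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
dt = 60.0                      # time step [s]
Tc = 2 * 3600.0                # correlation time [s]
alpha_b = np.exp(-dt / Tc)     # AR(1) coefficient, alpha_b = exp(-dt/Tc)
sigma_eps = 0.20               # std of the white noise epsilon [m]

n_steps = 600 * 60             # 600 hours of 1-minute steps, as in the experiment
w_b = np.zeros(n_steps)
for k in range(1, n_steps):
    w_b[k] = alpha_b * w_b[k - 1] + sigma_eps * rng.standard_normal()

# Sanity check: the stationary std of an AR(1) process is sigma_eps / sqrt(1 - alpha_b^2)
print(w_b.std(), sigma_eps / np.sqrt(1 - alpha_b**2))
```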
The procedure is repeated until no significant change in the gain matrix is apparent. The results in term of the water level gain matrices are presented in Figure 5.1a-5.1d. We observe that there is no significant difference between the gain matrix estimated after the second closed-loop iteration from the one estimated after the third iteration. This indicates that the algorithm has converged. 1 (a) 0.5 0 0 10 20 30 40 50 60 0 10 20 30 40 50 60 0 10 20 30 40 50 60 0 10 20 30 40 50 60 0 10 20 30 x [km] 40 50 60 1 (b) 0.5 0 1 (c) 0.5 0 1 (d) 0.5 0 1 (e) 0.5 0 Figure 5.1: Estimated gain of water-level ξ (a)open-loop, (b)-(d)closed-loop estimates, 1st-3rd iteration steps, (e) computed using RRSQRT with 50 modes. The vertical line shows the location of the assimilation station. To evaluate the performance of the algorithm, we compare the gain matrix estimated 5.4 Experiment using the Dutch Continental Shelf Model 60 by using our algorithm to the one obtained by using a Reduced Rank Square-Root (RRSQRT) Kalman filter (Verlaan and Heemink, 1997). The result, shown in Figure 5.1e, has been obtained by running the RRSQRT Kalman filter until the Kalman gain matrix has become constant. The number of modes used to produce this result is 50, which turned out to be sufficient to estimate the Kalman gain accurately. Comparing the gain matrix estimated at the third closed-loop iteration with the one obtained using the RRSQRT Kalman filter, we see that they are very similar with each other. This indicates that for this model the algorithm proposed in this chapter can reproduce the steady-state Kalman filter gain. 5.4 Experiment using the Dutch Continental Shelf Model The second experiment is carried out by applying the proposed algorithm to the Dutch Continental Shelf Model (DCSM). This is an operational model used in the Netherlands for real-time storm surge prediction. In the operational setup, a steady state Kalman filter based on the work of Heemink and Kloosterhuis (1990) has been implemented to improve the forecast’s accuracy of the model. Hence, in this study, it is possible to compare the gain matrix computed by using our proposed algorithm to the one computed using this steady state Kalman filter. The governing equations used in the DCSM are the nonlinear 2-D shallow water equations. However, for the propagation of the forecast error covariance matrix a linearized system of equations is used instead (see e.g. Verlaan 1998; ten Brummelhuis and Heemink 1990; Heemink and Kloosterhuis 1990): ∂ξ ∂u ∂v +D +D =0 ∂t ∂x ∂y ∂ξ cf u τx ∂u +g − fv + − =0 ∂t ∂x D D ∂v ∂ξ cf v τy +g + fu + − =0 ∂t ∂y D D (5.23) (5.24) (5.25) where: u, v = depth-averaged currenct in x and y direction respectively ξ = water level above the reference plane D = water depth below the reference plane g = acceleration of gravity f = coefficient for the Corriolis force cf = coefficient for the linear friction τx , τy= wind stress in x and y direction respectively These equations are discretized using an Alternating Directions Implicit (ADI) method and a staggered grid that is based on the method by Leendertse (1967) and Stelling (1984). Two-sample Kalman filter for steady state data assimilation 61 In the implementation, the spherical grid is used instead of the rectangular (e.g. Verboom et al., 1992). Two sources of uncertainty are usually considered: uncertainty at the open boundary and uncertainty of the meteorological forcing. 
However, in this experiment we assume that the only source of uncertainty is an erroneous wind forcing. To account for this uncertainty, noise processes $w_u$ and $w_v$ are added directly to the $u$ and $v$ velocities, rather than to the depth-averaged momentum equations. It is generally believed that the noise process is correlated in time; hence an AR(1) process is chosen to represent the uncertainty in each direction. Denoting by $w_u(x, y, t_k)$ and $w_v(x, y, t_k)$ the noise in the zonal and meridional directions respectively, the noise processes are written as

$$w_u(x, y, t_k) = \alpha_w\, w_u(x, y, t_{k-1}) + \epsilon_u(x, y, t_k) \quad (5.26)$$
$$w_v(x, y, t_k) = \alpha_w\, w_v(x, y, t_{k-1}) + \epsilon_v(x, y, t_k) \quad (5.27)$$

where $\alpha_w$ is a correlation parameter and $\epsilon_u$, $\epsilon_v$ are white noise processes. The spatial characteristics are specified through a spatial correlation given by

$$\frac{E[w_u(x_1, y_1, t_k)\, w_u(x_2, y_2, t_k)]}{\sqrt{E[w_u(x_1, y_1, t_k)^2]\, E[w_u(x_2, y_2, t_k)^2]}} = e^{-\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}/d_w} \quad (5.28)$$

$$\frac{E[w_v(x_1, y_1, t_k)\, w_v(x_2, y_2, t_k)]}{\sqrt{E[w_v(x_1, y_1, t_k)^2]\, E[w_v(x_2, y_2, t_k)^2]}} = e^{-\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}/d_w} \quad (5.29)$$

$$\frac{E[w_u(x_1, y_1, t_k)\, w_v(x_2, y_2, t_k)]}{\sqrt{E[w_u(x_1, y_1, t_k)^2]\, E[w_v(x_2, y_2, t_k)^2]}} = 0 \quad (5.30)$$

where

$$\sqrt{E[\Delta u(x_1, y_1, t_k)^2]} = \sqrt{E[\Delta v(x_2, y_2, t_k)^2]} = \sigma_w \quad (5.31)$$

and $d_w$ is a characteristic distance for the spatial correlation. The noise term is introduced on a coarser grid to reduce the number of noise inputs. The noise at the other grid points is subsequently obtained by bilinear interpolation.

The DCSM covers an area of the north-east European continental shelf, i.e. 12°W to 13°E and 48°N to 62°N, as shown in Figure 5.2. The resolution of the spherical grid is 1/8° × 1/12°, which is approximately 8 × 8 km. With this configuration the grid has 201 × 173 points, of which 19809 are computational grid points. The time step is $\Delta t$ = 10 minutes and the friction coefficient $c_f$ = 0.0024. For the wind input noise parameters the following values are used: $\sigma_\epsilon$ = 0.003 m/s per time step, $\alpha_w$ = 0.9 per time step, which corresponds to a correlation time of 1.6 hours, and $d_w$ = 19 grid cells. The noise is defined on a coarser grid, where each coarse grid cell consists of 28 × 28 computational grid cells. Moreover, a white noise process with standard deviation $\sigma_m$ = 0.10 m is used to represent the uncertainty of the observations.

There are eight observation stations included in the assimilation, five of which are located along the east coast of the UK and the others along the Dutch coast (see Figure 5.2). Using the specifications described above, three months of artificial observation data are generated at these assimilation stations. The generated observations are used in the closed-loop iteration to improve the gain matrix estimates.

[Figure 5.2: DCSM area with 50 m and 200 m depth contours and some water-level observation stations. Assimilation stations: 1. Wick, 2. North Shields, 3. Lowestoft, 4. Sheernes, 5. Dover, 6. Vlissingen, 7. Hoek van Holland, 8. Den Helder. Validation stations: a. Aberdeen, b. Euro Platform, c. Cadzand, d. Roompotsluis, e. IJmuiden, f. Harlingen, g. Huibertgat, h. Delfzijl.]

Figure 5.3 shows the gains in terms of water level computed by using our algorithm as well as by using the operational steady-state Kalman filter.
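Before looking at the gains in detail, the wind-noise construction above can be sketched as follows. This is only an illustration of Equations (5.26)-(5.30): the coarse-grid dimensions are invented, the bilinear interpolation to the computational grid is omitted, and the code is not part of the operational DCSM software.

```python
import numpy as np

rng = np.random.default_rng(3)
nx, ny = 8, 7                 # coarse-grid dimensions (illustrative, not the DCSM values)
d_w = 19.0                    # correlation length, in fine-grid cells
coarse_spacing = 28.0         # fine cells per coarse cell, as in the text
alpha_w = 0.9                 # AR(1) coefficient per time step
sigma_eps = 0.003             # white-noise std [m/s per time step]

# Coordinates of the coarse-grid noise points, expressed in fine-grid cells
xs, ys = np.meshgrid(np.arange(nx) * coarse_spacing, np.arange(ny) * coarse_spacing)
pts = np.column_stack([xs.ravel(), ys.ravel()])
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
C = np.exp(-dist / d_w)                                  # isotropic correlation, Eqs. (5.28)-(5.29)
L = np.linalg.cholesky(C + 1e-10 * np.eye(len(pts)))     # factor for drawing correlated noise

def step(w):
    """Advance one AR(1) step: w(t_k) = alpha_w * w(t_{k-1}) + eps(t_k)."""
    eps = sigma_eps * (L @ rng.standard_normal(len(pts)))
    return alpha_w * w + eps

w_u = np.zeros(len(pts))      # the v-component would be generated independently, Eq. (5.30)
for _ in range(1000):
    w_u = step(w_u)
# In the DCSM setup the coarse-grid noise is then bilinearly interpolated
# onto the computational grid before being added to the velocities.
```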
Here we show only the gains for the assimilation station Wick. By visual inspection we see that the iteration is convergent. Moreover, the results computed at the fourth closed-loop iteration shows similar structure and magnitude as the one computed by using the steady state Kalman filter (Heemink and Kloosterhuis, 1990). Some minor differences are apparent at points further away from the assimilation station. This may be due to sampling error, which is a common difficulty in ensemble-based approaches. The limited number of samples may produce spuriously large magnitude of covariance estimates between greatly separated grid points where the covariance is actually small. However, despite of these differences it is clear from the pictures that in general the algorithm that we propose in this chapter can reproduce the gain estimated by Two-sample Kalman filter for steady state data assimilation using the steady state Kalman filter. Figure 5.3: Water-level gain for assimilation station Wick: (a) open-loop estimate, (b)-(e) 1st-4th closed-loop iteration estimates, and (f) computed using steady-state Kalman filter. 63 5.4 Experiment using the Dutch Continental Shelf Model 64 3 obs det open−loop 2.5 2 Water level [m] 1.5 1 0.5 0 −0.5 −1 −1.5 00:00 06:00 12:00 12−Jan−2005 18:00 24:00 Figure 5.4: Water level timeseries at Wick on 12 January 2005: observed data, deterministic model without assimilation, with assimilation using the open-loop gain Besides performing visual checks on the estimated gain, another evaluation was also carried out. This evaluation is done by implementing the gain matrices computed using our proposed algorithm in a data assimilation system similar to the operational one, where the nonlinear version of system of equations (5.28)-(5.30) has been used (see Stelling, 1984). For this evaluation, the real water level data observed within the period of 1 November 2004 - 1 February 2005 have been assimilated. Figure 5.4 presents three waterlevel timeseries at Wick for the period of 12 January 2005 00:00 - 24:00. These timeseries refer to water levels obtained from observations, forecasts using deterministic model without data assimilation, and forecasts with data assimilation using the gain matrix obtained in the open-loop step. This figure demonstrates that the differences between the forecasts with data assimilation and the observations are always smaller than the differences between the forecasts without data assimilation and the observations. To acquire a better idea about the filter performance, the root-mean-square of the water level innovation is calculated over the whole simulation period: v u N u1 X t RMS = (ξ o(tk ) − ξ a (tk ))2 (5.32) N k=1 where ξ o is the observed water level and ξ a is the analyzed water level. To check whether the data assimilation works also at other locations, the RMS water level innovation is also computed in some validation stations, where the observation data has not been used for the assimilation. In this experiment we used 8 assimilation stations and 8 validation stations. Figure 5.5 and 5.6 show the RMS water level innovation at the assimilation and validation stations respectively. In these figures we also present the RMS innovation Two-sample Kalman filter for steady state data assimilation 65 of the water level obtained by using the steady-state Kalman filter. 
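As a small illustration, the innovation statistic of Equation (5.32) amounts to the following; `xi_obs` and `xi_analysis` stand for hypothetical per-station time series over the simulation period.

```python
import numpy as np

def rms_innovation(xi_obs, xi_analysis):
    """Root-mean-square of the water-level innovation, Eq. (5.32)."""
    d = np.asarray(xi_obs) - np.asarray(xi_analysis)
    return np.sqrt(np.mean(d**2))

# One value per station, for assimilation and validation stations alike, e.g.:
# rms_per_station = {name: rms_innovation(obs[name], ana[name]) for name in stations}
```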
It is clear from these figures that, using the gain matrices computed at both the open- and closed-loop steps, the filter reduces the RMS values of the water level innovations at both the assimilation and validation stations. At most assimilation stations, the largest reduction of the RMS water level innovation is obtained with the gain matrix computed at the open-loop step. This is to be expected, as the open-loop gain is larger in absolute value than the closed-loop ones. As a result, the assimilation system with the open-loop gain forces the states to be closer to the observations. The closed-loop gains, however, are spatially smoother than the open-loop one. Moreover, we can also see that the RMS water level innovation converges to the one obtained using the steady-state Kalman filter.

[Figure 5.5: RMS of innovations with respect to water level observations at the eight assimilation stations (in cm), for the deterministic run, the open-loop gain, the 1st-4th closed-loop gains, and the steady-state Kalman filter.]

[Figure 5.6: RMS of innovations with respect to water level observations at the eight validation stations (in cm), for the same set of experiments.]

5.5 Experiment using the three-variable Lorenz model

To verify whether the algorithm proposed in this chapter can also be applied to an unstable dynamical system, we also performed experiments with a three-variable Lorenz model. The Lorenz model is a simplified system of the flow equations governing thermal convection. It was introduced by Lorenz (1963) and has been the subject of extensive study, motivated by its chaotic and strongly nonlinear nature combined with its small dimension. Owing to these properties, it is often used as a testbed for examining data assimilation methods for systems with strongly nonlinear dynamics (e.g. Miller et al. 1994, Evensen 1996). The governing equations of the Lorenz model are

$$\frac{dx}{dt} = \sigma(y - x) \quad (5.33)$$
$$\frac{dy}{dt} = \rho x - y - xz \quad (5.34)$$
$$\frac{dz}{dt} = xy - \beta z \quad (5.35)$$

where x, y, and z are the dependent variables, varying over dimensionless time t. For this experiment we applied the parameter values first used by Lorenz to obtain chaotic solutions: σ = 10, ρ = 28, and β = 8/3. The model was integrated with a second-order Euler scheme with a time step ∆t = 1/60 time units. To account for model uncertainty, three independent white noise processes are added, one to each variable, at each time step. The white noise processes are drawn from a Gaussian distribution with zero mean and variance equal to $0.5\sqrt{\Delta t}$. Using this system, we computed a true reference solution for a duration of 40 time units with initial values $(x_0, y_0, z_0)$ = (1.508870, −1.531271, 25.46091). A sequence of observations is generated at intervals of 0.25 time units by adding to the reference solution a random number representing the observation error. For this study, observations are generated for all variables x, y, and z, and the observation noise is drawn from an unbiased Gaussian distribution with variance equal to 2.0, independently for each variable. We implemented the proposed algorithm with the Lorenz model described above to estimate the corresponding steady-state gain matrix.
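The twin setup just described can be sketched as follows. This is an illustration under stated assumptions, not the thesis code: a Heun (second-order Runge-Kutta) step stands in for the second-order Euler scheme, and the noise amplitude follows the variance quoted in the text.

```python
import numpy as np

rng = np.random.default_rng(4)
sigma, rho, beta = 10.0, 28.0, 8.0 / 3.0
dt = 1.0 / 60.0

def lorenz(s):
    x, y, z = s
    return np.array([sigma * (y - x), rho * x - y - x * z, x * y - beta * z])

def step(s):
    """One Heun step plus additive model noise on each variable."""
    k1 = lorenz(s)
    k2 = lorenz(s + dt * k1)
    noise = np.sqrt(0.5 * np.sqrt(dt)) * rng.standard_normal(3)   # variance 0.5*sqrt(dt), as in the text
    return s + 0.5 * dt * (k1 + k2) + noise

# True reference run of 40 time units; observations of (x, y, z) every 0.25 time units
s = np.array([1.508870, -1.531271, 25.46091])
n_steps = int(40 / dt)
obs_every = int(0.25 / dt)
truth, obs = [], []
for k in range(1, n_steps + 1):
    s = step(s)
    truth.append(s.copy())
    if k % obs_every == 0:
        obs.append(s + np.sqrt(2.0) * rng.standard_normal(3))     # observation-error variance 2.0
truth, obs = np.array(truth), np.array(obs)
```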
The two forecasts are generated by running the system twice for the same period, where each run is perturbed independently by noise, drawn from a distribution with the same statistics as has been used to generate Two-sample Kalman filter for steady state data assimilation the true solution. Since the system is unstable, the two realizations of the open-loop system follow completely different trajectories. Therefore, the open-loop estimate can not be computed since the difference between the two realizations is not stationary. To solve this problem, the open-loop step is skipped and a reasonable gain matrix should be used as the starting point for the closed-loop iteration. This will induce a feedback mechanism, which makes the overall system stable. For the experiment presented in this chapter we used the gain matrix K = 0.5I as the starting point for the closed-loop iteration, where I is a 3 × 3 identity matrix. The closed-loop estimation was iterated 15 times and the results are presented in Figure 5.7. This figure shows the elements of the 3 × 3 gain matrix K, where the bars grouped per element show the estimates of the respective element produced from consecutive iterations. It can be seen from the figure that the estimates of the gain matrix converge after seven iterations. The method to compute the forecast error covariance matrix in this experiment is similar to the method used in Yang et al. (2006b) to compute the estimate of the background covariance for a 3DVAR assimilation system. The difference is that they used the time average of the difference between the estimate and the true solutions. In reality, however, the true solution is never known. The next step is to use the gain matrix estimated above in a data assimilation experiment to examine its performance. Figure 5.8a shows both the true reference as well as the model solution without any assimilation for variable x, while Figure 5.8b presents the results when the constant gain matrix obtained from the 15th closed-loop iteration is used for assimilation. The model solution without any data assimilation follows completely different trajectory from the true one. On the other hand, Figure 5.8b demonstrates that in this experiment the assimilation system is able to track the true solution. It is generally able to reproduce the correct amplitudes in the peaks of the solution. There are a few points where the solution deviates rather significantly from the true solution, for example at time t = 17, t = 24, t = 31, and t = 37. However, the solution is quickly recovered and it starts to trace the true solution again. For comparison we also conducted a data assimilation experiment using the ensemble Kalman filter (Evensen, 2003). Here the ensemble consists of 1000 members and the results are presented in Figure 5.8c. Qualitative examinations on these plots indicates that the EnKF performs better in tracking the true solution, especially at the times where the solution obtained with assimilation using the constant gain matrix deviates rather significantly from the true one. To acquire a better insight about the performance, the RMS error of all variables with respect to the true solution is shown in Figure 5.9. It is clear from this figure that the solution obtained using EnKF is more accurate than the results using the algorithm proposed in this chapter. It seems that for a strongly nonlinear system like the Lorenz model, the assimilation system should include a flow dependent forecast error covariance estimate for accurate results. 
However, the computational cost of the proposed algorithm is much cheaper than the EnKF. In practice, a decision has to be made involving the trade-off between the desired accuracy and the computational cost required to achieve it. 67 5.6 Conclusion 68 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 −0.1 K(1,1) K(1,2) K(1,3) K(2,1) K(2,2) K(2,3) K(3,1) K(3,2) K(3,3) Figure 5.7: The estimated gain matrix K of Lorenz model 20 (a) x 10 0 −10 −20 0 5 10 15 20 25 30 35 20 40 (b) x 10 0 −10 −20 0 5 10 15 20 25 30 35 20 40 (c) x 10 0 −10 −20 0 5 10 15 20 Time (scaled) 25 30 35 40 Figure 5.8: Timeseries of variable x: (a) without assimilation, (b) with assimilation using two-sample method, (c) with assimilation using EnKF. The dashed line shows the true reference solution, the solid line is the estimated solution, and the diamonds show the simulated observation 5.6 Conclusion In this chapter a new filter algorithm is proposed. The algorithm is based on estimating the error covariance by using two forecast samples. This covariance is used to define a gain Two-sample Kalman filter for steady state data assimilation 69 14 Without assimilation Two−sample method EnKF 12 10 RMSE 8 6 4 2 0 x y z Figure 5.9: RMSE of forecast of variables x, y , and z without assimilation, with assimilation using two-sample method, and with assimilation using EnKF matrix for a steady state data assimilation system. Intended for a steady-state operation, it is applicable for models where approximately a statistically stationary condition will occur. Its applicability also requires the models to have short memory to allow for the ergodicity assumption. The algorithm consists of an iterative procedure for improving the estimate of covariance. Twin experiments have been performed to evaluate the performance of the algorithm. In the first experiment a simple one-dimensional wave model is used, while for the second one, we used the operational storm surge model: the Dutch Continental Shelf Model (DCSM). The evaluation is done by comparing the gain matrix computed using the proposed algorithm to the one computed by using the exact steady state Kalman filter. Both experiments show that the results using the proposed algorithm converge to the ones produced by using the classical Kalman filter. An experiment using the three-variable Lorenz model has also been conducted to see the possibility of applying the proposed algorithm in an unstable dynamical system. Since the system is unstable, the open-loop estimate is skipped and the closed-loop iteration should be initialized with a reasonable gain matrix. Using a diagonal matrix as the starting point, the closed-loop iteration has been shown to converge. Moreover, the data assimilation system with the constant gain matrix, estimated using the proposed algorithm seems to do a reasonably good job in tracking the true solution. This result demonstrates the potential use of the proposed algorithm also for unstable systems. In the experiment using the DCSM, the two forecasts were generated by running the model twice for the same period and by perturbing both runs using statistically independent perturbations drawn from a known distribution. In practice however this distribution is unknown. Therefore, in the next study, we are going to explore the possibility of using two wind fields produced by two different atmospheric models to generate proxies for the forecast error of the DCSM. The two wind fields are considered as two realiza- 70 5.6 Conclusion tions of a random atmospheric process. 
Using the algorithm proposed in this chapter, the two forecasts will be generated by running the model twice, each run will use one of the two wind fields as input without any artificially generated perturbations nor assuming any distribution. With this approach explicit modelling of model error covariance is avoided. 6 Storm surge forecasting using a two-sample Kalman filter∗ 6.1 Introduction In the Netherlands, storm surge forecasting system has both social and economic importance. It is required for the operation of some moveable storm surge barriers for flood prevention. Moreover, the passage towards Rotterdam harbor requires water level prediction especially for very large ships. The storm surge forecast is performed by using a numerical model based on shallow water equations called the Dutch Continental Shelf Model (DCSM). The main source of uncertainty in a storm surge forecast model is the meteorological forcing and meteorological effects outside the domain of the storm surge model (Gerritsen et al. 1995). To cope with this error, the operational system assimilates the observed water level into the model. The data assimilation used in the operational system is based on a steady state Kalman filter (Heemink 1988; Heemink 1990). Like many other data assimilation algorithms, the steady state Kalman filter requires specification of the model error statistics. However, the true error statistics are not known and difficult to estimate. One usually resorts to make some parameterizations or to model the structure of the error covariance. In the operational system, this is done by assuming certain structure of the model error covariance, like spatially uniform error variance and isotropic spatial correlation. In this paper, we explore another way of working around the estimation of the model error covariance for the DCSM. The idea of the approach followed in this study is to use the difference between two similarly skillful models (realizations) to represent the unknown true model error. Here the error covariance is computed by averaging the difference between the two realizations over time. In the field of ocean modelling, for example, the two realizations can be obtained by driving the ocean model under study by using two wind fields produced independently by two meteorological centers. This approach is also ∗ This chapter is based on Sumihar et al. 2009, submitted to Monthly Weather Review 71 72 6.1 Introduction used, for example, by Alves and Robert (2005) and Leeuwenburgh (2007). However, their approach is in the framework of ensemble Kalman filtering, while ours is of steady-state Kalman filtering. For our study, two wind fields from two different meteorological centers are available as input forcing to the DCSM. Those are the Royal Dutch Meteorological Institute (KNMI) version of HIRLAM (Und´en et al. 2002) and the UK Meteorological Office North Atlantic European model (Lorenc et al., 1991). In the remainder of this paper, these wind fields are called simply HIRLAM and UKMO respectively. Here it is assumed that each wind field is deviated from the true one by a correlated noise, where both the true wind field and noise are unknown. The difference between two model results, obtained by driving the DCSM by HIRLAM and UKMO wind field, can be used as proxy to the unknown true error. The error covariance is estimated from this difference by averaging over time. The error covariance in turn is used to compute a Kalman gain. 
The computation of the forecast error covariance, and subsequently the Kalman gain, is done by using a two-sample Kalman filter, proposed by Sumihar et al. (2008). The two-sample Kalman filter is an iterative algorithm for computing a steady state Kalman gain of a stochastic system. At each iteration, the algorithm uses two samples of the stochastic process of interest to compute the covariance of the process. In turn, the covariance is used to define a Kalman gain. The covariance estimate is improved at the next iteration, by feeding back the Kalman gain into the system and regenerating two independent samples of the system. The iteration is stopped when the estimate has converged. The approach described in this paper offers several advantages. The main advantage of our approach is that it avoids explicit modelling of the error covariance. It avoids, for example, the necessity to assume isotropic spatial correlation or to parameterize the error covariance. The second advantage is its ease of implementation. The computer code of the two-sample Kalman filter is easy to develop and it is separated from the model code. It only requires the output of the model. Once the Kalman gain is ready, its implementation for data assimilation only requires a few additional codes to the existing computer programs of the model. Another advantage is that it is computationally very cheap. The Kalman gain is computed off-line and only once. Therefore, running the filter adds only a small fraction to the running time of the model itself. The use of a steady state filter has been proven effective in many systems like, for example, shelf sea systems (e.g. Canizares et al. 2001, Verlaan et al. 2005, Sørensen et al. 2006) and coastal systems (e.g. Heemink 1990, Oke et al. 2002, Serafy and Mynett 2008). Its application can also be found in oceanography (Fukumori et al. 1993), where the resulting solution was found to be statistically indistinguishable from the one obtained by using full Kalman filtering (Fu et al. 1993). Moreover, Sørensen et al. (2006) compared the properties of a steady state Kalman filter, to the more sophisticated methods: EnKF (Evensen 1994) and RRSQRT Kalman filter (Verlaan and Heemink 1997) and demonstrated the effectiveness of the steady state Kalman filter over the other methods for the given system. Hence, the steady state Kalman filter is recommended due to its low com- Storm surge forecasting using a two-sample Kalman filter putational cost, especially if a real time application is considered. A disadvantage is that the applicability of steady-state Kalman filter depends on the application. If the errorcovariances or observation network are not approximately stationary the approach breaks down. The plan of this chapter is as follows. The approach used in this study is described in Section 2. It consists of the description about the two-sample Kalman filter algorithm as well as a system transformation. The two-sample Kalman filter is based on the assumption that model error is uncorrelated in time. However, the error in wind field is generally believed to be correlated in time. Therefore, a system transformation is required to transform the system representation into the one, which is driven by uncorrelated noise. Section 3 describes a twin experiment using a simple one-dimensional wave model to demonstrate the performance of the two-sample Kalman filter in combination with the system transformation. The experiments using the DCSM are described and discussed in Section 4. 
Section 5 briefly concludes the paper. 6.2 Approach 6.2.1 Introduction In this section, we give a description about the approach used in our study. First, we describe briefly the two-sample Kalman filter algorithm. As we will see, like any other Kalman filtering algorithms, the two-sample Kalman filter algorithm requires the system of interest to be driven by white noise process. However, in certain applications, such as storm surge prediction, it is generally believed that the noise process is correlated in time. In fact, it should be considered to be autocorrelated in time and space, because the statistics of many natural processes can be approximated this way (Lermusiaux et al. 2006). In this case, we need to transform the model representation such that it is driven by white noise. In our study, like in the operational system at KNMI, we assume that the error can be represented as an AR(1) process. Therefore, here we describe a system transformation for the case when noise is governed by an AR(1) process. The transformation, however, will be similar for other ARMA type error model as well. 6.2.2 Two-sample Kalman filter algorithm The two-sample Kalman filter is an iterative procedure for computing a steady-state Kalman gain for a system with time invariant forecast error covariance (Sumihar et al. 2008). The idea of this algorithm is to generate two independent forecast samples of the system under study and use their difference to compute the forecast error covariance by averaging over time. The covariance is in turn used to compute a gain matrix estimate. The estimate is then improved iteratively until the solution has converged. 73 6.2 Approach 74 Suppose we have a stochastic process Xf (tk ) governed by Xf (tk+1 ) = MXa (tk ) + Bu(tk ) + η(tk ) (6.1) Xa (tk+1 ) = Xf (tk+1 ) + K(yo (tk+1 ) − HXf (tk+1 ) − ν(tk+1 )) (6.2) where M is model dynamics, B input forcing matrix, u(tk ) input forcing, η(tk ) whitenoise process representing model and/or input forcing error, ν(tk ) white-noise process representing observation error, yo (tk ) observation, H observation matrix, Xa (tk ) analysis state vector, and K is a constant gain matrix. For a stable and time-invariant system, steady state will occur. Since η(tk ) and ν(tk ) are white noise processes, it can be shown that at steady state, the covariance matrices of the processes Xf (tk ) and Xa (tk ) are given respectively by Pf = MPa M⊤ + Q (6.3) Pa = (I − KH)Pf (I − KH)⊤ + K R K⊤ (6.4) The constant gain matrix K is defined to minimize the variance of Xa (tk ), which is equivalent to minimizing the trace of Pa : K = Pf H⊤ (HPf H⊤ + R)−1 (6.5) The objective of the two-sample Kalman filter algorithm is to obtain K by using two samples of the process Xf (tk ), k = 1, ..., N. Suppose we have two realizations of the random proccess Xf (tk ): xf1 (tk ) and xf2 (tk ), k = 1, ..., N. We define e1,2 (tk ) as the difference between these two series: e1,2 (tk ) ≡ xf1 (tk ) − xf2 (tk ) (6.6) The readers can verify that e1,2 (tk ) has the following statistics: E[e1,2 (tk )] = 0 (6.7) f (6.8) E[e1,2 (tk )e⊤ 1,2 (tk )] = 2P where Pf ∈ Rn×n is the covariance of random process Xf (tk ) defined as Pf = E[(Xf (tk )− E[Xf (tk )])(Xf (tk ) − E[Xf (tk )])⊤ ]. 
Given a sample e1,2 (tk ), k = 1, ..., N, and assuming that the time interval is sufficiently long that they are reasonably independent of each other, we can estimate the covariance, Pf , by averaging over time: Pf = N 1 1 X e1,2 (tk )e⊤ 1,2 (tk ) 2 N k=1 (6.9) The two-sample Kalman filter algorithm consists of generating two samples of Xf (tk ) from the dynamics (6.1)-(6.2), utilizing the two samples to compute the forecast error covariance Pf using equation (6.9), and using Pf to determine the gain matrix K with Storm surge forecasting using a two-sample Kalman filter equation (6.5). The value of K is fed back to the analysis equation (6.2) and all these steps are repeated until the estimate of K has converged. The solution of the two-sample Kalman filter algorithm is the value of K at the final iteration. Nevertheless, for practical purposes, one can also use the estimate of K from the intermediate iteration steps. The two-sample Kalman filter algorithm should be started from an initial value of K. Basically, any value of K that stabilizes the overall dynamics can be chosen. One can use the gain matrix already used in an existing data assimilation system based on an optimum interpolation, for example. However, for the applications that we have worked with, we have found that K = 0 is a good initial guess (Sumihar et al. 2008). In this study, we also use K = 0 as initial value. For accordance with our previous work, we call throughout the step with K = 0 as the open-loop step and the following ones as closed-loop steps. This algorithm is similar with the analysis-ensemble method (Fisher 2003), where analyses from different time levels are used to compute the error covariance. The new aspects of the two-sample Kalman filter algorithm are the use of the open-loop estimate and the iterative improvement of the estimated covariance. It also shares similarity with the ensemble Kalman filter (Evensen 2003) in the sense that it uses ensemble of model state realizations for computing error covariance. But the two-sample Kalman filter uses only two samples long in time of the model state and the averaging is performed over time. In fact, this algorithm can be extended for more than two samples. Using more samples, the averaging can be done over a shorter time. When the sample size is big enough, it reduces to the EnKF. The choice of using two samples is set by a practical consideration. Here, we want to investigate the possibility of using two different wind fields to determine error covariance of a storm surge forecast model. The use of two forecast samples is similar to the NMC method (Parish and Derber 1992). The difference is that the NMC method uses the differences between two forecasts at different ranges valid at the same time. In practice, however, the NMC method requires a factor to rescale the estimated error statistics, which is obtained empirically (Yang et al. 2006a). The two-sample Kalman filter removes this requirement, since due to its iterative update, the resulting covariance estimate is already optimal. 6.2.3 System transformation The main source of error in a storm surge forecasting system is the uncertain input wind forcing (Gerritsen et al. 1995). In our study, two wind fields are available. Each wind field is available in a finite dimension in contrast to the real atmospheric system which has infinite dimension. Only variability on spatial scales greater than a few grid cells can be reproduced reliably. 
Moreover, there are also errors due to parameterizations and omitted physics. As a result, there is always uncertainty in the wind field. To account for this uncertainty, we consider the wind field here as deviating from the 'true' wind field by a colored noise process, which is assumed to be governed by an AR(1) process. As we have seen in the previous section, the two-sample Kalman filter algorithm requires the system of interest to be driven by a white noise process. This section describes how we transform the system representation into one driven by white noise.

Suppose we have a dynamic system $\Phi$ driven by an uncertain input forcing $\hat{u}(t_k) \in \mathbb{R}^q$ as follows

$$z(t_{k+1}) = \Phi z(t_k) + F \hat{u}(t_k) \quad (6.10)$$
$$\hat{u}(t_k) = \bar{u}(t_k) + \hat{w}(t_k) \quad (6.11)$$

where $z(t_k) \in \mathbb{R}^p$ denotes the system state at time $t_k$ and $F$ is the $p \times q$ input forcing matrix. The uncertain input forcing $\hat{u}(t_k)$ is modelled as a stochastic process, written as the sum of the true forcing $\bar{u}(t_k) \in \mathbb{R}^q$ and a noise term $\hat{w}(t_k) \in \mathbb{R}^q$. Both $\bar{u}(t_k)$ and $\hat{w}(t_k)$ are unknown. We suppose further that the noise vector $\hat{w}(t_k)$ in equation (6.11) is governed by an AR(1) process

$$\hat{w}(t_k) = A \hat{w}(t_{k-1}) + \hat{\eta}(t_k) \quad (6.12)$$

where $A$ is a time-invariant $q \times q$ diagonal matrix whose elements determine the correlation time, and $\hat{\eta}(t_k) \in \mathbb{R}^q$ is a white noise vector with constant covariance $\hat{Q} \in \mathbb{R}^{q \times q}$.

The objective of the transformation is to obtain another representation of the dynamics (6.10)-(6.11) that is driven by a white noise process. The standard approach to a system driven by colored noise is to augment the state vector $z(t_k)$ with the noise vector $\hat{w}(t_k)$. In this case, however, the approach cannot be used directly, since we only have realizations of $\hat{u}(t_k)$, without access to realizations of $\hat{w}(t_k)$ separately from those of $\hat{u}(t_k)$. In the following, we propose a method to find an approximation of $\hat{w}(t_k)$ by utilizing two samples of $\hat{u}(t_k)$. These, in turn, will be used in the state augmentation procedure.

Suppose we have two independent realizations of the input forcing $\hat{u}(t_k)$, namely $\hat{u}_1(t_k)$ and $\hat{u}_2(t_k)$. We introduce

$$u(t_k) \equiv \tfrac{1}{2}\bigl(\hat{u}_1(t_k) + \hat{u}_2(t_k)\bigr) \quad (6.13)$$
$$\tilde{w}_1(t_k) \equiv \hat{u}_1(t_k) - u(t_k) \quad (6.14)$$
$$\tilde{w}_2(t_k) \equiv \hat{u}_2(t_k) - u(t_k) \quad (6.15)$$

The term $u(t_k)$ represents the average of the two input realizations. The terms $\tilde{w}_1$ and $\tilde{w}_2$, on the other hand, can be shown to evolve in time according to

$$\tilde{w}_1(t_k) = A \tilde{w}_1(t_{k-1}) + \tilde{\eta}_1(t_k) \quad (6.16)$$
$$\tilde{w}_2(t_k) = A \tilde{w}_2(t_{k-1}) + \tilde{\eta}_2(t_k) \quad (6.17)$$

Because of the symmetry with the noise process (6.12), we call $\tilde{w}_1$ and $\tilde{w}_2$ the transformed colored noise and $\tilde{\eta}_1(t_k)$ and $\tilde{\eta}_2(t_k)$ the transformed white noise processes. The transformed white noise processes can be shown to be combinations of the unknown original $\hat{\eta}_1(t_k)$ and $\hat{\eta}_2(t_k)$:

$$\tilde{\eta}_1(t_k) = \tfrac{1}{2}\bigl(\hat{\eta}_1(t_k) - \hat{\eta}_2(t_k)\bigr) \quad (6.18)$$
$$\tilde{\eta}_2(t_k) = \tfrac{1}{2}\bigl(\hat{\eta}_2(t_k) - \hat{\eta}_1(t_k)\bigr) \quad (6.19)$$

Using these new variables, we define the transformed input forcings as

$$\tilde{u}_1(t_k) = u(t_k) + \tilde{w}_1(t_k) \quad (6.20)$$
$$\tilde{u}_2(t_k) = u(t_k) + \tilde{w}_2(t_k) \quad (6.21)$$

Notice the symmetry between the transformed input forcings and the original one (6.11). We note here that with this transformation we actually retain the original input forcings.
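Before turning to the state augmentation, a minimal NumPy sketch of this transformation may be helpful. The forcing arrays and AR(1) coefficients below are placeholders invented for illustration, not actual HIRLAM or UKMO data, and the code is not part of the thesis software.

```python
import numpy as np

rng = np.random.default_rng(5)
N, q = 1000, 4                       # number of time steps, forcing dimension (illustrative)
u1 = rng.standard_normal((N, q))     # placeholder for e.g. the HIRLAM-driven forcing
u2 = rng.standard_normal((N, q))     # placeholder for e.g. the UKMO-driven forcing
A = np.diag(np.full(q, 0.9))         # diagonal AR(1) matrix of Eq. (6.12), assumed known

u_bar = 0.5 * (u1 + u2)              # Eq. (6.13): averaged forcing, used as deterministic input
w1 = u1 - u_bar                      # Eq. (6.14): transformed colored noise, sample 1
w2 = u2 - u_bar                      # Eq. (6.15): transformed colored noise, sample 2

# Eqs. (6.16)-(6.17): recover the transformed white-noise increments
eta1 = w1[1:] - w1[:-1] @ A.T        # eta_1(t_k) = w_1(t_k) - A w_1(t_{k-1})
eta2 = w2[1:] - w2[:-1] @ A.T

# (u_bar, eta1) and (u_bar, eta2) can then drive the augmented system as two
# realizations for the two-sample Kalman filter.
print(np.allclose(w1, -w2))          # the two transformed noises are mirror images
```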
But now we have the components u(tk ) and w(t ˜ k ) are available sepSince with this transformation the realizations of u(tk ) and w(t arately, we can now use them in a state augmentation procedure. Denoting x(tk ) ≡ 0 Φ F F z(tk ) , η(tk ) ≡ ,M ≡ , and B ≡ , we can rewrite the ˜ k) w(t η˜(tk ) 0 A 0 dynamical system (6.10)-(6.11) as x(tk+1 ) = Mx(tk ) + Bu(tk ) + η(tk ) (6.22) Equation (6.22) is the transformed dynamical system representation, which is driven by white noise process. It is now similar to the system representation (6.1), on which the two-sample Kalman filter is based. 6.2.4 Summary The two-sample Kalman filter algorithm is used in this study in combination with the system transformation described previously. In short, here we describe the two-sample Kalman filter algorithm in combination with the system transformation above on a stepby-step basis: • Initialization: – compute and store input forcing average using equation (6.13): u(tk ), k = 1, ..., N – compute and store the approximate white noise processes using equation (6.14) and (6.16) as well as equation (6.15) and (6.17): η˜1 (tk ) and η˜2 (tk ), k = 1, ..., N – specify initial guess of Kalman gain K • Closed-loop step: – insert K into equation (6.2) – generate two realizations of Xf (tk ) of dynamics (6.1) using u(tk ), η˜1 (tk ) and η˜2 (tk ): xf1 (tk ) and xf2 (tk ), k = 1, ..., N – estimate Pf using equation (6.9) – estimate K using equation (6.5) 78 6.3 Experiment with a simple wave model – repeat closed-loop step until Pf , and therefore also K, have converged It is important to note here that under the transformation explained in Section (6.2??), we retain the original open-loop realizations. The closed-loop realizations, however, will be different from the original ones. Nevertheless, it can be shown that the difference between the two transformed samples are the same as the difference between the original ones. Therefore, using the two-sample Kalman filter, we will obtain the same estimate of forecast error covariance estimate and hence, the same Kalman gain estimate. 6.3 Experiment with a simple wave model Before we apply our method to the DCSM, we first tested if the combination between the sample transformation and two-sample Kalman filter will produce a correct solution. This check is performed by using twin experiment with a simple one-dimensional wave model. It is a simplified version of the De St. Venant equation, which can be used for modelling a tidal flow in a long and narrow estuary. The governing equations of the one-dimensional wave is as follows: ∂u ∂ξ +D =0 (6.23) ∂t ∂x ∂u ∂ξ cu +g + =0 (6.24) ∂t ∂x D ξ(x = 0, t) = ξb (t) (6.25) ξ(x, t = 0) = 0 (6.26) u(x, t = 0) = 0 (6.27) u(x = L, t) = 0 (6.28) where ξ(x, t) is the water level above the reference plane, u(x, t) is the average current velocity over a cross section, t is the time, x is the position along the estuary, D is the water depth, g is the gravitational acceleration and c is a friction constant. For this study, the system of equations (6.23)-(6.28) is solved numerically using a Crank-Nicholson scheme with implicity factor θ = 0.4 and the discretization in space is based on a staggered grid. In this experiment, the following values for the parameters are used: D = 10 meter, c = 0.0002s−1, L=60 km, time step ∆t=1 minutes, and number of grid points N = 80 for the spatial discretization, hence ∆x = NL−1 . 
6.3 Experiment with a simple wave model

Before we apply our method to the DCSM, we first test whether the combination of the sample transformation and the two-sample Kalman filter produces a correct solution. This check is performed with a twin experiment using a simple one-dimensional wave model. The model is a simplified version of the de St. Venant equations, which can be used for modelling tidal flow in a long and narrow estuary. The governing equations of the one-dimensional wave model are as follows:

$$ \frac{\partial \xi}{\partial t} + D \frac{\partial u}{\partial x} = 0 \qquad (6.23) $$
$$ \frac{\partial u}{\partial t} + g \frac{\partial \xi}{\partial x} + \frac{c u}{D} = 0 \qquad (6.24) $$
$$ \xi(x = 0, t) = \xi_b(t) \qquad (6.25) $$
$$ \xi(x, t = 0) = 0 \qquad (6.26) $$
$$ u(x, t = 0) = 0 \qquad (6.27) $$
$$ u(x = L, t) = 0 \qquad (6.28) $$

where $\xi(x, t)$ is the water level above the reference plane, $u(x, t)$ is the current velocity averaged over a cross section, $t$ is the time, $x$ is the position along the estuary, $D$ is the water depth, $g$ is the gravitational acceleration and $c$ is a friction constant. For this study, the system of equations (6.23)-(6.28) is solved numerically using a Crank-Nicolson-type scheme with implicitness factor $\theta = 0.4$, and the spatial discretization is based on a staggered grid. In this experiment, the following parameter values are used: $D = 10$ m, $c = 0.0002\ \mathrm{s}^{-1}$, $L = 60$ km, time step $\Delta t = 1$ minute, and $N = 80$ grid points for the spatial discretization, hence $\Delta x = L/(N-1)$. Moreover, at the sea-estuary boundary we assume that the water level is periodic according to $\xi_b(t_k) = 0.5 \sin(2\pi t_k / T)$, where $T = 3$ hours.

The uncertainty of the model is assumed to be due to the inaccuracy of the prescribed water level at the sea-estuary boundary. An additive colored noise, modelled as an AR(1) process, is used to represent this uncertainty:

$$ w_b(t_k) = \alpha_b w_b(t_{k-1}) + \epsilon(t_k) \qquad (6.29) $$

where $\alpha_b$ determines the correlation time, which in this experiment is set to 2 hours, and $\epsilon$ is a Gaussian white noise process with standard deviation 20 cm.

The experiment set-up is as follows. An artificial measurement of the water level is generated at each time step at location $x = 0.3L$. This is done by running the wave model and adding to the simulated water level at this location, at each time step, a random number drawn independently from a normal distribution with zero mean and standard deviation 2 cm, to simulate observation noise. The artificial observations are generated for a 600 h simulation period.

The discretized version of the system of equations above can be written in different ways. For our purpose, two model representations are considered. In the first representation, the noise process $w_b$ is available separately and is augmented to the state vector. In the second representation, the noise process $w_b$ is considered to be added directly to the input forcing $\xi_b$, so the two are not available separately. The first representation is driven by white noise, the second by colored noise. The two-sample Kalman filter algorithm is then applied to each of these model representations. At each iteration, two independent samples of each representation are generated for the same simulation period. Since in the first representation the model is driven by white noise, we can apply the two-sample Kalman filter algorithm directly; this provides the 'true' solution. We call the samples of the first representation the original samples. To the samples of the second representation we apply the system transformation described in Section 6.2.3 before implementing the two-sample Kalman filter; these we call the transformed samples. In this experiment, both the original and the transformed samples are generated using the same noise process realizations. Therefore, the solutions based on the original and the transformed samples should be equal if the transformation is correct.
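As a small illustration of the stochastic ingredients of this twin experiment (a sketch under stated assumptions: the relation between α_b and the 2 h correlation time is taken as an exponential decay per time step, which the text does not specify, and all names are hypothetical):

```python
import numpy as np

def ar1_noise(n_steps, dt, corr_time, sigma, rng):
    """Additive AR(1) boundary noise, equation (6.29): w[k] = a*w[k-1] + eps[k]."""
    a = np.exp(-dt / corr_time)            # assumed link between alpha_b and the 2 h correlation time
    eps = rng.normal(0.0, sigma, n_steps)  # Gaussian white noise, std 20 cm
    w = np.zeros(n_steps)
    for k in range(1, n_steps):
        w[k] = a * w[k - 1] + eps[k]
    return w

rng = np.random.default_rng(1)
dt = 60.0                                  # 1 minute time step [s]
n_steps = 600 * 60                         # 600 h simulation period
t = dt * np.arange(n_steps)
xi_b = 0.5 * np.sin(2 * np.pi * t / (3 * 3600.0))              # boundary forcing, T = 3 h
xi_b_noisy = xi_b + ar1_noise(n_steps, dt, 2 * 3600.0, 0.20, rng)
# The artificial observation at x = 0.3 L would be the simulated water level
# there plus N(0, 0.02 m) noise; the wave-model run itself is omitted here.
```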
The gain matrices computed in this experiment are depicted in Figure 6.1. The panels on the left-hand side show the gain matrices associated with the water level, while those on the right-hand side show the ones associated with the depth-averaged current velocity. Visual inspection of these panels indicates that at each iteration step the gain matrices obtained using the original and the transformed samples are very similar. The two-sample Kalman filter uses the difference between the two samples of each representation. Since the model is linear, the difference between the transformed samples results in the same error propagation model as obtained from the original samples. Hence, the solutions computed using the transformed and the original samples are the same. Note that the gain estimates resulting from the open-loop step using the transformed samples are exactly equal to the ones obtained using the original samples. However, there is a minor difference in the closed-loop solutions, which may be due to rounding errors and the fact that we used different realizations of the observation noise for the samples. This result confirms that the transformation is correct.

To complete the presentation, we compare the gain estimated using our algorithm to the one obtained using the RRSQRT Kalman filter (Verlaan and Heemink, 1997), as shown in Figure 6.1e. This gain is obtained by running the RRSQRT Kalman filter until it reaches a steady state. The RRSQRT Kalman filter in this experiment uses 50 modes, which is sufficient to estimate the steady-state Kalman gain accurately.

Figure 6.1: Estimated gain for the water level ξ (left-hand panels) and the velocity u (right-hand panels): (a) open-loop, (b)-(d) closed-loop estimates, 1st-3rd iteration steps, (e) RRSQRT with 50 modes. The full line represents the gain computed using the transformed samples, the dashed line the gain computed using the original samples, and the vertical line shows the location of the assimilation station.

It can be seen that the solutions of the two-sample Kalman filter converge to the ones obtained with the RRSQRT Kalman filter. This shows that for this model, using the transformed samples and the algorithm proposed in this paper, we can reproduce the optimal gain estimated by the steady-state Kalman filter.

6.4 Experiments with the Dutch Continental Shelf Model

6.4.1 Introduction

In this section we present and discuss some experiments using the DCSM (Gerritsen et al. 1995). We first briefly describe the DCSM and the two wind fields used in this study. The exposition continues with some estimates of the wind noise statistics, after which the solution of the two-sample Kalman filter is given. We implemented the resulting Kalman gain estimates in a data assimilation system and evaluated the performance of this system. Since the ultimate goal of the DCSM is to make storm surge forecasts, we also show and discuss the forecast performance of the DCSM in this section. Finally, a comparison is made between the performance of our system and that of a system based on the assumption that the system noise is isotropic.

6.4.2 The Dutch Continental Shelf Model

The DCSM is a large-scale water level and storm surge forecasting model that covers the North West European continental shelf, including the British Isles (Figure 3.1). It was developed as a joint activity of the National Institute of Coastal and Marine Management (RIKZ), Delft Hydraulics and KNMI. The model is used to produce six-hourly surge forecasts in an automatic production line at KNMI.

The error in the DCSM is assumed to come from the uncertain wind input. To account for this, an error term should be added to the governing equations. There are three possible ways of doing this: an error term may be added to the wind velocity, to the wind stress, or to the depth-averaged current. The operational system represents this uncertainty as an error term added to the depth-averaged current. The temporal correlation of the wind noise is accounted for by using an AR(1) model.
Moreover, the noise is assumed to be isotropic, that is, the spatial correlation length is independent of direction. In this study, an AR(1) model is also used to represent the time correlation of the wind noise. However, we model the uncertain wind input as an error term added to the wind stress. Moreover, we do not assume anything about the structure of the spatial correlation. We do assume, however, that the difference between the two wind fields described below is characteristic of the true error. This assumption is validated in the next subsection.

6.4.3 Wind stress error statistics

The operational DCSM uses the forecasts of the Royal Netherlands Meteorological Institute (KNMI) high-resolution limited area model (HIRLAM) as input forcing. The HIRLAM system is a complete numerical weather prediction system, including data assimilation with analysis of conventional and non-conventional observations and a limited area forecasting model with a comprehensive set of physical parameterizations. The system is owned and developed jointly by the national meteorological services of Denmark, Finland, Iceland, Ireland, the Netherlands, Norway, Spain and Sweden. The complete scientific documentation of the HIRLAM system can be downloaded from http://hirlam.org. The Dutch version of HIRLAM used in this study covers the area between 12°W and 13°E and between 48°N and 62.33°N. This wind product was used to calibrate the DCSM (Gerritsen et al. 1995); in fact, it is a dedicated version of HIRLAM used for driving the DCSM.

Another wind field available through KNMI for our study as input to the DCSM is the one produced by the UK Met Office (UKMO) North Atlantic and European model. This is an operational forecast model for the 6- to 48-hour range. It covers the area between 70°W and 35°E and between 32.5°N and 75°N, with a grid size of 0.11° × 0.11°. The wind data are projected onto the DCSM area using bilinear interpolation.

In this study we used the wind fields from both sources and applied them to the algorithm explained in Section 6.2.2. Both wind fields are available for the period of 01 November 2004 until 28 January 2005 with a time step of three hours. An inspection of the two wind fields indicates that the magnitude of the UKMO wind stress tends to be smaller than that of HIRLAM. Figure 6.2 illustrates this: the left-hand panel shows a typical scatter plot of the wind stress of the two wind fields at a location above the sea area of the DCSM. This scatter plot clearly shows that there is a systematic difference between the UKMO and HIRLAM wind stresses.

Figure 6.2: Scatter plot of HiRLAM versus UKMO wind stress at Hoek van Holland, (left) original and (right) after correction. The full line shows the best fit to the dots, while the dashed line gives the reference of perfect positive correlation.

Before the two wind fields can be used in the experiment, we need to apply a correction that reduces the systematic difference between them. In our study, this is done by adjusting the UKMO wind stress to the same scale as that of HIRLAM, because the HIRLAM wind field was used in the calibration of the DCSM and is therefore expected to give better results. The adjustment of the UKMO wind stress is performed using linear regression, and the correction is applied to the wind stress data at all grid points in the wet area of the DCSM.
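A minimal sketch of this kind of scale adjustment (the thesis does not give the exact regression set-up; the per-grid-point, zero-intercept least-squares fit of HIRLAM on UKMO used below is an assumption, and the names are hypothetical):

```python
import numpy as np

def rescale_to_reference(tau_ukmo, tau_hirlam):
    """Scale one wind-stress component onto the scale of a reference product.

    tau_ukmo, tau_hirlam : arrays of shape [n_times, n_points] for one stress
    component. A least-squares slope is fitted per grid point, with no
    intercept (an assumption), and used to correct the UKMO series."""
    num = np.sum(tau_hirlam * tau_ukmo, axis=0)
    den = np.sum(tau_ukmo**2, axis=0)
    slope = num / den                       # slope of the best-fit line per grid point
    return tau_ukmo * slope, slope

# usage (hypothetical): tau_u_corrected, slope_u = rescale_to_reference(tau_u_ukmo, tau_u_hirlam)
```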
The right-hand panel of Figure 6.2 shows the scatter plot of the wind stress at the same location after the correction. Figure 6.3 presents the contours of the slope of the line of best fit used for correcting the UKMO wind stresses. The contours of the slope, in both the zonal and the meridional direction, show that the largest slopes and the steepest transitions occur around the land-sea boundaries. This may be due to the difference in the schemes used for interpolating the wind fields onto the grid points of the DCSM. The UKMO wind fields are available on a coarser grid, and in our study the interpolation is done without taking the land-sea difference into consideration. For the HIRLAM wind fields, on the other hand, the interpolation near the coasts has been modified to use only sea points of the atmospheric model for sea points in the storm surge model (Gerritsen et al. 1995; Verlaan et al. 2005).

Figure 6.3: Contours of the slope used for correcting the UKMO wind stresses, for (a) the u-direction and (b) the v-direction. Contour interval is 0.2.

In this study, each of the two wind fields is assumed to deviate from the true one by a time-correlated noise term as in equation (6.11). Using the HIRLAM and the corrected UKMO wind fields, the average wind field is computed using equation (6.13) and the approximate colored noise realizations are computed using equations (6.14) and (6.15). Figure 6.4 presents some typical autocorrelation plots of the colored noise obtained in this experiment. By fitting a theoretical exponential autocorrelation function, the correlation time of the noise is found to be about 5 hours.

Figure 6.4: Autocorrelation function of the wind noise at Hoek van Holland, (a) u-direction and (b) v-direction. The full line is the empirical autocorrelation function obtained from the difference between UKMO and HiRLAM, while the dashed line is the autocorrelation function of a theoretical AR(1) process with a correlation time of about 5 hours.
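A minimal sketch of how such a correlation time can be estimated from a noise sample (the fitting procedure in the thesis is not specified; a least-squares fit to the logarithm of the empirical autocorrelation is one plausible choice, and the names are hypothetical):

```python
import numpy as np

def correlation_time(w, dt, max_lag):
    """Estimate the e-folding correlation time of a noise series w by
    fitting exp(-lag/tau) to its empirical autocorrelation function."""
    w = w - w.mean()
    var = np.mean(w * w)
    lags = np.arange(1, max_lag + 1)
    acf = np.array([np.mean(w[:-l] * w[l:]) / var for l in lags])
    ok = acf > 0                        # keep positive values so the logarithm is defined
    slope = np.polyfit(lags[ok] * dt, np.log(acf[ok]), 1)[0]
    return -1.0 / slope

# usage (hypothetical): tau = correlation_time(w_tilde_at_hoek, dt=3 * 3600.0, max_lag=16)
```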
Figures 6.5a-6.5b show the estimates of the spatial correlation of the wind stress error in the zonal and meridional directions, respectively, associated with the point located at Hoek van Holland. The spatial correlation varies smoothly over the model domain. Moreover, the structure has a local character: the correlation is highest at the reference point and decreases with distance from it. We also observe that the correlation profile in the zonal direction tends to stretch in the east-west direction, while that in the meridional direction spreads more in the south-north direction. This shows that the noise has different characteristics from the ones used in the operational system, where the spatial correlation is assumed to be isotropic. Figures 6.5c-6.5d show the estimates of the spatial cross-correlation of the wind stress error between the two directions, with respect to the error at Hoek van Holland. The figures show that the cross-correlations are effectively zero everywhere. This result supports the assumption that the wind stress errors in the two directions are uncorrelated with each other.

Figure 6.5: Spatial correlation of the wind noise associated with Hoek van Holland, (a) u-direction and (b) v-direction, and spatial cross-correlation (c) vu_ref and (d) uv_ref. Contour interval is 0.1.

We have also tried to validate the correlation structure against observations. A set of observed 10 m wind speed data is available in our study for this purpose; Figure 6.6 shows the locations of the wind observing stations.

Figure 6.6: Locations of the wind observing stations: 1. K13, 2. Meetpost Noordwijk, 3. Huibertgat, 4. Cadzand, 5. Vlissingen, 6. Oosterschelde, 7. Vlakte van de Raan, 8. Schaar, 9. L.E. Goeree, 10. Europlatform, 11. Hoek van Holland.

The wind data are available for the period of November 2004 until January 2005, with a time interval of 1 h. To match the UKMO and HIRLAM wind data, which are available with a time interval of 3 h, we smooth the observed wind data using a running average with a time window of 6 h. In performing the validation, we assume that the observed data are generated from the same distribution as the UKMO and HIRLAM wind data. Therefore, we can compute the observed correlation in the same way as the correlation from the UKMO-HIRLAM difference. An ideal validation of the wind error correlation would be to compare the estimated correlation structure to one obtained from observations with the techniques described by e.g. Hollingsworth and Lönnberg (1986), Daley (1991) and Xu and Wei (2001). However, this requires observation data from many different locations, and the fact that we only have observed wind data from a few locations prevents us from using these techniques. Therefore, we set the objective of the validation here as checking whether the resulting correlation structure is better supported by the available data than the one based on the isotropic assumption used in the operational system.
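The correlation estimates discussed in this subsection are simple time-averaged statistics of the noise samples. A minimal sketch (hypothetical names, assuming one noise component is stored as an array of shape [n_times, n_points] over the sea points):

```python
import numpy as np

def spatial_correlation(w, ref_index):
    """Correlation of the noise at every grid point with the noise at a
    reference point, estimated by averaging over time.

    w : array [n_times, n_points] of one noise component, e.g. half of the
        UKMO-HIRLAM wind stress difference at the sea points."""
    w = w - w.mean(axis=0)
    ref = w[:, ref_index]
    cov = w.T @ ref / w.shape[0]
    return cov / (w.std(axis=0) * ref.std())

# usage (hypothetical): corr_u = spatial_correlation(w_tilde_u, ref_index=idx_hoek_van_holland)
```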
As pointed out earlier, unlike the correlation based on the isotropic assumption, the spatial correlation structure estimated from the UKMO-HIRLAM difference in the zonal direction differs from the one in the meridional direction. In this validation, we would like to check whether the available data support this. To produce a clear illustration, we should choose a reference station whose location relative to the other stations is such that the error correlation in one direction is significantly different from that in the other direction. Based on this idea, we select station K13 as the reference station for computing the correlation of the wind stress error at the other locations. Figure 6.7 shows the spatial correlation estimates computed from the observation-HIRLAM difference, from the UKMO-HIRLAM difference, and from a system based on the isotropic assumption. For the isotropic system, the correlation length is set to 19 computational grid cells. In these pictures, the correlation of the wind stress error is represented as a vector composed of the correlations in the eastward and northward directions.

Figure 6.7: Correlation of the wind stress error with respect to the wind stress error at station K13: (a) uu+vv spatial auto-correlation, (b) zoomed version of (a), (c) vu_ref + uv_ref spatial cross-correlation, (d) zoomed version of (c). Each vector represents the resultant of the correlation vectors in the u- and v-directions, where the u component is defined as the eastward direction and the v component as the northward direction. The north-eastward vector near the bottom-right corner of each panel shows the vector of perfect correlation, i.e. correlation equal to one in both the u- and v-directions.

Figures 6.7a-6.7b show the spatial auto-correlation of the wind stress error. The figure clearly shows that, at most locations, the observed correlation in the meridional direction is larger than in the zonal one. An exception is the correlation with the error at station Huibertgat, where the correlation in the zonal direction is larger than in the meridional one. This may be due to the location of Huibertgat relative to K13. Nevertheless, the figures show that the spatial correlation of the wind stress error is indeed direction dependent. As can be seen from the figures, this feature is better represented by the correlation structure computed from the UKMO-HIRLAM difference than by the one based on the isotropic assumption.

Figures 6.7c-6.7d show the spatial cross-correlation of the wind stress error. Here we show only the ones computed from the observation-HIRLAM and UKMO-HIRLAM differences; the cross-correlation based on the isotropic assumption is equal to zero. From these figures we see that the observed correlations are small, although they are non-zero at most of the locations. Note that at the reference location (K13), the cross-correlation is very close to zero.
The same characteristic can be observed for the cross-correlations computed from the UKMO-HIRLAM difference, but with a relatively smaller magnitude. We do not yet understand why the cross-correlation at the other locations is non-zero. It may be due to the fact that we use a limited sample size to estimate a small correlation. Nevertheless, since the cross-correlations are small in practice, although non-zero, we expect that they will not have a deteriorating effect on the resulting analysis system, which is based on the assumption that the cross-correlation of the wind stress error is zero.

6.4.4 Water level observation data

The observation data available for the data assimilation experiments in this study are the observed water levels obtained from sixteen observing stations located along the British and Dutch coasts, as shown in Figure 3.1. Eight of these stations are used as assimilation stations, i.e. stations whose observed data are assimilated into the DCSM: five stations are located along the east coast of the UK and three along the Dutch coast. These are the same observation stations as used in the operational data assimilation system. The other stations are used as validation stations, where the observed data are used only for validation of the model results and not for data assimilation. The water level at all locations is measured every 10 minutes. The observed water level data from all assimilation stations are used both for computing the steady-state Kalman gain with the two-sample Kalman filter and for the data assimilation experiments.

The water level data contain two components: surge and astronomical tide. In the operational system, the astronomical tide is predicted using harmonic analysis at each observation location, using hundreds of tidal constituents. This harmonic prediction of the astronomical tide is more accurate than the tide computed by the DCSM. Since the data assimilation is intended to correct the storm surge error, in the operational system the astronomical tide component of the observed data is replaced by the one predicted by the DCSM, obtained by running the DCSM without any wind input forcing (Gerritsen et al. 1995). In this study, this adjusted observation is used.

6.4.5 Computing the Kalman gain using the two-sample Kalman filter

The DCSM and the wind data described previously have been used to compute the estimate of the Kalman gain using the approach described in Section 6.2. The experiment set-up in this study is as follows. At each iteration step, the model is run twice for the same simulation period. In each run, the DCSM is driven by one of the two wind fields. At each time step, independent random numbers drawn from a Gaussian white noise process with standard deviation σ_obs = 10 cm are added to the observed water level at each assimilation station to represent observation noise. The simulation is run for the period of 01 November 2004 until 28 January 2005. To eliminate the spin-up effect, the first four days of the simulation results are excluded when computing the covariance.

Figure 6.8 presents the contour plots of the water-level gain at each iteration step associated with the assimilation station Hoek van Holland. The pictures show that the estimated gain profile is smooth and confined to a limited area around the assimilation station. Near the assimilation station the gain is largest, and it decreases with distance from the assimilation station.
Moreover, the gain at locations far from the assimilation station is practically zero. This means that the correction associated with a given assimilation station is propagated only to nearby locations and that no correction is imposed at locations further away from it. The pictures also show that, before convergence, the gain computed at each iteration step becomes smoother and more localized than the one computed at the previous step. The open-loop gain has the largest magnitude of all. In the open-loop run, the difference between the two samples is largest. The inclusion of observation data in the closed-loop runs, on the other hand, brings both samples closer to the data and hence to each other. This in turn reduces the covariance of the difference between the two samples, and the gain becomes smaller. The gain profile does not change significantly from the third closed-loop step onwards, which indicates that the filter converges.

The structure of the Kalman gain with respect to the other variables is more complex than that for the water level (not shown). This is because the other variables are related nonlinearly to the water level at the assimilation stations through the nonlinear model dynamics. We observed that the elements of the Kalman gain associated with the flow velocities u and v are confined to an area around the respective assimilation stations that is more limited than for the water level. The biggest change in the Kalman gain estimate occurs from the open-loop to the first closed-loop iteration. In the subsequent iterations, only slight differences in the solution can be noticed: the structure of the gain remains the same from one iteration to the next, and only the magnitude changes.

The elements of the Kalman gain estimates associated with the noise term also have a smooth structure around the respective assimilation stations (not shown). From the first closed-loop iteration onwards, the structure of these elements remains the same, but their magnitude increases slightly. This indicates that, to produce better model results, more correction of the erroneous wind input is needed. The magnitude of the increment itself decreases with iteration, indicating that the iteration converges from below. However, the gain also shows significantly non-zero values at locations further away from the assimilation stations.

Figure 6.8: Water-level gain for assimilation station Hoek van Holland: (a) open-loop estimate, (b)-(f) closed-loop estimates, 1st-5th iteration steps. Contour interval is 0.01.
It is not clear whether this is the true structure or a spurious correlation due to sampling error, which is a common problem in ensemble-based data assimilation systems. Some occurrences may be related to spurious correlation, for example the anomalously large values around the land-sea boundaries. Despite this, no indication has been found in this experiment that the iteration will blow up. Moreover, our previous work (Sumihar et al. 2008), which used the same sample length as used here, showed no indication of any serious spurious correlation problem. Hence, for this study we retain the estimates of the Kalman gain as they are and use them in the subsequent data assimilation and forecast experiments.

6.4.6 Hindcast experiment

We have computed the estimates of the Kalman gain using the two-sample Kalman filter. We now proceed with implementing the Kalman gain for data assimilation. For this experiment, we use the HiRLAM wind fields as input forcing to the DCSM. As a first test of the performance of the data assimilation system, we use the water level data from the same period as was used for computing the Kalman gain. The observation data are assimilated every time step (10 min). For this study, the only objective information available for evaluation is the observed water-level data. The test is performed by comparing the resulting analyses to the observed data at all available locations. The performance of the data assimilation system is evaluated in terms of the rms of the water level residual computed over the whole period of the experiment.

The results for the assimilation stations are shown in Figure 6.9. This figure shows the rms of the water-level residual of the analyses obtained by data assimilation using the Kalman gain estimated at each iteration. Also shown are the results of the deterministic run, obtained without data assimilation. Here we see that the data assimilation system has successfully brought the model closer to the observed data: the data assimilation has reduced the water-level residual by approximately 40%-70%, depending on the location.

Figure 6.9: RMS analysis residual of the water level at the assimilation stations, with σ_obs = 0.10 m.

For most of the stations, the rms water-level residual obtained by data assimilation using the open-loop gain is found to be the smallest. This is because in the open-loop computation the two realizations of the storm surge system are still relatively far apart. Therefore, the variance of their difference is large, and so are the values of the corresponding gain matrix elements. In the closed-loop computations, on the other hand, observations are assimilated, so the two realizations become similar and the variance decreases. As a consequence, the open-loop gain pulls the model towards the observations more strongly than the closed-loop gains do. The rms increases in the subsequent iterations and converges after the 3rd closed-loop step.

Figure 6.10 shows the results for the validation stations. Here we see that the data assimilation has steered the model closer to the observed data, even at the locations where the data are not used in the assimilation. At these locations, the data assimilation has reduced the water-level residual by approximately 30%-70%.

Figure 6.10: RMS analysis residual of the water level at the validation stations, with σ_obs = 0.10 m.
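The evaluation metric used here is straightforward; a minimal sketch (hypothetical names) of the rms residual and the relative reduction with respect to the deterministic run:

```python
import numpy as np

def rms_residual(h_model, h_obs):
    """Root-mean-square water-level residual over the experiment period."""
    return np.sqrt(np.mean((h_model - h_obs) ** 2))

def reduction(h_deterministic, h_analysis, h_obs):
    """Relative reduction of the rms residual achieved by data assimilation."""
    return 1.0 - rms_residual(h_analysis, h_obs) / rms_residual(h_deterministic, h_obs)

# usage (hypothetical): print(f"reduction at Hoek van Holland: {reduction(h_det, h_ana, h_obs):.0%}")
```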
The fact that the data assimilation improves the model results at the validation stations shows that the correction imposed by the steady-state Kalman filter is not limited to the assimilation stations but also reaches other locations. This may indicate that the forecast error covariance matrix computed here has a consistent structure, so that it propagates the correction properly to the other parts of the model area.

6.4.7 Forecast experiment

The DCSM is intended for forecasting storm surges. It is therefore important to test the performance of the data assimilation system by checking the resulting forecast quality. In fact, the forecast residual is a more important measure of the quality of the data assimilation system than the fit to the data in the analysis itself: it is a direct measure against independent observation data. Here, the data assimilation system is used to initialize the DCSM before generating a forecast. The model is initialized by driving the DCSM with HIRLAM analysis winds and assimilating the water level data continuously, every time step, over the whole period of the experiment. The forecast experiment is performed using HIRLAM forecast winds from the period of September 2007 until January 2008. HIRLAM forecast winds are available every 6 hours and each forecast extends over a period of 48 hours. Accounting for some missing wind data, we have 600 forecast runs in total.

Figure 6.11 shows the rms of the forecast water-level residual versus forecast time for the observing station located at Hoek van Holland. The results shown here are obtained by applying a 6 h running average to the time series of the rms residual. In this figure, we show the forecast performance initialized by analyses using the Kalman gain obtained from each iteration of the two-sample Kalman filter. For comparison, we also show the deterministic forecast, which is obtained without data assimilation.

Figure 6.11: Forecast performance at Hoek van Holland.

The forecast performance in general decreases with forecast time. However, a clear improvement due to data assimilation can be seen for forecast horizons up to 15 hours. The performance becomes worse than the deterministic forecast at forecast times of 15-24 hours, and with data assimilation it is expected to converge to the deterministic forecast if the forecast time is extended further. The degrading performance after 15 hours can be explained as follows. The data assimilation system is supposed to correct the error due to the uncertain wind input. However, there is also another source of error, namely the surge entering from outside the model area. The data assimilation may have incorrectly updated the model state in this area, where there is no assimilation station. When this incorrectly updated wave arrives at the forecast station, it degrades the forecast performance (Gerritsen et al. 1995). After some time, however, the effect of the initial condition vanishes and the performance is expected to converge to the deterministic one.
We also observe in Figure 6.11 that the forecast performance improves from one iteration to the next. With the open-loop gain, although the rms residual at the initial condition is smaller than with the gains from later iterations, the forecast performance deteriorates quickly with forecast time. As a result, the running average of the rms residual produces larger values than for the later iterations. The largest improvement occurs in the first two iterations; after the 2nd closed-loop iteration the differences are no longer significant.

Figure 6.12 shows the forecast performance at one of the validation stations, where the data are not used for assimilation. In general it shows a pattern similar to the one at Hoek van Holland. This confirms once again that data assimilation using the approach above has successfully improved the short-term forecasts of the DCSM, at both assimilation and validation stations.

Figure 6.12: Forecast performance at IJmuiden.

The time interval over which the forecast performance is improved over the deterministic forecast is limited by the physics of wave propagation in the model area and depends on the location. In principle, it is determined by the time required for a gravity wave to travel from the open ocean to the location of interest. For the stations along the Dutch coast, this travel time is about 12 hours. In the operational system, in practice, one uses the forecast with data assimilation up to 15 hours and the deterministic forecast afterwards (Gerritsen et al. 1995). It should be noted that this limiting characteristic time is not prohibitive for operational purposes. The forecast system is used, for example, to give accurate forecasts at least 6 hours ahead for the proper closure of the moveable storm surge barriers in the Eastern Scheldt and the New Waterway. It is also used as an aid in deciding whether a warning should be issued; warnings are issued at least 6 hours ahead to provide time for the necessary preparations (Verlaan et al. 2005).

6.4.8 Comparison with an isotropic-assumption-based steady-state Kalman filter

In this section, we compare the performance of the data assimilation system above (hereafter UKMHIR) to one based on an isotropic assumption (hereafter ISO). For a fair comparison, the statistics of the model error for the ISO system are set to resemble those obtained from the HIRLAM-UKMO wind stress difference, except that the spatial correlation of the error is now set to be isotropic. Here, the spatial correlation length is set to 19 computational grid cells (about 150 km) in both the zonal and the meridional direction. The standard deviation of the error is set to be spatially homogeneous and equal to the spatial average of the standard deviations obtained from the HiRLAM-UKMO wind field difference. Like the UKMHIR system, the steady-state Kalman gain for the ISO system is obtained using the two-sample Kalman filter. At each iteration, the DCSM is run twice for the same period, and in each run the model is perturbed by independent random numbers drawn from a normal distribution with the statistics specified above. The random field is generated using a covariance matrix decomposition technique (e.g. Fenton and Griffiths 2007).
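A minimal sketch of generating such a spatially correlated, isotropic perturbation field by covariance matrix decomposition (a Cholesky factor is one instance of such a decomposition; the Gaussian-shaped correlation function used below is an assumption for illustration only, and the names are hypothetical):

```python
import numpy as np

def isotropic_field_sampler(coords, corr_length, sigma, rng):
    """Return a function that draws random fields with an isotropic
    correlation structure, using a Cholesky factor of the covariance."""
    # pairwise distances between grid points; coords has shape [n_points, 2]
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    corr = np.exp(-0.5 * (d / corr_length) ** 2)          # assumed Gaussian correlation shape
    cov = sigma**2 * corr + 1e-8 * np.eye(len(coords))    # small jitter for numerical stability
    L = np.linalg.cholesky(cov)
    return lambda: L @ rng.standard_normal(len(coords))

rng = np.random.default_rng(2)
xy = np.stack(np.meshgrid(np.arange(20.0), np.arange(20.0)), axis=-1).reshape(-1, 2)
draw = isotropic_field_sampler(xy, corr_length=3.0, sigma=0.1, rng=rng)
field = draw()   # one realization on a (coarse) grid
```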
Due to computational resource limitations, the random field is generated on a coarser grid, with a grid size five times that of the computational grid. A test was performed by estimating the spatial correlation from the resulting random field, confirming that it has the desired statistics. Note that in the operational system, the random field is defined on a coarser grid that is 28 times as large as the computational grid. To simulate the observation error, independent random numbers are added to the observed water levels with the same statistics as used in the UKMHIR system. Like the previous experiment, the two-sample Kalman filter is run until the fifth closed-loop step, and the iteration is found to converge after the third loop. For this comparison, the gain estimates obtained at the fifth closed-loop iteration of both systems are implemented for data assimilation.

The comparison is performed by evaluating the forecast quality of the DCSM, initialized by the UKMHIR and ISO data assimilation systems. The evaluation uses observed data from the same period as in the experiment described in Section 6.4.7. In this way, we ensure that the data used for the comparison are independent from those used to compute the Kalman gain. We found that, in general, the UKMHIR system performs better than the ISO system for short-term forecasts. Figure 6.13 shows a result that illustrates this. The forecast based on the UKMHIR system is better than the ISO forecast for lead times up to 15 hours. Afterwards, it becomes worse up to 24 hours and then converges to the ISO result. The overshoot of the UKMHIR system in the period of 15-24 hours is bigger than that of the ISO system. This may indicate that the UKMHIR data assimilation system imposes a larger improper adjustment on the model variables further away from the assimilation stations than the ISO system does. As mentioned earlier, the performance degradation after 15 hours is likely to be caused by an unnatural correction of the external surge by the data assimilation, and this unnatural correction appears to be larger for UKMHIR than for ISO. Nevertheless, as far as operational purposes are concerned, this overshoot should not really be an issue. Moreover, the effect of the initial condition vanishes after some time, and the performance of the two systems converges afterwards.

Figure 6.13: Forecast performance at Hoek van Holland, initialized by the data assimilation systems based on the UKMO-HiRLAM difference and on the isotropic assumption. For each system, the steady-state Kalman gain is obtained from the 5th closed-loop iteration of the two-sample Kalman filter. Also shown for comparison is the deterministic forecast without data assimilation.

6.4.9 Tuning of the observation error statistics

A question that occurred to us in the course of the research is whether the specified observation error variance is already optimal. To check this, we evaluate the statistics of the innovation realizations (Maybeck 1979). Ideally, when a Kalman filter is well tuned, the fraction of innovation realizations lying outside the theoretical +/- 1σ interval should be about 32%. Figure 6.14a illustrates the innovation sequence for the set-up with observation error standard deviation σ_obs = 10 cm. In this figure, we see that with σ_obs = 10 cm only a small fraction of the innovation sequence lies outside the +/- 1σ band. This indicates that the specified observation error variance is too large and that the filter set-up requires additional tuning.
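A minimal sketch of the consistency check described above (hypothetical names; the ±1σ band is taken here as a given series of theoretical innovation standard deviations, which is an interpretation of the check rather than the thesis' exact implementation):

```python
import numpy as np

def fraction_outside_band(innovations, sigma_band):
    """Fraction of innovations lying outside the +/- 1 sigma band.

    For a well-tuned filter with Gaussian innovations this should be close
    to 32%; a much smaller value suggests the assumed observation error
    variance is too large."""
    return np.mean(np.abs(innovations) > sigma_band)

# usage (hypothetical):
# print(f"{100 * fraction_outside_band(innov_hoek, sigma_innov):.1f}% outside +/- 1 sigma")
```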
As a follow-up to this finding, we set the observation error standard deviation to σ_obs = 5 cm and checked the statistics of the innovation realizations, as shown in Figure 6.14b. Here, we see that the fraction of the innovation sequence lying outside the +/- 1σ band is closer to the ideal value. This result indicates that we might retune the observation error variance to an even smaller value to obtain an optimal set-up.

Figure 6.14: Sequence of water level innovations and the +/- 1σ interval for Hoek van Holland with (a) σ_obs = 0.10 m and (b) σ_obs = 0.05 m (the annotated percentages of innovations outside one sigma are about 11.9% and 26.8%, respectively). Data assimilation is performed using the Kalman gains from the 5th closed-loop iterations, computed for each set-up with its own observation noise standard deviation σ_obs.

We have also performed additional hindcast and forecast experiments as above, with observation error standard deviation σ_obs = 5 cm. Here, the computation of the Kalman gain estimate as well as the hindcast experiment are performed for the period of November 2004 until January 2005. The rms residuals obtained from the hindcast experiment at the assimilation and validation stations are shown in Figures 6.15 and 6.16, respectively. Here, we see that the two-sample Kalman filter is again convergent and that the water level residual converges to values smaller than those obtained using σ_obs = 10 cm, shown in Figures 6.9 and 6.10. However, it produces a larger rms water level residual in the first few iterations, as can be seen at the stations Aberdeen, Harlingen and Delfzijl. This may be because, with more weight now given to the observations, the correction imposed at these locations is larger while the gain estimates are not yet close to their optimal values.

The forecast experiment is performed for the period of August 2007 until January 2008, which is different from the period used in preparing the Kalman gain and the hindcast experiments. This gives the opportunity to validate the forecast system against data that are independent from the ones used to prepare the Kalman gain. Figure 6.17 shows the rms of the forecast water level residual versus forecast horizon. Comparing this result to the one obtained using σ_obs = 10 cm, shown in Figure 6.11, we see that here the performance difference between the data assimilation systems using the gain estimates from the first few iterations is more significant.
Together with the hindcast results, this indicates that with a smaller observation error standard deviation, the two-sample algorithm requires more iterations to give satisfactory results. It also produces a slightly larger overshoot at forecast times around 15 h. This may be due to a larger unnatural correction imposed on the area near the open ocean, because the data assimilation system puts more weight on the observations. As a result, the forecast at these lead times deteriorates more than with σ_obs = 10 cm.

Figure 6.15: RMS analysis residual of the water level at the assimilation stations, with σ_obs = 0.05 m.

Figure 6.16: RMS analysis residual of the water level at the validation stations, with σ_obs = 0.05 m.

Figure 6.17: Forecast performance at Hoek van Holland, with σ_obs = 5 cm.

We conclude this section by stating that setting the observation error standard deviation to σ_obs = 5 cm makes the innovation statistics more consistent with the ideal ones. It also slightly improves the analysis results at locations further away from the assimilation stations. However, the impact on the forecast errors is small.

6.4.10 Discussion

An issue in ensemble-based data assimilation is the possible spurious correlation due to sampling error. This problem can be reduced by using a large number of samples. In the two-sample Kalman filter, this means using very long time series of the two model realizations. In principle, since the algorithm is intended to be run off-line and only once, this should not pose any computational problem. Nevertheless, if the samples are not long enough and a spurious correlation problem occurs, we may combine this algorithm with the covariance localization technique proposed by Gaspari and Cohn (1999), which has been examined by e.g. Hamill and Whitaker (2001), Mitchell and Houtekamer (2000), Houtekamer and Mitchell (1998, 2001), Sørensen et al. (2004a), and Buehner (2004). The main idea of this method is to use a cut-off radius, beyond which observations are not used, to avoid having to estimate the small correlations associated with remote observations.
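For reference, a minimal sketch of such a localization (the taper below is the commonly quoted fifth-order piecewise rational function of Gaspari and Cohn (1999); applying it as an elementwise Schur product with the sampled covariance is one common usage, not necessarily the exact form intended here, and the names are hypothetical):

```python
import numpy as np

def gaspari_cohn(r, c):
    """Compactly supported fifth-order taper of Gaspari and Cohn (1999).

    r : distance(s); c : localization half-width; the taper is zero for r >= 2c."""
    x = np.atleast_1d(np.abs(np.asarray(r, dtype=float)) / c)
    taper = np.zeros_like(x)
    inner = x <= 1.0
    outer = (x > 1.0) & (x < 2.0)
    xi, xo = x[inner], x[outer]
    taper[inner] = (-0.25 * xi**5 + 0.5 * xi**4 + 0.625 * xi**3
                    - 5.0 / 3.0 * xi**2 + 1.0)
    taper[outer] = (1.0 / 12.0 * xo**5 - 0.5 * xo**4 + 0.625 * xo**3
                    + 5.0 / 3.0 * xo**2 - 5.0 * xo + 4.0 - 2.0 / (3.0 * xo))
    return taper

# usage (hypothetical): elementwise product of a distance-based taper matrix
# with the sampled covariance, e.g. P_loc = gaspari_cohn(dist_matrix, c=150e3) * P
```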
Another point of discussion is whether the solution is in fact optimal. It should be noted that the two-sample Kalman filter gives an estimate of a steady-state Kalman gain that is optimal for a given model error covariance. A different model error covariance specification gives a different solution. The UKMHIR and ISO systems illustrate this. For both systems, the Kalman gain is taken from the fifth closed-loop step, that is, after the iteration has converged; hence it is the optimal solution for the respective error statistics specification. The example in Section 6.4.8 illustrates how the model error specification impacts the performance of the resulting data assimilation system. In this case, the UKMHIR approach is found to perform better than the ISO approach. However, this does not necessarily mean that UKMHIR gives the optimal solution for the DCSM. The optimal solution would be achieved only if the true error covariance were used. In practice this is not possible, since the true state is never known. Moreover, the available observation data are usually far fewer than the model variables. One can only try to approximate the unknown true error covariance.

The specification of the error covariance is important for data assimilation. A wrong specification of the model error covariance may have a negative impact on the analysis, as illustrated in Dee (1995). For the experiments performed in this study, however, there is no indication of a negative impact for either data assimilation system. This may indicate that the error covariances used in this study are reasonably good for the DCSM. A complicating factor is that the model error covariance may vary in time. It is already understood, at least conceptually, that the model error at each time depends on the actual system state (e.g. Dee 1995) in an unknown manner. Hence, approximating the error covariance by a constant one may never yield the optimal solution. This motivates a further study to extend the approach described here to work with more samples. With more samples, it is possible to compute the covariance by averaging over a shorter time. The covariance is then recomputed using data from different analysis time windows. In this way, the covariance estimate varies depending on the time used for the computation.

6.5 Conclusion

We have presented in this paper an application of the two-sample Kalman filter. The implementation is done with an operational storm surge prediction model, the Dutch Continental Shelf Model (DCSM). Two wind fields are available for driving the model, namely HIRLAM and UKMO. The two forecast samples required for computing the Kalman gain are generated by running the model twice, each run driven by one of the two wind fields. The basic assumption in this work is that the difference between the two wind fields is characteristic of the unknown true error. In this paper we suggest using a surrogate quantity for the unknown true error in order to estimate the forecast error covariance used to define a steady-state Kalman gain. The main advantage of this approach is that we can avoid explicit modelling of the error covariance.

It is generally believed that the error in the wind fields is correlated in time. Since the two-sample Kalman filter is based on the assumption that the system is driven by white noise, a modification of the system representation is required before we can implement the algorithm. The objective of the transformation is to construct a new model representation in which the driving error process is white noise. In this paper we present a transformation for the case where the source of uncertainty is in the input forcing. The transformation is done by making use of two samples of the uncertain input forcing. The transformation method is presented for the case where the noise can be represented by an AR(1) process; a similar technique can be used for other ARMA-type model errors as well. The system transformation in combination with the two-sample Kalman filter algorithm is verified by a twin experiment with a simple one-dimensional wave model.
In this twin experiment it is possible to generate samples for both the transformed and the original model representation. The verification is done by comparing the results to a reference obtained using the original representation. The experiments show that the results are consistent with the reference.

Finally, the algorithm is implemented with the DCSM. Some care must be taken beforehand because there is a systematic difference between the HIRLAM and UKMO wind fields. We perform a correction on the UKMO wind fields to eliminate this systematic difference. Using the HIRLAM and corrected UKMO wind fields, we compute the estimate of the steady-state Kalman gain. It turns out that the algorithm converges after three iterations. An evaluation is performed by implementing the gain estimates in a data assimilation system. We checked the performance of the assimilation system at both the assimilation stations, where the data are used for the assimilation, and the validation stations, where the data are not used for the assimilation. The experiments show that the assimilation system with the estimated gain successfully brings the model closer to the data at all observation stations. Another evaluation uses independent data, not used in the computation of the Kalman gain. Here, the data assimilation system is used to give a better estimate of the initial condition for making forecasts. It is found that the forecast initialized by this data assimilation performs better than the one without data assimilation. Finally, a comparison is made between the data assimilation system proposed here and one based on an isotropic assumption. The short-range forecast based on the approach proposed here is found to perform better than the one based on the isotropic assumption.

An additional experiment is performed on tuning the observation error standard deviation. It shows that, for the set-up used in this study, a more accurate analysis can be obtained by reducing the standard deviation of the observation error. It also indicates the importance of the later iterations of the two-sample Kalman filter algorithm in producing more accurate analyses.

It is important to note that the two-sample Kalman filter produces a gain matrix that is optimal for the given, specified model error covariance. Therefore, it will be optimal only if the true error is indeed statistically stationary and the true error covariance is known. Since it is not possible to obtain this information about the true error, we have to approximate it. It is known that the model error changes with time along with the actual system state. In our next study, we are going to extend our approach to work with more than two samples. By using more samples, it is possible to compute the covariance by averaging over a shorter time. In this way we can relax the stationarity assumption.

7 Ensemble storm surge data assimilation and forecast

7.1 Introduction

Probabilistic forecast systems have been used operationally in the atmospheric community for the past decade or so. Some examples of meteorological centres where such a system is operational are the National Centers for Environmental Prediction (NCEP), the Canadian Meteorological Center (CMC), and the European Centre for Medium-Range Weather Forecasts (ECMWF). The development of a probabilistic forecast system is motivated by the fact that a forecast is always uncertain due to the inaccurate initial condition and the inevitable approximations made in describing the physical system.
The significance of a probabilistic forecast is that it can give users some idea of the forecast uncertainty and the predictability of the system. However, a full probabilistic forecast, which evolves the full probability density function of the system state in time, is computationally not feasible. A common approach to work around this problem is the Monte Carlo method. The idea of this approach is to represent the probability distribution of the system state forecasts by an ensemble of possible forecasts drawn from that distribution. The forecast ensemble is generated by perturbing the uncertain components of the system, such as the initial and boundary conditions, and propagating each sample in time with the forecast model. Two advantages can be expected from an ensemble forecast system. First, the mean of the ensemble forecast provides a net improvement in average forecast skill; Leith (1974) showed that the ensemble mean is statistically better than any individual forecast. Second, the spread of the forecast ensemble at a single time offers information about the uncertainty of the forecast at that time. It may give a qualified estimate of the confidence in the forecast (Frogner and Iversen 2001). This may help the decision-making processes related to the forecast products.

At the Norwegian Meteorological Institute, an ensemble prediction system called LAMEPS has been operational for a number of years (Frogner and Iversen 2002; Frogner et al. 2006). Studies of the LAMEPS performance in predicting geopotential height, MSL pressure, and precipitation events can be found in Frogner and Iversen (2002) and Frogner et al. (2006). Here, we perform a study using the 10 m wind speed ensemble generated by the LAMEPS as meteorological forcing for a storm surge forecasting model.

In this paper, we present our work with a storm surge forecast model, the Dutch Continental Shelf Model (DCSM), which is operational in the Netherlands. It is known that one of the main sources of uncertainty in operational water level forecasts is the uncertainty in the meteorological forcing (Gerritsen et al. 1995). Here, we explore the possibility of representing this uncertainty with an ensemble of meteorological forecasts produced by the LAMEPS. The idea is to generate a storm surge forecast ensemble by forcing the DCSM with each member of the 10 m wind speed LAMEPS ensemble. The main objective of this study is to investigate the possible use of the resulting forecast ensemble to estimate the forecast error covariance for data assimilation. Here, the data assimilation is carried out in the framework of ensemble Kalman filtering. In this framework, the forecast error covariance is used to define a Kalman gain, and the Kalman gain is used to update each member of the forecast ensemble by assimilating observational data.

The main topic of this study is forecast error covariance modelling. In the operational storm surge forecast system, the data assimilation coupled to the DCSM is based on steady-state Kalman filtering, and the forecast error covariance is assumed to be isotropic. In a previous study by Sumihar et al. (2009) the isotropic assumption was relaxed by computing the error covariance from the difference between the wind products of two meteorological centres: the UK Met Office and the Royal Netherlands Meteorological Institute (KNMI). This method is similar to the multi-model ensemble approach (Hagedorn et al. 2005), but only two models are used and the forecast error covariance is computed by averaging the ensemble difference over time.
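As a minimal illustration of the ensemble quantities involved (hypothetical names; the exact covariance formulation used later in this chapter is not reproduced here), the sketch below computes the ensemble mean, the ensemble spread and a sample forecast error covariance from an ensemble of surge forecasts:

```python
import numpy as np

def ensemble_statistics(X):
    """Ensemble mean, spread and sample covariance.

    X : array [n_members, n_state] of surge forecasts valid at one time
    (or stacked over a short analysis window)."""
    mean = X.mean(axis=0)
    anomalies = X - mean
    spread = anomalies.std(axis=0, ddof=1)
    P = anomalies.T @ anomalies / (X.shape[0] - 1)   # sample forecast error covariance
    return mean, spread, P

# usage (hypothetical): mean, spread, P = ensemble_statistics(surge_ensemble)
```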
2005), but only two models are used and the forecast error covariance is computed by averaging the ensemble difference over time. In this paper, we extend this work and use more ensemble members for computing the covariance. By using more ensemble members, the time averaging can be done over a shorter time. This offers the additional advantage of generating flow-dependent error covariances. As in the previous work, the main advantage of this approach is that explicit modelling of the forecast error covariance is avoided, because information about the forecast error covariance is contained within the ensemble members.

We shall first describe the data assimilation procedure used in this study and the error modelling approach in Section 7.2. The set-up of the experiments is given in Section 7.3. Section 7.4 presents the estimates of the statistics of the wind stress error. The results of hindcast experiments are given and evaluated in Section 7.5, while Section 7.6 presents a number of forecast experiments. Section 7.7 describes and discusses the comparison of the ensemble-based data assimilation with a steady-state Kalman filter. The chapter concludes in Section 7.8.

7.2 Data assimilation algorithm

The data assimilation algorithm used in this study is an extension of the two-sample Kalman filter algorithm proposed by Sumihar et al. (2008). The two-sample Kalman filter is an iterative procedure for computing the steady-state Kalman gain of a stochastic process by using two samples of the process. The computation of the steady-state Kalman gain is performed by making use of the difference between the two samples to compute the forecast error covariance. The two-sample Kalman filter is based on the assumption that the error process is stationary and ergodic. Therefore, it is possible to compute the covariance by averaging over time. Here, the algorithm is extended to work with n samples, where n > 2, so that the averaging can be done over a shorter time. Moreover, the computation of the Kalman gain is done at each analysis period. Hence, we can relax the assumption that the error process is stationary over a long time. Nevertheless, we assume that within each analysis window the error covariance is constant.

It is known that the error in the wind forcing is correlated in time. As in Sumihar et al. (2009), we assume that each member of the wind ensemble deviates from the 'true' wind field by a colored noise term, represented as an AR(1) process. Since, in the framework of Kalman filtering, the error process that perturbs the system should be white noise, we need to transform the representation of the wind forcing ensemble in such a way that it is driven by white noise. In this section, we shall first introduce a generalization of the sample transformation used in Sumihar et al. (2009). Subsequently, we describe the analysis algorithm, which is an extension of the two-sample Kalman filter. The section concludes with a summary of the overall algorithm used in this study.

7.2.1 System representation transformation

One of the main sources of error in a storm surge forecasting system is the uncertain input wind forcing (Gerritsen et al. 1995). To account for this uncertainty, we use an ensemble of wind fields and consider each wind field as deviating from the 'true' wind field by a colored noise process, governed by an AR(1) process. As we have stated earlier, a Kalman filter algorithm requires the system of interest to be driven by a white noise process.
This section describes how we transform the system representation into one driven by a white noise process. Suppose that we have a dynamic system \Phi driven by an uncertain input forcing \hat{u}(t_k) \in R^q as follows:

    z(t_{k+1}) = \Phi z(t_k) + F \hat{u}(t_k)    (7.1)
    \hat{u}(t_k) = \bar{u}(t_k) + \hat{w}(t_k)    (7.2)

where z(t_k) \in R^p denotes the system state at time t_k and F is the p \times q input forcing matrix. The uncertain input forcing \hat{u}(t_k) is modelled as a stochastic process, which can be written as the sum of the true forcing \bar{u}(t_k) \in R^q and a noise term \hat{w}(t_k) \in R^q. Both \bar{u}(t_k) and \hat{w}(t_k) are unknown. We suppose further that the noise vector \hat{w}(t_k) in equation (7.2) is governed by an AR(1) process that reads

    \hat{w}(t_k) = A \hat{w}(t_{k-1}) + \hat{\eta}(t_k)    (7.3)

where A is a time-invariant q \times q diagonal matrix whose elements determine the correlation time, and \hat{\eta}(t_k) \in R^q is a white noise vector with covariance Q(t_k) \in R^{q \times q}.

The objective of the transformation is to obtain another representation of the dynamics (7.1)-(7.2) such that it is driven by a white noise process. The standard approach of working with a system driven by colored noise is to augment the state vector z(t_k) with the noise vector \hat{w}(t_k). However, in this case it is not possible to use this approach directly, since we only have the realizations of \hat{u}(t_k), without any access to the realizations of \hat{w}(t_k) separately from those of \hat{u}(t_k). In the following, we propose a method to find an approximation of \hat{w}(t_k) by utilizing n samples of \hat{u}(t_k). These, in turn, will be used in the state augmentation procedure.

Suppose we have n independent members of the input forcing ensemble, \hat{u}_i(t_k), i = 1, ..., n. We introduce

    u(t_k) \equiv \frac{1}{n} \sum_{i=1}^{n} \hat{u}_i(t_k)    (7.4)
    \tilde{w}_i(t_k) \equiv \hat{u}_i(t_k) - u(t_k)    (7.5)

The term u represents the average of the n input realizations. On the other hand, \tilde{w}_i can be shown to evolve in time according to

    \tilde{w}_i(t_k) = A \tilde{w}_i(t_{k-1}) + \tilde{\eta}_i(t_{k-1})    (7.6)

Notice the symmetry between equation (7.6) and the noise process (7.3). Due to this symmetry, we call \tilde{w}_i the transformed colored noise and \tilde{\eta}_i(t_k) the transformed white noise processes. Using these new variables, we define the transformed input forcings as

    \tilde{u}_i(t_k) = u(t_k) + \tilde{w}_i(t_k)    (7.7)

Equation (7.7) shows that the original set of input forcings is retained; that is, \tilde{u}_i(t_k) = \hat{u}_i(t_k), i = 1, ..., n. However, under this transformation we can work with the average and the colored-noise terms separately. Note that if we have n independent realizations of the input forcing ensemble, \hat{u}_i(t_k), we can compute the ensemble average, u, and the approximation of the colored noise for each member of the ensemble, \tilde{w}_i, by using equations (7.4)-(7.5). Moreover, by using equation (7.6), we can compute the approximate white-noise realizations, \tilde{\eta}_i, of the respective ensemble member.

We can now use the transformed input forcing in a state augmentation procedure. For each ensemble member, by denoting

    x_i(t_k) \equiv \begin{pmatrix} z_i(t_k) \\ \tilde{w}_i(t_k) \end{pmatrix}, \quad \eta_i(t_k) \equiv \begin{pmatrix} 0 \\ \tilde{\eta}_i(t_k) \end{pmatrix}, \quad M \equiv \begin{pmatrix} \Phi & F \\ 0 & A \end{pmatrix}, \quad B \equiv \begin{pmatrix} F \\ 0 \end{pmatrix},

we can rewrite the dynamical system (7.1)-(7.2) as

    x_i(t_{k+1}) = M x_i(t_k) + B u(t_k) + \eta_i(t_k)    (7.8)

We now have a system representation which is driven by white noise.
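As an illustration of the transformation in equations (7.4)-(7.6), the sketch below computes the ensemble average, the colored-noise approximations and the corresponding white-noise realizations from an ensemble of input forcings. The array shapes and variable names are illustrative assumptions, not the implementation used with WAQUA.

    import numpy as np

    def transform_forcing_ensemble(u_hat, a):
        # u_hat: input forcing ensemble, shape (N, n, q) -- N times, n members,
        # q forcing components.  a: diagonal of the AR(1) matrix A, shape (q,).
        u_avg = u_hat.mean(axis=1)                        # ensemble average, eq. (7.4)
        w_tilde = u_hat - u_avg[:, None, :]               # colored-noise terms, eq. (7.5)
        eta_tilde = np.zeros_like(w_tilde)
        eta_tilde[1:] = w_tilde[1:] - a * w_tilde[:-1]    # white-noise residuals from eq. (7.6)
        return u_avg, w_tilde, eta_tilde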
7.2.2 Analysis algorithm

The algorithm described here is an extension of the two-sample Kalman filter algorithm (Sumihar et al. 2009). To maintain the similarity with our previous work, we start the description by supposing that we have a stochastic process X^f(t_k) governed by

    X^f(t_{k+1}) = M X^a(t_k) + B u(t_k) + \eta(t_k)    (7.9)
    X^a(t_{k+1}) = X^f(t_{k+1}) + K (y^o(t_{k+1}) - H X^f(t_{k+1}) - \nu(t_{k+1}))    (7.10)

where M is the model dynamics, B the input forcing matrix, u(t_k) the input forcing, \eta(t_k) a white-noise process representing model and/or input forcing error, \nu(t_k) a white-noise process representing observation error, y^o(t_k) the observation, H the observation matrix, X^a(t_k) the analysis state vector, and K a constant gain matrix. For a stable system with slowly varying error statistics, we can assume that, within any time window of sufficiently short length, the error covariance is constant. Within each time window, the covariance matrices of the processes X^f(t_k) and X^a(t_k) are given respectively by

    P^f = M P^a M^\top + Q    (7.11)
    P^a = (I - KH) P^f (I - KH)^\top + K R K^\top    (7.12)

The objective of the algorithm described here is to find the K that minimizes the error variance within the corresponding time window. The algorithm consists of an iterative procedure for improving the estimate of K. Starting from an initial value of the gain matrix, K_0, the solutions of equations (7.11)-(7.12) are computed, so we have P^f_0 and P^a_0. In turn, P^f_0 is used to compute the gain matrix for the next iteration, K_1, using the analysis-variance-minimizing equation

    K = P^f H^\top (H P^f H^\top + R)^{-1}    (7.13)

The new gain matrix is inserted into the system of equations (7.11)-(7.12) and new solutions are computed. The procedure is repeated until the iteration has converged. Since K is defined to minimize the analysis error variance of the next iteration, we have P^a_0 \geq P^a_1 \geq \ldots \geq P^a_j. Because P^a > 0, the iteration will converge. When the iteration has converged, the solution of the algorithm is an optimal gain matrix, that is, the one which gives the minimum analysis error variance. Using this gain matrix in the analysis equation above, we obtain the analysis state with minimum error variance.

In this paper, we apply the idea above using n realizations of the process X^f(t_k) to approximate the solutions of equations (7.11)-(7.12) for obtaining P^f. Suppose we have n realizations of the random process X^f(t_k): x^f_1(t_k), ..., x^f_n(t_k), within a time window k = 1, ..., N. An unbiased estimator of the forecast error covariance using these samples can be obtained by averaging over time and ensemble members:

    P^f = \frac{1}{N} \sum_{k=1}^{N} \frac{1}{n-1} \sum_{i=1}^{n} (x^f_i(t_k) - \bar{x}^f(t_k)) (x^f_i(t_k) - \bar{x}^f(t_k))^\top    (7.14)

where \bar{x}^f(t_k) is the forecast ensemble mean, given by

    \bar{x}^f(t_k) = \frac{1}{n} \sum_{i=1}^{n} x^f_i(t_k)    (7.15)

Note that in practice we do not need to compute the full covariance matrix P^f. Instead, it is sufficient to compute P^f H^\top and H P^f H^\top. For a reasonable number of observations, this can reduce the computational burden considerably.
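The sketch below illustrates this restricted computation: the observed columns P^f H^T and H P^f H^T are accumulated from the samples over a time window, as in equation (7.14), and the gain follows from equation (7.13). The function and array names are assumptions made for illustration only.

    import numpy as np

    def estimate_gain(xf, H, R):
        # xf: forecast samples x_i^f(t_k) with shape (N, n, p) -- N time steps in the
        # window, n ensemble members, p augmented state variables.
        # H: observation operator (m, p); R: observation error covariance (m, m).
        N, n, p = xf.shape
        m = H.shape[0]
        PfHt = np.zeros((p, m))
        HPfHt = np.zeros((m, m))
        for k in range(N):
            dx = xf[k] - xf[k].mean(axis=0)      # deviations from the ensemble mean
            dy = dx @ H.T                        # the same deviations in observation space
            PfHt += dx.T @ dy / (n - 1)          # contribution to Pf H^T   (eq. 7.14)
            HPfHt += dy.T @ dy / (n - 1)         # contribution to H Pf H^T
        PfHt /= N
        HPfHt /= N
        return PfHt @ np.linalg.inv(HPfHt + R)   # analysis-variance-minimizing gain (eq. 7.13)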
A common problem of working with a small number of ensemble members is spurious correlation at distant locations due to sampling error. To work around this problem, covariance localization should be applied. This can be done by taking the Schur product of the empirical covariances with an analytical correlation function \rho with local support (Gaspari and Cohn 1999). The Schur product is defined by

    (\rho \circ P^f)_{ij} = \rho(i, j) P^f_{ij}    (7.16)

and the analytical correlation function used in this study is the Gaussian function

    \rho(z, L) = \exp\left(-\frac{z^2}{2L^2}\right)    (7.17)

where z is the distance between a location and a point of reference and L is the decorrelation length.

7.2.3 Summary

The analysis procedure consists of splitting the time period into several short time windows, where within each time window the forecast error covariance can be assumed to be constant. In each time window, the analysis is performed on each ensemble member using the algorithm above. After the analysis, the analysis ensemble at the end of each time window is used as the initial condition for the model in the next time window. To summarize, for each time window, the data assimilation procedure can be described step by step as follows (a compact sketch of this loop is given at the end of this section):

• Initialization:
  – compute and store the input forcing average using equation (7.4): u(t_k), k = 1, ..., N
  – compute and store the approximate white noise processes for each ensemble member using equation (7.6): \tilde{\eta}_i(t_k), i = 1, ..., n, k = 1, ..., N
  – specify an initial guess of the Kalman gain K
• Closed-loop step:
  – insert K into equation (7.10)
  – generate n realizations of X^f(t_k) of the dynamics (7.9)-(7.10) using u(t_k) and \tilde{\eta}_i(t_k): x^f_i(t_k), i = 1, ..., n, k = 1, ..., N
  – estimate P^f using equation (7.14)
  – estimate K using equation (7.13)
  – repeat the closed-loop step until P^f, and therefore also K, have converged

In our experience, the iteration converges after three or four steps. In practice, however, it may not be necessary to iterate the procedure until convergence is reached. The solutions from intermediate iteration steps may also give satisfactory results.

The analysis system described above can be seen as a generalization of the ensemble Kalman filter (Evensen 1994). When the ensemble size is sufficiently large, the averaging can be performed over the ensemble at a single time; hence, it reduces to the EnKF. This procedure also shares a similarity with the method proposed by Xu et al. (2008a, 2008b), in the sense that it uses an ensemble from an extended time to compute the forecast error covariance. The data assimilation procedure described here is also similar to the one described by Sørensen et al. (2004b), in that both procedures are based on the assumption that the covariance, and hence also the gain matrix, varies only slowly in time. The difference is that, in their work, a (weighted) averaging is done on the gain matrix, between the instantaneous gain matrix and the smoothed gain matrix of the previous time step (time regularization). In our algorithm, the averaging is done over the covariance estimate within a time window.
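The closed-loop iteration summarized above can be sketched compactly as follows. Here propagate_ensemble() is a placeholder for the model runs of equations (7.9)-(7.10) over one time window; it is not the WAQUA/DCSM code, and all names are assumptions for illustration.

    import numpy as np

    def gaussian_taper(dist_to_obs, L=200e3):
        # Gaussian localization function of equation (7.17); distances and L in metres
        # (the experiments described later use a decorrelation length of 200 km).
        return np.exp(-dist_to_obs**2 / (2.0 * L**2))

    def closed_loop_gain(propagate_ensemble, H, R, dist_to_obs, K0, n_iter=3):
        # propagate_ensemble(K) must return samples x_i^f(t_k) of shape (N, n, p).
        K = K0
        for _ in range(n_iter):
            xf = propagate_ensemble(K)
            N, n, p = xf.shape
            m = H.shape[0]
            PfHt = np.zeros((p, m))
            HPfHt = np.zeros((m, m))
            for k in range(N):                          # time-and-ensemble averaging, eq. (7.14)
                dx = xf[k] - xf[k].mean(axis=0)
                dy = dx @ H.T
                PfHt += dx.T @ dy / (n - 1)
                HPfHt += dy.T @ dy / (n - 1)
            PfHt /= N
            HPfHt /= N
            PfHt = gaussian_taper(dist_to_obs) * PfHt   # Schur product, eq. (7.16)
            K = PfHt @ np.linalg.inv(HPfHt + R)         # eq. (7.13)
        return K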
7.3 Setup of experiment

The experiment setup used in this study consists of the DCSM as the storm surge forecast model, an ensemble of 10 m wind speed forecasts produced by LAMEPS as input forcing, and in-situ observations of water level for both data assimilation and validation of the resulting storm surge forecast ensemble. The period of the experiment is from August 2007 to January 2008. This section briefly describes each of these components.

7.3.1 The Model

The results presented in this paper are obtained by using the Dutch Continental Shelf Model (DCSM). This model is used as one of the main components of the storm surge forecasting system, which is operational at the Royal Dutch Meteorological Institute (KNMI). The development of the model started in the late 1980s (Verboom et al. 1992; Gerritsen et al. 1995). The DCSM is based on the two-dimensional shallow water equations, which describe the large-scale motion of the water level and the depth-integrated horizontal flow. The time step for numerical integration of the shallow water equations is 10 minutes. The implementation of the DCSM is based on the software package WAQUA, which was developed based on the works of Leendertse (1967) and Stelling (1984). The DCSM covers the area of the northwest European continental shelf to at least the 200 m depth contour, i.e. 12°W to 13°E and 48°N to 62.33°N (Figure 3.1). The resolution of the spherical grid is 1/8° × 1/12°.

The DCSM is driven by two input forcings: one at the open boundaries and the other on the sea surface. At the open boundaries, the tidal constants for ten tidal constituents, M2, S2, N2, K2, O1, K1, Q1, P1, ν2, and L2, are specified. These tidal components were estimated on the basis of results from encompassing models, matched with nearby coastal and pelagic tidal data. The prescription of ten tidal constituents along the sea boundary gives a sufficiently detailed tidal input for tidal simulations in the North Sea, including the nonlinear Southern part (Gerritsen et al. 1995). The input forcing on the sea surface is expressed in terms of wind stress and computed from 10 m wind speed data. In this study, we use an ensemble of 10 m wind speed data, generated by the Norwegian Meteorological Institute LAMEPS, as input forcing to the DCSM.

7.3.2 10 m wind speed forecast ensemble

The ensemble of wind input used in this study is obtained from the ensemble weather forecasting system LAMEPS, which is operational at the Norwegian Meteorological Institute. It is an ensemble of the Norwegian version of the limited-area model HIRLAM. LAMEPS consists of 20 perturbed forecast members and one unperturbed (control) forecast. The horizontal resolution of the model is 0.25 degrees, or approximately 28 km, and the model has 31 vertical levels. The area covered by LAMEPS is depicted in Figure 7.1. LAMEPS is the result of the continuation and improvement of the targeted ensemble prediction system, TEPS (Frogner and Iversen 2001). In TEPS, the ensemble members are generated by maximizing the total energy norm of perturbations in a targeted specific area of interest: northern Europe and parts of the north Atlantic Ocean (Frogner and Iversen 2001). It uses a singular vector (SV) based method to identify the directions of fastest growth (Buizza and Palmer 1995). Singular vectors maximize growth over a finite time interval and are consequently expected to dominate forecast errors at the end of that interval and possibly beyond. Here the optimisation interval is 48 h (Frogner and Iversen 2001). LAMEPS uses the same perturbations as TEPS, but it uses a forecast model with finer spatial resolution. The reason for using a finer resolution model is to obtain better descriptions of ground-surface properties and of the dynamical features responsible for significant weather (Frogner and Iversen 2002). It is hoped that a mesoscale LAMEPS utilizing global perturbations targeted on northern Europe enables better forecasts of possible extreme events compared to TEPS or pure deterministic forecasts. The development of LAMEPS was encouraged by a preliminary feasibility study by Frogner and Iversen (2002).
Each LAMEPS member is generated by perturbing only the initial and lateral boundary fields of the HIRLAM analysis. The perturbations are taken from the difference of each TEPS member relative to the TEPS control. These are provided once every day at 12 UTC. However, LAMEPS is started with a time lag of 6 hours from these data due to a delay in producing and disseminating them. The run at 18 UTC has a forecast length of 60 hours, with lateral boundary conditions from TEPS used every third hour (Frogner 2008, personal communication). Some studies have been done to evaluate the quality of the ensemble in terms of precipitation rates, mean-sea-level pressure, geopotential height, and two-metre temperature (Frogner and Iversen 2002; Frogner et al. 2006). In our study, we are interested especially in the 10 m wind speed output of LAMEPS to be used as input to the DCSM. Although no study has been published yet regarding the verification of LAMEPS with respect to wind, LAMEPS has been operational and routine verification is also done on wind every 3 months.

Figure 7.1: LAMEPS and DCSM area.

Shown also in Figure 7.1 is the DCSM area. To use the ensemble wind output of LAMEPS, the wind field is projected onto the DCSM area using a routine available at the Norwegian Meteorological Institute. For the part of the DCSM area that is covered by the LAMEPS area, the wind speed is interpolated linearly to the DCSM grid. For the part of the DCSM area outside the LAMEPS area, which resides in the lower left and right corners of the DCSM area, the wind field is determined by Archimedean extrapolation. It is an iterative method, where the wind field in the area of extrapolation is initialized by the average value of the wind field in the active area. The estimate is then improved by an iterative procedure making use of the values of the wind field near the respective points in the extrapolation area. The extrapolation is likely to introduce more uncertainty in the wind field. However, it is expected not to have a significant influence on the area of our interest, which is around the Dutch coast. LAMEPS data is available with a time interval of six hours, while the DCSM is integrated with a time step of 10 minutes. To provide wind data with a time interval of 10 minutes, the resulting wind stress is interpolated linearly in time. The interpolation is performed internally by the simulation software used to run the DCSM.

7.3.3 Water-level data

Observed water level for this study is available from 16 stations. Some stations are located along the British coast, others along the Dutch coast, and some stations are located offshore. The water level is measured every 10 minutes at all locations. The observed water level data is preprocessed to eliminate erroneous data before being used for evaluating the model performance as well as for data assimilation. For the experiments described later, eight stations are used as assimilation stations, where data is used in the data assimilation. The other stations are used as validation stations, where data is not assimilated but used only for validation (Figure 5.2).

We note here that the aim of the model is to give an accurate prediction of the storm surge. The observed water level, however, gives the total water level, which is the combination of the tidal component and the storm surge. If the observations are used as they are, the data assimilation scheme will also correct for errors in the astronomical tide. In turn, this correction will disturb the surge. In order to reduce the effect of the tidal component on the data assimilation and evaluation, we remove the tidal component from the observations by replacing the harmonic analysis tide at the measurement locations with the astronomical tide as calculated by the DCSM (Gerritsen et al. 1995). By the astronomical tide calculated by the DCSM, we mean the periodic simulated water level produced by running the DCSM without any wind forcing. These adjusted water level observations are used for data assimilation as well as for the evaluation of the resulting forecast ensemble throughout this study.
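The observation adjustment described above amounts to a simple substitution of tide signals. A minimal sketch, with assumed array names and under the assumption that all series share the same 10-minute time axis, is given below.

    import numpy as np

    def adjust_observations(obs_total, harmonic_tide, dcsm_tide):
        # Replace the harmonic-analysis tide at a station by the astronomical tide
        # computed with the DCSM (a run without wind forcing), so that the signal
        # offered to the assimilation is consistent with the model's own tide.
        return np.asarray(obs_total) - np.asarray(harmonic_tide) + np.asarray(dcsm_tide)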
7.4 Wind stress error statistics

We would like to use the LAMEPS 10 m wind speed to determine the forecast error statistics of the DCSM. The 10 m wind speed is used to compute the wind stress as input forcing to the DCSM. This section presents the resulting estimates of these statistics. The 10 m wind speed ensemble used in this study is the one for the period August 2007 - January 2008.

In this study we focus on determining the error covariance by using the resulting storm surge forecast ensemble. Nevertheless, we also need to estimate the correlation-time parameter of the wind error from the wind stress ensemble. Ideally, the correlation-time parameter should be determined at each grid point in the model domain and recomputed at different times according to the flow. However, in this study we assume that the correlation-time parameter is spatially uniform and constant in time. By using the algorithm described in Section 7.2.1, we have computed the estimated colored-noise realizations for the whole period where data is available in this experiment. It is now possible to compute the empirical correlation function by using these realizations. The correlation-time parameter is estimated by curve-fitting the correlation function of a theoretical AR(1) process to the empirical one. The procedure is repeated for each grid point in the model domain. The spatial average of the correlation-time parameter is found to be about five hours and will be used in the subsequent experiments. Figure 7.2 shows a typical autocorrelation function at one location above the sea area of the DCSM.

Figure 7.2: Autocorrelation function of the wind noise at one location above the sea area of the DCSM, (a) u-direction and (b) v-direction. The full line is the empirical autocorrelation function, while the dashed line is the autocorrelation function of an AR(1) process with a correlation time of about 5 hours.
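The correlation-time estimate can be reproduced, in outline, by fitting the exponential autocorrelation of an AR(1) process to the empirical autocorrelation of the colored-noise realizations at one grid point. The sketch below is a minimal illustration with assumed names and a 6-hourly sampling interval; it is not the routine used in this study.

    import numpy as np
    from scipy.optimize import curve_fit

    def empirical_autocorrelation(w, max_lag):
        # Sample autocorrelation of one colored-noise time series w(t_k).
        w = np.asarray(w) - np.mean(w)
        var = np.mean(w * w)
        return np.array([np.mean(w[:len(w) - l] * w[l:]) / var for l in range(max_lag + 1)])

    def fit_correlation_time(w, dt_hours=6.0, max_lag=20):
        # Least-squares fit of exp(-tau/T) to the empirical autocorrelation.
        lags = np.arange(max_lag + 1) * dt_hours
        rho = empirical_autocorrelation(w, max_lag)
        T_hat, _ = curve_fit(lambda tau, T: np.exp(-tau / T), lags, rho, p0=[5.0])
        return float(T_hat[0])   # correlation time in hours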
The error covariance estimate is computed using the ensemble within a shorter time window, which corresponds to the data assimilation period, and it is recomputed for different time windows. One question to be answered in this step is how long the time window should be for computing such an estimate. Our interest is to produce a covariance estimate which reflects the true error covariance. While the true statistic is unknown, it is known to be dependent on the actual system state (Dee 1995). Ideally, we would like to use the ensemble only at a single time to estimate the error covariance at that time, as in the EnKF (Evensen 1994). However, for a small number of ensemble members, as in our case, this approach suffers from sampling error. Basically, the time window should be sufficiently short to capture the actual change in the forecast error covariance, but long enough to reduce sampling error.

Figure 7.3 illustrates the effect of using an ensemble within an extended time on the spatial correlation estimate of the wind stress error. It shows some examples of wind stress error covariance estimates obtained using a single-time ensemble and the ones obtained using the wind stress ensemble in a 24 h time window. The estimates obtained using a single-time ensemble have a complex structure, where the isolines are tightly packed. Moreover, they show significantly non-zero correlation at locations distant from the reference point, which is probably a spurious correlation due to sampling error. On the other hand, the estimates obtained using the ensemble from a 24 h interval have a much smoother structure. The area with non-zero correlation is more localized around the reference point and the correlation decreases with distance from the reference point. Moreover, the short-duration structures at distant locations have largely been removed as well. Nevertheless, for this example, there are still non-zero correlations at locations distant from the reference point for the meridional direction. This indicates the need for a localization technique to confine the area with non-zero correlations to a limited region around the reference point.

Figure 7.3 also illustrates that the spatial correlation of the wind stress error has a different structure for the zonal and meridional directions. Moreover, we have estimated the wind stress error covariance at different time windows and observed that the estimates evolve in time (see, for example, Figure 7.4). All of this is different from the operational system, where the error covariance is assumed to be isotropic and constant in time. The question remains whether it will give more accurate analyses and, in turn, more accurate forecasts. We will see this in the next sections.

Figure 7.3: Spatial correlation of the wind noise with respect to a location at Hoek van Holland. Panels (a) and (b) are obtained using a single-time ensemble at 18:00, 07 September 2007, while (c) and (d) are obtained using the ensemble between 18:00, 07 September 2007 and 18:00, 08 September 2007. Panels (a) and (c) are for the zonal direction, (b) and (d) for the meridional direction. Contour interval is 0.2.

Figure 7.4: Spatial correlation of the wind noise with respect to a location at Hoek van Holland. Panels (a) and (b) are obtained using a single-time ensemble at 18:00, 12 September 2007, while (c) and (d) are obtained using the ensemble between 18:00, 12 September 2007 and 18:00, 13 September 2007. Panels (a) and (c) are for the zonal direction, (b) and (d) for the meridional direction. Contour interval is 0.2.
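A correlation map of this kind can be obtained, in outline, by correlating the wind-stress noise at every grid point with the noise at the reference location, pooling the ensemble deviations over the 24 h window. The array shapes and names in the sketch below are assumptions for illustration only.

    import numpy as np

    def spatial_correlation_map(w, ref_index):
        # w: colored-noise realizations of one wind-stress component, shape (N, n, p)
        # -- N times in the window, n members, p grid points.
        N, n, p = w.shape
        dev = w - w.mean(axis=1, keepdims=True)         # deviations from the ensemble mean
        dev = dev.reshape(N * n, p)                     # pool time and ensemble samples
        cov = dev.T @ dev[:, ref_index] / (N * (n - 1))
        var = np.sum(dev * dev, axis=0) / (N * (n - 1))
        return cov / np.sqrt(var * var[ref_index])      # correlation w.r.t. the reference point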
7.5 Hindcast experiment

We have performed a hindcast experiment using the procedure and data described earlier. This section presents and discusses the results of the experiment. The period of the experiment is from 01 August 2007 until 31 January 2008. The DCSM is run starting from a rest situation. To eliminate the spin-up effect, the DCSM is run for 17 days before the hindcast experiment is started. The model is run by forcing the DCSM with each member of the 10 m wind speed ensemble. Therefore, at the end of day 17, there are 21 model states, one for each ensemble member, which serve as the starting points for the hindcast experiments. Note that, except for the control run, the perturbed members of LAMEPS are actually disconnected in time. Each perturbed member of the wind ensemble for driving the DCSM from rest is obtained by connecting the first 18 h segments, starting from the respective analysis times, of that member within the whole 17-day initialization period. This procedure potentially produces unrealistic jumps in the resulting wind time series at the connecting times. However, since the length of each segment is reasonably short, we expect that this discontinuity will not be too large. In fact, we have observed that this procedure produces a smooth ensemble of DCSM outputs. Moreover, connecting the wind field at 18 h to the initial wind ensemble of the next forecast period is found to reduce the bias in the wind time series significantly, which otherwise grows larger.

The water level observational error standard deviation is set to 0.10 m, which is equal to the one used in the operational system. The time window for computing the forecast error covariance is chosen to be 24 h. Because LAMEPS is analyzed once per day at 18:00, the time window is chosen to run from 18:00 on one day to 18:00 on the next day.
Within each time window, the observation data is assimilated at each time step. For covariance localization, the decorrelation length is set to 200 km. This value is chosen to leave the empirically obtained error covariance structure around the respective reference point unchanged, while the covariance at distant locations is reduced to practically zero.

Ideally, the data assimilation system should produce an analysis ensemble with a minimum analysis ensemble mean error and an ensemble spread that can represent the statistics of this error. However, since we are working with real data, we do not have any access to the truth. Therefore, it is impossible to check the error statistics. An alternative way for ensemble verification, which is common in practice, is to verify it against analysis data. In our study, however, we choose to verify the ensemble against observation data. Verification against observations has several advantages. Unlike verification against an analysis, verification against observations avoids being affected by the local systematic errors of the analysis. It also provides a validation that is more independent of the forecast and analysis systems.

Different procedures are performed for evaluating the properties of the analysis ensemble. The first one is to evaluate the resulting ensemble spread time series, to check the effect of the iterations in computing the Kalman gain and to see whether there is any indication that the ensemble would collapse. The second evaluation is a standard check of Kalman filter performance, by checking whether the water level innovations have the same statistics as the theoretical ones (Maybeck 1979). The other checks evaluate the two most basic quantities of an ensemble system: spread and skill (Buizza and Palmer 1998). The ensemble skill is defined in terms of the root-mean-square (rms) water-level residual. Similarly, the ensemble spread is measured as the rms of the difference between the ensemble members and the ensemble mean. We check the reliability of the analysis ensemble by using rank histograms (Hamill 2001). The final check is performed in terms of the root mean square (rms) of the water level residual. To check the effect of the observation error standard deviation on the resulting analysis ensemble, the experiment and evaluation are performed twice, each with a different value of the observation error standard deviation. The standard deviations of the observation error used here are σobs = 10 cm and σobs = 5 cm. In each experiment, the observation error standard deviation is set to the same value for all assimilation stations.
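For reference, the two summary quantities just defined, skill and spread, can be written down compactly. The sketch below assumes water-level arrays at a single station; the names are illustrative.

    import numpy as np

    def ensemble_skill_and_spread(ensemble, obs):
        # ensemble: water levels at one station, shape (N, n); obs: shape (N,).
        mean = ensemble.mean(axis=1)
        skill = np.sqrt(np.mean((mean - obs) ** 2))                 # rms residual of the mean
        spread = np.sqrt(np.mean((ensemble - mean[:, None]) ** 2))  # rms distance to the mean
        return skill, spread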
7.5.1 Time series of ensemble spread

Before we perform any data assimilation experiment, we first check the characteristics of the resulting storm surge ensemble obtained by forcing the DCSM with each member of the LAMEPS wind. Figure 7.5 shows the evolution of the water level ensemble spread in time on top of a plot of the surge observed at Hoek van Holland. The observed surge is obtained by subtracting the astronomical tide component from the observed total water level. For this presentation, the observed surge is segmented into the 24 h time intervals used in the hindcast experiment, and the average over each 24 h interval is shown. To give better visibility of the ensemble spread variation, the observed surge is scaled by a factor of 0.2. The first characteristic that can be seen from this figure is that the resulting storm surge ensemble has a spread which varies in time. As expected, this is in contrast with the operational system, where the error variance is assumed to be constant in time. The second thing that this figure suggests is that the ensemble spread is large at times when the observed surge is large and positive. The correlation between the ensemble spread and the observed surge at different locations is found to be around 0.6. This indicates that a large spread tends to occur at times of large positive surge.

Figure 7.5: Time series of water-level ensemble spread (std) without data assimilation, averaged over all observation stations, and observed surge at Hoek van Holland scaled by a factor of 0.2.

The assimilation of observation data should reduce the uncertainty of the model results, as suggested by the Kalman filter equations. For our case, this should be reflected by the reduction of the ensemble spread after data assimilation. Figure 7.6 shows the time series of ensemble spread before and after data assimilation using Kalman gains estimated from different iterations. This figure clearly shows that the ensemble spread is reduced after data assimilation consistently throughout the run. The reduction is more significant at times where the ensemble spread is large. It suggests that the more uncertain the model result, the more significant the effect of the assimilated observations in reducing the analysis uncertainty. As expected from theory, the Kalman filter puts more weight on the observations if the model result is more uncertain. This results in a larger correction of the model result for producing the analysis ensemble. Moreover, the ensemble spread is reduced further by data assimilation using the Kalman gain from the first and second closed-loop steps. This suggests that iterating the estimation of the forecast error covariance, and hence also of the Kalman gain, helps reduce the uncertainty of the analysis. Further, Figure 7.6 also shows that there is no indication that the ensemble would collapse. This suggests that the LAMEPS wind stresses are effective in maintaining the DCSM forecast uncertainty at a reasonable level.

Figure 7.6: Time series of water-level ensemble spread averaged over all observation stations. Each line, with decreasing line width, represents the ensemble spread without data assimilation, and with data assimilation using the open-loop gain, the 1st closed-loop gain and the 2nd closed-loop gain, respectively.

7.5.2 Innovation statistics

One way of checking whether a Kalman filter performs well is to check whether the innovation realizations have the same statistical description as the theoretical one. This is sometimes called residual monitoring (Maybeck 1979). From the Kalman filter equations, it can be shown that the sequence of innovations has zero mean and variance \sigma^2 = H P^f H^\top + R. Since the Kalman filter is based on the assumption that all errors are normally distributed, we can check the Kalman filter performance by simply checking the sequence of the innovations: about 68% of the samples should lie within the 1σ bound, 95% within the 2σ bound, and so on.
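A minimal sketch of this residual-monitoring check is given below, assuming that the innovation time series and the projected forecast variance are available at one station; the variable names are illustrative.

    import numpy as np

    def one_sigma_fraction(innovations, HPfHt_diag, r_var):
        # innovations: y^o - H x_mean at one station, shape (N,);
        # HPfHt_diag: corresponding diagonal of H Pf H^T per time step, shape (N,);
        # r_var: observation error variance (e.g. 0.10**2 for sigma_obs = 10 cm).
        sigma = np.sqrt(HPfHt_diag + r_var)
        return np.mean(np.abs(innovations) <= sigma)   # should be close to 0.68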
We have computed the innovation realizations as the difference between the ensemble mean and the observation. Figure 7.7 shows the sequence of the water level innovations and the 1σ bound within a small portion of the experiment period for one of the assimilation stations, obtained with σobs = 10 cm. Shown also in the figure is the percentage of the residuals lying within the 1σ bound over the whole period of the experiment. The effect of data assimilation can be seen to reduce the bias in the innovation sequence and to bring the innovations more into the 1σ bound. After data assimilation, the percentage of innovation samples within the 1σ bound reaches more than 90%, which is larger than it should be. This indicates that some parameters may not have been tuned optimally. Figure 7.8 shows the innovation sequence and the 1σ bound obtained with σobs = 5 cm. Here, we see that the realizations of the innovations have better similarity with the theoretical statistics. This indicates that in the first experiment the observation error standard deviation σobs was too large and that the performance of the Kalman filter can be improved by reducing σobs.

Figure 7.7: Sequence of the water level innovation and the +/- 1σ interval for the assimilation station at Hoek van Holland, with σobs = 0.10 m: (a) without data assimilation, and with data assimilation using (b) the open-loop, (c) the 1st closed-loop and (d) the 2nd closed-loop Kalman gain.

Figure 7.8: As Figure 7.7, with σobs = 0.05 m.

7.5.3 Rank histograms

Another diagnostic for ensemble reliability is the rank histogram (Hamill 2001). Rank histograms are generated by repeatedly tallying the rank of the verifying observation, at each time step, relative to the values from an ensemble sorted from the lowest to the highest. The rank histogram permits a quick examination of some qualities of the ensemble. A reliable ensemble will show up as a flat rank histogram. Consistent biases in the ensemble will show up as a sloped rank histogram, while a lack of variability in the ensemble will show up as a U-shaped, or concave, population of the ranks. On the other hand, if the ensemble dispersion is too large, then the observations will populate the middle ranks more than the extremes. Non-uniform rank histograms imply that the ensemble relative frequency will be a poor approximation to probability (Wilks 2006). As pointed out by Hamill (2001), Sætra et al. (2004) and Candille et al. (2007), verification against uncertain observation data in general requires the inclusion of the observation uncertainty. Hamill (2001) specifically indicated that the shape of the rank histogram is affected by the observation noise as well. In practice, imperfect observations, contaminated by instrumentation error for example, will be used for verification. Moreover, even if the instrumentation error is small, there is always representativeness error, which causes the model results to deviate from the observation. Hamill (2001) suggested that if the observational errors are a significant fraction of the spread in the ensemble, then rank histograms should not be generated by ranking the observation relative to the sorted ensemble. Instead, the rank histogram should be generated by ranking the observation relative to a sorted ensemble with random observational noise added to each member.
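A minimal sketch of this construction, including the observational noise added to each member as suggested by Hamill (2001), is given below; it assumes water-level arrays at one station and is for illustration only.

    import numpy as np

    def rank_histogram(ensemble, obs, sigma_obs, seed=0):
        # ensemble: shape (N, n); obs: shape (N,); sigma_obs in metres.
        rng = np.random.default_rng(seed)
        N, n = ensemble.shape
        perturbed = ensemble + rng.normal(0.0, sigma_obs, size=(N, n))  # noise added to members
        ranks = np.sum(perturbed < obs[:, None], axis=1)                # rank of the observation
        return np.bincount(ranks, minlength=n + 1)                      # counts over n+1 ranks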
The specification of the observation noise standard deviation should not be a problem, since it is already specified and used for the data assimilation system anyway. Following this suggestion, we perturbed each ensemble member with random noise drawn from the observation error distribution.

The resulting rank histograms obtained with σobs = 0.10 m are presented in Figure 7.9. The rank histograms show that most of the observations are within the span of the ensemble. Without data assimilation, however, the observations populate the lower ranks more. For the period after November 2007, the rank histograms clearly indicate a negative bias, where the left extreme is overpopulated and the population decreases with increasing rank. The rank histograms for the ensemble after data assimilation suggest that the effect of data assimilation is to reduce the bias between observations and model results. This is clearly illustrated by the histograms associated with data after November 2007. The overpopulated extreme ranks are shifted to populate the middle ranks after data assimilation. There is not much difference between the patterns of the rank histograms after data assimilation using the Kalman gain obtained from the open-loop and 1st closed-loop iterations. After data assimilation, the rank histograms have a bell-like shape, which suggests that the ensemble spread is too large.

The rank histograms of the ensemble analysis with σobs = 0.05 m have a different shape from those with σobs = 0.10 m, as shown in Figure 7.10. Without data assimilation, more observations now fall in the extreme ranks, the extreme lower rank being more populated than the upper rank.
The effect of data assimilation can again be seen to reduce the bias by bringing more observations to the middle ranks. The figures also show that implementing gain estimates from later iterations produces rank histograms which are closer to flat. However, the rank histograms still contain overpopulated extreme ranks, which indicate a possible bias in the model. Some evaluations indicate that this may be due to the fact that the performance of the DCSM differs between locations. The area around the northern part of the Dutch coast, for example, is known to be difficult to reconstruct with the present grid size used in the DCSM. This may have caused the remaining bias indicated by the rank histograms.

Figure 7.9: Rank histograms for water level aggregated over all observations, with σobs = 0.10 m: (a) without data assimilation, and with data assimilation using (b) the open-loop, (c) the 1st closed-loop, and (d) the 2nd closed-loop Kalman gain.

Figure 7.10: As Figure 7.9, with σobs = 0.05 m. Note that the first panel has a different scale from the others.

7.5.4 Water level residual rms

A final check of the performance of the analysis system is to check the actual water level residual, computed as the difference between the observed data and the ensemble mean. A good analysis system should produce an ensemble analysis with an rms residual which is smaller than the one without data assimilation, at all locations. Here, we also check the effect of data assimilation using the gain matrix estimated from different iterations. To gain more insight into the performance of the analysis system, we split the presentation into two groups: assimilation stations and validation stations.

Figure 7.11 shows the rms residual at the assimilation stations obtained with σobs = 10 cm. Shown also in the figure is the rms residual of the deterministic forecast, obtained by running the DCSM with the control run of LAMEPS. From this figure we see that there is no significant difference between the analyses based on the deterministic and the ensemble system. The advantage of using an ensemble system compared to the deterministic one, as pointed out by Leith (1974), is not found here. A discussion about this is given in Section 7.7. After data assimilation, the rms residual decreases significantly, showing that the assimilation system has successfully steered the model closer to the observed data. The figure also shows that the smallest rms residual is obtained by the assimilation system using the open-loop gain. The rms residual increases, with smaller increments, for the assimilation systems using gains from later iterations. The analysis algorithm in Section 7.2 shows that the uncertainty of the model results is reduced after each iteration.
After each iteration, this results in an assimilation system which puts less and less weight on the observed data. Therefore, the rms water-level residual increases with the iterations.

Figure 7.11: RMS of water-level residual at assimilation stations, with σobs = 0.10 m.

Figure 7.12: RMS of water-level residual at validation stations, with σobs = 0.10 m.

The rms residual at the validation stations obtained with σobs = 10 cm, presented in Figure 7.12, shows that the data assimilation system has also successfully forced the model to be closer to the observed data at the validation stations. This indicates that the covariance structure estimated using the procedure described in this paper is spatially consistent, so that it distributes the innovations at the assimilation stations consistently to other locations as well. There is not much difference in the rms residual obtained using data assimilation systems with the gain matrix from different iterations, except at the stations located in the northern part of the Dutch coast, i.e. Harlingen, Huibertgat, and Delfzijl.

The results obtained with σobs = 5 cm are shown in Figures 7.13-7.14 for the assimilation and validation stations, respectively. Here, we see again that the analysis system has successfully steered the model closer to the observations at all stations. At the assimilation stations, the rms residual with data assimilation is smaller than the one obtained with σobs = 10 cm. The reason for this is that, with a smaller standard deviation of the observation error, the analysis system puts more weight on the observations. As a result, it pulls the model even closer to the data and therefore produces a smaller rms residual. However, at the validation stations, the rms residual obtained with assimilation using the open-loop gain is larger than the one with σobs = 10 cm. The rms residual decreases as gain estimates from later iterations are used for assimilation. In the end, using the Kalman gain from the 2nd closed-loop step for data assimilation, there is not much difference between the systems with σobs = 10 cm and σobs = 5 cm. This indicates that for a smaller observation error standard deviation, the iteration becomes important in obtaining better analysis accuracy.

Figure 7.13: RMS of water-level residual at assimilation stations, with σobs = 0.05 m.

Figure 7.14: RMS of water-level residual at validation stations, with σobs = 0.05 m.

7.5.5 Discussion

For the system used in our study, it is traditional to set the observational error standard deviation to σobs = 10 cm. This value has been used for the operational data assimilation system with the DCSM. It covers both the instrumentation and the representativeness error. Moreover, it was also chosen to impose better computational conditioning of the data assimilation scheme. Perturbation of each ensemble member with observation error is also required for ensemble data assimilation to avoid underestimation of the analysis error covariance matrix.
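A minimal sketch of this perturbed-observation update, corresponding to equation (7.10) applied member by member, is given below; the array names and shapes are illustrative assumptions.

    import numpy as np

    def analysis_update(xf_members, y_obs, H, K, sigma_obs, seed=0):
        # xf_members: forecast ensemble at one analysis time, shape (n, p);
        # y_obs: observations (m,); H: observation operator (m, p); K: gain (p, m).
        rng = np.random.default_rng(seed)
        xa = np.empty_like(xf_members)
        for i, xf in enumerate(xf_members):
            nu = rng.normal(0.0, sigma_obs, size=y_obs.shape)   # observation noise draw
            xa[i] = xf + K @ (y_obs - H @ xf - nu)              # member-wise update, eq. (7.10)
        return xa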
As pointed out earlier, a good analysis system is one which produces an analysis ensemble with a small analysis ensemble mean rms residual and a spread that can represent the statistics of the residual. We have seen that the analysis systems implemented in this study have successfully reduced the rms residual at all locations. The analysis systems have also been shown to reduce the bias between model and observations. However, the diagnostics used to check the reliability of the analysis ensemble indicate that the ensemble is overdispersive. On the other hand, as also argued by Leeuwenburgh (2007), this finding may also indicate that the specified observation noise variance is too large. In fact, considering the size of the ensemble spread in Figure 7.6, it can be seen that the observation noise is more dominant in determining the total spread of the ensemble used in the verification. We have, therefore, also performed an experiment with σobs = 5 cm. The resulting ensemble is shown to have a better reliability property, indicated by a flat rank histogram. However, the rank histograms also indicate that the ensemble is still biased. Nevertheless, the results presented in the remainder of this paper are the ones obtained using σobs = 5 cm.

7.6 Forecast experiment

We have also performed forecast experiments using the DCSM as the storm surge forecast model and the forecast wind fields of LAMEPS as the input forcing ensemble. Here we are interested in checking the effect of data assimilation on the forecast performance. As in the operational system, the data assimilation is used to provide more accurate initial conditions for the forecast runs. A more accurate initial condition is naturally expected to produce a more accurate prediction, at least for short-term forecasts. This section describes and discusses the forecast performance of the ensemble system studied in this paper, in terms of ensemble spread and skill.

Following the fact that LAMEPS is analyzed once per day at 18:00, the forecast runs in this experiment are also performed once a day, starting from 18:00 and running up to 60 h ahead. A 24 h hindcast run is performed on each ensemble member to provide the initial ensemble, which represents the uncertainty of the initial condition, for each forecast run. That is, the ensemble at the final time of a hindcast period is used as the initial ensemble for the next forecast run.

Figure 7.15 presents the rank histograms of the forecast ensemble obtained with σobs = 0.05 m, aggregated over all observation locations and over all forecast runs in the interval 01 August 2007 - 31 January 2008. For the intervals 0-6 h and 6-12 h, the rank histograms show a U-shape indicating underdispersion. This is consistent with the rank histograms of the analysis ensemble. After 12 h, the lower ranks of the histograms become more populated and the rank histograms become sloped. The sloped rank histograms indicate that after some time the model results are biased.
The same pattern is found with the setup using σobs = 0.10 m. However, instead of showing underdispersion, the rank histograms obtained with σobs = 0.10 m show overdispersion in the short-term forecasts. The rank histograms become sloped for longer forecast times. We also note that there is no visible difference between the rank histograms of the forecast ensemble initialized from hindcasts using gain estimates obtained from different iterations, nor from the one without data assimilation. As pointed out earlier, connecting the wind ensemble time series at 18 h to the initial ensemble of the next forecast run reduces the bias significantly, resulting in a more reliable initial ensemble. For longer-term forecasts, the reliability is expected to be similar since the effect of the initial condition vanishes after some time. This indicates that evaluating ensemble reliability alone is not sufficient for checking the impact of data assimilation on forecast performance.

Figure 7.15: Rank histograms of the forecast ensemble for different forecast intervals, initialized with the hindcast using the 2nd closed-loop gain, with σobs = 0.05 m.

The other property of the forecast ensemble that we check is the ensemble forecast skill. The measure of ensemble forecast skill used here is the rms of the ensemble-mean forecast water level residual. For illustration, Figure 7.16 presents the results associated with an assimilation station located at Vlissingen and a validation station at IJmuiden. The results are averaged over all forecast runs within the period 01 August 2007 to 31 January 2008, and a 6 h running average is applied to the resulting rms forecast residual time series. Shown in the figures is the rms residual of the ensemble-mean forecast, preceded by hindcasts with gains estimated from different iterations and without any hindcast. Shown also in the figures is the rms residual of the deterministic forecast, which is obtained by forcing the DCSM with the control-run member of the LAMEPS wind, without any data assimilation.

In general, the rms residual increases with forecast time, indicating that the forecast performance degrades in time. This can be explained by the fact that after some time the model results become negatively biased, as indicated by the rank histograms earlier. The figures also suggest that the forecast residual contains a periodic component. The periodic component is likely due to the remaining astronomical tide components of the water level, which cannot be predicted perfectly by the DCSM. The performance of the short-term forecasts of the deterministic system is similar to that of the ensemble-mean forecast without hindcast.
After some time, however, the deterministic forecast and the ensemble-mean forecast without hindcast diverge from each other, and the deterministic forecast performs better. This shows that, contrary to what is expected, the ensemble prediction system degrades the average performance of the corresponding deterministic single-forecast system, instead of improving it. A possible explanation of why this occurs is discussed in the next section.

Figure 7.16: RMS of forecast water-level residual vs forecast horizon, for the observation stations located at Vlissingen and IJmuiden.

The figures also suggest that applying data assimilation improves the performance of the short-term forecasts. The improvement occurs at all stations used in this study, at both the assimilation and the validation stations. This may indicate that all gain estimates are reasonably consistent in spreading the correction to other model variables and locations. As suggested by the figures, on average, the improvement obtained using the gain estimate from the open-loop step is smaller than that obtained from later iterations. Even though the analysis residual, as shown in the previous section, suggests that the hindcast using the open-loop gain gives initial conditions that are closest to the observations, the performance of forecasts starting from these initial conditions is found to deteriorate more quickly than that of forecasts initialized with hindcasts using gain estimates from later iterations. This shows that the iterations of the gain-estimating algorithm proposed in this study make the gain estimates more effective in spreading the corrections to other model variables and locations. Based on these figures, however, we see that there is hardly any difference between forecasts initialized from hindcasts using the 1st and the 2nd closed-loop gain estimates.

The length of the forecast period over which the performance is improved relative to the forecasts without data assimilation depends on the location of the stations. For the locations along the Dutch coast, the improvement lasts up to about 12 h. This period is approximately equal to the time required by a gravity wave to travel from the open ocean, near the north-west open boundary of the DCSM, to the Dutch coast (a rough estimate of this travel time is sketched below). The data assimilation may have incorrectly updated the model states in this area. In turn, the incorrectly updated state propagates as a gravity wave along the British coast to the Dutch coast and deteriorates the forecast performance when it arrives at these locations. The effect of the initial condition vanishes after some time, and the forecast performance eventually converges to that of the ensemble-mean forecast without data assimilation.
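The following back-of-the-envelope calculation illustrates the travel-time argument; the depth and distance are illustrative assumptions, not values taken from the DCSM configuration.

import math

# Order-of-magnitude check of the ~12 h improvement window: travel time of a
# shallow-water gravity wave from the north-west open boundary to the Dutch
# coast.
g = 9.81            # gravitational acceleration [m/s^2]
depth = 80.0        # assumed representative water depth along the path [m]
distance = 1.2e6    # assumed path length from the NW boundary to the Dutch coast [m]

phase_speed = math.sqrt(g * depth)              # long-wave phase speed c = sqrt(g*H)
travel_time_h = distance / phase_speed / 3600.0
print(f"c = {phase_speed:.1f} m/s, travel time = {travel_time_h:.1f} h")
# With these assumptions the travel time is roughly 12 h, of the same order as
# the period over which the forecasts benefit from the update.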
7.7 Comparison with a steady-state Kalman filter

In the previous sections, we have shown that the analysis systems based on the ensemble forecast have successfully brought the DCSM closer to the observation data. Moreover, short-term forecasts initialized from those analyses have also been shown to perform better than the ones without data assimilation. Since the operational system couples the DCSM with a steady-state Kalman filter, a question remains as to whether the ensemble-based analysis system performs better than one based on a steady-state Kalman filter. This section presents and discusses some results of this comparison.

For the comparison, we use a steady-state Kalman filter prepared in our earlier work (Sumihar et al. 2009). Here, the Kalman gain is computed by using a two-sample Kalman filter implemented on the DCSM with two different wind fields as input forcing. The two wind fields are obtained from two different meteorological centers, namely KNMI and the UK Met Office. The steady-state Kalman gain used for this comparison is obtained from the fifth closed-loop iteration. At this step, the two-sample Kalman filter iteration has converged, and it therefore provides an optimal gain for that steady-state Kalman filter setup. For the experiments with the steady-state Kalman filter, we use the control run of the LAMEPS wind as input forcing to the DCSM. The period of the experiment is from 01 August 2007 until 31 January 2008.

The comparison between the ensemble-based analysis system and the one based on the steady-state Kalman filter is performed in terms of their skill, defined by the rms water-level residual. Figure 7.17 gives an example of the evolution of the rms hindcast water-level residual obtained using the ensemble-based analysis system with open-loop gain and using the steady-state Kalman filter. This figure shows that the hindcast performance of the two analysis systems varies in time: one performs better than the other during some time intervals and worse during others. Shown also in Figure 7.17 is the observed surge, scaled by a factor of 0.2. The figure suggests that the performance variation of the two analysis systems is related to the size of the observed surge.

Figure 7.17: Timeseries of the rms of the hindcast water-level residual obtained using the ensemble-based analysis system with open-loop gain and using the steady-state Kalman filter, together with the observed surge at Hoek van Holland scaled by a factor of 0.2.

Therefore, in the following we stratify the data into three groups according to the size of the observed surge: positive surge, defined as the times when the observed surge is larger than 20 cm; negative surge, when the surge is smaller than -20 cm; and small surge, when the surge is between -20 and 20 cm. The comparison between the analysis systems is performed for each category (a minimal sketch of this stratification is given below).
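A minimal sketch of the stratification and the per-category skill computation, assuming hypothetical arrays of observed surge and hindcast residual sampled at the same times:

import numpy as np

def stratify_by_surge(obs_surge, threshold=0.20):
    # obs_surge: (n_times,) observed surge [m]; threshold of 20 cm as in the text.
    return {
        "negative surge": np.where(obs_surge < -threshold)[0],
        "small surge":    np.where(np.abs(obs_surge) <= threshold)[0],
        "positive surge": np.where(obs_surge > threshold)[0],
    }

def rms_per_class(residual, classes):
    # residual: (n_times,) hindcast water-level residual at the same times.
    return {name: float(np.sqrt(np.mean(residual[idx] ** 2)))
            for name, idx in classes.items() if idx.size > 0}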
Figure 7.18 presents the resulting rms hindcast water-level residual averaged over all assimilation stations. To show the impact of data assimilation, we also show in the figure the rms water-level residual associated with the deterministic run, where the DCSM is forced with the control run of the LAMEPS wind. Moreover, shown in the figure as well is the rms water-level residual associated with the ensemble system without data assimilation. From the results of the control run and the ensemble run without data assimilation, we can see that both systems are similarly skillful in reproducing both positive and negative surge, and both give their smallest residual during times of small surge. For positive surge, the rms residual associated with the ensemble system without data assimilation is slightly smaller than that of the control run; however, the opposite is observed for negative surge.

Figure 7.18: RMS of hindcast water-level residual averaged over all assimilation stations, stratified into positive surge (surge larger than +20 cm), negative surge (surge smaller than -20 cm), and small surge (surge between -20 and 20 cm).

With data assimilation, for the ensemble-based analysis systems, we can see that the rms residual increases as the gain estimate from a later iteration is used, as expected from theory. For positive surge, the rms residual associated with the ensemble-based analysis systems is consistently smaller than that of the steady-state Kalman filter. The figure also suggests that both the ensemble-based systems and the steady-state Kalman filter perform equally well during small surge. However, the ensemble-based analysis systems perform consistently worse than the steady-state Kalman filter in reproducing negative surge. This may indicate that the resulting storm surge ensemble during negative surge produces a worse estimate of the forecast error covariance than the one used in the steady-state Kalman filter.

To gain more insight into the ensemble characteristics during the three surge types, we show in Figure 7.19 the resulting water-level ensemble spread averaged over all assimilation stations. Here, we show the ensemble spread at the 2nd closed-loop iteration; hence, it gives the best estimate relative to the earlier iterations. We also show in the same figure the forecast error standard deviation with respect to water level, averaged over all assimilation stations, as used in the steady-state Kalman filter. The figure clearly shows that the ensemble spread is smallest during negative surge and largest during positive surge. Moreover, the figure also suggests that, relative to the value assumed in the steady-state Kalman filter, the resulting ensemble underestimates the forecast error during negative surge. This results in an analysis system with a poorer performance than the steady-state Kalman filter.

Figure 7.19: Water-level ensemble spread averaged over all assimilation stations.

Note that the ensemble spread is just one factor determining the analysis performance. Another factor influencing the analysis performance is the correlation structure. When these factors are incorrectly specified, the analysis performance degrades quickly. Nevertheless, this figure suggests that there is a systematic underestimation of the ensemble spread in times of negative surge, which is also responsible for the poorer analysis performance.
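A scalar toy calculation, with hypothetical numbers of the same order of magnitude as the spreads in Figure 7.19, illustrates why an underestimated spread hurts the analysis: the Kalman gain shrinks with the assumed forecast error variance, so the update pulls the model less strongly towards the observations than it should.

# Scalar illustration (hypothetical numbers): with observation operator H = 1,
# the Kalman gain is K = s_f^2 / (s_f^2 + s_o^2), so it shrinks with the assumed
# forecast error s_f and the analysis stays close to the (possibly biased) model.
sigma_obs = 0.05                       # assumed observation error std [m]
for sigma_f in (0.005, 0.02, 0.05):    # under- to well-estimated forecast spread [m]
    gain = sigma_f**2 / (sigma_f**2 + sigma_obs**2)
    print(f"spread = {sigma_f:.3f} m  ->  gain = {gain:.2f}")
# A spread of 0.005 m gives a gain of about 0.01, i.e. the observations are
# nearly ignored; a spread of 0.05 m gives a gain of 0.5.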
To make the presentation complete, we show in Figure 7.20 the rms hindcast water-level residual averaged over all validation stations. The general pattern observed at the assimilation stations can also be seen here, at least for the performance of the analysis system using the 2nd closed-loop Kalman gain estimate: the ensemble-based analysis system performs slightly better than the steady-state Kalman filter during positive surge, equally well during small surge, and worse during negative surge. This figure also suggests that, without data assimilation, the DCSM performs worse during negative surge than during positive surge. However, we have checked that this is due to the fact that the skill of the DCSM differs between locations. The DCSM is known to perform inadequately in reproducing the water level in the northern part of the Dutch coast. The detailed geometrical structure in this area cannot be well resolved with the present grid size of the DCSM, and the area is shallow. It is even more difficult to predict the water level at times of negative surge.

Figure 7.20: As Figure 7.18, for the validation stations.

We have also performed some forecast runs initialized from hindcast runs using the steady-state Kalman filter, with the control-run member of the LAMEPS wind as input forcing. For comparing the forecast performance, we group the forecast runs, based on the initial time of each forecast, into the three categories described earlier. For illustration, we show in Figure 7.21 the rms of the forecast water-level residual at Vlissingen, averaged over the number of times each category occurs within the whole period of the experiment. To increase the visibility of the graphs, we limit the presentation to the first 20 h of forecast time. Shown in the figures as well are the results associated with the deterministic forecast, obtained by running the DCSM with the control run of LAMEPS, and with the ensemble forecast without data assimilation. These figures suggest that during positive surge the ensemble forecast performs slightly better than the single-member control forecast. During small and negative surge, however, the performance of the ensemble forecast is slightly worse than that of the control run. The advantage of an ensemble system over a single deterministic forecast (Leith 1974) is thus found only at times of positive surge.
Figure 7.21: RMS of forecast water-level residual vs forecast horizon at Vlissingen, stratified into positive surge (surge larger than +20 cm), negative surge (surge smaller than -20 cm), and small surge (surge between -20 and 20 cm).

Note that all these observations hold at most other locations as well. For the forecasts initialized from hindcasts, the figures indicate that, as shown earlier, all analysis systems produce improvements in the short-term forecasts over the control run as well as over the ensemble forecast without data assimilation. The figures also suggest that, in general, the forecasts initialized from hindcasts using the ensemble-based analysis systems do not perform better than those using the steady-state Kalman filter. At some locations, like Vlissingen, the performance of some of the ensemble-based systems is slightly better than that of the steady-state Kalman filter. At other locations, their performance in forecasting positive and small surges is comparable with that of the steady-state Kalman filter. However, the steady-state Kalman filter is found to perform better than the ensemble-based systems for predicting negative surge. This result is found consistently at the stations located along the Dutch coast.

A reason for this may be the difficulty of the atmospheric model in reconstructing the seaward wind near the land-sea transition. It is known, for example, that the Dutch version of HIRLAM systematically underestimates the seaward wind above the sea near the land-sea transition (de Vries, 2009, personal communication). Another possible reason is that this area is located relatively close to the south-east boundary of LAMEPS. The wind initiated from this boundary is responsible for simulating the negative surge along the Dutch coast as well as along the British east coast. Since this area is located very close to the south-east open boundary, the wind ensemble has not travelled sufficiently long when it arrives there. Hence, the ensemble has not yet spread sufficiently and therefore underestimates the model uncertainty. As a consequence, our method performs even worse than the steady-state Kalman filter in reconstructing negative surges.

The resulting ensemble-based analysis performance may also be associated with how the wind ensemble is generated. The ensemble is generated by perturbing the initial and boundary conditions. The perturbations are expected to grow into different trajectories, especially when the weather system is unstable. The seaward wind in this area, that is the southerly wind, is probably associated with a stable weather system.
Thus, in this period, the ensemble members do not grow apart sufficiently and the ensemble spread becomes too narrow. Moreover, the wind ensemble is generated with an optimization interval of 48 h (Frogner and Iversen 2001). That is, the ensemble spread is expected to dominate the forecast error at the end of the 48 h interval and possibly beyond. In our study, only the ensemble from the first 24 h is used to represent the error covariance. This naturally results in a wind ensemble with a spread smaller than its optimal value. We might try to use the ensemble at later forecast times to compute the error covariance estimate. However, care must be taken with the bias, which appears in the model forecasts even after only 12 h of forecast time, as indicated by the rank histograms of the forecast ensemble in Figure 7.15.

The LAMEPS wind ensemble is generated by perturbing the initial and boundary conditions of the Norwegian version of HIRLAM. This implicitly assumes that the forecast model is perfect. In fact, a numerical forecast model always contains model error. An approach to generating forecast ensembles that takes the model error into account is the multi-model approach (e.g. Palmer et al. 2004; Hagedorn et al. 2005). A multi-model ensemble may consist of models from different centers or of different versions of the same model (e.g. Houtekamer et al. 1996). The differences between the forecast models may simulate some of the flow-dependent, unexplained, intermittent sources of model error. Some studies have shown that the multi-model approach performs better than single-model ensemble systems (Atger 1999; Ziehmann 2000). A further study using the multi-model ensemble approach is required to see whether it yields a better performance, both in the atmospheric models producing the wind ensemble and in the storm surge forecasting models.

It should be noted here that in most cases and at most locations, the performance of the ensemble forecasts initialized from hindcasts using the 1st closed-loop Kalman gain estimate is better than that of those using the open-loop Kalman gain estimate, and it improves slightly further with the 2nd closed-loop Kalman gain. This indicates the importance of the iterative procedure for improving the Kalman gain estimate. However, this is not true for the stations located in the northern part of the Dutch coast during negative surge. At these locations, the forecast performance deteriorates as Kalman gains from later iterations are used for the analysis. This is likely because this area is very shallow, which makes it difficult to predict the water level there; it is even more difficult to predict the water level during negative surge.

7.8 Conclusion

We have presented in this paper our work with the Dutch Continental Shelf Model (DCSM), an operational model used for storm surge forecasting in the Netherlands. The uncertainty of this model is known to be mainly due to uncertain wind input. To correct the model for such errors, in the operational system the model is coupled with a data assimilation system based on a steady-state Kalman filter. The steady-state Kalman filter is developed based on the assumptions that the forecast error is stationary in time and homogeneous in space. In this work, we have relaxed these assumptions by representing the forecast uncertainty with an ensemble of storm surge forecasts; the sketch below contrasts the two representations.
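A small illustration of the contrast, with made-up variance, length scale and ensemble: a homogeneous, isotropic covariance is fully prescribed by a distance-dependent correlation function, whereas the ensemble-based estimate is a sample covariance of ensemble anomalies (in the real application the members are DCSM runs forced with the LAMEPS winds; here random fields stand in for model output).

import numpy as np

n = 50                                   # hypothetical number of coastal grid points
x = np.linspace(0.0, 500.0, n)           # along-coast coordinate [km]
d = np.abs(x[:, None] - x[None, :])      # pairwise distances [km]

# (a) Homogeneous, isotropic model: one variance everywhere, correlation
#     depending on distance only (assumed std and length scale).
sigma, L = 0.10, 100.0
P_isotropic = sigma**2 * np.exp(-0.5 * (d / L) ** 2)

# (b) Ensemble (flow-dependent) estimate: sample covariance of the anomalies
#     of a 20-member ensemble.
rng = np.random.default_rng(1)
ensemble = rng.multivariate_normal(np.zeros(n), P_isotropic, size=20)
anomalies = ensemble - ensemble.mean(axis=0)
P_ensemble = anomalies.T @ anomalies / (anomalies.shape[0] - 1)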
The storm surge forecast ensemble itself is generated by forcing the DCSM with each member of the 10 m wind ensemble of LAMEPS, an operational ensemble weather prediction system of the Norwegian Meteorological Institute. To account for the fact that the wind error is correlated in time, each member of the wind ensemble is considered to deviate from the truth by a colored-noise term, modelled by an AR(1) process. The temporal correlation parameter is determined empirically from the wind ensemble.

The data assimilation used in this study is an extension of the two-sample Kalman filter published earlier (Sumihar et al. 2008). The two-sample Kalman filter is an iterative procedure for computing a steady-state forecast error covariance, and hence a steady-state Kalman gain, using two samples of the system of interest, by averaging the difference between the two samples over time. In this work, the algorithm is generalized to work with n samples, where n > 2. By using more ensemble members, the averaging can be done over a shorter time. The experiment is performed by dividing the experiment period into shorter time windows, and the data assimilation procedure is applied within each time window. As a first step, we have chosen a time window length of 24 h.

The resulting analysis ensemble has been shown to have a smaller rms water-level residual than the one without data assimilation, at both the assimilation and the validation stations. This indicates that the estimated covariances have a structure consistent enough to distribute the corrections properly to locations other than the assimilation stations. The forecasts initialized from these analyses are more accurate than the ones without data assimilation, at least for the short-term forecasts.

A comparison with a steady-state Kalman filter has been performed. For the available data, the results do not suggest that the ensemble-based assimilation systems yield a much better performance than the steady-state Kalman filter. For large positive surges the ensemble-based predictions are a little better than the steady-state Kalman filter. On the other hand, there is an indication that the steady-state Kalman filter performs better than the ensemble-based systems for predicting negative surges. The ensemble spread during negative surge is found to be too narrow. Further effort is required to develop methods for generating a better ensemble.

8 Conclusion

This thesis presents several methods to improve the accuracy of water level prediction. As the first method to improve the water level forecast accuracy, we used the method developed by Heemink et al. (1991) to predict periodic components of a harmonic analysis residual. The residual of harmonic analysis usually contains some periodic components with slowly varying harmonic parameters. Each periodic component oscillates with the frequency of a tidal constituent. The idea of this method is to model the harmonic analysis residual, called the observed surge in this thesis, as the sum of several such periodic components. A state-space representation and a measurement model have been developed for this summation, with the harmonic parameters as the state of the model. The time-varying harmonic parameters of the periodic components within an observed surge timeseries are estimated by using a Kalman filter; a minimal sketch of one such formulation is given below.
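The sketch below gives one common state-space formulation of this idea (slowly varying amplitudes as a random-walk state, harmonics as the measurement operator); it is an illustration under assumed noise variances, not necessarily the exact formulation of Heemink et al. (1991).

import numpy as np

def harmonic_kf(y, times, omegas, q=1e-6, r=1e-3):
    # State [a_1, b_1, a_2, b_2, ...]: slowly varying harmonic parameters for
    # constituents with angular frequencies omegas, evolving as a random walk.
    # Measurement: y_t = sum_k a_k cos(w_k t) + b_k sin(w_k t) + noise.
    w = np.asarray(omegas, dtype=float)
    n = 2 * w.size
    s = np.zeros(n)            # state estimate (harmonic parameters)
    P = np.eye(n)              # state error covariance
    Q, R = q * np.eye(n), r    # assumed state and measurement noise variances
    estimates = []
    for t, yt in zip(times, y):
        P = P + Q                          # time update (random walk)
        H = np.empty(n)                    # measurement operator at time t
        H[0::2], H[1::2] = np.cos(w * t), np.sin(w * t)
        innovation = yt - H @ s            # observed surge minus predicted harmonics
        S = H @ P @ H + R
        K = P @ H / S                      # Kalman gain
        s = s + K * innovation
        P = P - np.outer(K, H @ P)
        estimates.append(s.copy())
    return np.array(estimates)             # time-varying harmonic parameters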
The estimated harmonic parameters are used to forecast the periodic components, which in turn are used as a correction to the forecast made with harmonic analysis. This method has been shown to be able to extract the periodic components from within the observed surge, and it can be used to correct the forecast based on harmonic analysis. It has been found to work reasonably well during calm periods. During stormy periods, however, the method may produce worse predictions at some locations. This may be due to the nonlinear interaction between the tides and the meteorological component. This prevents the method from increasing the accuracy of the overall forecast based on harmonic analysis.

The other methods that we explored in this research fall into the category of data assimilation for model reinitialization. Almost all data assimilation methods require the specification of forecast error statistics. In reality, however, it is difficult to estimate such statistics, since the true state is never known exactly. One common approach to estimating the error statistics is to use the difference between several different, similarly skillful models as a proxy for the unknown true error. To accommodate this study, we first developed a method for computing a steady-state Kalman filter for the case when only two samples of the system are available. We call this method the two-sample Kalman filter. Using idealized experiments, where all statistics are known and errors are generated with a random generator, we have been able to show that our method can reproduce the solution obtained by a Kalman filter.

We have implemented the two-sample Kalman filter with the Dutch Continental Shelf Model (DCSM), a numerical model for storm surge prediction that is operational in the Netherlands. Here, the two samples of the DCSM are generated by running the DCSM twice with two different wind fields produced by two meteorological centers. The first wind field is produced by the Dutch version of the High Resolution Limited Area Model (HIRLAM) operational at the Royal Netherlands Meteorological Institute (KNMI), while the second one is from the UK Met Office (UKMO). The two wind fields are found to contain a systematic difference, and some care must be taken to eliminate this systematic difference before using them in the two-sample Kalman filter. The data assimilation using the Kalman gain estimated from this difference has successfully brought the DCSM closer to the measurements, at all assimilation and validation stations used in this study. Forecasts initialized from these analyses are found to be more accurate than the ones without data assimilation, at least in the short term. Due to the physics of gravity wave propagation in the DCSM area, the period of improved accuracy is limited to about 12 h for the locations along the Dutch coast. We have also compared the forecast performance to that obtained using a steady-state Kalman filter based on an isotropic assumption. The system based on the UKMO-HIRLAM difference is found to outperform the isotropic-based steady-state Kalman filter.

The two-sample Kalman filter is based on the assumption that the model error is stationary, so that the forecast error covariance can be estimated by averaging over time. In reality, the model error varies in time along with the actual state. To further improve the accuracy, we have extended the two-sample Kalman filter algorithm to work with more than two samples. By working with more samples, the averaging can be done over a shorter time; the covariance-from-differences idea underlying the original two-sample algorithm is sketched below.
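A crude sketch of that idea, with hypothetical shapes and without the iterative open-loop/closed-loop refinement of the actual algorithm (Sumihar et al. 2008): if the two runs deviate from the truth with independent errors of equal covariance P, the time-averaged outer product of their difference estimates 2P, from which a steady-state gain follows.

import numpy as np

def gain_from_two_samples(x1, x2, H, R):
    # x1, x2: (n_times, n_state) state trajectories of two model runs driven by
    #         different wind fields; H: (n_obs, n_state) observation operator;
    #         R: (n_obs, n_obs) observation error covariance.
    # If both runs deviate from the truth with independent errors of equal
    # covariance P, then E[(x1 - x2)(x1 - x2)^T] = 2 P.
    d = x1 - x2
    P = d.T @ d / (2.0 * d.shape[0])            # time-averaged outer products / 2
    S = H @ P @ H.T + R
    return P @ H.T @ np.linalg.inv(S)           # K = P H^T (H P H^T + R)^(-1)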
This relaxes the assumption of stationarity, and by doing so the performance of the analysis system can be improved further. We have implemented the extended version of the two-sample Kalman filter using the wind ensemble of LAMEPS. This is an ensemble weather forecast system operational at the Norwegian Meteorological Institute; it consists of 20 perturbed members and one control run. Here, following the fact that LAMEPS is analysed once a day at 18:00, the computation of the forecast error covariance is performed over a 24 h window and over all ensemble members. Like the steady-state Kalman filter, the ensemble-based analysis system has successfully steered the DCSM closer to the measurements, and the short-term forecasts are improved relative to the ones without data assimilation. We have also compared the performance to the steady-state Kalman filter based on the UKMO-HIRLAM difference. The results of the experiments do not give clear evidence that the ensemble-based method is better than the steady-state Kalman filter. In contrast, there is an indication that the steady-state Kalman filter performs better than the ensemble-based system in predicting negative surge, at least for the locations along the Dutch coast. The resulting ensemble spread during negative surge is found to be narrower than the standard deviation assumed by the steady-state Kalman filter. Further effort is required to develop methods for generating a better ensemble.

The LAMEPS wind ensemble is generated by perturbing the initial and boundary conditions of the Norwegian version of HIRLAM. This implicitly assumes that the forecast model is perfect. For future research, we recommend studying the use of a multi-model wind ensemble as input forcing to the DCSM. By using different forecast models, some of the unavoidable model error may be better simulated. This may prevent the underestimation of the forecast error covariance. Moreover, the forecast models should cover the DCSM area in such a way that the DCSM area lies at a sufficiently large distance from the model boundaries. This may help improve the performance in predicting negative surge.

All the methods explored in this study focus on improving the forecast accuracy by correcting the random component of the error. In fact, we have seen in this study that bias is also a pronounced error component. Therefore, we recommend that future studies also focus on correcting the systematic error component.

Bibliography

Alves, O. and Robert, C. (2005) Tropical pacific ocean model error covariances from monte carlo simulations, Q.J.R.Meteorol.Soc., 131, pp. 3643–3658.
Anderson, J. L. and Anderson, S. L. (1999) A monte carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts, Monthly Weather Review, 127, pp. 2741–2758.
Atger, F. (1999) The skill of ensemble prediction systems, Monthly Weather Review, 127, pp. 1941–1953.
Bertino, L., Evensen, G., and Wackernagel, H. (2002) Combining geostatistics and kalman filtering for data assimilation in an estuarine system, Inverse Problems, 18, pp. 1–23.
Borovikov, A., Rienecker, M., Keppenne, C., and Johnson, G. C. (2005) Multivariate error covariance estimates by monte carlo simulation for assimilation studies in the pacific ocean, Monthly Weather Review, 133, pp. 2310–2334.
Buehner, M. (2004) Ensemble-derived stationary and flow-dependent background error covariances: Evaluation in a quasi-operational nwp setting, Q.J.R.Meteorol.Soc., 128, pp. 1–30.
Buizza, R. and Palmer, T. N.
(1995) The singular-vector structure of the atmospheric global circulation, Journal of The Atmospheric Sciences, 52, 9, pp. 1434–1455. — (1998) Impact of ensemble size on ensemble prediction, Monthly Weather Review, 126, pp. 2503–2518. Candille, G., Cote, C., Houtekamer, P., and Pellerin, G. (2007) Verification of an ensemble prediction system against observations, Monthly Weather Review, 135, pp. 2688–2699. Canizares, R., Madsen, H., Jensen, H. R., and Vested, H. J. (2001) Developments in operational shelf sea modelling in danish waters, Estuarine, Coastal and Shelf Science, 53, pp. 595–605. Daley, R. (1991) Atmospheric Data Analysis, Cambridge University Press. Daley, R. and Barker, E. (2001) Navdas: formulation and diagnostics, Monthly Weather Review, 129, pp. 869–883. de Meyer, F. (1984) Time-variant tidal analysis and the kalman filter, Annales Geophysicae, 2, 5, pp. 513–528. 146 BIBLIOGRAPHY Dee, D. P. (1991) Simplification of the kalman filter for meteorological data assimilation, Q.J.R.Meteorol. Soc., 117, pp. 365–384. — (1995) On-line estimation of error covariance parameters for atmospheric data assimilation, Monthly Weather Review, 123, pp. 1128–1145. Demirov, E., Pinardi, N., Fratianni, C., Tonani, M., Giacomelli, L., and de Mey, P. (2003) Assimilation scheme of the mediterranean forecasting system: operational implementation, Annales Geophysicae, 21, pp. 189–204. Devenyi, D., Benjamin, S. G., and Weygandt, S. S. (2004) The ruc 3dvar: operational performance and recent improvements, in: 20th Conf. on Weather Analysis and Forecasting, Seattle,WA, Amer. Meteoc. Soc., cD-ROM,P1.20. Evensen, G. (1994) Sequential data assimilation with a nonlinear quasi-geostrophic model using monte carlo methods to forecast error statistics, Journal of Geophysical Research, 99, C5, pp. 10143–10162. — (1996) Advanced data assimilation for strongly nonlinear dynamics, Monthly Weather Review, 125, pp. 1342–1354. — (2003) The ensemble kalman filter: theoretical formulation and practical implementation, Ocean Dynamics, 53, pp. 343–367. Fenton, G. A. and Griffiths, D. V. (2007) Random field generation and the local average subdivision method, in: Probabilistic Method in Geotechnical Engineering (D. V. Griffiths and G. A. Fenton, eds.), Springer Vienna, pp. 201–223. Fisher, M. (2003) Background error covariance modelling, ECMWF Seminar. Frogner, I.-L., Haakenstad, H., and Iversen, T. (2006) Limited-area ensemble predictions at the norwegian meteorological institute, Q.J.R.Meteorol.Soc., 132, pp. 2785–2808. Frogner, I.-L. and Iversen, T. (2001) Targeted ensemble prediction for northern europe and parts of the northern atlantic ocean, Tellus, 53A, pp. 35–55. — (2002) High-resolution limited-area ensemble predictions based on low-resolution targeted singular vectors, Q.J.R.Meteorol.Soc., 128, pp. 1321–1341. Fu, L.-L., Fukumori, I., and Miller, R. N. (1993) Fitting dynamic models to the geosat sea level observations in the tropical pacific ocean. part ii: A linear wind-driven model, Journal of Physical Oceanography, 23, pp. 2162–2181. Fukumori, I. (1995) Assimilation of topex sea level measurements with a reduced-gravity shallow water model of the tropical pacific ocean, Journal of Geophysical Research, 100, C12, pp. 25027–25039. 147 148 BIBLIOGRAPHY Fukumori, I., Benveniste, J., Wunsch, C., and Haidvogel, D. B. (1993) Assimilation of sea surface topography into an ocean circulation model using a steady-state smoother, Journal of Physical Oceanography, 23, pp. 1831–1855. Fukumori, I. and Malanotte-Rizzoli, P. 
(1995) An approximate kalman filter for ocean data assimilation; an example with an idealized gulf stream model, Journal of Geophysical Research, 100, C12, pp. 6777–6793. Gaspari, G. and Cohn, S. (1999) Construction of correlation functions in two and three dimensions, Q.J.R.Meteorol.Soc., 125, pp. 723–757. Gerritsen, H. (2005) What happened in 1953? the big flood in the netherlands in retrospect, Phil. Trans. R. Soc. A, 363, pp. 1271–1291. Gerritsen, H., de Vries, J., and Philippart, M. (1995) The dutch continental shelf model, in: Quantitative Skill Assessment for Coastal Ocean Models. Coastal and Estuarine Studies (D. Lynch and A. Davies, eds.), American Geophysical Union, vol. 47, pp. 425–467. Hagedorn, R., Doblas-Reyes, F. J., and Palmer, T. N. (2005) The rationale behind the success of multi-model ensembles in seasonal forecasting - i. basic concept, Tellus, 57A, pp. 219–233. Hamill, T. M. (2001) Interpretation of rank histograms for verifying ensemble forecasts, Monthly Weather Review, 129, pp. 550–560. Hamill, T. M. and Withaker, J. S. (2001) Distance-dependent filtering of background error covariance estimates in an ensemble kalman filter, Monthly Weather Review, 129, pp. 2776–2790. Heemink, A. W. (1986) Storm surge prediction using Kalman filtering, Ph.D. thesis, Twente University of Technology, The Netherlands. — (1988) Two-dimensional shallow water flow identification, Appl. Math. Modelling, 12, pp. 109–118. — (1990) Identification of wind stress on shallow water surfaces by optimal smoothing, Stochastic Hydrology and Hydraulics, 4, pp. 105–119. Heemink, A. W., de Jong, B., and Prins, H. (1991) Adaptive harmonic analysis, Deutsche Hydrographische Zeitschrift, 44, 2, pp. 91–106. Heemink, A. W. and Kloosterhuis, H. (1990) Data assimilation for non-linear tidal models, International Journal for Numerical Methods in Fluids, 11, 2, pp. 1097–1112. Hirose, N., Fukumori, I., and Yoon, J. H. (1999) Assimilation of topex/poseidon altimeter data with a reduced gravity model of the japan sea, Journal of Oceanography, 55, pp. 53–64. BIBLIOGRAPHY Hollingsworth, A. and L¨onnberg, P. (1986) The statistical structure of short-range forecast errors as determined from radiosonde data. part i: the wind field, Tellus, 38A, pp. 111–136. Houtekamer, P. L., Lefaivre, L., Derome, J., Ritchie, H., and Mitchell, H. L. (1996) A system simulation approach to ensemble prediction, Monthly Weather Review, 124, pp. 1225–1242. Houtekamer, P. L. and Mitchell, H. L. (1998) Data assimilation using an ensemble kalman filter technique, Monthly Weather Review, 126, pp. 796–811. — (2001) A sequential ensemble kalman filter for atmospheric data assimilation, Monthly Weather Review, 129, pp. 123–137. Jazwinski, A. H. (1970) Stochastic processes and filtering theory, Academic Press Inc. Kalman, R. E. (1960) A new approach to linear filtering and prediction problems, Journal of Basic Engineering, 82, 1, pp. 35–45. Kantha, L. H. and Clayson, C. A. (1999) Numerical Models of Oceans and Oceanic Processes, Academic Press. Leendertse, J. J. (1967) Aspects of a computational model for long-period free-surface flow modelling, no. Report RM-5294-RR, Rand Corporation, Santa Monica, California, USA. Leeuwenburgh, O. (2007) Validation of an enkf system for ogcm initialization assimilating temperature, salinity, and surface height measurements, Monthly Weather Review, 135, pp. 125–139. Leith, C. E. (1974) Theoretical skill of monte carlo forecasts, Monthly Weather Review, 102, pp. 409–418. Lermusiaux, P. F. 
J., Chiu, C.-S., Gawarkiewicz, G., Abbot, P., Robinson, A., Miller, R., Haley, P., Leslie, W., Majumdar, S., Pang, A., and Lekien, F. (2006) Quantifying uncertainties in ocean predictions, Oceanography, 19, 1, pp. 81–93. Lorenc, A. C., Bell, R. S., and Macpherson, B. (1991) The meteorological office analysis correction data assimilation scheme, Q.J.R.Meteorol.Soc., 117, pp. 59–89. Lorenz, E. N. (1963) Deterministic non-periodic flow, Journal of the Atmospheric Sciences, 20, pp. 130–141. Madsen, H. and Canizares, R. (1999) Comparison of extended and ensemble kalman filters for data assimilation in coastal area modelling, International Journal for Numerical Methods in Fluids, 31, pp. 961–981. 149 BIBLIOGRAPHY 150 Maybeck, P. S. (1979) Stochastic Model Estimation, and Control Volume 1, Academic Press Inc. Mehra, R. K. (1970) On the identification of variances and adaptive kalman filtering, IEEE Transaction on Automatic Control, AC-15, 2, pp. 175–184. — (1972) Approaches to adaptive filtering, IEEE Transaction on Automatic Control, pp. 693–698. Miller, R. N., Ghil, M., and Gauthiez, F. (1994) Advanced data assimilation in strongly nonlinear dynamical systems, Journal of the Atmospheric Sciences, 51, 8, pp. 1037– 1056. Mitchell, H. L. and Houtekamer, P. L. (2000) An adaptive ensemble kalman filter, Monthly Weather Review, 128, pp. 416–433. Mitchell, H. L., Houtekamer, P. L., and Pellerin, G. (2002) Ensemble size, balance, and mode-error representation in an ensemble kalman filter, Monthly Weather Review, 130, pp. 2791–2808. Ogata, K. (1990) Modern Control Engineering, Prantice-Hall, Inc. Oke, P. R., Allen, J. S., Miller, R. N., Egbert, G. D., and Kosro, P. M. (2002) Assimilation of surface velocity data into a primitive equation coastal ocean model, Journal of Geophysical Research, 107, 3122, doi: 10.1029/2000JC000511. Palmer, T., Alessandri, A., Andersen, U., Cantelaube, P., Davey, M., Dlcluse, P., Dqu, M., Dez, E., Doblas-Reyes, F., Feddersen, H., Graham, R., Gualdi, S., Gurmy, J., Hagedorn, R., Hoshen, M., Keenlyside, N., Latif, M., Lazar, A., Maisonnave, E., Marletto, V., Morse, A., Orfila, B., Rogel, P., Terres, J., , and Thomson, M. (2004) Development of a european multimodel ensemble system for seasonal-to-interannual prediction (demeter), Bull. Amer. Meteor. Soc., 85, pp. 853–872. Parish, D. F. and Derber, J. C. (1992) The national meteorological center’s spectral statistical-interpolation analysis system, Monthly Weather Review, 120, pp. 1747– 1763. Rogers, S. R. (1988) Efficient numerical algorithm for steady-state kalman covariance, IEEE Transactions on Aerospace and Electronic Systems, 24, 6, pp. 815–817. Sætra, O., Hersbach, H., Bidlot, J.-R., and Richardson, D. (2004) Effects of observations errors on the statistics for ensemble spread and reliability, Monthly Weather Review, 132, pp. 1487–1501. Sakov, P. and Oke, P. R. (2008) A deterministic formulation of the ensemble kalman filter: an alternative to ensemble square root filters, Tellus-A, 60A, pp. 361–371. BIBLIOGRAPHY Schiereck, G. J. and Eisma, D. (2002) The rotterdam harbour: the connection with the north sea and europoort, in: Engineered Coasts (J. Chen, K. Hotta, and H. J. Walker, eds.), Springer, pp. 269–290. Segers, A. J., Heemink, A. W., Verlaan, M., and van Loon, M. (2000) A modified rrsqrtfilter for assimilating data in atmospheric chemistry models, Environmental Modelling and Software, 15, pp. 663–671. Serafy, G. Y. H. E. and Mynett, A. E. 
(2008) Improving the operational forecasting system of the stratified flow in osaka bay using an ensemble kalman filter-based steady state kalman filter, Water Resources Research, 44, W06416, pp. 1–192, doi: 10.1029/2006WR005412. Sørensen, J. V. T., Madsen, H., and Madsen, H. (2004a) Efficient kalman filter techniques for the assimilation of tide gauge data in three-dimenstional modeling of the north sea and baltic sea system, Journal of Geophysical Research, 109, C03017, pp. 1–14. — (2004b) Operational data assimilation in marine modeling: water level forecasting with regularisation techniques, in: Hydroinformatics, World Scientific Publishing Company, vol. 2, pp. 1197–1204. — (2006) Parameter sensitivity of three kalman filter schemes for assimilation of water levels in shelf sea models, Ocean Modelling, 11, pp. 441–463. Stelling, G. S. (1984) On the construction of computational methods for shallow water flow problems, Ph.D. thesis, Delft University of Technology, The Netherlands. Sumihar, J. H., Verlaan, M., and Heemink, A. W. (2008) Two-sample kalman filter for steady-state data assimilation, Monthly Weather Review, 136, pp. 4503–4516. — (2009) Storm surge forecast with two-sample kalman filter, submitted. Sun, C., Rienecker, M. M., Rosaty, A., Harrison, M., Wittenberg, A., Keppenne, C. L., Jacob, J. P., and Kovach, R. M. (2007) Comparison and sensitivity of odasi ocean analyses in the tropical pacific, Monthly Weather Review, 135, pp. 2242–2264. Talagrand, O. (1997) Assimilation of observations, an introduction, Journal of the Meteorological Society of Japan, 75, pp. 191–209. ten Brummelhuis, P. G. J. and Heemink, A. W. (1990) Parameter identification in tidal models with uncertain boundary conditions, Stochastic Hydrology and Hydraulycs, 4, pp. 193–208. Tippett, M. K., Anderson, J. L., Bishop, C. H., Hamill, T. M., and Whitaker, J. S. (2003) Ensemble square root filters, Monthly Weather Review, 131, pp. 1485–1490. 151 152 BIBLIOGRAPHY Treebushny, D. and Madsen, H. (2005) On the construction of a reduced rank squareroot kalman filter for efficient uncertainty propagation, Future Generation Computer Systems, 21, pp. 1047–1055. Und´en, P., Rontu, L., J¨arvinen, H., Lynch, P., Calvo, J., Cats, G., Cuxart, J., Eerola, K., Fortelius, C., Garcia-Moya, J. A., Jones, C., Lenderlink, G., McDonald, A., McGrath, R., Navascues, B., Nielsen, N., Ødegaard, V., Rodriquez, E., Rummukainen, M., R¯oo¯ m, R., sattler, K., Sass, B., Savij¨arvi, H., B.W.Schreur, Sigg, R., The, H., and Tijm, A. (2002) Hirlam-5 scientific documentation, Available online from http://hirlam.org/. Verboom, G. K., de Ronde, J. G., and van Dijk, R. P. (1992) A fine grid tidal flow and storm surge model of the north sea, Continental Shelf Research, 12, 2/3, pp. 213–233. Verlaan, M. (1998) Efficient Kalman Filtering Algorithms for Hydrodynamics models, Ph.D. thesis, Delft University of Technology, The Netherlands. Verlaan, M. and Heemink, A. W. (1997) Tidal flow forecasting using reduced rank square root filters, Stochastic Hydrology and Hydraulics, 11, pp. 349–368. Verlaan, M., Zijderveld, A., de Vries, H., and Kroos, J. (2005) Operational storm surge forecasting in the netherlands: developments in the last decade, Phil. Trans. R. Soc. A, 363, pp. 1441–1453. von Storch, H. and Woth, K. (2008) Storm surges: perspectives and options, Sustain Sci, 3, pp. 33–43. Whitaker, J. S. and Hamill, T. M. (2002) Ensemble data assimilation without perturbed observations, Monthly Weather Review, 130, pp. 1913–1924. Wilks, D. S. 
(2006) Statistical methods in the atmospheric sciences, Academic Press. Wong, E. (1971) Stochastic processes in information and dynamical system, Mc.GrawHill Book Company. Xu, Q., Lu, H., Gao, S., Xue, M., and Tong, M. (2008a) Time-expanded sampling for ensemble kalman filter: assimilation experiments with simulated radar observations, Monthly Weather Review, 136, pp. 2651–2667. Xu, Q. and Wei, L. (2001) Estimation of three-dimensional error covariances. part ii: analysis of wind innovation vectors, Monthly Weather Review, 129, pp. 2939–2954. Xu, Q., Wei, L., Lu, H., Qiu, C., and Zhao, Q. (2008b) Time-expanded sampling for ensemble-based filters: assimilation experiments with a shallow-water equation model, Journal of Geophysical Research, 113, D02114, pp. 1–12. Yang, R., Guo, J., and Riishøjgaard, L. P. (2006a) Application of an error statistics estimation method to the psas forecast error covariance model, Advances In Atmospheric Sciences, 23, 1, pp. 33–44. BIBLIOGRAPHY Yang, S.-C., Baker, D., Li, H., Cordes, K., Huff, M., Nagpal, G., Okereke, E., Villafane, J., Kalnay, E., and Duane, G. S. (2006b) Data assimilation as synchronization of truth and model: experiments with the three-variable lorenz system, Journal of the Atmospheric Sciences, 63, pp. 2340–2354. Yen, P. H., Jan, C. D., Lee, Y. P., and Lee, H. F. (1996) Application of kalman filter to short-term tide level prediction, Journal of Waterway, Port, Coastal, and Ocean Engineering, 122, 5, pp. 226–231. Ziehmann, C. (2000) Comparison of a single-model eps with a multi-model ensemble consisting of a few operational models, Tellus, 52A, pp. 280–299. 153 154 BIBLIOGRAPHY Samenvatting Twee aanpakken voor het verbeteren van de nauwkeurigheid van waterstandsvoorspellingen worden in deze studie onderzocht. De eerste aanpak concentreert op de verbetering van de voorspellingsnauwkeurigheid van de astronomische getij component. Hierbij wordt een methode toegepast ten behoeve van het analyseren en voorspellen van de periodieke componenten van het residue van de harmonisch analyse. Deze methode is redelijk nauwkeurig tijdens rustige weersomstandigheden, maar niet tijdens een stormachtige periode. Deze constatering heeft geleid tot een voortzetting van de studie met een geheel andere aanpak waarbij een nieuw data assimilatie is geimplementeerd in het operationele twee-dimensionale stormvloedkeringvoorspellingsmodel. Het operationele stormvloedvoorspelmodel in Nederland maakt gebruik van een steadystate Kalman filter om nauwkeurige beginvoorwaarden voor de voorspelsimulatie van het model te berekenen. Een belangrijke factor die bepalend is voor het succes van het Kalman filter, is de specificatie van de modelfout covariantie. In het operationele systeem wordt de modelfout gemodelleerd op basis van de aannames van isotropie en homogeniteit. In deze studie onderzoeken we het gebruik van het verschil tussen de voorspellingen van twee vergelijkbare atmosferische modellen om de onbekende fout van het model te representeren. Tevens wordt een nieuwe methode voor de berekening van een steady-state Kalman gain, het zogenaamde two-sample Kalman filter, ontwikkeld. Dit is een iteratieve procedure voor de berekening van de steady-state Kalman gain met behulp van twee realisaties van het proces. Een groot aantal experimenten zijn uitgevoerd om aan te tonen dat dit algoritme de goede oplossingen produceert. 
Het two-sample Kalman filter algoritme is toegepast met behulp van de windvoorspellingen van twee meteorologische centra: het Koninklijk Nederlands Meteorologisch Instituut (KNMI) en de UK Met Office (UKMO). De bias of systematische fout wordt eerst geëlimineerd voordat de windvoorspellingen worden gebruikt voor het two-sample Kalman filter. De ruimtecorrelatie van de systeemfout wordt geschat uit deze twee windvoorspellingen en blijkt, in tegenstelling tot de aanname in het huidige operationele systeem, sterk anisotroop. Het steady-state Kalman filter dat op basis van deze foutencovariantie wordt gevonden resulteert in nauwkeuriger voorspellingen. Voor de meetstations langs de Nederlandse kust wordt de nauwkeurigheid van de voorspelling tot ongeveer 12 uur vooruit verbeterd. Ter verdere verbetering van het data-assimilatiesysteem wordt het two-sample Kalman filter uitgebreid tot het gebruik van meerdere realisaties. Door meer realisaties te gebruiken kan de berekening van de foutencovariantie worden uitgevoerd door middeling over een kortere periode. Hierdoor kan de stationariteit over een kortere periode worden aangenomen en zal naar verwachting de modelfout nog nauwkeuriger worden gerepresenteerd. In deze studie wordt dit algoritme toegepast met behulp van een wind-ensemble van LAMEPS, het systeem dat bij het Noorse Meteorologische Instituut operationeel wordt gebruikt. De resultaten zijn hierbij vergelijkbaar met het steady-state Kalman filter tijdens grote positieve stormvloeden. Echter blijkt het steady-state Kalman filter beter te presteren dan het ensemble systeem bij de voorspelling van negatieve stormvloeden. Verder onderzoek naar de wind-ensemble aanpak is daarom vereist.

Curriculum Vitae

Julius H. Sumihar was born on 28th July 1975 and grew up in Bandung, Indonesia. He attended the high school Sekolah Menengah Atas Negeri 3 Bandung in 1990. In 1993, he started his bachelor study at the Department of Engineering Physics, Institut Teknologi Bandung, Indonesia. He obtained his engineer degree in 1998. During the next three years, he explored different possibilities for his career. In this period, he became interested in the application of mathematics in the field of environmental system modelling and prediction. To pursue this interest, in 2001 he left his country to study in the MSc program of Mathematics in Risk and Environmental Modelling at Technische Universiteit Delft, The Netherlands. This graduate program was supervised by Prof. Roger Cooke and Prof. Arnold Heemink. He obtained his MSc degree in 2003, with a thesis about adaptive harmonic analysis using a Kalman filter. In October 2004, he began his PhD study at the Delft Institute of Applied Mathematics, under the supervision of Dr. Martin Verlaan and Prof. Arnold Heemink. His research was about data assimilation and its application to storm surge forecasting, the results of which are written in this thesis. Since March 2009 he has been working at Deltares/TNO as a researcher.