Data assimilation and biogeochemical modeling – Why is it so “hard”?

Yvette H. Spitz
Oregon State University, College of Earth, Ocean, and Atmospheric Sciences
Corvallis, OR 97331

Two famous quotes:
“Ecologists do not have the equivalent of the Navier-Stokes equations”
“To perform data assimilation, one has to have a model and some data” J.J. O’Brien (Summer school 1993)
(and an assimilation technique)
Are these statements still valid, and what are the implications for uncertainties in the estimates (model state variables and/or parameters)?

Sources of uncertainties and quantification
• Model structure
• Model forcing: physical and biological
• Assimilated data: spatial and temporal scarcity – measurement error
• Assimilation techniques: strengths and weaknesses – impact on the estimates
• Quantification of “errors” via/after assimilation of biogeochemical data in coupled circulation/biogeochemical models

Model Structure
All based on the same structure with more or less complexity (including the Darwin model)
Parameterization of the pathways can be different
[Diagrams: Fennel et al. (2011); Franks et al. (1986)]

Schematic of BIOMAS’ Pelagic Ecosystem Model
[Diagram: NO3, NH4, Si(OH)4, Diatoms (PD), Flagellates (PF), Small Zoo (ZS), Copepods (ZL), Predators (ZP), Detritus, opal, DOM; sinking and vertical migration]
Zhang, Spitz et al. (2010) – based on Nemuro (Kishi et al. 2007)

Fasham et al. (1990) -> Spitz et al. (2001)
[Diagram: DON / DOC (Dissolved Organic Nitrogen), Phytoplankton, Chlorophyll-a, Nitrate, Ammonium, Mesozoo., Bacteria, Nano/Microzoo. Zooplankton, Large Detritus, Detritus]

Based on Spitz et al. (2001) and Nemuro
M. Rodrigues, A. Oliveira, H. Queiroga, Y.J. Zhang, A.B.
Fortunato, A. Baptista (2007)

MIRO ECOSYSTEM STRUCTURE
[Diagram: NH4, NO3, PO4, Si, DOM, POM, OPM, OPC, BAC, NF, DA, µZoo, MZoo, benthic diagenesis]
126 parameters to estimate
Lancelot, Spitz et al., 2005

The Regional Ecosystem Modeling Intercomparison Testbed Project
Marjorie Friedrichs, Larry Anderson, Rob Armstrong, Fei Chai, Jim Christian, Scott Doney, John Dunne, Jeff Dusenberry, Masahiko Fujii, Raleigh Hood, John Klinck, Dennis McGillicuddy, Markus Schartau, Yvette Spitz, Jerry Wiggert
To quantitatively compare pelagic ecosystem models against data in a standardized one-dimensional framework
• Which ecosystem structures are most robust?
• How much complexity is justified?
• Is it feasible to develop models that are applicable over many diverse ecosystems?

Experiment 1: Individual assimilation
Experiment 2: Simultaneous assimilation
Which types of models reproduce mean PP and chl?
Experiment 3: Cross validation
Which types of models are most portable?

Model data misfit
Initial model-data comparison (pre-assimilation), EqPac + Arabian Sea
[Figure: model-data misfit versus model number (LST, 1-12; increasing complexity)]

Cost function comparison, Expt 1 & 2
[Figure: cost function for Expt 1 and Expt 2 versus model number (MM, LST, 1-12; increasing complexity); four models marked Fe]
- MM and LST models do quite well
- Simple NPZD models (#1-4) can reproduce data separately at each site, but not at both simultaneously
- More complex models (#5-12) do not necessarily do better
- Only 4 models do substantially better than MM/LST: those with Fe

Production vs Chlorophyll: Expt. 2
[Figure: mean integrated PP (mmol C m-2 d-1) versus mean integrated Chl (mg chl m-2), data and models, Arabian Sea and EqPac]
- Models with multiple P size classes are slightly better able to reproduce chl
- No relationship between the number of P (or Z) compartments and how well production is reproduced

Production vs Chlorophyll: Expt.
2
[Figure: mean integrated PP (mmol C m-2 d-1) versus mean integrated Chl (mg chl m-2), data and models, Arabian Sea and EqPac; models with Fe marked]
- Only models with iron are capable of reproducing observed PP in EqPac, and these models do much better in AS as well

Portability Index
[Figure: portability index by model; label: No Fe]
- Models with more P and Z size classes are not necessarily more portable than models with single P and Z size classes

Conclusions from the Testbed Project
• Simple NPZD (no Fe) models can fit data well at individual sites, but have difficulty simultaneously fitting data at both sites
• Multiple size class models with iron are best able to fit data simultaneously at both sites.
  – Include an NPZD+Fe model
• Half the models do not reproduce the data as well as the mean (MM) and empirical (LST) models
  – Include additional sites (BATS, HOT, NABE, Southern Ocean)
• No apparent trend in portability with model complexity
  – Examining additional portability indices
• Models can fit data similarly well, but do so via very different pathways
  – Need data that better constrain the model flows and dynamics

Importance of model pathways
Fasham et al. (1990) -> Spitz et al. (2001)
[Diagram: DON / DOC (Dissolved Organic Nitrogen), Phytoplankton, Chlorophyll-a, Nitrate, Ammonium, Mesozoo., Bacteria, Nano/Microzoo. Zooplankton, Large Detritus, Detritus]

Annual fluxes for the upper mixed-layer
[Figure panels: Original model (Fasham et al., 1990); New model with data assimilation at BATS]
New production system to remineralized system

Comparison of fluxes using two formulations of the microbial loop: Spitz et al. (2001) and Fasham et al. (1990)
The NH4 regeneration is from different sources. Using the FDM uptake formulation, the main source of NH4 is nano/microzooplankton excretion. In our model simulation, the NH4 source is split between bacterial regeneration and nano/microzooplankton excretion.

Importance of Microbial Loop
CASE 1
Diagram labels: DON / DOC (Dissolved Organic Nitrogen), N2 fixation, Ex, Phytoplankton, Chlorophyll-a, Nitrate, Ammonium, Mesozoo., Bacteria, Nano/Microzoo.
Zooplankton, Large Detritus, Detritus, Rem
Ex = γ1 Phyto + γ2 f(I) uptake(NO3, NH4) Phyto
Rem = (m1 + m2 bact)

CASE 2 and CASE 3
[Diagrams: N2 fixation, Phytoplankton (Chlorophyll-a), Nitrate, Ammonium, Mesozoo., Nano/Microzoo., Detritus, Ex; pathways marked (a), (b), (c)]
Rem = 0.096 d-1 (HOT), Rem = 0.03 d-1 (BATS)
Rem = 0.03 d-1 (HOT and BATS)

[Figure panels, Chlorophyll-a: HOT - Basic Case (1); HOT - Case (2) Exudation to detritus; HOT - Case (3) Exudation to ammonium]
(b) = rem and sinking rate = 2 x (a)
(c) = rem and sinking rate = 0.5 x (a)
In all cases, the remineralization length scale is the same

BATS Simulation
Case 2 = Exudation to detritus
Case 3 = Exudation to ammonium
(b) = rem and sinking rate = 2 x (a)
(c) = rem and sinking rate = 0.5 x (a)
Deep chlorophyll maximum is smaller in case (2b), contrary to HOT

Mesozooplankton integrated over the first 140 m
[Figure panels: HOT and BATS]
Reduction of chla in all cases (2-3), but large increase of mesozooplankton in case 2c (rem and sinking rate = 0.5 x (a))

The importance of the various pathways has changed but varies from one oligotrophic environment to another
Data assimilation did not lead to different parameters, but the cost function could not be reduced in cases (2) and (3)

Atmospheric forcing
• Wind stress and non-solar radiation directly affect the circulation and indirectly affect the ecosystem
• Solar radiation directly affects the ocean circulation and the ecosystem (i.e.
photosynthesis)
Circulation model
• Mixing scheme, grid resolution, etc.
Ecosystem model
• Parameters and pathways

Mean Difference between NCEP/DOE and NCEP/NCAR (1992-2001), HOT
[Figure panels: Downward Short Wave Radiation (W m-2); Wind Stress (dyne cm-2)]
The mean varies between 100 and 220 W m-2

Downward Short Wave Radiation (W m-2)
[Figure panels: Hale-Aloha Mooring; Equator 0°N, 140°W]

Correlation between NCEP/DOE and NCEP/NCAR (1992-2001)
[Figure panels: Surface temperature; Surface Chla]

Correlation between model chlorophyll-a simulations
[Figure panels: Correlation between NCEP/DOE and NCEP/NCAR (HOT estimated parameters); Correlation between BATS and HOT parameters (same atm. forcing)]

PRSOM derived biogeographical regions using annual climatology of SeaWiFS chlorophyll-a and AVHRR sea surface temperature between 1998 and 2005.
[Map: regions (2), (3), (7) and (9) marked]
[Figure panels: Region 2, Region 3, Region 7, Region 9; SeaWiFS log10(chl) and modeled log10(chl)]
Color   Forcing     Parameter
Red     NCEP/NCAR   Doney et al. (2008)
Black   NCEP/NCAR   HOT
Green   NCEP/DOE    HOT
Blue    NCEP/NCAR   BATS

Assimilated data: spatial and temporal scarcity – measurement error
Biogeochemistry time series – International Ocean Carbon Coordination Project
http://www.ioccp.org/time-series-efforts
Argo Floats
Some are measuring oxygen

SeaWiFS (sr2010.0m) and MODISA (AR2013.0m): Comparison for Hawaii
[Figure; legend: Black=Deep-Water, Blue=Oligotrophic, Green=Mesotrophic, Red=Eutrophic]

Trend Statistics, Global Subsets Derived from Binned Level3 Mission Trends
[Figure panels: MODISA (ar2012.0m) and MERIS (mr2012.1m); SeaWiFS (sr2010.0m) and MERIS (mr2012.1m); same color legend]
http://oceancolor.gsfc.nasa.gov/ANALYSIS/PROCTEST/

Trend Statistics, Global Subsets Derived from Binned Level3 Mission Trends
[Figure: MODISA (ar2010.0m) and MODISA (ar2012.0m); same color legend]
Same satellite, two reprocessings (algorithms)
http://oceancolor.gsfc.nasa.gov/ANALYSIS/PROCTEST/

Same satellite - two
reprocessings (algorithms)
Example for Hawaii - Chlorophyll

Satellite remote sensed versus in situ observations
[Figure panels: MODIS and SeaWiFS Chla; in situ Chla at HOT, mean over 200 m (fluorometric, HPLC)]
Note the difference in the y-axis limits

• Long-term time series from ocean observatories (OOI, etc.) and remotely sensed observations are needed to estimate model parameters and state variables and to calibrate the model results, BUT what are we really missing from space, moorings, and gliders?
[Diagram: DON/DOC, Ammonium, Nitrate, Bacteria, Phytoplankton (Chlorophyll), Mesozoo., Nano/Microzoo, Detritus]
PON = P+Z+B+DET

Which data assimilation technique to use? How can we be certain to obtain the “optimal” parameters?
Ensemble Kalman filter (EnKF) -- Assumes all distributions are Gaussian. Introduces systematic errors (bias); easy to break with a nonlinear model and sparse observations. Needs a prior for the parameters.
Variational methods (4D-Var, MLE, MAP, etc.) -- Must store many high-dimensional observations. Needs sequential continuation. Problems with local minima.
Particle filters (SIR) -- Ensemble collapse in high dimensions.
Implicit sampling -- Ensemble method that solves variational problems to guide particles toward the observations.
Advantages
Nonparametric: strong theoretical basis for nonlinear/non-Gaussian problems.
Sequential/on-line: uses a recursion rule and kernel density estimates to continue assimilation.
Variational: (hopefully) fixes the high-dimensional degeneracy of the particle method.
And many more, or variants of those mentioned

Improving marine ecosystem models: Use of data assimilation and mesocosm experiments – J.J. Vallino (2000)
10 state variables and 29 parameters
SA = smaller cost function but “overfitting”

Problems with “overfitting”
[Figure panels: Assimilation of observations; Twin experiment]

Food for thought
• “The parameter set that minimizes the cost function”: is that really what we are looking for? How can we account for the uncertainties in the cost function?
• Given the uncertainties in the model pathways, observations and model forcing, any error estimate will need to be given in the context of these uncertainties, which could be a challenge.
• The accuracy of the error estimates strongly depends on the assimilation technique. This will be presented in detail by Brad Weir. The typically used Gaussian approximation of the error distribution can be very misleading. For example, parameters that appear unidentifiable could in reality be identifiable with reasonable error statistics if using the right assimilation technique.

Nemuro model (Kishi et al., 2007)

Methods: Ecosystem model descriptions
• Models 1-4: N, P, Z, D (NH4, DOM, C:chl, T) (CCMA, McCreary, Anderson/McGillicuddy, Hood)
• Models 5-6: 2P, 2Z, 2D, Fe (Christian, Wiggert)
• Model 7: 2P, 2Z, 2D, Si (Chai)
• Model 8: 2P, 3Z, 2D, Si, DOM (Fujii)
• Model 9: 2P, 4Z, 1D, B, DOM (Laws/Hood)
• Model 10: C, Alk, P, Z, 1D, 2DOM (Schartau)
• Model 11: 3P, 0Z, 1D, 3DOM, Si, Fe (Dunne)
• Model 12: 3P, 1Z, 2D, 4DOM, Si, Fe (Dusenberry/Doney/Moore)
• MM: Mean Model
• LST: Least Squares Test (4 box) (Friedrichs/Hood/Wiggert/Laws)
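
The twin experiment and the “parameter set that minimizes the cost function” raised in the slides can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the talk or from any of the models listed: a closed two-box nutrient-phytoplankton system, the parameter values, the Gaussian observation error, and the brute-force grid search standing in for a real assimilation technique. The names run_model, cost and twin_experiment are hypothetical.

```python
import random


def run_model(mu, k=0.5, m=0.1, n0=2.0, p0=0.1, dt=0.05, days=20):
    """Forward-Euler run of a toy closed N-P system; returns P once per day.

    mu is the maximum growth rate (the parameter to be estimated);
    k (half-saturation) and m (loss rate) are assumed known.
    """
    steps_per_day = round(1 / dt)
    n, p = n0, p0
    daily_p = []
    for step in range(days * steps_per_day):
        uptake = mu * n / (k + n) * p   # Michaelis-Menten nutrient uptake
        loss = m * p                    # loss recycled directly to N
        n += dt * (loss - uptake)
        p += dt * (uptake - loss)
        if (step + 1) % steps_per_day == 0:
            daily_p.append(p)
    return daily_p


def cost(mu, obs, sigma):
    """Weighted least-squares model-data misfit for a given mu."""
    model = run_model(mu)
    return 0.5 * sum(((pm - po) / sigma) ** 2 for pm, po in zip(model, obs))


def twin_experiment(true_mu=1.0, sigma=0.05, seed=0):
    """Create noisy synthetic data from a known mu, then try to recover it."""
    rng = random.Random(seed)
    truth = run_model(true_mu)
    obs = [p + rng.gauss(0.0, sigma) for p in truth]
    candidates = [0.5 + 0.01 * i for i in range(101)]   # mu from 0.5 to 1.5
    costs = [cost(mu, obs, sigma) for mu in candidates]
    best = min(range(len(candidates)), key=costs.__getitem__)
    return candidates[best], costs[best], cost(true_mu, obs, sigma)


if __name__ == "__main__":
    best_mu, best_cost, truth_cost = twin_experiment()
    print("estimated mu:", best_mu)
    print("cost at estimate:", best_cost, "cost at true mu:", truth_cost)
```

Because the data are noisy, the minimizing parameter fits the observations at least as well as the true one, which is one small-scale way to see why the minimum of the cost function is not automatically the quantity of interest.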