Data assimilation and biogeochemical modeling – Why is it so “hard”?

Yvette H. Spitz
Oregon State University,
College of Earth, Ocean, and Atmospheric Sciences
Corvallis, OR 97331
Two famous quotes:
“Ecologists do not have the equivalent of the Navier-Stokes equations”
“To perform data assimilation, one has to have a model and some data” (and an assimilation technique), J.J. O’Brien (summer school 1993)
Are these statements still valid, and what are their implications for the uncertainties in the estimates (model state variables and/or parameters)?
Sources of uncertainties and quantification
• Model structure
• Model forcing: physical and biological
• Assimilated data: spatial and temporal scarcity – measurement error
• Assimilation techniques: strengths and weaknesses – impact on the estimates
• Quantification of “errors” via/after assimilation of biogeochemical data in coupled circulation/biogeochemical models
Model Structure
All models are based on the same structure, with more or less complexity (including the Darwin model).
The parameterization of the pathways can differ.
Fennel et al. (2011)
Franks et al. (1986)
Schematic of BIOMAS’ Pelagic Ecosystem Model
[Figure: NO3, NH4, Si(OH)4, DOM, diatoms (PD), flagellates (PF), small zooplankton (ZS), copepods (ZL), predators (ZP), detritus and opal, with sinking and vertical migration.]
Zhang, Spitz et al. (2010), based on Nemuro (Kishi et al. 2007)
[Figure: Fasham et al. (1990) model -> Spitz et al. (2001): nitrate, ammonium, phytoplankton (chlorophyll-a), dissolved organic nitrogen (DON/DOC), bacteria, nano/microzooplankton, mesozooplankton, detritus and large detritus.]
Based on Spitz et al. (2001) and Nemuro
M. Rodrigues, A. Oliveira, H. Queiroga, Y.J. Zhang, A.B. Fortunato, A. Baptista (2007)
MIRO ECOSYSTEM STRUCTURE
[Figure: NO3, NH4, PO4, Si, DA, NF, OPC, OPM, µZoo, MZoo, BAC, DOM, POM and benthic diagenesis.]
126 parameters to estimate
Lancelot, Spitz et al., 2005
The Regional Ecosystem Modeling Intercomparison Testbed Project
Marjorie Friedrichs, Larry Anderson, Rob Armstrong, Fei Chai,
Jim Christian, Scott Doney, John Dunne, Jeff Dusenberry,
Masahiko Fujii, Raleigh Hood, John Klinck, Dennis McGillicuddy,
Markus Schartau, Yvette Spitz, Jerry Wiggert
To quantitatively compare pelagic ecosystem models against data in a standardized one-dimensional framework:
• Which ecosystem structures are most robust?
• How much complexity is justified?
• Is it feasible to develop models that are applicable over many diverse ecosystems?
Experiment 1: Individual assimilation (each site separately)
Experiment 2: Simultaneous assimilation (both sites at once)
Which types of models reproduce mean PP and chl?
Experiment 3: Cross-validation
Which types of models are most portable?
Model-data misfit
[Figure: initial model-data comparison (pre-assimilation), EqPac + Arabian Sea; x-axis: model number (LST, 1-12; increasing complexity).]
Cost function comparison Expt 1 & 2
[Figure: cost function for Expt 1 and Expt 2; x-axis: model number (MM, LST, 1-12; increasing complexity); models with Fe are marked.]
- MM and LST models do quite well
- Simple NPZD models (#1-4) can reproduce the data separately at each site, but not at both sites simultaneously
- More complex models (#5-12) do not necessarily do better
- Only 4 models do substantially better than MM/LST: those with Fe
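The models above are ranked by a cost function. As a rough sketch, assuming a generic error-weighted least-squares form (not necessarily the exact Testbed definition), the difference between Expt 1 and Expt 2 is whether a parameter set minimizes one site's cost or the sum over both sites. The function names and all numbers below are hypothetical.

```python
import numpy as np

def site_cost(model, obs, sigma):
    # Sum over data types (e.g. "pp", "chl") of the mean squared misfit,
    # each misfit normalized by its assumed observation error sigma.
    total = 0.0
    for key, values in obs.items():
        misfit = (np.asarray(model[key], float) - np.asarray(values, float)) / sigma[key]
        total += float(np.mean(misfit ** 2))
    return total

def simultaneous_cost(models, obs, sigma):
    # Expt 2 style: a single parameter set has to fit every site,
    # so the per-site costs are added.
    return sum(site_cost(models[site], obs[site], sigma[site]) for site in obs)

# Hypothetical chlorophyll values, for illustration only (not Testbed data).
obs = {"EqPac": {"chl": [0.2, 0.3]}, "AS": {"chl": [0.5, 1.0]}}
sigma = {"EqPac": {"chl": 0.1}, "AS": {"chl": 0.25}}
models = {"EqPac": {"chl": [0.3, 0.3]}, "AS": {"chl": [0.5, 0.5]}}

print(site_cost(models["EqPac"], obs["EqPac"], sigma["EqPac"]))  # about 0.5
print(simultaneous_cost(models, obs, sigma))                      # about 2.5
```

In Expt 1 each site's cost would be minimized with its own parameters; in Expt 2 the summed cost is minimized with one shared parameter set.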
Production vs Chlorophyll: Expt. 2
[Figure: mean integrated PP [mmol C m-2 d-1] vs. mean integrated Chl [mg chl m-2], data and models, Arabian Sea and EqPac.]
- Models with multiple P size classes are slightly better able to reproduce chl
- No relationship between the number of P (or Z) compartments and how well production is reproduced
[Figure: mean integrated PP [mmol C m-2 d-1] vs. mean integrated Chl [mg chl m-2], Arabian Sea and EqPac; models with and without Fe distinguished.]
- Only models with iron are capable of reproducing the observed PP in EqPac, and these models do much better in the Arabian Sea as well
Portability Index
- Models with more P and Z size classes are not necessarily more portable than models with single P and Z size classes
Conclusions from The Testbed Project
• Simple NPZD (no Fe) models can fit the data well at individual sites, but have difficulty fitting the data at both sites simultaneously
• Multiple size class models with iron are best able to fit the data simultaneously at both sites
– Include an NPZD+Fe model
• Half the models do not reproduce the data as well as the mean (MM) and empirical (LST) models
– Include additional sites (BATS, HOT, NABE, Southern Ocean)
• No apparent trend in portability with model complexity
– Examine additional portability indices
• Models can fit the data similarly well, but do so via very different pathways
– Need data that better constrain the model flows and dynamics
Importance of model pathways
[Figure: Fasham et al. (1990) model -> Spitz et al. (2001): nitrate, ammonium, phytoplankton (chlorophyll-a), dissolved organic nitrogen (DON/DOC), bacteria, nano/microzooplankton, mesozooplankton, detritus and large detritus.]
Annual fluxes for the upper mixed layer: original model (Fasham et al., 1990) vs. new model with data assimilation at BATS (Spitz et al., 2001)
New production system -> remineralized system
[Figure: comparison of fluxes using two formulations of the microbial loop, Spitz et al. (2001) and Fasham et al. (1990).]
The NH4 regeneration comes from different sources. With the FDM uptake formulation, the main source of NH4 is nano/microzooplankton excretion. In our model simulation, the NH4 source is split between bacterial regeneration and nano/microzooplankton excretion.
Importance of Microbial Loop
CASE 1
[Figure: model structure with N2 fixation: nitrate, ammonium, phytoplankton (chlorophyll-a), DON/DOC, bacteria, nano/microzooplankton, mesozooplankton, detritus and large detritus; exudation (Ex) and remineralization (Rem) pathways.]
Ex = γ1 Phyto + γ2 f(I) uptake(NO3, NH4) Phyto
Rem = (m1 + m2 Bact)
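The two CASE 1 expressions can be written as plain functions. This is a sketch only: the forms of the light limitation f(I) and of uptake(NO3, NH4) are not given here, so they enter as precomputed inputs, and all argument names and values are hypothetical. Multiplying Rem by the detritus concentration to obtain a flux is also an assumption.

```python
def exudation(phyto, gamma1, gamma2, f_light, uptake):
    # Ex = gamma1 * Phyto + gamma2 * f(I) * uptake(NO3, NH4) * Phyto
    # f_light: light limitation f(I) (dimensionless, supplied by the caller)
    # uptake:  nutrient uptake term uptake(NO3, NH4) (supplied by the caller)
    return gamma1 * phyto + gamma2 * f_light * uptake * phyto

def remineralization_rate(m1, m2, bact):
    # Rem = m1 + m2 * Bact: the rate (d-1) grows with bacterial biomass,
    # tying detritus remineralization to the microbial loop.
    return m1 + m2 * bact

def remineralization_flux(m1, m2, bact, detritus):
    # Assumed first-order loss of detritus at rate Rem.
    return remineralization_rate(m1, m2, bact) * detritus

# Hypothetical values for illustration.
print(exudation(phyto=1.0, gamma1=0.05, gamma2=0.1, f_light=0.5, uptake=0.8))
print(remineralization_rate(m1=0.02, m2=0.1, bact=0.5))
```

With m2 = 0 the rate reduces to a constant, comparable to the fixed Rem values shown for cases 2 and 3.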
CASE 2 and CASE 3
[Figure: reduced structures with N2 fixation, nitrate, ammonium, phytoplankton (chlorophyll-a), nano/microzooplankton, mesozooplankton and detritus, with exudation (Ex); labels: Rem = 0.096 d-1 (HOT), Rem = 0.03 d-1 (BATS), Rem = 0.03 d-1 (HOT and BATS).]
[Figure: chlorophyll-a at HOT for the basic case (1), case (2) (exudation to detritus) and case (3) (exudation to ammonium), variants (a), (b) and (c).]
(b) = rem and sinking rate = 2 x (a)
(c) = rem and sinking rate = 0.5 x (a)
In all cases, the remineralization length scale is the same.
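Why variants (b) and (c) keep the same remineralization length scale: at steady state, with detritus sinking at a constant rate w (m d-1) and remineralized at a constant rate r (d-1), w dD/dz = -r D, so D(z) = D(0) exp(-z r / w) and the e-folding depth is L = w / r. Scaling w and r by the same factor leaves L unchanged. The sketch assumes that steady, constant-rate setting; the sinking rate is a hypothetical value, and r uses the 0.03 d-1 quoted above.

```python
import math

def remin_length_scale(sinking_rate, remin_rate):
    # e-folding depth (m) of sinking detritus: L = w / r
    return sinking_rate / remin_rate

def detritus_profile(d0, z, sinking_rate, remin_rate):
    # Steady-state solution D(z) = D(0) * exp(-z * r / w)
    return d0 * math.exp(-z * remin_rate / sinking_rate)

w_a = 3.0   # hypothetical sinking rate for variant (a), m d-1
r_a = 0.03  # remineralization rate for variant (a), d-1

for label, factor in [("a", 1.0), ("b", 2.0), ("c", 0.5)]:
    w, r = factor * w_a, factor * r_a
    print(label, remin_length_scale(w, r), detritus_profile(1.0, 100.0, w, r))
```

All three variants give L of about 100 m; what changes is the time scale 1/r (about 33 days for r = 0.03 d-1), i.e. how fast detritus is recycled relative to the other processes.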
BATS Simulation
Case 2 = Exudation to detritus
Case 3 = Exudation to ammonium
(b) = rem and sinking rate = 2 x (a)
(c) = rem and sinking rate = 0.5 x (a)
The deep chlorophyll maximum is smaller in case (2b), contrary to HOT.
[Figure: mesozooplankton integrated over the first 140 m at HOT and BATS.]
Chla is reduced in all cases (2-3), but mesozooplankton increases strongly in case (2c) (rem and sinking rate = 0.5 x (a)).
The relative importance of the various pathways has changed, but the change varies from one oligotrophic environment to another.
Data assimilation did not lead to different parameters, but the cost function could not be reduced in cases (2) and (3).
Atmospheric forcing
• Wind stress and non-solar radiation directly affect the circulation and indirectly affect the ecosystem
• Solar radiation directly affects both the ocean circulation and the ecosystem (i.e., photosynthesis)
Circulation model
• Mixing scheme, grid resolution, etc.
Ecosystem model
• Parameters and pathways
[Figure: mean difference between NCEP/DOE and NCEP/NCAR (1992-2001) at HOT in downward short-wave radiation (W m-2) and wind stress (dyne cm-2).]
The mean varies between 100 and 220 W m-2.
[Figure: downward short-wave radiation (W m-2) at the Hale-Aloha mooring and at the Equator (0°N, 140°W).]
[Figure: correlation between NCEP/DOE and NCEP/NCAR (1992-2001) for surface temperature and surface Chla.]
[Figure: correlation between model chlorophyll-a simulations: NCEP/DOE vs. NCEP/NCAR forcing (HOT-estimated parameters); BATS vs. HOT parameters (same atmospheric forcing).]
PRSOM-derived biogeographical regions using the annual climatology of SeaWiFS chlorophyll-a and AVHRR sea surface temperature between 1998 and 2005.
[Figure: map of regions (2), (3), (7) and (9), and scatter plots of modeled log10(chl) vs. SeaWiFS log10(chl) for Regions 2, 3, 7 and 9.]
Color: forcing, parameters
Red: NCEP/NCAR, Doney et al. (2008)
Black: NCEP/NCAR, HOT
Green: NCEP/DOE, HOT
Blue: NCEP/NCAR, BATS
Assimilated data: spatial and temporal scarcity – measurement error
Biogeochemistry time series – International Ocean Carbon Coordination Program
http://www.ioccp.org/time-series-efforts
Argo floats: some are measuring oxygen
[Figure: SeaWiFS (sr2010.0m) and MODISA (AR2013.0m) comparison for Hawaii.]
Trend Statistics, Global Subsets Derived from Binned Level3 Mission Trends
[Figure panels: MODISA (ar2012.0m) and MERIS (mr2012.1m); SeaWiFS (sr2010.0m) and MERIS (mr2012.1m); MODISA (ar2010.0m) and MODISA (ar2012.0m): same satellite, two reprocessings (algorithms).]
Black = Deep-Water, Blue = Oligotrophic, Green = Mesotrophic, Red = Eutrophic
http://oceancolor.gsfc.nasa.gov/ANALYSIS/PROCTEST/
Example for Hawaii - Chlorophyll
Satellite remotely sensed versus in situ observations
[Figure: MODIS and SeaWiFS chlorophyll-a vs. in situ Chla at HOT (fluorometric, HPLC; mean over 200 m).]
Note the difference in the y-axis limits.
• Long-term time series from ocean observatories (OOI, etc.) and remotely sensed observations are needed to estimate model parameters and state variables and to calibrate the model results,
BUT what are we really missing from space, moorings and gliders?
[Figure: model compartments: nitrate, ammonium, phytoplankton (chlorophyll), DON/DOC, bacteria, nano/microzooplankton, mesozooplankton and detritus.]
PON = P + Z + B + DET
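PON is not a model state variable but a sum of compartments, so assimilating it requires an observation operator mapping the model state to the measured quantity. A minimal sketch with hypothetical names and values; lumping all zooplankton into Z is an assumption. It also shows why PON alone cannot separate the compartments: different partitions give the same PON.

```python
import numpy as np

def pon_operator(state):
    # Observation operator H(x): PON = P + Z + B + DET
    # (all compartments in the same nitrogen units; Z lumps all zooplankton here)
    return state["P"] + state["Z"] + state["B"] + state["DET"]

# Two hypothetical model states (mmol N m-3) at three depths.
state_1 = {"P": np.array([1.0, 0.8, 0.2]), "Z": np.array([0.5, 0.4, 0.1]),
           "B": np.array([0.2, 0.2, 0.1]), "DET": np.array([0.3, 0.2, 0.1])}
state_2 = {"P": np.array([0.5, 0.4, 0.1]), "Z": np.array([1.0, 0.8, 0.2]),
           "B": np.array([0.2, 0.2, 0.1]), "DET": np.array([0.3, 0.2, 0.1])}

print(pon_operator(state_1))
print(pon_operator(state_2))
# Same PON profile, very different phytoplankton/zooplankton partition.
print(np.allclose(pon_operator(state_1), pon_operator(state_2)))
```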
Which data assimilation technique to use?
How can we be certain to obtain the “optimal” parameters?
• Ensemble Kalman filter (EnKF): assumes all distributions are Gaussian. Introduces systematic errors (bias); easy to break with a nonlinear model and sparse observations. Needs a prior for the parameters.
• Variational methods (4D-Var, MLE, MAP, etc.): must store many high-dimensional observations. Needs sequential continuation. Problems with local minima.
• Particle filters (SIR): ensemble collapse in high dimensions.
• Implicit sampling: ensemble method that solves variational problems to guide particles toward the observations.
Advantages of implicit sampling:
• Nonparametric: strong theoretical basis for nonlinear/non-Gaussian problems.
• Sequential/on-line: uses a recursion rule and kernel density estimates to continue the assimilation.
• Variational: (hopefully) fixes the high-dimensional degeneracy of the particle method.
And many more, or variants of the methods mentioned.
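For a single parameter, the SIR idea listed above fits in a few lines: draw particles from the prior, weight them by the likelihood of each new observation, and resample when the effective sample size drops. This is a toy sketch assuming a hypothetical one-parameter growth model and Gaussian errors; it is not the implicit-sampling scheme, and with many parameters the same weights collapse onto very few particles, which is the degeneracy noted for particle filters.

```python
import numpy as np

def sir_estimate(obs, times, sigma, n_particles=2000, seed=0):
    """Sequential importance resampling (SIR) for one static parameter mu.

    Hypothetical toy model: P(t) = exp(mu * t), Gaussian observation error
    sigma, uniform prior mu ~ U(0, 1). Returns the posterior mean and std of mu.
    """
    rng = np.random.default_rng(seed)
    mu = rng.uniform(0.0, 1.0, n_particles)        # particles drawn from the prior
    w = np.full(n_particles, 1.0 / n_particles)    # importance weights
    for t, y in zip(times, obs):
        pred = np.exp(mu * t)                              # forecast per particle
        w = w * np.exp(-0.5 * ((y - pred) / sigma) ** 2)   # Gaussian likelihood
        w = w / w.sum()
        ess = 1.0 / np.sum(w ** 2)                         # effective sample size
        if ess < n_particles / 2:                          # degenerate weights:
            idx = rng.choice(n_particles, size=n_particles, p=w)  # resample
            mu = mu[idx]
            w = np.full(n_particles, 1.0 / n_particles)
    mean = float(np.sum(w * mu))
    std = float(np.sqrt(np.sum(w * (mu - mean) ** 2)))
    return mean, std

# Twin-style check: noise-free synthetic data from a hypothetical "true" mu = 0.3.
times = np.arange(1.0, 6.0)
obs = np.exp(0.3 * times)
mu_mean, mu_std = sir_estimate(obs, times, sigma=0.1)
print(mu_mean, mu_std)
```

For a static parameter, resampling only duplicates particles already drawn from the prior; practical schemes add jitter or other moves, which this sketch omits.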
Improving marine ecosystem models: Use of data assimilation and mesocosm experiments – J.J. Vallino (2000)
10 state variables and 29 parameters
SA = smaller cost function but “overfitting”
Problems with “overfitting”
[Figure panels: assimilation of observations; twin experiment.]
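A twin experiment checks whether the assimilation recovers parameters that are known because they generated the synthetic observations. A minimal sketch, assuming a hypothetical two-parameter logistic growth model, Gaussian noise of known size, and a brute-force grid search standing in for a real optimizer; all names and values are illustrative.

```python
import numpy as np

def logistic(t, r, K, p0=0.1):
    # Hypothetical toy model: logistic phytoplankton growth from P(0) = p0
    return K / (1.0 + (K / p0 - 1.0) * np.exp(-r * t))

# 1. "Truth": run the model with known (hypothetical) parameters.
r_true, K_true = 0.5, 2.0
t = np.linspace(0.0, 20.0, 41)
sigma = 0.05
rng = np.random.default_rng(1)
# 2. Synthetic observations = truth + Gaussian noise of known size.
obs = logistic(t, r_true, K_true) + rng.normal(0.0, sigma, t.size)

# 3. Estimate (r, K) by minimizing the error-weighted cost on a grid.
r_grid = np.linspace(0.1, 1.0, 181)
K_grid = np.linspace(1.0, 3.0, 201)
R, KK = np.meshgrid(r_grid, K_grid, indexing="ij")
pred = logistic(t[None, None, :], R[..., None], KK[..., None])
cost = np.sum(((pred - obs) / sigma) ** 2, axis=-1)
i, j = np.unravel_index(np.argmin(cost), cost.shape)
r_hat, K_hat = r_grid[i], K_grid[j]

# 4. Compare with the truth. With correct error statistics the minimum cost
#    should be comparable to the number of observations; a much smaller
#    value would suggest the fit is chasing the noise (overfitting).
print(r_hat, K_hat, cost.min(), t.size)
```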
Food for thought
• “The parameter set that minimizes the cost function”: is that really what we are looking for? How can we account for the uncertainties in the cost function?
• Given the uncertainties in the model pathways, observations and model forcing, any error estimate will need to be given in the context of these uncertainties, which could be a challenge.
• The accuracy of the error estimates strongly depends on the assimilation technique. This will be presented in detail by Brad Weir.
The commonly used Gaussian approximation of the error distribution can be very misleading. For example, parameters that appear unidentifiable could in reality be identifiable, with reasonable error statistics, when the right assimilation technique is used.
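The last point can be illustrated with a toy problem (hypothetical, not taken from the talk): when an observation depends on a parameter only through theta**2, the posterior is bimodal. A Gaussian summary of it centers on theta near 0, a value the data essentially rule out, and reports a spread larger than the prior's, so the parameter looks unidentifiable even though |theta| is well constrained.

```python
import numpy as np

# Hypothetical toy problem: observation y = theta**2 + noise, with y_obs = 4,
# noise std 0.5 and a uniform prior on theta in [-3, 3].
theta = np.linspace(-3.0, 3.0, 6001)
y_obs, sigma = 4.0, 0.5
post = np.exp(-0.5 * ((y_obs - theta ** 2) / sigma) ** 2)  # flat prior on the grid
post /= post.sum()                                           # discrete posterior weights

# Gaussian summary (mean and std) of the same posterior.
mean = float(np.sum(post * theta))
std = float(np.sqrt(np.sum(post * (theta - mean) ** 2)))
prior_std = 6.0 / np.sqrt(12.0)   # std of U(-3, 3)
print(mean, std, prior_std)

# Relative posterior density at theta = 0 (the Gaussian "best estimate"):
i0 = np.argmin(np.abs(theta))
print(post[i0] / post.max())
# Fraction of posterior mass within 0.5 of theta = -2 or +2:
print(post[np.abs(np.abs(theta) - 2.0) < 0.5].sum())
```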
Nemuro model (Kishi et al., 2007)
Methods: Ecosystem model descriptions
• Models 1-4: N, P, Z, D (NH4, DOM, C:chl, T) (CCMA, McCreary, Anderson/McGillicuddy, Hood)
• Models 5-6: 2P, 2Z, 2D, Fe (Christian, Wiggert)
• Model 7: 2P, 2Z, 2D, Si (Chai)
• Model 8: 2P, 3Z, 2D, Si, DOM (Fujii)
• Model 9: 2P, 4Z, 1D, B, DOM (Laws/Hood)
• Model 10: C, Alk, P, Z, 1D, 2DOM (Schartau)
• Model 11: 3P, 0Z, 1D, 3DOM, Si, Fe (Dunne)
• Model 12: 3P, 1Z, 2D, 4DOM, Si, Fe (Dusenberry/Doney/Moore)
• MM: Mean Model
• LST: Least Squares Test (4 box) (Friedrichs/Hood/Wiggert/Laws)