
Model-data fusion for the coupled carbon-water system

Cathy Trudinger, Michael Raupach, Peter Briggs (CSIRO Marine and Atmospheric Research, Australia) and Peter Rayner (LSCE, France)
Email: cathy.trudinger@csiro.au

Outline
- Model-data fusion (= data assimilation + parameter estimation)
- Parameter estimation with the Kalman filter
- Australian Water Availability Project
- OptIC project: Optimisation Intercomparison

Model-data fusion

Model:
- Process representation
- Subjective, incomplete
- Capable of interpolation & forecast

Observations:
- 'Real world' representation
- Incomplete, patchy
- No forecast capability

Fusion: optimal combination (involves model-obs mismatch & strategy to minimise it)

Analysis:
- "Best of both worlds"
- Identify model weaknesses
- Forecast capability
- Confidence limits

Choices in model-data fusion
- Target variables: what model quantities to vary to match observations, e.g. initial conditions, model parameters, time-varying model quantities, forcing
- Cost function: measure of misfit between observations and corresponding model quantities, e.g.
    J(targets) = (H(targets) - obs)^2 + (targets - priors)^2
  where H maps the target variables to the observed quantities
- Fusion method (search strategy):
  - Batch (non-sequential), e.g. down-gradient, global search
  - Sequential, e.g. Kalman filter
Approach and issues will differ to some extent between disciplines, e.g. numerical weather prediction vs terrestrial carbon cycle.
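As a rough illustration of the quadratic cost function on this slide, the sketch below evaluates J for a toy problem. The function name `cost`, the scaling arguments `obs_sd` and `prior_sd`, and the straight-line observation operator `line` are assumptions made for this example; they are not from the talk. With both scalings left at 1.0 the code reduces to the slide's formula, summed over all observations and targets.

```python
def cost(targets, priors, obs, observe, obs_sd=1.0, prior_sd=1.0):
    """Quadratic misfit: sum((H(targets) - obs)^2) + sum((targets - priors)^2).

    obs_sd and prior_sd are optional scalings (an assumption of this sketch);
    with the defaults of 1.0 the result matches the unweighted formula.
    """
    predicted = observe(targets)  # H(targets): model quantities matching the obs
    data_term = sum(((h - z) / obs_sd) ** 2 for h, z in zip(predicted, obs))
    prior_term = sum(((t - p) / prior_sd) ** 2 for t, p in zip(targets, priors))
    return data_term + prior_term


def line(targets):
    """Hypothetical observation operator: a + b*t sampled at t = 0, 1, 2."""
    a, b = targets
    return [a + b * t for t in (0.0, 1.0, 2.0)]


obs = [1.0, 3.0, 5.0]      # pseudo-observations generated by a = 1, b = 2
priors = [0.0, 0.0]

j_at_truth = cost([1.0, 2.0], priors, obs, line)   # perfect data fit, prior penalty only
j_at_prior = cost(priors, priors, obs, line)       # no prior penalty, data misfit only
```

A batch method would search over `targets` to minimise this value; a sequential method arrives at a comparable estimate by processing the observations one time step at a time.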

The Ensemble Kalman filter
- Ensemble Kalman filter (EnKF): sequential method that uses Monte Carlo techniques; error statistics are represented using an ensemble of model states.
- Two steps:
  1. Model used to predict from one time to the next
  2. Update using observation
[Figure: ensemble evolution over times t0, t1, t2, labelled "Initial ensemble", "Model predicts" and "Update using measurement".]
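The two steps can be sketched for a single scalar state that is observed directly. This is a minimal perturbed-observation EnKF under stated assumptions: the toy model `toy_model`, the noise levels and the ensemble size are invented for illustration and are not taken from the talk.

```python
import random
import statistics


def enkf_predict(ensemble, model, model_sd, rng):
    """Step 1: advance every member with the model plus a stochastic model error."""
    return [model(x) + rng.gauss(0.0, model_sd) for x in ensemble]


def enkf_update(ensemble, obs, obs_sd, rng):
    """Step 2: update a scalar-state ensemble with one direct observation."""
    forecast_var = statistics.variance(ensemble)         # error statistics from the ensemble
    gain = forecast_var / (forecast_var + obs_sd ** 2)   # Kalman gain, between 0 and 1
    # Each member assimilates its own perturbed copy of the observation.
    return [x + gain * (obs + rng.gauss(0.0, obs_sd) - x) for x in ensemble]


def toy_model(x):
    """Hypothetical linear model relaxing towards a steady state of 10."""
    return 0.9 * x + 1.0


rng = random.Random(0)
ensemble = [rng.gauss(5.0, 2.0) for _ in range(200)]     # initial ensemble
truth = 8.0
for _ in range(20):
    truth = toy_model(truth)
    ensemble = enkf_predict(ensemble, toy_model, 0.1, rng)
    obs = truth + rng.gauss(0.0, 0.5)                    # noisy pseudo-observation
    ensemble = enkf_update(ensemble, obs, 0.5, rng)

estimate = statistics.mean(ensemble)
spread = statistics.stdev(ensemble)
```

The ensemble mean serves as the state estimate and the ensemble spread as its uncertainty, which is how the ensemble stands in for explicit error covariances.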

Parameter estimation with the Ensemble Kalman filter
- Augmented state vector to be estimated contains
  - Time-dependent model variables
  - Time-independent model parameters
- State vector estimate at any time is due to observations up to that time
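A minimal sketch of the augmented-state idea, assuming a hypothetical one-store model with an unknown decay rate k: each ensemble member carries a pair (x, k), only x is observed, and k is updated through its sample covariance with x. The model, the numbers and the helper names are illustrative assumptions, not the AWAP or OptIC configuration.

```python
import random
import statistics


def covariance(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def step(x, k, forcing):
    """Hypothetical one-store model: decay at rate k plus constant forcing."""
    return (1.0 - k) * x + forcing


rng = random.Random(1)
n, obs_sd, true_k, forcing = 200, 0.1, 0.3, 1.0
xs = [rng.gauss(2.0, 0.5) for _ in range(n)]   # time-dependent model variable
ks = [rng.gauss(0.5, 0.1) for _ in range(n)]   # time-independent parameter (prior)
truth = 2.0

for _ in range(60):
    truth = step(truth, true_k, forcing)
    # Predict: the variable evolves, the parameter is carried forward unchanged.
    xs = [step(x, k, forcing) for x, k in zip(xs, ks)]
    obs = truth + rng.gauss(0.0, obs_sd)
    # Update: gains for both components of the augmented state [x, k].
    denom = statistics.variance(xs) + obs_sd ** 2
    gain_x = statistics.variance(xs) / denom
    gain_k = covariance(ks, xs) / denom
    innovations = [obs + rng.gauss(0.0, obs_sd) - x for x in xs]
    xs = [x + gain_x * d for x, d in zip(xs, innovations)]
    ks = [k + gain_k * d for k, d in zip(ks, innovations)]

k_estimate = statistics.mean(ks)
k_spread = statistics.stdev(ks)
```

The parameter estimate at any step reflects only the observations assimilated so far, so it is refined gradually as the filter moves through the time series.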

Our component of the Australian Water Availability Project: develop a Hydrological and Terrestrial Biosphere Data Assimilation System for Australia

MODEL
- Soil moisture
- Leaf carbon
- Water fluxes
- Carbon fluxes

OBSERVATIONS
- NDVI
- Monthly river flows
- Weather: rainfall, solar radiation, temperature

PRIOR INFORMATION
- Initial parameter estimates
- Soil, vegetation types

MODEL-DATA FUSION
- Ensemble Kalman Filter
- Down-gradient method (LM)

OUTPUTS
- Analysis of past, present and future water and carbon budgets
- Maps of soil moisture, vegetation growth
- Process understanding
- Drought assessments, national water balance

AWAP: Dynamic Model and Observation Model
- Timestep = 1 day
- Spatial resolution = 5 x 5 km

State variables (x) and dynamic model
- Dynamic model is of general form dx/dt = F(x, u, p)
- All fluxes (F) are functions F(x, u, p) = F(state vector, met forcing, params)
- Governing equations for state vector x = (W, CL), one each for soil water W and leaf carbon CL [equations not recoverable from this text]

Observations (z) and observation model
- NDVI = func(CL)
- Catchment discharge = average of FWR + FWD [- extraction - river loss]
- State vector in EnKF: x = [W, CL, NDVI, Dis, params]
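The general form dx/dt = F(x, u, p) with a daily timestep can be illustrated with a forward-Euler step. The flux function `toy_flux`, its parameter names and all numbers below are placeholders invented for this sketch; they are not the AWAP governing equations, which are not given in this text, and the integration scheme is likewise an assumption.

```python
def euler_step(x, u, p, flux, dt=1.0):
    """Advance the state by one timestep: x_next = x + dt * F(x, u, p)."""
    return [xi + dt * fi for xi, fi in zip(x, flux(x, u, p))]


def toy_flux(x, u, p):
    """Placeholder net fluxes for x = (W, CL); NOT the AWAP equations."""
    w, cl = x
    dw = u["rain"] - p["k_w"] * w              # water input minus a linear loss
    dcl = p["growth"] * w - p["k_l"] * cl      # water-limited growth minus decay
    return [dw, dcl]


state = [10.0, 1.0]                            # (W, CL), arbitrary units
params = {"k_w": 0.1, "growth": 0.02, "k_l": 0.05}
forcing = {"rain": 2.0}                        # met forcing u for one day

state = euler_step(state, forcing, params, toy_flux)   # one daily step
```

In an EnKF setting, a function like `euler_step` would be applied to every ensemble member during the prediction step, with the parameters appended to the state vector.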

Southern Murray Darling Basin, Australia: "unimpaired" gauged catchments

[Figure: Murrumbidgee relative soil moisture (0 to 1), by month (J to D) and year (1981 to 2005). Forward run with priors, no assimilation.]

Predicted and observed discharge
- 11 unimpaired catchments in Murrumbidgee basin
- 25-year time series: January 1981 to December 2005
(Forward run with priors, no assimilation)

Model-data synthesis approach:
- State and parameter estimation with the EnKF
- Assimilate NDVI and monthly catchment discharge

Why Kalman filter?
- Can account for model error (stochastic component)
- Consistent statistics (uncertainty analysis)
- Forecast capability (with uncertainty)

Issues:
- Time-averaged observations in EnKF (e.g. monthly catchment discharge)
- Specifying statistical model (model and observation errors)
- KF (sequential) vs batch parameter estimation methods? (using Levenberg-Marquardt method; also OptIC project)

Estimated parameters
Preliminary results: Adelong Creek
[Figure: estimated parameters and monthly mean discharge/runoff. Blue = Ensemble Kalman filter (sequential); red = Levenberg-Marquardt (PEST) (batch).]

OptIC project: Optimisation method intercomparison
- International intercomparison of parameter estimation methods in biogeochemistry
- Simple test model, noisy pseudo-data
- 9 participants submitted results
- Methods used:
  - Down-gradient (Levenberg-Marquardt, adjoint)
  - Sequential (extended Kalman filter, ensemble Kalman filter)
  - Global search (Metropolis, Metropolis MCMC, Metropolis-Hastings MCMC)

OptIC model
Estimate parameters p1, p2, k1, k2 [model equations not recoverable from this text], where
- F(t): forcing (log-Markovian, i.e. log of forcing is Markovian)
- x1: fast store
- x2: slow store
- p1, p2: scales for effect of x1 and x2 limitation of production
- k1, k2: decay rates for pools
- s0: seed production (constant value to prevent collapse)
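The sketch below shows one two-store model that is consistent with the symbol descriptions above. The functional form, the Euler time stepping, the AR(1) construction of the forcing and every parameter value are assumptions made for illustration; the slide's own equations are not given in this text, so this should not be read as the OptIC model definition.

```python
import math
import random


def log_markov_forcing(n, sd_log, corr, rng):
    """Forcing whose logarithm follows a first-order Markov (AR(1)) process."""
    m, series = 0.0, []
    for _ in range(n):
        m = corr * m + math.sqrt(1.0 - corr ** 2) * rng.gauss(0.0, sd_log)
        series.append(math.exp(m))             # exponentiate: forcing stays positive
    return series


def two_store_step(x1, x2, f, p1, p2, k1, k2, s0, dt=1.0):
    """One Euler step of an assumed fast-store / slow-store model."""
    production = f * (x1 / (p1 + x1)) * (x2 / (p2 + x2))  # limited by both stores
    dx1 = production - k1 * x1 + s0            # fast store: production, decay, seed
    dx2 = k1 * x1 - k2 * x2                    # slow store: fed by fast-store decay
    return x1 + dt * dx1, x2 + dt * dx2


rng = random.Random(2)
forcing = log_markov_forcing(100, sd_log=0.5, corr=0.9, rng=rng)

x1, x2 = 1.0, 1.0
x1_series, x2_series = [], []
for f in forcing:
    # Illustrative parameter values only.
    x1, x2 = two_store_step(x1, x2, f, p1=1.0, p2=1.0, k1=0.2, k2=0.1, s0=0.01)
    x1_series.append(x1)
    x2_series.append(x2)
```

Adding noise to `x1_series` and `x2_series` would give pseudo-observations of the kind used in the intercomparison, from which p1, p2, k1 and k2 are then estimated.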

Noisy pseudo-observations
- T1: Gaussian (G)
- T2: Log-normal (L)
- T3: Gaussian + temporally correlated (Markov) (GT)
- T4: Gaussian but noise in x2 correlated with noise in x1 (GC)
- T5: Gaussian + drifts (GD)
- T6: Gaussian with 99% of x2 data missing (GM)

[Figure: estimates divided by true parameters; panels for p1, p2, k1, k2.]

Cost function
Some participants used cost functions with weights, wi(t), that depended on each noisy observation zi(t).

Group          Code      Method                                                    Weights
Down-gradient  LM1       Monte Carlo then Levenberg-Marquardt                      f(zi(t))
Down-gradient  LM1Rob    As LM1, but ignore 2% highest summands in cost function   f(zi(t))
Down-gradient  LM2       Levenberg-Marquardt                                       0.01
Down-gradient  LM3       Levenberg-Marquardt                                       f(zi(t))
Down-gradient  Adj1      Down-gradient search using model adjoint                  1.0
Down-gradient  Adj2      Down-gradient search using model adjoint                  sd(x)
KF             EKF       Extended Kalman filter (with parameters in state vector)  sd(resids)
KF             EnKF      Ensemble Kalman filter (with parameters in state vector)  sd(resids)
Global-search  Met       Metropolis                                                sd(resids)
Global-search  MetRob    As Met but absolute deviations, not least squares         sd(resids)
Global-search  MetMCMC   Metropolis Markov Chain Monte Carlo                       1.0
Global-search  MetMCMCq  As MetMCMC but quadratic weights                          f(zi^2(t))
Global-search  MH_MCMC   Metropolis-Hastings Markov Chain Monte Carlo              1.0

Observation-dependent weights, wi(t) = f(zi(t)), were less successful than constant weights.

Choice of cost function
- Evans (2003), in a review of parameter estimation in biogeochemical models: "it was hard to find two groups of workers who made the same choice for the form of the misfit function", with most of the differences being in the form of the weights.
- Evans (2003) and the OptIC project emphasise that the choice of cost function matters, and should be made deliberately, not by accident or default.
(Evans 2003, J. Marine Systems)

OptIC project results
- Choice of cost function had large impact on results
- Most troublesome noise types:
  - temporally correlated noise
- The Kalman filter did as well as the batch methods
- For more information on OptIC: http://www.globalcarbonproject.org/ACTIVITIES/OptIC.htm

Thank you!