- Slides: 15
MODEL-DATA FUSION
Harnessing the 'long tail' of ecosystem carbon cycle observations
Approaches and challenges in synthesizing and assimilating nonautomated and experimental data
Mike Dietze, Trevor Keenan, Ankur Desai, Bob Cook and others!
#NACP13 AIM4 Wednesday Breakout Session Report
The long tail of orphan data
[Figure: data volume vs. rank frequency of datatype (B. Heidorn)]
• Head of the curve: specialized repositories (50%)
  – Characteristics: big science, large volume, automated sensors, well described, well curated, easily discovered
• Tail of the curve: orphan data (50%)
  – Characteristics: small science, small volume, poorly described, rarely indexed, invisible to scientists, rarely used ("dark data")
• High spatial resolution, process based
• Theory development, model development, benchmarking
Carbon’s Long Tail
• Experimental manipulations
• Vegetation plots (esp. historical)
• Belowground carbon
• Sap flux
• Gas exchange (leaf, root, etc.)
• Soil respiration
• CH4, VOC, DOC
Limitations to synthesis
• OBSERVATIONS
  – Non-standard formatting
  – Inadequate metadata
  – Insufficient archiving
  – Data discovery
• MODELS
  – Models inaccessible, not user friendly or well documented, relevance hard to judge
  – Data assimilation even less accessible
  – Informatics, flow of information is high and only one-way
  – # of experimentalists >> # of modelers
Questions about Observations
• What are the key challenges to using experimental and observational data in data assimilation?
• What priorities are there for data or biomes the community should focus on?
• Is there a need to develop guidelines for community model-data assimilation to prevent misuse or assimilation of biased data?
Rate my data: Keenan et al. (2013) Ecological Applications
✔Check for best practices ✔Create metadata ✔Connect to ONEShare
Data & Metadata (EML)
https://dataone.org
http://dataup.cdlib.org/
Discussion
• Challenges:
  – Incentives to format, submit data (carrots and sticks)
    • Enforcing data management plans
    • Providing useful parameters (e.g., LeafWeb) for submitted raw data, DOIs, and Amazon.com/Google-like searching with suggestions of collaborators and other relevant data
  – Limitations of existing Fluxnet/Ameriflux “ancillary” data standards (now in revision)
    • Must be able to link a hierarchy of scaled or summarized observations (like aboveground biomass) down to detailed, raw observations (tree diameters with lat/lons) in machine-readable formats with well-defined variable names/units
  – Tracking provenance, usage, access rights
  – Evil Excel spreadsheets hiding in desk drawers (orphaned data)
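The hierarchy requirement above (summarized values that link back to raw, machine-readable observations with explicit names and units) can be sketched in a few lines of Python. This is an illustration only, not a Fluxnet/Ameriflux format: the class names, field names, and the `basal_area_summary` helper are hypothetical. Basal area stands in for aboveground biomass here because it follows directly from tree diameter, whereas biomass would need species-specific allometric equations that the slides do not give.

```python
from dataclasses import dataclass, field
from math import pi
from typing import List


@dataclass(frozen=True)
class TreeRecord:
    """One raw observation (hypothetical schema): a single measured tree."""
    tree_id: str
    lat: float     # decimal degrees
    lon: float     # decimal degrees
    dbh_cm: float  # diameter at breast height, in centimeters


@dataclass
class PlotSummary:
    """One summarized observation that keeps links to its raw records."""
    variable: str  # well-defined variable name
    units: str     # explicit units
    value: float
    source_ids: List[str] = field(default_factory=list)  # provenance


def basal_area_summary(trees: List[TreeRecord], plot_area_ha: float) -> PlotSummary:
    """Sum stem cross-sectional areas (m2) and scale to a per-hectare value."""
    # dbh_cm / 200 converts a diameter in cm to a radius in m.
    total_m2 = sum(pi * (t.dbh_cm / 200.0) ** 2 for t in trees)
    return PlotSummary(
        variable="basal_area",
        units="m2 ha-1",
        value=total_m2 / plot_area_ha,
        source_ids=[t.tree_id for t in trees],
    )


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    trees = [
        TreeRecord("T001", 45.80, -90.08, 20.0),
        TreeRecord("T002", 45.80, -90.08, 20.0),
    ]
    summary = basal_area_summary(trees, plot_area_ha=0.1)
    print(summary.variable, summary.units, round(summary.value, 4), summary.source_ids)
```

Because the summary carries `source_ids`, anyone using the scaled value can trace it down to the individual tree diameters and their coordinates.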
Discussion
• Priorities
  – Funding for data management beyond “big data”
  – Identification of a very small set of very high priority observations beneficial to all carbon cycle models and not currently synthesized well
  – Training for best practices, good examples
  – Use NACP syntheses like MsTMIP to build benchmark datasets of carbon cycle observations beyond flux and remote sensing
Questions about models
• How can new tools make model-data synthesis more accessible, community-oriented, and with faster forecast turnaround times?
• Can this approach increase credibility of models for addressing policy and management questions?
• How can we better archive and document older data sets that are at risk of falling through the cracks?
Predictive Ecosystem Analyzer (PEcAn)
• Manage flow of information in/out of models
• Scientific workflows
• Automate analysis
• Accessibility
• Repeatability
• Data assimilation
LeBauer et al. 2013
http://www.pecanproject.org/
Would you, could you, assimilate data in someone’s model, if this is easy to use?
Would you use mine? Would you contribute yours?
Data, models, code and time
Try it, tell me more!
Rate my model! LeBauer et al., 2013, Ecol. Monogr.
Discussion
• Everyone is a modeler now: models need better documentation, details of parameters and how they are used, standard interfaces
• Workflow tools need to be both graphical and scriptable and also traceable
• Ensemble runs, data assimilation, uncertainty analyses in complex models require perhaps user-facility-like computational resources
• Output needs to be more closely tied to observations made (e.g., soil respiration)
• Incorporating responses of manipulations in model frameworks is a particular challenge
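As a toy illustration of the ensemble-run, uncertainty-analysis, and data-assimilation points above, the sketch below runs a small parameter ensemble of a standard Q10 soil respiration function and reweights the members against one soil respiration observation with a Gaussian likelihood (simple importance weighting). This is not PEcAn's API or any session participant's method. The function names, prior ranges, observation value, and its error are all assumptions made for illustration.

```python
import math
import random


def soil_respiration(temp_c, r_ref, q10, t_ref=10.0):
    """Q10 model: respiration at temp_c given the rate r_ref at t_ref."""
    return r_ref * q10 ** ((temp_c - t_ref) / 10.0)


def run_ensemble(temp_c, n_members=500, seed=42):
    """Sample parameters from assumed uniform priors and run the model."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        q10 = rng.uniform(1.5, 3.0)    # assumed prior range (illustrative)
        r_ref = rng.uniform(1.0, 3.0)  # assumed prior range (illustrative)
        members.append((q10, r_ref, soil_respiration(temp_c, r_ref, q10)))
    return members


def assimilate(members, obs, obs_sd):
    """Return normalized Gaussian-likelihood weights, one per member."""
    raw = [math.exp(-0.5 * ((pred - obs) / obs_sd) ** 2) for _, _, pred in members]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_mean(values, weights):
    """Mean of values under normalized weights."""
    return sum(v * w for v, w in zip(values, weights))


if __name__ == "__main__":
    members = run_ensemble(temp_c=20.0)
    preds = [pred for _, _, pred in members]
    prior_mean = sum(preds) / len(preds)
    # Hypothetical observation and observation error, same units as r_ref.
    weights = assimilate(members, obs=6.0, obs_sd=0.5)
    print("prior mean prediction:    ", round(prior_mean, 2))
    print("posterior mean prediction:", round(weighted_mean(preds, weights), 2))
```

Even this toy needs hundreds of model evaluations for a single observation, which hints at why ensembles of complex ecosystem models call for shared computational resources.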