Next Steps
● Group by theme
● Identify grand challenges where progress in the theme is needed
● Identify gaps to achieving them
● Downselect to ~4 topic areas, then break into groups to polish
  ○ (DL) UQ related to other topics (DOE, V&V, information, decision), many domains (~6)
  ○ (JN) Interpretability and explainability (better quantify; methods/algorithms to optimize metrics)
    ■ And causality, and V&V (~8)
  ○ (BC) Information/data (value of information over time for UQ and decisions) and DOE (relates to RL/adaptive design/extrapolation) (~6)
    ■ Optimization
● Prepare outbrief slides
1) UQ
● Precision and accuracy; understand tradeoffs for decisions with high consequences; multiple methods/approaches when methods have different benefits/precision (global/local/multimodal) and speed (fast/slow)
● What does UQ mean in modeling? One way: a Bayesian posterior; another way: what are the 5 most plausible interpretations?
● 2 broad flavors of DL: DL approximates a function; also used, for example, in a variational approach (learning a distribution)
● Kernel methods? Are we making inference in Euclidean space or distribution space?
● UQ on machine learning, or use machine learning for UQ (GPs) (see the sketch below); affects interpretability; climate example; discussion of density estimates (accuracy of prediction intervals? "Too aggressive")
● Bayesian approach to UQ: get effects on prediction from the prior (advantages/disadvantages?)
● What do we lose when we choose the wrong class of models at the outset?
● Fundamental physics: look at many facilities/quantify resolution functions; using ML for UQ, what else do you get by bringing in resolution functions? Not using uncertainty in transport codes appropriately, resulting in bias? Does uncertainty propagate?
● How strong a prior should your surrogate model encode? [BNNs are one approach to help here]
● "Surrogate model(s)": have an ensemble/complex systems; each model has different distributions; how do we do model selection among families of models?
● Ensemble models: interpolation/extrapolation drives UQ; disagreement among the ensemble -> uncertainty; gap between the optimal model from DL vs. what a domain expert wants
● When do we want to reduce vs. understand uncertainty?
● Extrapolation: how can we bound it? What priors/constraints do we have when extrapolating?
● Adaptive design/RL? Related to UQ/GP approaches (climate breakout)
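A minimal sketch of the "machine learning for UQ (GPs)" idea flagged above: fit a Gaussian process to a handful of expensive evaluations and read off a predictive mean and standard deviation. The stand-in `expensive_model`, kernel, and data are illustrative assumptions, not anything from the breakout.

```python
# Minimal sketch: a Gaussian process as a surrogate that carries its own
# uncertainty estimate. `expensive_model` stands in for a costly simulation;
# the kernel choice and sample size are illustrative assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def expensive_model(x):
    """Stand-in for an expensive deterministic simulation."""
    return np.sin(3 * x) + 0.5 * x

rng = np.random.default_rng(0)
X_train = rng.uniform(0, 3, size=(12, 1))         # a few costly evaluations
y_train = expensive_model(X_train).ravel()

kernel = ConstantKernel(1.0) * RBF(length_scale=0.5)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X_train, y_train)

X_query = np.linspace(0, 3, 50).reshape(-1, 1)
mean, std = gp.predict(X_query, return_std=True)  # predictive mean + uncertainty
print("max predictive std:", std.max())           # large std flags regions to distrust
```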
UQ: Challenges and Gaps
● ML for UQ vs. UQ for ML
  ○ Highly domain dependent
  ○ Why did we do ML first? The physical model is too expensive -> surrogate model (where is it applicable/how local is it? Discontinuities are problems computationally and w.r.t. uncertainty)
  ○ Use of surrogate models
    ■ Replace expensive models with 'cheap' surrogates
    ■ Physics-based models are hard for different reasons
    ■ Surrogates are cheap and local. Can we quantify this better?
    ■ Need a better class of AI models to create surrogates
      ● Probabilistic methods for non-local approximations
    ■ "Model of a model…"
      ● How do we propagate error optimally? What is the loss function? (see the sketch below)
      ● UQ for extreme events with high loss or quantiles
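For the "how do we propagate error optimally" question above, the baseline most alternatives are judged against is brute-force Monte Carlo propagation of input uncertainty through a cheap surrogate. A minimal sketch, assuming an invented `surrogate` and illustrative input distributions:

```python
# Minimal sketch of Monte Carlo uncertainty propagation: sample uncertain
# inputs, push them through a cheap surrogate, and summarize the output
# distribution. The surrogate and input distributions are illustrative.
import numpy as np

def surrogate(x1, x2):
    """Stand-in for a fitted surrogate of a physics code."""
    return np.exp(-x1) * np.cos(2 * x2)

rng = np.random.default_rng(1)
n = 100_000
x1 = rng.normal(loc=1.0, scale=0.1, size=n)   # assumed input uncertainty
x2 = rng.uniform(low=0.0, high=0.5, size=n)

y = surrogate(x1, x2)                          # propagate the samples
print(f"mean = {y.mean():.4f}, std = {y.std():.4f}")
print("central 95% interval:", np.percentile(y, [2.5, 97.5]))
```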
UQ: Challenges and Gaps
● Climate challenges: needs 'better UQ'
  ○ Lots of deterministic models across scales; propagate up/down/between grid cells
  ○ Want to improve predictions/intervals; V&V not fast; sample size of 1 Earth
  ○ Want to quantify what we don't know
  ○ Computationally expensive
    ■ Use of surrogate models (biased, but fast)
  ○ Uncertainty comes from data and model?
● GPs historically used for determining UQ on deterministic models
  ○ Deterministic models are expensive to evaluate
    ■ GPs estimate the uncertainty in interpolation
    ■ Different than using a surrogate model
    ■ Source of uncertainty in the model?
    ■ How do we do better with DL? (see the ensemble sketch below)
● Fundamental physics
  ○ Propagating uncertainty
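For "How do we do better with DL?", one common answer is a deep ensemble, echoing the earlier "disagreement among ensemble -> uncertainty" note: train several networks on bootstrap resamples and treat their spread as uncertainty, which typically grows under extrapolation. A minimal sketch with assumed toy data and architecture:

```python
# Minimal sketch of ensemble-based UQ: train K small networks on bootstrap
# resamples and treat the spread of their predictions as uncertainty.
# Data, architecture, and K are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(4 * X).ravel() + 0.1 * rng.normal(size=200)

K = 5
preds = []
X_test = np.linspace(-1.5, 1.5, 30).reshape(-1, 1)    # extends past training range
for k in range(K):
    idx = rng.integers(0, len(X), size=len(X))        # bootstrap resample
    net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=3000, random_state=k)
    net.fit(X[idx], y[idx])
    preds.append(net.predict(X_test))

preds = np.array(preds)
mean, spread = preds.mean(axis=0), preds.std(axis=0)
# Spread typically grows outside [-1, 1], i.e., under extrapolation.
print("spread inside range:", spread[5:25].mean(), "| at edges:", spread[[0, -1]].mean())
```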
2) Interpretability and Explainability
● Tensor factorization for explainability: N-dimensional data; unsupervised learning (climate/cancer/chemistry); like NNMF/latent features (see the sketch below); chemistry/combustion: multiscale modeling in transportation
● Definitions? Why are we looking at the posterior? To accept/reject predictions -> a point estimate is fine; in practice we use softmaxes; develop new methods focused on 'trust'; the PPD makes us feel good, but is it true?
● We need data/model parameters for different reasons; this helps differentiate explainability and interpretability. Explaining an unforeseen outcome vs. interpreting model output; explaining/interpreting parameters: aspects where one matters and the other doesn't; multiscale: interpretability at the subgrid scale desired, another scale for control decisions; application specific
● Model robustness (modeling assumptions; do we use priors that are too strong?)
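A minimal sketch of the NNMF/latent-feature idea in the first bullet, in its 2-D matrix form (the breakout's N-dimensional tensor case would need a tensor library such as TensorLy); the synthetic data and number of components are assumptions:

```python
# Minimal sketch of the NNMF/latent-feature idea from the notes, in its 2-D
# (matrix) form: factor a nonnegative data matrix X ~ W @ H, where rows of H
# are interpretable "parts" and W gives per-sample loadings. Data are synthetic.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(3)
true_parts = np.abs(rng.normal(size=(3, 20)))        # 3 hidden nonnegative patterns
loadings = np.abs(rng.normal(size=(100, 3)))
X = loadings @ true_parts + 0.01 * rng.random((100, 20))

model = NMF(n_components=3, init="nndsvd", max_iter=500)
W = model.fit_transform(X)     # per-sample weights on each latent feature
H = model.components_          # the latent features themselves
print("reconstruction error:", model.reconstruction_err_)
```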
Interpretability/Explainability Challenges and Gaps
● Connection between UQ and interpretability/explainability
  ○ In climate, we have a big inverse problem
    ■ If I believe the output, how do I understand how we got there?
● How do we quantify interpretability across domains?
  ○ What is the uncertainty by which we get to a solution from a complex model?
  ○ Bounds
3) Validation & Verification
● Big in the power industry; want V&V for everything -> TRUST for decision makers; nuclear security (quantile bounds; extreme value theory); deterministic models; asymmetric loss for statistical models
● Validation: how is validation different than model selection? Probabilistic calibration (see the sketch below); asymmetric loss/rare events/bounds
● Verification: does the model do what it is intended to do? In simulation (ABS), we follow/subselect some agents to verify; do we need new V&V paradigms for models built on new hardware?
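"Probabilistic calibration" above has a simple baseline check: compare nominal interval coverage against empirical coverage on held-out data, which also operationalizes the earlier worry that prediction intervals may be "too aggressive". A minimal sketch assuming Gaussian predictive intervals and invented toy predictions:

```python
# Minimal sketch of a validation-style calibration check: do nominal 90%
# prediction intervals actually cover ~90% of held-out observations?
# Gaussian intervals and the toy predictor are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(4)
y_true = rng.normal(size=1000)

# Pretend these came from some probabilistic model on held-out data.
pred_mean = y_true + rng.normal(scale=0.5, size=1000)   # imperfect predictions
pred_std = np.full(1000, 0.4)                           # model's claimed uncertainty

z = 1.645                                               # two-sided 90% Gaussian interval
lo, hi = pred_mean - z * pred_std, pred_mean + z * pred_std
coverage = np.mean((y_true >= lo) & (y_true <= hi))
print(f"nominal 90% vs. empirical {coverage:.1%}")      # under-coverage => 'too aggressive'
```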
V&V Challenges and Gaps
● Domains such as power and national security (nuclear)
  ○ Key prerequisite for trust of decision makers
● Dovetails with UQ
  ○ Does our UQ lead to results that can be reproduced?
    ■ Issue in systems biology not translating to lots of new successful FDA trials
  ○ Can we do better with V&V in areas like climate?
4) Optimization
● We need to solve big optimization problems (MINLPs) for energy grid generation and distribution; algorithms and scalability
● Precision/accuracy: tradeoff in loss; Newton's method vs. quasi-Newton vs. SGD; can we select model fitting under different loss functions? (see the sketch below)

Challenges
● Sometimes a tool, sometimes an outcome
● Many times there is no single, one-dimensional objective function
  ○ When is optimization the right answer?
● Big, nasty problems mathematically (MINLPs)
● What do we need to develop in optimization for AI?
● Robust methods (what do we mean, specifically?): lots of AI at many scales with high variance
  ○ Do we want resilience or rigidity?
  ○ Defining performance metrics is difficult

Gaps:
● Separating a larger problem into smaller problems (decomposition)
● SGD can be improved on; better classes of models could be enabled

Infrastructure/Investment
● HPC

Needs from Domain:
● How can we relax the problem? (physics constrained; a discretized PDE)
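For "can we select model fitting under different loss functions?" and the recurring asymmetric-loss theme, a minimal sketch: fit a linear model under the pinball (quantile) loss by subgradient descent, so under-prediction is penalized more heavily than over-prediction. The data and target quantile are illustrative assumptions:

```python
# Minimal sketch of fitting under an asymmetric (pinball/quantile) loss with
# subgradient descent, instead of the usual symmetric squared error.
# tau > 0.5 penalizes under-prediction more, e.g. for conservative bounds.
import numpy as np

rng = np.random.default_rng(5)
X = np.column_stack([np.ones(500), rng.uniform(0, 1, 500)])   # intercept + feature
y = 2.0 + 3.0 * X[:, 1] + rng.normal(scale=0.5, size=500)

tau = 0.9                       # fit the 90th-percentile line
w = np.zeros(2)
lr = 0.05
for _ in range(2000):
    r = y - X @ w               # residuals
    # Subgradient of mean pinball loss: tau where r > 0, (tau - 1) where r <= 0.
    g = -X.T @ np.where(r > 0, tau, tau - 1.0) / len(y)
    w -= lr * g

print("fitted weights:", w)     # intercept should sit above the 2.0 mean line
frac_below = np.mean(y <= X @ w)
print(f"fraction of data below the fitted line: {frac_below:.2f} (target ~{tau})")
```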
5) Decision
● Precision and accuracy; understand tradeoffs for decisions with high consequences; multiple methods/approaches when methods have different benefits/precision (global/local/multimodal) and speed (fast/slow)
6) Information Theory/Data
● What do we lose when we choose the wrong class of models at the outset?
● Time scale: what data do we keep? For how long? Colocation of data/compute? Sensor side: 'real time' decisions; years later we may need post hoc analysis (error/anomaly later in production life). What should we keep? How? Data 'sufficiency' is different (what do we need?); sufficient dimension reduction
● Thoughts on models better/worse at interpolation vs. extrapolation? (sequential analyses of hypotheses?) Adaptive designs: how do we do model selection in adaptive design?
● Data curation; information; we build models for a specific dataset, not necessarily the population it came from; do we have enough data to answer other hypotheses? What data do I need to answer hypothesis X?
Information Theory/Data Challenges and Gaps
● Characterizing model and data go hand in hand
● Which data have strong influence on model predictions?
● Some hypotheses we know a priori, some we know post hoc
  ○ We don't know what analyses will be done in the future
  ○ We don't know what data to keep
● How valuable is data from surrogate models?
  ○ When is it valuable?
● How do we estimate the value of data over time? (e.g., a user facility like SNS)
  ○ Do we maintain just statistics of the data distribution instead of the data?
  ○ What mix of high resolution/low resolution allows the original data to be reconstructed accurately? (see the sketch below)
  ○ Additive manufacturing (post hoc error analysis)
  ○ User facility: how do you monitor the stream of data over time? What objective functions are used when deciding what data to keep?
    ■ Energy systems: intrusion detection/security
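The high/low-resolution reconstruction question above maps onto a standard compression experiment. A minimal sketch using PCA on synthetic data: vary how many components are kept and track reconstruction error against the stored fraction. A real facility stream would need far more care (drift, anomalies, metadata):

```python
# Minimal sketch of the keep-less-than-raw-data question: compress with PCA,
# keep only k components, and measure how reconstruction error trades off
# against storage. Synthetic data with a known 5-dimensional latent structure.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
latent = rng.normal(size=(2000, 5))                  # data truly lives in 5 dims
mixing = rng.normal(size=(5, 50))
X = latent @ mixing + 0.05 * rng.normal(size=(2000, 50))

for k in (2, 5, 10, 20):
    pca = PCA(n_components=k).fit(X)
    X_hat = pca.inverse_transform(pca.transform(X))  # reconstruct from k components
    rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
    print(f"k={k:2d}  stored fraction ~{k/50:.2f}  relative error {rel_err:.4f}")
```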
Information Theory/Data Gaps
● We don't know the value of data over time
  ○ Used for different purposes over time
● Maintaining and keeping data with limited resources
● New data (sensors) have different information
● Information is in literature that isn't easily extractable (materials, standards, etc.)
● Value of data from surrogate models is not always obvious
● No DOE database exists

Infrastructure/Investment
● Encoder/decoder
● Literature mining

Needs from Domain:
● How can a domain quantify changes in data measurements as sensors change? (climate: units, measurement change)
● What data is important; what has been done in the literature; what is the expiration date for their data; need metadata

Uniqueness to DOE
● Big generator of data; user facilities; ESnet; unique data with little immediate economic value

Capabilities
● 3-5 yr: better data management plans; 3D ability to identify defects post hoc; applications to AI
● 10-15 yr: ability to reconstruct data
7) Design of Experiments
● Precision/accuracy; asymmetric loss; iterative decision making
● Iterative set of analyses, propagating uncertainty: how good do we need to be at estimating error to get a good prediction interval?
● In finance, there is a value of information with respect to financial risk; we (DOE) don't necessarily always have a very clear metric of risk a priori
● In manufacturing, machines are under-instrumented; need a formal way to measure what we need to know
● AI suite: up front, look at what hypotheses might be solved and what the value of different measurements is w.r.t. those hypotheses; decision-theoretic approach (what are the loss functions?); "what we know now vs. what we want to know in the future"
● Thoughts on models better/worse at interpolation vs. extrapolation? (sequential analyses of hypotheses?) Adaptive designs: how do we do model selection in adaptive design?
Design of Experiments Challenges and Gaps
● DOE (design of experiments) and UQ are also related in real-world, iterative science
  ○ Need better adaptive design when we make decisions iteratively (see the sketch below)
● ML for DOE vs. DOE for ML
  ○ An approach to the curse of dimensionality?
● DOE and reducing UQ in an optimal way
● Model characterization: design of experiments to determine/bound an uncertainty/variance estimate?
● Many labs are interested in finding rare events
  ○ How does DOE need to adjust for rare/extreme events like particle events?
    ■ Can UQ estimates help determine the best places to look for rare events?
● UQ/DOE is changing fast; can we provide better education/automated tools?
  ○ For a domain like fundamental physics, how can we create better metadata/tools for domain needs?
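"Need better adaptive design when we make decisions iteratively" has a minimal concrete instance: at each round, run the next experiment where a GP surrogate is most uncertain, directly targeting the "reducing UQ in an optimal way" bullet. A sketch with an invented `run_experiment`, candidate grid, and budget:

```python
# Minimal sketch of adaptive design: iteratively run the "experiment" where a
# GP surrogate is most uncertain, shrinking overall predictive variance.
# The experiment function, candidate grid, and budget are illustrative.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def run_experiment(x):
    """Stand-in for a costly measurement."""
    return float(np.sin(5 * x) + 0.5 * x**2)

rng = np.random.default_rng(7)
candidates = np.linspace(0, 2, 200).reshape(-1, 1)
X = list(rng.uniform(0, 2, size=(3, 1)))            # small initial design
y = [run_experiment(x[0]) for x in X]

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3), normalize_y=True)
for step in range(10):                              # experiment budget
    gp.fit(np.array(X), np.array(y))
    _, std = gp.predict(candidates, return_std=True)
    x_next = candidates[np.argmax(std)]             # most uncertain candidate
    X.append(x_next)
    y.append(run_experiment(x_next[0]))
    print(f"step {step}: next x = {x_next[0]:.3f}, max std = {std.max():.3f}")
```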
Design of Experiments Gaps
● Experiments (multiscale, etc.) are complex; specific experiments are not known a priori
● Want to maximize the value of information in a user facility experiment when resources can go offline
● Design for rare events, for example particle detectors
● Need to clarify the use case: static vs. adaptive/iterative
● Codesign: you don't know the specific experiments when you build a user facility (environmental science: inoculate soil; you don't know what you'll want to measure in the future)
● Value of data changes over time; under-instrumented: how to maximize value with uncertain value of data over time (features are expensive to add)

Infrastructure/Investment
● Mathematical side: optimization with non-traditional constraints; DOE with asymmetric loss functions; balance between flexibility and improvement (bandit-like; see the sketch below)

Needs from Domain:
● ML for DOE: climate; analyze time series to decide which processes are most important to include; what new data to collect
● Collaboration earlier; define loss functions more clearly
● Getting prior information

Uniqueness to DOE
● Energy design, generation, and distribution; rare particle event detection; user facilities; asymmetric loss functions (nuclear security, energy grid)
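The "balance between flexibility and improvement (bandit-like)" item above can be made concrete with a UCB rule: allocate the next run to the configuration whose optimistic value estimate is highest. A minimal sketch; the arm means and horizon are invented for illustration:

```python
# Minimal sketch of bandit-style allocation (UCB1): balance trying
# under-explored experimental configurations against exploiting the best one.
# The true arm means and horizon are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(8)
true_means = np.array([0.2, 0.5, 0.35])        # unknown payoff of 3 configurations
n_arms = len(true_means)
counts = np.zeros(n_arms)
sums = np.zeros(n_arms)

for t in range(1, 501):
    if t <= n_arms:                            # play each arm once to initialize
        arm = t - 1
    else:
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax(ucb))              # optimism in the face of uncertainty
    reward = rng.normal(loc=true_means[arm], scale=0.1)
    counts[arm] += 1
    sums[arm] += reward

print("pulls per arm:", counts)                # most pulls should go to arm 1
```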
8) Causality
● Bio: the idea of a "digital twin" to create counterfactuals for treatment options (see the sketch below)
● Modeling physical phenomena can miss nuances of human/social phenomena; many causal pathways; interpretability/explainability can be very culturally dependent (the environment is not consistent); domain knowledge is 'culturally dependent'
● How are interpretability/explainability different in a causal framework vs. association/correlation models? (How do decision makers view these models?) Can we create a causal/explainable model?

Challenges
● How do we go from correlation to causation in next-gen AI?
  ○ What are the gaps in the deep learning world for (a) cause/effect and (b) counterfactual settings?
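The "digital twin to create counterfactuals" idea can be illustrated with a toy structural causal model: simulate the same units under do(t=0) and do(t=1) and compare, which a correlation/association model cannot do when treatment is confounded. All structural equations below are invented for illustration:

```python
# Minimal sketch of counterfactuals from a known structural causal model:
# confounder x drives both treatment t and outcome y, so the naive observed
# contrast is biased, but intervening on t in the simulator recovers the
# true effect. Equations and effect size (1.0) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(9)
n = 50_000
x = rng.normal(size=n)                              # confounder
t = (x + rng.normal(size=n) > 0).astype(float)      # sicker units treated more often
noise = rng.normal(scale=0.5, size=n)

def outcome(t_val):
    """Structural equation for y; acts as a tiny 'digital twin'."""
    return 1.0 * t_val - 2.0 * x + noise            # true treatment effect = 1.0

y = outcome(t)                                      # observed world
naive = y[t == 1].mean() - y[t == 0].mean()         # confounded contrast
ate = (outcome(np.ones(n)) - outcome(np.zeros(n))).mean()  # do(t=1) vs. do(t=0)
print(f"naive observed difference: {naive:.2f}")    # biased (negative here)
print(f"interventional ATE: {ate:.2f}")             # recovers ~1.0
```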
Model Applicability and Characterization
Co-lead: Blair Christian (ORNL)
Co-lead: Dan Lu (ORNL)
Co-lead: Justin Newcomer (Sandia)
ORNL is managed by UT-Battelle, LLC for the US Department of Energy
List of breakout participants (44 people)
Blair Christian, Dan Lu, Justin Newcomer, Mathieu Doucet, David Fobes, Nancy Hayden, Jacob Hinkle, Travis Johnston, Brian Kaul, Ryan King, James Kress, Jitendra Kumar, Frank Felder, Robert Link, Lexie Yang, Jiafu Mao, Yury Maximov, Hugh Medal, Matt Menickelly, Konstantin Mischaikow, Susan Mniszewski, Ambarish Nag, Kyle Neal, Michelle Newcomer, Jim Ostrowski, Ozgur Ozmen, Tara Pandya, Pavel Lougovski, Jiaxin Zhang, Dawn Levy, Armenak Petrosyan, Hong Qin, Daniel Ricciuto, Derek Rose, Peter Schultz, Satyabrata Sen, Stuart Slattery, Michael Smith, Sibendu Som, Suhas Somnath, David Stracuzzi, Hai Xiao, Zechun Yang, Steven Young
Breakout Agenda
• 9:00 Introduction (10 min)
• 9:10 Review charge questions (10 min)
• 9:20 Identify potential topics (20 min)
• 9:40 Merge and reduce topics (20 min)
• 10:00 Topic 1 (30 min)
• 10:30 Topic 2 (30 min)
• 11:00 Topic 3 (30 min)
• 12:50 Wrap up
Mapping application challenges to the crosscuts
[Placeholder for a flattened slide graphic that mapped topics to six crosscut columns, with a reinforcement-learning schematic (Environment, Agent, Actions, Rewards, States, Partial Information, Model-based Approximations) alongside; the grouping below is reconstructed.]
• Data: experimental design; data curation and validation; compressed sensing; representation learning and multimodal data
• Learning: physics informed; reinforcement learning; adversarial networks; "foundational math" of learning
• Actions: facilities operation and control
• Scalability: algorithms, complexity and convergence; levels of parallelization; mixed precision arithmetic; communication; implementations on accelerated-node hardware
• Assurance (model applicability and characterization): uncertainty quantification; explainability and interpretability; validation and verification; causal inference
• Workflow: computing at the edge; compression; online learning; federated learning; infrastructure; augmented intelligence; human-computer interface
Review Charge Questions (10 minutes)
Identify 3-5 open questions that need to be addressed to maximally contribute to AI impact in the science domains and/or AI impact in the enabling technologies. For each challenge:
• To what extent is DOE uniquely positioned to address this challenge?
  – What contributions can DOE make to the broader AI community (3-5 years, 10+ years)?
  – How well is the broader AI community suited to addressing this challenge?
• What capabilities are imagined in the 3-5 year timeframe and the 10-15 year timeframe?
• What classes of AI problems will the technical area contribute to?
• What level of infrastructure and investment is needed to realize the impact?
• What do you need from the domain sciences to push your areas forward?
Initial list of topics (20 minutes)
Merge and reduce topics (20 minutes)
Topic 1: UQ
• Uniqueness to DOE
  o Contributions from DOE
    ▪ Math/statistics theory
    ▪ HPC architectures
  o Broader community contributions
    ▪ Confidence in ML predictions
    ▪ Knowing what we do not know
• Capabilities
  o 3-5 years
    ▪ Learn probability distributions from data in high-dimensional spaces
    ▪ Mathematical and computational approaches that can be widely applied
    ▪ Physics-informed ML methods to constrain the uncertainty
  o 10-15 years
    ▪ UQ for combined ML and physical models
    ▪ Scalable UQ methods
    ▪ Generalized ML models for extrapolation (rare events) and UQ to quantify the extrapolation uncertainty
    ▪ Life-cycle UQ framework
Topic 1: UQ (continued)
• Classes of AI problems contributed to
  o Probabilistic deep learning
• Infrastructure and investment needed
  o DOE-leading computing
• Needs from the domain sciences
  o Physics information to constrain the uncertainty and for V&V
  o To understand the uncertainty propagation
• Contributions to broader AI community
  o 3-5 years
    ▪ Advance ML model prediction (accuracy and precision)
    ▪ Explainability and interpretability
  o 10+ years
    ▪ Support decision making
    ▪ Improve predictive understanding
Topic 2: Interpretability and Explainability
• Uniqueness to DOE
  o Contributions from DOE
    ▪ More relevant to high-consequence scientific applications
    ▪ Having methods that are inherently explainable/interpretable enables more rapid/automated scientific discovery
  o Broader community contributions
    ▪ Overlap in interests from pharma and transportation/mobility, both in terms of verifiability and in terms of improving understanding of outcomes
• Capabilities
  o 3-5 years
    ▪ Ability to define/quantify interpretability and explainability: qualitative vs. quantitative measures of explainability; how do we encode explainability?
    ▪ Ability to build in explainability/interpretability as a feature of AI/ML methods
  o 10-15 years
    ▪ Development of a new class of algorithms that directly optimize interpretability and explainability
    ▪ Ability to represent/encode and embed domain knowledge and causality into AI algorithms
    ▪ Ability to interrogate the model to understand why it is making decisions, i.e., draw out the explainability from the model
Topic 2: Interpretability and Explainability (continued)
• Classes of AI problems contributed to
  o All
• Infrastructure and investment needed
  o Dedicated people/funding focused specifically on explainability rather than having it be an offshoot of other projects
  o Embedding explainability into models and algorithms may (or may not) increase computational complexity; the degree of explainability may come at a cost. We would like to understand what those costs are and how to make those tradeoffs.
• Needs from the domain sciences
  o Understanding of the tradeoffs between interpretability and explainability: which is needed, to what degree, and when
  o Understanding of what the domain scientist needs to trust a model (metrics?): what is a satisfactory explanation/interpretation?
  o Representation of domain knowledge / known causal relationships
• Contributions to broader AI community
  o 3-5 years
  o 10+ years
    ▪ Ability to provide domain scientists, decision makers, and the general public the trust to adopt AI algorithms for critical decisions
Topic 3: Optimization, Design of Experiments, Information Theory/Data
• Uniqueness to DOE
  ○ Large user facilities
  ○ Energy/national security
  ○ Climate
  ○ Asymmetric loss
  ○ Experiments not clearly defined
  o Contributions from DOE
    ▪ HPC
    ▪ User facility data
    ▪ Dynamic user groups
    ▪ Large use of surrogate models
• Capabilities
  o 3-5 years
    ▪ XXX
  o 10-15 years
    ▪ XXX
Topic 3: Optimization, Design of Experiments, Information Theory/Data (continued)
• Classes of AI problems contributed to
  o DOE dovetails with XXX
• Infrastructure and investment needed
  o XXX
• Needs from the domain sciences
  o XXX
• Contributions to broader AI community
  o 3-5 years
    ▪ XXX
  o 10+ years
    ▪ XXX
Design of Experiments Challenges
● DOE (design of experiments) and UQ are also related in real-world, iterative science
  ○ Need better adaptive design when we make decisions iteratively
● ML for DOE vs. DOE for ML
  ○ An approach to the curse of dimensionality
● DOE and reducing UQ in an optimal way in complex settings like climate
● Model characterization: design of experiments to determine/bound an uncertainty/variance estimate
● Many labs are interested in finding rare events
  ○ How does DOE need to adjust for rare/extreme events like particle events?
    ■ Can UQ estimates help determine the best places to look for rare events?
● UQ/DOE is changing fast; can we provide better education/automated tools?
  ○ For a domain like fundamental physics, how can we create better metadata/tools for domain needs?
Design of Experiments Gaps
● Experiments (multiscale, etc.) are complex; specific experiments are not known a priori
● Want to maximize the value of information in a user facility experiment when resources can go offline
● Design for rare events, for example particle detectors
● Need to clarify the use case: static vs. adaptive/iterative
● Codesign: you don't know the specific experiments when you build a user facility (environmental science: inoculate soil; you don't know what you'll want to measure in the future)
● Value of data changes over time; under-instrumented: how to maximize value with uncertain value of data over time (features are expensive to add)

Infrastructure/Investment
● Mathematical side: optimization with non-traditional constraints; DOE with asymmetric loss functions; balance between flexibility and improvement (bandit-like)

Needs from Domain:
● ML for DOE: climate; analyze time series to decide which processes are most important to include; what new data to collect
● Collaboration earlier; define loss functions more clearly
● Getting prior information

Uniqueness to DOE
● Energy design, generation, and distribution; rare particle event detection; user facilities; asymmetric loss functions (nuclear security, energy grid)

Capabilities
● 3-5 yr: understanding from domain scientists (BioEPIC at LBNL, instrumented soil)
● 10-15 yr: can AI automate the DOE process for a given experimental process (automated lab)?
Optimization Challenges
● Sometimes a tool (SGD for AI), sometimes a goal (energy production)
● Many times there is no single, one-dimensional objective function
  ○ When is optimization the right answer?
● Big, nasty problems mathematically (MINLPs)
● Are local optima good enough in AI?
● Robust methods (what do we mean, specifically?): lots of AI at many scales with high variance
  ○ Do we want resilience or rigidity?
  ○ Defining performance metrics is difficult

Gaps:
● Separating a larger problem into smaller problems (decomposition)
● SGD can be improved on; better classes of models could be enabled

Infrastructure/Investment
● HPC

Needs from Domain:
● How can we relax the problem? (physics constrained; a discretized PDE)
● More problems; benchmarks for power systems

Uniqueness to DOE
● Domain specific, electric; DOE has computational power; DOE has big data (big DL, energy grid, climate); culture of collaboration between modeling and domain

Capabilities
● 3-5 yr: finding optimal solutions of DL; adding in constraints
● 10-15 yr: using quantum to speed up; automating the energy grid; 3D functional design with AI-driven topology optimization
Information Theory/Data Challenges
● Characterizing model and data go hand in hand
● Which data have strong influence on model predictions/uses?
● Some hypotheses we know a priori, some we know post hoc
  ○ We don't know what analyses will be done in the future
  ○ We don't know what data to keep
● How valuable is data from surrogate models?
  ○ When is it valuable?
● How do we estimate the value of data over time? (e.g., a user facility like SNS)
  ○ Do we maintain just statistics of the data distribution instead of the data?
  ○ What mix of high resolution/low resolution allows the original data to be reconstructed accurately?
  ○ Additive manufacturing (post hoc error analysis)
  ○ User facility: how do you monitor the stream of data over time? What objective functions are used when deciding what data to keep?
    ■ Energy systems: intrusion detection/security
    ■ Healthcare: what health covariates would predict future diagnoses? Office of Science/NNSA have the same problem
● How do we value data at different scales?
Summary: key points to communicate (10 minutes)