Challenges and Opportunities in Using Automatic Differentiation with Object-Oriented Toolkits for Scientific Computing

Paul Hovland (Argonne National Laboratory), Steven Lee (Lawrence Livermore National Laboratory), Lois McInnes (ANL), Boyana Norris (ANL), Barry Smith (ANL)
The Computational Differentiation Project at Argonne National Laboratory

Acknowledgments: Jason Abate, Satish Balay, Steve Benson, Peter Brown, Omar Ghattas, Lisa Grignon, William Gropp, Alan Hindmarsh, David Keyes, Jorge Moré, Linda Petzold, Widodo Samyono

Outline
• Intro to AD
• Survey of Toolkits
  – SensPVODE
  – PETSc
  – TAO
• Using AD with Toolkits
  – Toolkit Level
  – Parallel Function Level
  – Subdomain Level
  – Element/Vertex Function Level
• Experimental Results
• Conclusions and Expectations

Automatic Differentiation
• Technique for augmenting code for computing a function with code for computing derivatives
• Analytic differentiation of elementary operations/functions, propagation by the chain rule
• Can be implemented using source transformation or operator overloading
• Two main modes (a minimal forward-mode sketch follows this slide)
  – Forward: propagates derivatives from independent to dependent variables
  – Reverse (adjoint): propagates derivatives from dependent to independent variables
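
As a rough illustration of the forward mode, here is a minimal dual-number sketch in C: each quantity carries a value and a derivative, and every elementary operation applies the chain rule. The struct, helper functions, and the toy function f are illustrative only; this is not ADIFOR/ADIC output.

#include <math.h>
#include <stdio.h>

typedef struct {
  double v;   /* value                                      */
  double d;   /* derivative w.r.t. the chosen independent   */
} Dual;

static Dual dmul(Dual a, Dual b) {              /* product rule */
  Dual r = { a.v * b.v, a.d * b.v + a.v * b.d };
  return r;
}

static Dual dexp(Dual a) {                      /* chain rule for exp */
  Dual r = { exp(a.v), exp(a.v) * a.d };
  return r;
}

/* Toy function f(u) = u*u - 2*exp(u); derivatives ride along with values. */
static Dual f(Dual u) {
  Dual r = dmul(u, u);
  Dual e = dexp(u);
  r.v -= 2.0 * e.v;
  r.d -= 2.0 * e.d;
  return r;
}

int main(void) {
  Dual u = { 1.5, 1.0 };                        /* seed: du/du = 1        */
  Dual y = f(u);
  printf("f = %g, df/du = %g\n", y.v, y.d);     /* df/du = 2u - 2 exp(u)  */
  return 0;
}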

Comparison of Methods
• Finite Differences
  – Advantages: cheap Jv, easy
  – Disadvantages: inaccurate (not robust)
• Hand Differentiation
  – Advantages: accurate; cheap Jv, JTv, Hv, …
  – Disadvantages: hard; difficult to maintain consistency
• Automatic Differentiation
  – Advantages: cheap JTv, Hv; easy?
  – Disadvantages: Jv costs ~2 function evals; hard?
(A finite-difference Jv sketch follows this slide.)
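
To make the "cheap Jv, but inaccurate" trade-off concrete, a hypothetical matrix-free finite-difference sketch in C: Jv ≈ (F(u + h v) − F(u)) / h. The residual callback, n, and the differencing parameter h are illustrative; choosing h balances truncation error against roundoff, which is the source of the robustness problem.

#include <stdlib.h>

typedef void (*ResidualFn)(int n, const double *u, double *f);

/* Approximate the Jacobian-vector product J(u)*v with one extra residual
 * evaluation: jv ~= (F(u + h v) - F(u)) / h.                              */
void fd_jacobian_vector(ResidualFn residual, int n,
                        const double *u, const double *v, double *jv) {
  double  h  = 1.0e-7;            /* differencing parameter: truncation vs. roundoff */
  double *up = malloc(n * sizeof(double));
  double *fu = malloc(n * sizeof(double));
  double *fp = malloc(n * sizeof(double));
  for (int i = 0; i < n; i++) up[i] = u[i] + h * v[i];
  residual(n, u,  fu);
  residual(n, up, fp);
  for (int i = 0; i < n; i++) jv[i] = (fp[i] - fu[i]) / h;
  free(up); free(fu); free(fp);
}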

PVODE: Parallel ODE-IVP Solver
• Algorithm developers: Hindmarsh, Byrne, Brown, and Cohen
• ODE initial-value problems
• Stiff and non-stiff integrators
• Written in C
• MPI calls for communication

PVODE for ODE Simulations
• ODE initial-value problem (standard form): y' = f(t, y), y(t0) = y0
• Implicit time-stepping using BDF methods for y(tn)
• Solve nonlinear system for y(tn) via inexact Newton
• Solve update to Newton iterate using Krylov methods
(The BDF/Newton system is written out after this slide.)
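
In the generic form used by BDF codes of the CVODE/PVODE family (the coefficient gamma and the history term a_n depend on the BDF order and on past steps), each implicit step solves a nonlinear system whose Newton linearization involves the ODE Jacobian:

\[
y' = f(t, y), \qquad
G(y_n) \equiv y_n - \gamma\, f(t_n, y_n) - a_n = 0, \qquad
\bigl(I - \gamma J\bigr)\,\delta^{(k)} = -G\!\left(y_n^{(k)}\right), \quad J = \frac{\partial f}{\partial y}.
\]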

Objective
Today: given F(y, p) and initial data p, y|t=0, the ODE solver (PVODE) produces y at t1, t2, ….
Goal: automatically obtain a sensitivity solver that, from the same p and y|t=0, returns both y and dy/dp at t1, t2, ….

Possible Approaches
• Apply AD to PVODE: differentiate the whole solver, so that ad_PVODE takes p, ad_p and y, ad_y|t=0 and returns y, ad_y at t1, t2, … (using ad_F(y, ad_y, p, ad_p))
• Solve the sensitivity equations: keep PVODE and use ad_F(y, ad_y, p, ad_p) inside SensPVODE, which takes p and y|t=0 and returns y and dy/dp at t1, t2, …

Sensitivity Differential Equations
• Differentiate y' = f(t, y, p) with respect to pi
• The result is a linear ODE-IVP for the sensitivity vector si(t), written out below
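
In standard form, differentiating the ODE with respect to p_i gives a linear ODE-IVP for s_i = ∂y/∂p_i:

\[
y' = f(t, y, p), \qquad
s_i \equiv \frac{\partial y}{\partial p_i}, \qquad
s_i' = \frac{\partial f}{\partial y}\, s_i + \frac{\partial f}{\partial p_i}, \qquad
s_i(t_0) = \frac{\partial y_0}{\partial p_i}.
\]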

PETSc
• Portable, Extensible Toolkit for Scientific computing
• Parallel
• Object-oriented
• Free
• Supported (manuals, email)
• Interfaces with Fortran 77/90, C/C++
• Available for virtually all UNIX platforms, as well as Windows 95/NT
• Flexible: many options for solver algorithms and parameters

Nonlinear PDE Solution: solve F(u) = 0
Components in the diagram: Main Routine; Application Initialization; Nonlinear Solvers (SNES); Linear Solvers (SLES: KSP, PC); Function Evaluation; Jacobian Evaluation; Post-Processing.
Ownership in the diagram: user code, PETSc code, and AD-generated code.
(A SNES setup sketch follows this slide.)
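
For orientation, a skeleton of a SNES-based solve of F(u) = 0. The calling sequence follows recent PETSc releases and so differs in detail from the 2001-era API shown in these slides; the one-unknown residual is a stand-in for the user-written or AD-generated callbacks, and no Jacobian routine is registered (options such as -snes_fd can supply one).

#include <petscsnes.h>

/* Trivial residual callback: solve x^2 - 2 = 0 (serial toy problem). */
static PetscErrorCode FormFunction(SNES snes, Vec X, Vec F, void *ctx)
{
  const PetscScalar *x;
  PetscScalar       *f;
  PetscFunctionBeginUser;
  PetscCall(VecGetArrayRead(X, &x));
  PetscCall(VecGetArray(F, &f));
  f[0] = x[0]*x[0] - 2.0;
  PetscCall(VecRestoreArrayRead(X, &x));
  PetscCall(VecRestoreArray(F, &f));
  PetscFunctionReturn(PETSC_SUCCESS);
}

int main(int argc, char **argv)
{
  SNES snes;
  Vec  x, r;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCall(VecCreateSeq(PETSC_COMM_SELF, 1, &x));   /* solution vector   */
  PetscCall(VecDuplicate(x, &r));                    /* residual vector   */
  PetscCall(VecSet(x, 1.0));                         /* initial guess     */

  PetscCall(SNESCreate(PETSC_COMM_SELF, &snes));
  PetscCall(SNESSetFunction(snes, r, FormFunction, NULL));
  /* A Jacobian routine (hand-coded, AD-generated, or finite-difference)
     would be registered with SNESSetJacobian(); here run with -snes_fd. */
  PetscCall(SNESSetFromOptions(snes));
  PetscCall(SNESSolve(snes, NULL, x));

  PetscCall(SNESDestroy(&snes));
  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&r));
  PetscCall(PetscFinalize());
  return 0;
}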

TAO
"The process of nature by which all things change and which is to be followed for a life of harmony"; "The Right Way"
Toolkit for Advanced Optimization
• Object-oriented techniques
• Component-based (CCA) interaction
• Leverage existing parallel computing infrastructure
• Reuse of external toolkits

TAO Goals
• Portability
• Performance
• Scalable parallelism
• An interface independent of architecture

TAO Algorithms
• Unconstrained optimization
  – Limited-memory variable-metric method
  – Trust region/line search Newton method
  – Conjugate-gradient method
  – Levenberg-Marquardt method
• Bound-constrained optimization
  – Trust region Newton method
  – Gradient projection/conjugate gradient method
• Linearly-constrained optimization
  – Interior-point method with iterative solvers

PETSc and TAO
Components in the diagram: Application Driver; Application Initialization; Optimization Tools; Linear Solvers (KSP, PC); Matrices; Vectors; Function & Gradient Evaluation; Hessian Evaluation; Post-Processing.
Ownership in the diagram: user code, TAO code (Toolkit for Advanced Optimization), and PETSc code.
(A TAO setup sketch follows this slide.)
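
A comparable skeleton for TAO, minimizing a toy one-variable objective with the limited-memory variable-metric method. The calling sequence is an assumption based on recent PETSc/TAO releases (it is not the TAO interface contemporary with this talk), and FormFunctionGradient stands in for the user-written or AD-generated function-and-gradient callback.

#include <petsctao.h>

/* Toy objective f(x) = 0.5*(x - 3)^2 and its gradient (serial). */
static PetscErrorCode FormFunctionGradient(Tao tao, Vec X, PetscReal *f, Vec G, void *ctx)
{
  const PetscScalar *x;
  PetscScalar       *g;
  PetscFunctionBeginUser;
  PetscCall(VecGetArrayRead(X, &x));
  PetscCall(VecGetArray(G, &g));
  *f   = 0.5 * (x[0] - 3.0) * (x[0] - 3.0);
  g[0] = x[0] - 3.0;
  PetscCall(VecRestoreArrayRead(X, &x));
  PetscCall(VecRestoreArray(G, &g));
  PetscFunctionReturn(PETSC_SUCCESS);
}

int main(int argc, char **argv)
{
  Tao tao;
  Vec x;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCall(VecCreateSeq(PETSC_COMM_SELF, 1, &x));   /* serial toy problem */
  PetscCall(VecSet(x, 0.0));                         /* starting point     */

  PetscCall(TaoCreate(PETSC_COMM_SELF, &tao));
  PetscCall(TaoSetType(tao, TAOLMVM));               /* limited-memory variable metric */
  PetscCall(TaoSetSolution(tao, x));
  PetscCall(TaoSetObjectiveAndGradient(tao, NULL, FormFunctionGradient, NULL));
  PetscCall(TaoSetFromOptions(tao));
  PetscCall(TaoSolve(tao));

  PetscCall(TaoDestroy(&tao));
  PetscCall(VecDestroy(&x));
  PetscCall(PetscFinalize());
  return 0;
}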

Using AD with Toolkits
• Apply AD to the toolkit to produce a derivative-enhanced toolkit
• Use AD to provide the Jacobian/Hessian/gradient for use by the toolkit; apply AD at the
  – Parallel Function Level
  – Subdomain Function Level
  – Element/Vertex Function Level

Differentiated Version of Toolkit
• Makes possible sensitivity analysis and black-box optimization of models constructed using the toolkit
• Can take advantage of the high-level structure of algorithms, providing better performance (see Andreas' and Linda's talks)
• Ongoing work with PETSc and PVODE

Levels of Function Evaluation

int FormFunction(SNES snes, Vec X, Vec F, void *ptr)      /* Parallel Function Level */
{
  /* Variable declarations omitted */
  mx = user->mx; my = user->my; lambda = user->param;
  hx = one/(double)(mx-1); hy = one/(double)(my-1);
  sc = hx*hy*lambda; hxdhy = hx/hy; hydhx = hy/hx;

  ierr = DAGlobalToLocalBegin(user->da, X, INSERT_VALUES, localX); CHKERRQ(ierr);
  ierr = DAGlobalToLocalEnd(user->da, X, INSERT_VALUES, localX); CHKERRQ(ierr);
  ierr = VecGetArray(localX, &x); CHKERRQ(ierr);
  ierr = VecGetArray(localF, &f); CHKERRQ(ierr);
  ierr = DAGetCorners(user->da, &xs, &ys, PETSC_NULL, &xm, &ym, PETSC_NULL); CHKERRQ(ierr);
  ierr = DAGetGhostCorners(user->da, &gxs, &gys, PETSC_NULL, &gxm, &gym, PETSC_NULL); CHKERRQ(ierr);

  /* Subdomain Function Level: the local computation on owned grid points */
  for (j=ys; j<ys+ym; j++) {
    row = (j - gys)*gxm + xs - gxs - 1;
    for (i=xs; i<xs+xm; i++) {
      row++;
      if (i == 0 || j == 0 || i == mx-1 || j == my-1) { f[row] = x[row]; continue; }
      /* Vertex/Element Function Level: one stencil update */
      u   = x[row];
      uxx = (two*u - x[row-1] - x[row+1])*hydhx;
      uyy = (two*u - x[row-gxm] - x[row+gxm])*hxdhy;
      f[row] = uxx + uyy - sc*PetscExpScalar(u);
    }
  }

  ierr = VecRestoreArray(localX, &x); CHKERRQ(ierr);
  ierr = VecRestoreArray(localF, &f); CHKERRQ(ierr);
  ierr = DALocalToGlobal(user->da, localF, INSERT_VALUES, F); CHKERRQ(ierr);
  ierr = PLogFlops(11*ym*xm); CHKERRQ(ierr);
  return 0;
}

Parallel Function Level
• Advantages
  – Well-defined interface: int Function(SNES, Vec, void *); void function(integer, Real, N_Vector, void *);
  – No changes to function
• Disadvantages
  – Differentiation of toolkit support functions (may result in unnecessary work)
  – AD of parallel code (MPI, OpenMP)
  – May need global coloring

Subdomain Function Level
• Advantages
  – No need to differentiate communication functions
  – Interface may be well defined
• Disadvantages
  – May need local coloring
  – May need to extract from parallel function

Using AD with PETSc
• Function evaluation: global-to-local scatter of ghost values → local function computation → parallel function assembly
• The local function computation is fed (via a script file) to ADIFOR or ADIC, which generates the local Jacobian computation
• Jacobian evaluation: global-to-local scatter of ghost values → seed matrix initialization → local Jacobian computation → parallel Jacobian assembly
• The surrounding scatter, seed initialization, and assembly code is coded manually; can be automated
(A seed-matrix sketch follows this slide.)
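
A hypothetical sketch of the seed-matrix step: with a column coloring of the Jacobian, columns of the same color share one derivative direction, so the AD-generated local Jacobian code needs only ncolors directions instead of n. The array names, storage layout, and coloring are illustrative, not the PETSc or ADIC interface.

#include <stdlib.h>

/* Seed matrix is n x ncolors, row-major: seed[j*ncolors + c].
 * Column j of the Jacobian is seeded in direction color[j].   */
double *make_seed_matrix(int n, int ncolors, const int *color) {
  double *seed = calloc((size_t)n * ncolors, sizeof(double));
  for (int j = 0; j < n; j++)
    seed[j * ncolors + color[j]] = 1.0;
  return seed;
}

/* After AD computes the compressed Jacobian C = J * seed (m x ncolors),
 * entry J[i][j] is recovered as C[i][color[j]], because a valid coloring
 * guarantees column j is the only column of its color with a nonzero in row i. */
double jacobian_entry(const double *compressed, int ncolors,
                      const int *color, int i, int j) {
  return compressed[i * ncolors + color[j]];
}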

Using AD with TAO
• Function evaluation: global-to-local scatter of ghost values → local function computation → parallel function assembly
• The local function computation is fed (via a script file) to ADIFOR or ADIC, which generates the local gradient computation
• Gradient evaluation: global-to-local scatter of ghost values → seed matrix initialization → local gradient computation → parallel gradient assembly
• The surrounding scatter, seed initialization, and assembly code is coded manually; can be automated

Element/Vertex Function Level
• Advantages
  – Reduced memory requirements
  – No need for matrix coloring
• Disadvantages
  – May be difficult to decompose function to this level (boundary conditions, other special cases)
  – Decomposition to this level may impede efficiency
(A vertex-level sketch follows this slide.)
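
A hypothetical vertex-level version of the solid fuel ignition residual from the earlier FormFunction slide: the residual at one interior grid point written only in terms of its 5-point stencil. Differentiating this small function (by AD or by hand) yields the five nonzeros of one Jacobian row directly, which is why no matrix coloring is needed at this level. Parameter names mirror the earlier slide.

#include <math.h>

double vertex_residual(double u,               /* value at the grid point      */
                       double uw, double ue,   /* west / east neighbors        */
                       double us, double un,   /* south / north neighbors      */
                       double hxdhy, double hydhx, double sc)
{
  double uxx = (2.0*u - uw - ue) * hydhx;
  double uyy = (2.0*u - us - un) * hxdhy;
  return uxx + uyy - sc * exp(u);
}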

Example

int localfunction2d(Field **x, Field **f, int xs, int xm, int ys, int ym,
                    int mx, int my, void *ptr)
{
  xints = xs; xinte = xs+xm; yints = ys; yinte = ys+ym;

  if (yints == 0) {                      /* bottom edge */
    j = 0; yints = yints + 1;
    for (i=xs; i<xs+xm; i++) {
      f[j][i].u     = x[j][i].u;
      f[j][i].v     = x[j][i].v;
      f[j][i].omega = x[j][i].omega + (x[j+1][i].u - x[j][i].u)*dhy;
      f[j][i].temp  = x[j][i].temp - x[j+1][i].temp;
    }
  }
  if (yinte == my) {                     /* top edge (moving lid) */
    j = my - 1; yinte = yinte - 1;
    for (i=xs; i<xs+xm; i++) {
      f[j][i].u     = x[j][i].u - lid;
      f[j][i].v     = x[j][i].v;
      f[j][i].omega = x[j][i].omega + (x[j][i].u - x[j-1][i].u)*dhy;
      f[j][i].temp  = x[j][i].temp - x[j-1][i].temp;
    }
  }
  if (xints == 0) {                      /* left edge */
    i = 0; xints = xints + 1;
    for (j=ys; j<ys+ym; j++) {
      f[j][i].u     = x[j][i].u;
      f[j][i].v     = x[j][i].v;
      f[j][i].omega = x[j][i].omega - (x[j][i+1].v - x[j][i].v)*dhx;
      f[j][i].temp  = x[j][i].temp;
    }
  }
  if (xinte == mx) {                     /* right edge */
    i = mx - 1; xinte = xinte - 1;
    for (j=ys; j<ys+ym; j++) {
      f[j][i].u     = x[j][i].u;
      f[j][i].v     = x[j][i].v;
      f[j][i].omega = x[j][i].omega - (x[j][i].v - x[j][i-1].v)*dhx;
      f[j][i].temp  = x[j][i].temp - (double)(grashof>0);
    }
  }

  for (j=yints; j<yinte; j++) {          /* interior points */
    for (i=xints; i<xinte; i++) {
      vx = x[j][i].u; avx = PetscAbsScalar(vx);
      vxp = p5*(vx+avx); vxm = p5*(vx-avx);
      vy = x[j][i].v; avy = PetscAbsScalar(vy);
      vyp = p5*(vy+avy); vym = p5*(vy-avy);

      u   = x[j][i].u;
      uxx = (two*u - x[j][i-1].u - x[j][i+1].u)*hydhx;
      uyy = (two*u - x[j-1][i].u - x[j+1][i].u)*hxdhy;
      f[j][i].u = uxx + uyy - p5*(x[j+1][i].omega - x[j-1][i].omega)*hx;

      u   = x[j][i].v;
      uxx = (two*u - x[j][i-1].v - x[j][i+1].v)*hydhx;
      uyy = (two*u - x[j-1][i].v - x[j+1][i].v)*hxdhy;
      f[j][i].v = uxx + uyy + p5*(x[j][i+1].omega - x[j][i-1].omega)*hy;

      u   = x[j][i].omega;
      uxx = (two*u - x[j][i-1].omega - x[j][i+1].omega)*hydhx;
      uyy = (two*u - x[j-1][i].omega - x[j+1][i].omega)*hxdhy;
      f[j][i].omega = uxx + uyy
        + (vxp*(u - x[j][i-1].omega) + vxm*(x[j][i+1].omega - u))*hy
        + (vyp*(u - x[j-1][i].omega) + vym*(x[j+1][i].omega - u))*hx
        - p5*grashof*(x[j][i+1].temp - x[j][i-1].temp)*hy;

      u   = x[j][i].temp;
      uxx = (two*u - x[j][i-1].temp - x[j][i+1].temp)*hydhx;
      uyy = (two*u - x[j-1][i].temp - x[j+1][i].temp)*hxdhy;
      f[j][i].temp = uxx + uyy
        + prandtl*((vxp*(u - x[j][i-1].temp) + vxm*(x[j][i+1].temp - u))*hy
                 + (vyp*(u - x[j-1][i].temp) + vym*(x[j+1][i].temp - u))*hx);
    }
  }
  ierr = PetscLogFlops(84*ym*xm); CHKERRQ(ierr);
  return 0;
}

Experimental Results
• Toolkit Level – differentiated PETSc linear solver
• Parallel Nonlinear Function Level – SensPVODE
• Local Subdomain Function Level
  – PETSc
  – TAO
• Element Function Level – PETSc

Differentiated Linear Equation Solver (figure)

Increased Accuracy (figures)

SensPVODE: Problem
• Diurnal kinetics advection-diffusion equation
• 100 x 100 structured grid
• 16 processors of a Linux cluster with 550 MHz processors and Myrinet interconnect

SensPVODE: Time (figure)

SensPVODE: Number of Timesteps (figure)

SensPVODE: Time/Timestep (figure)

PETSc Applications
• Toy problems
  – Solid fuel ignition: finite difference discretization; Fortran & C variants; differentiated using ADIFOR, ADIC
  – Driven cavity: finite difference discretization; C implementation; differentiated using ADIC
• Euler code
  – Based on legacy F77 code from D. Whitfield (MSU)
  – Finite volume discretization
  – Up to 1,121,320 unknowns
  – Mapped C-H grid
  – Fully implicit steady-state
  – Tools: SNES, DA, ADIFOR

C-H Structured Grid (figure)

Algorithmic Performance (figure)

Real Performance (figure)

Hybrid Method (figure)

Hybrid Method (cont.) (figure)

TAO: Preliminary Results (figure)

For More Information
• PETSc: http://www.mcs.anl.gov/petsc/
• TAO: http://www.mcs.anl.gov/tao/
• Automatic Differentiation at Argonne
  – http://www.mcs.anl.gov/autodiff/
  – ADIFOR: http://www.mcs.anl.gov/adifor/
  – ADIC: http://www.mcs.anl.gov/adic/
• http://www.autodiff.org

10 Challenges for PDE Optimization Algorithms
1. Problem size
2. Efficiency vs. intrusiveness
3. "Physics-based" globalizations
4. Inexact PDE solvers
5. Approximate Jacobians
6. Implicitly-defined PDE residuals
7. Non-smooth solutions
8. Pointwise inequality constraints
9. Scalability of adjoint methods to large numbers of inequalities
10. Time-dependent PDE optimization

7. Non-smoothness
• PDE residual may not depend smoothly on state variables (maybe not even continuously)
  – Solution-adaptivity
  – Discontinuity-capturing, front-tracking
  – Subgrid-scale models
  – Material property evaluation
  – Contact problems
  – Elasto(visco)plasticity
• PDE residual may not depend smoothly or continuously on decision variables
  – Solid modeler- and mesh generator-induced