Robust Optimization Concepts and Examples Yuriy Zinchenko Shane

Robust Optimization Concepts and Examples Yuriy Zinchenko Shane G. Henderson ORIE, Cornell University Zinchenko and Henderson 2005

Outline • What can go wrong with LP? • A familiar blend problem • The general picture – Robust linear programming – Software, resources, practicalities • Radiation therapy for cancer treatment Zinchenko and Henderson 2005 2

What can go wrong with LP? Tough LP problem: max x + y s/t 1 x 1 1 y 1 x, y 0 ? Zinchenko and Henderson 2005 3

Blend Problem $ $$ but properties change with time $$$ blend to get output properties at minimum cost for any input properties within reason Zinchenko and Henderson 2005 4

Blend constraints • Typical constraint looks like Low ≤ 10 x 1 + 12 x 2 + 7 x 3 ≤ High • Changes to Low ≤ a 1 x 1 + a 2 x 2 + a 3 x 3 ≤ High for any vector a that is “close” to (10, 12, 7) Zinchenko and Henderson 2005 5

General robust LP min c. Tx s/t A(1) x b 1 A(2) x b 2 A(3) x b 3 Zinchenko and Henderson 2005 6

A more detailed view Simple linear constraint ax 1 x 0 with a “close” to 1, namely 0 a 2 Want x to work for all such a How do we deal with it? Zinchenko and Henderson 2005 7

ax 1, x 0 for all max a x 1, 0 a 2 x 0 0 a 2, x 0 2 x 1, x 0 x 1/2 , x 0 Zinchenko and Henderson 2005 8

A slightly more involved example: ax+by 1 where (a, b) “close” to (1, 1), namely in Ellipsoidal (spherical) “uncertainty” set U (a, b) is in U if (a, b) = (a 0, b 0) + (Da, Db) with (a 0, b 0) = (1, 1) and Da 2 + Db 2 1 Zinchenko and Henderson 2005 9

Ellipsoidal “uncertainty” set U (a, b) = (a 0, b 0) + (Da, Db) (a 0, b 0) = (1, 1) Da 2 + Db 2 1 U (a 0, b 0) Want (x, y) to satisfy a x + b y 1, for all (a, b) from U Zinchenko and Henderson 2005 10

ax+by 1 max for all (a, b) in U ax+by 1 (a, b) in U What can we say about ax+by? a x + b y = (a 0 + Da) x + (b 0 + Db) y = (a 0 x + b 0 y) + (Da x + Db y) Zinchenko and Henderson 2005 11

For a moment, think of (x, y) as your objective function (fixed) max a x + b y ( 1 ? ) (a, b) in U (x, y) U (a 0, b 0) same as (a 0 x + b 0 y) + max (Da x + Db y) ( 1 ? ) Da 2 + Db 2 1 Zinchenko and Henderson 2005 12

max (Da x + Db y) ( 1 - (a 0 x + b 0 y) ? ) Da 2 + Db 2 1 Here Da x + Db y ||(x, y)|| = (x 2 + y 2)1/2 (x 1, y 1) U (a 0, b 0) (x 2, y 2) the “length” of (x, y) Zinchenko and Henderson 2005 13

ax+by 1 max for all (a, b) in U ax+by 1 (a, b) in U (a 0 x + b 0 y) + max (Da x + Db y) 1 Da 2 + Db 2 1 ||(x, y)|| 1 - (a 0 x + b 0 y) Zinchenko and Henderson 2005 14

Good news Can handle constraints of this type ||(x, y)|| 1 - (1 x + 1 y) easily (the so-called second-order conic programming (SOCP)) Not much harder than linear programming! Zinchenko and Henderson 2005 15

General Robust LP formulation Robust LP: max c. Tx s/t A(i) x bi, i = 1, …, m where c, x Î Rn, A(i) Î R 1 x n, A(i)=A(i)0 + wi Pi with wi Î R 1 x ki, ||wi|| 1, i=1, …, m, Pi Î Rki x n Zinchenko and Henderson 2005 16

SOCP equivalent: max c. Tx s/t || Pi x || bi - A(i)0 x, i = 1, …, m Probabilistic interpretation: think of A(i) taken from an a-level set of your favorite probability distribution (e. g. multivariate normal) the robust constraint will read satisfy the constraint with a given probability a Zinchenko and Henderson 2005 17

Where’d the ellipse come from? • Expert opinion • Statistics: Averages live in ellipsoids • Doesn’t have to be an ellipse. Can be some other shape (e. g. , boxes) Zinchenko and Henderson 2005 18

Software Commercial: • Mosek (http: //www. mosek. com/) “Free”: • Se. Du. Mi (http: //sedumi. mcmaster. ca/) • SDPT 3. x (http: //www. math. nus. edu. sg/~mattohkc/sdpt 3. html/) Zinchenko and Henderson 2005 19

Practicalities • Realistic problem sizes – number of variables/constraints on the order of 103 – 104 – depends (greatly) on the problem data structure/sparsity • Possible to obtain a “good”, “inexpensive” approximation with LP Zinchenko and Henderson 2005 20

Generality • Possible to extend this approach to quite a few other convex programming problems Resources • Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications by A. Ben-Tal, A. S. Nemirovskii • Google for Robust Optimization (robust LP etc. ) Zinchenko and Henderson 2005 21

Joint work with Millie Chu (Cornell) and Michael B. Sharpe (Princess Margaret Hospital, Toronto) Zinchenko and Henderson 2005 22

Cancer treatment • About 1. 3 million new cancer cases in the U. S. each year • Nearly 60% receive radiation therapy (in conjunction with surgery, chemotherapy etc) Zinchenko and Henderson 2005 23

External beam radiation therapy • Radiation delivered by a linear accelerator • Cancer cells more susceptible than normal cells • Overlay beams from different angles • Dose given in daily fractions for ~ 6 weeks Zinchenko and Henderson 2005 24

Intensity Modulated Radiation Therapy • Block parts of the radiation beam – discretize the whole beam into a grid of smaller “beamlets” • Choose different intensities for each beamlet Intensity Modulated Radiation Therapy Collaborative Working Group, 2001 Zinchenko and Henderson 2005 25

Treatment Planning Goal: Choose beam angles and beamlet intensities that deliver enough radiation to kill all tumor cells, while avoiding healthy organs & tissue as much as possible - Take CT scan - Delineate target region and healthy structures - Discretize body as small cubes, or “voxels” - Formulate & solve a mathematical program to find a Zinchenko and Henderson 2005 “good” plan 26 Princess Margaret Hospital

Robust Treatment Planning Setup errors + Patient motion + Structural changes during treatment = uncertainty in geometry • Don’t rescan patient much if at all • Use RO to “robustify” mathematical program Zinchenko and Henderson 2005 27

Model Formulation • Many different formulations exist – we use a penalty formulation minimize: penalty objective subject to: Pr(Dose to voxel i in healthy structure k ≤ Uk) ≥ 0. 95 Pr(Dose to voxel i in tumor ≥ L) ≥ 0. 95 x = beamlet intensities Zinchenko and Henderson 2005 ≥ 0 28

Computational Results • Prostate: tumor + 5 healthy regions • 5 equi-spaced beams, ~ 225 beamlets from each angle • Voxel size = 2 cm, ~ 400 total voxels • Solver: Mosek, v. 3. 0. 1. 18 • Solve time = 6 seconds (LP), 45 minutes (SOCP) Zinchenko and Henderson 2005 29

Dose-Volume Histograms % of structure receiving ≥ x Gy deterministic solution’s plan DVH of expected dose stochastic solution’s plan Zinchenko and Henderson 2005 30

$Comparison • Simulate 10 treatments (45 fractions each) • For each of the 10$

Comparison • Simulate 10 treatments (45 fractions each) • For each of the 10 treatments, and for each solution (deterministic & stochastic), – calculated dose delivered to each voxel in each fraction – summed over the 45 fractions to get total dose delivered to each voxel – plotted DVH Zinchenko and Henderson 2005 31

DVH – Treatment 1 det stoch Zinchenko and Henderson 2005 32

DVH – Treatment 2 det stoch Zinchenko and Henderson 2005 33

DVH – Treatment 3 det stoch Zinchenko and Henderson 2005 34

DVH – Treatment 4 det stoch Zinchenko and Henderson 2005 35

DVH – Treatment 5 det stoch Zinchenko and Henderson 2005 36

DVH – Treatment 6 det stoch Zinchenko and Henderson 2005 37

DVH – Treatment 7 det stoch Zinchenko and Henderson 2005 38

DVH – Treatment 8 det stoch Zinchenko and Henderson 2005 39

DVH – Treatment 9 det stoch Zinchenko and Henderson 2005 40

DVH – Treatment 10 det stoch Zinchenko and Henderson 2005 41

Conclusions • LP “pushes you into a corner” • True situation never same as data • Robust LP: Find good solution that is always feasible within reason • Efficient solution methods: can solve real problems • Software available Zinchenko and Henderson 2005 42

A Bit More Detail • Di(x) = Dose delivered to voxel i in N fractions, with intensities x, a random variable Di(x) is the sum of N random variables (N = 45), assume iid, apply CLT, so Di(x) is approximately normally distributed • Take n sample shifts, s 1, . . . , sn, with associated probabilities p = (p 1, . . . , pn)T • Let ai(∙)T = ai(s 1)T ai(s 2)T … dose delivered to voxel i, shifted by sj, from each beamlet with unit intensity ai(sn)T so that Np. Tai(∙)Tx = expected total dose delivered to voxel i, for N fractions. • Let vi(x) = sample variance of dose delivered to voxel i Di(x) ~ Normal ( Np. Tai(·)Tx, Nvi(x) ) Zinchenko and Henderson 2005 43

Probabilistic Constraints • Want constraints to be violated with low probability (say, δ =. 05) • Example: maximum dose constraint on voxel i in Hk: Assuming Di(x) ~ Normal ( Np. Tai(∙)Tx, Nvi(x) ), m k Want P(Di(x) > mk) ≤ δ • Second order cone program (SOCP) Zinchenko and Henderson 2005 44

$Dose-Volume Constraints • Physicians like constraints of form: “<= fraction fk of structure Hk$

Dose-Volume Constraints • Physicians like constraints of form: “<= fraction fk of structure Hk gets >= dk” • 0 -1 var for each voxel: = 1 if dose is > dk. • MIP: Hard to solve! • Many voxels get near max allowed dose • Alternative: upper bound the “excess” dose. For healthy structure Hk, we require: • Linear constraints☺ Zinchenko and Henderson 2005 45