Algorithmic Resource Verification
Ravichandhran Kandhadai Madhavan
Public Thesis Defense Presentation
Advised by: Prof. Viktor Kuncak
Reviewers: Prof. James Larus, EPFL; Prof. Rupak Majumdar, MPI-SWS; Dr. G. Ramalingam, MSR
Jury president: Prof. Martin Odersky, EPFL

Complexity of Software
Is all the software out there functioning as expected?

Software Characteristics
Input-output (correctness) behavior: whether the software performs a task correctly, e.g. (a) the program sorts a list of integers, (b) it computes the strength of the EM field.
Resource usage behavior: how well it performs the task, e.g. (a) how long it takes to complete, (b) how much memory it needs.

Reasoning about software is difficult
Euclid [c. 300 BC]
    def gcd(x: BigInt, y: BigInt): BigInt = {
      val rem = x % y
      if (rem == 0) y else gcd(y, rem)
    }
How many recursions does it take to compute the gcd?

Software Verification
Objective: ideally fully automatic.
The software verifier takes the software and the expectations of the software (provided by the writer or developer of the software).
It reports either that the software satisfies the expectation, or an explanation for why the expectation fails.

Resource Verification
The software verifier takes the software and its expected resource usage behavior, e.g. "the program completes in O(log size(input)) steps".
It reports either that the software satisfies the specification, or an input that violates the specification.

Resource Verification

Why Verify Resource Bounds?
• Testing for performance bottlenecks is incomplete and often more difficult than testing correctness
• Machine-certified guarantee of performance
• Performant code generally implies complex code: verifying resources means verifying complex pieces of code

Challenge 1: Measuring Performance
Algorithmic resource usage: # of instructions executed. Definable on the semantics of the language constructs.
Physical resource usage: time <= (3.33e-11)n + 10 ms

Algorithmic Resources
• Definable on the semantics of the language constructs
• Platform and machine independent
• Deterministic and easy to measure
• A reasonable measure of relative performance
• Tangible on general-purpose programs

Resources considered in the Thesis
• steps: # of evaluation steps/primitive operations (ALU + memory) executed
• alloc: # of objects allocated in the heap
• stack: # of call stack locations needed, in terms of 64-bit words
• rec: # of recursive calls made by a function
• depth: the length of the longest chain of data dependencies

More resources are definable
• Resident allocations: # of objects allocated - # of objects freed (or freeable). Zero resident allocations implies no memory leaks via the heap.
• Network usage: # of network operations performed by a piece of code
• …

Challenge 2: Verification Strategy
There is a trade-off between automation, expressiveness and user interaction.
The software verifier takes the software and its specifications, and reports one of: specs satisfied; timeout/failure; an input that violates the spec.

Challenge 2: Verification Strategy
A spectrum from expressivity and interaction at one end to automation at the other:
• Interactive theorem proving: manual encoding of resource abstractions and proofs
• Contract-based verification
• Push-button static analyses: limited expressivity due to restricted abstract domains

Software Contracts
    def gcd(x: BigInt, y: BigInt): BigInt = {
      require(x >= y && y >= 1)          // preconditions
      val rem = x % y
      if (rem == 0) y else gcd(y, rem)
    } ensuring { res => res >= 1 }       // postconditions
Preconditions are a characterization of the inputs to the system.
Postconditions are the properties of the output (or observables).

Why Contract-based Verification?
• Contracts can express the guarantees of the operating environment and preconditions on the input
• High-level program properties and bounds can be specified by users
• Low-level constraint solving can be automated (using SMT solvers)
• Modular approach: functions or modules are analyzed independently

Proposal: Contracts with Resource Bounds
    def gcd(x: BigInt, y: BigInt): BigInt = {
      require(x >= y && y >= 1)
      val rem = x % y
      if (rem == 0) y else gcd(y, rem)
    } ensuring { res => res >= 1 && steps <= ? * log_{4/3}(x + y) + ? }
Resource usage is made accessible in the postcondition.
Constants in the resource bounds can be left unspecified (the ? holes).
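The idea of a logarithmic bound with unknown constants can be tried out by hand in plain Scala. The sketch below is only an illustration, not the Leon/Orb syntax: it counts calls instead of evaluation steps, and the names `GcdBound`, `gcdCount`, `bound` and `holdsUpTo`, as well as the guessed constants a = 1 and b = 1, are assumptions made for this example.

```scala
object GcdBound {
  // Returns the gcd together with the number of calls made (initial call included).
  def gcdCount(x: Long, y: Long): (Long, Int) = {
    require(x >= y && y >= 1)
    val rem = x % y
    if (rem == 0) (y, 1)
    else {
      // y > rem >= 1 here, so the precondition holds for the recursive call.
      val (g, calls) = gcdCount(y, rem)
      (g, calls + 1)
    }
  }

  // Candidate bound a * log_{4/3}(x + y) + b; a and b are illustrative guesses.
  def bound(x: Long, y: Long, a: Double = 1.0, b: Double = 1.0): Double =
    a * (math.log((x + y).toDouble) / math.log(4.0 / 3.0)) + b

  // Tests the candidate bound on every pair 1 <= y <= x <= limit.
  def holdsUpTo(limit: Long): Boolean =
    (1L to limit).forall(x => (1L to x).forall(y => gcdCount(x, y)._2 <= bound(x, y)))
}
```

Testing a range of inputs this way only samples the bound; the point of the proposal is that a verifier establishes it for all inputs and finds the constants itself.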

Example: Height Balanced Tree

Specifying Balanced Property hbalanced(t) = height(t) =

Specifying Insert Function
Specification and implementation language are the same.
Recursive functions and nonlinearity are allowed in contracts.

http://leondev.epfl.ch

Resource-Verified Implementations
Benchmarks:
• Search trees (rbt)
• Heaps (binomial heap)
• Sorting (heap sort, merge sort)
• Program manipulation (constant folding, nnf conversion)

Lazy Evaluation
Evaluations are suspended until needed.
Results of suspensions are cached for reuse.
(Suspended evaluations are closures with a unit parameter that are memoized when forced.)
    def f(n) = { … f(n-1) }
    def main() = {
      lazy val x = f(2)
      …
      val y = x + 2
      return x + y
    }
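A runnable variant of the slide's example makes the two properties observable. This is a minimal sketch: the counter `evaluations`, the body chosen for `f`, and the `run` wrapper are assumptions added for illustration.

```scala
object LazyDemo {
  var evaluations = 0 // how many times the body of f has actually run

  // Stand-in for an expensive function; the body is an arbitrary choice.
  def f(n: Int): Int = { evaluations += 1; n * n }

  // Returns (result, evaluations right after the definition of x, evaluations at the end).
  def run(): (Int, Int, Int) = {
    val start = evaluations
    lazy val x = f(2)                      // suspended: nothing runs here
    val atDefinition = evaluations - start
    val y = x + 2                          // first use forces the suspension
    val result = x + y                     // second use reads the cached value
    (result, atDefinition, evaluations - start)
  }
}
```

`f` runs zero times at the definition of `x` and exactly once overall, even though `x` is used twice.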

Memoization
Function results are cached for each distinct argument.
    @memoize
    def f(n) = { … f(n-1) }
[Figure: memo table with columns fun, args, res and rows for functions f, g, h, …]
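The memo table can be mimicked in plain Scala with a mutable map. The sketch below is a hand-rolled stand-in for the `@memoize` annotation, not its implementation; `MemoDemo`, `memoTable`, `bodyEvaluations` and the choice of Fibonacci as the memoized function are assumptions for illustration.

```scala
import scala.collection.mutable

object MemoDemo {
  // Memo table for one function: argument -> cached result.
  val memoTable = mutable.Map.empty[Int, BigInt]
  var bodyEvaluations = 0 // counts cache misses, i.e. real evaluations of the body

  def fib(n: Int): BigInt = memoTable.get(n) match {
    case Some(res) => res // cache hit: no work beyond the lookup
    case None =>
      bodyEvaluations += 1
      val res = if (n < 2) BigInt(n) else fib(n - 1) + fib(n - 2)
      memoTable(n) = res
      res
  }
}
```

Each distinct argument is evaluated once, so the cost of a call depends on what is already in the table.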

Higher-order Functions + Memoization
The approach was extended to higher-order functions + memoization [R. Madhavan et al., POPL 2017].
This subsumes lazy evaluation, which is more performant than eager evaluation [R. Bird et al. 1997].

Importance of Lazy Evaluation
“Lazy evaluation is strictly more efficient - in asymptotic terms - than eager evaluation, and that there is no general complexity-preserving translation of lazy programs into an eager functional language.”
Richard Bird, Geraint Jones and Oege de Moor, More Haste, Less Speed, Journal of Functional Programming, 1997
Practical examples:
• Okasaki’s lazy data structures
• Dynamic programming algorithms
• C# LINQ
• Scala Streams

Functional Reasoning for Evaluation Results
“Lazy evaluation, which applies functions to arguments whose evaluation is postponed until the first (but only the first) time their values are needed. This ensures that equational reasoning about the values of programs is sound ….”
Richard Bird, Geraint Jones and Oege de Moor, More Haste, Less Speed, Journal of Functional Programming, 1997
• Functions are referentially transparent
• Heap can be assumed to be immutable (no frame problem)
• Less specification overhead, more efficient verification

Challenges in Verifying Resource Usage
“This ensures that equational reasoning about the values of programs is sound, at the expense of making it extremely hard to reason about when expressions are evaluated, and to estimate how much work is involved in doing so.”
Richard Bird, Geraint Jones and Oege de Moor, More Haste, Less Speed, Journal of Functional Programming, 1997
Bad news: the state of the cache is visible in the resource behavior.
Good news: the state is well-behaved and grows monotonically.

Streams
[Figure: a stream s whose cells 1, 2, 3 hold g(1), g(2), g(3) and are linked by tail]

Streams
[Figure: a stream s whose cells 1, 2, 3 hold g(1), g(2), g(3) and are linked by tail]
Say g(n) takes O(n) time.
The first call to take(n, s) takes O(n^2) time; a repeated call to take(n, s) takes O(n) time.
Invariant: the tails of the first n elements of s are memoized.
Resource bound: take(n, s) runs in time linear in n.
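The gap between the first and the repeated call can be reproduced with a hand-written lazy stream. This is a sketch under an assumed cost model (evaluating g(n) charges n units to a counter); `StreamDemo`, `SCons`, `work` and `cost` are illustrative names, and the stream is built by hand rather than with the standard library.

```scala
object StreamDemo {
  var work = 0 // total cost charged to g so far

  // Assumed cost model: evaluating g(n) costs n units.
  def g(n: Int): Int = { work += n; n * n }

  // A stream cell whose tail is suspended and memoized once forced.
  final class SCons(val value: Int, next: => SCons) {
    lazy val tail: SCons = next
  }

  def from(n: Int): SCons = new SCons(g(n), from(n + 1))

  def take(n: Int, s: SCons): List[Int] =
    if (n <= 0) Nil
    else if (n == 1) List(s.value) // do not force a tail that is not needed
    else s.value :: take(n - 1, s.tail)

  // Work charged to g during one call to take(n, s).
  def cost(n: Int, s: SCons): Int = { val start = work; take(n, s); work - start }

  val s: SCons = from(1)
}
```

The first `cost(100, s)` pays for g(2) through g(100), which is quadratic in n; the second pays nothing for g because every tail it touches is already memoized, leaving only the linear traversal.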

Cache-dependent Specifications
cached(x.g): true iff the function g is memoized for the argument x
concreteUntil(n, s) =

Example 2: The Knapsack Problem

Bottom-up Iteration 1 n 2 Iterations

Specifying Cache Invariants cached(g x) - true iff the function g is memoized for the argument x ksp(n, items) =
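The role of the cache invariant in the knapsack example can be seen in a small memoized version driven bottom-up. This is a sketch, not the benchmark from the thesis: it assumes the unbounded variant of the problem, positive item weights, and one fixed item list per memo table; `KnapsackDemo`, `knapsack` and `bottomUp` are illustrative names.

```scala
import scala.collection.mutable

object KnapsackDemo {
  // Memo table keyed by capacity only; this assumes a single fixed item list.
  val memo = mutable.Map.empty[Int, Int]
  var misses = 0 // number of capacities whose value was actually computed

  // Best total value for capacity w; items are (weight, value) pairs with weight >= 1.
  def knapsack(w: Int, items: List[(Int, Int)]): Int = memo.get(w) match {
    case Some(v) => v
    case None =>
      misses += 1
      val best = items.foldLeft(0) { case (acc, (wi, vi)) =>
        if (wi <= w) math.max(acc, vi + knapsack(w - wi, items)) else acc
      }
      memo(w) = best
      best
  }

  // Bottom-up driver: when knapsack(i, items) is called, every smaller
  // capacity is already cached, so each call does work linear in items.size.
  def bottomUp(w: Int, items: List[(Int, Int)]): List[Int] =
    (0 to w).toList.map(i => knapsack(i, items))
}
```

The comment on `bottomUp` is exactly the kind of fact a `cached` predicate lets a contract state: the per-call bound holds only because of what the cache already contains.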

Aliasing of Closures
Equality between closures is needed to specify performant data structures.
[Conc-trees for functional and parallel programming. A. Prokopec and M. Odersky. LCPC 2015]

Semantics of Closure Equalities
Closures are lambdas without free variables.
Our system uses intensional or structural equality.
Benefits:
1. Can encode reference equality of closures
2. Can be checked efficiently using SMT theories

Recap of the Specification Strategy
1. Resource bounds are expressed in postconditions as templates with holes
2. Recursive functions and nonlinearity are allowed in contracts
3. A construct, cached, queries the state of memoization
4. Closures can be compared for equality

Verification Approach

The Verification Problem
1. Check that the invariants in the contracts are valid
2. Infer values for holes that make the bound hold for all executions
3. Make the bound as strong as possible

The Grand Verification Challenge
Program features: recursive datatypes; recursive functions; nonlinear arithmetic; contracts; resource templates; (memoization, first-class functions)
?
Decidable theories: Boolean/Bit-vector Arithmetic; inductive datatypes; uninterpreted functions; sets

The Approach
Resource bound to invariant conversion:
(a) Instruments expressions with resources
(b) Defunctionalizes indirect calls
(c) Tracks the cache state
The result is passed to the first-order program phase.
References: [R. Madhavan & V. Kuncak, CAV 2014] [R. Madhavan et al., POPL 2017]
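Step (b) can be pictured with a small hand-written defunctionalization. The sketch below is an illustration of the general technique, not the tool's actual encoding: each lambda becomes a case class that records its captured values, and one first-order dispatch function replaces indirect calls. All names (`IntFun`, `AddN`, `MulN`, `Compose`, `applyFun`) are assumptions for this example.

```scala
object DefunDemo {
  // One constructor per lambda in the (hypothetical) source program.
  sealed trait IntFun
  final case class AddN(n: Int) extends IntFun                  // stands for x => x + n
  final case class MulN(n: Int) extends IntFun                  // stands for x => x * n
  final case class Compose(f: IntFun, g: IntFun) extends IntFun // stands for x => f(g(x))

  // First-order dispatch replacing every indirect call f(x).
  def applyFun(f: IntFun, x: Int): Int = f match {
    case AddN(n)         => x + n
    case MulN(n)         => x * n
    case Compose(f1, g1) => applyFun(f1, applyFun(g1, x))
  }

  // A higher-order function rewritten to take the first-order representation.
  def map(xs: List[Int], f: IntFun): List[Int] = xs match {
    case Nil    => Nil
    case h :: t => applyFun(f, h) :: map(t, f)
  }
}
```

A side effect of this representation is that closures compare structurally: `AddN(1) == AddN(1)` holds through ordinary case-class equality, which is one way to read the structural closure equality used in the specifications.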

Modular Instrumentation

Instrumentation of Expressions
Resources are propagated bottom-up.
The instrumentation is parameterized by the cost of operations.
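Bottom-up propagation with a pluggable cost model can be sketched on a toy expression language. This is an illustration of the idea only; the expression type, the `Costs` record and the particular costs used are assumptions and do not reflect the tool's cost model.

```scala
object InstrumentDemo {
  sealed trait Expr
  final case class Lit(v: Int) extends Expr
  final case class Plus(l: Expr, r: Expr) extends Expr
  final case class Times(l: Expr, r: Expr) extends Expr

  // Cost model: the instrumentation is parameterized by these numbers.
  final case class Costs(lit: Int, plus: Int, times: Int)

  // Returns (value, steps): the resource count of an expression is the sum
  // of the counts of its subexpressions plus the cost of its own operation.
  def eval(e: Expr, c: Costs): (Int, Int) = e match {
    case Lit(v) => (v, c.lit)
    case Plus(l, r) =>
      val (lv, ls) = eval(l, c)
      val (rv, rs) = eval(r, c)
      (lv + rv, ls + rs + c.plus)
    case Times(l, r) =>
      val (lv, ls) = eval(l, c)
      val (rv, rs) = eval(r, c)
      (lv * rv, ls + rs + c.times)
  }
}
```

Changing the `Costs` argument switches the resource being measured without touching the traversal.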

Invariant Checking to Logical Formulas
    def f(x) = {
      require(g(x) >= 0)
      …
      r = h(x)
      …
    } ensuring(res <= a*p(x) + b)
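The shape of the formula obtained from such a function can be written out as follows. This is a schematic reading, not the exact verification condition generated by the tool: the symbol \varphi_{body} is a placeholder introduced here for the elided parts of the function body.

```latex
\exists a, b.\ \forall x, r, \mathit{res}.\quad
  \bigl( g(x) \ge 0 \;\land\; \varphi_{\mathit{body}}(x, r, \mathit{res}) \;\land\; r = h(x) \bigr)
  \;\Longrightarrow\; \mathit{res} \le a \cdot p(x) + b
```

Equivalently, the goal is to find a and b for which the negated body, g(x) \ge 0 \land \varphi_{body} \land r = h(x) \land res > a \cdot p(x) + b, has no model. The holes a and b are the only existentially quantified variables; g, h and p are treated as functions whose definitions may be unfolded.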

VCs with Free Variables

Solving Verification Conditions
Pipeline: unfolding of recursive functions; nonlinearity elimination; counter-example guided solving.
Edge labels in the diagram: "no solution"; "conflicting disjunct".
References: [R. Madhavan & V. Kuncak, CAV 2014] [R. Madhavan et al., POPL 2017]

Counter-Example Guided Solving
Loop: pick a disjunct satisfiable under the current guess; eliminate UFs and ADTs; solve for the next guess.
Exit labels in the diagram: "unsat"; "no solution".

Eliminating UFs and ADTs
• Suffices to instantiate the injectivity axiom for ADTs
• Completeness is preserved by the elimination
• Proved in the dissertation in Section 3.4 (Theorem 16)
• Two key reasons:
  • Assignments to holes do not affect the shapes of ADTs
  • Elimination is performed on a satisfiable disjunct

Solving Numerical Parametric Disjuncts
Find values for a, b, c such that the formula becomes unsatisfiable:
1. Multiply the inequalities by unknown non-negative values
2. Add the inequalities
3. Add an unknown non-negative value
4. Equate the result to 1 <= 0

Farkas’ Constraints
The Farkas’ constraints are nonlinear constraints.
Every solution for the constraints will make the inequalities unsatisfiable.
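For reference, the textbook form of Farkas' lemma behind this construction can be stated for non-strict linear inequalities over the reals. The statement below is the standard normalized variant and may differ in detail from the formulation on the slide (which adds an extra non-negative value); the symbols c_{ij}, d_i and \lambda_i are notation introduced here.

```latex
\bigwedge_{i=1}^{m} \Bigl( \sum_{j=1}^{n} c_{ij}\, x_j + d_i \le 0 \Bigr)
\ \text{is unsatisfiable}
\iff
\exists \lambda_1, \dots, \lambda_m \ge 0:\quad
\sum_{i=1}^{m} \lambda_i\, c_{ij} = 0 \ \ (1 \le j \le n)
\quad\text{and}\quad
\sum_{i=1}^{m} \lambda_i\, d_i = 1
```

Summing the inequalities with weights \lambda_i cancels every x_j and leaves 1 \le 0. For example, the system -x \le 0, x + 1 \le 0 is refuted by \lambda = (1, 1). When some coefficients c_{ij} or d_i contain the holes a, b, c, the products of \lambda_i with those holes are what make the resulting constraints nonlinear.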

Strengthening of Bounds
steps <= a*size(t) + b

Benchmarks & Empirical Studies

Benchmarks
Handpicked, most sophisticated higher-order/lazy functional data structures.
No prior formal technique exists for many of the benchmarks.
Examples include:
• Okasaki’s lazy data structures, e.g. constant-time queue, deque
• Splittable and concatenable lists: conc-trees [Prokopec et al.]
• Lazy sorting and kth order statistics, e.g. lazy merge sort
• Dynamic programming algorithms, e.g. knapsack, packrat parsing
• Infinite, cyclic streams, e.g. Hamming stream

Benchmark Statistics
The approach is currently evaluated on 50 Scala programs implementing complex algorithms.
The benchmarks comprise 10K lines of Scala code (at most 880 per benchmark) and 250 functions with resource templates.
Analysis time was at most 5 min, and on average a few tens of seconds.

Implementation Details
The implementation, named Orb, is part of the Leon verification and synthesis framework.
It is free and open-source: https://github.com/epfl-lara/leon/tree/inferInv and leondev.epfl.ch
The implementation relies on Leon for parsing Scala programs and accessing SMT solvers.

Sample Bounds Inferred by the Tool
Functions / operations:
• Accessing the kth min using lazy merge sort
• Persistently dequeuing an element
• Concat function of a partitionable list (ConcTree)
• Viterbi dynamic programming algorithm
• Packrat parsing

Accuracy Measurements
Ratio of worst-case dynamic resource usage to the static estimate.
[Chart: dynamic usage / static estimate, in percent, for the resources steps and alloc; data labels 88 and 81]

Knapsack Algorithm
Comparison of static estimate vs dynamic resource usage when w = i.size

Longest Common Subsequence

Lazy, Bottom-up Merge Sort (Finding kth Minimum from an unsorted list)

Contributors to Imprecision
1. The template, which specifies the closed-form solution
2. The constants inferred by the tool
The imprecision due to the template form is unavoidable.
One should compare against the best possible solution for a given template.

Pareto Optimal Bounds
Pareto optimal bound w.r.t. dynamic runs: obtained by minimizing one constant in the inferred bound until it underapproximates the resource usage of a run.

Pareto Optimal Usage vs Static Estimate
[Chart: series "accuracy" and "pareto optimal", in percent, for the resources steps and alloc; data labels 94, 91, 88 and 81]

Our Inference Algorithm vs CEGIS
CEGIS diverges on all benchmarks.
On restricting solutions to [-200, 200], CEGIS scaled to 5 small benchmarks.
• It was 2.5 times to 64 times slower
Reason: our algorithm eliminates an infinite set of counter-examples in every iteration.

Outcomes of this Research
Estimating performance is a hard problem, but the right combination of user inputs and automation can make it practically feasible.
Demonstrated on very complex benchmarks:
• Lazy data structures like scheduling-based queues
• Optimizers and parsers
Extensible to other resources, amortized bounds and relational reasoning.
Powerful approach: CAV ’14, POPL ’17
FOSS implementation (github.com/epfl-lara/leon)
Soundness proofs

A Novel Strategy for Resource Verification
A spectrum from expressivity and interaction at one end to automation at the other:
Interactive theorem proving:
• Danner et al., PLPV’13
• Sands, ESOP’90
• Chargueraud, ITP’15
• Nipkow, ITP’15
Contract-based verification:
• Madhavan et al., CAV’14
• Madhavan et al., POPL’17
• Carbonneaux et al., PLDI’14
Push-button techniques:
• Alonso-Blas et al.
• SPEED, POPL’09
• Le Metayer, TOPLAS’88
• Alias et al., SAS’10
• Navas, ICFP’07
• COSTA, TCS’12
• Vasconcelos et al., ESOP’15
• RAML, CAV’12
• Zuleger et al., SAS’11
• Danielsson, POPL’08
• Sinn et al., CAV’14

A Novel View of Performance Verification
Resource bounds are invariants of programs that are instrumented with their resource usage.
Instrumentation can be performed modularly.
Modular, inductive reasoning can be used for proving resource bounds.

New and Effective Exists-Forall Solver
Numerical holes are inferred in the presence of recursive functions, data types and nonlinearity.
The constants inferred are minimal and precise.

Future Applications
Embedded and real-time systems: compliance with time and memory limits is critical for the functioning of the system.

Future Applications: Runtime Optimizations
• Choose the best performing implementation from the existing options
• E.g. we can use counting sort or merge sort depending on the range of inputs:
    time(counting-sort(size, range)) = a * range + b
    time(merge-sort(size)) = c * size * log(size) + d
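Once both bounds have concrete constants, choosing an implementation at run time is a comparison of two closed-form estimates. The sketch below assumes the constants a, b, c, d have already been inferred and are simply passed in; `SortChoice`, `Model` and `choose` are hypothetical names, and base-2 logarithm is an arbitrary choice for log.

```scala
object SortChoice {
  // Constants of the two inferred bounds; the values are supplied by the caller.
  final case class Model(a: Double, b: Double, c: Double, d: Double)

  // time(counting-sort(size, range)) = a * range + b, as on the slide.
  def countingCost(m: Model, range: Long): Double = m.a * range + m.b

  // time(merge-sort(size)) = c * size * log(size) + d
  def mergeCost(m: Model, size: Long): Double = {
    val n = math.max(size, 1L).toDouble // guard against log(0)
    m.c * n * (math.log(n) / math.log(2.0)) + m.d
  }

  // Picks the implementation with the smaller estimated cost.
  def choose(m: Model, size: Long, range: Long): String =
    if (countingCost(m, range) <= mergeCost(m, size)) "counting-sort" else "merge-sort"
}
```

With unit constants, a thousand values drawn from a range of 100 favor counting sort, while ten values drawn from a range of a million favor merge sort.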

Future Applications: Verifying Security Properties Side-channel Attacks

Future Application: Towards Approximating WCET for High-level Programs
Steps vs wall-clock execution time (measured in nanoseconds).
Benchmark: nanoseconds per step (mean ± abs deviation); precision of static steps estimate
• Lazy selection sort: 1 ± 0.01; 99%
• Lazy merge sort: precision 90%
• Okasaki’s real-time queue: precision 93%

GCD Verified!
Euclid [c. 300 BC]
    def gcd(x: BigInt, y: BigInt): BigInt = {
      val rem = x % y
      if (rem == 0) y else gcd(y, rem)
    }
Does it really compute gcd?
How many steps does it take to compute the gcd?

Related Work
Resource analyses:
• Push-button: COSTA [Albert et al. ’10], Avanzini et al. ’15, Danielsson ’08, SPEED [Gulwani et al. ’09], RAML [Hoffmann et al. ’12, Jost et al. ’10]
• Interactive theorem proving: Benzinger ’04, Danner et al. ’13, Sands ’90, Chargueraud et al. ’17
• Specification-based: Alonso-Blas et al. ’12, Carbonneaux et al. ’14
Higher-order program verification:
• HO contracts: Findler et al. ’02, Kobayashi ’11, Voirol et al. ’16

Related Work
Correctness verification:
• Template-based invariant inference: Beyer et al. ’07, Sankaranarayanan et al. ’04, Cousot ’05
• Contract-based verification: Dafny [Leino ’10], JStar [Distefano et al. ’08], Jahob [Zee et al. ’08], Verifast [Jacobs et al. ’11]
• Software model checking: SLAM [Ball & Rajamani ’02], Cousot ’05, BLAST [Henzinger et al. ’02], Yogi [Nori et al. ’09]