COURSE INTRODUCTION
CS 703 – Program verification and synthesis
Loris D’Antoni

Instructor
• Loris D’Antoni
• Assistant Professor
• Member of the MadPL group since 2015
• Before: Ph.D. at the University of Pennsylvania
• Research areas:
  • Program synthesis
  • Program verification
  • Applications to: ML, Networks, Education

Logistics
Lecture
• When: MWF 11:00-12:15
• Where: Canvas BBCUltra
• We’ll end lectures a few weeks before the end of the course to focus on the project
Office hours
• When: M 4-5
• Where: Canvas BBCUltra
Course website
• https://pages.cs.wisc.edu/~loris/cs703/
• Questions via Piazza

Goal and activities
1. Understand what program verification and synthesis can do and how: lectures + reading materials (mostly research papers)
2. Use state-of-the-art tools: project!
3. Contribute to existing techniques and tools

Evaluation
• 35%: Assignments – 2 homework problems and 1-2 programming assignments
• 10%: Paper reviews (around 10 papers in the second half of the course)
  • Short paper summaries plus critiques of the paper (e.g., possible improvements)
• 35%: Research project
  • 5% Project ideas
  • 10% Project proposal + progress reports
  • 20% Final project report
• 10%: Class participation
  • Questions during class and (most importantly) during project presentations
  • The following methods are all good: class questions + chat questions + Piazza
• 10%: Project presentation

Paper reviews
• Posted on Canvas ahead of the due date
• You can already find several of the papers there (please don’t review them yet!)

Assignments
Logistics
• On Canvas
• Late penalty: 10% of grade each day, up to 4 days max
• Have to be completed individually
Homework assignments
• Cover first half of the class
• Please type the solutions so that I can read them
Programming assignments
• Typically of the form “Use tool X to solve the following problem…”

Course structure
Program verification
• Modeling programs (e.g., finite state machines)
• Modeling specifications and requirements (e.g., all locks should be released)
• Algorithms for checking that the program meets the requirement (model checking)
• Extensions to more complex programs (recursion, strings, etc.)
Program synthesis
• Modeling specifications and requirements (slightly different than verification)
• Defining search spaces of possible programs (e.g., grammars, sketches)
• Algorithms for automatically finding programs in the search spaces that meet the specification!
• Guest lectures from other synthesis experts!
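To make the verification half of this outline concrete, here is a minimal explicit-state model checking sketch in Python. The toy program, its encoding as a finite state machine over (pc, locked) states, and the names successors, violates and model_check are hypothetical illustrations, not the tools used in the course. The checker explores every reachable state breadth-first and answers either YES (returning the reachable states) or NO with a shortest counterexample trace.

```python
from collections import deque

# Hypothetical toy program, modeled as a finite state machine whose states are
# (pc, locked) pairs:
#   pc 0: lock()                          -> pc 1
#   pc 1: if (...) goto 2 else return     -> pc 2 or pc 3
#   pc 2: unlock()                        -> pc 3
#   pc 3: program exit (no successors)
def successors(state):
    pc, locked = state
    if pc == 0:
        return [(1, True)]                 # lock() acquires the lock
    if pc == 1:
        return [(2, locked), (3, locked)]  # branch: release path or early return
    if pc == 2:
        return [(3, False)]                # unlock() releases the lock
    return []                              # pc 3: exited


def violates(state):
    # Specification: "all locks should be released" when the program exits.
    pc, locked = state
    return pc == 3 and locked


def model_check(initial):
    """Breadth-first search over the reachable states.

    Returns ("YES", reachable_states) if no violating state is reachable,
    otherwise ("NO", trace) where trace is a shortest path to a violation.
    """
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if violates(state):
            trace = []
            while state is not None:       # walk parent links back to the start
                trace.append(state)
                state = parent[state]
            return "NO", trace[::-1]
        for nxt in successors(state):
            if nxt not in parent:          # visit each state at most once
                parent[nxt] = state
                queue.append(nxt)
    return "YES", set(parent)


print(model_check((0, False)))
```

The early return at pc 1 skips unlock(), so the checker answers NO with the trace (0, False), (1, True), (3, True). In the YES case, the set of reachable states is closed under successors and contains no violating state, so it can serve as a simple proof certificate.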

Project
Logistics
• Most important part of the course
• Teams of 1 or 2 people
• Expectations commensurate with size of team
Deliverables
• Sep 28: Email me a list of 3 project ideas
• Oct 16: Project proposal
• Oct 30: Description of progress 1
• Nov 13: Description of progress 2
• Early Dec: Presentation to the class
• Dec 10: Final project report

Project evaluation
Kinds of project
• A theoretical problem (related to formal methods)
• Re-implement a technique from a paper
• Apply the concepts learned in class to a new domain
• Extend/improve existing synthesis/verification techniques
• Develop a new synthesis tool for a specific task
• …
Judged in terms of
• Quality of execution
• Originality
• Scope
If the project is good and new, we can turn it into a research paper!

Projects people did in the past (1)
Mechanical suggestions for Rust lifetime compiler errors
Rust is a new systems programming language that aims to allow low-level control and high performance, like C/C++, but with memory safety and data-race freedom guaranteed statically. Rust accomplishes this using a compiler pass called the borrow checker. Unfortunately, the borrow checker's compile errors are sometimes difficult to understand, and beginners and advanced users alike are often unsure how to resolve them. In our work, we propose a tool that uses the compiler's analyses to mechanically produce suggestions for resolving borrow checker errors.

Learn JSON Schema from Positive and Negative Examples
JSON Schemas provide an explicit specification for data interchange that balances human and machine readability well. Without this balance, we will not have a transparent and smart web. Unfortunately, a large portion of the web's JSON follows no known schema [Bex], often because of opaque APIs and fast-moving developers. We therefore present a tool capable of learning JSON schemas from examples. We first convert schemas into symbolic tree automata. We then lift the Lambda* algorithm [DDA] from symbolic automata to symbolic tree automata and learn schemas from positive and negative examples. After converting back to JSON Schema, we evaluate equivalence using both human inspection and a probabilistic evaluation of generated member JSON objects against an API. We believe this tool will help developers standardize unstandardized data.

Projects people did in the past (2)
Introducing Program Synthesis through the Creation of Visual Art
There is a substantial diversity gap in the field of computing: the composition of the field’s workforce does not reflect that of the country’s population. A primary reason why women and racial and ethnic minorities are underrepresented in computer science is lack of exposure. The broad goal of this project is to garner interest in computer science, through a series of workshops, among groups that may not otherwise be introduced to it. In these workshops, which will be advertised to students from different departments on campus, attendees will use program synthesis to create visual art with a tool built specifically for this purpose. The tool lets students explore program synthesis and try out ideas through the creation of art.

Explaining the Behaviors of an ML Model via Formal Language Learning
Despite their popularity and commercial success, neural networks remain opaque to human users. Even with access to a neural network's architecture and parameters, it is difficult to gain any insight into "why" the network makes its decisions. In this work, we focus on a recurrent neural network model and attempt to use methods from formal language learning to produce human-interpretable explanations of its decisions. Specifically, we attempt to explain the behavior of SLANG, an automatic code-completion framework (Raychev et al., 2014).

Projects people did in the past (3)
Learning Symbolic Automata Over BDDs
The L∗ algorithm is a well-known solution to learning regular languages, but it does not scale well to large alphabets because it has to perform many membership queries. It is more tractable to represent regular languages over large alphabets as symbolic automata, since, given a sufficiently expressive predicate language, they require fewer transitions. There is prior work on learning symbolic automata: in particular, Mens and Maler have adapted L∗ to learn symbolic automata whose predicates are unions of intervals. However, their approach is still remarkably inefficient at learning predicates like even and odd. Binary decision diagrams (BDDs) over bitvectors can encode these predicates easily and are reasonably good at representing large contiguous intervals. Given an algorithm for refining BDDs from a counterexample, it is feasible to follow Mens and Maler’s work and learn symbolic automata over BDDs.

Verification of Quantum Communication Protocols via Model Checking
Building on previously developed QKD protocols such as BB84, we present methods for verifying the security of these protocols via model checking. Using a probabilistic model checker such as PRISM, we encode security protocols and construct PCTL formulas to formally verify the properties of these systems. We show that these systems remain secure even when an interceptor is given unconditional power through various attack vectors. We present an overview of the basic properties of qubits and the BB84 protocol, and then describe how the system was implemented and tested in PRISM.
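The claim that symbolic automata need fewer transitions over large alphabets can be illustrated with a minimal Python sketch. It assumes predicates are ordinary Python functions over integers; it is neither the Mens and Maler learning algorithm nor a BDD encoding, only the kind of automaton such a learner would produce. The state names and the example language (integer sequences alternating between even and odd numbers, starting with an even one) are hypothetical.

```python
# Minimal symbolic finite automaton (SFA) over the integers. Transitions are
# labeled by predicates (plain Python functions), not by individual symbols.
def is_even(n: int) -> bool:
    return (n & 1) == 0  # on a bitvector encoding, only bit 0 is inspected


def is_odd(n: int) -> bool:
    return (n & 1) == 1


def anything(n: int) -> bool:
    return True


# Hypothetical example language: sequences alternating even, odd, even, ...
# starting with an even number. In each state the outgoing predicates are
# disjoint and together cover all integers, so the SFA is deterministic
# and complete.
TRANSITIONS = {
    "expect_even": [(is_even, "expect_odd"), (is_odd, "reject")],
    "expect_odd": [(is_odd, "expect_even"), (is_even, "reject")],
    "reject": [(anything, "reject")],
}
ACCEPTING = {"expect_even", "expect_odd"}


def accepts(word, start="expect_even"):
    state = start
    for symbol in word:
        # Follow the unique transition whose predicate holds for this symbol.
        state = next(dst for pred, dst in TRANSITIONS[state] if pred(symbol))
    return state in ACCEPTING


print(accepts([4, 7, 10]), accepts([4, 6]))
```

Each live state has just two predicate-labeled transitions regardless of alphabet size, whereas an explicit automaton over, say, 32-bit integers would need one transition per integer per state. Because is_even depends only on the lowest bit, a BDD over the bitvector encoding represents it with a single decision node.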

Synthesizing Switch Table Rules For End-to-End Policy Enforcement
In this talk, I will present my work on Genesis, a system built to synthesize switch table rules that enforce end-to-end policies. The end-to-end policies supported by Genesis are (1) reachability between two hosts for a packet class, (2) a set of waypoints (intermediate switches in the path) that a packet class must traverse, and (3) link-level isolation between two packet classes. Problems (2) and (3) are NP-hard, so we leverage fast modern SAT solvers to synthesize paths for each packet class (and subsequently extract switch table rules from the paths). In software-defined networks, a centralized controller application can synthesize these rules before deployment and add them to the switches of the topology. I will briefly discuss the model used to encode policy enforcement as a SAT instance. We use Z3 to solve the SAT instance (or report that the policies are not realizable) and extract the paths for each packet class. However, preliminary experiments on realistic cases showed that this approach fails to scale as the number of policies and the size of the topology grow. Fortunately, datacenter topologies have a large number of paths between switches, so there can exist multiple solutions for the policies. We leverage this to design an optimistic synthesis procedure with recovery mechanisms. The procedure synthesizes a solution to one sub-problem and then solves the other sub-problems using this solution as a constraint; it is optimistic because the first solution may turn out to be incompatible with the other sub-problems. In the last section of my talk, I will present further scaling procedures that leverage topology shapes and symmetries.

WHAT IS THIS COURSE ABOUT?

18.2 million software developers worldwide
Everybody wants to write programs

BUGS COST MONEY AND LIVES
• Ariane disaster, 1996: $500 million software failure
• FDIV error, 1994: $500 million
• Estimated worst-case worm cost: > $50 billion

Solutions
• Program verification: Does my program do what it is supposed to do?
• Program synthesis: Can you generate a program that does what I have in mind and does not contain bugs?

PROGRAM VERIFICATION IN 4 SLIDES

What is program verification? (UNDECIDABLE)
• Input: a program and a property, e.g., “No null pointer exception is ever triggered”
• Verifier
• Output: YES with a proof, or NO with a counterexample

How bad is it?
Even the “easiest” verification problem is NP-complete (SAT)
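Why SAT? Asking whether an assertion in a loop-free program over Boolean variables can ever fail amounts to asking whether a propositional formula is satisfiable. The naive decision procedure below is a Python sketch; the brute_force_sat helper and the DIMACS-style clause encoding are assumptions for illustration. It tries all 2^n assignments, an exponential worst case that no known algorithm avoids, although modern SAT solvers are far faster on many practical instances.

```python
from itertools import product


def brute_force_sat(clauses, num_vars):
    """Decide satisfiability of a CNF formula by trying all 2**num_vars assignments.

    clauses: list of clauses; each clause is a list of non-zero ints in DIMACS
    style (3 means x3, -3 means "not x3"). Returns a satisfying assignment as
    {variable: bool}, or None if the formula is unsatisfiable.
    """
    for bits in product([False, True], repeat=num_vars):
        assignment = {var: bits[var - 1] for var in range(1, num_vars + 1)}
        # A CNF formula holds if every clause contains at least one true literal.
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return assignment
    return None


# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
formula = [[1, 2], [-1, 3], [-2, -3]]
print(brute_force_sat(formula, 3))
```

Each additional variable doubles the number of assignments the loop may have to try.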

Never give up

Reasons behind success
• Wonderful theory: Automata theory, Model checking, Abstraction
• Domain-specific knowledge: Programming device drivers, Malware fingerprinting, API usage in Android, Router filtering, Security protocols, String encoding
• Engineering efforts: SAT solvers, SMT solvers, …
We will learn a bit about each of these topics in this class!

PROGRAM SYNTHESIS

Goal: Automate programming tasks

What is program synthesis?

What is program synthesis? (EVEN MORE UNDECIDABLE)
• Input: user intent, e.g., the examples 1 -> 0, 340 -> 300, 568 -> 500
• Input: domain knowledge, e.g., the program can only use: Length(x), if(x) then y else z, x[i], …
• Synthesizer
• Output: a program, e.g., function f(x) { if (length(x) < 3) return 0 else return x[0] + "00" }
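The slide's example can be reproduced by a tiny enumerative synthesizer. This Python sketch assumes inputs and outputs are strings (so 340 is "340") and restricts the slide's domain knowledge to a hypothetical mini-DSL: x, x[0], x[1], the constants "0" and "00", concatenation, and a single if-then-else guarded by len(x) < k. All names (EXAMPLES, string_exprs, synthesize) are illustrative. It enumerates expressions from smallest to largest and returns the first program consistent with every example.

```python
from itertools import product

# Examples from the slide, with inputs and outputs treated as strings.
EXAMPLES = [("1", "0"), ("340", "300"), ("568", "500")]


def safe_index(i):
    # x[i] evaluates to "" when i is out of range, so every program is total.
    return lambda x: x[i] if i < len(x) else ""


# Leaves of the DSL: (source text, Python function of the input string x).
LEAVES = [
    ("x", lambda x: x),
    ("x[0]", safe_index(0)),
    ("x[1]", safe_index(1)),
    ('"0"', lambda x: "0"),
    ('"00"', lambda x: "00"),
]


def string_exprs(max_size):
    """Enumerate string expressions bottom-up, ordered by size (number of leaves)."""
    by_size = {1: list(LEAVES)}
    for size in range(2, max_size + 1):
        by_size[size] = []
        for left_size in range(1, size):
            pairs = product(by_size[left_size], by_size[size - left_size])
            for (lt, lf), (rt, rf) in pairs:
                # Default arguments bind the current lf/rf inside the lambda.
                by_size[size].append(
                    (f"{lt} + {rt}", lambda x, lf=lf, rf=rf: lf(x) + rf(x)))
    return [e for size in sorted(by_size) for e in by_size[size]]


def first_consistent(exprs, examples):
    # Smallest expression mapping every example input to its output, or None.
    return next((text for text, f in exprs
                 if all(f(i) == o for i, o in examples)), None)


def synthesize(examples, max_size=2, max_k=4):
    exprs = string_exprs(max_size)
    # 1) Try straight-line programs first.
    program = first_consistent(exprs, examples)
    if program is not None:
        return program
    # 2) Then try one if-then-else guarded by len(x) < k, splitting the examples.
    for k in range(1, max_k + 1):
        short = [(i, o) for i, o in examples if len(i) < k]
        long_ = [(i, o) for i, o in examples if len(i) >= k]
        then_branch = first_consistent(exprs, short)
        else_branch = first_consistent(exprs, long_)
        if then_branch is not None and else_branch is not None:
            return f"if len(x) < {k} then {then_branch} else {else_branch}"
    return None  # no program in this search space matches the examples


print(synthesize(EXAMPLES))
```

It prints if len(x) < 2 then "0" else x[0] + "00". The guard differs from the slide's length(x) < 3, yet both programs agree on all three examples (they differ only on two-character inputs such as "34"), so the examples alone do not pin down a unique program.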

Never give up
• Automatic program repair
• Parsers from examples
• Network updates from specification
• Biological models from mutations
• Automatic feedback for programming assignments [Singh+14]
• Reactive controllers
• FlashFill video [Gulwani 11]

FlashFill: a feature of Excel 2013 (Sumit Gulwani et al.)

Real world application of synthesis

A BIT ABOUT MY RESEARCH

Practical and predictable program synthesis
• Inputs: Specification, Quantitative Objectives, Search Space
• Unified Synthesis Framework: Search algorithms, Static analysis, Constraint solvers, Theoretical guarantees
• Output: the best program in the search space satisfying the specification, or a proof that no program exists

Verifiable machine learning

Intent-based networking

Direct code manipulation