Programming Languages and Systems at Berkeley and Beyond

Past, Present, and Future
Kathy Yelick

The Questions
• Programming Languages and Systems (PL&S):
  – aka Languages:
    » this is too narrow (some of us don’t do much “language” research)
  – aka Software:
    » this is too broad (what doesn’t involve software?)
• Who are we?
• What do we do?

The Culture of PL&S
• The middle management of EECS
  – Blamed for
    » slow execution time
    » buggy software
    » low programmer productivity
    » languages that are too big, restrictive, ugly, etc.
  – Need to have control over
    » hardware complexity
    » programmer quality
    » consumers (features over robustness)

The Big Motivators
• Ease of Programming
  – Hardware costs -> 0
  – Software costs -> infinity
• Correctness
  – Increasing reliance on software increases the cost of software errors (medical, financial, etc.)
• Performance
  – Increasing machine complexity
  – New languages and applications
    » Enabling Java; network packet filters

History of Programming Language Research
[Timeline figure spanning the 70s, 80s, 90s, and 2K. Topics shown: General Purpose Language Design, Parsing Theory, Domain-Specific Language Design, Type Systems Theory, Flop Optimization, Memory Optimizations, Data and Control Analysis, Program Verification, Garbage Collection, Type-Based Analysis, Threads, Program Checking Tools.]

Topics
• Programming Language and Systems Research
  – Language Design
  – Compilers & Tools
  – Libraries & Runtime Systems
  – Software Engineering
• Berkeley Projects: Current and Future
  – BANE
  – Titanium
  – Proof Carrying Code
• Future Emphasis: Reliability

Language Design
• Economics of programming languages
  – Programmer training is the dominant cost
    » implies languages are rarely replaced
  – Languages are adopted to fill a void
    » not because of language quality
• Is there anything left for PL designers?
  – Niche languages:
    » Everyone does language design, but doing it well is hard
  – Understanding languages:
    » E.g., Titanium’s type system is sound, Split-C’s is not
• Language design at Berkeley:
  – Lisp (Fateman), Ada (Hilfinger), Tioga (*), Titanium (*)

Compilers and Tools
• Economics of compilers
  – Large industrial teams built commercial compilers
• How can academia compete?
  – Focus on new algorithms and future problems
  – Need software infrastructure for experiments
    » from others (SUIF, gcc) or our own (Titanium, BANE)
• Compilers and Runtime Systems at Berkeley
  – Historical and continuing strength
    » Code gen, profiling (Graham), software pipelining (Aiken)
    » Analysis and optimization of parallel code (Yelick)
    » Automatic (compile-time) memory management (Aiken)
    » Environments (Graham, Fateman)

Libraries
• Open problems in complex platforms/applications
  – Scientific libraries (overlaps with Sci. Comp. group)
  – Parallel and distributed machines
• Economics of libraries
  – Market and competition are less intense
  – Can’t afford to hand-code for each machine
• Berkeley strength:
    » Load balancing (Graham, Yelick, and many others)
    » Data structures (Yelick), matrices (Demmel, Kahan, Yelick), meshes (Shewchuk)
    » High precision (Demmel, Fateman, Kahan, Shewchuk)
    » Symbolic (Fateman, Kahan)
  – New: tools to automate library construction

Software Engineering
• Economics of Software Engineering
  – Robust software is expensive
• Old approaches:
  – Formal: verification, specification
  – Informal: software process, patterns
• What Berkeley is doing:
    » Automatic analysis of large programs (Aiken)
    » Software fault isolation (Graham)
    » Proof Carrying Code (Necula)
    » Model checking (Henzinger, Brayton, S-V)
    » Experience (lots of large software construction projects)
• What’s missing?
  – “Core” Software Engineering
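Software fault isolation confines an untrusted module to its own region of memory without hardware protection: the code is rewritten so that each potentially unsafe store (and indirect jump) first forces its target address into the module’s segment. The Python sketch below simulates that address-masking step. The segment size (`OFFSET_BITS`) and the names `sandbox_address` and `SandboxedMemory` are hypothetical, chosen only for illustration; a real SFI system works on machine code, not on a Python dictionary.

```python
OFFSET_BITS = 20  # hypothetical: each fault domain owns a 2**20-byte (1 MiB) segment
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def segment_base(segment_id):
    """Base address of the segment owned by one fault domain."""
    return segment_id << OFFSET_BITS


def sandbox_address(addr, segment_id):
    """Force any address into the given segment.

    Keeps the low OFFSET_BITS (the offset within a segment) and overwrites
    the high bits with the segment identifier, so the result stays inside
    the segment no matter what value the untrusted code computed.
    """
    return (addr & OFFSET_MASK) | segment_base(segment_id)


class SandboxedMemory:
    """Toy memory in which every access by one module is sandboxed."""

    def __init__(self, segment_id):
        self.segment_id = segment_id
        self.cells = {}

    def store(self, addr, value):
        safe = sandbox_address(addr, self.segment_id)
        self.cells[safe] = value
        return safe  # the address that was actually written

    def load(self, addr):
        return self.cells.get(sandbox_address(addr, self.segment_id), 0)


mem = SandboxedMemory(segment_id=3)
print(hex(mem.store(0x1234, 7)))  # 0x301234: redirected into segment 3
```

Note the design choice: a stray store is not trapped, only redirected, so a buggy module can corrupt its own segment but never another module’s. That is what keeps the per-access overhead to a couple of cheap bitwise instructions.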

Projects: Titanium
• Problem: portable scientific computing
• The Approach
  – Domain-specific language and compiler:
    » Old applications: astrophysics, combustion
    » New applications in Bioengineering
      • modeling the cell to cure cancer (Arkin)
      • modeling bio-MEMS devices for treatment (Liepmann)
  – Language design
    » Dialect of Java with in-house compiler (to C)
    » Support for fast, safe multidimensional arrays
    » Types for distributed data, regions
  – Optimizations
    » Communication, memory, arrays, synchronization
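To make “safe multidimensional arrays” concrete, here is a minimal Python sketch of an array indexed by points over a rectangular domain with arbitrary lower bounds, where every access is bounds-checked. It is only loosely modeled on the idea of domain-indexed arrays; `GridArray` and its methods are hypothetical names, not Titanium’s API, and the sketch shows only the safety half (the speed comes from compiling such code, which this does not attempt).

```python
from itertools import product


class GridArray:
    """Dense array over a rectangular index domain lo..hi (inclusive bounds).

    Every access is bounds-checked, so an out-of-domain index raises
    IndexError instead of silently touching neighboring memory.
    """

    def __init__(self, lo, hi, fill=0.0):
        if len(lo) != len(hi) or any(a > b for a, b in zip(lo, hi)):
            raise ValueError("malformed or empty domain")
        self.lo, self.hi = tuple(lo), tuple(hi)
        self.shape = tuple(b - a + 1 for a, b in zip(lo, hi))
        size = 1
        for extent in self.shape:
            size *= extent
        self.data = [fill] * size

    def _offset(self, point):
        """Row-major offset of a point, after checking it lies in the domain."""
        if len(point) != len(self.lo):
            raise IndexError(f"expected {len(self.lo)} coordinates, got {len(point)}")
        offset = 0
        for p, lo_k, hi_k, extent in zip(point, self.lo, self.hi, self.shape):
            if not lo_k <= p <= hi_k:
                raise IndexError(f"{point} is outside the domain {self.lo}..{self.hi}")
            offset = offset * extent + (p - lo_k)
        return offset

    def __getitem__(self, point):
        return self.data[self._offset(point)]

    def __setitem__(self, point, value):
        self.data[self._offset(point)] = value

    def domain(self):
        """Iterate over every point in the domain, in row-major order."""
        return product(*(range(a, b + 1) for a, b in zip(self.lo, self.hi)))


grid = GridArray((1, 1), (3, 4))  # 3 x 4 grid indexed from (1, 1), not (0, 0)
for i, j in grid.domain():
    grid[i, j] = 10 * i + j
print(grid[3, 4])  # 34
```

Iterating with `domain()` rather than hand-written index ranges is what lets such code stay in bounds by construction; a compiler that sees the loop ranges over the array’s own domain has an opportunity to drop the per-access checks.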

Projects: BANE
• Problem: removing bugs from large programs
• The Approach
  – automatic analysis
  – discover small facts about big programs
  – Target: 1,000 line systems
• Examples:
  – Find relay races in RLL programs
    » RLL used in >50% of factories, at Disneyland, etc.
  – Prove C programs are Y2K ready
    » CVS 1.10 is OK, CVS 1.9 is not
  – Detect buffer overruns in security-critical code
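As an illustration of “small facts about big programs”, the sketch below computes one small fact per variable (the integer range it may hold) and uses it to flag buffer writes that might overrun. It is a plain interval analysis over a hypothetical straight-line toy language, not BANE’s API or its constraint-solving engine; `check_overruns` and the statement forms are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed integer range [lo, hi] of values a variable may hold."""
    lo: int
    hi: int

    def shift(self, k):
        return Interval(self.lo + k, self.hi + k)


def check_overruns(program):
    """Flag writes whose index range may fall outside the buffer.

    Statements (hypothetical toy language, straight-line only):
      ("alloc", buf, size)     buf has valid indices 0 .. size-1
      ("range", var, lo, hi)   var holds some value in [lo, hi], e.g. from input
      ("add", dst, src, k)     dst = src + k for a constant k
      ("write", buf, idx)      buf[idx] = ...
    Returns a list of (statement_index, buf, idx, index_range) warnings.
    """
    ranges, sizes, warnings = {}, {}, []
    for i, stmt in enumerate(program):
        op = stmt[0]
        if op == "alloc":
            _, buf, size = stmt
            sizes[buf] = size
        elif op == "range":
            _, var, lo, hi = stmt
            ranges[var] = Interval(lo, hi)
        elif op == "add":
            _, dst, src, k = stmt
            ranges[dst] = ranges[src].shift(k)
        elif op == "write":
            _, buf, idx = stmt
            r = ranges[idx]
            # Warn unless the whole range provably fits in [0, size-1].
            if r.lo < 0 or r.hi > sizes[buf] - 1:
                warnings.append((i, buf, idx, r))
        else:
            raise ValueError(f"unknown statement: {stmt!r}")
    return warnings


program = [
    ("alloc", "buf", 10),
    ("range", "n", 0, 9),
    ("write", "buf", "n"),  # safe: n in [0, 9]
    ("add", "m", "n", 1),
    ("write", "buf", "m"),  # off-by-one: m in [1, 10]
]
print(check_overruns(program))  # [(4, 'buf', 'm', Interval(lo=1, hi=10))]
```

The analysis is conservative: a warning means the range *may* exceed the buffer, not that it must. Real programs add branches and loops, which is where join operations and constraint solving become necessary.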

Projects: Proof Carrying Code
• The Problem:
  – How can I trust code from another language, person, machine?
• The Approach:
  – programs carry a proof of what they promise
    » Semantic analog of digital signatures
    » Properties often from program analysis (e.g., types)
    » Passed through compilation by validating translations
  – client’s cheap trusted verifier checks the proof
• Applications
  – Very fast network packet filters
  – “Native code” in ML that is safe
  – Mobile code security
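The key asymmetry in proof-carrying code is that producing the evidence may be expensive, while checking it is cheap. Real PCC ships formal logical proofs of a safety policy; the toy below substitutes a much simpler certificate (a claimed stack depth before each instruction of a small stack machine, similar in spirit to the stack-map annotations used by Java bytecode verification) to show the shape of a cheap, trusted checker. The instruction set, `EFFECTS`, and `verify` are hypothetical.

```python
# Stack effect of each instruction: (minimum depth required, change in depth).
EFFECTS = {
    "push": (0, +1),
    "pop": (1, -1),
    "add": (2, -1),
    "jz": (1, -1),  # pop a value; jump if it is zero
    "jmp": (0, 0),
    "halt": (0, 0),
}


def verify(program, depths):
    """Check a producer-supplied certificate in one linear pass.

    depths[i] claims the stack depth just before instruction i. The checker
    never searches or iterates to a fixed point; it only confirms that each
    claim is locally consistent, which implies the program can never
    underflow its stack, jump outside the code, or fall off the end.
    """
    n = len(program)
    if n == 0 or len(depths) != n or depths[0] != 0:
        return False
    for i, instr in enumerate(program):
        op = instr[0]
        if op not in EFFECTS:
            return False
        need, delta = EFFECTS[op]
        if depths[i] < need:
            return False  # this instruction could underflow the stack
        out = depths[i] + delta
        successors = []
        if op in ("jz", "jmp"):
            target = instr[1]
            if not 0 <= target < n:
                return False  # jump outside the code
            successors.append(target)
        if op not in ("jmp", "halt"):
            if i + 1 == n:
                return False  # execution would fall off the end
            successors.append(i + 1)
        if any(depths[s] != out for s in successors):
            return False  # claim disagrees with the depth that flows in
    return True


program = [("push", 3), ("jz", 4), ("push", 1), ("pop",), ("halt",)]
print(verify(program, [0, 1, 0, 1, 0]))       # True: certificate checks out
print(verify(program, [0, 1, 0, 1, 1]))       # False: tampered claim at index 4
print(verify([("add",), ("halt",)], [0, 0]))  # False: add needs two operands
```

Only `verify` needs to be trusted: a wrong or forged certificate is simply rejected, so the consumer never has to trust the producer or its compiler.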

Reliable Computing (Future)
• Problem: build more reliable systems
• Approaches:
  – Build from reliable components
    » Better languages for system design (H*)
    » Better environments for particular domains (F, G)
    » Build semantic models of system behavior (A, H, N)
  – Build reliable systems from unreliable components by spending cheap hardware resources (H, K, P, Y)
    » Introspection of network, disks, processor, software
    » Use statistical models to distinguish normal from abnormal behavior
    » Fault-tolerant, self-scrubbing data structures
    » Redundant computation: catch transient errors
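Redundant computation spends cheap hardware to buy reliability: two copies can detect a transient error by disagreeing, and three copies with a majority vote (triple modular redundancy) can also correct it. The sketch below assumes the computation is deterministic, returns a hashable value, and that transient faults strike copies independently; a deterministic bug repeats in every copy and would not be caught. `run_redundant` and the fault-injecting `make_flaky` are hypothetical helpers.

```python
from collections import Counter


def run_redundant(compute, copies=3):
    """Run compute() `copies` times and return the strict-majority result.

    Any disagreement between copies signals a fault; a result is returned
    only if more than half of the copies agree on it.
    """
    results = [compute() for _ in range(copies)]
    value, count = Counter(results).most_common(1)[0]
    if 2 * count <= copies:
        raise RuntimeError(f"no majority among results {results!r}")
    return value


def make_flaky(correct, fail_on_call):
    """Hypothetical computation that suffers one simulated bit flip."""
    calls = 0

    def compute():
        nonlocal calls
        calls += 1
        if calls == fail_on_call:
            return correct ^ 1  # transient fault: lowest bit flipped once
        return correct

    return compute


print(run_redundant(make_flaky(42, fail_on_call=2)))  # 42: the bad copy is outvoted
```

With `copies=2` the same fault is detected (the call raises) but cannot be corrected, which is the usual trade-off between duplication and triplication.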

Summary of PL&S at Cal
• Good coverage in core language and compiler work
  – People move with opportunities
  – Traditional boundaries becoming blurred
• Strength in analysis
  – Semantics with practical applications
• Strength in collaborative work
  – Systems: Culler, Kubiatowicz, Patterson
  – Scientific computing: inside and outside the department
• Areas that are not well represented
  – Core Software Engineering
  – Logic

Faculty
• Alex Aiken
• Richard Fateman
• Susan Graham
• Mike Harrison
• Tom Henzinger
• Paul Hilfinger
• George Necula
• Kathy Yelick

Long Term
• Language research can be loooong term
  – e.g., garbage collection
[Figure of long-term research ideas: Partial Evaluation, Mobile Ambients, Monads, Continuations, Pi Calculus, Regions, Software Fault Isolation, Type Inference, Set-Based Analysis, Proof Carrying Code]

Executive Summary
• Anything related to programming
  – How do we know it does what we think it does?
• A mix of
  – theory
  – systems
  – human factors

Language Design: History
• 70s & 80s:
  – Design better general-purpose languages
    » pure functional, object-oriented, logic…
    » Lisp (Fateman), Ada (Hilfinger)
• 90s & 2Ks:
  – Domain-specific languages
    » Tioga (Stonebraker, Hellerstein, Aiken)
    » Titanium (Graham, Yelick, Hilfinger, Aiken)
  – Understanding semantics: type soundness, etc.
    » Titanium pointer types are sound (Split-C’s are not)
• Good language design is hard
• Almost everyone does it

Language Technology without Languages
• Increasing connections to other areas of CS
  – transfer of PL ideas to non-language tools
  – avoids language adoption problems
  – foundational ideas are portable
• High-performance thread systems
  – based on CPS conversion
• Low-overhead virtual machines
  – uses software fault isolation
• More to come...
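Why CPS conversion helps thread systems: once every function receives “the rest of the computation” as an explicit continuation, suspending a thread is just saving a closure, and resuming it is calling that closure, with no separate machine stack per thread. The sketch below writes the CPS form by hand (a CPS-converting compiler would produce this shape automatically) and uses a trampoline in place of the tail calls Python lacks. `sum_to`, `finisher`, and `run` are hypothetical names for illustration.

```python
from collections import deque


def sum_to(name, n, acc, log, k):
    """Compute n + (n-1) + ... + 1 in CPS, one step per scheduler turn.

    Instead of looping or returning a value, each step returns a thunk
    (a zero-argument closure) holding the rest of the computation.
    """
    log.append((name, n))
    if n == 0:
        # Done: the next step delivers the result to the continuation k.
        return lambda: k(acc)
    return lambda: sum_to(name, n - 1, acc + n, log, k)


def finisher(name, results):
    """Build the final continuation for one thread: record its result."""
    def k(value):
        results[name] = value
        return None  # no further steps: the thread terminates
    return k


def run(threads):
    """Round-robin scheduler (a trampoline): run one step of each thread in turn.

    A context switch is just putting the returned thunk at the back of the queue.
    """
    ready = deque(threads)
    while ready:
        step = ready.popleft()
        nxt = step()
        if nxt is not None:
            ready.append(nxt)


log, results = [], {}
run([
    lambda: sum_to("A", 2, 0, log, finisher("A", results)),
    lambda: sum_to("B", 1, 0, log, finisher("B", results)),
])
print(log)      # [('A', 2), ('B', 1), ('A', 1), ('B', 0), ('A', 0)]
print(results)  # {'B': 1, 'A': 3}
```

The interleaved log shows the two threads taking turns even though neither contains any explicit yield: the scheduling points fall out of the CPS form itself.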

Interests and Collaborations
[Diagram relating Compilers, Software Engineering, Semantics, Systems, Programming Language Design, and Logic]