ROSE Compiler Framework
What is ROSE, How does it Work, and Why does it add Value?
May 8, 2017, V1.7
LLNL-PRES-730950
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC

Ever Wonder…
§ Are there bugs in our code? Will it work correctly?
§ Are there trap doors and extra things we don’t know about?
§ Is our code susceptible to compromise?
§ Is the code’s performance optimized? Will it work on the new platforms?
§ What exactly is in our little black box’s firmware that controls most everything?

What is ROSE? The Quick Answer
§ ROSE is to software what a Find and Replace function is to a word processor, except that ROSE does much more.
§ Find and Replace allows a word processor user to quickly find syntax (words or phrases) and optionally replace it with other syntax (words or phrases).

Find and Replace Value
§ Find and Replace adds value by automating document editing, eliminating the mistakes of manual editing, and increasing productivity.
§ For example, a form letter can be personalized for many clients instead of being rewritten from scratch or edited manually each time.

Sometimes Analyzing Syntax Can Help Us
Has saved me from sending embarrassing emails numerous times.
On the other hand, pig face, jerk, and nitwit sailed through without raising a single chili pepper, as did turkey and damn.
Syntax: the arrangement of words and phrases to create well-formed sentences in a language.

Syntax versus Semantics
§ A Find and Replace function would need to understand word meaning and context to interpret what a sentence means.
§ Natural language has basic classifications of syntax:
  • Nouns
  • Pronouns
  • Verbs
  • Adjectives
  • Adverbs
  • Prepositions
  • Conjunctions
Semantics: the meaning of the syntax in a language.

Remember Reed-Kellogg Sentence Graphing?
But we still do not understand the meaning of the words, just their classification.

Sentences can Be Syntactically Correct and Not Make Sense
§ She fed the dream to my absence.
§ Beware of Buffalo buffalo, for they may buffalo you.
§ Colorless green ideas sleep furiously.
These sentences are all syntactically correct!

Just Analyzing Syntax May Not Add Value

Differences Between Natural Languages and Computer Languages
§ A syntax ambiguity in a natural language (such as English) may cause the reader to pause to understand the intention of the author.
§ Computer languages have very strict syntax rules.
§ A syntax mistake in a computer language will cause the compiler to issue an error or a warning or, worse yet, to create code which is not correct (a bug).
Figure labels: Equivalence, Assignment
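
The figure for this slide did not survive extraction. As a stand-in sketch, assuming it contrasted the equivalence test `==` with the assignment `=`, the snippet below uses Python's standard `ast` module (chosen only for illustration, it is not part of ROSE) to show that the two spellings parse to different constructs. The helper names `classify` and `is_valid` are hypothetical.

```python
import ast


def classify(source: str) -> str:
    """Return the AST node type of the first statement in `source`."""
    stmt = ast.parse(source).body[0]
    # A bare expression is wrapped in ast.Expr; unwrap it to see the real node.
    node = stmt.value if isinstance(stmt, ast.Expr) else stmt
    return type(node).__name__


def is_valid(source: str) -> bool:
    """Report whether the Python parser accepts `source`."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


print(classify("x == 1"))          # Compare (equivalence test)
print(classify("x = 1"))           # Assign (assignment)
print(is_valid("if x = 1: pass"))  # False: Python rejects the slip outright
```

Python refuses `if x = 1`, which is the "error" outcome on the slide. C and C++ accept `if (x = 1)`, which is how the same slip becomes the "worse yet" case: code that compiles but is not correct.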

Language Translators
Understandable to an English Speaker: “Will it rain today?”
Language Translator
Understandable to a French Speaker: “Pleuvra-t-il aujourd’hui?”

C++ Source Code Compiler
Calculate the value of n!, where n is a positive integer from 1 to 8.
Example: 6! = 1 x 2 x 3 x 4 x 5 x 6 = 720
Understandable to a Software Engineer: C++
Compiler
Understandable to a Computer: Assembler
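
The slide's C++ listing and its assembler output did not survive extraction. The sketch below restates the same task in Python rather than the slide's C++, so it is an illustration of the problem and not the original code. The name `factorial` is an assumption.

```python
import dis


def factorial(n: int) -> int:
    """Return n! for a positive integer n (the slide covers n from 1 to 8)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, factorial(n))
    # Show the lower-level instructions CPython compiled the function into.
    dis.dis(factorial)
```

`dis.dis` prints CPython bytecode, which plays the role of the slide's assembler column here. It is bytecode for a virtual machine, not native machine code, and its exact form varies between Python versions.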

Compiler Definition
§ A computer program that translates a program written in a high-level language into another language, usually machine language.

Compiler Flow Diagram

ROSE – What It Does
§ ROSE, being a compiler, has the ability to understand both the syntax and the semantic information contained in computer languages such as C, C++, Java, PHP, Python, OpenMP, and FORTRAN.
§ It does this by creating an Abstract Syntax Tree (AST), which is analogous to the Reed-Kellogg sentence graph.
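
To make the sentence-graph analogy concrete, here is a minimal sketch of what an AST looks like. It uses Python's standard `ast` module as a stand-in, not ROSE's own IR, and the helper `outline` and the sample statement are assumptions chosen for illustration.

```python
import ast

source = "area = width * height"
tree = ast.parse(source)


def outline(node: ast.AST, depth: int = 0) -> list[str]:
    """List node type names, indented by depth, much like a sentence diagram."""
    lines = ["  " * depth + type(node).__name__]
    for child in ast.iter_child_nodes(node):
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(tree)))
```

Each line of the output is a classification (an assignment, a name, a binary operation, a multiplication), just as a sentence diagram labels nouns and verbs.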

ROSE – What It Does
§ So ROSE is much more powerful than a Find and Replace function and a sentence graph.
§ ROSE is a Find, Understand, and Rewrite function.
§ ROSE allows users to create tools to interrogate the syntactic and semantic structure of source code, using the ROSE simplified intermediate representation (IR).
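
The difference between Find and Replace and Find, Understand, and Rewrite can be sketched in a few lines. This example uses Python's standard `ast` module, not the ROSE API, and the class name `Rename` and the sample program are assumptions. It only understands structure (which tokens are identifiers), not full program semantics.

```python
import ast

SOURCE = "total = count + 1\nlabel = 'count'\ndiscount = count * 2\n"

# Plain Find and Replace matches characters, so it also rewrites the
# string literal and the unrelated name "discount".
textual = SOURCE.replace("count", "n")


class Rename(ast.NodeTransformer):
    """Rename one variable, touching only identifier nodes with that exact name."""

    def __init__(self, old: str, new: str) -> None:
        self.old, self.new = old, new

    def visit_Name(self, node: ast.Name) -> ast.AST:
        if node.id == self.old:
            node.id = self.new
        return node


# Parse to a tree, rewrite the tree, then emit source again.
tree = Rename("count", "n").visit(ast.parse(SOURCE))
structural = ast.unparse(tree)  # ast.unparse requires Python 3.9+

print(textual)
print(structural)
```

The textual version corrupts `discount` and the string `'count'`, while the tree-based version changes only the variable. Parsing, rewriting the tree, and emitting source again is the same source-to-source shape the slides describe.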

ROSE Flow Diagram
Source Code
Simplified IR
User Analysis: stops the compiling process to let the user code perform analysis
Outputs rewritten source code

ROSE Secret Sauces
Simplified IR
Source Code Output
Binary Analysis
Dan Quinlan, Creator of ROSE

If Word Processors Had ROSE, Could Impose Tactfulness
Input Sentence: “If you are late again you are fired”
ROSE Framework: Harshness Reduction Algorithm
Output Sentence: “I've noticed you've had trouble getting to work on time. What can I do to help?”

If Word Processors Had ROSE, Could Impose Simplicity
Input Sentence: “Thank you for trusting me with some of your responsibilities. I'm sorry that I can't help you this time because of my workload. Is there anything I could help you with next week, when I have more time?”
ROSE Framework: Frankness Algorithm
Output Sentence: “No.”

If Word Processors Had ROSE, Could Impose Common Sense
Input Paragraph: This SOFTWARE PRODUCT is provided by COLOSELTRON SOFTWARE "as is" and "with all faults." COLOSELTRON SOFTWARE makes no representations or warranties of any kind concerning the safety, suitability, lack of viruses, inaccuracies, typographical errors, or other harmful components of this SOFTWARE PRODUCT. There are inherent dangers in the use of any software, and you are solely responsible for determining whether this SOFTWARE PRODUCT is compatible with your equipment and other software installed on your equipment. You are also solely responsible for the protection of your equipment and backup of your data, and COLOSELTRON SOFTWARE will not be liable for any damages you may suffer in connection with using, modifying, or distributing this SOFTWARE PRODUCT.
ROSE Framework: Legalese Interpretation Algorithm
Output Sentence: There are bugs in this code.

If Word Processors Had ROSE, Could Impose Directness
Tonight I would like to toast our new Software Director who we are so fortunate to welcome to ACME SOFTWARE. At her former company, COLOSULTRON SOFTWARE, she was able to roll out agile methodologies and completely rewrite their software disclaimer to avoid litigation.
ROSE Framework: Hemingway Style Algorithm
“I drink to make other people more interesting.” Ernest Hemingway

ROSE Framework Operates on Computer Code
Input: Source or Binary Code (Source Code: C, C++, Java, Fortran, Python, PHP, OpenMP, or Binary)
ROSE Framework: Code to Operate on Syntax and Semantics (IR)
Output: Transformed Source Code or Report
Source Code Output is unique to ROSE.
Binary code (or Machine Language) is what compilers (and then Assemblers) turn source code into so computers can understand it. Also known as Firmware when inside little black boxes.

ROSE Built Tools Optimize Code for Platforms
Optimize Code Across Multiple Processors
  Input: Existing Physics Code
  ROSE Framework: MPI Transform Algorithm
  Output: MPI Optimized Physics Code
Optimize Code For Multiple Threads in a Processor
  Input: Existing Physics Code
  ROSE Framework: OpenMP Transform Algorithm
  Output: OpenMP Optimized Physics Code

ROSE Built Tools Optimize Code for Performance
Optimize Code Across Different Mesh Types
  Input: Existing Physics Code
  ROSE Framework: Mesh Transform Algorithm
  Output: Mesh Optimized Physics Code
Optimize Code For Different Solver Types
  Input: Existing Physics Code
  ROSE Framework: Solver Transform Algorithm
  Output: Solver Optimized Physics Code

ROSE Built Tools Can Translate or Visualize Code
Verify Translated Source Code Correctness
  Input: COBOL, Ada, Jovial
  ROSE Framework: Translation Algorithm
  Output: C, C++, C#
Code Visualization
  Input: Source or Binary Code
  ROSE Framework: Visualization Transformation
  Output: Code Visualization Reports

ROSE Built Tools Find and Repair Potential Vulnerabilities
Detection of Potential Code Vulnerabilities
  Input: Source or Binary Code
  ROSE Framework: Vulnerability Detection Checkers
  Report: Potential Code Vulnerabilities Report
Detect and Repair Potential Code Vulnerabilities
  Input: Source or Binary Code
  ROSE Framework: Vulnerability Detection Checkers And Patches
  Output: Repaired Binary or Source Code
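
A detection checker of the kind described above can be sketched as a walk over a syntax tree that reports suspicious constructs. The example uses Python's standard `ast` module rather than ROSE, and the rule set (`RISKY_CALLS`), the function name `find_risky_calls`, and the sample input are all assumptions made for illustration.

```python
import ast

RISKY_CALLS = {"eval", "exec"}  # illustrative rule set, not taken from ROSE


def find_risky_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, function name) for each direct call to a risky builtin."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in RISKY_CALLS):
            findings.append((node.lineno, node.func.id))
    return sorted(findings)


# The sample is only parsed, never executed.
SAMPLE = (
    "user_input = input()\n"
    "value = eval(user_input)\n"
    "note = 'eval is only mentioned here'\n"
)

for line, name in find_risky_calls(SAMPLE):
    print(f"line {line}: call to {name}()")
```

Because the checker inspects call nodes rather than raw text, the word "eval" inside the string on the third line is not reported. A repair tool would add a rewrite step on the same tree before emitting the code again.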

ROSE Built Tools Correctness and Bug Seeding
Specification to Code Correctness
  Input: LTL Specification Language, C Code
  ROSE Framework: CodeThorn
  Output: Proof of Correctness Report
  The RERS Challenge, July 12, 2017
Seed Vulnerabilities to Measure Tool Effectiveness
  Input: C, C++ Test Code
  ROSE Framework: Vulnerability Seeding Transform
  Output: Seeded Test Code

ROSE Built Tools Analyze Binary Code
Binary Code Analysis
  Input: Existing Binary Code
  ROSE Framework: Analysis Detection Algorithm
  Output: Back Door, Obfuscation, Dead code, Big 5, Changes
Unit Tester
  Input: Source Code
  ROSE Framework: Condition Enforcement Stubs
  Output: Instrumented Source Code Units

ROSE and Clang/LLVM
1. We use Clang (the high level IR) as the frontend for OpenCL support and can use it alternatively as the frontend for C language support (if not too many GNU extensions are required).
2. We can generate LLVM (the low level IR), and we have another contract with Rice that was just started to make that more robust and address Fortran 77.
3. We can generate LLVM IR from binaries (using the instruction semantics).
4. We support the same plugin mechanism as Clang/LLVM to add passes over the AST.
5. We support the LLVM compiler as a backend within source to source work. Specifically, ROSE can emulate the LLVM compiler (by version number) and handle the LLVM specific C and C++ language extensions outside of the C and C++ standard.
6. ROSE can be compiled using LLVM, and it is regularly tested with LLVM in our ROSE Matrix Testing.

1) ROSE Helping LLVM Produce Source Codes
Existing Clang/LLVM Compiler: Clang Parser, Clang IR, LLVM IR, Binary Code
Existing ROSE Compiler: EDG Parser, EDG IR, ROSE IR, Source Code
Input: Source Code
Source to Source Transform

2) LLVM Helping ROSE Generate Binaries
Existing Clang/LLVM Compiler: Clang Parser, Clang IR, LLVM IR, Binary Code
Existing ROSE Compiler: EDG Parser, EDG IR, ROSE IR, Source Code
Input: Source Code
ROSE to LLVM Transform

3) ROSE Helping LLVM Analyze Binary Codes
Existing Clang/LLVM Compiler: Clang Parser, Clang IR, LLVM IR, Binary Code
Existing ROSE Compiler: ROSE Disassembler, ROSE IR
Input: Binary Code
Binary to LLVM IR Transform

4) LLVM and ROSE Share the Same Plug In API for IR Analysis
Existing Clang/LLVM Compiler: Clang Parser, Clang IR, LLVM IR, Binary Code
Existing ROSE Compiler: EDG Parser, EDG IR, ROSE IR, Source Code
Input: Source Code
Analyzer or Checker

5) ROSE can Use LLVM as a Back End Compiler
Existing Clang/LLVM Compiler: Clang Parser, Clang IR, LLVM IR, Binary Code
Existing ROSE Compiler: EDG Parser, EDG IR, ROSE IR, Source Code
Input: Source Code
Source Transform

6) ROSE Source Code can be Compiled by Clang/LLVM
Existing Clang/LLVM Compiler: Clang Parser, Clang IR, LLVM IR, Binary Code
Existing ROSE Compiler: ROSE Source

ROSE Built Tools Versus Commercial Tools
§ Commercial tools generally work well within their markets
§ Commercial tool makers generally supply tools for large markets
  • Windows
  • Java, C++, C, C# (current versions)
  • Browser Based
§ Commercial tool makers take six months or more to add new capabilities (e.g., C++11, C++14, C++17)
§ Commercial tools depend largely on a priori data
§ User community is aware of commercial tool capabilities/shortcomings

Where Does ROSE Fit Among Existing Tools?
Commercial Tool Family      ROSE Enhances
Static Analyzers            ✓
Dynamic Analyzers           ✓
Profilers                   ✓
Thread Checkers             ✓
Malicious Code Detectors    ✓
Code Coverage               ✓
Binary Analysis             ✓
Leak Detectors              ✓
Proof of Correctness        ✓
Uncovered Tool Needs        ✓

ROSE Built Tools
§ Extend to areas that commercial tools miss
  • Latest C++, C, Java standards (v11, v14, future)
  • Fortran, Ada, Binaries
§ ROSE built tools are quickly adaptable to detect new threats
§ Can produce source code
§ Can add capability to commercial tools
§ ROSE built tools are tailored to specific needs of users

ROSE Summary
§ Adds Value by
  • Automating Source and Binary Analysis
  • Transforming Source Code to new Source Code
  • Addressing areas not covered by commercial tools
  • Enhancing existing commercial tools as an add on
  • Expanding to meet future specialized analysis and translation requirements