Lesson Objectives
Aims
1. To understand the differences between the three main types of translator: compiler, interpreter and assembler.
Key Words
Compiler, interpreter, assembler

Compilation
Source Code (high level) → Compiler → Executable (low level) → Code runs on target system

• Compile once, run many
• The executable works only on the target system
• Self-contained code (except system calls)
• Protects intellectual property (unless decompiled, reverse engineered or open source)

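• To port code:
  – It must be re-compiled for each new target
  – Parts will probably need re-writing (libraries, system calls etc.)
  – Source code can be cross-platform, but:
    • Each executable will still only run on one system
    • The source carries redundant code (the parts written for other platforms go unused on each target)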

Interpreter
Source Code → Interpreter (targeted at a specific system) → Code runs on target system

Interpreters
• No executable is produced: the interpreter translates and runs the source code one line at a time.
• This means the source code can be platform independent (the interpreter has to handle all the differences in system architecture)
• Nothing the interpreter translates is saved, so the translation is repeated every time the program is run
• Used in some common languages including BASIC, Lisp, Prolog, Python and JavaScript.
• Code cannot be executed without an interpreter installed on the target system.

• Advantages
  – Source code is write once, run anywhere (on any computer with an interpreter).
  – You can easily inspect the contents of variables as the program is running.
  – You can test code more quickly, including testing single lines of code.
  – Code can be run on many different types of computers and OSs, such as Macs, PCs, UNIX and Linux.
• Disadvantages
  – You must have an interpreter running on the computer in order to be able to run the program.
  – An interpreter must be created for each individual computer architecture you want code to run on.
  – As the source code must be translated every time it is run, interpreted programs can run slowly.
  – You have to rely on the interpreter for machine-level optimisations rather than programming them yourself.
  – Anyone you give the program to can view the source code, which means that they could use your intellectual property to develop their own software.
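The line-by-line behaviour of an interpreter can be sketched in a few lines of Python. This is only an illustration: the mini-language (SET, ADD, PRINT) and the function name `run` are invented for this example and do not come from any real interpreter.

```python
# Toy line-by-line interpreter for a made-up mini-language (hypothetical).
# Supported statements:
#   SET name value   store an integer in a variable
#   ADD name value   add an integer to a variable
#   PRINT name       output the variable's value

def run(source):
    variables = {}   # the program's variables, easy to inspect at any point
    output = []      # everything the program "prints"
    for line_number, line in enumerate(source.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        command, args = parts[0], parts[1:]
        # Each line is translated and acted on immediately; nothing is saved.
        if command == "SET":
            variables[args[0]] = int(args[1])
        elif command == "ADD":
            variables[args[0]] += int(args[1])
        elif command == "PRINT":
            output.append(variables[args[0]])
        else:
            raise SyntaxError(f"line {line_number}: unknown command {command!r}")
    return variables, output

program = """
SET x 5
ADD x 3
PRINT x
"""

variables, output = run(program)
print(output)      # [8]
print(variables)   # {'x': 8}
```

Notice that a mistake on a later line is only found when the interpreter reaches it, which is one reason interpreted code is easy to test a line at a time.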

Virtual Machines
• There is a spanner in the works
• There are languages that are both compiled and interpreted
• Java is the best-known example
• Why do they do it?

• It’s another take on the interpreted idea.
• The compiler turns source code into “byte code”, a halfway stage between source code and fully compiled machine code
• Funnily enough, this code is also called “intermediate code”
• The interpreter that runs it is called a “virtual machine” (VM)
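The slides use Java as the example, but CPython (the standard Python implementation) works in a similar way, so the idea can be tried out without installing anything extra. The sketch below uses the built-in `compile` and `exec` functions and the standard `dis` module; the variable names `price`, `quantity` and `total` are made up for illustration.

```python
import dis

source = "total = price * quantity"

# Step 1: compile the source text into a code object holding byte code.
code = compile(source, "<example>", "exec")
print(type(code.co_code))   # the raw byte code is stored as bytes

# Step 2: list the intermediate instructions in a readable form.
# (The exact listing differs between Python versions.)
dis.dis(code)

# Step 3: the virtual machine runs the same byte code as often as needed.
totals = []
for price, quantity in [(3, 4), (10, 2)]:
    namespace = {"price": price, "quantity": quantity}
    exec(code, namespace)            # no re-compilation here
    totals.append(namespace["total"])

print(totals)   # [12, 20]
```

The byte code is compiled once and then run twice with different data, which is the "compile once" half of the idea; the "run anywhere" half comes from having a virtual machine for each platform.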

VM Method
Source Code → Compiler → VM
• Source Code
  – Code written in a given language
• Compiler
  – Turns source into intermediate code
  – This is not code that any real machine can run
• VM
  – Understands the “generic” intermediate code instruction set
  – Provides a sandboxed environment
  – Interprets intermediate code to generate machine code

Why?
• Compile on ANY platform
• Run on ANY platform (with the VM)
• Faster than purely interpreted languages
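A minimal sketch of the compiler-plus-VM idea, assuming an invented stack machine: the instruction names (PUSH, ADD, MUL, PRINT) and both function names are hypothetical and are not taken from Java or any real virtual machine. To keep the "compiler" short, the source is written in postfix form, so `2 3 + 4 *` means (2 + 3) × 4.

```python
# Hypothetical intermediate code: a list of (instruction, argument) pairs
# for an invented stack machine. No real processor runs these instructions.
PUSH, ADD, MUL, PRINT = "PUSH", "ADD", "MUL", "PRINT"

def compile_expression(tokens):
    """'Compiler': turn postfix tokens into intermediate code."""
    operators = {"+": ADD, "*": MUL}
    code = []
    for token in tokens:
        if token in operators:
            code.append((operators[token], None))
        else:
            code.append((PUSH, int(token)))
    code.append((PRINT, None))
    return code

def run_vm(code):
    """'Virtual machine': interpret the intermediate code on its own stack."""
    stack, output = [], []
    for instruction, argument in code:
        if instruction == PUSH:
            stack.append(argument)
        elif instruction == ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif instruction == MUL:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif instruction == PRINT:
            output.append(stack.pop())
        else:
            raise ValueError(f"unknown instruction {instruction!r}")
    return output

intermediate = compile_expression("2 3 + 4 *".split())
result = run_vm(intermediate)
print(intermediate)
print(result)   # [20]
```

The intermediate code could be produced on one computer and handed to `run_vm` on any other, as long as that computer has its own copy of the VM.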

Assembler
Assembly Language Source Code → Assembler (op codes) → Translator (binary) → Executable (code runs on target)

• Assembly language is one step up from coding purely in machine code (0s and 1s; think of the MITS Altair, one for the geeks)
• Assembly language uses mnemonics to represent CPU instruction codes
• Each mnemonic translates to an op code, usually written in hexadecimal and stored in binary

• Each assembly instruction matches directly to one op code
• A high level language, by contrast, turns a single instruction into many different op codes (and, depending on the compiler, uses different methods to solve the same problem)
• Turning assembly code into an executable is often called “assembling” rather than compiling

• Nearly all machine code instructions are in two parts:
  – Op code
  – Data (the operand)
• The op code is the instruction, the data is, er, the data!
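Splitting an instruction into its two parts can be sketched as follows. The 8-bit format (top 4 bits for the op code, bottom 4 bits for the data) and the op code values are invented for this example and do not belong to any real processor.

```python
# Invented 8-bit instruction format (not a real processor):
#   top 4 bits = op code, bottom 4 bits = data (operand)
OPCODES = {0b0001: "LOAD", 0b0010: "ADD", 0b0011: "STORE"}

def decode(instruction):
    opcode = instruction >> 4      # shift right to keep the top 4 bits
    data = instruction & 0b1111    # mask to keep the bottom 4 bits
    return OPCODES[opcode], data

print(decode(0b00100101))   # ('ADD', 5): op code 0010, data 0101
```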

• Each binary op code is usually unique to a processor family
• x86, x64, PPC, ARM etc. may share instructions but not op codes
• This explains a LOT about why code for one system won’t work on another!

Mnemonics
• Assemblers make life a lot easier by providing:
  – Mnemonics
    • ADD
    • SUB
    • MOV
    • JMP
  – Labels
  – Automatic calculation of memory addresses for jumps and loops

Assembling
• Even though assembly is a 1-to-1 instruction mapping, it still takes work to assemble a program
  – First, all labels are allocated memory addresses and a symbol table is created
  – Then the code is translated, with each label looked up in the symbol table
• And this used to be done by hand…
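The two passes described above can be sketched as a tiny assembler. Everything specific here is an assumption made for illustration: the op code values, the fixed two-byte instruction size, the `;` comment marker and the function name `assemble` are invented and do not match any real assembler or processor.

```python
# Invented instruction set: every instruction is one op code byte followed
# by one operand byte. The op code values are hypothetical.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "SUB": 0x03,
           "JMP": 0x04, "JNZ": 0x05, "HALT": 0xFF}
INSTRUCTION_SIZE = 2

def assemble(source):
    # Strip comments (everything after ';') and drop blank lines.
    lines = [line.split(";")[0].strip() for line in source.splitlines()]
    lines = [line for line in lines if line]

    # Pass 1: give every label a memory address and build the symbol table.
    symbols = {}
    instructions = []
    address = 0
    for line in lines:
        if line.endswith(":"):
            symbols[line[:-1]] = address   # label marks the next instruction
        else:
            instructions.append(line.split())
            address += INSTRUCTION_SIZE

    # Pass 2: translate each mnemonic, looking labels up in the symbol table.
    machine_code = bytearray()
    for mnemonic, *operands in instructions:
        operand = operands[0] if operands else "0"
        value = symbols[operand] if operand in symbols else int(operand)
        machine_code += bytes([OPCODES[mnemonic], value])
    return symbols, bytes(machine_code)

program = """
        LOAD 3        ; start a countdown from 3
loop:
        SUB 1         ; take one away
        JNZ loop      ; jump back while the result is not zero
        HALT
"""

symbols, code = assemble(program)
print(symbols)         # {'loop': 2}
print(code.hex(" "))   # 01 03 03 01 05 02 ff 00
```

Doing the labels first is what lets a jump refer to a label that only appears further down the program: by the time pass 2 runs, every address is already in the symbol table.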

Review/Success Criteria
You should know:
✓ The difference between assemblers, interpreters and compilers
✓ How software is turned from source code to executable