Languages, Compilers, Interpreters
Jing-Shin Chang
Department of Computer Science & Information Engineering, National Chi-Nan University
Last Update: 2018/03/28

What is a Compiler?
- Functional blocks
- Forms of compilers

The Compiler
- What is a compiler?
  - A program for translating programming languages into machine languages
    - source language => target language
- Why compilers?
  - Filling the gaps between a programmer and the computer hardware

Compiler: A Bridge Between PL and Hardware
Layers, from top to bottom:
- Applications (High Level Language): A := B + C * D
- Compiler
- Operating System
- Hardware (Low Level Language): Register-based or Stack-based machines
Assembly Codes:
  MOV A, C
  MUL A, D
  ADD A, B
  MOV va, A

Typical Machine Instructions – Register-based Machines
(Figure: registers of the Intel 8085 processor: A, B, C, D, E, H, L)
- Data Transfer
  - MOV A, B
  - MOV A, [mem]
  - More: IN/OUT, Push, Pop, ...
- Arithmetic Operation
  - ADD A, C // A := A + C
  - MUL A, D // A := A * D
  - More: ADC, SUB, SBB, INC ...
- Logical Operation
  - AND A, 00001111B // A := A & 00001111B
  - More: OR, NOT, XOR, Shift, Rotate
- Program Control
  - JMP, JZ, JNZ, Call, ...
- Low Level Instructions Features:
  - Mostly Simple Binary Operators (using source & target operands)
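The instruction semantics above can be tried out with a small simulator. This is only a sketch under stated assumptions: the function name `run`, the `(opcode, dest, src)` tuple encoding, and the choice to keep registers and variables in one dictionary are all illustrative and not part of the slides, and only MOV, ADD, MUL and AND are modelled.

```python
def run(program, memory):
    """Execute (opcode, dest, src) triples on a toy register machine.

    Assumption: registers and program variables share one name space,
    and an integer operand is treated as an immediate value.
    """
    state = dict(memory)

    def value(operand):
        # An operand is either an integer literal or a name to look up.
        return operand if isinstance(operand, int) else state[operand]

    for opcode, dest, src in program:
        if opcode == "MOV":
            state[dest] = value(src)
        elif opcode == "ADD":
            state[dest] = state[dest] + value(src)
        elif opcode == "MUL":
            state[dest] = state[dest] * value(src)
        elif opcode == "AND":
            state[dest] = state[dest] & value(src)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return state


# The four instructions the slides give for A := B + C * D.
program = [
    ("MOV", "A", "C"),   # A = C
    ("MUL", "A", "D"),   # A = A * D
    ("ADD", "A", "B"),   # A = A + B
    ("MOV", "va", "A"),  # va = A
]
final = run(program, {"B": 2, "C": 3, "D": 4})
print(final["va"])  # 2 + 3 * 4 = 14
```

Each instruction reads one source operand and updates one target operand, which is the "simple binary operator" shape the slide describes.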

Compiler (1) - Compilation
Source Program/Code (P.L., Formal Spec.) -> Compiler -> Target Program/Code (P.L., Assembly, Machine Code)
Compiler -> Error Message
Source: A := B + C * D
Target:
  MOV A, C    // A = C
  MUL A, D    // A = A * D
  ADD A, B    // A = A + B
  MOV va, A   // va = A

Compiler (2a) – Execution
Running the compiled codes:
Target code (compiled) -> Loader (load into Real Machine)
Input -> Target Code (in Real Machine) -> Output

Compiler (2b) – Compile & Execution
Two working phases in two passes
Source Program -> Compiler -> Target Code
Compiler -> Error Message
Input -> Target Code (in Real Machine) -> Output
Compiler: Two independent phases to complete the work
- (1) Compilation Phase: Source to Target compilation
- (2) Execution Phase: run compiled codes & respond to input & produce output

Compiler (2c) – Compile & Execution
Two working phases in two passes
Source program (& executable Target code) + Input -> Compiler (+Loader) (target loaded into Real Machine) -> Output
Compiler: Two independent phases to complete the work
- (1) Compilation Phase: Source to Target compilation
- (2) Execution Phase: run compiled codes & respond to input & produce output

Interpreter (1)
Source program + Input -> Interpreter -> Output
Interpreter -> Error Message
Interpreter: One single pass to complete the two-phase work
- Each source statement is compiled and then immediately executed
- The next statement is then handled in the same way

Interpreter (2)
- Compile and then execute each incoming statement
  - Does not save compiled codes in executable files
    - Saves storage
  - Re-compiles the same statements each time a loop comes back to them
    - Slower
  - Detects errors (compilation & runtime) as they occur during execution time
    - Compiler: detects syntax/semantic errors ("compilation errors") during compilation time

Hybrid: Compiler + Interpreter?
Source program -> Compiler -> Intermediate program (simple low level instructions)
Compiler -> Error Message
Intermediate program + Input -> Interpreter+ (with/without JIT) -> Output

Hybrid: Compiler + Interpreter?
Source program -> Compiler -> Intermediate program
Intermediate program + Input -> Interpreter+ (with/without JIT) -> Output
Intermediate program:
- without syntax/semantic errors
- machine independent
Interpreter+:
- does not interpret high level source
- but compiled low level code
- easy to interpret + efficient

Hybrid Method & Virtual Machine
Source program -> Translator (Compiler) -> Intermediate program (machine independent)
Intermediate program + Input -> Virtual Machine (VM) (Interpreter with/without JIT) -> Output

Example: Java Compiler & Java VM
Java program (app.java) -> Java Compiler (javac) -> Java Bytecodes (app.class)
Java Bytecodes + Input -> Java Virtual Machine (Interpreter with/without JIT) -> Output

Hybrid Method & Virtual Machine
- Compile source program into a platform independent code
  - E.g., Java => Bytecodes (stack-based instructions)
- Execute the code with a virtual machine
  - High portability: The platform independent code can be distributed on the web, downloaded and executed on any platform that has a VM preinstalled
    - Good for cross-platform applications
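The compile-then-interpret split can be sketched with a toy stack machine. Everything here is an illustrative assumption rather than real Java bytecode: the opcode names (PUSH, LOAD, ADD, SUB, MUL), the function names, and the use of Python's own `ast` module as the front end, which means the input uses Python expression syntax.

```python
import ast

# Hypothetical opcode names for the toy stack machine.
BINARY_OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}


def compile_to_stack_code(expression):
    """Translate an arithmetic expression into postfix stack instructions."""
    code = []

    def emit(node):
        if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
            emit(node.left)    # operands first ...
            emit(node.right)
            code.append((BINARY_OPS[type(node.op)], None))  # ... operator last
        elif isinstance(node, ast.Name):
            code.append(("LOAD", node.id))
        elif isinstance(node, ast.Constant):
            code.append(("PUSH", node.value))
        else:
            raise SyntaxError("unsupported construct")

    emit(ast.parse(expression, mode="eval").body)
    return code


def run_stack_code(code, variables):
    """The 'virtual machine': interpret the platform-independent code."""
    stack = []
    for opcode, argument in code:
        if opcode == "PUSH":
            stack.append(argument)
        elif opcode == "LOAD":
            stack.append(variables[argument])
        else:
            right = stack.pop()
            left = stack.pop()
            if opcode == "ADD":
                stack.append(left + right)
            elif opcode == "SUB":
                stack.append(left - right)
            elif opcode == "MUL":
                stack.append(left * right)
            else:
                raise ValueError(f"unknown opcode: {opcode}")
    return stack.pop()


bytecode = compile_to_stack_code("B + C * D")
print(bytecode)
print(run_stack_code(bytecode, {"B": 2, "C": 3, "D": 4}))  # 14
```

The list in `bytecode` plays the role of the distributable intermediate program: it mentions no registers or machine addresses, so any host that provides `run_stack_code` can execute it.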

Just-in-time (JIT) Compilation
- Compile a new statement (only once) when it is encountered for the first time
  - And save the compiled codes
  - Executed by virtual/real machine
  - Do not re-compile when execution loops back to it
- Example:
  - Java VM (simple Interpreter version, without JIT): high penalty in performance due to interpretation
  - Java VM + JIT: improved by roughly a factor of 10
    - JIT: translate bytecodes during run time to the native target machine instruction set
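The "compile at most once" idea can be illustrated by counting translations in a loop. This is a sketch under assumptions: the class `StatementRunner` and its fields are hypothetical, and Python's built-in `compile()` (which produces Python bytecode, not native machine code) merely stands in for the translation step that a real JIT would perform.

```python
class StatementRunner:
    """Runs statements one at a time, with or without a compiled-code cache."""

    def __init__(self, use_cache):
        self.use_cache = use_cache
        self.cache = {}          # statement text -> compiled code object
        self.compile_count = 0   # how many times translation happened

    def _translate(self, statement):
        self.compile_count += 1
        return compile(statement, "<stmt>", "exec")

    def execute(self, statement, env):
        if self.use_cache:
            code = self.cache.get(statement)
            if code is None:
                # First encounter: translate once and save the result.
                code = self.cache[statement] = self._translate(statement)
        else:
            # Simple interpreter: translate again on every visit.
            code = self._translate(statement)
        exec(code, env)


def run_loop(runner, iterations):
    env = {"total": 0}
    for i in range(iterations):
        env["i"] = i
        runner.execute("total = total + i", env)
    return env["total"]


interpreter = StatementRunner(use_cache=False)
jit = StatementRunner(use_cache=True)
print(run_loop(interpreter, 100), interpreter.compile_count)  # 4950 100
print(run_loop(jit, 100), jit.compile_count)                  # 4950 1
```

Both runners compute the same result; the difference is only in how often the loop body is translated, which is where the slide's performance gap comes from.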

Comparison of Different Compilation-and-Go Schemes
- Normal Compilers (Compile then Execute, two passes)
  - Will generate codes for all statements whether they will be executed or not
  - Separate the compilation phase and execution phase into two different phases
    - Syntax & semantic errors are detected at compilation time
- Interpreters and JIT Compilers (Comp. & Exec. in one pass)
  - Can generate codes only for statements that are really executed
    - Will depend on your input – different execution flows mean different sets of executed codes
    - Interpreter: Syntax & semantic errors are detected at run/execution time
  - JIT vs. Simple Interpreter
    - JIT: saves the target machine codes (the first time they are compiled)
      - Can be re-used, and compiled at most once
    - Interpreter: does not save target machine codes
      - Compiled more than once

Register-Based Virtual Machine for Android Phone – Dalvik VM
Java Program -> Java Compiler -> Java Bytecodes (stack-based) -> Java Virtual Machine
- Java VM (JVM) – Stack-based Instruction Set
  - Normally less efficient than RISC or CISC instructions
    - Limited memory organization
    - Requires too many swap and copy operations

Register-Based Virtual Machine for Android Phone – Dalvik VM
Java Program -> Java Compiler -> Java Bytecodes (stack-based) -> dx (+compression) -> Dalvik Bytecodes (register-based) -> Dalvik Virtual Machine
- Dalvik VM (for Android OS) – Register-based Instruction Set
    - Smaller size
    - Better memory efficiency
    - Good for phone and other embedded systems
- Generation and Execution of Dalvik byte codes
  - Compiled/Translated from Java byte code into a new byte code
  - app.java (Java source) =|| javac (Java Compiler) ||=> app.class (executable by JVM)
  - =|| dx (in Android SDK tool) ||=> app.dex (Dalvik Executable)
  - =|| compression ||=> apps.apk (Android Application Package) =|| Dalvik VM ||=> (execution)

How To Construct A Compiler
- Language Processing Systems
- High-Level and Intermediate Languages
- Processing Phases
- Quick Review on Syntax & Semantics
- Processing Phases in Detail
- Structure of Compilers

A Language-Processing System
Source Program -> Preprocessor -> Modified Source Program -> Compiler -> Target Assembly Program -> Assembler -> Relocatable Machine Code -> Linker/Loader -> Target Machine Code
Additional inputs to the Linker/Loader: Library files and/or Relocatable object files

Programming Languages vs. Natural Languages
- Natural languages: for communication between native speakers of the same or different languages
  - Chinese, English, French, Japanese
- Programming languages: for communication between programmers and computers
  - Generic High-Level Programming Languages:
    - Basic, Fortran, COBOL, Pascal, C/C++, Java
  - Typesetting Languages:
    - TROFF (+TBL, EQN, PIC), (La)TeX, PostScript
  - Markup Languages -- Structured Documents:
    - SGML, HTML, XML, ...
  - Script Languages:
    - csh, bsh, awk, perl, python, javascript, asp, jsp, php

Machine Independent Intermediate Instructions
- Low Level Instructions Features:
  - Mostly Simple Binary Operators
  - Result is often saved to the Accumulator (A register)
  - Not intuitive to programmers
- Intermediate instructions:
  - Complex instructions are broken into groups of binary operations
  - 3-address codes (for register-based machines): A := B + C
    - 1 binary operation, 2 source operands, one destination operand
    - Easy to map to machine instructions (which share one source & destination operand)
      - A := A + B
  - Stack machine codes (for stack-based machines)
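Breaking a complex expression into 3-address codes can be sketched as a post-order walk over the expression tree, creating one temporary per binary operation. The function name `three_address_code` and the temporary names T1, T2, ... are illustrative assumptions, and Python's `ast` module is borrowed as the parser, so the input is written with `=` instead of `:=`.

```python
import ast
import itertools


def three_address_code(statement):
    """Return 3-address instructions for an assignment such as 'A = B + C * D'."""
    symbols = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}
    temps = (f"T{n}" for n in itertools.count(1))
    lines = []

    def lower(node):
        # Returns the name that holds the value of this subtree.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in symbols:
            left = lower(node.left)
            right = lower(node.right)
            temp = next(temps)
            # One binary operation, two source operands, one destination.
            lines.append(f"{temp} := {left} {symbols[type(node.op)]} {right}")
            return temp
        raise SyntaxError("unsupported construct")

    assign = ast.parse(statement).body[0]
    if not (isinstance(assign, ast.Assign) and isinstance(assign.targets[0], ast.Name)):
        raise SyntaxError("expected a simple assignment")
    result = lower(assign.value)
    lines.append(f"{assign.targets[0].id} := {result}")
    return lines


for line in three_address_code("A = B + C * D"):
    print(line)
# T1 := C * D
# T2 := B + T1
# A := T2
```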

Compiler: A Bridge Between PL and Hardware
Layers, from top to bottom:
- Natural Language?? (Maybe in next AI Era)
- Applications (High Level Language): A := B + C * D
- Compiler
- Operating System
- Hardware (Low Level Language): Register-based or Stack-based machines
Intermediate Codes:
  T1 := C * D
  T2 := B + T1
  A := T2
Assembly Codes:
  MOV A, C
  MUL A, D
  ADD A, B
  MOV va, A

Compiler: with Intermediate Codes
Source Program/Code (P.L., Formal Spec.) -> Compiler -> Target Program/Code (P.L., Assembly, Machine Code)
Compiler -> Error Message
Source: A := B + C * D
Intermediate Codes:
  T1 := C * D
  T2 := B + T1
  A := T2
Target:
  MOV A, C
  MUL A, D
  ADD A, B
  MOV va, A

Typical Phases of a Compiler
Source:
  float position, initial, rate
  position := initial + rate * 60
lexical analyzer -> Tokens:
  id1 := id2 + id3 * 60
syntax analyzer -> Parse Tree or Syntax Tree:
  :=(id1, +(id2, *(id3, 60)))
semantic analyzer -> Syntax Tree or Annotated Syntax Tree:
  :=(id1, +(id2, *(id3, inttoreal(60))))
intermediate code generator -> 3-address codes, or Stack machine codes:
  temp1 := inttoreal(60)
  temp2 := id3 * temp1
  temp3 := id2 + temp2
  id1 := temp3
code optimizer -> Optimized codes:
  temp1 := id3 * 60.0
  id1 := id2 + temp1
code generator -> Assembly (or Machine) Codes:
  MOVF id3, R2
  MULF #60.0, R2
  MOVF id2, R1
  ADDF R2, R1
  MOVF R1, id1
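The first phase, lexical analysis, can be sketched with a regular-expression scanner that turns the source line into the token stream `id1 := id2 + id3 * 60`. The token class names, the function `tokenize`, and the dictionary used as a symbol table are assumptions made for this illustration only.

```python
import re

# One alternative per token class; ERROR catches any other single character.
TOKEN_PATTERN = re.compile(
    r"""
    (?P<NUMBER>\d+(?:\.\d+)?)
  | (?P<ID>[A-Za-z_]\w*)
  | (?P<ASSIGN>:=)
  | (?P<OP>[+\-*/()])
  | (?P<SKIP>\s+)
  | (?P<ERROR>.)
    """,
    re.VERBOSE,
)


def tokenize(source):
    """Return (tokens, symbol_table); identifiers are numbered in order of first use."""
    symbol_table = {}
    tokens = []
    for match in TOKEN_PATTERN.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {text!r} at offset {match.start()}")
        if kind == "ID":
            # Enter the name into the symbol table the first time it is seen.
            index = symbol_table.setdefault(text, len(symbol_table) + 1)
            tokens.append(f"id{index}")
        else:
            tokens.append(text)
    return tokens, symbol_table


tokens, table = tokenize("position := initial + rate * 60")
print(" ".join(tokens))  # id1 := id2 + id3 * 60
print(table)             # {'position': 1, 'initial': 2, 'rate': 3}
```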

Analysis-Synthesis Model of a Compiler
- Analysis: Program => Constituents => I.R.
  - Lexical Analysis: linear => token
  - Syntax Analysis: hierarchical, nested => tree
    - Identify relations/actions among tokens: e.g., add(b, mult(c, d))
  - Semantic Analysis: check legal constraints / meanings
    - By examining attributes associated with tokens & relations
- Synthesis: I.R. => I.R.* => Target Language
  - Intermediate Code Generation
    - generate intermediate representation (I.R.) from syntax
  - Code Optimization: generate better equivalent IR (IR*)
    - machine independent + machine dependent
  - Code Generation
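Syntax analysis, the step that turns a linear token sequence into a nested structure such as add(b, mult(c, d)), can be sketched with a small recursive-descent parser. The grammar (only `+`, `*` and parentheses), the function name `parse_expression`, and the string form of the tree are assumptions chosen to mirror the slide's example.

```python
import re


def parse_expression(source):
    """Parse '+', '*' and parentheses into nested add(...) / mult(...) text."""
    # Note: characters outside the token classes are silently skipped here.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[+*()]", source)
    position = 0

    def peek():
        return tokens[position] if position < len(tokens) else None

    def advance():
        nonlocal position
        if position >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[position]
        position += 1
        return token

    def factor():
        token = advance()
        if token == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token in ("+", "*", ")"):
            raise SyntaxError(f"unexpected token {token!r}")
        return token

    def term():
        # '*' binds tighter than '+', so it is handled one level deeper.
        node = factor()
        while peek() == "*":
            advance()
            node = f"mult({node}, {factor()})"
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = f"add({node}, {term()})"
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse_expression("b + c * d"))    # add(b, mult(c, d))
print(parse_expression("(b + c) * d"))  # mult(add(b, c), d)
```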

Programming Languages
- Issues about Modern PL's
- Module programming & Parameter passing
- Nested modules & Scopes
- Static vs. dynamic allocation

Programming Language Basics
- Static vs. Dynamic Issues or Policies
  - Static: determined at compile time
  - Dynamic: determined at run time
- Scopes of declaration
    - Region in which the uses of x refer to a declaration of x
  - Static scope (aka lexical scope): Possible to determine the scope of a declaration by looking at the program
    - C, Java (and most PLs)
      - Delimited by block structures
  - Dynamic scope:
    - At run time, the same use of x could refer to any of several declarations of x.
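The difference between the two scoping policies can be seen by resolving the same use of x both ways. Python itself is statically (lexically) scoped, so the dynamic-scope half below is an emulation: the `dynamic_env` stack and the `lookup` helper are hypothetical names introduced only for this sketch.

```python
x = "global x"


def show():
    # Static scope: this x is bound by looking at the program text,
    # so it always means the module-level x.
    return x


def caller():
    x = "caller's x"  # a different declaration, not visible inside show()
    return show()


# Emulated dynamic scope: names are resolved against a run-time stack
# of bindings, newest first.
dynamic_env = [{"x": "global x"}]


def lookup(name):
    for frame in reversed(dynamic_env):
        if name in frame:
            return frame[name]
    raise NameError(name)


def show_dynamic():
    return lookup("x")


def caller_dynamic():
    dynamic_env.append({"x": "caller's x"})  # binding lives for this call only
    try:
        return show_dynamic()
    finally:
        dynamic_env.pop()


print(caller())          # global x
print(caller_dynamic())  # caller's x
print(show_dynamic())    # global x
```

Under static scope the answer is fixed by where `show` is written; under the emulated dynamic scope it depends on who called `show_dynamic`, which is the "any of several declarations" behaviour the slide describes.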

Programming Language Basics
- Variable declaration
  - Static variables
    - Possible to determine the location in memory where the declared variable can be found
      - public static int x; // Java (C++: "static int x;" in a public section of a class)
      - Only one copy of x, whose location can be determined at compile time
      - Global declarations and declared constants can also be made static
  - Dynamic variables:
    - Local variables without the "static" keyword
      - Each object of the class would have its own location where x would be held.
      - At run time, the same use of x in different objects could refer to any of several different locations.

Programming Language Basics
- Parameter Passing Mechanisms
  - call by value
    - make a copy of the physical value
  - call by reference
    - make a copy of the address of a physical object
  - call by name (Algol 60)
    - callee executes as if the actual parameter were substituted literally for the formal parameter in the code of the callee
      - macro expansion of the formal parameter into the actual parameter
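The three mechanisms can be contrasted in a few lines. Python passes object references by value, so each mechanism here is emulated rather than native: an explicit copy stands in for call by value, a hypothetical `Ref` cell for call by reference, and a zero-argument function (a thunk) for call by name.

```python
import copy


def by_value(values):
    values = copy.deepcopy(values)  # the callee works on its own copy
    values.append(99)
    return values


class Ref:
    """A one-slot cell standing in for the address of a variable."""

    def __init__(self, value):
        self.value = value


def by_reference(cell):
    cell.value += 1  # the change is visible to the caller


def by_name(thunk):
    # The actual parameter is re-evaluated at every use of the formal one.
    return thunk() + thunk()


original = [1, 2]
changed = by_value(original)
print(original, changed)  # [1, 2] [1, 2, 99]

counter = Ref(10)
by_reference(counter)
print(counter.value)  # 11

evaluations = []


def actual_parameter():
    evaluations.append("evaluated")
    return len(evaluations)


print(by_name(actual_parameter))  # 1 + 2 = 3
print(len(evaluations))           # 2
```

The call-by-name result shows why literal substitution matters: an actual parameter with side effects is evaluated once per use inside the callee, not once per call.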

Cousins of the Compiler
  - Preprocessors: macro definition/expansion
  - Interpreters: alternative for compilers
    - Compiler vs. interpreter vs. just-in-time compilation
  - Assemblers: 1-pass / 2-pass
  - Linkers: link compiled (relocatable) code with library functions
  - Loaders: load executables into memory
  - Editors: editing sources (with/without syntax prediction)
  - Debuggers: symbolically providing stepwise trace
  - Profilers: gprof (call graph and time analysis)
  - Project managers: IDE
    - Integrated Development Environment
  - Disassemblers, Decompilers: low-level to high-level language conversion

Applications of Compilation Techniques

Applications of Lexical Analysis
- Text/Pattern Processing:
  - grep: get lines with specified regular expression pattern
    - Ex: grep '^From ' /var/spool/mail/andy
  - sed: stream editor, editing specified patterns
    - Ex: ls *.JPG | sed 's/JPG/jpg/'
  - tr: simple translation between character sets (e.g., lowercase to uppercase)
    - Ex: tr 'a-z' 'A-Z' < mytext > mytext.uc
  - AWK: pattern-action rule processing
    - pattern processing based on regular expression
      - Ex: awk '$1=="John"{count++}END{print count}' < Students.txt
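The same pattern-processing ideas can be sketched in Python with the standard `re` module. The helper names (`grep`, `sed_substitute`, `tr_upper`, `awk_count_first_field`) and the sample data are made up for illustration and only approximate the behaviour of the Unix tools.

```python
import re
import string


def grep(pattern, lines):
    """Keep the lines that contain a match, like grep PATTERN."""
    return [line for line in lines if re.search(pattern, line)]


def sed_substitute(pattern, replacement, lines):
    """Replace the first match on each line, like sed 's/PATTERN/REPLACEMENT/'."""
    return [re.sub(pattern, replacement, line, count=1) for line in lines]


def tr_upper(text):
    """Map a-z to A-Z character by character, like tr 'a-z' 'A-Z'."""
    table = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)
    return text.translate(table)


def awk_count_first_field(value, lines):
    """Count lines whose first whitespace-separated field equals value."""
    return sum(1 for line in lines if line.split()[:1] == [value])


mail = ["From andy Mon Mar 26", "Subject: From the archive", "From bob Tue Mar 27"]
print(grep(r"^From ", mail))
print(sed_substitute(r"JPG", "jpg", ["a.JPG", "JPG.JPG"]))
print(tr_upper("my text"))
print(awk_count_first_field("John", ["John 90", "Mary 85", "John 70"]))
```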

Applications of Compilation Techniques – Language Processing
- Pre-processor: Macro definition/expansion
- Active Webpages Processing
  - Script or programming languages embedded in webpages for interactive transactions
  - Examples: JavaScript, JSP, ASP, PHP
  - Compiler Apps: expansion of embedded statements, in addition to web page parsing
- Database Query Language: SQL

Applications of Compilation Techniques – Language Processing
- Interpreter
  - no pre-compilation
  - executed on-the-fly
  - e.g., BASIC
- Script Languages: C-shell, Perl
  - Function: for batch processing multiple files/databases
  - mostly interpreted, some pre-compiled
  - Some are interpreted and also save the compiled codes

Applications of Compilation Techniques – Language Processing
- Text Formatter
  - Troff, LaTeX, Eqn, Pic, Tbl