Compiler
TH 6 7 8, DTH 102
薛智文
cwhsueh@csie.ntu.edu.tw
http://www.csie.ntu.edu.tw/~cwhsueh/
96 Spring
National Taiwan University, Department of Computer Science and Information Engineering

Why Study Compilers?
1. An excellent software-engineering example: theory meets practice.
2. An essential software tool.
3. Compilers influence hardware design, e.g., RISC and VLIW.
4. Compiler tools (mostly "optimization") enhance software reliability and security.
9/5/2021 · Department of Computer Science and Information Engineering, Graduate Institute of Networking and Multimedia, NEWS Lab

Compilers & Architecture
Modern architectures have very complex structures and, in particular, many opportunities for parallel execution. Sequential programs can use these features effectively only through an optimizing compiler.
Hardware designer's question: if we implemented this, could a compiler use it?

Software Reliability
Optimization technology (data-flow analysis) is also used to find:
• lock/unlock errors;
• buffers that are not range-checked;
• memory leaks;
• SQL injection bugs;
• and more.

What Does This Course Offer?
• Compiler methodology for both compiler implementation and related applications.
• A theoretical framework.
• Key algorithms.
• Hands-on experience.
Non-goal: building a complete optimizing compiler.

Course Outline
Part 1: Introduction.
Part 2: Scanner.
Part 3: Parser.
Part 4: Syntax-Directed Translation.
Part 5: Symbol Table.
Part 6: Intermediate Code Generation.
Part 7: Run-Time Storage Organization.
Part 8: Optimization.
Part 9: How to Write a Compiler.
Part 10: A Simple Code Generation (PSEUDO) Example.

Introduction
A compiler is one kind of language processor.
[Figure: source program → Compiler → target program; input → target program → output]

What Is a Compiler?
Definition: a software system that translates a description of computations into a program executable by a computer. The source and target programs must be equivalent!
[Figure: source program → Compiler → target program; input → target program → output]
Compiler writing spans:
• programming languages;
• machine architecture;
• language theory;
• algorithms and data structures;
• software engineering.
History:
• 1950s: the first FORTRAN compiler took 18 man-years to build;
• now: using software tools, a compiler can be built in a few months as a student project.

An Interpreter
[Figure: source program and input → Interpreter → output]

A Hybrid Compiler
[Figure: source program → Translator → intermediate program; intermediate program and input → Virtual Machine → output]

A Language-Processing System
[Figure: source program → Preprocessor → modified source program → Compiler → target assembly program → Assembler → relocatable machine code → Linker/Loader (together with library files and relocatable object files) → target machine code]

Applications
• Computer language compilers.
• Translators from one format to another:
  query interpreters;
  text formatters;
  silicon compilers;
  infix notation → postfix notation, e.g., 3+5-6*6 → 35+66*-;
  pretty printers.
• Software productivity tools.

Relations with Computational Theory
In computational theory:
• a set of grammar rules ≡ the definition of a particular machine, which is also equivalent to the language (set of strings) recognized by that machine;
• a type of machine is a family of machines with a given set of operations, or capabilities;
• the power of a type of machine ≡ the set of languages that can be recognized by machines of that type.

Phases of a Compiler
[Figure: character stream → Lexical Analyzer (scanner) → token stream → Syntax Analyzer (parser) → abstract-syntax tree → Semantic Analyzer → annotated abstract-syntax tree → Intermediate Code Generator → intermediate representation → Machine-Independent Code Optimizer → optimized intermediate representation → Code Generator → relocatable machine code → Machine-Dependent Code Optimizer → target-machine code. The Symbol Table and Error Handling are shared by all phases.]

Lexical Analyzer (Scanner)
Actions:
• Reads characters from the source program.
• Groups characters into lexemes, i.e., sequences of characters that "go together" according to a given pattern.
• Each lexeme corresponds to a token; the scanner returns the next token, plus possibly some additional information, to the parser.
• The scanner may also discover lexical errors, i.e., erroneous characters.
What counts as a lexeme, a token, or a bad character depends on the definition of the source language.

Scanner Example for C
C statement: position = initial + rate * 60;
Symbol Table:
  1  position  …
  2  initial   …
  3  rate      …
(Lexeme) position  =       initial  +     rate     *      60    ;
(Token)  <id, 1>   <=>     <id, 2>  <+>   <id, 3>  <*>    <60>  <;>
         ID        ASSIGN  ID       PLUS  ID       TIMES  INT   SEMI-COL
An arbitrary number of blanks (white space) may appear between lexemes.
Erroneous character sequences for C (outside comments): control characters, @, 2abc.

Syntax Analyzer (Parser)
Actions:
• Groups tokens into grammatical phrases to discover the underlying structure of the source program.
• Finds syntax errors, e.g., in the following C source line:
  (Lexeme) index  =       12   *      ;
  (Token)  ID     ASSIGN  INT  TIMES  SEMI-COL
  Every token is legal, but the sequence is erroneous!
• May find some static semantic errors, e.g., use of undeclared variables or variables declared more than once.
• May generate code, or build an intermediate representation of the source program such as an abstract-syntax tree.

Parser Example for C
Source code: position = initial + rate * 60
Tokens: <id, 1> <=> <id, 2> <+> <id, 3> <*> <60>
Symbol Table: 1 position …, 2 initial …, 3 rate …
Abstract-syntax tree:
          =
        /   \
   <id, 1>    +
            /   \
       <id, 2>    *
                /   \
           <id, 3>   60
• Interior nodes of the tree are OPERATORS; a node's children are its OPERANDS; each subtree forms a logical unit.
• The subtree rooted at * shows that * has higher precedence than +: the operation "rate * 60" must be performed as a unit, not "initial + rate".
• Where is ";"?

Semantic Analyzer
Actions:
• Checks for more static semantic errors, e.g., type errors.
• May annotate and/or change the abstract-syntax tree. For example, assuming position, initial, and rate are floating-point variables while 60 is an integer, the tree for
  <id, 1> = <id, 2> + <id, 3> * 60
  becomes
  <id, 1> = <id, 2> + <id, 3> * int_to_float(60)
Symbol Table: 1 position …, 2 initial …, 3 rate …

Intermediate Code Generator
Actions: translates abstract-syntax trees into intermediate code.
One choice of intermediate code is three-address code:
• each statement contains at most 3 operands;
• besides "=" (assignment), each statement has at most one operator;
• it is an "easy" and "universal" format that can be translated into most assembly languages.
For the annotated tree <id, 1> = <id, 2> + <id, 3> * int_to_float(60):
  t1 = int_to_float(60)
  t2 = id3 * t1
  t3 = id2 + t2
  id1 = t3

Optimizer
Improves the efficiency of the intermediate code. The goal may be to make the code run faster, and/or to use the fewest registers, ...
Before:
  t1 = int_to_float(60)
  t2 = id3 * t1
  t3 = id2 + t2
  id1 = t3
After:
  t1 = id3 * 60.0
  id1 = id2 + t1
Current trends:
• obtaining smaller, but possibly slower, equivalent code for embedded systems;
• reducing power consumption.

Code Generation
A compiler may generate:
• pure machine code (machine-dependent assembly language) directly, which is rare now; or
• virtual-machine code.
Example: PASCAL compiler → P-code → P-code interpreter → execution.
• Speed: roughly 4 times slower than running directly generated machine code.
• Advantages:
  simplifies the job of the compiler;
  decreases the size of the generated code (to 1/3 for P-code);
  can easily be run on a variety of platforms, since the P-machine is an idealized general machine whose interpreter is easy to write;
  divide and conquer.
• Recent example: Java and bytecode.

Code Generation Example
Intermediate code:
  t1 = id3 * 60.0
  id1 = id2 + t1
Target code:
  LDF  R2, id3
  MULF R2, #60.0
  LDF  R1, id2
  ADDF R1, R2
  STF  id1, R1

Practical Considerations (1/2)
Preprocessing phase:
• macro substitution, e.g., #define MAXC 10;
• rational preprocessing: adding new features to old languages, e.g., BASIC, C, C++;
• compiler directives, e.g., #include <stdio.h>;
• non-standard language extensions, e.g., adding parallel primitives.

Practical Considerations (2/2)
Passes of compiling:
• The first pass reads the text file once.
• The compiler may need to read the text one more time for any forward-addressed objects, i.e., anything that is used before its declaration.
Example in C:
  goto error_handling;
  ···
  error_handling:
  ···

Reduce the Number of Passes
Each pass takes I/O time.
Back-patching: leave a blank slot for missing information, and fill in the slot when the information becomes available.
Example in C:
• When a label is used:
  if it has not been defined yet, save a trace in the to-be-processed table;
  the label name corresponds to LABEL_TABLE[i];
  code generated: GOTO LABEL_TABLE[i].
• When a label is defined:
  check the known labels for redefinitions;
  if it has not been used before, save a trace in the to-be-processed table;
  if it has been used before, find its trace and fill the current address into the trace.
Time and space trade-off!