
Concordia University
Department of Computer Science
COMP 442/6421 Compiler Design
Joey Paquet, 2000, 2002, 2007, 2008

Course Description
• Instructor
  – Name: Dr. Joey Paquet
  – Office: EV-3-221
  – Phone: 7831
  – e-mail: paquet@cse.concordia.ca
  – Web: www.cse.concordia.ca/~paquet

Course Description
• Topic
  – Compiler organization and implementation.
  – Lexical, syntax and semantic analysis. Code generation.
• Outline
  – Design and implementation of a simple compiler.
  – Lectures related to the project.

Course Description
• Grading
  – Assignments (4)
  – Final Examination
  – Final Project
  – Weights as extracted: 40%, 30% (the item-to-weight mapping and any third weight are missing from the source)
• Late assignment penalty: 50% per working day
• Assignments and project are graded on: Correctness, Completeness, Design, Style, Documentation.

Project Description
• Design and coding of a simple compiler
  – Individual work
  – Divided into four assignments
  – Final project is graded at the end of the semester, during a final demonstration
  – Testing is VERY important and up to you

Project Description
• A complete compiler is a fairly complex and large program: from 10,000 to 1,000,000 lines of code.
• Programming one will force you to go beyond your limits.
• It uses most of the theoretical foundations of Computer Science.
• It will probably be the most complex program you have ever written.

Introduction to Compilation
• A compiler is a translation system.
• It translates programs written in a high-level language into a lower-level language, generally machine (binary) language.
[Diagram: source code → compiler → target code; more generally, source language → translator → target language]

Introduction to Compilation
• The only language that the processor understands is binary.
  000100111111
  Fields: a b c d
  a: Register addition (from a symbol table)
  b: First operand (R1)
  c: Second operand (R3)
  d: Third operand (R15)

Introduction to Compilation
• Assembly language is the first higher-level programming language.
• 000100111111 <=> Add R1, R3, R15
• There is a one-to-one correspondence between lines of assembly code and machine code instructions.
• An op-code table is sufficient to translate assembly language into machine code.
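The table-driven translation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `OPCODES` table, its 4-bit patterns, and the function name `assemble` are hypothetical and not taken from the course. The one thing grounded in the slide is that the string 000100111111 matches registers 1, 3 and 15 written as three 4-bit fields.

```python
# Hypothetical op-code table: the mnemonics' bit patterns are illustrative only.
OPCODES = {"Add": "0001", "Sub": "0010"}

def assemble(line):
    """Translate one line such as 'Add R1, R3, R15' into a bit string."""
    mnemonic, operands = line.split(maxsplit=1)
    # Each register number becomes a 4-bit field (an assumption of this sketch).
    registers = [int(r.strip().lstrip("R")) for r in operands.split(",")]
    return OPCODES[mnemonic] + "".join(format(r, "04b") for r in registers)

word = assemble("Add R1, R3, R15")
print(word[:4], word[4:])  # op-code field, then the three register fields
```

Because each assembly line maps to exactly one machine word, a dictionary lookup plus fixed-width formatting is all the translator needs.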

Introduction to Compilation
• Compared to binary, it greatly improved the productivity of programmers. Why?
• Though a great improvement, it is not ideal:
  – Not easy to write
  – Even less easy to read and understand
  – Extremely architecture-dependent

Introduction to Compilation
• A compiler translates a given high-level language into assembler or machine code.

  X=Y+Z;

  L  3,Y    Load working register with Y    00001001001011
  A  3,Z    Add Z to working register       00010010010101
  ST 3,X    Store the result in X           001001001

FORTRAN: The first compiler
• The problems with assembly led to the development of the first compiler: FORTRAN.
• Stands for FORmula TRANslation.
• Developed between 1954 and 1957 at IBM by a team led by John Backus.
• This was an incredible feat, as the theory of compilation was not available at the time.

Paving down the road
• In parallel, Noam Chomsky was investigating the structure of natural languages.
• His studies led the way to the classification of languages according to their complexity (aka the Chomsky hierarchy).
• This was used by various theoreticians in the 1960s and early 1970s to design a fairly complete set of solutions to the parsing problem.
• These solutions have been used ever since.
• As the parsing solutions became well understood, efforts were devoted to the development of parser generators.
• The most commonly known is YACC (Yet Another Compiler-Compiler), developed by Steve Johnson in 1975 for the Unix system.

Compilation vs. Interpretation
• A compiler translates high-level instructions into machine code.
• An interpreter uses the computer to execute the program directly, statement by statement.
  – Advantage: immediate response
  – Drawbacks: inefficient with loops, restricted to single-file programs.

Compiler’s Environment
• Building an executable from multiple files
[Diagram: source code → compiler → object code; object code, compiled modules and run-time libraries → linker → executable code]

Phases of a Compiler
[Diagram, front-end: source code → lexical analysis → token stream → syntactic analysis → syntax tree → semantic analysis → annotated tree → high-level optimization → intermediate code]
[Diagram, back-end: intermediate code → target code generation → target code → low-level optimization → optimized target code]

Lexical analysis
• Transforms the initial stream of characters into a stream of tokens
  – keywords: while, to, do, int, main
  – identifiers: i, max, total, i1, i2
  – literals: 123, 12.34, “Hello”
  – operators: +, *, and, >, <
  – punctuation: {, }, [, ], ;
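A minimal sketch of such a character-to-token transformation, using Python's `re` module. The token categories and sample lexemes come from the slide; the names `TOKEN_SPEC`, `KEYWORDS`, `WORD_OPERATORS` and `tokenize`, and the exact regular expressions, are assumptions made for illustration. Only the operators and punctuation listed on the slide are recognized, so any other character raises an error.

```python
import re

KEYWORDS = {"while", "to", "do", "int", "main"}
WORD_OPERATORS = {"and"}  # operators that are spelled like identifiers

# Order matters: literals are tried before identifiers.
TOKEN_SPEC = [
    ("literal",     r'\d+\.\d+|\d+|"[^"]*"'),
    ("identifier",  r"[A-Za-z_]\w*"),
    ("operator",    r"[+*<>]"),
    ("punctuation", r"[{}\[\];]"),
    ("skip",        r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (kind, lexeme) pairs for the given source text."""
    tokens, pos = [], 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "identifier" and lexeme in KEYWORDS:
            kind = "keyword"
        elif kind == "identifier" and lexeme in WORD_OPERATORS:
            kind = "operator"
        tokens.append((kind, lexeme))
    return tokens

for token in tokenize("while i < 12.34 do total ;"):
    print(token)
```

Classifying keywords after matching them as identifiers keeps the regular expressions small, which is a common design choice for hand-written scanners.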

Syntactic analysis
• Attempts to build a valid parse tree from the grammatical description of the language.

  Distance = rate * time;

            S
       /  |   |  \
      id  =   E   ;
            / | \
           E  *  E
           |     |
           id    id
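The tree for `Distance = rate * time;` can be built by a small hand-written parser. The following is a sketch only: it assumes the grammar S → id = E ; with E made of identifiers joined by `*` (grouped to the left), and the function name `parse_statement` and the tuple representation of tree nodes are hypothetical choices, not the course's design.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")

def parse_statement(text):
    """Parse 'id = id * id ... ;' and return the parse tree as nested tuples."""
    tokens = re.findall(r"[A-Za-z_]\w*|\S", text)
    pos = 0

    def next_token():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        return token

    def parse_id():
        token = next_token()
        if not IDENT.fullmatch(token):
            raise SyntaxError(f"expected identifier, got {token!r}")
        return ("id", token)

    def expect(symbol):
        token = next_token()
        if token != symbol:
            raise SyntaxError(f"expected {symbol!r}, got {token!r}")

    def parse_expr():
        node = ("E", parse_id())
        while pos < len(tokens) and tokens[pos] == "*":
            next_token()  # consume '*'
            node = ("E", node, "*", ("E", parse_id()))
        return node

    target = parse_id()
    expect("=")
    expr = parse_expr()
    expect(";")
    if pos != len(tokens):
        raise SyntaxError("unexpected input after ';'")
    return ("S", target, "=", expr, ";")

print(parse_statement("Distance = rate * time;"))
```

An input that does not fit the grammar, such as `Distance = * time;`, raises `SyntaxError` instead of producing a tree.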

Semantic Analysis
• The semantics of a program is its meaning.
• It is possible to have a syntactically valid program that does not have any meaning.
• Semantic analysis has two parts:
  – Semantic checking: Validating the semantics of a syntactically valid program and gathering information about the meaning of its constituents (attributes).
  – Semantic translation: Giving a meaning to a program using a pre-established language, typically a syntax tree decorated with attributes. This is often called an intermediate representation.

Semantic Translation: example
• Breaks the statements into small pieces corresponding roughly to machine instructions.

  x = a*y+z;

  t1 = a*y;
  t2 = t1+z;
  x = t2;
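One way to produce such a sequence is a post-order walk over the expression tree that creates a fresh temporary for every operator. This is a sketch under assumptions: the tree is given as nested tuples `(operator, left, right)` with variable names as strings, and the function name `to_three_address` is hypothetical.

```python
def to_three_address(target, expr):
    """Flatten an expression tree into one-operator statements."""
    code = []
    counter = 0

    def emit(node):
        nonlocal counter
        if isinstance(node, str):
            return node  # a variable is already a simple operand
        op, left, right = node
        left_name = emit(left)    # operands are translated first (post-order)
        right_name = emit(right)
        counter += 1
        temp = f"t{counter}"
        code.append(f"{temp} = {left_name}{op}{right_name};")
        return temp

    code.append(f"{target} = {emit(expr)};")
    return code

# x = a*y+z, with * binding tighter than +
for line in to_three_address("x", ("+", ("*", "a", "y"), "z")):
    print(line)
```

For the slide's statement this yields the same three lines: `t1 = a*y;`, `t2 = t1+z;`, `x = t2;`.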

High-Level Optimization
• The generated intermediate representation is often inefficient because of bad structure or redundancy.

  Before:
  t1 = a*y;
  t2 = t1+z;
  x = t2;

  After:
  t1 = a*y;
  x = t1+z;

• This kind of optimization is not bound to the target machine’s architecture.
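The rewrite shown above removes a copy of a temporary. A minimal sketch of that single transformation follows; it assumes each statement is a `(destination, expression)` pair, that temporaries are named `t1`, `t2`, ... and that each temporary is used only once, immediately after it is defined. The function name `remove_redundant_copies` is hypothetical, and real optimizers perform a usage analysis instead of relying on these assumptions.

```python
import re

def is_temp(name):
    return re.fullmatch(r"t\d+", name) is not None

def remove_redundant_copies(code):
    """Fold 'tN = expr; x = tN' into 'x = expr' (single-use temporaries assumed)."""
    result = []
    for dest, expr in code:
        if result and is_temp(expr) and expr == result[-1][0]:
            # The previous statement computed the temporary we are copying:
            # assign its expression directly to the final destination.
            _, previous_expr = result.pop()
            result.append((dest, previous_expr))
        else:
            result.append((dest, expr))
    return result

before = [("t1", "a*y"), ("t2", "t1+z"), ("x", "t2")]
for dest, expr in remove_redundant_copies(before):
    print(f"{dest} = {expr};")
```

Nothing in this rewrite refers to registers or instruction sets, which is why it belongs to the machine-independent optimizations.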

Target Code Generation
• Translates the optimized intermediate representation into the target code (normally machine language or assembler).

  t1 = a*y;
  x = t1+z;

  LE  4,a    a in register 4
  ME  4,y    multiply by y
  AE  4,z    add z
  STE 4,x    store register 4 in x
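The translation above can be sketched as a tiny single-register code generator. The mnemonics LE, ME, AE and STE and register 4 are taken from the slide; everything else is an assumption of this sketch: statements are `(destination, "left op right")` pairs, only `*` and `+` are supported, and a temporary is always consumed as the left operand of the very next statement, so it never has to be stored.

```python
import re

ARITHMETIC = {"*": "ME", "+": "AE"}  # mnemonics as they appear on the slide

def generate(code, register=4):
    """Emit accumulator-style target code for simple three-address statements."""
    output = []
    in_register = None  # name whose value currently sits in the register
    for dest, expr in code:
        left, op, right = re.fullmatch(r"(\w+)([*+])(\w+)", expr).groups()
        if left != in_register:
            output.append(f"LE {register},{left}")   # load the left operand
        output.append(f"{ARITHMETIC[op]} {register},{right}")
        if not re.fullmatch(r"t\d+", dest):
            output.append(f"STE {register},{dest}")  # only real variables are stored
        in_register = dest
    return output

for line in generate([("t1", "a*y"), ("x", "t1+z")]):
    print(line)
```

Keeping `t1` in the register instead of storing and reloading it is what makes the generated sequence match the four instructions on the slide.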

Passes, Front End and Back End
• A pass consists of reading a high-level version of the program and writing a new lower-level version.
• Several passes are often needed:
  – To resolve forward references
  – To limit the memory used by the different phases.
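A forward reference is a use that appears before its definition, for example a jump to a label declared further down. The sketch below shows why two passes help: the first records where every label is, the second rewrites the jumps. The mini-language (`name:` labels, `jump name` instructions) and the function name `resolve_labels` are hypothetical and serve only as an illustration.

```python
def resolve_labels(lines):
    """Replace symbolic jump targets with instruction indices using two passes."""
    # Pass 1: record the address of every label and drop the label lines.
    addresses = {}
    instructions = []
    for line in lines:
        if line.endswith(":"):
            addresses[line[:-1]] = len(instructions)
        else:
            instructions.append(line)

    # Pass 2: every label is now known, including those defined after their use.
    resolved = []
    for instruction in instructions:
        op, _, argument = instruction.partition(" ")
        if op == "jump":
            instruction = f"jump {addresses[argument]}"
        resolved.append(instruction)
    return resolved

print(resolve_labels(["jump end", "add", "end:", "halt"]))
```

A single pass could not translate `jump end` on the first line, because the address of `end` is not known until the label has been read.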

Low-Level Optimization
• The generated target code is analyzed for inefficiencies such as dead code or code redundancy.
• Care is taken to exploit the CPU’s capabilities as much as possible.
• This phase is heavily architecture-dependent.
• Lots of research is still done in this very complex area.

Passes, Front End and Back End
• The front-end is composed of: Lexical, Syntactic, Semantic analysis and High-level optimization.
• In most compilers, most of the front-end is driven by the Syntactic analyzer.
• It calls the Lexical analyzer for tokens and generates an abstract syntax tree when syntactic elements are recognized.
• The generated tree (or other intermediate representation) is then analyzed and optimized in a separate process.
• It has little or no concern with the target machine.

Passes, Front End and Back End
• The back-end is composed of: Code generation and low-level optimization.
• Uses the intermediate representation generated by the front-end to generate target machine code.
• Heavily dependent on the target machine.
• Independent of the programming language compiled.

System Support
• Symbol table
  – Central repository of identifiers (variable or function names) used in the compiled program.
  – Contains information such as the data type or value in the case of constants.
  – Used to identify undeclared or multiply declared identifiers, as well as type mismatches.
  – Provides temporary variables for intermediate code generation.
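The responsibilities listed above can be sketched as a small class. This is an illustration under assumptions, not the project's required design: the class name `SymbolTable`, its method names, and the use of `NameError` for both error cases are all hypothetical, and scoping is left out entirely.

```python
class SymbolTable:
    """Flat (single-scope) repository of identifiers and their attributes."""

    def __init__(self):
        self._entries = {}
        self._temp_count = 0

    def declare(self, name, data_type, value=None):
        if name in self._entries:
            raise NameError(f"multiply declared identifier: {name}")
        self._entries[name] = {"type": data_type, "value": value}

    def lookup(self, name):
        if name not in self._entries:
            raise NameError(f"undeclared identifier: {name}")
        return self._entries[name]

    def new_temp(self, data_type):
        """Create and register a fresh temporary for intermediate code."""
        while True:
            self._temp_count += 1
            name = f"t{self._temp_count}"
            if name not in self._entries:  # avoid clashing with user names
                break
        self.declare(name, data_type)
        return name

table = SymbolTable()
table.declare("max", "int", value=100)  # a constant with a known value
table.declare("total", "int")
print(table.lookup("max"), table.new_temp("int"))
```

A type mismatch check would compare the `type` fields returned by `lookup` for the operands of an expression.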

System Support
• Error handling procedures
  – Implement the compiler’s response to errors in the code it is compiling.
  – Provide useful insight to the user about where the error is and what it is.
  – Should find all errors in the whole program.
  – Can attempt to correct some errors and only give a warning.

System Support
• Run-time system
  – Some programming language concepts raise the need for dynamic memory allocation. What are they?
  – The running program must then be able to manage its own memory use.
  – Some will require a stack, others a heap. These are managed by the run-time system.

Writing of Early Compilers
• The first C compiler
[Diagram, step 1: minimal C compiler source → assembler → executable C compiler (minimal)]
[Diagram, step 2: full C compiler source → C compiler (minimal) → executable C compiler (full)]

Writing Cross-Compilers
• A Unix-Macintosh C cross compiler
[Diagram, step 1: Mac C compiler source code in Unix C → Unix C compiler → Mac C compiler usable on Unix]
[Diagram, step 2: Mac C compiler source code in Unix C → Mac C compiler usable on Unix → Mac C compiler usable on Mac]

Writing Retargetable Compilers
• Two methods:
  – Make a strict distinction between front-end and back-end, then use different back-ends.
  – Generate code for a virtual machine, then build a compiler or interpreter to translate virtual machine code to a specific machine code. That is what we do in the project.

Summary
• The first compiler was the assembler, a one-to-one direct translator.
• Complex compilers were written incrementally, first using assemblers.
• All compilation techniques have been well known since the 1960s and early 1970s.

Summary
• The compilation process is divided into phases.
• The input of a phase is the output of the previous phase.
• It can be seen as a pipeline, where the phases are filters that successively transform the input program into an executable.
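The pipeline view can be expressed directly as function composition. In the sketch below the three phase functions are deliberately trivial stand-ins (they are hypothetical and do no real analysis); the only point is that `run_pipeline` feeds the output of each phase to the next one.

```python
from functools import reduce

def lexical_analysis(source):
    return source.split()                 # stand-in: whitespace-separated tokens

def syntactic_analysis(tokens):
    return {"statement": tokens}          # stand-in: a one-node "tree"

def code_generation(tree):
    return [f"emit {token}" for token in tree["statement"]]

def run_pipeline(phases, source):
    """Apply each phase to the result of the previous one, in order."""
    return reduce(lambda data, phase: phase(data), phases, source)

phases = [lexical_analysis, syntactic_analysis, code_generation]
print(run_pipeline(phases, "x = y"))
```

Because each phase only depends on the format of its input, a phase can be replaced (for example, a different back-end) without touching the others.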