Formal languages and Compiler Design Simona Motogna S. Motogna - LFTC

Why?
Formal Languages, Compiler Design (FLCD)

Organization Issues
• Course – 2 h/week
• Seminar – 1 h/week
• Laboratory – 1 h/week
• 5 presences – seminar
• 6 presences – lab
PRESENCE IS MANDATORY

Organization Issues
• Final grade = 70% written exam + 20% lab + 10% seminar
Lab:
- all laboratory assignments are mandatory
- delays of NO more than 2 weeks
Seminar:
- solved problems, answers (blackboard), homework

References
• See the course syllabus (fișa disciplinei)

What is a compiler?
Source code / program -> Compiler -> Object code / program
Interpreter? Assembler?

A little bit of history …
• Fortran, 1954-1957, Backus
• Lisp, 1962, McCarthy
• Pascal, 1968-1970, N. Wirth
• C, 1969-1973, D. Ritchie
• Java, 1995, J. Gosling

Structure of a compiler
Source code / program
Analysis:
- Scanning (lexical analysis) -> tokens
- Parsing (syntactical analysis) -> syntax tree
- Semantic analysis -> annotated syntax tree
Synthesis:
- Intermediate code generation
- Intermediate code optimization -> optimized intermediate code
- Object code generation -> object code / program
Alongside the phases: Symbol Table management, Error handling

Chapter 1. Scanning
Definition = treats the source program as a sequence of characters, detects lexical tokens, classifies and codifies them
INPUT: source program
OUTPUT: PIF + ST
Algorithm Scanning v1
While (not(eof)) do
    detect(token);
    classify(token);
    codify(token);
End_while

Detect
I am a student.
- Separators => Remark 1) + 2)
if (x==y) {x=y+2}
- Look-ahead => Remark 3)
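The two detection ideas on this slide, separators and look-ahead, can be sketched in Python as follows. This is only an illustration: the names `SEPARATORS`, `OPERATORS`, `match_operator`, and `detect` are hypothetical, and the sets of separators and operators are assumptions chosen to cover the slide's example, not the course's definition.

```python
# Assumed separator and operator sets, just large enough for the slide's example.
SEPARATORS = set(" \t\n(){};")
OPERATORS = ["==", "=", "+"]  # longer operators first, so "==" wins over "="


def match_operator(source, i):
    """Look ahead from position i and return the longest operator found there."""
    for op in OPERATORS:
        if source.startswith(op, i):
            return op
    return None


def detect(source):
    """Split the source text into tokens using separators and look-ahead."""
    tokens = []
    i = 0
    while i < len(source):
        op = match_operator(source, i)
        if op is not None:
            tokens.append(op)
            i += len(op)
        elif source[i] in SEPARATORS:
            # Whitespace only delimits tokens; other separators are tokens too.
            if not source[i].isspace():
                tokens.append(source[i])
            i += 1
        else:
            # Consume characters until a separator or an operator starts.
            j = i
            while (j < len(source) and source[j] not in SEPARATORS
                   and match_operator(source, j) is None):
                j += 1
            tokens.append(source[i:j])
            i = j
    return tokens


print(detect("if (x==y) {x=y+2}"))
```

Without the look-ahead in `match_operator`, the input `x==y` would be split into two `=` tokens instead of one `==` token.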

Classify
• Classes of tokens:
- Identifiers
- Constants
- Reserved words (keywords)
- Separators
- Operators
• If a token can NOT be classified => LEXICAL ERROR
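A minimal sketch of classification into the five classes listed above. The concrete reserved words, operators, and separators, and the regular expressions for identifiers and constants, are assumptions made for illustration; a real language specification would define its own.

```python
import re

# Assumed token sets and patterns (hypothetical mini-language).
RESERVED_WORDS = {"if", "else", "while"}
OPERATORS = {"+", "-", "=", "=="}
SEPARATORS = {"(", ")", "{", "}", ";"}
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
CONSTANT = re.compile(r"[0-9]+")  # integer constants only, for brevity


def classify(token):
    """Return the class of a token, or raise ValueError on a lexical error."""
    # Reserved words are tested before identifiers, since "if" also
    # matches the identifier pattern.
    if token in RESERVED_WORDS:
        return "reserved word"
    if token in OPERATORS:
        return "operator"
    if token in SEPARATORS:
        return "separator"
    if IDENTIFIER.fullmatch(token):
        return "identifier"
    if CONSTANT.fullmatch(token):
        return "constant"
    raise ValueError(f"Lexical error: cannot classify {token!r}")


for tok in ["if", "x", "==", "2", "{"]:
    print(tok, "->", classify(tok))
```

A token such as `2x` matches none of the classes, so `classify` raises the lexical error.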

Codify
• Codification table
• Identifier, constant => Symbol Table (ST)
• PIF = Program Internal Form = array of pairs
• Token – replaced by pair (code, position in ST); position in ST: for identifier, constant

Algorithm Scanning v2
While (not(eof)) do
    detect(token);
    if token is reserved word OR operator OR separator
        then genFIP(code, 0)
    else if token is identifier OR constant
        then index = pos(token, ST);
             genFIP(code, index)
    else message “Lexical error”
    endif
endwhile

Remarks:
• genFIP = adds a pair (code, position) to PIF
• pos(token, ST) – searches token in symbol table ST; if found, returns its position; if not found, inserts it in ST and returns its position
• Order of classification (reserved word, then identifier)
• Nested if-then-else => detects an error if a token cannot be classified
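The scanning algorithm and the remarks above can be sketched in Python as follows. Several things here are assumptions for illustration only: the codification table `CODES` and its numeric codes, the use of a plain list as the symbol table, 1-based ST positions (so that 0 can mean "no ST entry"), and the input being a list of already detected tokens.

```python
# Hypothetical codification table: fixed codes for reserved words,
# operators and separators; identifiers and constants get generic codes.
IDENTIFIER_CODE = 0
CONSTANT_CODE = 1
CODES = {"if": 2, "(": 3, ")": 4, "{": 5, "}": 6, "=": 7, "==": 8, "+": 9}


def pos(token, st):
    """Search token in ST; insert it if missing; return its 1-based position."""
    if token not in st:
        st.append(token)
    return st.index(token) + 1


def scan(tokens):
    """Build the PIF and ST for a sequence of already detected tokens."""
    pif, st, errors = [], [], []
    for token in tokens:
        if token in CODES:  # reserved word, operator or separator
            pif.append((CODES[token], 0))
        elif token.isidentifier():
            pif.append((IDENTIFIER_CODE, pos(token, st)))
        elif token.isdigit():
            pif.append((CONSTANT_CODE, pos(token, st)))
        else:
            errors.append(f"Lexical error: {token!r}")
    return pif, st, errors


tokens = ["if", "(", "x", "==", "y", ")", "{", "x", "=", "y", "+", "2", "}"]
pif, st, errors = scan(tokens)
print(pif)
print(st)
```

Checking `CODES` before `isidentifier()` mirrors the classification order in the remarks: `if` would otherwise be accepted as an identifier.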

Symbol Table
Definition = contains all information collected during compiling regarding the symbolic names from the source program: identifiers, constants, etc.
Variants:
- Unique symbol table – contains all symbolic names
- Distinct symbol tables: IT (identifiers table) + CT (constants table)

ST organization
Remark: search and insert
1. Unsorted table – in order of detection in source code: O(n)
2. Sorted table: alphabetic (numeric): O(lg n)
3. Binary search tree (balanced): O(lg n)
4. Hash table: O(1)
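As an illustration of the sorted-table variant, here is a minimal sketch that keeps the symbolic names in alphabetical order and searches them with binary search, using Python's standard `bisect` module. The class name `SortedSymbolTable` and its methods are hypothetical.

```python
import bisect


class SortedSymbolTable:
    """Symbol table kept in alphabetical order; search is a binary search."""

    def __init__(self):
        self._names = []

    def search(self, name):
        """Return the position of name, or None if it is not in the table."""
        i = bisect.bisect_left(self._names, name)
        if i < len(self._names) and self._names[i] == name:
            return i
        return None

    def insert(self, name):
        """Insert name if missing, keeping the order; return its position."""
        i = bisect.bisect_left(self._names, name)
        if i == len(self._names) or self._names[i] != name:
            self._names.insert(i, name)  # shifts later entries: O(n)
        return i


table = SortedSymbolTable()
for name in ["y", "x", "z"]:
    table.insert(name)
print(table.search("y"), table.search("w"))
```

In this sketch the search takes O(lg n) comparisons, but an insertion shifts the entries after it, so positions already handed out can change. That is one reason a scanner that stores ST positions in the PIF might prefer a structure with stable positions.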

Hash table
• K = set of keys (symbolic names)
• A = set of positions (|A| = m; m – prime number)
• h: K → A, h(k) = (val(k) mod m) + 1
• Conflicts: k1 ≠ k2, h(k1) = h(k2)
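A small sketch of the hash function above, h(k) = (val(k) mod m) + 1. The slide does not define val(k) or say how conflicts are resolved, so two assumptions are made here: val(k) is the sum of the character codes of k, and conflicts are handled by keeping a list of keys per position (separate chaining). The class name and the choice m = 7 are also hypothetical.

```python
class HashSymbolTable:
    """Hash table with positions 1..m and separate chaining for conflicts."""

    def __init__(self, m=7):  # m should be a prime number
        self.m = m
        self.buckets = {a: [] for a in range(1, m + 1)}

    def val(self, key):
        # Assumption: val(k) is the sum of the character codes of k.
        return sum(ord(c) for c in key)

    def h(self, key):
        return self.val(key) % self.m + 1

    def insert(self, key):
        """Insert key if missing; return (position, index inside the chain)."""
        a = self.h(key)
        if key not in self.buckets[a]:
            self.buckets[a].append(key)
        return a, self.buckets[a].index(key)

    def search(self, key):
        """Return (position, index inside the chain), or None if absent."""
        a = self.h(key)
        if key in self.buckets[a]:
            return a, self.buckets[a].index(key)
        return None


st = HashSymbolTable(m=7)
print(st.insert("ab"), st.insert("ba"))
```

With this choice of val, the keys `ab` and `ba` have the same value and therefore the same position, which is exactly the conflict case k1 ≠ k2, h(k1) = h(k2).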

Visibility domain (scope)
• Each scope – separate ST
• Structure -> inclusion tree
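One possible way to model "each scope has a separate ST" and the inclusion tree is sketched below. The `Scope` class, its method names, and the rule that a lookup continues in the enclosing scopes are assumptions for illustration; the slide only states that scopes have separate tables arranged in a tree.

```python
class Scope:
    """A node of the inclusion tree; each scope owns a separate symbol table."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.symbols = {}  # this scope's own ST: name -> information
        if parent is not None:
            parent.children.append(self)

    def declare(self, name, info):
        self.symbols[name] = info

    def lookup(self, name):
        """Search this scope first, then the enclosing scopes (assumed rule)."""
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        return None


global_scope = Scope()
global_scope.declare("x", "int")
global_scope.declare("y", "int")
inner = Scope(parent=global_scope)
inner.declare("x", "float")  # shadows the outer x
print(inner.lookup("x"), inner.lookup("y"), global_scope.lookup("x"))
```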