Formal languages and Compiler Design Simona Motogna S. Motogna - LFTC
Why?
Formal Languages, Compiler Design (FLCD)

Organization Issues
• Course – 2 h/week
• Seminar – 1 h/week
• Laboratory – 1 h/week
Required attendance: 5 seminars, 6 labs
ATTENDANCE IS MANDATORY

Organization Issues
• Final grade = 70% written exam + 20% lab + 10% seminar
Lab:
- all laboratory assignments are mandatory
- delays of NO more than 2 weeks
Seminar:
- solved problems, answers (at the blackboard), homework

References
• See the course syllabus (fișa disciplinei)

What is a compiler?
Source code / program → Compiler → Object code / program
Interpreter? Assembler?

A little bit of history …
• Fortran, 1954 - 1957, Backus
• Lisp, 1962, McCarthy
• Pascal, 1968 - 1970, N. Wirth
• C, 1969 - 1973, D. Ritchie
• Java, 1995, J. Gosling

Structure of a compiler
Two parts: analysis (of the source program) and synthesis (of the object program)
Source code / program
1. Scanning (lexical analysis) → tokens
2. Parsing (syntactical analysis) → syntax tree
3. Semantic analysis → annotated syntax tree
4. Intermediary code generation
5. Intermediary code optimization → optimized intermediary code
6. Object code generation → object code / program
Used by all phases: symbol table management, error handling

Chapter 1. Scanning
Definition = treats the source program as a sequence of characters, detects lexical tokens, classifies and codifies them
INPUT: source program
OUTPUT: PIF (Program Internal Form) + ST (Symbol Table)
Algorithm Scanning v1
While (not(eof)) do
    detect(token);
    classify(token);
    codify(token);
End_while

Detect
• Example: I am a student.
  - Separators => Remarks 1) + 2)
• Example: if (x==y) {x=y+2}
  - Look-ahead => Remark 3)

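The two examples on this slide can be turned into a small detection sketch. This is a minimal illustration in Python, not the course's reference scanner: the function name `detect_tokens`, the separator set and the operator sets are assumptions chosen for the example. Whitespace and punctuation act as separators, and one character of look-ahead decides whether `=` stands alone or starts `==`.

```python
# Hypothetical separator and operator sets, chosen only for this example.
SEPARATORS = set("(){};,.")
TWO_CHAR_OPERATORS = {"==", "<=", ">=", "!="}
ONE_CHAR_OPERATORS = set("=+-*/<>")


def detect_tokens(text):
    """Split text into tokens using separators and one-character look-ahead."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            # Whitespace separates tokens but is not a token itself.
            i += 1
        elif text[i:i + 2] in TWO_CHAR_OPERATORS:
            # Look-ahead: the next character decides between "=" and "==".
            tokens.append(text[i:i + 2])
            i += 2
        elif ch in ONE_CHAR_OPERATORS or ch in SEPARATORS:
            tokens.append(ch)
            i += 1
        else:
            # Accumulate characters until a space, separator or operator.
            start = i
            while (i < len(text) and not text[i].isspace()
                   and text[i] not in SEPARATORS
                   and text[i] not in ONE_CHAR_OPERATORS
                   and text[i:i + 2] not in TWO_CHAR_OPERATORS):
                i += 1
            tokens.append(text[start:i])
    return tokens


print(detect_tokens("I am a student."))
print(detect_tokens("if (x==y) {x=y+2}"))
```

In the second example the scanner reaches the first `=` and must inspect the following character before deciding which token it has found. That is the look-ahead case.
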
Classify
• Classes of tokens:
  - Identifiers
  - Constants
  - Reserved words (keywords)
  - Separators
  - Operators
• If a token can NOT be classified => LEXICAL ERROR

Codify
• Codification table
• Identifier, constant => Symbol Table (ST)
• PIF = Program Internal Form = array of pairs
• Token – replaced by pair (code, position in ST); the position in ST applies to identifiers and constants

Algorithm Scanning v2
While (not(eof)) do
    detect(token);
    if token is reserved word OR operator OR separator
        then genFIP(code, 0)
    else if token is identifier OR constant
        then index = pos(token, ST);
             genFIP(code, index)
    else message “Lexical error”
    endif
endwhile

Remarks:
• genFIP = adds a pair (code, position) to PIF
• pos(token, ST) – searches for the token in the symbol table ST; if found, returns its position; if not found, inserts it in ST and returns its position
• Order of classification matters: reserved word first, then identifier
• Nested if-then-else => a lexical error is detected when a token cannot be classified

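Algorithm Scanning v2 and the remarks above can be sketched in Python. This is an illustration under stated assumptions, not the course's implementation: the codes in `CODES`, the two regular expressions, the unique symbol table kept as a plain list with 1-based positions, and the input given as an already detected token list are all choices made for the example.

```python
import re

# Hypothetical codification table: the codes are arbitrary example values.
CODES = {"identifier": 0, "constant": 1, "if": 2, "(": 3, ")": 4,
         "{": 5, "}": 6, "==": 7, "=": 8, "+": 9}
RESERVED_OPERATORS_SEPARATORS = set(CODES) - {"identifier", "constant"}

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
CONSTANT = re.compile(r"[0-9]+")  # integer constants only, for brevity


def pos(token, st):
    """Return the 1-based position of token in st, inserting it if absent."""
    if token not in st:
        st.append(token)
    return st.index(token) + 1


def scan(tokens):
    """Build the PIF and ST from a list of already detected tokens."""
    pif, st = [], []
    for token in tokens:
        # Reserved words are tested before identifiers (order matters).
        if token in RESERVED_OPERATORS_SEPARATORS:
            pif.append((CODES[token], 0))
        elif IDENTIFIER.fullmatch(token):
            pif.append((CODES["identifier"], pos(token, st)))
        elif CONSTANT.fullmatch(token):
            pif.append((CODES["constant"], pos(token, st)))
        else:
            raise ValueError(f"Lexical error: {token!r}")
    return pif, st


pif, st = scan(["if", "(", "x", "==", "y", ")", "{",
                "x", "=", "y", "+", "2", "}"])
print(pif)
print(st)
```

Testing `if` against the reserved words before the identifier pattern is what keeps it out of the symbol table, since `if` also matches the identifier pattern. A token such as `2x` matches neither pattern and falls through to the lexical error branch.
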
Symbol Table
Definition = contains all information collected during compilation about the symbolic names in the source program: identifiers, constants, etc.
Variants:
- Unique symbol table – contains all symbolic names
- Distinct symbol tables: IT (identifiers table) + CT (constants table)

ST organization
Remark: the relevant operations are search and insert
1. Unsorted table – in order of detection in source code: O(n)
2. Sorted table: alphabetic (numeric): O(lg n)
3. Binary search tree (balanced): O(lg n)
4. Hash table: O(1)

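As an illustration of the sorted-table variant, the sketch below keeps the symbol table as a sorted Python list and uses the standard `bisect` module for binary search. The function names `search` and `insert` are assumptions made for the example.

```python
import bisect


def search(table, name):
    """Binary search in a sorted list: O(lg n). Returns the index or -1."""
    i = bisect.bisect_left(table, name)
    return i if i < len(table) and table[i] == name else -1


def insert(table, name):
    """Insert name, keeping the list sorted; returns its index."""
    i = bisect.bisect_left(table, name)
    if i == len(table) or table[i] != name:
        table.insert(i, name)  # shifting the later elements costs O(n)
    return i


table = []
for name in ["y", "x", "total", "x"]:
    insert(table, name)
print(table)
print(search(table, "total"), search(table, "z"))
```

Search is logarithmic here, but inserting into a sorted array shifts the later entries, so positions handed out earlier can change. That is one reason to consider the tree and hash variants.
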
Hash table
• K = set of keys (symbolic names)
• A = set of positions (|A| = m; m – prime number)
• h: K → A, h(k) = (val(k) mod m) + 1
• Conflicts: k1 ≠ k2, h(k1) = h(k2)

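A small sketch of the hash function and of a conflict follows. The slide does not define val(k) or say how conflicts are resolved, so two assumptions are made here: `val` is taken to be the sum of the character codes, and conflicts are handled by keeping a list (chain) per position. The table size m = 11 is also just an example value.

```python
M = 11  # table size m: a small prime chosen for the example


def val(key):
    """Assumed numeric value of a key: the sum of its character codes."""
    return sum(ord(c) for c in key)


def h(key, m=M):
    """h(k) = (val(k) mod m) + 1, so positions range over 1..m."""
    return val(key) % m + 1


class HashSymbolTable:
    """Symbol table with one chain per position (assumed conflict policy)."""

    def __init__(self, m=M):
        self.m = m
        # Index 0 is unused so that bucket indices match positions 1..m.
        self.buckets = [[] for _ in range(m + 1)]

    def pos(self, key):
        """Search key and insert it if absent; return (position, index in chain)."""
        position = h(key, self.m)
        chain = self.buckets[position]
        if key not in chain:
            chain.append(key)
        return position, chain.index(key)


st = HashSymbolTable()
print(st.pos("ab"), st.pos("ba"), st.pos("ab"))
```

With this `val`, the keys `ab` and `ba` have the same value, so they are distinct keys with the same position: a conflict in the sense of the slide.
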
Visibility domain (scope)
• Each scope – separate ST
• Structure -> inclusion tree
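One way to picture the inclusion tree is a symbol table per scope with a link to the enclosing scope. The class `Scope`, its method names, and the lookup rule (innermost scope first, then outward) are assumptions made for this sketch. The slide only states that each scope has its own ST and that the scopes form a tree.

```python
class Scope:
    """One symbol table per scope; parent links form the inclusion tree."""

    def __init__(self, parent=None):
        self.parent = parent
        self.symbols = {}

    def declare(self, name, info):
        """Record a symbolic name in this scope's own table."""
        self.symbols[name] = info

    def lookup(self, name):
        """Search this scope, then the enclosing scopes up to the root."""
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        return None


global_scope = Scope()
global_scope.declare("x", "int")
block = Scope(parent=global_scope)
block.declare("x", "float")  # hides the outer x inside the block
block.declare("y", "int")
print(block.lookup("x"), global_scope.lookup("x"), global_scope.lookup("y"))
```

The inner declaration of `x` is found first from inside the block, while `y` is not visible from the enclosing scope.
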