Chapter 2 Chang Chi-Chung 2007. 3. 15

Lexical Analyzer
  • The tasks of the lexical analyzer:
      - Remove white space and comments.
      - Encode constants as tokens.
      - Recognize keywords and identifiers.
      - Store identifier names in a symbol table.

Lexical Analyzer
    Input:    if (peek == '\n') line = line + 1
    Lexer()   (lexical analyzer) passes tokens, each with an optional attribute, to
    Parser()  (parser or syntax-directed translator)

    Token stream:
    <if> <(> <id, "peek"> <eq> <const, '\n'> <)> <id, "line"> <assign> <id, "line"> <+> <num, 1> <;>

Remove white space and comments
  • Two ways to deal with white space and comments:
      - Eliminate them in the lexical analyzer.
      - Modify the grammar to incorporate them into the syntax (not easy).

    for ( ; ; peek = next character ) {
        if ( peek is a blank or a tab ) do nothing;
        else if ( peek is a newline ) line = line + 1;
        else break;
    }
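The loop above only handles blanks, tabs, and newlines. A minimal Java sketch of how the same loop could also drop comments is shown here. The class name `WhitespaceSkipper`, the `//` comment syntax, and reading from a `String` instead of an input stream are all assumptions made for illustration, not part of the slides.

```java
// Hypothetical helper, not from the slides: skips white space and "//" comments.
class WhitespaceSkipper {
    private final String src;   // whole input held in memory (an assumption)
    private int pos = 0;        // index of the current character
    int line = 1;               // line counter, as on the slide

    WhitespaceSkipper(String src) { this.src = src; }

    /** Returns the character at pos + offset, or '\0' past the end of the input. */
    private char at(int offset) {
        int i = pos + offset;
        return i < src.length() ? src.charAt(i) : '\0';
    }

    /** Skips blanks, tabs, newlines, and // comments; returns the next other character. */
    char skip() {
        while (true) {
            char peek = at(0);
            if (peek == ' ' || peek == '\t' || peek == '\r') {
                pos++;                              // do nothing with it
            } else if (peek == '\n') {
                line = line + 1;                    // count the line, then move on
                pos++;
            } else if (peek == '/' && at(1) == '/') {
                // Stop at the newline so that it is counted by the branch above.
                while (at(0) != '\n' && at(0) != '\0') pos++;
            } else {
                return peek;                        // first significant character
            }
        }
    }

    public static void main(String[] args) {
        WhitespaceSkipper w = new WhitespaceSkipper("  // first line\n\t x");
        char c = w.skip();
        System.out.println(c + " on line " + w.line);   // prints: x on line 2
    }
}
```

Leaving the newline for the main loop keeps the line count in a single place, which is one reasonable design among several.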

Encode constants as tokens
  • For a sequence of digits, the lexical analyzer must pass a token to the parser.
      - The token consists of the terminal along with an integer-valued attribute computed from the digits.
  • Example
      - 31 + 28 + 29
      - <num, 31> <+> <num, 28> <+> <num, 29>

    if ( peek holds a digit ) {
        v = 0;
        do {
            v = v * 10 + integer value of digit peek;
            peek = next input character;
        } while ( peek holds a digit );
        return token <num, v>;
    }
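The digit-collecting loop can be tried out with a short Java sketch. The class name `NumberTokens`, the method `tokenize`, and the choice to render tokens as strings are assumptions for illustration. Any character that is neither a digit nor white space is simply emitted as a one-character token, and integer overflow is not checked.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not from the slides.
class NumberTokens {
    /** Turns digit sequences into <num, v> tokens and other characters into <c> tokens. */
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < src.length()) {
            char peek = src.charAt(pos);
            if (Character.isWhitespace(peek)) {
                pos++;                                   // white space is dropped
            } else if (Character.isDigit(peek)) {
                int v = 0;
                do {
                    v = v * 10 + Character.digit(peek, 10);   // accumulate the value
                    pos++;
                    peek = pos < src.length() ? src.charAt(pos) : '\0';
                } while (Character.isDigit(peek));
                tokens.add("<num, " + v + ">");
            } else {
                tokens.add("<" + peek + ">");            // single-character token
                pos++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", tokenize("31 + 28 + 29")));
        // prints: <num, 31> <+> <num, 28> <+> <num, 29>
    }
}
```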

Recognize Keywords and Identifiers
  • Keyword
      - A fixed character string used as a punctuation mark or to identify a construct.
      - Example: for, while, if
  • Identifier
      - Used to name variables, arrays, functions, and the like.
      - The parser treats identifiers as terminals.
      - Example
            count = count + increment;
            <id, "count"> <=> <id, "count"> <+> <id, "increment"> <;>

Recognize Keywords and Identifiers
  • The lexical analyzer uses a table to hold character strings.
      - A string table can be implemented by a hash table.
      - Single representation: each distinct string is stored once, and the rest of the compiler refers to its table entry.
      - Reserved words: the table is initialized with the reserved strings and their tokens.

    if ( peek holds a letter ) {
        collect letters or digits into a buffer b;
        s = string formed from the characters in b;
        w = token returned by words.get(s);
        if ( w is not null ) return w;
        else {
            enter the key-value pair (s, <id, s>) into words;
            return token <id, s>;
        }
    }
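A minimal Java sketch of such a string table follows. The class name `WordTable` and the method `lookup` are hypothetical. The tag values and the two reserved words `true` and `false` are borrowed from the lexer developed later in these slides. The sketch shows both properties listed above: a reserved word comes back with its own tag, and looking up the same identifier twice returns the same object.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class, not from the slides.
class WordTable {
    static final int ID = 257, TRUE = 258, FALSE = 259;   // tag values used later in the slides

    static final class Word {
        final int tag;
        final String lexeme;
        Word(int tag, String lexeme) { this.tag = tag; this.lexeme = lexeme; }
    }

    private final Map<String, Word> words = new HashMap<>();

    WordTable() {
        // Reserved words are entered before any input is scanned.
        reserve(new Word(TRUE, "true"));
        reserve(new Word(FALSE, "false"));
    }

    private void reserve(Word w) { words.put(w.lexeme, w); }

    /** Returns the existing entry for s, or enters and returns a new identifier entry. */
    Word lookup(String s) {
        Word w = words.get(s);
        if (w == null) {
            w = new Word(ID, s);
            words.put(s, w);
        }
        return w;
    }

    public static void main(String[] args) {
        WordTable table = new WordTable();
        System.out.println(table.lookup("true").tag == TRUE);               // reserved word keeps its tag
        System.out.println(table.lookup("count") == table.lookup("count")); // one entry per string
    }
}
```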

Create a Lexical Analyzer
    Token scan() {
        skip white space;                         (A)
        handle numbers;                           (B)
        handle reserved words and identifiers;    (C)
        Token t = new Token(peek);                (D)
        peek = blank;
        return t;
    }

Complete Lexical Analyzer (1)
    Class diagram: Num and Word are subclasses of Token.
        class Token    +int tag
        class Num      +int value
        class Word     +string lexeme

    // Each public class goes in its own file in package lexer.
    package lexer;

    // Token.java
    public class Token {
        public final int tag;
        public Token(int t) { tag = t; }
    }

    // Tag.java
    public class Tag {
        public final static int
            NUM = 256, ID = 257, TRUE = 258, FALSE = 259;
    }

    // Num.java
    public class Num extends Token {
        public final int value;
        public Num(int v) { super(Tag.NUM); value = v; }
    }

    // Word.java
    public class Word extends Token {
        public final String lexeme;
        public Word(int t, String s) {
            super(t); lexeme = new String(s);
        }
    }

Complete Lexical Analyzer (2)
    package lexer;
    import java.io.*;
    import java.util.*;

    public class Lexer {
        public int line = 1;
        private char peek = ' ';
        private Hashtable words = new Hashtable();

        void reserve(Word t) { words.put(t.lexeme, t); }

        public Lexer() {
            reserve( new Word(Tag.TRUE, "true") );
            reserve( new Word(Tag.FALSE, "false") );
        }

Complete Lexical Analyzer (3)
        public Token scan() throws IOException {
            // (A) skip white space
            for ( ; ; peek = (char) System.in.read() ) {
                if ( peek == ' ' || peek == '\t' ) continue;
                else if ( peek == '\n' ) line = line + 1;
                else break;
            }
            // (B) handle numbers
            if ( Character.isDigit(peek) ) {
                int v = 0;
                do {
                    v = v * 10 + Character.digit(peek, 10);
                    peek = (char) System.in.read();
                } while ( Character.isDigit(peek) );
                return new Num(v);
            }
            C
            D
        }
    }

Complete Lexical Analyzer (4)
        public Token scan() throws IOException {
            A
            B
            // (C) handle reserved words and identifiers
            if ( Character.isLetter(peek) ) {
                StringBuffer b = new StringBuffer();
                do {
                    b.append(peek);
                    peek = (char) System.in.read();
                } while ( Character.isLetterOrDigit(peek) );
                String s = b.toString();
                Word w = (Word) words.get(s);
                if ( w != null ) return w;
                w = new Word(Tag.ID, s);
                words.put(s, w);
                return w;
            }
            // (D) treat any other character as a token
            Token t = new Token(peek);
            peek = ' ';
            return t;
        }
    }
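To try the four parts together without typing on standard input, the lexer can be condensed into one file that scans a string. This is a sketch under stated assumptions, not the slides' code: the wrapper class `LexerDemo`, the `toString` methods, the helper `tokens`, and the `EOF` sentinel are all added for illustration, and `System.in.read()` is replaced by a private `read()` over a `String`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-file variant of the slides' lexer that scans a String.
class LexerDemo {
    static final int NUM = 256, ID = 257, TRUE = 258, FALSE = 259;
    static final char EOF = (char) -1;   // assumption: marks the end of the input string

    static class Token {
        final int tag;
        Token(int t) { tag = t; }
        @Override public String toString() { return "<" + (char) tag + ">"; }
    }
    static class Num extends Token {
        final int value;
        Num(int v) { super(NUM); value = v; }
        @Override public String toString() { return "<num, " + value + ">"; }
    }
    static class Word extends Token {
        final String lexeme;
        Word(int t, String s) { super(t); lexeme = s; }
        @Override public String toString() {
            return tag == ID ? "<id, \"" + lexeme + "\">" : "<" + lexeme + ">";
        }
    }

    static class Lexer {
        int line = 1;
        private char peek = ' ';
        private final String src;
        private int pos = 0;
        private final Map<String, Word> words = new HashMap<>();

        Lexer(String src) {
            this.src = src;
            reserve(new Word(TRUE, "true"));
            reserve(new Word(FALSE, "false"));
        }
        private void reserve(Word w) { words.put(w.lexeme, w); }
        private char read() { return pos < src.length() ? src.charAt(pos++) : EOF; }

        Token scan() {
            for ( ; ; peek = read()) {                       // (A) skip white space
                if (peek == ' ' || peek == '\t') continue;
                else if (peek == '\n') line = line + 1;
                else break;
            }
            if (Character.isDigit(peek)) {                   // (B) numbers
                int v = 0;
                do { v = v * 10 + Character.digit(peek, 10); peek = read(); }
                while (Character.isDigit(peek));
                return new Num(v);
            }
            if (Character.isLetter(peek)) {                  // (C) words
                StringBuilder b = new StringBuilder();
                do { b.append(peek); peek = read(); }
                while (Character.isLetterOrDigit(peek));
                String s = b.toString();
                Word w = words.get(s);
                if (w == null) { w = new Word(ID, s); words.put(s, w); }
                return w;
            }
            Token t = new Token(peek);                       // (D) any other character
            peek = ' ';
            return t;
        }
    }

    /** Scans src to the end and returns the tokens separated by blanks. */
    static String tokens(String src) {
        Lexer lex = new Lexer(src);
        StringBuilder out = new StringBuilder();
        for (Token t = lex.scan(); t.tag != EOF; t = lex.scan()) {
            if (out.length() > 0) out.append(' ');
            out.append(t);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(tokens("count = count + 31;\nflag = true;"));
        // prints: <id, "count"> <=> <id, "count"> <+> <num, 31> <;> <id, "flag"> <=> <true> <;>
    }
}
```

Because part (D) turns every remaining character into its own token, a two-character operator such as `==` would come out as two `<=>` tokens in this version.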

Symbol Tables
  • Symbol tables are data structures
      - Used by compilers to hold information about source-program constructs.
  • Scope of identifier x
      - The scope of a particular declaration of x.
  • Scope
      - A portion of a program that is the scope of one or more declarations.

Symbol Tables
    The number after each name identifies the declaration it refers to.

    {   int x1; int y1;
        {   int w2; bool y2; int z2;
            w2; x1; y2; z2;
        }
        w0; x1; y1;
    }

    Symbol tables, one per block:
        B0:  w
        B1:  x int, y int
        B3:  w int, y bool, z int

Symbol Tables
    Figure: the tables B0 (w), B1 (x int, y int), and B3 (w int, y bool, z int).

    package symbols;
    import java.util.*;

    // Symbol is the class of table entries; it is not shown on the slide.
    public class Env {
        private Hashtable table;
        protected Env prev;        // table of the enclosing block, or null

        public Env(Env p) {
            table = new Hashtable();
            prev = p;
        }

        public void put(String s, Symbol sym) {
            table.put(s, sym);
        }

        // Search this table first, then the enclosing ones.
        public Symbol get(String s) {
            for ( Env e = this; e != null; e = e.prev ) {
                Symbol found = (Symbol)(e.table.get(s));
                if ( found != null ) return found;
            }
            return null;
        }
    }
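The effect of chaining can be seen with a small self-contained sketch. It is a simplification, not the slides' class: entries are plain type names (`String`) rather than `Symbol` objects, the wrapper class `EnvDemo` is hypothetical, and only the two inner blocks of the earlier example are modelled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo; Env here maps names to type names instead of Symbol objects.
class EnvDemo {
    static class Env {
        private final Map<String, String> table = new HashMap<>();
        private final Env prev;                 // enclosing block's table, or null

        Env(Env prev) { this.prev = prev; }

        void put(String name, String type) { table.put(name, type); }

        /** Looks in this table first, then in each enclosing table. */
        String get(String name) {
            for (Env e = this; e != null; e = e.prev) {
                String found = e.table.get(name);
                if (found != null) return found;
            }
            return null;                        // not declared in any enclosing block
        }
    }

    public static void main(String[] args) {
        Env outer = new Env(null);              // block declaring x and y
        outer.put("x", "int");
        outer.put("y", "int");

        Env inner = new Env(outer);             // nested block redeclaring y
        inner.put("w", "int");
        inner.put("y", "bool");
        inner.put("z", "int");

        System.out.println(inner.get("y"));     // bool: the inner declaration hides the outer one
        System.out.println(inner.get("x"));     // int: found in the enclosing table
        System.out.println(outer.get("y"));     // int: the outer block is unaffected
        System.out.println(outer.get("z"));     // null: z is not visible outside the inner block
    }
}
```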

The Use of Symbol Tables
    program → { top = null; } block

    block   → '{'               { saved = top; top = new Env(top); print("{ "); }
              decls stmts '}'   { top = saved; print("} "); }

    decls   → decls decl | ε

    decl    → type id ;         { s = new Symbol; s.type = type.lexeme; top.put(id.lexeme, s); }

    stmts   → stmts stmt | ε

    stmt    → block
            | factor ;          { print("; "); }

    factor  → id                { s = top.get(id.lexeme); print(":"); print(s.type); }
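The scheme above can be approximated by a short recursive-descent sketch in Java. Several things here are assumptions rather than part of the slides: the class name `ScopeTranslator`, the requirement that input tokens be separated by blanks, the fixed set of type names, and the fact that the sketch prints the identifier's name before the colon so that the output is readable. Entries are plain type names instead of `Symbol` objects.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the translation scheme; input tokens must be blank-separated.
class ScopeTranslator {
    static class Env {
        private final Map<String, String> table = new HashMap<>();
        private final Env prev;
        Env(Env prev) { this.prev = prev; }
        void put(String name, String type) { table.put(name, type); }
        String get(String name) {
            for (Env e = this; e != null; e = e.prev) {
                String found = e.table.get(name);
                if (found != null) return found;
            }
            return null;
        }
    }

    private static final Set<String> TYPES = Set.of("int", "char", "bool", "float"); // assumed type names
    private final String[] toks;
    private int pos = 0;
    private Env top = null;                                   // program: top = null
    private final StringBuilder out = new StringBuilder();

    private ScopeTranslator(String src) { toks = src.trim().split("\\s+"); }

    /** Translates one block and returns the annotated text. */
    static String translate(String src) {
        ScopeTranslator t = new ScopeTranslator(src);
        t.block();
        return t.out.toString().trim();
    }

    private String next() { return toks[pos++]; }

    private void expect(String s) {
        String t = next();
        if (!t.equals(s)) throw new IllegalStateException("expected " + s + " but saw " + t);
    }

    private void block() {
        expect("{");
        Env saved = top;                                      // saved = top
        top = new Env(top);                                   // new table for this block
        out.append("{ ");
        while (TYPES.contains(toks[pos])) {                   // decls
            String type = next();
            String id = next();
            expect(";");
            top.put(id, type);
        }
        while (!toks[pos].equals("}")) {                      // stmts
            if (toks[pos].equals("{")) {
                block();
            } else {
                String id = next();
                // An undeclared name yields "null" here; a real checker would report an error.
                out.append(id).append(':').append(top.get(id));
                expect(";");
                out.append("; ");
            }
        }
        expect("}");
        top = saved;                                          // restore the enclosing table
        out.append("} ");
    }

    public static void main(String[] args) {
        System.out.println(translate("{ int x ; char y ; { bool y ; x ; y ; } x ; y ; }"));
        // prints: { { x:int; y:bool; } x:int; y:char; }
    }
}
```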

Intermediate Code Generation
  • The two most important intermediate representations:
      - Trees
          · Parse trees, syntax trees (abstract trees)
          · Example: while ( expr ) stmt becomes a node with op: while, E1: expr, E2: stmt
      - Linear representations
          · Three-address code
          · Examples:
                x = y op z
                x [ y ] = z
                x = y [ z ]
                ifFalse x goto L
                ifTrue x goto L
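How a tree turns into instructions of the form x = y op z can be sketched in Java. The class `ThreeAddressDemo`, the node classes `Id` and `Op`, and the temporary names t1, t2, ... are assumptions for illustration. The sketch handles only binary operators over identifiers and emits the code for both operands before the instruction that combines them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: emits three-address code for a small expression tree.
class ThreeAddressDemo {
    static abstract class Expr { }

    static class Id extends Expr {
        final String name;
        Id(String name) { this.name = name; }
    }

    static class Op extends Expr {
        final String op;
        final Expr left, right;
        Op(String op, Expr left, Expr right) { this.op = op; this.left = left; this.right = right; }
    }

    private final List<String> code = new ArrayList<>();
    private int temps = 0;                       // number of temporaries created so far

    /** Emits instructions for e and returns the name that holds its value. */
    private String gen(Expr e) {
        if (e instanceof Id) return ((Id) e).name;   // an identifier needs no instruction
        Op o = (Op) e;
        String l = gen(o.left);
        String r = gen(o.right);
        String t = "t" + (++temps);                  // fresh temporary for the result
        code.add(t + " = " + l + " " + o.op + " " + r);
        return t;
    }

    /** Returns the instruction list for e, one instruction per element. */
    static List<String> translate(Expr e) {
        ThreeAddressDemo g = new ThreeAddressDemo();
        g.gen(e);
        return g.code;
    }

    public static void main(String[] args) {
        // a + b * c
        Expr e = new Op("+", new Id("a"), new Op("*", new Id("b"), new Id("c")));
        for (String instr : translate(e)) System.out.println(instr);
        // prints:
        // t1 = b * c
        // t2 = a + t1
    }
}
```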

Intermediate Code Generation
    Input:    if (peek == '\n') line = line + 1
    Parser()  (parser or syntax-directed translator) produces either a syntax tree or three-address code.

    Syntax tree:
        if
            eq
                peek
                (int) '\n'
            assign
                line
                +
                    line
                    1

    Three-address code:
        1: t1 = (int) '\n'
        2: ifFalse peek == t1 goto 4
        3: line = line + 1
        4:

Syntax Trees
    Concrete Syntax        Abstract Syntax
    ---------------        ---------------
    =                      assign
    ||                     cond
    &&                     cond
    ==  !=                 rel
    <  <=  >=  >           rel
    +  -                   op
    *  /  %                op
    !                      not
    - (unary)              minus
    [ ]                    access

Syntax Trees
    Figure: part of a syntax tree for a statement list. The seq nodes chain the statements and end in null; the statement nodes shown are while and if, each with "some tree for an expression" as a child.

Static Checking
  • Static checks are consistency checks that are done during compilation.
      - Syntactic checking
      - Type checking
  • L-values and R-values
      - i = 5 (i on the left side denotes a location, an l-value)
      - i = i + 1 (i on the right side denotes the value stored there, an r-value)