Lexical Analysis
by Neng-Fa Zhou

• Why separate lexical and syntax analyses?
  – simpler design
  – efficiency
  – portability

Tokens, Patterns, Lexemes
  – Tokens
    • Terminal symbols in the grammar
  – Patterns
    • Descriptions of the form that the lexemes of a token class may take
  – Lexemes
    • Words (character sequences) in the source program that match a pattern

Languages
  – Fixed and finite alphabet (vocabulary)
  – Finite-length sentences
  – Possibly infinite number of sentences
• Examples
  – Natural numbers {1, 2, 3, ..., 10, 11, ...}
  – Strings over {a, b}: a^n b a^n
• Terms on parts of a string
  – prefix, suffix, substring, proper ...

Operations on Languages

Examples
L = {A, B, ..., Z, a, b, ..., z}
D = {0, 1, ..., 9}
  – L ∪ D : the set of letters and digits
  – LD : strings consisting of a letter followed by a digit
  – L^4 : four-letter strings
  – L* : all strings of letters, including ε
  – L(L ∪ D)* : strings of letters and digits beginning with a letter
  – D+ : strings of one or more digits
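The last two languages above can be checked by hand-written membership tests. The following is a minimal sketch, assuming ASCII input in the default "C" locale; the function names in_L_LD_star and in_D_plus are made up for this illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if s is in L(L ∪ D)*: a letter followed by zero or more
   letters or digits. Returns 0 otherwise, including for the empty string. */
int in_L_LD_star(const char *s) {
    assert(s != NULL);
    if (!isalpha((unsigned char)*s)) return 0;        /* first symbol must be in L */
    for (s++; *s != '\0'; s++)
        if (!isalnum((unsigned char)*s)) return 0;    /* the rest must be in L ∪ D */
    return 1;
}

/* Returns 1 if s is in D+: one or more digits. */
int in_D_plus(const char *s) {
    assert(s != NULL);
    if (*s == '\0') return 0;                         /* D+ does not contain ε */
    for (; *s != '\0'; s++)
        if (!isdigit((unsigned char)*s)) return 0;
    return 1;
}
```

For instance, in_L_LD_star("x25") yields 1 while in_L_LD_star("9lives") yields 0, because the first symbol is not a letter.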

Regular Expression (RE)
• ε is a RE
• a symbol in Σ is a RE
• Let r and s be REs.
  – (r) | (s) : or
  – (r)(s) : concatenation
  – (r)* : zero or more instances
  – (r)+ : one or more instances
  – (r)? : zero or one instance

Precedence of Operators
All operators are left associative.
  high   r*  r+  r?
         rs
  low    r|s
• Examples (Σ = {a, b})
  1. a|b
  2. (a|b)
  3. a*
  4. (a|b)*
  5. a|a*b

Algebraic Properties of RE

Regular Definitions
  d1 → r1
  d2 → r2
  ...
  dn → rn
• Each ri is a RE over Σ ∪ {d1, d2, ..., di-1}
• Definitions are not recursive

Example-1
%{
int num_lines = 0, num_chars = 0;
%}
%%
\n      ++num_lines; ++num_chars;
.       ++num_chars;    /* count every other character */
%%
int main(void) {
    yylex();
    printf("# of lines = %d, # of chars = %d\n", num_lines, num_chars);
    return 0;
}
int yywrap(void) { return 1; }    /* 1 = no more input files */

Example-2
D       [0-9]
INT     {D}{D}*
%%
{INT}("."{INT}((e|E)("+"|-)?{INT})?)?   { printf("valid %s\n", yytext); }
.                                       { printf("unrecognized %s\n", yytext); }
%%
int main(int argc, char *argv[]) {
    ++argv, --argc;    /* skip over the program name */
    if (argc > 0) yyin = fopen(argv[0], "r");
    else yyin = stdin;
    yylex();
    return 0;
}
int yywrap(void) { return 1; }    /* 1 = no more input files */

java.util.regex
import java.util.regex.*;

class Number {
    public static void main(String[] args) {
        // Backslashes are doubled because this is a Java string literal.
        String regExNum = "\\d+(\\.\\d+((e|E)(\\+|-)?\\d+)?)?";
        if (Pattern.matches(regExNum, args[0]))
            System.out.println("valid");
        else
            System.out.println("invalid");
    }
}

String Pattern Matching in Perl
print "Input a string : ";
$_ = <STDIN>;
chomp($_);
if (/^[0-9]+(\.[0-9]+((e|E)(\+|-)?[0-9]+)?)?$/) {
    print "valid\n";
} else {
    print "invalid\n";
}

Finite Automata
• Nondeterministic finite automaton (NFA)
  NFA = (S, T, s0, F)
  – S: a set of states
  – T: a transition mapping
  – s0: the start state
  – F: final states or accepting states

Example

Deterministic Finite Automata (DFA)
  – T: a transition function
  – There is only one arc going out from each node on each symbol.

Simulating a DFA
s = s0;
c = nextchar;
while (c != eof) {
    s = move(s, c);
    c = nextchar;
}
if (s is in F) return "yes";
else return "no";
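The simulation loop above maps directly onto a table-driven C function. The sketch below assumes a two-state DFA for (a|b)*a over {a, b}; the table, the state numbering, and the names move_table, symbol_index, and dfa_accepts are choices made for this illustration only.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_STATES = 2, NUM_SYMBOLS = 2 };

/* Transition table of a DFA accepting (a|b)*a.
   State 0: start state, the last symbol read was not 'a'.
   State 1: the last symbol read was 'a' (accepting). */
static const int move_table[NUM_STATES][NUM_SYMBOLS] = {
    /* a  b */
    {  1, 0 },
    {  1, 0 },
};
static const int accepting[NUM_STATES] = { 0, 1 };

/* Maps a character to a table column, or -1 if it is outside the alphabet. */
static int symbol_index(char c) {
    if (c == 'a') return 0;
    if (c == 'b') return 1;
    return -1;
}

/* Returns 1 ("yes") if the DFA accepts input and 0 ("no") otherwise. */
int dfa_accepts(const char *input) {
    assert(input != NULL);
    int s = 0;                                        /* s = s0 */
    for (const char *p = input; *p != '\0'; p++) {    /* '\0' plays the role of eof */
        int col = symbol_index(*p);
        if (col < 0) return 0;                        /* symbol not in the alphabet */
        s = move_table[s][col];                       /* s = move(s, c) */
    }
    return accepting[s];                              /* is s in F? */
}
```

With this table, dfa_accepts("abba") yields 1 and dfa_accepts("ab") yields 0.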

From RE to NFA
  – ε
  – a in Σ
  – s|t

From RE to NFA (cont.)
  – st
  – s*

Example: (a|b)*a

Building Lexical Analyzer
  RE → NFA : Algorithm 3.23 (Thompson's construction)
  NFA → DFA : Algorithm 3.32 (Subset construction)
  DFA → Emulator

Conversion of an NFA into a DFA
• Intuition
  – move(s, a) is a function in a DFA
  – move(s, a) is a mapping in an NFA
A state reachable from s0 in the DFA on an input string corresponds to a set of states in the NFA that are reachable on the same string.

Computation of ε-closure
  – ε-closure(T): the set of NFA states reachable from some NFA state s in T by ε-transitions alone.
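One way to compute ε-closure(T) is a worklist over the ε-edges. The sketch below represents a set of NFA states as a bitmask, which assumes at most as many states as bits in an unsigned int; the four-state ε-graph and the names eps and eps_closure are hypothetical.

```c
#include <assert.h>

enum { N_STATES = 4 };

/* eps[s] is the bitmask of states reachable from s by a single ε-transition.
   Hypothetical fragment: 0 -ε-> 1, 0 -ε-> 3, 1 -ε-> 2. */
static const unsigned eps[N_STATES] = {
    (1u << 1) | (1u << 3),    /* state 0 */
    (1u << 2),                /* state 1 */
    0,                        /* state 2 */
    0,                        /* state 3 */
};

/* Returns ε-closure(T); T and the result are bitmasks of NFA states. */
unsigned eps_closure(unsigned T) {
    assert((T >> N_STATES) == 0);      /* T may contain valid states only */
    unsigned closure = T;              /* every state of T is in the closure */
    unsigned pending = T;              /* states whose ε-edges are not yet followed */
    while (pending != 0) {
        int s = 0;
        while (((pending >> s) & 1u) == 0) s++;    /* pick the lowest pending state */
        pending &= ~(1u << s);
        unsigned fresh = eps[s] & ~closure;        /* states seen for the first time */
        closure |= fresh;
        pending |= fresh;
    }
    return closure;
}
```

For this graph, the closure of {0} is {0, 1, 2, 3}, and the closure of {1} is {1, 2}.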

From an NFA to a DFA (The subset construction)
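A compact way to see the subset construction is to let each DFA state be a bitmask of NFA states. The sketch below is not the algorithm exactly as the slide presents it; it assumes a hypothetical three-state NFA for (a|b)*a, and all names (nfa_eps, nfa_move, dstates, dtran, and so on) are illustrative.

```c
#include <assert.h>

enum { N_NFA = 3, N_SYM = 2, MAX_DFA = 1 << N_NFA };

/* Hypothetical NFA for (a|b)*a: 0 -ε-> 1, 1 -a-> {1, 2}, 1 -b-> {1};
   state 2 is accepting. Sets of NFA states are bitmasks. */
static const unsigned nfa_eps[N_NFA] = { 1u << 1, 0, 0 };
static const unsigned nfa_move[N_NFA][N_SYM] = {
    { 0, 0 },
    { (1u << 1) | (1u << 2), 1u << 1 },
    { 0, 0 },
};
static const unsigned nfa_accept = 1u << 2;

unsigned dstates[MAX_DFA];    /* dstates[i]: set of NFA states behind DFA state i */
int dtran[MAX_DFA][N_SYM];    /* DFA transition table */
int n_dstates;

static unsigned eps_closure(unsigned T) {
    unsigned closure = T, prev;
    do {                                   /* iterate until nothing new is added */
        prev = closure;
        for (int s = 0; s < N_NFA; s++)
            if ((closure >> s) & 1u) closure |= nfa_eps[s];
    } while (closure != prev);
    return closure;
}

static unsigned move_set(unsigned T, int a) {
    unsigned out = 0;
    for (int s = 0; s < N_NFA; s++)
        if ((T >> s) & 1u) out |= nfa_move[s][a];
    return out;
}

/* Returns the index of set U in dstates, adding it as a new DFA state if needed. */
static int lookup_or_add(unsigned U) {
    for (int i = 0; i < n_dstates; i++)
        if (dstates[i] == U) return i;
    assert(n_dstates < MAX_DFA);
    dstates[n_dstates] = U;
    return n_dstates++;
}

/* Fills dstates and dtran; DFA state 0 is the start state. */
void subset_construction(void) {
    n_dstates = 0;
    lookup_or_add(eps_closure(1u << 0));           /* start = ε-closure({s0}) */
    for (int i = 0; i < n_dstates; i++)            /* states with index >= i are unmarked */
        for (int a = 0; a < N_SYM; a++)
            dtran[i][a] = lookup_or_add(eps_closure(move_set(dstates[i], a)));
}

/* A DFA state accepts if its set contains an accepting NFA state. */
int dfa_state_accepts(int i) { return (dstates[i] & nfa_accept) != 0; }
```

For this NFA the construction produces three DFA states, {0, 1}, {1, 2}, and {1}, of which only {1, 2} is accepting.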

Example: NFA and the corresponding DFA

Algorithm 3.39 (minimizing the number of states of a DFA)
P = {F, S-F};
do begin
    P0 = P;
    for each group G in P do begin
        partition G into subgroups such that two states s and t of G
            are in the same subgroup iff for all input symbols a,
            s and t have transitions on a to states in the same group;
        replace G in P by the set of all subgroups formed;
    end
    if (P == P0) return;
end;
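The partition refinement above can be sketched in C by storing a group number per state. The five-state DFA below (states A to E over {a, b}, with E accepting) is an assumption chosen so that the result has the groups AC, B, D, and E; the names tran, same_subgroup, and minimize are illustrative.

```c
#include <assert.h>
#include <stddef.h>

enum { N = 5, N_SYM = 2 };    /* states A..E = 0..4, symbols a, b = 0, 1 */

/* Assumed DFA: tran[s][x] is the state reached from s on symbol x. */
static const int tran[N][N_SYM] = {
    { 1, 2 },    /* A */
    { 1, 3 },    /* B */
    { 1, 2 },    /* C */
    { 1, 4 },    /* D */
    { 1, 2 },    /* E */
};
static const int is_final[N] = { 0, 0, 0, 0, 1 };

/* s and t stay together iff they are in the same group of P0 and, on every
   input symbol, they move to states in the same group of P0. */
static int same_subgroup(const int *group, int s, int t) {
    if (group[s] != group[t]) return 0;
    for (int a = 0; a < N_SYM; a++)
        if (group[tran[s][a]] != group[tran[t][a]]) return 0;
    return 1;
}

/* Stores in group[s] the group number of state s in the final partition
   and returns the number of groups. */
int minimize(int group[N]) {
    assert(group != NULL);
    int n_groups = 0;                                      /* forces at least one pass */
    for (int s = 0; s < N; s++) group[s] = is_final[s];    /* P = {S-F, F} */
    for (;;) {
        int next[N], n_next = 0;
        for (int s = 0; s < N; s++) {
            next[s] = -1;
            for (int t = 0; t < s; t++)                    /* join an earlier state's subgroup */
                if (same_subgroup(group, s, t)) { next[s] = next[t]; break; }
            if (next[s] < 0) next[s] = n_next++;           /* or open a new subgroup */
        }
        if (n_next == n_groups) return n_groups;           /* P == P0: nothing was split */
        for (int s = 0; s < N; s++) group[s] = next[s];
        n_groups = n_next;
    }
}
```

Because each pass only splits groups, an unchanged group count means the partition itself is unchanged, which is how the sketch tests P == P0. For the assumed DFA, minimize reports four groups and places A and C in the same one.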

Example
  state   a    b
  AC      B    AC
  B       B    D
  D       B    E
  E       B    AC

Construct a DFA Directly from a Regular Expression

Implementation Issues
• Input buffering
  – Read in characters one by one
    • Unable to look ahead
    • Inefficient
  – Read in a whole string and store it in memory
    • Requires a big buffer
  – Buffer pairs

Buffer Pairs

Use Sentinels
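The slide's figure is not available here, so the following is only a sketch of one common way to combine buffer pairs with sentinels. It assumes the input contains no NUL bytes (NUL serves as the sentinel) and tracks only the forward pointer, not the start of the current lexeme; Scanner, scanner_init, and scanner_next are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

enum { HALF = 8 };              /* deliberately tiny so that reloads happen often */
#define SENTINEL '\0'           /* assumption: the input contains no NUL bytes */

typedef struct {
    FILE *in;
    char buf[2 * (HALF + 1)];   /* two halves, each followed by a sentinel slot */
    char *forward;              /* next character to be read */
} Scanner;

/* Fills the half beginning at start and puts a sentinel right after the data. */
static void load_half(Scanner *sc, char *start) {
    size_t n = fread(start, 1, HALF, sc->in);
    start[n] = SENTINEL;        /* when n < HALF this also marks the end of input */
}

void scanner_init(Scanner *sc, FILE *in) {
    assert(sc != NULL && in != NULL);
    sc->in = in;
    sc->forward = sc->buf;
    load_half(sc, sc->buf);
}

/* Returns the next input character, or EOF at the end of the input. */
int scanner_next(Scanner *sc) {
    char *first = sc->buf, *second = sc->buf + HALF + 1;
    for (;;) {
        char c = *sc->forward;
        if (c != SENTINEL) {                          /* common case: a single test */
            sc->forward++;
            return (unsigned char)c;
        }
        if (sc->forward == first + HALF) {            /* end of the first half */
            load_half(sc, second);
            sc->forward = second;
        } else if (sc->forward == second + HALF) {    /* end of the second half */
            load_half(sc, first);
            sc->forward = first;
        } else {
            return EOF;                               /* sentinel inside a half */
        }
    }
}
```

The point of the sentinel is that the common case costs one comparison per character; the tests for the end of a half and the end of the input run only after a sentinel has been seen.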

Lexical Analyzer

Lex
• A tool for automatically generating lexical analyzers

Lex Specifications
declarations
%%
translation rules
%%
auxiliary procedures

Translation rules have the form:
p1    {action1}
p2    {action2}
...
pn    {actionn}

Lex Regular Expressions

yylex() {
    switch (pattern_match()) {
        case 1: {action1}
        case 2: {action2}
        ...
        case n: {actionn}
    }
}

Example
%{
#include <stdlib.h>    /* atoi, atof */
%}
DIGIT   [0-9]
ID      [a-z][a-z0-9]*
%%
{DIGIT}+                { printf("An integer: %s (%d)\n", yytext, atoi(yytext)); }
{DIGIT}+"."{DIGIT}*     { printf("A float: %s (%g)\n", yytext, atof(yytext)); }
if|then|begin|end|procedure|function    { printf("A keyword: %s\n", yytext); }
{ID}                    { printf("An identifier %s\n", yytext); }
"+"|"-"|"*"|"/"         { printf("An operator %s\n", yytext); }
"{"[^}\n]*"}"           { /* eat up one-line comments */ }
[ \t\n]+                { /* eat up white space */ }
.                       { printf("Unrecognized character: %s\n", yytext); }
%%
int main(int argc, char *argv[]) {
    ++argv, --argc;    /* skip over the program name */
    if (argc > 0) yyin = fopen(argv[0], "r");
    else yyin = stdin;
    yylex();
    return 0;
}
int yywrap(void) { return 1; }    /* 1 = no more input files */