Lexical Analysis











Lexical Analysis
▸ Why separate lexical and syntax analyses?
  – simpler design
  – efficiency
  – portability
by Neng-Fa Zhou

Tokens, Patterns, Lexemes
  – Tokens
    • Terminal symbols in the grammar
  – Patterns
    • Description of a class of tokens
  – Lexemes
    • Words in the source program
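To make the three terms concrete, here is a minimal Python sketch. The token names (NUM, ID, ASSIGN, PLUS) and their patterns are assumptions chosen for illustration and do not come from the slides. Each pattern describes a class of tokens, and each matched substring is a lexeme.

```python
import re

# Hypothetical token names and patterns, chosen only for illustration.
PATTERNS = [
    ("NUM", r"[0-9]+"),
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in PATTERNS))

def tokens_and_lexemes(source):
    """Return (token, lexeme) pairs for source, skipping whitespace."""
    pairs = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"no pattern matches at position {pos}")
        # lastgroup is the name of the alternative that matched: the token.
        pairs.append((match.lastgroup, match.group()))
        pos = match.end()
    return pairs

for token, lexeme in tokens_and_lexemes("count = count + 42"):
    print(f"token {token:<6} lexeme {lexeme!r}")
```

The two occurrences of `count` are different lexemes in the input but the same token, ID.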

Languages
  – Fixed and finite alphabet (vocabulary)
  – Finite-length sentences
  – Possibly infinite number of sentences
▸ Examples
  – Natural numbers {1, 2, 3, ..., 10, 11, ...}
  – Strings over {a, b}: a^n b a^n
▸ Terms on parts of a string
  – prefix, suffix, substring, proper ...

Operations on Languages

Examples
  L = {A, B, ..., Z, a, b, ..., z}
  D = {0, 1, ..., 9}
  L ∪ D : the set of letters and digits
  LD : a letter followed by a digit
  L^4 : four-letter strings
  L* : all strings of letters, including ε
  L(L ∪ D)* : strings of letters and digits beginning with a letter
  D+ : strings of one or more digits
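The operations used in these examples can be tried on small sets. The following is a sketch under stated assumptions: languages are Python sets of strings, the alphabets are cut down to two symbols each to keep the output short, and the Kleene closure is truncated at a maximum power because the real closure is infinite. The function names are hypothetical.

```python
def union(A, B):
    """L ∪ D: strings that are in either language."""
    return A | B

def concat(A, B):
    """LD: a string of A followed by a string of B."""
    return {x + y for x in A for y in B}

def power(A, n):
    """L^n: n strings of A concatenated; L^0 is {ε}."""
    result = {""}
    for _ in range(n):
        result = concat(result, A)
    return result

def closure_up_to(A, k):
    """Finite approximation of L*: the union of L^0 ... L^k."""
    result = set()
    for i in range(k + 1):
        result |= power(A, i)
    return result

# Reduced alphabets, assumed here only to keep the sets small.
L = {"a", "b"}
D = {"0", "1"}

print(sorted(union(L, D)))
print(sorted(concat(L, D)))
print(len(power(L, 4)))
print(sorted(closure_up_to(L, 2), key=lambda s: (len(s), s)))
```

Positive closure D+ corresponds to the same union starting at power 1 instead of power 0, so it excludes the empty string unless D itself contains it.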

Regular Expression (RE)
▸ ε is a RE
▸ a symbol in Σ is a RE
▸ Let r and s be REs.
  – (r) | (s) : or
  – (r)(s) : concatenation
  – (r)* : zero or more instances
  – (r)+ : one or more instances
  – (r)? : zero or one instance

Precedence of Operators (all left associative)
  r*  r+  r?   (high)
  rs
  r|s          (low)
▸ Examples, Σ = {a, b}
  1. a|b
  2. (a|b)
  3. a*
  4. (a|b)*
  5. a|a*b
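The effect of these precedence rules on example 5 can be checked by enumeration. This sketch assumes Python's `re` module, whose operators follow the same precedence order; the helper `language` is hypothetical and lists every string over {a, b} up to a small length that a pattern matches in full.

```python
import re
from itertools import product

def language(pattern, alphabet="ab", max_len=4):
    """Strings over alphabet, up to max_len, that the pattern matches entirely."""
    rx = re.compile(pattern)
    return {
        "".join(chars)
        for n in range(max_len + 1)
        for chars in product(alphabet, repeat=n)
        if rx.fullmatch("".join(chars))
    }

implicit = language("a|a*b")          # relies on precedence
explicit = language("(a)|((a*)b)")    # fully parenthesized reading
other = language("(a|a)*b")           # what a different grouping would mean

print(implicit == explicit)
print(sorted(implicit, key=lambda s: (len(s), s)))
print(implicit == other)
```

Star binds tighter than concatenation, which binds tighter than alternation, so a|a*b denotes the single string a together with every string of zero or more a's followed by b.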

Algebraic Properties of RE

Regular Definitions
    d_1 → r_1
    d_2 → r_2
    ...
    d_n → r_n
  where each r_i is a RE over Σ ∪ {d_1, d_2, ..., d_{i-1}}, so the definitions are not recursive

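Example-1
    %{
    int num_lines = 0, num_chars = 0;
    %}
    %%
    \n      ++num_lines; ++num_chars;
    .       ++num_chars;   /* count every character other than newline */
    %%
    int main(void) {
        yylex();
        printf("# of lines = %d, # of chars = %d\n", num_lines, num_chars);
        return 0;
    }
    /* Return 1 at end of input: no further file to scan. */
    int yywrap(void) { return 1; }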
Example-2
    D    [0-9]
    INT  {D}{D}*
    %%
    {INT}("."{INT}((e|E)("+"|-)?{INT})?)?   {printf("valid %s\n", yytext); }
    .                                       {printf("unrecognized %s\n", yytext); }
    %%
    int main(int argc, char *argv[]){
        ++argv, --argc;              /* skip the program name */
        if (argc > 0)
            yyin = fopen(argv[0], "r");
        else
            yyin = stdin;
        yylex();
        return 0;
    }
    /* Return 1 at end of input: no further file to scan. */
    int yywrap(void) { return 1; }

java.util.regex
    import java.util.regex.*;

    class Number {
        public static void main(String[] args){
            // Backslashes are doubled because this is a Java string literal.
            String regExNum = "\\d+(\\.\\d+((e|E)(\\+|-)?\\d+)?)?";
            if (Pattern.matches(regExNum, args[0]))
                System.out.println("valid");
            else
                System.out.println("invalid");
        }
    }

String Pattern Matching in Perl
    print "Input a string : ";
    $_ = <STDIN>;
    chomp($_);
    # "." and "+" are escaped so that they match literally.
    if (/^[0-9]+(\.[0-9]+((e|E)(\+|-)?[0-9]+)?)?$/){
        print "valid\n";
    } else {
        print "invalid\n";
    }

Finite Automata
▸ Nondeterministic finite automaton (NFA)
  NFA = (S, T, s0, F)
  – S: a set of states
  – T: a transition mapping
  – s0: the start state
  – F: final states or accepting states

Example

Deterministic Finite Automata (DFA)
  T: a transition function
  There is only one arc going out from each node on each symbol.

Simulating a DFA
    s = s0;
    c = nextchar;
    while (c != eof) {
        s = move(s, c);
        c = nextchar;
    }
    if (s is in F) return "yes";
    else return "no";
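The simulation loop translates almost line for line into runnable code. The sketch below is in Python and assumes a hypothetical two-state DFA for (a|b)*a, with the transition function stored as a dictionary; neither the state numbering nor the table comes from the slides.

```python
def simulate_dfa(move, start, accepting, text):
    """Run the DFA over text and answer "yes" or "no"."""
    state = start                      # s = s0
    for c in text:                     # c = nextchar, until end of input
        state = move[(state, c)]       # s = move(s, c)
    return "yes" if state in accepting else "no"

# Hypothetical DFA for (a|b)*a: state 1 means "the last symbol read was a".
MOVE = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 0,
}

for word in ["a", "abba", "ab", ""]:
    print(repr(word), simulate_dfa(MOVE, 0, {1}, word))
```

Because `move` is assumed to be a total function over the alphabet {a, b}, a character outside that alphabet raises `KeyError` here; a production scanner would route it to a dead state instead.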

From RE to NFA
  – ε
  – a in Σ
  – s|t

From RE to NFA (cont.)
  – st
  – s*
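The five cases (ε, a symbol, s|t, st, s*) can be sketched as a recursive builder. This is an illustration under assumptions, not the textbook algorithm verbatim: the regular expression is supplied as a nested tuple rather than parsed from text, the class and function names are hypothetical, and concatenation links the two fragments with an ε-edge instead of merging the two states.

```python
from itertools import count

EPS = None  # label for epsilon-transitions in this sketch

class NFA:
    def __init__(self):
        self.edges = {}      # state -> list of (label, target state)
        self._ids = count()

    def new_state(self):
        state = next(self._ids)
        self.edges[state] = []
        return state

    def add(self, src, label, dst):
        self.edges[src].append((label, dst))

def build(nfa, node):
    """Add the fragment for an AST node; return its (start, accept) states."""
    kind = node[0]
    if kind == "cat":                        # st: run s, then t
        s_start, s_accept = build(nfa, node[1])
        t_start, t_accept = build(nfa, node[2])
        nfa.add(s_accept, EPS, t_start)      # variant: link instead of merging
        return s_start, t_accept
    start, accept = nfa.new_state(), nfa.new_state()
    if kind == "eps":
        nfa.add(start, EPS, accept)
    elif kind == "sym":
        nfa.add(start, node[1], accept)
    elif kind == "or":                       # s|t: branch into either operand
        for child in node[1:]:
            c_start, c_accept = build(nfa, child)
            nfa.add(start, EPS, c_start)
            nfa.add(c_accept, EPS, accept)
    elif kind == "star":                     # s*
        c_start, c_accept = build(nfa, node[1])
        nfa.add(start, EPS, c_start)         # enter the loop
        nfa.add(c_accept, EPS, c_start)      # repeat
        nfa.add(c_accept, EPS, accept)       # leave the loop
        nfa.add(start, EPS, accept)          # zero instances
    else:
        raise ValueError(f"unknown node kind: {kind}")
    return start, accept

def eps_closure(nfa, states):
    closure, stack = set(states), list(states)
    while stack:
        for label, target in nfa.edges[stack.pop()]:
            if label is EPS and target not in closure:
                closure.add(target)
                stack.append(target)
    return closure

def accepts(nfa, start, accept, text):
    current = eps_closure(nfa, {start})
    for c in text:
        moved = {t for s in current for label, t in nfa.edges[s] if label == c}
        current = eps_closure(nfa, moved)
    return accept in current

# (a|b)*a written as a nested tuple
AST = ("cat", ("star", ("or", ("sym", "a"), ("sym", "b"))), ("sym", "a"))
nfa = NFA()
start, accept = build(nfa, AST)
for word in ["a", "ba", "abba", "", "ab"]:
    print(repr(word), accepts(nfa, start, accept, word))
```

Each fragment has exactly one start state and one accepting state, which is what lets the cases compose without inspecting the inside of a sub-automaton.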

Example
  (a|b)*a

Building Lexical Analyzer
  RE → NFA : Algorithm 3.23 (Thompson's construction)
  NFA → DFA : Algorithm 3.32 (Subset construction)
  DFA → Emulator

Conversion of an NFA into a DFA
▸ Intuition
  – move(s, a) is a function in a DFA
  – move(s, a) is a mapping in an NFA
  A state reachable from s0 in the DFA on an input string corresponds to a set of states in the NFA that are reachable on the same string.

Computation of ε-closure
  ε-closure(T): the set of NFA states reachable from some NFA state s in T by ε-transitions alone.
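This definition amounts to a graph search that follows only ε-edges. A minimal Python sketch follows, assuming the ε-transitions are stored as a dictionary from a state to the set of its ε-successors. The state numbering is a hypothetical Thompson-style NFA for (a|b)*a and is not taken from the slides.

```python
# Hypothetical epsilon-edges of an NFA for (a|b)*a.
# Symbol edges (2 -a-> 3, 4 -b-> 5, 7 -a-> 8) play no part in the closure.
EPS_EDGES = {
    0: {1, 7},
    1: {2, 4},
    3: {6},
    5: {6},
    6: {1, 7},
}

def eps_closure(eps_edges, T):
    """Return every state reachable from a state in T by epsilon-edges alone."""
    closure = set(T)          # each state of T reaches itself
    stack = list(T)
    while stack:
        s = stack.pop()
        for t in eps_edges.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

print(sorted(eps_closure(EPS_EDGES, {0})))
print(sorted(eps_closure(EPS_EDGES, {3, 8})))
```

The `closure` set doubles as the visited set, so the loop terminates even though the ε-edges form a cycle through states 1 and 6.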

From an NFA to a DFA (The subset construction)
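The subset construction can be sketched as a worklist algorithm in which each DFA state is a set of NFA states. The Python code below is an illustration under assumptions: the NFA is a hypothetical Thompson-style automaton for (a|b)*a, the dictionary layout and function names are invented for this example, and the empty set is simply omitted instead of being kept as a dead state.

```python
# Hypothetical NFA for (a|b)*a: start state 0, accepting state 8.
EPS_EDGES = {0: {1, 7}, 1: {2, 4}, 3: {6}, 5: {6}, 6: {1, 7}}
SYM_EDGES = {(2, "a"): {3}, (4, "b"): {5}, (7, "a"): {8}}

def eps_closure(eps_edges, T):
    closure, stack = set(T), list(T)
    while stack:
        for t in eps_edges.get(stack.pop(), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def subset_construction(eps, sym, start, accepting, alphabet):
    """Return (ids, dtran, daccept) describing the equivalent DFA."""
    start_set = frozenset(eps_closure(eps, {start}))
    ids = {start_set: 0}        # set of NFA states -> DFA state number
    dtran = {}                  # (DFA state, symbol) -> DFA state
    worklist = [start_set]      # discovered sets whose arcs are not built yet
    while worklist:
        T = worklist.pop()
        for a in alphabet:
            moved = set()
            for s in T:
                moved |= sym.get((s, a), set())
            U = frozenset(eps_closure(eps, moved))
            if not U:
                continue        # no transition: the dead state is left implicit
            if U not in ids:
                ids[U] = len(ids)
                worklist.append(U)
            dtran[(ids[T], a)] = ids[U]
    # A DFA state accepts if it contains at least one accepting NFA state.
    daccept = {i for S, i in ids.items() if S & accepting}
    return ids, dtran, daccept

ids, dtran, daccept = subset_construction(EPS_EDGES, SYM_EDGES, 0, {8}, "ab")
for nfa_states, number in sorted(ids.items(), key=lambda item: item[1]):
    print(number, sorted(nfa_states))
print(dtran)
print(daccept)
```

For this NFA the construction yields three DFA states, of which only the one containing NFA state 8 is accepting.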

Example
  NFA → DFA

Algorithm 3.39
    P = {F, S-F};
    do begin
        P0 = P;
        for each group G in P do begin
            partition G into subgroups such that two states s and t of G
            are in the same subgroup iff for all input symbols a, s and t
            have transitions on a to states in the same group;
            replace G in P by the set of all subgroups formed;
        end
        if (P == P0) return;
    end;
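The partition refinement loop can be written compactly in Python. The sketch below assumes a DFA with a total transition function stored as a dictionary, and it uses a hypothetical five-state DFA (A to E, accepting state E) as input; the function name `minimize` is also an assumption. Groups are compared against the partition from the start of each pass, which corresponds to P0 in the pseudocode.

```python
def minimize(states, alphabet, move, accepting):
    """Return the final partition of states into groups of equivalent states."""
    accepting = set(accepting)
    # P = {F, S - F}, dropping an empty group if there is one
    partition = [g for g in (accepting, set(states) - accepting) if g]
    while True:
        # Index of the group each state belongs to in the current partition
        group_of = {s: i for i, g in enumerate(partition) for s in g}
        refined = []
        for group in partition:
            subgroups = {}
            for s in sorted(group):
                # States stay together iff, for every symbol, their
                # transitions lead into the same group.
                key = tuple(group_of[move[(s, a)]] for a in alphabet)
                subgroups.setdefault(key, set()).add(s)
            refined.extend(subgroups.values())
        # Refinement only ever splits groups, so an unchanged count
        # means the partition itself is unchanged.
        if len(refined) == len(partition):
            return refined
        partition = refined

# Hypothetical DFA used as input for this sketch.
MOVE = {
    ("A", "a"): "B", ("A", "b"): "C",
    ("B", "a"): "B", ("B", "b"): "D",
    ("C", "a"): "B", ("C", "b"): "C",
    ("D", "a"): "B", ("D", "b"): "E",
    ("E", "a"): "B", ("E", "b"): "C",
}

groups = minimize("ABCDE", "ab", MOVE, {"E"})
print(sorted(sorted(g) for g in groups))
```

For this input, states A and C end up in the same group, so the minimized DFA has four states.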

Example
    State   a    b
    AC      B    AC
    B       B    D
    D       B    E
    E       B    AC

Construct a DFA Directly from a Regular Expression

Implementation Issues
▸ Input buffering
  – Read in characters one by one
    • Unable to look ahead
    • Inefficient
  – Read in a whole string and store it in memory
    • Requires a big buffer
  – Buffer pairs

Buffer Pairs

Use Sentinels
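The buffer-pair scheme with sentinels can be sketched as follows. This is a simplified illustration in Python, not the slides' implementation: the class name, the tiny half size of 4, and the choice of NUL as the sentinel are all assumptions, and only the forward pointer is modelled (no lexeme-begin pointer and no retraction).

```python
import io

EOF = "\0"   # sentinel; assumes NUL never occurs in the source text

class TwoBufferReader:
    """Two halves of n characters, each followed by a sentinel slot."""

    def __init__(self, stream, n=4):
        self.stream = stream
        self.n = n
        # layout: [half 0: n chars][EOF][half 1: n chars][EOF]
        self.buf = [EOF] * (2 * n + 2)
        self._load(0)
        self.forward = 0

    def _load(self, half):
        base = half * (self.n + 1)
        chunk = self.stream.read(self.n)
        for i, ch in enumerate(chunk):
            self.buf[base + i] = ch
        # A full chunk puts the sentinel in the slot after the half;
        # a short chunk puts it inside the half, marking the real end.
        self.buf[base + len(chunk)] = EOF

    def next_char(self):
        ch = self.buf[self.forward]
        if ch != EOF:                         # common case: one test per character
            self.forward += 1
            return ch
        if self.forward == self.n:            # sentinel after the first half
            self._load(1)
            self.forward = self.n + 1
            return self.next_char()
        if self.forward == 2 * self.n + 1:    # sentinel after the second half
            self._load(0)
            self.forward = 0
            return self.next_char()
        return None                           # sentinel inside a half: end of input

reader = TwoBufferReader(io.StringIO("hello world"))
chars = []
while (c := reader.next_char()) is not None:
    chars.append(c)
print("".join(chars))
```

Without the sentinel, every advance of the forward pointer would need two tests (end of half, then end of input). With it, the extra checks run only when the sentinel character is actually seen.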

Lexical Analyzer

Lex
▸ A tool for automatically generating lexical analyzers

Lex Specifications
    declarations
    %%
    translation rules
    %%
    auxiliary procedures

  Translation rules:
    p1    {action1}
    p2    {action2}
    ...
    pn    {actionn}

Lex Regular Expressions

    yylex(){
        switch (pattern_match()){
            case 1: {action1}
            case 2: {action2}
            ...
            case n: {actionn}
        }
    }
Example
    %{
    #include <stdlib.h>   /* needed for atoi and atof */
    %}
    DIGIT  [0-9]
    ID     [a-z][a-z0-9]*
    %%
    {DIGIT}+              {printf("An integer: %s(%d)\n", yytext, atoi(yytext)); }
    {DIGIT}+"."{DIGIT}*   {printf("A float: %s (%g)\n", yytext, atof(yytext)); }
    if|then|begin|end|procedure|function   {printf("A keyword: %s\n", yytext); }
    {ID}                  {printf("An identifier %s\n", yytext); }
    "+"|"-"|"*"|"/"       {printf("An operator %s\n", yytext); }
    "{"[^}\n]*"}"         {/* eat up one-line comments */}
    [ \t\n]+              {/* eat up white space */}
    .                     {printf("Unrecognized character: %s\n", yytext); }
    %%
    int main(int argc, char *argv[]){
        ++argv, --argc;              /* skip the program name */
        if (argc > 0)
            yyin = fopen(argv[0], "r");
        else
            yyin = stdin;
        yylex();
        return 0;
    }
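The way a generated scanner chooses among rules like those above can be imitated in a few lines. The Python sketch below is an approximation, not how Lex works internally: Lex compiles all patterns into a single automaton, whereas this code tries each pattern in turn with the `re` module. It follows the two usual Lex conventions, namely that the longest match wins and that ties go to the rule listed first. The rule table and the function name `yylex_sketch` are assumptions, and the comment rule is left out for brevity.

```python
import re

# (pattern, action label); None means "match and discard".
RULES = [
    (r"[0-9]+", "integer"),
    (r"[0-9]+\.[0-9]*", "float"),
    (r"if|then|begin|end|procedure|function", "keyword"),
    (r"[a-z][a-z0-9]*", "identifier"),
    (r"[+\-*/]", "operator"),
    (r"[ \t\n]+", None),
]
COMPILED = [(re.compile(pattern), action) for pattern, action in RULES]

def yylex_sketch(text):
    """Return (action label, lexeme) pairs for text."""
    results = []
    pos = 0
    while pos < len(text):
        best_len, best_action = 0, None
        for rx, action in COMPILED:
            m = rx.match(text, pos)
            # Strict ">" keeps the earliest rule when two matches tie in length.
            if m and m.end() - pos > best_len:
                best_len, best_action = m.end() - pos, action
        if best_len == 0:
            # No rule matched: consume one character, like the "." catch-all.
            results.append(("unrecognized", text[pos]))
            pos += 1
            continue
        if best_action is not None:
            results.append((best_action, text[pos:pos + best_len]))
        pos += best_len
    return results

for action, lexeme in yylex_sketch("if ifx + 12.5 ?"):
    print(action, repr(lexeme))
```

The input `if` matches both the keyword and identifier patterns at length 2, so the earlier keyword rule wins, while `ifx` is an identifier because that match is longer. One caveat: an alternation inside a single Python pattern is tried left to right, not longest first, which is harmless here only because no keyword is a prefix of another.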