FLEX: Fast Lexical Analyzer Generator
Adapted from material in: GNU Manual for Flex by Vern Paxson
CS 780 (Prasad), L5: Flex

Overview of Flex
• Scanner generator
• Interface with Parser
  Ø Scanner called as a subroutine when the parser needs the next token.

[Diagram: input.flex (flex-format input file) → Flex → lex.yy.c (yylex() routine); ytab.h (header file with definitions for tokens and types for token attributes)]
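The "scanner as a subroutine" interface can be sketched in plain C. This is only an illustration, not flex output: `next_token()` is a hand-written stand-in for `yylex()`, and the names `scan_string`, `parse_sum`, `token_value`, and the token codes are all assumptions made for this sketch. Single characters such as '+' are returned as their own character code.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes; a real project would take these from the
   parser-generated header (ytab.h in the slide). */
enum token { TOK_EOF = 0, TOK_NUMBER = 258 };

static const char *cursor = "";  /* hypothetical scanner input position */
static int token_value;          /* attribute of the last TOK_NUMBER */

/* Point the scanner at a NUL-terminated input string. */
void scan_string(const char *text)
{
    assert(text != NULL);
    cursor = text;
}

/* Stand-in for yylex(): skips whitespace and returns the next token code. */
int next_token(void)
{
    while (*cursor == ' ' || *cursor == '\t' || *cursor == '\n')
        ++cursor;
    if (*cursor == '\0')
        return TOK_EOF;
    if (isdigit((unsigned char)*cursor)) {
        token_value = 0;
        while (isdigit((unsigned char)*cursor))
            token_value = token_value * 10 + (*cursor++ - '0');
        return TOK_NUMBER;
    }
    return (unsigned char)*cursor++;  /* any other character is its own token */
}

/* Toy "parser": asks the scanner for one token at a time and sums the numbers. */
int parse_sum(const char *text)
{
    int sum = 0, tok;
    scan_string(text);
    while ((tok = next_token()) != TOK_EOF) {
        if (tok == TOK_NUMBER)
            sum += token_value;
        /* '+' and any other single-character token is ignored in this sketch */
    }
    return sum;
}
```

The point of the design is that the parser drives the loop: the scanner keeps only its own position and hands back one token per call.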

Flex input file format
• The flex input file consists of three sections, separated by a line with just `%%' in it:

    definitions
    %%
    rules
    %%
    user code

• Simple Example

    %%
    username    printf( "%s", getlogin() );

Definitions
• C Code
  Ø include files
  Ø global variables
• Named regular expressions defined
• Start Conditions defined (exclusive states, inclusive states)

    %{
    #include <stdio.h>
    %}
    DIGIT    [0-9]
    ID       [a-zA-Z][a-zA-Z0-9_]*
    %x INCOMMENT
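To make the two named patterns concrete, here is a minimal hand-written C sketch of what `DIGIT` and `ID` accept. The function names are hypothetical, and the range comparisons assume an ASCII-compatible character set; flex itself compiles such patterns into a table-driven automaton rather than code like this.

```c
#include <assert.h>
#include <stddef.h>

/* DIGIT  [0-9] */
static int is_digit_char(char c)
{
    return c >= '0' && c <= '9';
}

/* [a-zA-Z] (assumes ASCII-compatible encoding) */
static int is_letter(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* Length of the longest prefix of s matching
   ID  [a-zA-Z][a-zA-Z0-9_]*   (0 if s does not start with an ID). */
size_t match_id(const char *s)
{
    size_t n;
    assert(s != NULL);
    if (!is_letter(s[0]))
        return 0;
    n = 1;
    while (is_letter(s[n]) || is_digit_char(s[n]) || s[n] == '_')
        ++n;
    return n;
}
```

For example, `match_id("x_1+y")` consumes `x_1` and stops at `+`, while `match_id("9abc")` matches nothing because an ID cannot start with a digit.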

Rules
• This section contains a list of pairs of the form:

    pattern    action

  where the pattern must be unindented and the action must begin on the same line. The pattern ends at the first non-escaped whitespace character; the remainder of the line is its action.
• A pattern is an extended regular expression; an action is an arbitrary C statement.
  Ø If the action is empty, then when the pattern is matched, the input token is simply discarded.
  Ø If an input character matches no pattern, the default rule applies: the scanner copies that character to the output.

Auxiliary Routines
• The user code section is simply copied to `lex.yy.c' verbatim. It holds companion routines which call or are called by the scanner. This section is optional; if it is missing, the second `%%' in the input file may be skipped too.
• Start State
  Ø Mechanism for conditionally activating rules: any rule whose pattern is prefixed with "<sc>" will only be active when the scanner is in the start condition named "sc".

    <STRING>[^"]*    { /* eat up the string body... */ … }

A Simple Example

            int num_lines = 0, num_chars = 0;
    %%
    \n      ++num_lines; ++num_chars;
    .       ++num_chars;
    %%
    int main(void)
    {
        yylex();
        printf( "# of lines = %d, # of chars = %d\n", num_lines, num_chars );
        return 0;
    }
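For comparison, the two rules above amount to the following hand-written loop. This is a sketch of the equivalent logic only, not the code flex generates; the `struct counts` type and `count_text` function are names invented for the illustration, and it reads from a string rather than standard input.

```c
#include <assert.h>
#include <stddef.h>

struct counts { int lines; int chars; };

/* Counts newlines and characters in a NUL-terminated string,
   mirroring the two flex rules of the example. */
struct counts count_text(const char *text)
{
    struct counts c = {0, 0};
    assert(text != NULL);
    for (; *text != '\0'; ++text) {
        if (*text == '\n')
            ++c.lines;  /* rule "\n": ++num_lines (and ++num_chars below) */
        ++c.chars;      /* both rules increment num_chars */
    }
    return c;
}
```

Every character falls under exactly one of the two rules (`.` matches anything except a newline), which is why the character count is incremented unconditionally.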

Generating Scanner

    UNIX% flex count.flex
    UNIX% gcc lex.yy.c -lfl
    UNIX% a.out < count.flex
    # of lines= 12, # of characters= 250

• Using Cygwin tools on PC:

    W2K% flex count.flex
    W2K% gcc lex.yy.c -lfl
    W2K% ./a.exe < count.flex
    # of lines= 12, # of characters= 250

Start State Example
Here is a scanner which recognizes (and discards) C comments while maintaining a count of the current input line.

    %x comment
    %%
            int line_num = 1;

    "/*"                    BEGIN(comment);
    <comment>[^*\n]*        /* eat anything that's not a '*' */
    <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
    <comment>\n             ++line_num;
    <comment>"*"+"/"        BEGIN(INITIAL);
    %%

The five rules are referred to as Rule 1 through Rule 5, in the order listed.
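The effect of the exclusive start condition can be mimicked with an explicit two-state machine. The C below is a hand-written approximation under stated assumptions, not flex-generated code: `strip_comments` and the state names are hypothetical, input comes from a string, and non-comment text is copied to a caller-supplied buffer instead of standard output. Like the slide's rules, it counts only newlines that occur inside comments.

```c
#include <assert.h>
#include <stddef.h>

enum state { INITIAL_STATE, COMMENT_STATE };

/* Copies text to out with C comments removed and returns line_num as the
   slide's rules would compute it: 1 plus the newlines seen inside comments.
   out must have room for at least as many bytes as text, including the NUL. */
int strip_comments(const char *text, char *out)
{
    enum state st = INITIAL_STATE;
    int line_num = 1;
    size_t i = 0, o = 0;

    assert(text != NULL && out != NULL);
    while (text[i] != '\0') {
        if (st == INITIAL_STATE) {
            if (text[i] == '/' && text[i + 1] == '*') {
                st = COMMENT_STATE;        /* Rule 1: BEGIN(comment) */
                i += 2;
            } else {
                out[o++] = text[i++];      /* default rule: echo the character */
            }
        } else {
            if (text[i] == '*' && text[i + 1] == '/') {
                st = INITIAL_STATE;        /* Rule 5: BEGIN(INITIAL) */
                i += 2;
            } else {
                if (text[i] == '\n')
                    ++line_num;            /* Rule 4 */
                ++i;                       /* Rules 2 and 3: discard */
            }
        }
    }
    out[o] = '\0';
    return line_num;
}
```

Because the condition is exclusive, only the comment-state branch is consulted while inside a comment, so a `/*` appearing there has no special meaning and the comment ends at the first `*/`.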

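Output of “self-scan”
Running the comment scanner on its own source, comment.flex:

    %x comment
    %%
            int line_num = 1;

    "/*"                    BEGIN(comment);
    <comment>[^*\n]*        /* eat anything that's not a '*' */
    <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
    <comment>\n             ++line_num;
    <comment>"*"+"/"        BEGIN(INITIAL);
    %%

The scanner echoes everything outside comments and discards each span from a `/*' to the next `*/'. Here the first discarded span starts at the `/*' inside the Rule 1 pattern and runs to the end of the Rule 2 comment; the second is the Rule 3 comment.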

“Self-scanning” comment.flex

    %x comment
    %%
            int line_num = 1;

    "/*"                    BEGIN(comment);
    <comment>[^*\n]*        /* eat anything that's not a '*' */
    <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
    <comment>\n             ++line_num;
    <comment>"*"+"/"        BEGIN(INITIAL);
    %%

printRulesStr.flex

    %x comment
    %%
            int line_num = 1;
            printf(" INITIAL: Default ");

    "/*"                    {BEGIN(comment); printf(", R1 : |%s|, COMMENT : ", yytext); }
    <comment>[^*\n]*        printf(", R2 : |%s|", yytext);
    <comment>"*"+[^*/\n]*   printf(", R3 : |%s|", yytext);
    <comment>\n             printf(", R4 : |%s| \n, COMMENT: ", yytext);
    <comment>"*"+"/"        {BEGIN(INITIAL); printf(", R5 : |%s|, INITIAL : Default ", yytext); }
    \n                      printf("\n INITIAL : Default ");
    %%