Introduction to Lex YingHung Jiang compilercsie ntu edu

  • Slides: 20
Download presentation
Introduction to Lex Ying-Hung Jiang compiler@csie. ntu. edu. tw

Introduction to Lex Ying-Hung Jiang compiler@csie. ntu. edu. tw

Outlines l l l Review of scanner Introduction to flex Regular Expression Cooperate with

Outlines l l l Review of scanner Introduction to flex Regular Expression Cooperate with Yacc Resources

Review l l Parser -> scanner -> source code See teacher’s slide

Review l l Parser -> scanner -> source code See teacher’s slide

flex

flex

flex - fast lexical analyzer generator l l l Flex is a tool for

flex - fast lexical analyzer generator l l l Flex is a tool for generating scanners. Flex source is a table of regular expressions and corresponding program fragments. Generates lex. yy. c which defines a routine yylex()

Format of the Input File l The flex input file consists of three sections,

Format of the Input File l The flex input file consists of three sections, separated by a line with just %% in it: definitions %% rules %% user code

Definitions Section l l l The definitions section contains declarations of simple name definitions

Definitions Section l l l The definitions section contains declarations of simple name definitions to simplify the scanner specification. Name definitions have the form: name definition Example: DIGIT [0 -9] ID [a-z][a-z 0 -9]*

Rules Section l l The rules section of the flex input contains a series

Rules Section l l The rules section of the flex input contains a series of rules of the form: pattern action Example: {ID} printf( "An identifier: %sn", yytext ); l l The yytext and yylength variable. If action is empty, the matched token is discarded.

Action l l If the action contains a ‘{‘, the action spans till the

Action l l If the action contains a ‘{‘, the action spans till the balancing ‘}‘ is found, as in C. An action consisting only of a vertical bar ('|') means "same as the action for the next rule. “ The return statement, as in C. In case no rule matches: simply copy the input to the standard output (A default rule).

Precedence Problem l l l For example: a “<“ can be matched by “<“

Precedence Problem l l l For example: a “<“ can be matched by “<“ and “<=“. The one matching most text has higher precedence. If two or more have the same length, the rule listed first in the flex input has higher precedence.

User Code Section l l l The user code section is simply copied to

User Code Section l l l The user code section is simply copied to lex. yy. c verbatim. The presence of this section is optional; if it is missing, the second %% in the input file may be skipped. In the definitions and rules sections, any indented text or text enclosed in %{ and %} is copied verbatim to the output (with the %{}'s removed).

A Simple Example %{ int num_lines = 0, num_chars = 0; %} %% n.

A Simple Example %{ int num_lines = 0, num_chars = 0; %} %% n. ++num_lines; ++num_chars; %% main() { yylex(); printf( "# of lines = %d, # of chars = %dn", num_lines, num_chars ); }

Any Question So Far?

Any Question So Far?

Regular Expression

Regular Expression

Regular Expression (1/3) x. [xyz] match the character 'x' any character (byte) except newline

Regular Expression (1/3) x. [xyz] match the character 'x' any character (byte) except newline a "character class"; in this case, the pattern matches either an 'x', a 'y', or a 'z' [abj-o. Z] a "character class" with a range in it; matches an 'a', a 'b', any letter from 'j' through 'o', or a 'Z' [^A-Z] a "negated character class", i. e. , any character but those in the class. In this case, any character EXCEPT an uppercase letter. [^A-Zn] any character EXCEPT an uppercase letter or a newline

Regular Expression (2/3) r* r+ r? r{2, 5} r{2, } r{4} {name} zero or

Regular Expression (2/3) r* r+ r? r{2, 5} r{2, } r{4} {name} zero or more r's, where r is any regular expression one or more r's zero or one r's (that is, "an optional r") anywhere from two to five r's two or more r's exactly 4 r's the expansion of the "name" definition (see above) "[xyz]"foo“ the literal string: [xyz]"foo X if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v', then the ANSI-C interpretation of x. Otherwise, a literal 'X' (used to escape operators such as '*')

Regular Expression (3/3) � 123 x 2 a (r) rs r|s ^r r$ a

Regular Expression (3/3) 123 x 2 a (r) rs r|s ^r r$ a NUL character (ASCII code 0) the character with octal value 123 the character with hexadecimal value 2 a match an r; parentheses are used to override precedence (see below) the regular expression r followed by the regular expression s; called "concatenation" either an r or an s an r, but only at the beginning of a line (i. e. , which just starting to scan, or right after a newline has been scanned). an r, but only at the end of a line (i. e. , just before a newline). Equivalent to "r/n".

Cooperate with Yacc l l The y. tab. h file (generated by –d option).

Cooperate with Yacc l l The y. tab. h file (generated by –d option). See pascal. l and pascal. y for example.

Resources l l l Google directory of lexer and parser generators. Flex homepage: http:

Resources l l l Google directory of lexer and parser generators. Flex homepage: http: //www. gnu. org/software/flex Lex/yacc Win 32 port: http: //www. monmouth. com/~wstreett/lexyacc/lex-yacc. html Above links are available on course webpage. Flex(1)

Any Question?

Any Question?