Lex COP 3401, Fall 2009

What is Lex?
• A tool for building lexical analyzers (lexers)
• A lexer (scanner) performs lexical analysis: it breaks an input stream into meaningful units, or tokens.
• E.g., consider breaking a text file up into individual words.
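To make the "breaking a text file up into individual words" example concrete, here is a minimal hand-written sketch in C of the kind of scanning loop that Lex generates for you. The names next_word and count_words are hypothetical helpers made up for this illustration, and "word" is assumed to mean a run of letters.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: find the next word (a run of letters) in s,
   starting the search at *pos. On success, store the word's start
   offset and length, advance *pos past the word, and return 1.
   Return 0 when no word is left. */
int next_word(const char *s, size_t *pos, size_t *start, size_t *len)
{
    size_t i = *pos;

    /* Skip everything that cannot begin a word. */
    while (s[i] != '\0' && !isalpha((unsigned char)s[i]))
        i++;
    if (s[i] == '\0') {
        *pos = i;
        return 0;
    }

    /* Consume the run of letters: this is the token. */
    *start = i;
    while (isalpha((unsigned char)s[i]))
        i++;
    *len = i - *start;
    *pos = i;
    return 1;
}

/* Count the words in s by repeatedly asking for the next token. */
size_t count_words(const char *s)
{
    size_t pos = 0, start, len, n = 0;

    while (next_word(s, &pos, &start, &len))
        n++;
    return n;
}
```

Calling count_words on a line of text returns the number of letter runs in it. With Lex, the character-by-character loops are replaced by a pattern such as [a-zA-Z]+ and an action.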

Usage
Lex source program -> Lex -> lex.yy.c
lex.yy.c -> C compiler -> a.out
input -> a.out -> tokens

Lex & Yacc Together

Skeleton of a lex specification (.l file)
x.l (*.c is generated after running Lex on it)
%{
< C global variables, prototypes, comments >
%}
[DEFINITION SECTION]
%%
[RULES SECTION]
%%
< C auxiliary subroutines >
• %{ … %}: this part will be embedded into *.c
• Definition section: substitutions, code and start states; will be copied into *.c
• Rules section: defines how to scan and what action to take for each token
• Auxiliary subroutines: any user code. For example, a main function to call the scanning function yylex().

The rules section
%%
[RULES SECTION]
<pattern> { <action to take when matched> }
…
%%
Patterns are specified by regular expressions. For example:
%%
[A-Za-z]* { printf("this is a word"); }
%%


Regular Expression Basics
. : matches any single character except \n
* : matches 0 or more instances of the preceding regular expression
+ : matches 1 or more instances of the preceding regular expression
? : matches 0 or 1 of the preceding regular expression
| : matches the preceding or following regular expression
[ ] : defines a character class
( ) : groups enclosed regular expression into a new regular expression
"…" : matches everything within the " " literally

Lex Reg Exp (cont)
x|y : x or y
{i} : definition of i
x/y : x, only if followed by y (y not removed from input)
x{m,n} : m to n occurrences of x
^x : x, but only at beginning of line
x$ : x, but only at end of line
"s" : exactly what is in the quotes (except for "\" and following character)
A regular expression finishes with a space, tab or newline

Meta-characters
– meta-characters (do not match themselves, because they are used in the preceding reg exps):
• ( ) [ ] { } < > + / , ^ * | . \ " $ ? - %
– to match a meta-character, prefix with "\"
– to match a backslash, tab or newline, use \\, \t, or \n

Regular Expression Examples
• an integer: 12345
  [1-9][0-9]*
• a word: cat
  [a-zA-Z]+
• a (possibly) signed integer: 12345 or -12345
  [-+]?[1-9][0-9]*
• a floating point number: 1.2345
  [0-9]*"."[0-9]+
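The example patterns above can be tried out without Lex by using the POSIX <regex.h> interface from C. This is only a sketch under stated assumptions: full_match and the four constant names are hypothetical, the patterns are anchored so they must cover the whole string (Lex instead matches a prefix of the remaining input), and Lex's quoted "." is written as \. because POSIX extended regular expressions do not use Lex's double-quote syntax.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: return 1 if the whole string s matches the POSIX
   extended regular expression pat, 0 if it does not, and -1 if pat is
   too long or does not compile. */
int full_match(const char *pat, const char *s)
{
    char anchored[256];
    regex_t re;
    int rc;

    /* Anchor the pattern so that it must cover the entire string. */
    if (snprintf(anchored, sizeof anchored, "^(%s)$", pat)
            >= (int)sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* The slide's patterns, with Lex's "." rewritten as \. for POSIX. */
static const char *const INTEGER        = "[1-9][0-9]*";
static const char *const WORD           = "[a-zA-Z]+";
static const char *const SIGNED_INTEGER = "[-+]?[1-9][0-9]*";
static const char *const FLOATING       = "[0-9]*\\.[0-9]+";
```

For instance, full_match(INTEGER, "12345") succeeds, while full_match(INTEGER, "012") fails because the first digit must be in the range 1 to 9.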

Lex Regular Expressions
Lex uses an extended form of regular expression (c: character; x, y: regular expressions; s: string; m, n: integers; i: identifier):
1. c : any character except meta-characters (see the Meta-characters slide)
2. [...] : the list of enclosed chars (may be a range)
3. [^...] : the list of chars not enclosed
4. . : any ASCII char except newline
5. xy : concatenation of x and y
6. x* : same as the Kleene star x*
7. x+ : same as x⁺ (i.e. x* but not ε)
8. x? : an optional x (same as x|ε)

Rules
• Lex patterns only match a given input or string once
• Lex executes the action for the longest possible match for the current input
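The longest-match rule can be illustrated with a small hand-written sketch in C. It assumes two made-up rules, the literal keyword if followed by the identifier pattern [a-z]+, and the names pick_rule, match_if and match_identifier are hypothetical. When two rules match the same length, this sketch prefers the rule listed first, which is how Lex implementations conventionally break ties.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Rules in the order they would appear in the specification. */
enum rule { RULE_NONE, RULE_KEYWORD_IF, RULE_IDENTIFIER };

/* Length of the match for the literal pattern "if" at the start of s. */
static size_t match_if(const char *s)
{
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

/* Length of the match for [a-z]+ at the start of s. */
static size_t match_identifier(const char *s)
{
    size_t n = 0;

    while (islower((unsigned char)s[n]))
        n++;
    return n;
}

/* Try every rule at the start of s, pick the longest match, and store
   its length in *len. The strict '>' keeps the earlier rule on a tie. */
enum rule pick_rule(const char *s, size_t *len)
{
    size_t lens[] = { 0, match_if(s), match_identifier(s) };
    enum rule best = RULE_NONE;
    int r;

    *len = 0;
    for (r = RULE_KEYWORD_IF; r <= RULE_IDENTIFIER; r++) {
        if (lens[r] > *len) {
            *len = lens[r];
            best = (enum rule)r;
        }
    }
    return best;
}
```

On the input "ifdef x" the identifier rule wins with a 5-character match, even though "if" also matches at the same position. On "if (x)" both rules match 2 characters and the keyword rule is chosen.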

Regular Expression Examples
• a delimiter for an English sentence
  "."|"?"|! OR [.?!]
• C++ comment: // call foo() here!!
  "//".*
• white space
  [ \t]+
• English sentence: Look at this!
  ([ \t]+|[a-zA-Z]+)+("."|"?"|!)

Special Functions
• yytext – where text matched most recently is stored
• yyleng – number of characters in text most recently matched
• yylval – associated value of current token
• yymore() – append next string matched to current contents of yytext
• yyless(n) – remove from yytext all but the first n characters
• unput(c) – return character c to input stream
• yywrap() – may be replaced by user
  – The yywrap function is called by the lexical analyser whenever it inputs an EOF as the first character when trying to match a regular expression
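How yytext, yyleng and yyless(n) relate to each other may be easier to see in a toy model. The sketch below is not Lex's implementation: struct toy_scanner, toy_lex and toy_less are hypothetical stand-ins written for this illustration, and the only pattern the toy scanner knows is [a-zA-Z]+.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-ins for Lex's scanner state; a scanner generated
   by Lex defines and maintains its own yytext and yyleng. */
struct toy_scanner {
    const char *input;  /* the whole input */
    size_t pos;         /* index of the next unread character */
    char text[64];      /* plays the role of yytext */
    size_t leng;        /* plays the role of yyleng */
};

/* Skip non-letters, then match one run of letters ([a-zA-Z]+) into
   text. Return 1 on a match, 0 at end of input. */
int toy_lex(struct toy_scanner *sc)
{
    const char *in = sc->input;
    size_t n = 0;

    while (in[sc->pos] != '\0' && !isalpha((unsigned char)in[sc->pos]))
        sc->pos++;
    while (isalpha((unsigned char)in[sc->pos]) && n + 1 < sizeof sc->text)
        sc->text[n++] = in[sc->pos++];
    sc->text[n] = '\0';
    sc->leng = n;
    return n > 0;
}

/* Model of yyless(n): keep the first n matched characters and give the
   rest back to the input so that they are scanned again. */
void toy_less(struct toy_scanner *sc, size_t n)
{
    if (n >= sc->leng)
        return;
    sc->pos -= sc->leng - n;
    sc->text[n] = '\0';
    sc->leng = n;
}
```

With the input "foobar baz", the first toy_lex call leaves "foobar" in text with leng 6. After toy_less(&sc, 3), text holds "foo" and the next toy_lex call returns "bar", because the last three characters were returned to the input.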