LEX
Sung-Dong Kim, School of Computer Engineering, Hansung University



LEX
• Developed in 1975 by Lesk
• Input: regular expressions + action code (e.g., tiny.l)
• Output: C program (lex.yy.c or lexyy.c)
  • Defines the procedure yylex
  • Table-driven implementation of a DFA
• Flow: RE + action → Lex → scanner (C code)
• Compilation: cc -o scanner lex.yy.c -ll (use -lfl with flex)
(2017-1) Compiler
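To make "table-driven implementation of a DFA" concrete, here is a minimal hand-written sketch in C. It is not the code that Lex generates: the state names, the `classify` helper and the `match_number` function are hypothetical and chosen only for illustration, and the automaton recognizes just the pattern [0-9]+.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical DFA for [0-9]+ ; these are not the tables Lex emits. */
enum { S_START, S_INNUM, S_DEAD, N_STATES };
enum { C_DIGIT, C_OTHER, N_CLASSES };

/* Transition table: next_state[current state][character class]. */
static const int next_state[N_STATES][N_CLASSES] = {
    /* S_START */ { S_INNUM, S_DEAD },
    /* S_INNUM */ { S_INNUM, S_DEAD },
    /* S_DEAD  */ { S_DEAD,  S_DEAD },
};

/* Only S_INNUM is an accepting state. */
static const int accepting[N_STATES] = { 0, 1, 0 };

static int classify(char c)
{
    return isdigit((unsigned char)c) ? C_DIGIT : C_OTHER;
}

/* Returns the length of the longest prefix of s that matches [0-9]+,
   or 0 if no prefix matches. */
size_t match_number(const char *s)
{
    int state = S_START;
    size_t last_accept = 0;

    for (size_t i = 0; s[i] != '\0' && state != S_DEAD; i++) {
        state = next_state[state][classify(s[i])];
        if (accepting[state])
            last_accept = i + 1;   /* remember the latest accepting position */
    }
    return last_accept;
}
```

The scanning loop never changes when the pattern changes; only the tables do, which is why a generator can produce them mechanically from the regular expressions.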


LEX Convention (1)
• Metacharacters
• Quotes: the enclosed characters are matched literally
  • Optional for non-metacharacters: "if" is the same as if
  • Required for metacharacters: "("
• Backslash: escapes a single character
  • \(\* = "(*"
  • \n (newline), \t (tab)
• (aa|bb)(a|b)*c? = ("aa"|"bb")("a"|"b")*"c"?


LEX Convention (2)
• [...]: any one of the enclosed characters
  • [abxz]: any one of the characters a, b, x, z
  • (aa|bb)[ab]*c?
• Hyphen
  • Ranges of characters
  • [0-9]


LEX Convention (3)
• . (period)
  • Represents a set of characters
  • Any character except a newline
• ^
  • Complementary sets
  • [^0-9abc]: any character that is not a digit and is not one of the letters a, b, c


LEX Convention (4)
• Square brackets
  • Most metacharacters lose their special status inside them
  • [-+] == ("+"|"-")
  • [+-]: the hyphen may be read as a range starting at "+", so write the hyphen first
  • [."?]: any of the three characters . " ?
  • [\^\\]: ^ or \
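The bracket conventions (ranges, complement, hyphen placement) can be sketched as a small membership test in C. The function name `in_class` and its interface are hypothetical, it does not handle backslash escapes, and it is an illustration of the rules rather than how Lex itself evaluates a character class.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: cls is the text between [ and ], e.g. "^0-9abc".
   Supports a leading ^ (complement) and x-y ranges; no backslash escapes. */
bool in_class(const char *cls, char c)
{
    bool negate = false;
    bool found = false;
    size_t i = 0;

    if (cls[0] == '^') {          /* complementary set */
        negate = true;
        i = 1;
    }
    while (cls[i] != '\0') {
        if (cls[i + 1] == '-' && cls[i + 2] != '\0') {
            /* a range such as 0-9 */
            if (cls[i] <= c && c <= cls[i + 2])
                found = true;
            i += 3;
        } else {
            /* a single character; in this sketch a first or last
               hyphen ends up here and is treated as a literal */
            if (cls[i] == c)
                found = true;
            i += 1;
        }
    }
    return negate ? !found : found;
}
```

For instance, `in_class("^0-9abc", 'z')` is true while `in_class("^0-9abc", '7')` is false.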


LEX Convention (5)
• Curly brackets
  • Refer to names of regular expressions
• Regular definition:
    nat = [0-9]+
    signedNat = ("+"|"-")? nat
• Lex notation:
    nat        [0-9]+
    signedNat  ("+"|"-")?{nat}


Format of LEX Input (1)
• Input file = regular expressions + C code
• Definitions
  • Any C code that must be inserted external to any function: %{ … %}
  • Names of regular expressions
• Rules
  • Regular expressions + C code (actions)
• Auxiliary routines (optional)
  • C code + main program (if needed)


Format of LEX Input (2)
• Layout
    {definitions}
    %%
    {rules}
    %%
    {auxiliary routines}


Example 1: scanner that adds line numbers to text

    %{
    /* a Lex program that adds line numbers
       to lines of text, printing the new text
       to the standard output */
    #include <stdio.h>
    int lineno = 1;
    %}
    line .*\n
    %%
    {line} { printf("%5d %s", lineno++, yytext); }
    %%
    int main(void)
    { yylex(); return 0; }


Example 2: prints the count of replacements

    %{
    /* a Lex program that changes all numbers
       from decimal to hexadecimal notation,
       printing a summary statistic to stderr */
    #include <stdlib.h>
    #include <stdio.h>
    int count = 0;
    %}
    digit [0-9]
    number {digit}+
    %%
    {number} { int n = atoi(yytext);
               printf("%x", n);
               /* single-digit numbers look the same in hex */
               if (n > 9) count++; }
    %%
    int main(void)
    { yylex();
      fprintf(stderr, "number of replacements = %d", count);
      return 0;
    }
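For comparison, the effect of the {number} rule in Example 2 can be sketched by hand in plain C. The function `dec_to_hex` and its buffer-based interface are hypothetical (the Lex version reads yyin and writes yyout instead), and `strtoul` stands in for `atoi`.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the {number} rule: rewrites every run of
   decimal digits in `in` as hexadecimal into `out` (at most outsz bytes,
   NUL-terminated whenever outsz > 0). Returns how many numbers were
   greater than 9, i.e. how many changed appearance. */
int dec_to_hex(const char *in, char *out, size_t outsz)
{
    int count = 0;
    size_t o = 0;

    if (outsz == 0)
        return 0;

    while (*in != '\0' && o + 1 < outsz) {
        if (isdigit((unsigned char)*in)) {
            char *end;
            /* consumes the longest run of digits, like {digit}+ */
            unsigned long n = strtoul(in, &end, 10);
            int w = snprintf(out + o, outsz - o, "%lx", n);
            if (w < 0 || (size_t)w >= outsz - o)
                break;              /* not enough room left in out */
            o += (size_t)w;
            if (n > 9)
                count++;
            in = end;
        } else {
            out[o++] = *in++;       /* unmatched text is copied through */
        }
    }
    out[o] = '\0';
    return count;
}
```

Calling `dec_to_hex("a 10 b 7 c 255", buf, sizeof buf)` leaves `a a b 7 c ff` in `buf` and returns 2, since 7 is unchanged in hexadecimal.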


Example 3: prints all input lines that begin or end with the letter 'a'

    %{
    /* Selects only lines that end or
       begin with the letter 'a'.
       Deletes everything else. */
    #include <stdio.h>
    %}
    ends_with_a .*a\n
    begins_with_a a.*\n
    %%
    {ends_with_a} ECHO;
    {begins_with_a} ECHO;
    .*\n ;
    %%
    int main(void)
    { yylex(); return 0; }


Summary (1)
• Ambiguity resolution
  • Principle of longest substring: the rule that matches the longest string wins
  • Substrings of equal length: the rule listed first wins
  • No match: copy the next character to the output and continue
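The two ambiguity rules can be sketched as a loop over candidate matchers. Everything here is hypothetical and hand-written for illustration (the token codes, the three matcher functions and `next_token`); Lex resolves the same question inside its DFA rather than by trying rules one at a time.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule set in priority order, like the rules section of a
   Lex file: "if" is listed before the identifier pattern [a-z]+.
   The token codes deliberately equal the index of the rule. */
enum { TOK_IF, TOK_ID, TOK_NUM, TOK_NONE };

static size_t match_if(const char *s)
{
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

static size_t match_id(const char *s)
{
    size_t n = 0;
    while (islower((unsigned char)s[n])) n++;
    return n;
}

static size_t match_num(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

typedef size_t (*matcher)(const char *);
static const matcher rules[] = { match_if, match_id, match_num };

/* Picks the rule with the longest match. On a tie the earlier rule wins,
   because only a strictly longer match replaces the current best.
   Returns TOK_NONE with *len == 0 when no rule matches. */
int next_token(const char *s, size_t *len)
{
    int best = TOK_NONE;
    *len = 0;
    for (int r = 0; r < 3; r++) {
        size_t n = rules[r](s);
        if (n > *len) {
            *len = n;
            best = r;
        }
    }
    return best;
}
```

With this rule order, `if` is reported as TOK_IF (a tie of length 2, first rule wins), while `iffy` is reported as TOK_ID because the identifier match of length 4 is longer. When TOK_NONE comes back, a caller following the third bullet would copy one character to the output and try again.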


Summary (2)
• Insertion of C code
  • %{ … %}: copied exactly to the output
  • Auxiliary procedure section: copied exactly at the end of the output
  • Any code following a regular expression (action): inserted at the appropriate place in yylex


Lex Internal Names
• lex.yy.c or lexyy.c: Lex output file name
• yylex: Lex scanning routine
• yytext: string matched by the current action
• yyin: Lex input file (default: stdin)
• yyout: Lex output file (default: stdout)
• input: Lex buffered input routine
• ECHO: Lex default action (prints yytext to yyout)


LEX for TINY

    %{
    #include "globals.h"
    #include "util.h"
    #include "scan.h"
    /* lexeme of identifier or reserved word */
    char tokenString[MAXTOKENLEN+1];
    %}
    digit [0-9]
    number {digit}+
    letter [a-zA-Z]
    identifier {letter}+
    newline \n
    whitespace [ \t]
    %%


    "if"     { return IF; }
    "then"   { return THEN; }
    "else"   { return ELSE; }
    "end"    { return END; }
    "repeat" { return REPEAT; }
    "until"  { return UNTIL; }
    "read"   { return READ; }
    "write"  { return WRITE; }
    ":="     { return ASSIGN; }
    "="      { return EQ; }
    "<"      { return LT; }
    "+"      { return PLUS; }
    "-"      { return MINUS; }
    "*"      { return TIMES; }
    "/"      { return OVER; }
    "("      { return LPAREN; }
    ")"      { return RPAREN; }
    ";"      { return SEMI; }


    {number}     { return NUM; }
    {identifier} { return ID; }
    {newline}    { lineno++; }
    {whitespace} { /* skip whitespace */ }
    "{"          { /* skip a comment, still counting lines;
                      note: no check for end of input inside a comment */
                   char c;
                   do
                   { c = input();
                     if (c == '\n') lineno++;
                   } while (c != '}');
                 }
    .            { return ERROR; }
    %%
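The comment rule loops until it sees '}', so an unterminated comment would keep reading past the end of the input. A minimal sketch of the same loop with an end-of-input guard follows; it works on a NUL-terminated string instead of calling input(), and the function name `skip_comment` is hypothetical rather than part of the TINY scanner.

```c
/* Hypothetical helper: s points just past the opening '{'.
   Advances past the closing '}' while counting newlines in *lineno.
   If the comment is never closed, stops at the terminating NUL
   instead of looping forever. Returns the new position. */
const char *skip_comment(const char *s, int *lineno)
{
    while (*s != '\0') {
        char c = *s++;
        if (c == '\n')
            (*lineno)++;
        else if (c == '}')
            break;          /* end of comment */
    }
    return s;
}
```

Inside a real Lex action the equivalent guard would compare the result of input() against its end-of-input value, which differs between Lex implementations.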


    TokenType getToken(void)
    { static int firstTime = TRUE;
      TokenType currentToken;
      if (firstTime)
      { /* one-time setup: connect Lex to the source and listing files */
        firstTime = FALSE;
        lineno++;
        yyin = source;
        yyout = listing;
      }
      currentToken = yylex();
      strncpy(tokenString, yytext, MAXTOKENLEN);
      if (TraceScan) {
        fprintf(listing, "\t%d: ", lineno);
        printToken(currentToken, tokenString);
      }
      return currentToken;
    }


References
• Textbook
  • Lex & Yacc, 2nd Edition, John R. Levine, Tony Mason, Doug Brown, O'Reilly, 1992
• Examples
  • http://myweb.stedwards.edu/laurab/cosc4342/lexexamples.html