LEX & Yacc Sung-Dong Kim, Dept. of Computer Engineering, Hansung University

LEX
• Input: tiny.l
• Output: lex.yy.c or lexyy.c
  - Procedure yylex
    - Table-driven implementation of a DFA
    - Similar to getToken
• Flow: RE + action -> Lex -> scanner (C code)
(2011-1) Compiler
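The phrase "table-driven implementation of a DFA" can be pictured with a minimal C sketch. This is not the code Lex generates: the state names, the table layout, and the function accepts_number are assumptions made here to illustrate a DFA for the single pattern [0-9]+.

```c
#include <ctype.h>

/* Hypothetical DFA for the regular expression [0-9]+ .
   START = no input yet, IN_NUM = one or more digits seen (accepting),
   DEAD = a non-digit was seen. */
enum { START = 0, IN_NUM = 1, DEAD = 2, NSTATES = 3 };
enum { C_DIGIT = 0, C_OTHER = 1, NCLASSES = 2 };

/* transition[state][character class] gives the next state */
static const int transition[NSTATES][NCLASSES] = {
    /* START  */ { IN_NUM, DEAD },
    /* IN_NUM */ { IN_NUM, DEAD },
    /* DEAD   */ { DEAD,   DEAD },
};

static const int accepting[NSTATES] = { 0, 1, 0 };

/* Returns 1 if the whole string s is matched by [0-9]+, else 0. */
int accepts_number(const char *s)
{
    int state = START;
    for (; *s != '\0'; s++) {
        int cls = isdigit((unsigned char)*s) ? C_DIGIT : C_OTHER;
        state = transition[state][cls];   /* one table lookup per character */
    }
    return accepting[state];
}
```

The loop never inspects the pattern itself; all knowledge of the regular expression lives in the two tables, which is what lets a generator emit the same driver for any set of rules.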

LEX Convention (1)
• Metacharacters
  - Quotes: the enclosed characters are matched literally
    - Optional for non-metacharacters: "if" is the same as if
    - Required for metacharacters: "("
  - Backslash: escapes the next character
    - \(\* = "(*"
    - \n, \t: newline, tab
  - (aa|bb)(a|b)*c? = ("aa"|"bb")("a"|"b")*"c"?

LEX Convention (2)
• [...]: any one of the enclosed characters
  - [abxz]: any one of the characters a, b, x, z
  - (aa|bb)[ab]*c?
• Hyphen
  - Ranges of characters
  - [0-9]

LEX Convention (3)
• . (period)
  - Represents a set of characters
  - Any character except a newline
• ^ (as the first character inside square brackets)
  - Complementary sets
    - [^0-9abc]: any character that is not a digit and is not one of the letters a, b, c
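As a rough illustration of the bracket conventions, the character classes [abxz], [0-9], and [^0-9abc] can be written as plain C predicates. The function names are hypothetical, and this only sketches what each class matches, not how Lex implements it.

```c
/* [abxz]: any one of the listed characters */
int in_abxz(char c)
{
    return c == 'a' || c == 'b' || c == 'x' || c == 'z';
}

/* [0-9]: a range written with a hyphen */
int in_digit_range(char c)
{
    return c >= '0' && c <= '9';
}

/* [^0-9abc]: the complement, so neither a digit nor one of a, b, c */
int in_not_digit_abc(char c)
{
    return !(in_digit_range(c) || c == 'a' || c == 'b' || c == 'c');
}
```

Unlike the period, a complemented class such as [^0-9abc] also matches a newline, because a newline is simply one more character outside the listed set.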

LEX Convention (4)
• Square brackets
  - Most metacharacters lose their special status
  - [-+] == ("+"|"-")
  - [+-]: may be read as a range that starts at "+", so write the hyphen first
  - [."?]: any of the three characters ., ", ?
  - [\^\\]: ^ or \

LEX Convention (5)
• Curly brackets
  - Names of regular expressions
  - Regular definitions:
      nat = [0-9]+
      signedNat = ("+"|"-")? nat
  - The same definitions in Lex:
      nat        [0-9]+
      signedNat  ("+"|"-")?{nat}
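The way {nat} reuses an earlier definition resembles one C function calling another. The sketch below assumes two hypothetical helpers, match_nat and match_signed_nat, that return how many characters of the input the corresponding definition would match.

```c
/* nat = [0-9]+ : number of leading characters of s matched (0 if none) */
int match_nat(const char *s)
{
    int n = 0;
    while (s[n] >= '0' && s[n] <= '9')
        n++;
    return n;
}

/* signedNat = ("+"|"-")?{nat} : optional sign, then reuse match_nat,
   just as {nat} reuses the named regular expression */
int match_signed_nat(const char *s)
{
    int sign = (s[0] == '+' || s[0] == '-') ? 1 : 0;
    int digits = match_nat(s + sign);
    return digits > 0 ? sign + digits : 0;   /* a sign alone does not match */
}
```

For example, match_signed_nat applied to "-42x" consumes the sign and two digits and stops at the x.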

Format of LEX Input (1)
• Input file = regular expressions + C code
  - Definitions
    - Any C code that must be inserted external to any function, enclosed in %{ … %}
    - Names of regular expressions
  - Rules
    - Regular expressions + C code (actions)
  - Auxiliary routines (optional)
    - C code + main program (if needed)

Format of LEX Input (2)
• Layout
    {definitions}
    %%
    {rules}
    %%
    {auxiliary routines}

Example 1: scanner that adds line numbers to text

    %{
    /* a Lex program that adds line numbers to lines of text,
       printing the new text to the standard output */
    #include <stdio.h>
    int lineno = 1;
    %}
    line .*\n
    %%
    {line} { printf("%5d %s", lineno++, yytext); }
    %%
    int main(void)
    {
        yylex();
        return 0;
    }

Example 2: converts decimal numbers to hexadecimal and prints the number of replacements

    %{
    /* a Lex program that changes all numbers from decimal to
       hexadecimal notation, printing a summary statistic to stderr */
    #include <stdlib.h>
    #include <stdio.h>
    int count = 0;
    %}
    digit  [0-9]
    number {digit}+
    %%
    {number} { int n = atoi(yytext);
               printf("%x", n);
               /* single digits look the same in hex, so they are not counted */
               if (n > 9) count++; }
    %%

Example 2 (continued)

    int main(void)
    {
        yylex();
        fprintf(stderr, "number of replacements = %d", count);
        return 0;
    }
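For comparison, the same replacement can be sketched in plain C without Lex. The function decimal_to_hex and its buffer-based interface are assumptions for illustration; it uses strtoul where the Lex action uses atoi, and it returns the replacement count instead of printing it to stderr.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Copies in to out (capacity outsz), rewriting every run of decimal digits
   in hexadecimal. Returns how many numbers greater than 9 were rewritten,
   matching the count kept by the Lex action. */
int decimal_to_hex(const char *in, char *out, size_t outsz)
{
    int count = 0;
    size_t used = 0;

    if (outsz == 0)
        return 0;
    out[0] = '\0';
    while (*in != '\0' && used + 1 < outsz) {
        if (isdigit((unsigned char)*in)) {
            char *end;
            unsigned long n = strtoul(in, &end, 10);  /* longest digit run */
            int w = snprintf(out + used, outsz - used, "%lx", n);
            if (w < 0 || (size_t)w >= outsz - used)
                break;                                /* output buffer full */
            used += (size_t)w;
            if (n > 9)
                count++;
            in = end;
        } else {
            out[used++] = *in++;                      /* default: copy through */
            out[used] = '\0';
        }
    }
    return count;
}
```

The else branch plays the role of Lex's default action, which copies unmatched characters to the output unchanged.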

Example 3: prints all input lines that begin or end with the letter 'a'

    %{
    /* Selects only lines that end or begin with the letter 'a'.
       Deletes everything else. */
    #include <stdio.h>
    %}
    ends_with_a   .*a\n
    begins_with_a a.*\n
    %%
    {ends_with_a}   ECHO;
    {begins_with_a} ECHO;
    .*\n            ;
    %%
    int main(void)
    {
        yylex();
        return 0;
    }
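The selection made by the three rules can be restated as a single C predicate. The name keep_line is hypothetical, and the sketch assumes the caller already splits the input into lines, which the Lex patterns do implicitly by ending in a newline.

```c
#include <string.h>

/* Returns 1 if the line (with or without a trailing '\n') begins or ends
   with the letter 'a', mirroring the two ECHO rules; 0 means the line
   falls through to the last rule and is deleted. */
int keep_line(const char *line)
{
    size_t len = strlen(line);

    if (len > 0 && line[len - 1] == '\n')
        len--;                          /* ignore the newline itself */
    if (len == 0)
        return 0;                       /* empty line: deleted */
    return line[0] == 'a' || line[len - 1] == 'a';
}
```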

Summary (1)
• Ambiguity resolution
  - Principle of longest substring: the rule that matches the longest string wins
  - Substrings of equal length: the rule listed first wins
  - No match: the next character is copied to the output and scanning continues
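The two tie-breaking principles can be sketched in C as follows. The three rules (the keyword if, an identifier, a number) and the function pick_rule are assumptions chosen for illustration; a real Lex scanner reaches the same decision with a single DFA rather than by trying each rule in turn.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical rule list, in the order the rules would appear in the file */
enum { RULE_IF = 0, RULE_ID = 1, RULE_NUM = 2, NRULES = 3, NO_RULE = -1 };

/* Each matcher returns the length of the prefix of s it matches (0 = none) */
static int match_if(const char *s)
{
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

static int match_id(const char *s)
{
    int n = 0;
    while (isalpha((unsigned char)s[n])) n++;
    return n;
}

static int match_num(const char *s)
{
    int n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

/* Chooses the rule for the text at s and stores the match length in *len.
   Returns NO_RULE when nothing matches (Lex would copy one character). */
int pick_rule(const char *s, int *len)
{
    int (*const rules[NRULES])(const char *) = { match_if, match_id, match_num };
    int best = NO_RULE;

    *len = 0;
    for (int i = 0; i < NRULES; i++) {
        int n = rules[i](s);
        if (n > *len) {     /* strictly longer wins; a tie keeps the earlier rule */
            *len = n;
            best = i;
        }
    }
    return best;
}
```

With this ordering the input if selects the keyword rule because both candidates match two characters, while ifx selects the identifier rule because it matches three.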

Summary (2)
• Insertion of C code
  - %{ … %}: copied verbatim into the output file
  - Auxiliary routines section: copied verbatim to the end of the output file
  - Code following a regular expression (action): inserted at the appropriate place in yylex

Lex Internal Names
• lex.yy.c or lexyy.c: Lex output file name
• yylex: Lex scanning routine
• yytext: string matched by the current action
• yyin: Lex input file (default: stdin)
• yyout: Lex output file (default: stdout)
• input: Lex buffered input routine
• ECHO: Lex default action (prints yytext to yyout)

LEX for TINY

    %{
    #include "globals.h"
    #include "util.h"
    #include "scan.h"
    /* lexeme of identifier or reserved word */
    char tokenString[MAXTOKENLEN+1];
    %}

    digit       [0-9]
    number      {digit}+
    letter      [a-zA-Z]
    identifier  {letter}+
    newline     \n
    whitespace  [ \t]

    %%

LEX for TINY (continued)

    "if"      { return IF; }
    "then"    { return THEN; }
    "else"    { return ELSE; }
    "end"     { return END; }
    "repeat"  { return REPEAT; }
    "until"   { return UNTIL; }
    "read"    { return READ; }
    "write"   { return WRITE; }
    ":="      { return ASSIGN; }
    "="       { return EQ; }
    "<"       { return LT; }
    "+"       { return PLUS; }
    "-"       { return MINUS; }
    "*"       { return TIMES; }
    "/"       { return OVER; }
    "("       { return LPAREN; }
    ")"       { return RPAREN; }
    ";"       { return SEMI; }

LEX for TINY (continued)

    {number}      { return NUM; }
    {identifier}  { return ID; }
    {newline}     { lineno++; }
    {whitespace}  { /* skip whitespace */ }
    "{"           { /* skip a comment, still counting its lines */
                    char c;
                    do
                    { c = input();
                      if (c == '\n') lineno++;
                    } while (c != '}');
                  }
    .             { return ERROR; }

    %%

LEX for TINY (continued)

    TokenType getToken(void)
    {
        static int firstTime = TRUE;
        TokenType currentToken;
        if (firstTime) {
            /* one-time setup: connect the scanner to the source and listing files */
            firstTime = FALSE;
            lineno++;
            yyin = source;
            yyout = listing;
        }
        currentToken = yylex();
        strncpy(tokenString, yytext, MAXTOKENLEN);
        if (TraceScan) {
            fprintf(listing, "\t%d: ", lineno);
            printToken(currentToken, tokenString);
        }
        return currentToken;
    }

YACC
• LALR(1) parser generator
• Flow: syntax spec. -> parser generator -> parser
• Yet Another Compiler-Compiler

YACC Basics (1)
• Input/output
  - filename.y -> Yacc -> y.tab.c, ytab.c, or filename.tab.c
• Specification file format
    {definitions}
    %%
    {rules}
    %%
    {auxiliary routines}

YACC Basics (2)
  - Definitions
    - Information about tokens, data types, grammar rules
    - C code that goes into the output file
  - Rules
    - Modified BNF format
    - C code (actions)
  - Auxiliary routines
    - Procedure and function declarations
    - main() calls yyparse(), which calls yylex()

Yacc example: integer calculator

    %{
    #include <stdio.h>
    #include <ctype.h>
    %}

    %token NUMBER

    %%

    command : exp               { printf("%d\n", $1); }
            ;

    exp     : exp '+' term      { $$ = $1 + $3; }
            | exp '-' term      { $$ = $1 - $3; }
            | term              /* default action: $$ = $1 */
            ;

    term    : term '*' factor   { $$ = $1 * $3; }
            | factor            /* default action: $$ = $1 */
            ;

    factor  : NUMBER            { $$ = $1; }
            | '(' exp ')'       { $$ = $2; }
            ;

    %%

Yacc example: auxiliary routines

    int main(void)
    {
        return yyparse();
    }

    int yylex(void)
    {
        int c;
        while ((c = getchar()) == ' ')
            ;                          /* skip blanks */
        if (isdigit(c)) {
            ungetc(c, stdin);
            scanf("%d", &yylval);
            return NUMBER;
        }
        if (c == '\n') return 0;       /* stop parsing */
        return c;
    }

    void yyerror(char *s)
    {
        fprintf(stderr, "%s\n", s);    /* print the error message */
    }

YACC Options (1)
• -d
  - Header file generation
  - yacc -d filename.y
    - Produces y.tab.h, ytab.h, or filename.tab.h
  - Other files
    - #include "y.tab.h"
    - Call yylex()

YACC Options (2)
• -v
  - Verbose option
  - yacc -v filename.y
    - Produces y.output

Contents of y.output for the calculator grammar

    state 0
        $accept : _command $end

        NUMBER  shift 5
        (  shift 6
        .  error

        command  goto 1
        exp  goto 2
        term  goto 3
        factor  goto 4

    state 1
        $accept : command_$end

        $end  accept
        .  error

    state 2
        command : exp_    (1)
        exp : exp_+ term
        exp : exp_- term

        +  shift 7
        -  shift 8
        .  reduce 1

    state 3
        exp : term_    (4)
        term : term_* factor

        *  shift 9
        .  reduce 4

    state 4
        term : factor_    (6)

        .  reduce 6

    state 5
        factor : NUMBER_    (7)

        .  reduce 7

    state 6
        factor : (_exp )

        NUMBER  shift 5
        (  shift 6
        .  error

        exp  goto 10
        term  goto 3
        factor  goto 4

    state 7
        exp : exp +_term

        NUMBER  shift 5
        (  shift 6
        .  error

        term  goto 11
        factor  goto 4

    state 8
        exp : exp -_term

        NUMBER  shift 5
        (  shift 6
        .  error

        term  goto 12
        factor  goto 4

    state 9
        term : term *_factor

        NUMBER  shift 5
        (  shift 6
        .  error

        factor  goto 13

    state 10
        exp : exp_+ term
        exp : exp_- term
        factor : ( exp_)

        +  shift 7
        -  shift 8
        )  shift 14
        .  error

    state 11
        exp : exp + term_    (2)
        term : term_* factor

        *  shift 9
        .  reduce 2

    state 12
        exp : exp - term_    (3)
        term : term_* factor

        *  shift 9
        .  reduce 3

    state 13
        term : term * factor_    (5)

        .  reduce 5

    state 14
        factor : ( exp )_    (8)

        .  reduce 8

    8/127 terminals, 4/600 nonterminals
    9/300 grammar rules, 15/1000 states
    0 shift/reduce, 0 reduce/reduce conflicts reported
    9/601 working sets used
    memory: states, etc. 36/2000, parser 11/4000
    9/601 distinct lookahead sets
    6 extra closures
    18 shift entries, 1 exceptions
    8 goto entries
    4 entries saved by goto default
    Optimizer space used: input 50/2000, output 218/4000
    218 table entries, 202 zero
    maximum spread: 257, maximum offset: 43
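To see how a parser uses these 15 states, the listing can be transcribed by hand into two C arrays and driven by a small shift-reduce loop. This is a sketch under assumptions, not the parser Yacc emits: the table encoding, the token codes, and the names next_token and lr_eval are invented here, the input is read from a string instead of stdin, and the result is returned rather than printed.

```c
#include <ctype.h>

enum { T_NUM, T_PLUS, T_MINUS, T_TIMES, T_LP, T_RP, T_END, NTOK };
enum { N_COMMAND, N_EXP, N_TERM, N_FACTOR, NNT };
enum { ACC = 100, MAXDEPTH = 64 };

/* action[state][token]: s > 0 shift to state s, -r reduce by rule r,
   ACC accept, 0 error. A "." default reduction fills the rest of its row. */
static const int action[15][NTOK] = {
    /*         NUM   +   -   *   (   )  $end */
    /*  0 */ {   5,  0,  0,  0,  6,  0,   0 },
    /*  1 */ {   0,  0,  0,  0,  0,  0, ACC },
    /*  2 */ {  -1,  7,  8, -1, -1, -1,  -1 },
    /*  3 */ {  -4, -4, -4,  9, -4, -4,  -4 },
    /*  4 */ {  -6, -6, -6, -6, -6, -6,  -6 },
    /*  5 */ {  -7, -7, -7, -7, -7, -7,  -7 },
    /*  6 */ {   5,  0,  0,  0,  6,  0,   0 },
    /*  7 */ {   5,  0,  0,  0,  6,  0,   0 },
    /*  8 */ {   5,  0,  0,  0,  6,  0,   0 },
    /*  9 */ {   5,  0,  0,  0,  6,  0,   0 },
    /* 10 */ {   0,  7,  8,  0,  0, 14,   0 },
    /* 11 */ {  -2, -2, -2,  9, -2, -2,  -2 },
    /* 12 */ {  -3, -3, -3,  9, -3, -3,  -3 },
    /* 13 */ {  -5, -5, -5, -5, -5, -5,  -5 },
    /* 14 */ {  -8, -8, -8, -8, -8, -8,  -8 },
};

/* go_to[state][nonterminal]; only states 0 and 6 to 9 have goto entries */
static const int go_to[15][NNT] = {
    [0] = { 1, 2, 3, 4 },  [6] = { 0, 10, 3, 4 }, [7] = { 0, 0, 11, 4 },
    [8] = { 0, 0, 12, 4 }, [9] = { 0, 0, 0, 13 },
};

/* rule number -> left-hand side and right-hand-side length (index 0 unused) */
static const int lhs[9] = { 0, N_COMMAND, N_EXP, N_EXP, N_EXP,
                            N_TERM, N_TERM, N_FACTOR, N_FACTOR };
static const int rhs_len[9] = { 0, 1, 3, 3, 1, 3, 1, 1, 3 };

/* Reads one token from *p; returns -1 for an unknown character. */
static int next_token(const char **p, int *val)
{
    while (**p == ' ') (*p)++;
    if (**p == '\0') return T_END;
    if (isdigit((unsigned char)**p)) {
        *val = 0;
        while (isdigit((unsigned char)**p)) {
            *val = *val * 10 + (**p - '0');
            (*p)++;
        }
        return T_NUM;
    }
    switch (*(*p)++) {
    case '+': return T_PLUS;
    case '-': return T_MINUS;
    case '*': return T_TIMES;
    case '(': return T_LP;
    case ')': return T_RP;
    default:  return -1;
    }
}

/* Parses s with the tables above. Returns 1 and stores the value of the
   expression in *result on success, 0 on a syntax error. */
int lr_eval(const char *s, int *result)
{
    int states[MAXDEPTH], values[MAXDEPTH], top = 0, tokval = 0;
    int tok = next_token(&s, &tokval);

    states[0] = 0;
    values[0] = 0;
    for (;;) {
        int act;
        if (tok < 0) return 0;
        act = action[states[top]][tok];
        if (act == ACC) { *result = values[top]; return 1; }
        if (act == 0) return 0;
        if (act > 0) {                            /* shift */
            if (top + 1 >= MAXDEPTH) return 0;
            states[++top] = act;
            values[top] = tokval;
            tok = next_token(&s, &tokval);
        } else {                                  /* reduce by rule -act */
            int r = -act, len = rhs_len[r], v;
            const int *rhs = &values[top - len + 1];   /* rhs[0] is $1 */
            switch (r) {
            case 2:  v = rhs[0] + rhs[2]; break;
            case 3:  v = rhs[0] - rhs[2]; break;
            case 5:  v = rhs[0] * rhs[2]; break;
            case 8:  v = rhs[1]; break;
            default: v = rhs[0]; break;           /* $$ = $1 */
            }
            top -= len;                           /* pop the right-hand side */
            states[top + 1] = go_to[states[top]][lhs[r]];
            values[++top] = v;
        }
    }
}
```

Keeping a value stack in step with the state stack is what gives the $1, $2, $3 notation its meaning: at a reduction, the values of the right-hand-side symbols sit in the top entries of the stack and are replaced by the single value $$.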