flex and bison
CSE 4102
Prof. Steven A. Demurjian
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Way, Unit 2155
Storrs, CT 06269-3155
steve@engr.uconn.edu
http://www.engr.uconn.edu/~steve
(860) 486-4818

Flex and Bison
- Two Compiler Writing Tools that are Utilized to Easily Specify:
  - Lexical Tokens and their Order of Processing (Lex)
  - Context Free Grammar for LALR(1) (Yacc)
- Both Lex and Yacc have a Long History in Computing
  - Lex and Yacc: Earliest Days of Unix Minicomputers
  - Flex and Bison: From GNU
  - JFlex: Fast Scanner Generator for Java
  - BYacc/J: Berkeley
  - CUP, ANTLR, PCYACC, ...
  - PCLEX and PCYACC from Abacus
- See: http://dinosaur.compilertools.net/

Lex – A Lexical Analyzer Generator
- A Unix Utility from the Early 1970s
- A Compiler that Takes as Source a Specification for:
  - Tokens/Patterns of a Language
  - Generates a "C" Lexical Analyzer Program
- Pictorially:

    Lex Source Program (lex.l)  -->  Lex Compiler  -->  lex.yy.c
    lex.yy.c                    -->  C Compiler    -->  a.out
    Input stream                -->  a.out         -->  Sequence of tokens

Format of a Lexical Specification – 3 Parts
- Declarations:
  - Defs, Constants, Types, #includes, etc. that can Occur in a C Program
  - Regular Definitions (expressions)
- Translation Rules:
  - Pairs of (Regular Expression, Action)
  - Informs Lexical Analyzer of Action when Pattern is Recognized
- Auxiliary Procedures:
  - Designer Defined C Code
  - Can Replace System Calls
- lex.l File Format:

    DECLARATIONS
    %%
    TRANSLATION RULES
    %%
    AUXILIARY PROCEDURES

- See Also
  - http://www.cs.fsu.edu/~langley/COP4342-2006-Fall/17programdevel04.pdf
  - http://alumni.cs.ucr.edu/~lgao/teaching/flex.html

Example lex.l File

User Defined Values to Each Token (else lex will assign):

    %{
    #define T_IDENTIFIER 300
    #define T_INTEGER    301
    #define T_REAL       302
    #define T_STRING     303
    #define T_ASSIGN     304
    #define T_ELSE       305
    #define T_IF         306
    #define T_THEN       307
    #define T_EQ         308
    #define T_LT         309
    #define T_NE         310
    #define T_GE         311
    #define T_GT         312
    %}

Regular Expression Rules for later token definitions:

    letter    [a-zA-Z]
    digit     [0-9]
    ws        [ \t\n]+
    id        [A-Za-z][A-Za-z0-9]*
    comment   "(*"([^*]|\n|"*"+[^)])*"*"+")"
    integer   [0-9]+/([^0-9]|"..")
    real      [0-9]+"."[0-9]*([0-9]|"E"[+-]?[0-9]+)
    string    '([^']|'')*'
    %%

Token Definitions:

    ":="      {printf(" %s ", yytext); return(T_ASSIGN);}
    "else"    {printf(" %s ", yytext); return(T_ELSE);}

Example lex.l File

Conditional compilation action:

    "then"    {
    #ifdef PRNTFLG
               printf(" %s ", yytext);
    #endif
               return(T_THEN);
              }

Token Definitions:

    "<="      {printf(" %s ", yytext); return(T_EQ);}
    "<"       {printf(" %s ", yytext); return(T_LT);}
    "<>"      {printf(" %s ", yytext); return(T_NE);}
    ">="      {printf(" %s ", yytext); return(T_GE);}
    ">"       {printf(" %s ", yytext); return(T_GT);}
    {id}      {printf(" %s ", yytext); return(T_IDENTIFIER);}
    {integer} {printf(" %s ", yytext); return(T_INTEGER);}
    {real}    {printf(" %s ", yytext); return(T_REAL);}
    {string}  {printf(" %s ", yytext); return(T_STRING);}

Discard:

    {comment} {/* T_COMMENT */}
    {ws}      {/* spaces, tabs, newlines */}
    %%

EOF for input (yywrap returns 1 when there is no further input):

    yywrap() { return 1; }

    main() {
        int i;
        do {
            i = yylex();
        } while (i != 0);
    }

Three Variables:
    yytext = "currenttoken"
    yyleng = 12
    yylval = 300

What is Wrong with the Following?

    letter    [a-zA-Z]
    digit     [0-9]
    ws        [ \t\n]+
    id        [A-Za-z][A-Za-z0-9]*
    comment   "(*"([^*]|\n|"*"+[^)])*"*"+")"
    integer   [0-9]+/([^0-9]|"..")
    real      [0-9]+"."[0-9]*([0-9]|"E"[+-]?[0-9]+)
    string    '([^']|'')*'
    %%
    {id}      {printf(" %s ", yytext); return(T_IDENTIFIER);}
    {integer} {printf(" %s ", yytext); return(T_INTEGER);}
    {real}    {printf(" %s ", yytext); return(T_REAL);}
    {string}  {printf(" %s ", yytext); return(T_STRING);}
    {comment} {/* T_COMMENT */}
    {ws}      {/* spaces, tabs, newlines */}
    ":="      {printf(" %s ", yytext); return(T_ASSIGN);}
    "else"    {printf(" %s ", yytext); return(T_ELSE);}
    "then"    {printf(" %s ", yytext); return(T_THEN);}
    "<="      {printf(" %s ", yytext); return(T_EQ);}
    "<"       {printf(" %s ", yytext); return(T_LT);}
    "<>"      {printf(" %s ", yytext); return(T_NE);}
    ">="      {printf(" %s ", yytext); return(T_GE);}
    ">"       {printf(" %s ", yytext); return(T_GT);}
    %%

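Lex resolves competing rules by taking the longest match and, when two rules match the same length, the rule listed first. The sketch below is a small hand-written C model of that selection step, not code that lex generates. The names `rule_fn`, `match_identifier`, `match_else`, and `select_rule` are hypothetical helpers introduced only for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A rule is modeled as a function that returns the length of the
   input prefix it matches (0 means no match). */
typedef size_t (*rule_fn)(const char *input);

/* Models {id}: [A-Za-z][A-Za-z0-9]* */
size_t match_identifier(const char *input)
{
    size_t len = 0;
    if (!isalpha((unsigned char)input[0]))
        return 0;
    while (isalnum((unsigned char)input[len]))
        len++;
    return len;
}

/* Models the literal rule "else". */
size_t match_else(const char *input)
{
    return strncmp(input, "else", 4) == 0 ? 4 : 0;
}

/* Longest match wins. On a tie the earlier rule wins, because only a
   strictly longer match replaces the current best.
   Returns the index of the selected rule, or -1 if nothing matches. */
int select_rule(const rule_fn rules[], int count, const char *input)
{
    int best = -1;
    size_t best_len = 0;
    for (int i = 0; i < count; i++) {
        size_t len = rules[i](input);
        if (len > best_len) {
            best = i;
            best_len = len;
        }
    }
    return best;
}
```

With the rules ordered `{match_identifier, match_else}`, the input `else` selects index 0, the identifier rule, so the keyword token is never returned. With the keyword rule first, `else` selects the keyword, while `elsewhere` still selects the identifier rule because its match is longer.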
Other Possible Actions

    %%
    ":="      {return(T_ASSIGN);}
    "else"    {return(T_ELSE);}
    "then"    {return(T_THEN);}
    "<="      {yylval = T_EQ; return(T_EQ);}
    ... Etc. ...
    ">"       {yylval = T_GT; return(T_GT);}
    {id}      {yylval = install_id(); return(T_IDENTIFIER);}
    {integer} {yylval = install_int(); return(T_INTEGER);}
    {real}    {yylval = install_real(); return(T_REAL);}
    {comment} {/* T_COMMENT */}
    {ws}      {/* spaces, tabs, newlines */}
    %%
    install_id() {
        /* A procedure to install the lexeme whose first character
           is pointed to by yytext and whose length is yyleng into
           the symbol table and return a pointer */
    }
    install_int() {
        /* Similar, but installs an integer lexeme into the symbol table */
    }
    install_real() {
        /* Similar, but installs a real lexeme into the symbol table */
    }

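The install procedures are only described in comments on the slide. A minimal sketch of what an `install_id`-style helper could look like follows, under several assumptions that are not in the original: a fixed-size table, a return value that is a table index rather than a pointer, and the lexeme and its length passed as parameters instead of being read from the lex globals. The names `install_lexeme`, `MAX_SYMBOLS`, and `MAX_LEXEME` are hypothetical.

```c
#include <string.h>

#define MAX_SYMBOLS 128   /* assumed table capacity */
#define MAX_LEXEME  64    /* assumed maximum lexeme size, including '\0' */

static char symbol_table[MAX_SYMBOLS][MAX_LEXEME];
static int symbol_count = 0;

/* Installs the lexeme text[0..length-1] if it is not already present.
   Returns its table index, or -1 if the lexeme is empty, too long,
   or the table is full. */
int install_lexeme(const char *text, int length)
{
    if (length <= 0 || length >= MAX_LEXEME)
        return -1;

    /* Reuse the existing entry when the same lexeme was seen before. */
    for (int i = 0; i < symbol_count; i++) {
        if (strlen(symbol_table[i]) == (size_t)length &&
            strncmp(symbol_table[i], text, (size_t)length) == 0)
            return i;
    }

    if (symbol_count == MAX_SYMBOLS)
        return -1;

    /* Copy exactly `length` characters and terminate the stored copy. */
    memcpy(symbol_table[symbol_count], text, (size_t)length);
    symbol_table[symbol_count][length] = '\0';
    return symbol_count++;
}
```

In a lex action this might be called as `yylval = install_lexeme(yytext, yyleng);`, so the parser receives a stable handle even after `yytext` is overwritten by the next token.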
Revisiting Internal Variables in Lex
- char *yytext;
  - Pointer to current lexeme, terminated by '\0'
- int yyleng;
  - Number of characters in yytext, not counting the '\0'
- yylval:
  - Global variable through which the token value can be returned to Yacc
  - Parser (Yacc) can access yylval, yyleng, and yytext
- How are these used?
  - Consider Integer Tokens:
  - yylval = ascii_to_integer(yytext);
  - Conversion from String to actual Integer Value

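The slide names `ascii_to_integer` without showing it. One possible version is sketched below using the standard `strtol`; the two-argument signature with an error return is an assumption made here so that bad input can be reported, and it differs from the one-argument call on the slide.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Converts a decimal lexeme to an int.
   Returns 0 and stores the result in *value on success.
   Returns -1 if the lexeme is empty, has trailing characters that are
   not part of the number, or does not fit in an int. */
int ascii_to_integer(const char *lexeme, int *value)
{
    char *end;
    long parsed;

    errno = 0;
    parsed = strtol(lexeme, &end, 10);
    if (end == lexeme || *end != '\0')
        return -1;
    if (errno == ERANGE || parsed < INT_MIN || parsed > INT_MAX)
        return -1;

    *value = (int)parsed;
    return 0;
}
```

An `{integer}` action could then check the return code before assigning to `yylval`, rather than silently passing a wrapped value to the parser.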
Using the lex Compiler
- Important Highlights
  - Unix Lex defaults with respect to:
    - Single Rule size (2048 bytes)
    - All Actions (20480 bytes)
    - DFA States (512)
    - NFA States (254)
- Command Line:
  - lex myfile.l      Generates lex.yy.c
  - pclex myfile.l    Generates myfile.c
  - -v flag           Includes Statistics on State Machine, etc.

Highlights of the Generated lex.yy.c File

    # define output(c) putc(c,yyout)
    # define input() (((yytchar=yysptr>yysbuf?U(*--yysptr):getc(yyin))==10?(yylineno++,yytchar):yytchar)==EOF?0:yytchar)
    # define unput(c) {yytchar= (c);if(yytchar=='\n')yylineno--;*yysptr++=yytchar;}
    FILE *yyin = {stdin}, *yyout = {stdout};

    yyinput()  { return(input()); }
    yyoutput(c) int c; { output(c); }
    yyunput(c)  int c; { unput(c); }

Compilation at Unix Command Line:

    lex file.l          (creates lex.yy.c)
    cc lex.yy.c -ll     (include lex library)

Full lex.yy.c File

    # include "stdio.h"
    # define U(x) x
    # define NLSTATE yyprevious=YYNEWLINE
    # define BEGIN yybgin = yysvec + 1 +
    # define INITIAL 0
    # define YYLERR yysvec
    # define YYSTATE (yyestate-yysvec-1)
    # define YYOPTIM 1
    # define YYLMAX BUFSIZ
    # define output(c) putc(c,yyout)
    # define input() (((yytchar=yysptr>yysbuf?U(*--yysptr):getc(yyin))==10?(yylineno++,yytchar):yytchar)==EOF?0:yytchar)
    # define unput(c) {yytchar= (c);if(yytchar=='\n')yylineno--;*yysptr++=yytchar;}
    # define yymore() (yymorfg=1)
    # define ECHO fprintf(yyout, "%s",yytext)
    # define REJECT { nstr = yyreject(); goto yyfussy;}
    int yyleng; extern char yytext[];
    int yymorfg;
    extern char *yysptr, yysbuf[];
    int yytchar;
    FILE *yyin = {stdin}, *yyout = {stdout};
    extern int yylineno;
    struct yysvf {
        struct yywork *yystoff;
        struct yysvf *yyother;
        int *yystops;};
    struct yysvf *yyestate;
    extern struct yysvf yysvec[], *yybgin;

Full lex.yy.c File

    #define T_IDENTIFIER 300
    #define T_INTEGER    301
    #define T_REAL       302
    #define T_STRING     303
    #define T_ASSIGN     304
    #define T_ELSE       305
    #define T_IF         306
    #define T_THEN       307
    #define T_EQ         308
    #define T_LT         309
    #define T_NE         310
    #define T_GE         311
    #define T_GT         312
    # define YYNEWLINE 10
    yylex(){
    int nstr; extern int yyprevious;
    while((nstr = yylook()) >= 0)
    yyfussy: switch(nstr){
    case 0:
    if(yywrap()) return(0); break;
    case 1:
        {printf(" %s ", yytext); return(T_ASSIGN);}
    break;
    case 2:
        {printf(" %s ", yytext); return(T_ELSE);}
    break;
    case 3:
        {printf(" %s ", yytext); return(T_IF);}
    break;

Full lex.yy.c File

    case 4:
        {
    #ifdef PRNTFLG
         printf(" %s ", yytext);
    #endif
         return(T_THEN);
        }
    break;
    case 5:
        {printf(" %s ", yytext); return(T_EQ);}
    break;
    case 6:
        {printf(" %s ", yytext); return(T_LT);}
    break;
    case 7:
        {printf(" %s ", yytext); return(T_NE);}
    break;
    case 8:
        {printf(" %s ", yytext); return(T_GE);}
    break;
    case 9:
        {printf(" %s ", yytext); return(T_GT);}
    break;
    case 10:
        {printf(" %s ", yytext); return(T_IDENTIFIER);}
    break;
    case 11:
        {printf(" %s ", yytext); return(T_INTEGER);}
    break;
    case 12:
        {printf(" %s ", yytext); return(T_REAL);}
    break;
    case 13:
        {printf(" %s ", yytext); return(T_STRING);}

Full lex.yy.c File

    break;
    case 14:
        {/* T_COMMENT */}
    break;
    case 15:
        {/* spaces, tabs, newlines */}
    break;
    case -1:
    break;
    default:
    fprintf(yyout,"bad switch yylook %d",nstr);
    } return(0); }
    /* end of yylex */
    yywrap(){}
    main() {
        int i;
        do {
            i = yylex();
        } while (i != 0);
    }

A Pascal lex.l

    %{
    #include "y.tab.h"
    %}
    letter    [a-zA-Z]
    digit     [0-9]
    ws        [ \t\n]+
    id        [A-Za-z][A-Za-z0-9]*
    comment   "(*"([^*]|\n|"*"+[^)])*"*"+")"
    integer   [0-9]+/([^0-9]|"..")
    real      [0-9]+"."[0-9]*([0-9]|"E"[+-]?[0-9]+)
    string    '([^']|'')*'
    %%
    ":="        {return(T_ASSIGN);}
    ":"         {return(T_COLON);}
    "array"     {return(T_ARRAY);}
    "begin"     {return(T_BEGIN);}
    "case"      {return(T_CASE);}
    "const"     {return(T_CONST);}
    "downto"    {return(T_DOWNTO);}
    "do"        {return(T_DO);}
    "else"      {return(T_ELSE);}
    "end"       {return(T_END);}
    "file"      {return(T_FILE);}
    "for"       {return(T_FOR);}

A Pascal lex.l

    "function"  {return(T_FUNCTION);}
     /* "goto"  {return(T_GOTO);} */
    "if"        {return(T_IF);}
    "label"     {return(T_LABEL);}
    "nil"       {return(T_NIL);}
    "not"       {return(T_NOT);}
    "of"        {return(T_OF);}
     /* "packed" {return(T_PACKED);} */
    "procedure" {return(T_PROCEDURE);}
    "end"       {return(T_END);}
    "program"   {return(T_PROGRAM);}
    "record"    {return(T_RECORD);}
    "repeat"    {return(T_REPEAT);}
    "set"       {return(T_SET);}
    "then"      {return(T_THEN);}
    "to"        {return(T_TO);}
    "type"      {return(T_TYPE);}
    "until"     {return(T_UNTIL);}
    "var"       {return(T_VAR);}
    "while"     {return(T_WHILE);}
     /* "with"  {return(T_WITH);} */
    "+"         {return(T_PLUS);}
    "-"         {return(T_MINUS);}
    "or"        {return(T_OR);}
    "and"       {return(T_AND);}
    "div"       {return(T_DIV);}
    "mod"       {return(T_MOD);}
    "/"         {return(T_RDIV);}

A Pascal lex.l

    "*"         {return(T_MULT);}
    "("         {return(T_LPAREN);}
    ")"         {return(T_RPAREN);}
    "="         {return(T_EQ);}
    ","         {return(T_COMMA);}
    ".."        {return(T_RANGE);}
    "."         {return(T_PERIOD);}
    "["         {return(T_LBRACK);}
    "]"         {return(T_RBRACK);}
    "<="        {return(T_EQ);}
    "<"         {return(T_LT);}
    "<>"        {return(T_NE);}
    ">="        {return(T_GE);}
    ">"         {return(T_GT);}
    "in"        {return(T_IN);}
    "^"         {return(T_UPARROW);}
    ";"         {return(T_SEMI);}
    {id}        {return(T_IDENTIFIER);}
    {integer}   {return(T_INTEGER);}
    {real}      {return(T_REAL);}
    {string}    {return(T_STRING);}
    {comment}   {/* T_COMMENT */}
    {ws}        {/* spaces, tabs, newlines */}

Yacc: Yet Another Compiler-Compiler
- Also from the Early 1970s
- A Compiler that Takes as Source a Specification for:
  - Organization of Tokens into Grammar Rules
  - Generates an LALR(1) Parser
- Pictorially: (figure from https://www.slideshare.net/kinnarshah8888/ch4c)

A Combined View
(figure from http://slideplayer.com/slide/4937457/)

Bison
- Compiler Writing Tool that Generates LALR(1) Parser
- Grammar Rules (BNF) can be Modified/Augmented with Semantic Actions via Code Segments
- Can work in Conjunction with Lex or Separately
- Three Major Parts of a Bison Specification:

    Declarations
    %%
    Grammar Rules
    %%
    User Supplied Programs

A First Example

    %{
    /* Includes and Global Variables here */
    #include <stdio.h>
    #include <ctype.h>
    %}
    %start line
    %token DIGIT
    %%
    /* Grammar Rules */
    line : expr '\n'
         ;
    expr : expr '+' term
         | term
         ;
    term : term '*' fact
         | fact
         ;
    fact : '(' expr ')'
         | DIGIT
         ;
    %%
    /* Define own yylex */
    yylex() {
        int c;
        c = getchar();
        if (isdigit(c)) {
            yylval = c - '0';
            return DIGIT;
        }
        return c;
    }
    /* Error Routine */
    yyerror() {}
    /* yyparse calls yylex */
    main() {
        yyparse();
    }

How Do Grammar Rules Fire?
- Follow Rightmost (RM) Derivation in Reverse!
- Input: 5 + 3 * 8

    line : expr '\n'
    expr : expr '+' term | term
    term : term '*' fact | fact
    fact : '(' expr ')' | DIGIT

Rightmost derivation (the rules fire from the last line back up to the first):

    E
    E + T
    E + T * F
    E + T * DIGIT
    E + F * DIGIT
    E + DIGIT * DIGIT
    T + DIGIT * DIGIT
    F + DIGIT * DIGIT
    DIGIT + DIGIT * DIGIT

Stack Performs RM Derivation in Reverse

Rightmost derivation:

    E
    E + T
    E + T * F
    E + T * DIGIT
    E + F * DIGIT
    E + DIGIT * DIGIT
    T + DIGIT * DIGIT
    F + DIGIT * DIGIT

Stack contents over time (top of stack on the left):

    DIGIT
    F
    T
    E
    + E
    DIGIT + E
    F + E
    T + E
    * T + E
    DIGIT * T + E
    F * T + E
    T + E
    E

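The shift and reduce steps in the stack trace can be imitated with two explicit stacks. The sketch below is not the LALR(1) automaton that yacc generates: it is a simplified operator-precedence loop, written under the assumptions that operands are single digits, that only `+` and `*` appear, and that parentheses are not handled. The names `evaluate_digits`, `reduce`, `precedence`, and `STACK_MAX` are hypothetical.

```c
#include <ctype.h>

#define STACK_MAX 64   /* assumed bound on stack depth */

/* Higher number binds tighter: '*' before '+'. */
static int precedence(char op)
{
    return op == '*' ? 2 : 1;
}

/* Reduce step: pop "left op right" and push the result.
   Returns 0 on success, -1 if the stacks do not hold a full expression. */
static int reduce(int values[], int *nvalues, char ops[], int *nops)
{
    if (*nvalues < 2 || *nops < 1)
        return -1;
    int right = values[--*nvalues];
    int left = values[--*nvalues];
    char op = ops[--*nops];
    values[(*nvalues)++] = (op == '+') ? left + right : left * right;
    return 0;
}

/* Evaluates single-digit operands joined by '+' and '*'.
   Spaces are skipped. Returns the value, or -1 on malformed input. */
int evaluate_digits(const char *input)
{
    int values[STACK_MAX];
    char ops[STACK_MAX];
    int nvalues = 0, nops = 0;

    for (const char *p = input; *p != '\0' && *p != '\n'; p++) {
        if (*p == ' ')
            continue;
        if (isdigit((unsigned char)*p)) {
            if (nvalues == STACK_MAX)
                return -1;
            values[nvalues++] = *p - '0';            /* shift DIGIT */
        } else if (*p == '+' || *p == '*') {
            /* Reduce while the operator on the stack binds at least
               as tightly as the incoming one. */
            while (nops > 0 && precedence(ops[nops - 1]) >= precedence(*p)) {
                if (reduce(values, &nvalues, ops, &nops) != 0)
                    return -1;
            }
            if (nops == STACK_MAX)
                return -1;
            ops[nops++] = *p;                        /* shift operator */
        } else {
            return -1;
        }
    }

    /* End of input: reduce whatever is left. */
    while (nops > 0) {
        if (reduce(values, &nvalues, ops, &nops) != 0)
            return -1;
    }
    return nvalues == 1 ? values[0] : -1;
}
```

For the input `5 + 3 * 8` the loop shifts `5`, `+`, `3`, `*`, and `8`, then reduces the multiplication before the addition, which is the same order in which `term '*' fact` and `expr '+' term` are reduced in the trace.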
What is this Grammar Similar To?

    /* Grammar Rules */
    line : expr '\n' ;
    expr : expr '+' term | term ;
    term : term '*' fact | fact ;
    fact : '(' expr ')' | DIGIT ;

    1: E' → E $              1: L → E $
    2: E  → E + T            2: E → E + T
    3: E  → T                3: E → T
    4: T  → T * F            4: T → T * F
    5: T  → F                5: T → F
    6: F  → ( E )            6: F → ( E )
    7: F  → id               7: F → digit

What are the LALR(1) Item Sets?

    State I0:
        L → . E $
        E → . E + T
        E → . T
        T → . T * F
        T → . F
        F → . ( E )
        F → . digit

    GOTO(I0, E) = State I2:
        L → E . $
        E → E . + T

    GOTO(I0, T) = State I3:
        E → T .
        T → T . * F

    GOTO(I0, F) = State I4:
        T → F .

    GOTO(I0, ( ) = State I5:
        F → ( . E )
        E → . E + T
        E → . T
        T → . T * F
        T → . F
        F → . ( E )
        F → . digit

    GOTO(I0, digit) = State I6:
        F → digit .

LALR State Machine
- (yacc -v *.y) Generates y.output

    state 0
        $accept : _line $end
        DIGIT shift 6
        ( shift 5
        . error
        line goto 1
        expr goto 2
        term goto 3
        fact goto 4

    state 1
        $accept : line_$end
        $end accept
        . error

    state 2
        line : expr_    (1)
        expr : expr_+ term
        + shift 7
        . reduce 1

    state 3
        expr : term_    (3)
        term : term_* fact
        * shift 8
        . reduce 3

    state 4
        term : fact_    (5)
        . reduce 5

    state 5
        fact : (_expr )
        DIGIT shift 6
        ( shift 5
        . error
        expr goto 9
        term goto 3
        fact goto 4

    state 6
        fact : DIGIT_    (7)
        . reduce 7

    state 7
        expr : expr +_term
        DIGIT shift 6
        ( shift 5
        . error
        term goto 10
        fact goto 4

    state 8
        term : term *_fact
        DIGIT shift 6
        ( shift 5
        . error
        fact goto 11

LALR State Machine

    state 9
        expr : expr_+ term
        fact : ( expr_)
        + shift 7
        ) shift 12
        . error

    state 10
        expr : expr + term_    (2)
        term : term_* fact
        * shift 8
        . reduce 2

    state 11
        term : term * fact_    (4)
        . reduce 4

    state 12
        fact : ( expr )_    (6)
        . reduce 6

    7/300 terminals, 4/300 nonterminals
    8/600 grammar rules, 13/1000 states
    0 shift/reduce, 0 reduce/reduce conflicts reported
    8/350 working sets used
    memory: states, etc. 69/24000, parser 9/12000
    9/600 distinct lookahead sets
    4 extra closures
    13 shift entries, 1 exceptions
    7 goto entries
    3 entries saved by goto default
    Optimizer space used: input 38/24000, output 218/12000
    218 table entries, 205 zero
    maximum spread: 257, maximum offset: 43

How Do These Relate to Item Sets?

    state 0
        $accept : _line $end
        DIGIT shift 6
        ( shift 5
        . error
        line goto 1
        expr goto 2
        term goto 3
        fact goto 4

    state 1
        $accept : line_$end
        $end accept
        . error

    state 2
        line : expr_    (1)
        expr : expr_+ term
        + shift 7
        . reduce 1

    state 3
        expr : term_    (3)
        term : term_* fact
        * shift 8
        . reduce 3

    state 4
        term : fact_    (5)
        . reduce 5

Defining Precedence

    %token NUMBER
    %left '+' '-'        /* Left associative and Equal precedence */
    %left '*' '/'
    %right UMINUS        /* UMINUS: Highest precedence of all */
    %%
    expr : expr '+' expr          {$$ = $1 + $3;}
         | expr '-' expr          {$$ = $1 - $3;}
         | expr '*' expr          {$$ = $1 * $3;}
         | expr '/' expr          {$$ = $1 / $3;}
         | '(' expr ')'           {$$ = $2;}
         | '-' expr %prec UMINUS  {$$ = - $2;}
         | NUMBER
         ;

Another Grammar for a Programming Language

    statement   -> if_then opt_else
                 | assign_stmt
    if_then     -> T_IF rel_expr T_THEN statement
    opt_else    -> /* empty */
                 | T_ELSE statement
    assign_stmt -> T_IDENTIFIER T_ASSIGN value
    value       -> T_INTEGER
                 | T_REAL
                 | T_STRING
    rel_expr    -> compare rel_op compare
    compare     -> T_IDENTIFIER
                 | value
    rel_op      -> T_EQ
                 | T_LT
                 | T_NE
                 | T_GE
                 | T_GT

y.output as Generated by Bison

    State 3 contains 1 shift/reduce conflict.

    Grammar
    rule 1    statement -> if_then opt_else
    rule 2    statement -> assign_stmt
    rule 3    if_then -> T_IF rel_expr T_THEN statement
    rule 4    opt_else -> /* empty */
    rule 5    opt_else -> T_ELSE statement
    rule 6    assign_stmt -> T_IDENTIFIER T_ASSIGN value
    rule 7    value -> T_INTEGER
    rule 8    value -> T_REAL
    rule 9    value -> T_STRING
    rule 10   rel_expr -> compare rel_op compare
    rule 11   compare -> T_IDENTIFIER
    rule 12   compare -> value
    rule 13   rel_op -> T_EQ
    rule 14   rel_op -> T_LT
    rule 15   rel_op -> T_NE
    rule 16   rel_op -> T_GE
    rule 17   rel_op -> T_GT

y.output as Generated by Bison

    Terminals, with rules where they appear

    $ (-1)
    error (256)
    T_IF (258) 3
    T_THEN (259) 3
    T_ELSE (260) 5
    T_IDENTIFIER (261) 6 11
    T_ASSIGN (262) 6
    T_INTEGER (263) 7
    T_REAL (264) 8
    T_STRING (265) 9
    T_EQ (266) 13
    T_LT (267) 14
    T_NE (268) 15
    T_GE (269) 16
    T_GT (270) 17

y.output as Generated by Bison

    Nonterminals, with rules where they appear

    statement (16)
        on left: 1 2, on right: 3 5
    if_then (17)
        on left: 3, on right: 1
    opt_else (18)
        on left: 4 5, on right: 1
    assign_stmt (19)
        on left: 6, on right: 2
    value (20)
        on left: 7 8 9, on right: 6 12
    rel_expr (21)
        on left: 10, on right: 3
    compare (22)
        on left: 11 12, on right: 10
    rel_op (23)
        on left: 13 14 15 16 17, on right: 10

y.output as Generated by Bison

    state 0
        T_IF            shift, and go to state 1
        T_IDENTIFIER    shift, and go to state 2
        statement       go to state 26
        if_then         go to state 3
        assign_stmt     go to state 4

    state 1
        if_then -> T_IF . rel_expr T_THEN statement    (rule 3)
        T_IDENTIFIER    shift, and go to state 5
        T_INTEGER       shift, and go to state 6
        T_REAL          shift, and go to state 7
        T_STRING        shift, and go to state 8
        value           go to state 9
        rel_expr        go to state 10
        compare         go to state 11

    state 2
        assign_stmt -> T_IDENTIFIER . T_ASSIGN value    (rule 6)
        T_ASSIGN        shift, and go to state 12

y.output as Generated by Bison

    state 3
        statement -> if_then . opt_else    (rule 1)
        T_ELSE      shift, and go to state 13
        T_ELSE      [reduce using rule 4 (opt_else)]
        $default    reduce using rule 4 (opt_else)
        opt_else    go to state 14

    ... etc ...

    state 25
        rel_expr -> compare rel_op compare .    (rule 10)
        $default    reduce using rule 10 (rel_expr)

    state 26
        $           go to state 27

    state 27
        $           go to state 28

    state 28
        $default    accept

Automatic Ambiguity Resolution
- Input Grammar May be Ambiguous
- Bison (and others) have Default Disambiguating Rules
  - In a Shift/Reduce Conflict, the Shift is Chosen
  - In a Reduce/Reduce Conflict, the Reduction is by the "earlier" Rule (listed from top-down)
- Can't Control S/R Conflict Resolution by Reordering Rules
- However, for R/R Resolution
  - Reorder Rules to Force a Different Reduction
  - Rewrite the Grammar to Remove Ambiguity
- Other Error is:
  - Rule Not Reduced
    - If S/R Picks Shift, and Rule Never Reduced Elsewhere

Hints for Writing Yacc Specifications
- Use All Capital Letters for Token Names and All Lower Case for Non-Terminals (Helps Debugging)
- Put Grammar Rules and Actions on Separate Lines (Makes Moving them Easier)
- Put all Rules with Same Left Hand Side Together and Utilize Vertical Bar for Alternatives
- Put a Semicolon After the Very Last Alternative for Each Left Hand Side and on a Separate Line
- Yacc Encourages Left Recursion
- LALR Discourages Right Recursion!

Revisiting First Example via Attr. Grammars

    %{
    /* Includes and Global Variables here */
    #include <stdio.h>
    #include <ctype.h>
    %}
    %start line
    %token DIGIT
    %%
    /* Grammar Rules */
    line : expr '\n'
         ;
    expr : expr '+' term
         | term
         ;
    term : term '*' fact
         | fact
         ;
    fact : '(' expr ')'
         | DIGIT
         ;
    %%
    /* Define own yylex */
    yylex() {
        int c;
        c = getchar();
        if (isdigit(c)) {
            yylval = c - '0';
            return DIGIT;
        }
        return c;
    }
    /* Error Routine */
    yyerror() {}
    /* yyparse calls yylex */
    main() {
        yyparse();
    }

How Do Grammar Rules Fire?
- Just like Attribute Grammars!
- Input: 5 + 3 * 8

    line : expr '\n'
    expr : expr '+' term | term
    term : term '*' fact | fact
    fact : '(' expr ')' | DIGIT

Rightmost derivation (the rules fire from the last line back up to the first):

    E
    E + T
    E + T * F
    E + T * DIGIT
    E + F * DIGIT
    E + DIGIT * DIGIT
    T + DIGIT * DIGIT
    F + DIGIT * DIGIT
    DIGIT + DIGIT * DIGIT

Corresponding Attribute Grammar
- val is a synthesized attribute

    line : expr             {line.val = expr.val}
         ;
    expr : expr1 '+' term   {expr.val = expr1.val + term.val}
         | term             {expr.val = term.val}
    term : term1 '*' fact   {term.val = term1.val * fact.val}
         | fact             {term.val = fact.val}
    fact : '(' expr ')'     {fact.val = expr.val}
         | DIGIT            {fact.val = DIGIT.lexval}

How Does this Transition into Bison?
- Bison (in y.tab.c) Maintains User-Accessible Parsing Stack Defined as:

    /* Value type. */
    #if ! defined YYSTYPE && ! defined YYSTYPE_IS_DECLARED
    typedef int YYSTYPE;
    # define YYSTYPE_IS_TRIVIAL 1
    # define YYSTYPE_IS_DECLARED 1
    #endif

    union yyalloc
    {
      yytype_int16 yyss;
      YYSTYPE yyvs;
    };

- Consider Grammar Rule S -> A B C
- Eventually, A B C on Stack to be Replaced by S in Reduction
- For that Rule, Offsets into Parsing Stack are Defined as: $1 = A, $2 = B, $3 = C

    yyvs (top of stack first):
        $3   (C)
        $2   (B)
        $1   (A)

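How Does this Transition into Yacc?
- Consider Grammar Rule S -> A B C (all are nonterminals)
- Eventually, A B C on Stack to be Replaced by S in Reduction
- For that Rule, Offsets into Parsing Stack are Defined as: $1 = A, $2 = B, $3 = C

    yyvs (top of stack first):
        $3   (C)
        $2   (B)
        $1   (A)

    S : A B C
          { $1 = 5; $2 = 7; $3 = 9;
            $$ = $1 + $2 + $3; }
      ;

- Note: an Action Placed Between Symbols also Occupies a Stack Position, so the Assignments are Written in the Final Action, where $1, $2, $3 Refer to A, B, C
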
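What a reduction by `S -> A B C` does to the value stack can be sketched in plain C. This is a hand-written model and not Bison's generated code; `value_stack`, `push_value`, `reduce_s_abc`, and `VALUE_STACK_MAX` are hypothetical names, and the action is fixed to `$$ = $1 + $2 + $3`.

```c
#define VALUE_STACK_MAX 32   /* assumed capacity */

typedef struct {
    int values[VALUE_STACK_MAX];
    int top;                 /* number of values currently on the stack */
} value_stack;

/* Pushes one semantic value (what a shift does).
   Returns 0 on success, -1 if the stack is full. */
int push_value(value_stack *stack, int value)
{
    if (stack->top == VALUE_STACK_MAX)
        return -1;
    stack->values[stack->top++] = value;
    return 0;
}

/* Models the reduction S -> A B C with the action $$ = $1 + $2 + $3.
   The three topmost entries are read by position from the left end of
   the rule, popped, and replaced by the single value for S.
   Returns 0 on success, -1 if fewer than three values are present. */
int reduce_s_abc(value_stack *stack)
{
    if (stack->top < 3)
        return -1;

    int *rhs = &stack->values[stack->top - 3];   /* rhs[0] is $1 */
    int result = rhs[0] + rhs[1] + rhs[2];       /* $$ = $1 + $2 + $3 */

    stack->top -= 3;                             /* pop A, B, C */
    stack->values[stack->top++] = result;        /* push S */
    return 0;
}
```

Pushing 5, 7, and 9 and then reducing leaves one entry, 21, on the stack. That entry is what a surrounding rule would later see as one of its own `$n` values.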
Attribute Grammar Transitions to …
- val is a synthesized attribute

    line : expr             {line.val = expr.val}
         ;
    expr : expr1 '+' term   {expr.val = expr1.val + term.val}
         | term             {expr.val = term.val}
    term : term1 '*' fact   {term.val = term1.val * fact.val}
         | fact             {term.val = fact.val}
    fact : '(' expr ')'     {fact.val = expr.val}
         | DIGIT            {fact.val = DIGIT.lexval}

Attribute Grammar + Bison commands to …

    Grammar Rule             Attribute Grammar                     Bison
    line : expr              {line.val = expr.val}                 $$ = $1
    expr : expr1 '+' term    {expr.val = expr1.val + term.val}     $$ = $1 + $3
         | term              {expr.val = term.val}                 $$ = $1
    term : term1 '*' fact    {term.val = term1.val * fact.val}     $$ = $1 * $3
         | fact              {term.val = fact.val}                 $$ = $1
    fact : '(' expr ')'      {fact.val = expr.val}                 $$ = $2
         | DIGIT             {fact.val = DIGIT.lexval}             $$ = char_to_int(yytext)

…Realization in Bison

    %{
    /* Includes and Global Variables here */
    #include <stdio.h>
    #include <ctype.h>
    %}
    %start line
    %token DIGIT
    %%
    /* Grammar Rules */
    line : expr '\n'       {$$ = $1;}
         ;
    expr : expr '+' term   {$$ = $1 + $3;}
         | term            {$$ = $1;}
         ;
    term : term '*' fact   {$$ = $1 * $3;}
         | fact            {$$ = $1;}
         ;
    fact : '(' expr ')'    {$$ = $2;}
         | DIGIT           {$$ = $1;}   /* value set in yylex via yylval */
         ;
    %%
    /* Define own yylex */
    yylex() {
        int c;
        c = getchar();
        if (isdigit(c)) {
            yylval = c - '0';
            return DIGIT;
        }
        return c;
    }
    /* Error Routine */
    yyerror() {}
    /* yyparse calls yylex */
    main() {
        yyparse();
    }

Interactions Between Lex and Bison

IN LEX:

    char yytext[YYLMAX];
    int yyleng;

- yytext: globally passes lexeme to parser

IN BISON:

    /* Value type. */
    #if ! defined YYSTYPE && ! defined YYSTYPE_IS_DECLARED
    typedef int YYSTYPE;
    # define YYSTYPE_IS_TRIVIAL 1
    # define YYSTYPE_IS_DECLARED 1
    #endif

    union yyalloc
    {
      yytype_int16 yyss;
      YYSTYPE yyvs;
    };

- yylval:
  - Set in lexical analyzer
  - Returns Token value
  - What is placed in stack yyvs

    S  -> A  B  C
    $$    $1 $2 $3

    yyvs (top of stack first):
        $3
        $2
        $1

Pascal to C Conversion
- Utilize a Limited Subset of Pascal
  - If-Then-Else and Assignment Statements
  - Relational (Boolean) Expressions and Operators
- Conversions of Note:
  - If-Then-Else goes to If-Else (no then in C)
  - = Goes to ==
  - <> Goes to !=
  - := Goes to =
- Key Issues
  - Define String Variables to Hold Concatenated "Program" Bottom Up
  - Construction Utilizes Current Lexeme (yytext) Concatenated with Appropriate Conversions
  - Information Passes "Up" the Grammar

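The three operator conversions listed above amount to a small lookup. A minimal sketch follows; the function name `pascal_op_to_c` is hypothetical, and any lexeme that is not one of the three operators is returned unchanged.

```c
#include <string.h>

/* Returns the C spelling for a Pascal operator lexeme.
   Only the three operator conversions from the slide are handled;
   every other lexeme is returned unchanged. */
const char *pascal_op_to_c(const char *lexeme)
{
    if (strcmp(lexeme, "=") == 0)
        return "==";
    if (strcmp(lexeme, "<>") == 0)
        return "!=";
    if (strcmp(lexeme, ":=") == 0)
        return "=";
    return lexeme;
}
```

In the Bison specification the same mapping is spread across the actions: each `rel_op` alternative copies the C spelling into its translation string, and the `T_ASSIGN` action appends `=`.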
Pascal to C Conversion

    %{
    #include <stdio.h>
    #include <ctype.h>
    #include <string.h>
    char strans[100], atrans[100], itrans[100], etrans[100],
         vtrans[100], retrans[100], ctrans[100], rtrans[100];
    %}
    %start statement
    %token T_IF T_THEN T_ELSE T_IDENTIFIER T_ASSIGN T_INTEGER T_REAL
    %token T_STRING T_EQ T_LT T_LE T_NE T_GE T_GT
    %%
    statement : if_then opt_else
                  {strcpy(strans, itrans);
                   strcat(strans, etrans);
                   printf("%s\n", strans);}
              | assign_stmt
                  {strcpy(strans, atrans);
                   printf("%s\n", strans);}
              ;
    if_then   : T_IF rel_expr
                  {strcpy(itrans, "if ");
                   strcat(itrans, retrans);}
                T_THEN assign_stmt
                  {strcat(itrans, atrans);}
              ;

Pascal to C Conversion

    opt_else    : /* the empty case */
                    {strcpy(etrans, "");}
                | T_ELSE assign_stmt
                    {strcpy(etrans, " else ");
                     strcat(etrans, atrans);}
                ;
    assign_stmt : T_IDENTIFIER
                    {strcpy(atrans, yytext);}
                  T_ASSIGN
                    {strcat(atrans, "=");}
                  value
                    {strcat(atrans, vtrans);}
                ;
    value       : T_INTEGER  {strcpy(vtrans, yytext);}
                | T_REAL     {strcpy(vtrans, yytext);}
                | T_STRING   {strcpy(vtrans, yytext);}
                ;
    rel_expr    : compare
                    {strcpy(retrans, ctrans);}
                  rel_op
                    {strcat(retrans, rtrans);}
                  compare
                    {strcat(retrans, ctrans);}
                ;

Pascal to C Conversion

    compare : T_IDENTIFIER  {strcpy(ctrans, yytext);}
            | value         {strcpy(ctrans, yytext);}
            ;
    rel_op  : T_EQ  {strcpy(rtrans, "==");}
            | T_LT  {strcpy(rtrans, "<");}
            | T_LE  {strcpy(rtrans, "<=");}
            | T_NE  {strcpy(rtrans, "!=");}
            | T_GE  {strcpy(rtrans, ">=");}
            | T_GT  {strcpy(rtrans, ">");}
            ;
    %%
    #include "lex.yy.c"
    yyerror() {}
    main() {
        yyparse();
    }

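The translation strings in this specification are fixed 100-byte arrays filled with `strcpy` and `strcat`, so a long identifier or string literal could write past the end of a buffer. One way to guard the concatenation is sketched below; `append_bounded` is a hypothetical helper, not part of the original specification, and it assumes `dst` already holds a NUL-terminated string that fits within `capacity`.

```c
#include <string.h>

/* Appends src to the NUL-terminated string in dst without writing
   past `capacity` bytes.
   Returns 0 on success, or -1 (leaving dst unchanged) if the result,
   including the terminating '\0', would not fit. */
int append_bounded(char *dst, size_t capacity, const char *src)
{
    size_t used = strlen(dst);
    size_t needed = strlen(src);

    if (used + needed + 1 > capacity)
        return -1;

    /* Copy the characters of src together with its terminating '\0'. */
    memcpy(dst + used, src, needed + 1);
    return 0;
}
```

An action such as `strcat(itrans, retrans)` could then become `append_bounded(itrans, sizeof itrans, retrans)`, with the return value checked so that an oversized statement is reported instead of silently corrupting memory.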
What would Pascal to C Generate?

    /* SAMPLE INPUT ... */
    procedure MAIN is
      X, Y: INTEGER;
      A, B, C: FLOAT;
      D, E: CHARACTER;
    begin
      if (X = Y) and (Z /= W) then
        Z := X;
        if (A <= B) then
          A := B;
        end if;
        X := X + 1;
      else
        Y := Y + 1;
      end if;
      A := B + C * D;
      A := B * C / D;
    end MAIN;

What would Pascal to C Generate?

    /* AND OUTPUT */
    TYPE BEING CONVERTED TO: int float char
    assign_stmt*** Z = X ;
    assign_stmt*** A = B ;
    if_stmt*** if ( A <= B { A = B ; }
    assign_stmt*** X = X + 1 ;
    assign_stmt*** Y = Y + 1 ;
    if_stmt*** if ( X == Y && Z != W { Z = X ; if ( A <= B { A = B ; } X = X + 1 ; } else { Y = Y + 1; }
    assign_stmt*** A = B + C * D ;
    assign_stmt*** A = B * C / D ;

Redefine Parsing Stack

    %{
    #include <stdio.h>
    #include <ctype.h>
    typedef char *stype;
    #define YYSTYPE stype
    char strans[100], atrans[100], itrans[100], etrans[100],
         vtrans[100], retrans[100], ctrans[100], rtrans[100];
    %}
    ... Etc. ...
    %%
    statement : if_then opt_else
                  {strcat(itrans, etrans);
                   $$ = itrans;
                   printf("%s\n", $$);}
              | assign_stmt
                  {$$ = atrans;
                   printf("%s\n", $$);}
              ;

Utilizing Unions to Redefine Parsing Stack
- Unions Define Ability of Data Structure to be of Multiple Types (one or other attribute active)
- Consider the C Union Definition:

    union EITHEROR        /* Union Type Name */
    {
        char trans[100];
        int XYZ;
    } EOR;                /* Variable Name */

- EOR.trans is a string (use strcpy, strcat, etc.)
- EOR.XYZ is an int (use assignment, boolean expr, etc.)
- Only trans or XYZ has a value but NOT both!

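Because only one member of the union holds a meaningful value at a time, C code commonly pairs a union with a tag recording which member is active. The sketch below adds such a tag around the slide's `EITHEROR` union. The tag, the wrapper struct, and the setter names are assumptions made for illustration; Bison itself keeps no such tag and relies on the `%type` and `%token` declarations instead.

```c
#include <string.h>

/* Records which union member currently holds a value. */
enum either_kind { KIND_TEXT, KIND_NUMBER };

union EITHEROR {
    char trans[100];
    int XYZ;
};

struct tagged_value {
    enum either_kind kind;
    union EITHEROR data;
};

/* Stores a string in the union, truncating it to fit, and marks
   trans as the active member. */
void set_text(struct tagged_value *v, const char *text)
{
    v->kind = KIND_TEXT;
    strncpy(v->data.trans, text, sizeof v->data.trans - 1);
    v->data.trans[sizeof v->data.trans - 1] = '\0';
}

/* Stores an int in the union and marks XYZ as the active member.
   The bytes previously used by trans are overwritten. */
void set_number(struct tagged_value *v, int number)
{
    v->kind = KIND_NUMBER;
    v->data.XYZ = number;
}
```

The union is as large as its largest member, so every entry on a parsing stack of this type occupies at least 100 bytes regardless of which member is in use.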
Utilizing Unions to Redefine Parsing Stack

    %{
    #include <stdio.h>
    #include <ctype.h>
    %}
    %start statement
    %union {
        char trans[100];
        int XYZ;
    }
    %token T_IF T_THEN T_ELSE T_IDENTIFIER
    %token T_STRING T_ASSIGN T_INTEGER T_REAL
    %token T_EQ T_LT T_LE T_NE T_GT
    %type <trans> statement if_then opt_else
    %type <trans> assign_stmt value compare
    %type <trans> rel_op variable rel_expr
    /* ALSO, types and tokens for XYZ are possible */
    %%

- Union Definition Defines what can be sent through $$, $1, $2, etc.
- %type Redefines nonterminals of type <trans> to allow them to be that part of the union

Utilizing Unions to Redefine Parsing Stack
- What Does the Parsing Stack now Contain?

IN LEX:

    char yytext[YYLMAX];
    int yyleng;

THIS EFFECTIVELY REPLACES YYSTYPE:

    %union {
        char trans[100];
        int XYZ;
    }

    S  -> A  B  C
    $$    $1 $2 $3

    yyvs (top of stack first):
        $3
        $2
        $1

    $$.trans   $1.XYZ   $2.trans   Etc.

Unions for Pascal to C Conversion

    statement : if_then opt_else
                  {strcpy($$, $1);
                   strcat($$, $2);
                   printf("%s\n", $$);}
              | assign_stmt
                  {strcpy($$, $1);
                   printf("%s\n", $$);}
              ;
    if_then   : T_IF rel_expr T_THEN assign_stmt
                  {strcpy($$, " if ");
                   strcat($$, $2);
                   strcat($$, $4);}
              ;
    opt_else  : /* the empty case */
                  {strcpy($$, "");}
              | T_ELSE assign_stmt
                  {strcpy($$, " else ");
                   strcat($$, $2);}
              ;

Unions for Pascal to C Conversion

    assign_stmt : variable T_ASSIGN value
                    {strcpy($$, $1);
                     strcat($$, " = ");
                     strcat($$, $3);}
                ;
    value       : T_INTEGER  {strcpy($$, yytext);}
                | T_REAL     {strcpy($$, yytext);}
                | T_STRING   {strcpy($$, yytext);}
                ;
    rel_expr    : compare rel_op compare
                    {strcpy($$, $1);
                     strcat($$, $2);
                     strcat($$, $3);}
                ;
    compare     : T_IDENTIFIER  {strcpy($$, yytext);}
                | value         {strcpy($$, yytext);}
                ;

Unions for Pascal to C Conversion

    variable : T_IDENTIFIER  {strcpy($$, yytext);}
             ;
    rel_op   : T_EQ  {strcpy($$, " == ");}
             | T_LT  {strcpy($$, " < ");}
             | T_LE  {strcpy($$, " <= ");}
             | T_NE  {strcpy($$, " != ");}
             | T_GT  {strcpy($$, " > ");}
             ;
    %%
    #include "lex.yy.c"
    yyerror() {}
    yywrap() { return 1; }
    main() {
        yyparse();
    }

Also Possible to Redefine Tokens

    %{
    #include <stdio.h>
    #include <ctype.h>
    %}
    %start statement
    %union {
        char trans[100];
        int XYZ;
    }
    %token T_IF T_THEN T_ELSE T_IDENTIFIER
    %token T_STRING T_ASSIGN T_INTEGER T_REAL
    %token T_EQ T_LT T_LE T_NE T_GT
    %type <trans> T_IDENTIFIER T_ASSIGN etc...
    %type <trans> statement if_then opt_else
    %type <trans> assign_stmt value compare
    %type <trans> rel_op variable rel_expr
    /* ALSO, types and tokens for XYZ are possible */
    %%

Also Possible to Redefine Tokens

    assign_stmt : T_IDENTIFIER T_ASSIGN value
                    {strcpy($$, $1);
                     strcat($$, " = ");
                     strcat($$, $3);}
                ;
    value       : T_INTEGER  {strcpy($$, yytext);}
                | T_REAL     {strcpy($$, yytext);}
                | T_STRING   {strcpy($$, yytext);}
                ;

Using Structures in %union

    %{
    #define BUF_SIZE 512
    struct symtabtest {
        int  a, b;
        char c[BUF_SIZE];
        char d[BUF_SIZE];
    };
    %}
    %start latexstatement
    %union {
        struct symtabtest st;
        int val;
    }
    %token ETC...
    %type <st> entrylist entry DBLBS listblock anitem
    %type <st> textoption wsorword WORD WS ITEM
    %%
    ETC...

Using Structures in %union

    mainoption : textoption
                   { fprintf(fp, "%d %d %s %s\n",
                             $1.a, $1.b, $1.c, $1.d); }
               | commentoption
               | latexoptions
               ;
    textoption : textoption wsorword  { $$.a = 5; }
               | wsorword             { $$.b = 10; }
               ;
    wsorword   : WS    { strcpy($$.c, yytext); }
               | WORD  { strcpy($$.d, yytext); }
               ;

Additional Lex/Yacc Examples
- Consider Ada 9X (standardized as Ada 95 and later revised as Ada 2005), a Package Based, OO Programming Language
- Builds Upon the Original Ada Language
  - Extension of Pascal
  - Developed as a Language for DoD
- Named After Ada Lovelace (1815-1852)
  - Worked on Charles Babbage's Early Mechanical General Purpose Computer/Analytical Engine
  - The world's "First Programmer"
  - Wrote the world's "First Computer Program" on Bernoulli Numbers …

Ada 9X Lex

    %{
    /******* A "lex"-style lexer for Ada 9X **************/
    /* Copyright (C) Intermetrics, Inc. 1994 Cambridge, MA USA */
    /* Copying permitted if accompanied by this statement. */
    /* Derivative works are permitted if accompanied by this statement. */
    /* This lexer is known to be only approximately correct, but it is */
    /* more than adequate for most uses (the lexing of apostrophe is */
    /* not as sophisticated as it needs to be "perfect"). */
    /* As usual there is *no warranty* but we hope it is useful. */
    /**********************************/
    int error_count;
    %}
    DIGIT            [0-9]
    EXTENDED_DIGIT   [0-9a-zA-Z]
    INTEGER          ({DIGIT}(_?{DIGIT})*)
    EXPONENT         ([eE](\+?|-){INTEGER})
    DECIMAL_LITERAL  {INTEGER}(\.?{INTEGER})?{EXPONENT}?
    BASE             {INTEGER}
    BASED_INTEGER    {EXTENDED_DIGIT}(_?{EXTENDED_DIGIT})*
    BASED_LITERAL    {BASE}#{BASED_INTEGER}(\.{BASED_INTEGER})?#{EXPONENT}?

Ada 9X Lex

    %%
    "."     return('.');
    "<"     return('<');
    "("     return('(');
    "+"     return('+');
    "|"     return('|');
    "&"     return('&');
    "*"     return('*');
    ")"     return(')');
    ";"     return(';');
    "-"     return('-');
    "/"     return('/');
    ","     return(',');
    ">"     return('>');
    ":"     return(':');
    "="     return('=');
    "'"     return(TIC);
    ".."    return(DOT_DOT);
    "<<"    return(LT_LT);
    "<>"    return(BOX);
    "<="    return(LT_EQ);
    "**"    return(EXPON);
    "/="    return(NE);
    ">>"    return(GT_GT);
    ">="    return(GE);
    ":="    return(IS_ASSIGNED);
    "=>"    return(RIGHT_SHAFT);

Ada 9X Lex

    [a-zA-Z](_?[a-zA-Z0-9])*    return(lk_keyword(yytext));
    "'"."'"                     return(char_lit);
    \"(\"\"|[^\n\"])*\"         return(char_string);
    {DECIMAL_LITERAL}           return(numeric_lit);
    {BASED_LITERAL}             return(numeric_lit);
    --.*\n                      ;
    [ \t\n\f]                   ;
    .                           {fprintf(stderr, " Illegal character: %c: on line %d\n",
                                         *yytext, yylineno);
                                 error_count++;}
    %%
    /*
     * Keywords stored in alpha order
     */
    typedef struct {
        char * kw;
        int kwv;
    } KEY_TABLE;

    /* Reserved keyword list and Token values
     * as defined in y.tab.h
     */
    # define NUM_KEYWORDS 69

Ada 9X Lex

    KEY_TABLE key_tab[NUM_KEYWORDS] = {
        {"ABORT", ABORT},         {"ABS", ABS},
        {"ABSTRACT", ABSTRACT},   {"ACCEPT", ACCEPT},       {"ACCESS", ACCESS},
        {"ALIASED", ALIASED},     {"ALL", ALL},             {"AND", AND},
        {"ARRAY", ARRAY},         {"AT", AT},               {"BEGIN", BEGiN},
        {"BODY", BODY},           {"CASE", CASE},           {"CONSTANT", CONSTANT},
        {"DECLARE", DECLARE},     {"DELAY", DELAY},         {"DELTA", DELTA},
        {"DIGITS", DIGITS},       {"DO", DO},               {"ELSE", ELSE},
        {"ELSIF", ELSIF},         {"END", END},             {"ENTRY", ENTRY},
        {"EXCEPTION", EXCEPTION}, {"EXIT", EXIT},           {"FOR", FOR},
        {"FUNCTION", FUNCTION},   {"GENERIC", GENERIC},     {"GOTO", GOTO},
        {"IF", IF},               {"IN", IN},               {"IS", IS},
        {"LIMITED", LIMITED},     {"LOOP", LOOP},           {"MOD", MOD},
        {"NEW", NEW},             {"NOT", NOT},             {"NULL", NuLL},
        {"OF", OF},               {"OR", OR},               {"OTHERS", OTHERS},
        {"OUT", OUT},             {"PACKAGE", PACKAGE},     {"PRAGMA", PRAGMA},
        {"PRIVATE", PRIVATE},     {"PROCEDURE", PROCEDURE}, {"PROTECTED", PROTECTED},
        {"RAISE", RAISE},         {"RANGE", RANGE},         {"RECORD", RECORD},
        {"REM", REM},             {"RENAMES", RENAMES},     {"REQUEUE", REQUEUE},
        {"RETURN", RETURN},       {"REVERSE", REVERSE},     {"SELECT", SELECT},
        {"SEPARATE", SEPARATE},   {"SUBTYPE", SUBTYPE},     {"TAGGED", TAGGED},
        {"TASK", TASK},           {"TERMINATE", TERMINATE}, {"THEN", THEN},
        {"TYPE", TYPE},           {"UNTIL", UNTIL},         {"USE", USE},
        {"WHEN", WHEN},           {"WHILE", WHILE},         {"WITH", WITH},
        {"XOR", XOR}
    };

Ada 9X Lex

    to_upper(str)
    char *str;
    {
        char * cp;
        for (cp = str; *cp; cp++) {
            if (islower(*cp)) *cp -= ('a' - 'A');
        }
    }

    lk_keyword(str)
    char *str;
    {
        int min;
        int max;
        int guess, compare;

        min = 0;
        max = NUM_KEYWORDS - 1;
        guess = (min + max) / 2;
        to_upper(str);
        for (guess = (min + max) / 2; min <= max; guess = (min + max) / 2) {
            if ((compare = strcmp(key_tab[guess].kw, str)) < 0) {
                min = guess + 1;
            } else if (compare > 0) {
                max = guess - 1;
            } else {
                return key_tab[guess].kwv;
            }
        }
        return identifier;
    }

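`lk_keyword` is a binary search over an alphabetically sorted table, and it upper-cases `yytext` in place before comparing. A self-contained variant that leaves the lexeme unmodified is sketched below. The five-entry table, its token values, and the names `keyword_entry`, `sample_keywords`, `compare_ignoring_case`, `lookup_keyword`, and `NOT_A_KEYWORD` are all made up for illustration and are not taken from the Ada lexer.

```c
#include <ctype.h>
#include <stddef.h>

#define NOT_A_KEYWORD (-1)   /* hypothetical "identifier" result */

typedef struct {
    const char *kw;          /* keyword spelled in upper case */
    int kwv;                 /* token value returned for it */
} keyword_entry;

/* Small sample table; it must stay in alphabetical order for the
   binary search to work. The token values are arbitrary. */
static const keyword_entry sample_keywords[] = {
    {"BEGIN", 1}, {"ELSE", 2}, {"END", 3}, {"IF", 4}, {"THEN", 5}
};

/* Compares an upper-case keyword with a word, treating the letters of
   the word as if they were upper case. The result has the same sign
   convention as strcmp(keyword, word). */
static int compare_ignoring_case(const char *keyword, const char *word)
{
    while (*keyword != '\0' && *word != '\0') {
        int k = (unsigned char)*keyword;
        int w = toupper((unsigned char)*word);
        if (k != w)
            return k - w;
        keyword++;
        word++;
    }
    return (unsigned char)*keyword - toupper((unsigned char)*word);
}

/* Binary search for word in the sample table.
   Returns the token value, or NOT_A_KEYWORD if it is not present. */
int lookup_keyword(const char *word)
{
    int min = 0;
    int max = (int)(sizeof sample_keywords / sizeof sample_keywords[0]) - 1;

    while (min <= max) {
        int guess = min + (max - min) / 2;
        int cmp = compare_ignoring_case(sample_keywords[guess].kw, word);
        if (cmp < 0)
            min = guess + 1;     /* keyword sorts before word */
        else if (cmp > 0)
            max = guess - 1;     /* keyword sorts after word */
        else
            return sample_keywords[guess].kwv;
    }
    return NOT_A_KEYWORD;
}
```

Keeping the lexeme untouched matters when the original spelling of an identifier has to be preserved, for example for error messages or for a symbol table that records the name as written.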
Ada 9X Lex

    yyerror(s)
    char *s;
    {
        extern int yychar;

        error_count++;
        fprintf(stderr, " %s", s);
        if (yylineno)
            fprintf(stderr, ", on line %d,", yylineno);
        fprintf(stderr, " on input: ");
        if (yychar >= 0400) {
            if ((yychar >= ABORT) && (yychar <= XOR)) {
                fprintf(stderr, "(token) %s #%d\n",
                        key_tab[yychar-ABORT].kw, yychar);
            } else switch (yychar) {
                case char_lit : fprintf(stderr, "character literal\n");
                    break;
                case identifier : fprintf(stderr, "identifier\n");
                    break;
                case char_string : fprintf(stderr, "string\n");
                    break;
                case numeric_lit : fprintf(stderr, "numeric literal\n");
                    break;
                case TIC : fprintf(stderr, "single-quote\n");
                    break;
                case DOT_DOT : fprintf(stderr, "..\n");
                    break;

Ada 9X Lex

                case LT_LT : fprintf(stderr, "<<\n");
                    break;
                case BOX : fprintf(stderr, "<>\n");
                    break;
                case LT_EQ : fprintf(stderr, "<=\n");
                    break;
                case EXPON : fprintf(stderr, "**\n");
                    break;
                case NE : fprintf(stderr, "/=\n");
                    break;
                case GT_GT : fprintf(stderr, ">>\n");
                    break;
                case GE : fprintf(stderr, ">=\n");
                    break;
                case IS_ASSIGNED : fprintf(stderr, ":=\n");
                    break;
                case RIGHT_SHAFT : fprintf(stderr, "=>\n");
                    break;
                default : fprintf(stderr, "(token) %d\n", yychar);
            }
        } else {
            switch (yychar) {
                case '\t': fprintf(stderr, "horizontal-tab\n"); return;
                case '\n': fprintf(stderr, "newline\n"); return;
                case '