More yacc

What is yacc
– Tool to produce a parser given a grammar
– YACC (Yet Another Compiler-Compiler) is a program that compiles an LALR(1) grammar and produces the source code of a syntactic analyzer (parser) for the language generated by that grammar
– Input is a grammar (rules) and actions to take upon recognizing a rule
– Output is a C program and, optionally, a header file of tokens

Works with lex
• Lex is a scanner generator
• Input is a description of patterns and actions
• Output is a C program containing a function yylex() which, when called, matches patterns against the input and performs the associated actions
• Typically, the generated scanner performs lexical analysis and produces tokens for the (YACC-generated) parser

Structure of a YACC File
• Has the same three-part structure as Lex
• The parts are separated by a line containing %%
• The three parts even have the same names:
  – definition section
  – rules section
  – code section (copied directly into the generated program)

Definition Section
• Declare the tokens used in the grammar, and the types of the values kept on the stack, here
• Tokens that are single-quoted characters, like '=' or '+', need not be declared.
• Literal C code can be included in this section in a block delimited by %{ … %}

Declaring Tokens
• The tokens that are used in the grammar must be declared
• Include lines like the ones below in the definition section:

  %token CHARSTRING INT IDENTIFIER
  %token LPAREN RPAREN

The Rules Section
• The rules of the grammar are placed here.
• Here is an example of the basic syntax:

  Grammar:
    Expr → INTEGER + INTEGER | INTEGER - INTEGER

  YACC grammar definition:
    expr : INTEGER '+' INTEGER   {action}
         | INTEGER '-' INTEGER   {action}
         ;

YACC Actions
• Similar to Lex, actions can be defined that are performed whenever a production is applied (reduced) in the stream of tokens.
• An action is usually written right after the production it belongs to.
• Every symbol in the grammar has a corresponding value, and actions need to access those values.
• They do so by accessing the YACC stack.

Accessing the Stack
• Since YACC generates an LR parser, it pushes the symbols that it reads, along with their values, onto a stack until it is ready to reduce.
• To access these values inside an action, write a dollar sign followed by a number: $1 is the value of the first symbol on the right-hand side of the production, $2 the second, and so on.

Accessing the Stack

  expr : INTEGER '+' INTEGER   { $$ = $1 + $3; }
       | INTEGER '-' INTEGER   { $$ = $1 - $3; }
       ;

• $$ refers to the value of the left-hand nonterminal (here, expr).
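
To make the $-notation concrete, here is a minimal sketch in plain C of what happens to the value stack when the parser reduces by the addition rule. It is not the code YACC generates: the names value_stack, top, push and reduce_add are hypothetical, and a real parser also keeps a parallel stack of states.

```c
#include <assert.h>

/* Hypothetical miniature value stack; a yacc-generated parser manages its own. */
#define STACK_MAX 16
int value_stack[STACK_MAX];
int top = 0;                          /* number of values currently on the stack */

void push(int v) {
    assert(top < STACK_MAX);
    value_stack[top++] = v;
}

/* Reduce by  expr : INTEGER '+' INTEGER  { $$ = $1 + $3; }
   The three right-hand-side values occupy the top three stack slots. */
int reduce_add(void) {
    assert(top >= 3);
    int d1 = value_stack[top - 3];    /* $1: first INTEGER */
    int d3 = value_stack[top - 1];    /* $3: second INTEGER ($2 is the '+') */
    top -= 3;                         /* pop the right-hand side */
    int result = d1 + d3;             /* $$: value of the new expr */
    push(result);                     /* expr replaces the three symbols */
    return result;
}
```

For the input 2 + 5, the parser would push the values for INTEGER, '+' and INTEGER (the value stored for '+' is never read), then call the equivalent of reduce_add(), leaving a single value on the stack for expr.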

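Tokens and values come from lex

[Diagram: the YACC-generated parser, yyparse(), calls the Lex-generated scanner, yylex(), which returns tokens and their values.]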

Revisiting Lex
• The Lex file has to be modified in two main places to work with the YACC parser.
• In the definition section, include this statement:

  #include "y.tab.h"

• That is a header file created by YACC when the parser is generated (with the -d option).
• The actions for the rules need to be changed too.

Revisiting Lex Actions
• For tokens with a value, assign that value to yylval. YACC can read the value from that variable.
• Include a return statement for the token name (this is the same name that is declared at the top of the YACC file).

  if            { return IF; }
  [1-9][0-9]*   { yylval = atoi(yytext); return INTEGER; }
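
The contract between scanner and parser can be sketched with a hand-written stand-in for the Lex-generated yylex(). This is only an illustration under stated assumptions: the token codes 258 and 259 are made-up values (YACC normally assigns them in y.tab.h), and set_input and input_cursor are hypothetical helpers, not part of Lex or YACC.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Illustrative token codes; yacc would normally define these in y.tab.h. */
enum { IF = 258, INTEGER = 259 };

int yylval;                        /* value of the most recent token */
static const char *input_cursor;   /* hypothetical: current scan position */

void set_input(const char *s) {
    assert(s != NULL);
    input_cursor = s;
}

/* Hand-written stand-in for the scanner Lex would generate. */
int yylex(void) {
    while (*input_cursor == ' ')
        input_cursor++;                        /* skip blanks */
    if (*input_cursor == '\0')
        return 0;                              /* 0 tells the parser: end of input */
    if (input_cursor[0] == 'i' && input_cursor[1] == 'f' &&
        !isalnum((unsigned char)input_cursor[2])) {
        input_cursor += 2;
        return IF;                             /* keyword: no value needed */
    }
    if (*input_cursor >= '1' && *input_cursor <= '9') {   /* [1-9][0-9]* */
        char *end;
        yylval = (int)strtol(input_cursor, &end, 10);     /* token value */
        input_cursor = end;
        return INTEGER;                                   /* token kind */
    }
    return (unsigned char)*input_cursor++;     /* single-character literal token */
}
```

Each call returns the kind of the next token; for INTEGER the numeric value is left in yylval, which is where a YACC action would pick it up as $1, $2, and so on.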

The %union Declaration
• Different tokens have different data types.
• INTEGER tokens are integers, FLOAT tokens are floats, CHARACTERSTRING tokens are char *, and IDENTIFIER tokens are pointers to the symbol-table entry for that identifier.
• The %union declaration lets the parser apply the right data type to the right token.

The %union Declaration

YACC Definition Section
  %union {
      int   intValue;
      float floatValue;
  }
  %token <intValue> INTEGER
  %token <floatValue> FLOAT

Lex Rules Section
  …   { yylval.intValue = atoi(yytext); return INTEGER; }
  …   { yylval.floatValue = atof(yytext); return FLOAT; }
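
In C terms, the %union declaration becomes the type of yylval (conventionally called YYSTYPE). The sketch below writes that type out by hand to show how one variable carries either kind of value. The token codes and the scan_number helper are hypothetical; in a real build the type and codes come from the YACC-generated header.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hand-written equivalent of what %union { int intValue; float floatValue; }
   turns into; yacc would normally emit this type itself. */
typedef union {
    int   intValue;
    float floatValue;
} YYSTYPE;

/* Illustrative token codes; normally defined in y.tab.h. */
enum { INTEGER = 258, FLOAT = 259 };

YYSTYPE yylval;

/* Hypothetical helper that mimics the two Lex actions: store the value in
   the union member matching the token, then return the token kind. */
int scan_number(const char *text) {
    assert(text != NULL);
    if (strchr(text, '.') != NULL) {
        yylval.floatValue = (float)atof(text);
        return FLOAT;
    }
    yylval.intValue = atoi(text);
    return INTEGER;
}
```

Only the member that matches the returned token is meaningful after a call, which is why the %token <intValue> and %token <floatValue> tags are needed: they tell YACC which member to read for each token.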