Writing a parser with YACC (Yet Another Compiler-Compiler)

Writing a parser with YACC (Yet Another Compiler-Compiler)
• YACC automatically generates a parser for a context-free grammar (an LALR parser).
  – It allows syntax-directed translation: you write grammar productions together with semantic actions.
  – LALR(1) is more powerful than LL(1). For example, it accepts left-recursive grammars, which LL(1) cannot handle.
• YACC works with lex: the generated parser calls yylex() to get the next token.
  – YACC and lex must agree on the numeric value of each token.
• Like lex, YACC predates C++, so some constructs need a workaround when used with C++ (an example is given later).
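
The lex/yacc hookup is a pull interface: the parser asks for one token at a time by calling yylex(). Both sides must use the same token codes. The minimal C sketch below makes this concrete by replacing lex.yy.c with a hand-written yylex(). The token names, their numbers, and the count_tokens() helper are hypothetical. The only convention taken from yacc is that 0 means end of input.

```c
#include <ctype.h>

/* Token codes the "lexer" and "parser" must agree on. In a real project
   they come from the %token declarations; these names and numbers are
   hypothetical. 0 means end of input. */
enum { END_OF_INPUT = 0, ICONST = 258, PLUS, TIMES, UNKNOWN };

static const char *input;   /* text being scanned */

/* Hand-written stand-in for the yylex() that lex generates:
   each call returns the code of the next token, or 0 at end of input. */
int yylex(void) {
    while (*input == ' ')
        input++;                          /* skip blanks */
    if (*input == '\0')
        return END_OF_INPUT;
    if (isdigit((unsigned char)*input)) {
        while (isdigit((unsigned char)*input))
            input++;                      /* consume the whole integer */
        return ICONST;
    }
    switch (*input++) {
    case '+': return PLUS;
    case '*': return TIMES;
    default:  return UNKNOWN;
    }
}

/* Pull tokens the way the parser does: call yylex() until it returns 0. */
int count_tokens(const char *text) {
    int n = 0;
    input = text;
    while (yylex() != END_OF_INPUT)
        n++;
    return n;
}
```

For example, count_tokens("12 + 3 * 45") sees five tokens before yylex() reports end of input.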

• YACC file format:
      declarations          /* specify tokens and non-terminals */
      %%
      translation rules     /* specify the grammar here */
      %%
      supporting C routines
• The command "yacc file" produces y.tab.c, which contains a routine yyparse().
  – yyparse() calls yylex() to get tokens.
  – yyparse() returns 0 if the program is grammatically correct and non-zero otherwise.
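
The return-value contract of yyparse() can be imitated by hand. The sketch below is a hypothetical hand-written recognizer for the tiny grammar exp : ICONST | exp PLUS ICONST. It is not the table-driven LALR code that yacc actually emits. It reads tokens from an array instead of a real yylex(). Like yyparse(), it returns 0 when it accepts the input and non-zero on a syntax error. The names parse, next_token, and the token codes are assumptions.

```c
/* Hypothetical token codes; 0 marks end of input, as with yylex(). */
enum { END = 0, ICONST = 258, PLUS };

static const int *tokens;       /* token stream standing in for yylex() */

/* Return the next token; stay on 0 once the end is reached. */
static int next_token(void) {
    int t = *tokens;
    if (t != END)
        tokens++;
    return t;
}

/* Recognize  exp : ICONST | exp PLUS ICONST  (that is, ICONST (PLUS ICONST)*).
   Like yyparse(): returns 0 if the input is accepted, 1 on a syntax error. */
int parse(const int *toks) {
    tokens = toks;
    if (next_token() != ICONST)
        return 1;
    for (;;) {
        int t = next_token();
        if (t == END)
            return 0;                   /* accepted */
        if (t != PLUS || next_token() != ICONST)
            return 1;                   /* syntax error */
    }
}
```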

• The declarations part specifies tokens, non-terminal symbols, and other C/C++ constructs.
  – To specify tokens AAA and BBB:
      %token AAA BBB
  – To assign a token number to a token (needed when using lex), put a nonnegative integer immediately after the first appearance of the token:
      %token EOFnumber 0
      %token SEMInumber 101
  – Non-terminals do not need to be declared unless you want to associate them with a type to store attributes (discussed later).

• Translation rules specify the grammar productions:
      exp : exp PLUSnumber exp
          | exp MINUSnumber exp
          | exp TIMESnumber exp
          | exp DIVIDEnumber exp
          | LPARENnumber exp RPARENnumber
          | ICONSTnumber
          ;
  Alternatives can also be written as separate rules with the same left-hand side:
      exp : exp PLUSnumber exp ;
      exp : exp MINUSnumber exp ;
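
The grammar above is ambiguous. An input such as 1 - 2 - 3 has two parse trees: (1 - 2) - 3 and 1 - (2 - 3). The short C sketch below evaluates both tree shapes to show that the choice changes the result, which is why yacc reports conflicts for this grammar. The helper names are hypothetical.

```c
/* Left-leaning tree: ((v[0] - v[1]) - v[2]) - ...   Requires n >= 1. */
int minus_left(const int *v, int n) {
    int acc = v[0];
    for (int i = 1; i < n; i++)
        acc -= v[i];
    return acc;
}

/* Right-leaning tree: v[0] - (v[1] - (v[2] - ...))   Requires n >= 1. */
int minus_right(const int *v, int n) {
    int acc = v[n - 1];
    for (int i = n - 2; i >= 0; i--)
        acc = v[i] - acc;
    return acc;
}
```

With the operands {1, 2, 3}, the left-leaning tree gives -4 and the right-leaning tree gives 2.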

• Yacc environment
  – Yacc processes a yacc specification file and produces a y.tab.c file.
  – Yacc produces an integer function yyparse(), which
    • calls yylex() to get tokens;
    • returns non-zero when an error is found;
    • returns 0 if the program is accepted.
  – You need to supply main() and yyerror() functions (yyline here is a global line counter kept by the lexer). Example:
      void yyerror(const char *str) { printf("yyerror: %s at line %d\n", str, yyline); }
      int main(void) { if (!yyparse()) printf("accept\n"); else printf("reject\n"); return 0; }

• Hooking yacc and lex together: see example 0.y and lexer.l.
• Matching the tokens:
  – In lex:
      #define INTEGERCONST 2
      #define PLUSNUM 4
  – In yacc:
      %token INTEGERCONST 2
      %token PLUSNUM 4
  – All tokens used in the yacc grammar must be declared. Some tokens recognized by lex may not appear in the yacc grammar at all (see lexer.l). Non-terminals do not need to be declared.
• lex.yy.c and y.tab.c may be compiled separately, or the yacc file may simply include lex.yy.c, as example 0.y does.
• Global variables such as yyline, yycolumn, and yylval can be used in yacc routines.
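
A hand-written yylex() in C can show both halves of the hookup: the same token numbers on the lex side, and the globals that yacc routines can read. This is a sketch under stated assumptions. Here yylval is a plain int defined locally only because no y.tab.c is present; in a real build, yacc defines it. UNKNOWN, lex_init(), and src are hypothetical.

```c
#include <ctype.h>

/* The numbers the yacc file assigns with %token INTEGERCONST 2 and
   %token PLUSNUM 4, repeated on the lex side. */
#define INTEGERCONST 2
#define PLUSNUM 4
#define UNKNOWN 99               /* hypothetical code for anything else */

int yyline = 1, yycolumn = 1;    /* position globals kept by the lexer */
int yylval;                      /* stand-in for the yylval yacc defines */
static const char *src;          /* hypothetical input buffer */

void lex_init(const char *text) {
    src = text;
    yyline = 1;
    yycolumn = 1;
}

/* Hand-written stand-in for a lex-generated yylex(). */
int yylex(void) {
    for (;;) {                                   /* skip white space */
        if (*src == '\n') { yyline++; yycolumn = 1; src++; }
        else if (*src == ' ') { yycolumn++; src++; }
        else break;
    }
    if (*src == '\0')
        return 0;                                /* end of input */
    if (isdigit((unsigned char)*src)) {
        yylval = 0;
        while (isdigit((unsigned char)*src)) {   /* compute the attribute */
            yylval = yylval * 10 + (*src - '0');
            src++;
            yycolumn++;
        }
        return INTEGERCONST;
    }
    yycolumn++;
    return *src++ == '+' ? PLUSNUM : UNKNOWN;
}
```

A yacc action or yyerror() can then read yylval for the value of the last integer and yyline for error messages.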

– YACC automatically builds a parser for the grammar (an LALR parser).
• When the grammar is not LALR, there may be shift/reduce and reduce/reduce conflicts.
  – In that case, you need to modify the grammar to make it LALR for yacc to work properly.
• YACC tries to resolve conflicts automatically.
  – Default conflict resolution:
    » shift/reduce --> shift
    » reduce/reduce --> reduce by the production that appears first in the grammar
  – These defaults are not very informative, and the chosen action may not be what you intended.
• 'yacc -v *.y' generates a report in the file 'y.output'.
• See example 1.y.
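
To see why the default can surprise you, this sketch simulates a shift-reduce parse of the ambiguous grammar exp : exp op exp | ICONST (with op one of +, -, *). Each shift/reduce conflict is decided by a fixed policy. This is a simplified hand-written model, not yacc's generated parser. The function names and the input format (single digits, no spaces, under 64 symbols) are assumptions.

```c
/* Apply one binary operator; '*' is the only other operator accepted. */
static int apply(int a, char op, int b) {
    switch (op) {
    case '+': return a + b;
    case '-': return a - b;
    default:  return a * b;
    }
}

/* Reduce "exp op exp" on top of the stacks to a single exp. */
static void reduce(int *vals, int *nv, const char *ops, int *no) {
    int b = vals[--*nv];
    int a = vals[--*nv];
    vals[(*nv)++] = apply(a, ops[--*no], b);
}

/* prefer_shift != 0 mimics yacc's default (shift on every conflict);
   prefer_shift == 0 always reduces instead. */
int eval_with_policy(const char *s, int prefer_shift) {
    int vals[64];            /* semantic values of the exp symbols */
    char ops[64];            /* operator tokens still on the stack */
    int nv = 0, no = 0;
    for (; *s; s++) {
        if (*s >= '0' && *s <= '9') {
            vals[nv++] = *s - '0';     /* shift ICONST, reduce it to exp */
        } else {
            /* Stack ends in "exp op exp" and an operator arrives:
               this is the shift/reduce conflict. */
            while (!prefer_shift && no > 0)
                reduce(vals, &nv, ops, &no);
            ops[no++] = *s;            /* shift the operator */
        }
    }
    while (no > 0)                     /* end of input: reduce the rest */
        reduce(vals, &nv, ops, &no);
    return vals[0];
}
```

Under the shift policy, "1-2-3" groups as 1-(2-3) = 2, and "2*3+4" groups as 2*(3+4) = 14. Always shifting makes every operator right-associative and ignores the usual precedence of *.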

– Resolving conflicts
• Modify the grammar. See example 1.y and example 0.y.
• Use the precedence and associativity of operators.
  – Use the keywords %left, %right, and %nonassoc in the declarations section.
    » All tokens on the same line have the same precedence level and associativity.
    » The lines are listed in order of increasing precedence.
        %left PLUSnumber MINUSnumber
        %left TIMESnumber DIVIDEnumber
  – See example 3.y.
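
yacc uses these declarations only to settle shift/reduce conflicts while it builds its LALR tables; it does not evaluate anything. To make the grouping requested by the two %left lines visible, this sketch swaps in a different, top-down technique: precedence climbing with the same two levels, both left-associative. The function names and the input format (digits, + - * /, parentheses, no spaces) are assumptions.

```c
/* Precedence levels mirroring the two %left lines: the later line gets
   the higher level. Level 0 means "not a binary operator". */
static int prec(char op) {
    switch (op) {
    case '+': case '-': return 1;   /* %left PLUSnumber MINUSnumber */
    case '*': case '/': return 2;   /* %left TIMESnumber DIVIDEnumber */
    default:  return 0;
    }
}

static const char *p;               /* cursor into the input text */
static int climb(int min_prec);

/* item : '(' exp ')' | integer constant */
static int primary(void) {
    if (*p == '(') {
        p++;                        /* skip '(' */
        int v = climb(1);
        p++;                        /* skip ')'; input assumed well-formed */
        return v;
    }
    int v = 0;
    while (*p >= '0' && *p <= '9')
        v = v * 10 + (*p++ - '0');
    return v;
}

/* Parse operators whose level is at least min_prec. Every level is
   left-associative, so the right operand only takes operators of a
   strictly higher level (level + 1). */
static int climb(int min_prec) {
    int lhs = primary();
    while (prec(*p) >= min_prec) {
        char op = *p++;
        int rhs = climb(prec(op) + 1);
        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs /= rhs; break;   /* no division-by-zero check */
        }
    }
    return lhs;
}

/* Evaluate an expression with no spaces, e.g. "2+3*4". */
int eval(const char *s) {
    p = s;
    return climb(1);
}
```

With these levels, "2+3*4-5" evaluates to 9 and "1-2-3" evaluates to -4, which is the grouping the %left declarations ask yacc to produce.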

• Attribute grammar with yacc
  – Each symbol can be associated with some attributes.
    • The data structure of the attributes is specified in a %union in the declarations (see example 4.y):
        %union { int semantic_value; }
        %token <semantic_value> INTEGERCONST 2
        %type <semantic_value> exp
        %type <semantic_value> term
        %type <semantic_value> item
    • Semantic actions associated with productions can be specified.
    • The union is used to define yylval. You don't need to declare yylval again; the lex code can use yylval.semantic_value directly.
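
In C terms, yacc (and bison) turns the %union body into a union type named YYSTYPE and declares yylval with that type. Because of the <semantic_value> tags, $$ and $1 in rules for exp, term, and item become accesses to the semantic_value member. The sketch below writes that union out by hand. lex_integer() is a hypothetical helper showing what a lex action for an integer literal might do.

```c
#include <stdlib.h>

/* Hand-written equivalent of  %union { int semantic_value; }  */
typedef union {
    int semantic_value;
} YYSTYPE;

YYSTYPE yylval;            /* in a real build, y.tab.c defines this */

#define INTEGERCONST 2     /* matches %token <semantic_value> INTEGERCONST 2 */

/* What a lex action for an integer literal might do (hypothetical helper):
   store the attribute in the member named by the <semantic_value> tag. */
int lex_integer(const char *yytext) {
    yylval.semantic_value = atoi(yytext);
    return INTEGERCONST;
}
```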

• Semantic actions
  – Semantic actions associated with productions can be specified:
      item : LPARENnumber exp RPARENnumber { $$ = $2; }
           | ICONSTnumber                  { $$ = $1; }
           ;
    • $$ is the attribute associated with the left-hand side of the production.
    • $1 is the attribute associated with the first symbol on the right-hand side, $2 with the second symbol, and so on.
  – An action can appear anywhere in the production; it is then also counted as a symbol.
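
One way to picture $$ and $i is a stack that holds one attribute per grammar symbol. On a reduction by a rule with n right-hand-side symbols, $i is the i-th of the top n entries, and $$ replaces all n. The sketch below is a simplified model of that stack with hypothetical names; yacc's generated code differs in detail. The { $$ = $1; } action is also what yacc does by default when a rule has no action.

```c
/* Simplified semantic value stack: one int attribute per grammar symbol. */
static int stack[100];
static int top = 0;                 /* number of entries in use */

void push(int v) { stack[top++] = v; }

/* Value of $i (1-based) for a right-hand side of length n. */
static int dollar(int i, int n) { return stack[top - n + i - 1]; }

/* Reduce by  item : LPARENnumber exp RPARENnumber  { $$ = $2; } */
void reduce_paren_item(void) {
    int result = dollar(2, 3);      /* $$ = $2 */
    top -= 3;                       /* pop the three right-hand-side values */
    push(result);                   /* push $$ as the value of item */
}

/* Reduce by  item : ICONSTnumber  { $$ = $1; } */
void reduce_iconst_item(void) {
    int result = dollar(1, 1);      /* $$ = $1 */
    top -= 1;
    push(result);
}
```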

• Semantic actions
  – Semantic actions can appear anywhere in the production; an action is also counted as a symbol:
      item : LPARENnumber { cout << "debug"; } exp RPARENnumber { $$ = $3; }
           | ICONSTnumber { $$ = $1; }
           ;
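
yacc implements a mid-rule action by inventing a hidden non-terminal whose empty production carries the embedded code. That is why the action occupies a position and exp becomes $3. The sketch below reuses a simplified value-stack model (all names hypothetical) and uses printf in place of the C++ cout.

```c
#include <stdio.h>

/* Simplified value stack: one int attribute per grammar symbol. */
static int stack[100];
static int top = 0;

void push(int v) { stack[top++] = v; }

/* Value of $i (1-based) for a right-hand side of length n. */
static int dollar(int i, int n) { return stack[top - n + i - 1]; }

/* Reduce the hidden empty rule that carries the mid-rule action:
   run the action, then push a value for the hidden symbol. */
void reduce_midrule_action(void) {
    printf("debug");                /* the { cout << "debug"; } action, in C */
    push(0);                        /* unused value of the hidden symbol */
}

/* Reduce  item : LPARENnumber <action> exp RPARENnumber  { $$ = $3; }
   The right-hand side has four symbols because the action counts. */
void reduce_item(void) {
    int result = dollar(3, 4);      /* $3 is exp */
    top -= 4;
    push(result);
}
```

The hidden rule is reduced right after LPARENnumber is shifted, before exp is parsed. This is why the debug output appears at that point in the parse.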

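Multiple attributes and C/C++ issues
• Multiple attributes can be associated with a symbol by declaring a structure in the union. See cal_trans_c.y (in yacc 1_cop 4020).
  – Unfortunately, C++ restricts unions whose members are classes or structures with constructors or destructors (for example, a member containing a std::string), so such a union does not work directly as the yacc value type.
  – A workaround example is given in cal_trans_cpp.y.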
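
In C, a structure inside the union works directly. The sketch below writes out, by hand, what such a union could look like and how an action might combine two attribute sets. The struct, member, and function names are hypothetical. In a yacc file, the struct would typically be declared in the %{ ... %} block before %union, so that the generated code sees it.

```c
#include <stdio.h>
#include <string.h>

/* Several attributes for one symbol. */
struct exp_attr {
    int value;            /* computed value of the expression */
    char text[32];        /* source text, truncated if too long */
};

typedef union {           /* hand-written equivalent of the %union */
    int semantic_value;
    struct exp_attr exp;
} YYSTYPE;

/* What an action for  exp : exp PLUSnumber exp  could compute if exp
   were declared with %type <exp> exp, i.e. $$ from $1 and $3. */
struct exp_attr add_attrs(struct exp_attr lhs, struct exp_attr rhs) {
    struct exp_attr r;
    r.value = lhs.value + rhs.value;
    snprintf(r.text, sizeof r.text, "%s+%s", lhs.text, rhs.text);
    return r;
}
```

Inside the yacc action, this would read roughly as $$ = add_attrs($1, $3);, and individual fields are available as $1.value or $3.text.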