Lab 3: Using ML-Yacc
Zhong Zhuang
dyzz@mail.ustc.edu.cn
How to write a parser?
    Write a parser by hand
    Use a parser generator
        May not be as efficient as a hand-written parser
        General and robust
How does it work?
    parser specification -> parser generator -> parser
    stream of tokens -> parser -> abstract syntax
ML-Yacc specification
Three parts again:
    User declarations: declare values available in the rule actions
    %%
    ML-Yacc definitions: declare terminals and nonterminals; special declarations to resolve conflicts
    %%
    Rules: parser specified by CFG rules and associated semantic actions that generate abstract syntax
ML-Yacc Definitions
    specify type of positions
        %pos int * int
    specify terminal and nonterminal symbols
        %term IF | THEN | ELSE | PLUS | MINUS ...
        %nonterm prog | exp | op
    specify end-of-parse token
        %eop EOF
    specify start symbol (by default, the nonterminal on the LHS of the first rule)
        %start prog
A Simple ML-Yacc File
Grammar symbols:
    %%
    %term NUM | PLUS | MUL | LPAR | RPAR | EOF
    %nonterm exp | fact | base
    %pos int
    %start exp
    %eop EOF
Grammar rules, with semantic actions that currently do nothing:
    %%
    exp  : fact              ()
         | fact PLUS exp     ()
    fact : base              ()
         | base MUL fact     ()
    base : NUM               ()
         | LPAR exp RPAR     ()
    each nonterminal may have a semantic value associated with it
    when the parser reduces with (X ::= s), a semantic action is executed
        the action uses the semantic values of the symbols in s
    when parsing completes successfully, the parser returns the semantic value associated with the start symbol
        usually a syntax tree
    to use semantic values during parsing, we must declare symbol types:
        %term NUM of int | PLUS | MUL | ...
        %nonterm exp of int | fact of int | base of int
    the type of a semantic action must match the type declared for the nonterminal on the LHS of its rule
A Simple ML-Yacc File with Actions
Grammar symbols with type declarations:
    %%
    %term NUM of int | PLUS | MUL | LPAR | RPAR | EOF
    %nonterm exp of int | fact of int | base of int
    %pos int
    %start exp
    %eop EOF
Grammar rules with semantic actions, computing an integer result:
    %%
    exp  : fact              (fact)
         | fact PLUS exp     (fact + exp)
    fact : base              (base)
         | base MUL base     (base1 * base2)
    base : NUM               (NUM)
         | LPAR exp RPAR     (exp)
Conflicts in ML-Yacc
We often write ambiguous grammars:
    exp ::= NUM
          | exp PLUS exp
          | exp MUL exp
          | LPAR exp RPAR
Example
    Tokens from lexer:  NUM PLUS NUM MUL NUM
    State of parser:    E+E
    To be read:         MUL NUM
If we shift:
    shift     E+E*
    shift     E+E*E
    reduce    E+E
    reduce    E
    Result is: E+(E*E)
If we reduce:
    reduce    E
    shift     E*
    shift     E*E
    reduce    E
    Result is: (E+E)*E
This is a shift-reduce conflict
We want E+(E*E), because “*” has higher precedence than “+”.
Another shift-reduce conflict
    Tokens from lexer:  NUM PLUS NUM PLUS NUM
    State of parser:    E+E
    To be read:         PLUS NUM
If we shift:
    shift     E+E+
    shift     E+E+E
    reduce    E+E
    reduce    E
    Result is: E+(E+E)
If we reduce:
    reduce    E
    shift     E+
    shift     E+E
    reduce    E
    Result is: (E+E)+E
Deal with shift-reduce conflicts
In this case we need to reduce, because “+” is left associative.
How to deal with it:
    let ML-Yacc complain
        the default choice is to shift when it encounters a shift-reduce conflict
        BAD: programmer intentions unclear; harder to debug other parts of your grammar; generally inelegant
    rewrite the grammar to eliminate ambiguity
        can be complicated and less clear
    use Yacc precedence directives
        %left, %right, %nonassoc
Precedence and Associativity
    precedence of a terminal is based on the order in which associativity is specified (terminals declared later have higher precedence)
    precedence of a rule is the precedence of its rightmost terminal
        e.g. precedence of (E ::= E + E) == prec(+)
    a shift-reduce conflict is resolved as follows
        prec(terminal) > prec(rule) ==> shift
        prec(terminal) < prec(rule) ==> reduce
        prec(terminal) = prec(rule) ==>
            assoc(terminal) = left     ==> reduce
            assoc(terminal) = right    ==> shift
            assoc(terminal) = nonassoc ==> report as error
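The resolution rule above can be written down as a small function. This is only an illustrative sketch in Standard ML, not ML-Yacc's actual implementation: the names `assoc`, `action` and `resolve` are hypothetical, and precedences are modelled as plain integers where a larger number means higher precedence.

```sml
(* Hypothetical model of how a shift-reduce conflict is resolved. *)
datatype assoc  = Left | Right | NonAssoc
datatype action = Shift | Reduce | Error

(* precTerm: precedence of the lookahead terminal
   precRule: precedence of the rule that could be reduced
   a:        associativity of the lookahead terminal *)
fun resolve (precTerm : int, precRule : int, a : assoc) : action =
  if precTerm > precRule then Shift
  else if precTerm < precRule then Reduce
  else
    case a of
      Left     => Reduce
    | Right    => Shift
    | NonAssoc => Error

(* Assume %left PLUS is declared before %left MUL, so prec(PLUS) = 1 and prec(MUL) = 2. *)
val seeMulAfterSum  = resolve (2, 1, Left)   (* state E+E, lookahead *: Shift  *)
val seePlusAfterSum = resolve (1, 1, Left)   (* state E+E, lookahead +: Reduce *)
```

The two bindings at the end correspond to the two conflicts from the earlier slides: shifting on “*” yields E+(E*E), and reducing on “+” yields (E+E)+E.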
    datatype exp = Int of int
                 | Add of exp * exp
                 | Sub of exp * exp
                 | Mul of exp * exp
                 | Div of exp * exp
    %%
    %left PLUS MINUS
    %left MUL DIV          (higher precedence)
    %%
    exp : NUM              (Int NUM)
        | exp PLUS exp     (Add (exp1, exp2))
        | exp MINUS exp    (Sub (exp1, exp2))
        | exp MUL exp      (Mul (exp1, exp2))
        | exp DIV exp      (Div (exp1, exp2))
        | LPAR exp RPAR    (exp)
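Once the parser returns a value of type `exp`, ordinary SML code can walk the tree. The following is a minimal sketch of an evaluator, assuming the datatype exactly as declared above; the function name `eval` and the sample tree are illustrative and not part of the lab files.

```sml
datatype exp = Int of int
             | Add of exp * exp
             | Sub of exp * exp
             | Mul of exp * exp
             | Div of exp * exp

(* Evaluate a syntax tree bottom-up; division is integer division. *)
fun eval (Int n)      = n
  | eval (Add (a, b)) = eval a + eval b
  | eval (Sub (a, b)) = eval a - eval b
  | eval (Mul (a, b)) = eval a * eval b
  | eval (Div (a, b)) = eval a div eval b

(* The tree for 1 + 2 * 3 when MUL has higher precedence than PLUS. *)
val tree   = Add (Int 1, Mul (Int 2, Int 3))
val result = eval tree   (* 7 *)
```

If the precedence directives were swapped, the same input would instead produce `Mul (Add (Int 1, Int 2), Int 3)`, which evaluates to 9, so the tree shape is a quick way to check that the directives do what you intended.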
Reduce-reduce Conflict
This kind of conflict is more difficult to deal with.
Example
    sequence  ::= (empty)
                | maybeword
                | sequence word
    maybeword ::= (empty)
                | word
When we get a “word” from the lexer:
    word -> maybeword -> sequence                        (rule 1)
    empty -> sequence, then sequence word -> sequence    (rule 2)
We have more than one way to get “sequence” from the input “word”.
Reduce-reduce Conflict
A reduce-reduce conflict means that two or more rules apply to the same sequence of input. This usually indicates a serious error in the grammar.
    ML-Yacc reduces by the first rule
    Generally, reduce-reduce conflicts are not allowed in your ML-Yacc file
We need to fix our grammar:
    sequence ::= (empty)
               | sequence word
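As a rough illustration, the repaired grammar could be written as an ML-Yacc specification like the one below. Everything except the shape of the `sequence` rule is an assumption made for the sketch: the parser name `Seq`, the terminal `WORD` carrying a string, and the choice to collect the words into a `string list`.

```sml
(* Sketch only: Seq, WORD and the string list result are hypothetical. *)
%%
%name Seq
%pos int
%term WORD of string | EOF
%nonterm sequence of string list
%start sequence
%eop EOF
%%
sequence :                  ([])
         | sequence WORD    (sequence @ [WORD])
```

With only one way to derive a `sequence` from each input, mlyacc should no longer have a reduce-reduce conflict to report for these rules.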
Summary of conflicts
    Shift-reduce conflict
        resolve with precedence and associativity
        shift by default
    Reduce-reduce conflict
        reduce by first rule
        Not allowed!
Lab 3
Your job is to finish a parser for the C language.
    Input: a “.c” file
    Output: “Success!” if the “.c” file is correct
File description
    c.lex
    c.grm
    main.sml
    call-main.sml
    sources.cm
    lab3.mlb
    test.c
Using ML-Yacc
Read the ML-Yacc Manual
Run
    Once you finish “c.grm” and “c.lex”, in the command line (use MLton's mlyacc and mllex):
        mlyacc c.grm
        mllex c.lex
    This gives us “c.grm.sig”, “c.grm.sml”, “c.grm.desc” and “c.lex.sml”
Then compile Lab 3
    Start SML/NJ and run
        CM.make "sources.cm";
    or, in the command line,
        mlton lab3.mlb
To run lab3
    In SML/NJ,
        Main.parse "test.c";
    or, in the command line,
        lab3 test.c
“Debug” ML-Yacc File
When you run mlyacc, you'll see error messages if your ML-Yacc file has conflicts. For example:
    mlyacc c.grm
    2 shift/reduce conflicts
Open the file “c.grm.desc” (this file is generated by mlyacc). The beginning of this file:
    2 shift/reduce conflicts

    error: state 0: shift/reduce conflict (shift MYSTRUCT, reduce by rule 12)
    error: state 1: shift/reduce conflict (shift MYSTRUCT, reduce by rule 12)

    state 0:
        prog : . structs vdecs preds funcs
        MYSTRUCT    shift 3
        prog        goto 429
        structs     goto 2
        structdec   goto 1
        .           reduce by rule 12
The rest of the file lists all the states.
Rule 12 means the 12th rule (counting from 0) in your ML-Yacc file.
Use ML-Lex with ML-Yacc
Most of the work in “c.lex” this time can be copied from Lab 2
    You can re-use the regular expressions and lexical rules
Difference from Lab 2
    You have to define the tokens in “c.grm”
        %term INT of int | EOF
    The “%term” declarations in “c.grm” automatically appear in “c.grm.sig”
        signature C_TOKENS =
        sig
          type ('a, 'b) token
          type svalue
          val EOF: 'a * 'a -> (svalue, 'a) token
          val INT: (int) * 'a * 'a -> (svalue, 'a) token
        end
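To show how the lexer side uses these token constructors, here is a hedged sketch of what the relevant parts of “c.lex” might look like. It assumes `%pos int` and `%name C` in “c.grm” (which is what yields a signature named `C_TOKENS`); the functor name `CLexFun` and the `digit` and `ws` definitions are illustrative, and the real lab skeleton may differ.

```sml
(* User declarations: connect the lexer's types to the generated token types. *)
structure Tokens = Tokens
type pos = int                                (* must agree with %pos in c.grm *)
type svalue = Tokens.svalue
type ('a, 'b) token = ('a, 'b) Tokens.token
type lexresult = (svalue, pos) token

(* Called by the generated lexer at end of input. *)
fun eof () = Tokens.EOF (0, 0)
%%
%header (functor CLexFun (structure Tokens : C_TOKENS));
digit = [0-9];
ws = [\ \t\n];
%%
{ws}+    => (lex ());
{digit}+ => (Tokens.INT (valOf (Int.fromString yytext), yypos, yypos + size yytext));
```

Each constructor takes the semantic value first (if the terminal has one), followed by the left and right positions, which is why `INT` receives three components and `EOF` only two.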
Hints
    Read the ML-Yacc Manual
    Read the language specification
    Test a lot!