Lab 3: Using ML-Yacc
Zhong Zhuang
dyzz@mail.ustc.edu.cn

How to write a parser?
- Write a parser by hand
- Use a parser generator
  - May not be as efficient as a hand-written parser
  - General and robust
How does it work?
    parser specification -> parser generator -> parser
    stream of tokens -> parser -> abstract syntax

ML-Yacc specification
Three parts again:
    User declarations: declare values available in the rule actions
    %%
    ML-Yacc definitions: declare terminals and non-terminals; special declarations to resolve conflicts
    %%
    Rules: the parser is specified by CFG rules and associated semantic actions that generate abstract syntax
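As a minimal sketch of the three-part layout, a small specification might look like the following. The parser name Calc, the token names, the helper negate, and the integer-valued actions are assumptions chosen for illustration; none of them come from the lab files.

```sml
(* Part 1: user declarations. Values defined here are visible in rule actions. *)
fun negate (n : int) : int = ~n

%%
(* Part 2: ML-Yacc definitions. *)
%name Calc                     (* prefix for generated names, e.g. Calc_TOKENS *)
%pos int                       (* type of the positions attached to tokens *)
%term NUM of int | MINUS | LPAR | RPAR | EOF
%nonterm exp of int
%start exp
%eop EOF                       (* token that may end the input *)
%noshift EOF                   (* never shift past the end of input *)
%verbose                       (* ask for a full state listing in the .desc file *)

%%
(* Part 3: rules, each followed by a semantic action in parentheses. *)
exp : NUM            (NUM)
    | MINUS exp      (negate exp)
    | LPAR exp RPAR  (exp)
```

This grammar is deliberately unambiguous, so mlyacc should report no conflicts for it.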

Semantic values
- Each nonterminal may have a semantic value associated with it
- When the parser reduces with (X ::= s)
  - a semantic action will be executed
  - the action uses semantic values from the symbols in s
- When parsing is completed successfully
  - the parser returns the semantic value associated with the start symbol
  - usually a syntax tree
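To illustrate how semantic values flow through a reduction, here is a hedged sketch whose actions build a small syntax tree. The structure Absyn and its constructors are hypothetical and assumed to be defined in a separate file; the parser name Tree and the token names are likewise illustrative.

```sml
(* Assumed to live in a separate file (hypothetical, not part of the lab):
     structure Absyn = struct
       datatype exp = Num of int | Neg of exp | Pair of exp * exp
     end *)
structure A = Absyn

%%
%name Tree
%pos int
%term NUM of int | MINUS | COMMA | LPAR | RPAR | EOF
%nonterm exp of A.exp
%start exp
%eop EOF
%noshift EOF

%%
(* In an action, a symbol's name stands for its semantic value.
   When a symbol occurs more than once, the occurrences are numbered. *)
exp : NUM                          (A.Num NUM)
    | MINUS exp                    (A.Neg exp)
    | LPAR exp COMMA exp RPAR      (A.Pair (exp1, exp2))
```

When the parser reduces by the last rule, exp1 and exp2 hold the trees already built for the two sub-expressions, and the value of the start symbol is the tree for the whole input.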

Conflicts in ML-Yacc
We often write ambiguous grammars:
    exp ::= NUM
          | exp PLUS exp
          | exp MUL exp
          | LPAR exp RPAR
Example
    Tokens from lexer: NUM PLUS NUM MUL NUM
    State of parser:   E+E    (MUL NUM still to be read)

If we shift:
    Shift   E+E*
    Shift   E+E*E
    Reduce  E+E
    Reduce  E
    Result is: E+(E*E)

If we reduce:
    Reduce  E
    Shift   E*E
    Reduce  E
    Result is: (E+E)*E

This is a shift-reduce conflict
We want E+(E*E), because “*” has higher precedence than “+”

Another shift-reduce conflict
    Tokens from lexer: NUM PLUS NUM PLUS NUM
    State of parser:   E+E    (PLUS NUM still to be read)

If we shift:
    Shift   E+E+
    Shift   E+E+E
    Reduce  E+E
    Reduce  E
    Result is: E+(E+E)

If we reduce:
    Reduce  E
    Shift   E+E
    Reduce  E
    Result is: (E+E)+E

Deal with shift-reduce conflicts
In this case we need to reduce, because “+” is left associative.
Deal with it!
- Let ML-Yacc complain
  - the default choice is to shift when it encounters a shift-reduce conflict
  - BAD: programmer intentions unclear; harder to debug other parts of your grammar; generally inelegant
- Rewrite the grammar to eliminate ambiguity
  - can be complicated and less clear
- Use Yacc precedence directives
  - %left, %right, %nonassoc
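The "rewrite the grammar" option can be sketched as follows for the expression grammar used in the earlier slides. This is one common way to stratify the grammar, not necessarily the one expected in the lab; the nonterminal names term and factor, the parser name Calc, and the integer-valued actions are assumptions.

```sml
%%
%name Calc
%pos int
%term NUM of int | PLUS | MUL | LPAR | RPAR | EOF
%nonterm exp of int | term of int | factor of int
%start exp
%eop EOF
%noshift EOF

%%
(* One nonterminal per precedence level.
   Left recursion makes PLUS and MUL left associative. *)
exp    : exp PLUS term     (exp + term)
       | term              (term)
term   : term MUL factor   (term * factor)
       | factor            (factor)
factor : NUM               (NUM)
       | LPAR exp RPAR     (exp)
```

The price is visible here: three nonterminals and two pass-through rules replace a single four-line rule, which is why the slides call this approach more complicated and less clear.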

Precedence and Associativity
- The precedence of a terminal is based on the order in which associativity is specified (later declarations have higher precedence)
- The precedence of a rule is the precedence of its rightmost terminal
  - e.g. precedence of (E ::= E + E) == prec(+)
- A shift-reduce conflict is resolved as follows
    prec(terminal) > prec(rule)  ==> shift
    prec(terminal) < prec(rule)  ==> reduce
    prec(terminal) = prec(rule)  ==>
        assoc(terminal) = left      ==> reduce
        assoc(terminal) = right     ==> shift
        assoc(terminal) = nonassoc  ==> report as error
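Applied to the ambiguous expression grammar from the earlier slides, the directives might be written as in the sketch below. The parser name Calc and the integer-valued actions are assumptions made for illustration.

```sml
%%
%name Calc
%pos int
%term NUM of int | PLUS | MUL | LPAR | RPAR | EOF
%nonterm exp of int
%start exp
%eop EOF
%noshift EOF

(* Declared later means higher precedence: MUL binds tighter than PLUS. *)
%left PLUS
%left MUL

%%
exp : NUM            (NUM)
    | exp PLUS exp   (exp1 + exp2)
    | exp MUL exp    (exp1 * exp2)
    | LPAR exp RPAR  (exp)
```

With E+E on the stack and MUL as the next token, prec(MUL) > prec(E ::= E + E), so the parser shifts and the result is E+(E*E). With PLUS as the next token the precedences are equal and PLUS is left associative, so the parser reduces and the result is (E+E)+E.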

Reduce-reduce Conflict
- This kind of conflict is more difficult to deal with
- Example
      sequence  ::= (empty) | maybeword | sequence word
      maybeword ::= (empty) | word
- When we get a “word” from the lexer:
      word -> maybeword -> sequence                      (rule 1)
      empty -> sequence, then sequence word -> sequence  (rule 2)
- We have more than one way to get “sequence” from the input “word”

Reduce-reduce Conflict
- A reduce-reduce conflict means there are two or more rules that apply to the same sequence of input
- This usually indicates a serious error in the grammar
- ML-Yacc reduces by the first rule
- Generally, reduce-reduce conflicts are not allowed in your ML-Yacc file
- We need to fix our grammar:
      sequence ::= (empty) | sequence word
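In ML-Yacc notation the repaired grammar could look like the sketch below. The parser name Words, the token WORD carrying a string, and the list-valued action are assumptions added for illustration.

```sml
%%
%name Words
%pos int
%term WORD of string | EOF
%nonterm sequence of string list
%start sequence
%eop EOF
%noshift EOF

%%
(* Exactly one derivation remains for any run of words:
   start from the empty sequence and append one WORD at a time. *)
sequence :                  ([])
         | sequence WORD    (sequence @ [WORD])
```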

Summary of conflicts
- Shift-reduce conflict
  - resolved by precedence and associativity
  - shift by default
- Reduce-reduce conflict
  - reduce by first rule
  - Not allowed!
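The shift-reduce resolution rule can be written down as a small Standard ML function. This is only a sketch of the decision table from the slides: the types assoc and action and the function resolve are hypothetical names and are not part of ML-Yacc.

```sml
(* Hypothetical types for illustration; not part of ML-Yacc's API. *)
datatype assoc = Left | Right | NonAssoc
datatype action = Shift | Reduce | Error

(* termPrec: precedence and associativity of the lookahead terminal.
   rulePrec: precedence of the rule that could be reduced.
   NONE means that no precedence was declared. *)
fun resolve (termPrec : (int * assoc) option, rulePrec : int option) : action =
  case (termPrec, rulePrec) of
      (SOME (tp, a), SOME rp) =>
        if tp > rp then Shift
        else if tp < rp then Reduce
        else (case a of
                  Left => Reduce
                | Right => Shift
                | NonAssoc => Error)
    | _ => Shift   (* no precedence information: shift by default *)

(* With %left PLUS (level 1) declared before %left MUL (level 2): *)
val afterMul  = resolve (SOME (2, Left), SOME 1)   (* MUL after E+E *)
val afterPlus = resolve (SOME (1, Left), SOME 1)   (* PLUS after E+E *)
```

Here afterMul evaluates to Shift and afterPlus to Reduce, matching the two examples worked through on the conflict slides.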

Lab 3
- Your job is to finish a parser for the C language
  - Input: a “.c” file
  - Output: “Success!” if the “.c” file is correct
- File description
    c.lex
    c.grm
    main.sml
    call-main.sml
    sources.cm
    lab3.mlb
    test.c

Using ML-Yacc
- Read the ML-Yacc Manual
- Run
  - When you have finished “c.grm” and “c.lex”, in the command line (using MLton’s tools):
        mlyacc c.grm
        mllex c.lex
  - We will get “c.grm.sig”, “c.grm.sml”, “c.grm.desc”, “c.lex.sml”
- Then compile Lab 3
  - Start SML/NJ and run CM.make “sources.cm”;
  - or, in the command line, mlton lab3.mlb
- To run lab3
  - In SML/NJ, Main.parse “test.c”;
  - or, in the command line, lab3 test.c

“Debug” ML-Yacc File
- When you run mlyacc, you’ll see error messages if your ML-Yacc file has conflicts. For example:
      mlyacc c.grm
      2 shift/reduce conflicts
- Open the file “c.grm.desc” (this file is generated by mlyacc)
- The beginning of this file:
      2 shift/reduce conflicts
      error: state 0: shift/reduce conflict (shift MYSTRUCT, reduce by rule 12)
      error: state 1: shift/reduce conflict (shift MYSTRUCT, reduce by rule 12)

      state 0:
          prog : . structs vdecs preds funcs
          MYSTRUCT    shift 3
          prog        goto 429
          structs     goto 2
          structdec   goto 1
          .           reduce by rule 12
- The rest of the file lists all the states
- “rule 12” means the 12th rule (counting from 0) in your ML-Yacc file

Use ML-lex with ML-yacc
- Most of the work in “c.lex” this time can be copied from Lab 2
  - You can re-use the regular expressions and lexical rules
- Difference from Lab 2
  - You have to define “token” in “c.grm”:
        %term INT of int | EOF
  - The “%term” declaration in “c.grm” is automatically turned into a signature in “c.grm.sig”:
        signature C_TOKENS =
        sig
          type ('a, 'b) token
          type svalue
          val EOF: 'a * 'a -> (svalue, 'a) token
          val INT: (int) * 'a * 'a -> (svalue, 'a) token
        end
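On the lexer side, the tokens declared in “c.grm” are built through the Tokens structure. The fragment below is a sketch of what the relevant parts of “c.lex” might contain. It assumes “c.grm” declares %name C and %pos int; the functor name CLexFun and the simplified position handling (character offsets only) are also assumptions.

```sml
(* User declarations of c.lex: glue between the lexer and the
   token structure generated from c.grm. *)
type pos = int
type svalue = Tokens.svalue
type ('a, 'b) token = ('a, 'b) Tokens.token
type lexresult = (svalue, pos) token

(* Called at end of input; positions are simplified to 0 here. *)
fun eof () = Tokens.EOF (0, 0)

%%
%header (functor CLexFun (structure Tokens : C_TOKENS));
digit = [0-9];
ws    = [\ \t\n];

%%
{ws}+     => (lex ());
{digit}+  => (Tokens.INT (valOf (Int.fromString yytext),
                          yypos, yypos + size yytext));
```

Each token constructor takes its semantic value (if it has one) followed by a left and a right position, which is why INT receives three components and EOF two.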

Hints
- Read the ML-Yacc Manual
- Read the language specification
- Test a lot!