ML-YACC
David Walker
COS 320

Outline
• Last Week
  – Introduction to Lexing, CFGs, and Parsing
• Today
  – More parsing: automatic parser generation via ML-Yacc
  – Reading: Chapter 3 of Appel

The Front End
stream of characters -> Lexer -> stream of tokens -> Parser -> abstract syntax -> Type Checker
• Lexical Analysis: Create sequence of tokens from characters
• Syntax Analysis: Create abstract syntax tree from sequence of tokens
• Type Checking: Check program for well-formedness constraints

Parser Implementation
• Implementation Options:
  1. Write a Parser from scratch
     – not as boring as writing a lexer, but not exactly a weekend in the Bahamas
  2. Use a Parser Generator
     – very general and robust; sometimes not quite as efficient as hand-written parsers. Nevertheless, good for lazy compiler writers.
     – input to the generator: a Parser Specification

ML-Yacc specification
• three parts, separated by %%:

    User Declarations: declare values available in the rule actions
    %%
    ML-Yacc Definitions: declare terminals and nonterminals; special declarations to resolve conflicts
    %%
    Rules: parser specified by CFG rules and associated semantic actions that generate abstract syntax

ML-Yacc declarations (preliminaries)
• specify type of positions
    %pos int * int
• specify terminal and nonterminal symbols
    %term IF | THEN | ELSE | PLUS | MINUS ...
    %nonterm prog | exp | op
• specify end-of-parse token
    %eop EOF
• specify start symbol (by default, the nonterminal on the LHS of the first rule)
    %start prog
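Put together, the three parts and the preliminary declarations might be laid out as in the sketch below. This is only an illustration: the parser name `Sample`, the token set, and the rules are assumptions made for the example. `%name` is added because ML-Yacc requires it alongside `%pos`, `%term`, and `%nonterm`. Since no symbol carries a type here, every semantic action is simply `()`.

```sml
(* User declarations: ML code visible to the rule actions.
   Nothing is needed in this sketch. *)

%%
(* ML-Yacc definitions *)
%name Sample               (* hypothetical parser name *)
%pos int * int             (* type of positions *)
%term NUM | PLUS | EOF     (* hypothetical token set *)
%nonterm prog | exp
%start prog
%eop EOF

%%
(* Rules: each alternative is followed by its semantic action *)
prog : exp              ()
exp  : NUM              ()
     | exp PLUS NUM     ()
```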

attribute-grammars
• ML-Yacc uses an attribute-grammar scheme
  – each nonterminal may have an associated semantic value
  – when the parser reduces the parsing stack using rule (X ::= s), a semantic action that uses the semantic values from s is executed
  – when parsing completes successfully, the parser returns the value associated with the start symbol
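The reduce step can be pictured with a small model in plain SML. This is not how ML-Yacc represents its stack internally; the `entry` type and `reduceAdd` function are hypothetical names used only to show a semantic action consuming the values of the right-hand side and producing the value for the left-hand side.

```sml
(* Abstract syntax produced by the semantic actions. *)
datatype exp = Int of int | Add of exp * exp

(* Hypothetical stack entries: the PLUS token (no value) or a symbol
   that carries a semantic value of type exp. *)
datatype entry = Plus | Val of exp

(* Reduce by exp ::= exp PLUS exp. The head of the list is the top of
   the stack, so the right operand is matched first. *)
fun reduceAdd (Val e2 :: Plus :: Val e1 :: rest) = Val (Add (e1, e2)) :: rest
  | reduceAdd _ = raise Fail "stack does not end in exp PLUS exp"

(* Stack after reading 1 + 2, with both numbers already reduced to exp. *)
val stack0 = [Val (Int 2), Plus, Val (Int 1)]
val stack1 = reduceAdd stack0   (* [Val (Add (Int 1, Int 2))] *)
```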

attribute-grammars
• semantic actions typically build the abstract syntax for the internal language
• to use semantic values during parsing, we must declare symbol types:
  – %term NUM of int | PLUS | MUL | ...
  – %nonterm exp of int | fact of int | base of int
• the type of a semantic action must match the type declared for the LHS nonterminal of its rule

ML-Yacc with Semantic Actions
computing abstract syntax via semantic actions

    datatype exp = Int of int | Add of exp * exp | Mul of exp * exp
    %%
    ...
    %%
    exp  : fact                 (fact)
         | fact PLUS exp        (Add (fact, exp))
    fact : base                 (base)
         | base MUL fact        (Mul (base, fact))
    base : NUM                  (Int NUM)
         | LPAR exp RPAR        (exp)

A simpler grammar

    datatype exp = Int of int | Add of exp * exp | Mul of exp * exp
    %%
    ...
    %%
    exp : NUM                   (Int NUM)
        | exp PLUS exp          (Add (exp1, exp2))
        | exp MUL exp           (Mul (exp1, exp2))
        | LPAR exp RPAR         (exp)

• this grammar is ambiguous! NUM + NUM * NUM has two parse trees:
  – one groups the input as NUM + (NUM * NUM)
  – the other groups it as (NUM + NUM) * NUM
• But it is so clean that it would be nice to use. Moreover, we know which parse tree we want. We just need a mechanism to specify it!
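To see why the choice of parse tree matters, the two trees for NUM + NUM * NUM can be written down as values of the `exp` datatype. The evaluator `eval` is a hypothetical helper, not part of the slides; it is there only to show that the two trees mean different things.

```sml
datatype exp = Int of int | Add of exp * exp | Mul of exp * exp

(* Hypothetical evaluator, used only to compare the two trees. *)
fun eval (Int n)      = n
  | eval (Add (a, b)) = eval a + eval b
  | eval (Mul (a, b)) = eval a * eval b

(* Both trees are valid parses of 1 + 2 * 3 under the ambiguous grammar. *)
val mulBindsTighter = Add (Int 1, Mul (Int 2, Int 3))   (* 1 + (2 * 3) *)
val addBindsTighter = Mul (Add (Int 1, Int 2), Int 3)   (* (1 + 2) * 3 *)

val v1 = eval mulBindsTighter   (* 7 *)
val v2 = eval addBindsTighter   (* 9 *)
```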

Recall how LR parsing works: the desired parse
• grammar: exp ::= NUM | exp PLUS exp | exp MUL exp | LPAR exp RPAR
• input from lexer: NUM + NUM * NUM
• desired parse tree: NUM + (NUM * NUM)
• state of parse so far: E + E on the stack; * NUM yet to read
• We have a shift-reduce conflict. What should we do to get the right parse?
  – SHIFT: the stack becomes E + E *
  – after the last NUM is read, REDUCE E * E to E, then REDUCE E + E to E

The alternative parse
• grammar: exp ::= NUM | exp PLUS exp | exp MUL exp | LPAR exp RPAR
• input from lexer: NUM + NUM * NUM
• state of parse so far: E + E on the stack; * NUM yet to read
• We have a shift-reduce conflict. Suppose we REDUCE next:
  – REDUCE: E + E becomes E
  – SHIFT * and NUM, then REDUCE NUM to E: the stack is E * E
  – REDUCE: E * E becomes E
• resulting parse tree: (NUM + NUM) * NUM

Summary
• grammar: exp ::= NUM | exp PLUS exp | exp MUL exp | LPAR exp RPAR
• input from lexer: NUM + NUM * NUM
• state of parse so far: E + E on the stack; * NUM yet to read
• We have a shift-reduce conflict. We have E + E on the stack and we see *. We want to shift.
• We ALWAYS want to shift here, since * has higher precedence than +.

Example 2
• grammar: exp ::= NUM | exp PLUS exp | exp MUL exp | exp MINUS exp | LPAR exp RPAR
• input from lexer: NUM - NUM - NUM
• state of parse so far: E - E on the stack; - NUM yet to read
• We have a shift-reduce conflict. We have E - E on the stack and we see -. What do we do?
  – REDUCE: E - E becomes E
  – SHIFT - and NUM, then REDUCE NUM to E: the stack is E - E
  – REDUCE: E - E becomes E
• resulting parse tree: (NUM - NUM) - NUM

Example 2: Summary
• grammar: exp ::= NUM | exp PLUS exp | exp MUL exp | exp MINUS exp | LPAR exp RPAR
• input from lexer: NUM - NUM - NUM
• We have a shift-reduce conflict. We have E - E on the stack and we see -. What do we do? REDUCE.
• We ALWAYS want to reduce here, since - is left-associative.
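The effect of the two choices on a chain of subtractions can be modelled directly in SML. This is a sketch, not a parser: `alwaysReduce` and `alwaysShift` are hypothetical functions that build the tree an LR parser would produce if it resolved every E - E conflict the same way, and `Sub` is an assumed constructor for subtraction.

```sml
datatype exp = Int of int | Sub of exp * exp

(* Reduce as soon as E - E is on the stack: operators group to the left. *)
fun alwaysReduce [] = raise Fail "no operands"
  | alwaysReduce (n :: ns) =
      foldl (fn (m, acc) => Sub (acc, Int m)) (Int n) ns

(* Keep shifting until the input ends: operators group to the right. *)
fun alwaysShift [] = raise Fail "no operands"
  | alwaysShift [n] = Int n
  | alwaysShift (n :: ns) = Sub (Int n, alwaysShift ns)

fun eval (Int n)      = n
  | eval (Sub (a, b)) = eval a - eval b

val leftTree   = alwaysReduce [5, 2, 1]   (* (5 - 2) - 1 *)
val rightTree  = alwaysShift  [5, 2, 1]   (* 5 - (2 - 1) *)
val leftValue  = eval leftTree            (* 2 *)
val rightValue = eval rightTree           (* 4 *)
```

Only the left-grouped tree gives the usual arithmetic answer, which is why the conflict should always be resolved by reducing.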

precedence and associativity
• three solutions to dealing with operator precedence and associativity:
  1) let Yacc complain
     – its default choice is to shift when it encounters a shift-reduce conflict
     – programmer intentions unclear; harder to debug other parts of your grammar; generally inelegant
  2) rewrite the grammar to eliminate ambiguity
     – can be complicated and less clear
  3) use Yacc precedence directives
     – %left, %right, %nonassoc

precedence and associativity
• given directives, ML-Yacc assigns a precedence to each terminal and rule
  – precedence of a terminal is based on the order in which the associativity directives appear (later directives give higher precedence)
  – precedence of a rule is the precedence of its right-most terminal
  – e.g.: precedence of (E ::= E + E) ==> prec(+)
• setting: the RHS of a rule is on the stack (... E % E) and terminal T is the next input symbol (T E ... yet to read)
• a shift-reduce conflict is resolved as follows
  – prec(terminal) > prec(rule) ==> shift
  – prec(terminal) < prec(rule) ==> reduce
  – prec(terminal) = prec(rule) ==>
    • assoc(terminal) = left ==> reduce
    • assoc(terminal) = right ==> shift
    • assoc(terminal) = nonassoc ==> report as error
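The resolution rule above can be written as a small SML function. This is a sketch of the rule as stated on the slide, not ML-Yacc's own code; the type names, the record fields, and the numbering of precedence levels (1 for the first directive, 2 for the second) are assumptions made for the example.

```sml
datatype assoc  = Left | Right | Nonassoc
datatype action = Shift | Reduce | Error

(* precTerm:  precedence of the next input terminal
   precRule:  precedence of the rule whose RHS is on the stack
   assocTerm: associativity declared for the terminal *)
fun resolve {precTerm : int, precRule : int, assocTerm : assoc} : action =
  if precTerm > precRule then Shift
  else if precTerm < precRule then Reduce
  else case assocTerm of
         Left     => Reduce
       | Right    => Shift
       | Nonassoc => Error

(* Assumed levels: %left PLUS MINUS is level 1, %left MUL DIV is level 2. *)
val plusThenMul   = resolve {precTerm = 2, precRule = 1, assocTerm = Left}   (* Shift *)
val mulThenPlus   = resolve {precTerm = 1, precRule = 2, assocTerm = Left}   (* Reduce *)
val plusThenMinus = resolve {precTerm = 1, precRule = 1, assocTerm = Left}   (* Reduce *)
```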

precedence and associativity
• precedence directives:

    %left PLUS MINUS
    %left MUL DIV

• case 1: RHS of rule on stack: ... E PLUS E; next terminal: MUL (MUL E ... yet to read)
  – prec(MUL) > prec(PLUS) ==> SHIFT
• case 2: RHS of rule on stack: ... E PLUS E; next terminal: MINUS (MINUS E ... yet to read)
  – prec(MINUS) = prec(PLUS), and MINUS is left-associative ==> REDUCE
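With these directives in place, the simple ambiguous grammar can be used as is. The specification below is a sketch under stated assumptions: the parser name `Exp` is made up, and the abstract syntax is assumed to live in a separately defined structure `Absyn` whose `exp` datatype has constructors `Int`, `Add`, `Sub`, `Mul`, and `Div` (`Sub` and `Div` do not appear in the slides).

```sml
(* User declarations: none. The abstract syntax is assumed to come from
   a hypothetical structure Absyn defined elsewhere. *)

%%
%name Exp                  (* hypothetical parser name *)
%pos int * int
%term NUM of int | PLUS | MINUS | MUL | DIV | LPAR | RPAR | EOF
%nonterm exp of Absyn.exp
%eop EOF

(* Later directives bind tighter: MUL and DIV outrank PLUS and MINUS. *)
%left PLUS MINUS
%left MUL DIV

%%
exp : NUM               (Absyn.Int NUM)
    | exp PLUS exp      (Absyn.Add (exp1, exp2))
    | exp MINUS exp     (Absyn.Sub (exp1, exp2))
    | exp MUL exp       (Absyn.Mul (exp1, exp2))
    | exp DIV exp       (Absyn.Div (exp1, exp2))
    | LPAR exp RPAR     (exp)
```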

the dangling else problem
• Grammar:

    S ::= if E then S else S
    S ::= if E then S
    S ::= ...

• Consider: if a then if b then S else S
  – parse 1: if a then (if b then S else S)
  – parse 2: if a then (if b then S) else S
• Parser reports a shift-reduce conflict
  – default behavior: shift, which yields parse 1 (what we want)

the dangling else problem
• Grammar:

    S ::= if E then S else S
    S ::= if E then S
    S ::= ...

• Alternative solution is to rewrite the grammar:

    S ::= M
    S ::= U
    M ::= if E then M else M
    M ::= ...
    U ::= if E then S
    U ::= if E then M else U

default behavior of ML-Yacc
• Shift-Reduce conflict
  – shift
• Reduce-Reduce conflict
  – reduce by first rule
  – generally considered unacceptable
• for assignment 3, your job is to write a grammar for Fun such that there are no conflicts
  – you may use precedence directives tastefully

Note: To enter ML-Yacc hell, use a parser to catch type errors
• when doing assignment 3, your job is to catch parse errors
• there are lots of programming errors that will slip by the parser:
  – e.g.: 3 + true
  – catching these sorts of errors is the job of the type checker
  – just as catching program structure errors was the job of the parser, not the lexer
  – attempting to do type checking in the parser is impossible (in general)
    • why? Hint: what does “context-free grammar” imply?
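The division of labor can be illustrated in SML. In the sketch below, 3 + true is a perfectly well-formed syntax tree, and a separate pass over the tree rejects it. The `Bool` constructor, the `ty` datatype, and the `typeOf` function are assumptions made for the example and are not taken from the Fun language definition.

```sml
datatype exp = Int of int | Bool of bool | Add of exp * exp
datatype ty  = IntTy | BoolTy

(* Hypothetical checker: SOME t when the expression has type t,
   NONE when it is ill-typed. *)
fun typeOf (Int _)  = SOME IntTy
  | typeOf (Bool _) = SOME BoolTy
  | typeOf (Add (a, b)) =
      (case (typeOf a, typeOf b) of
         (SOME IntTy, SOME IntTy) => SOME IntTy
       | _                        => NONE)

(* The parser can build both trees; only the checker tells them apart. *)
val wellTyped = typeOf (Add (Int 3, Int 4))       (* SOME IntTy *)
val illTyped  = typeOf (Add (Int 3, Bool true))   (* NONE *)
```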