Parsing
  Discrete Mathematics and Its Applications
  Baojian Hua
  bjhua@ustc.edu.cn
Derivations
  - A string is valid in a language if and only if there exists a derivation from the start symbol that produces it
  - Begin with the start symbol, and apply grammar rules until you produce the string
  - Note that the final string (sentence) consists of only terminals
Question
  - Given a formal grammar G and a sentence (program) p, is p derivable from grammar G?
  - Or equivalently, is a given program p valid according to some language's syntax (say C)?
Example: Context-Free Grammar
  S ::= x A | y B
  A ::= u C | v C
  B ::= t
  C ::= w | z
  // derivable?
  xum
  xuwz
  xwu
  xuz
Lexical Analyzer
  - The lexical analyzer translates the source program into a stream of lexical tokens
  - Source program: stream of (ASCII or Unicode) characters
  - Lexical token: compiler data structure that represents the occurrence of a terminal symbol
  - A valid sentence consists of only allowable terminals
Example: Context-Free Grammar
  S ::= x A | y B
  A ::= u C | v C
  B ::= t
  C ::= w | z
  // all terminals
  T = {x, y, u, v, t, w, z}
  // allowable strings
  T*
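The membership test for T* can be sketched as a tiny lexical analyzer. This is a minimal sketch, assuming each terminal is a single character; the names `TokenKind`, `classify`, and `in_T_star` are illustrative and do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* One token kind per terminal in T = {x, y, u, v, t, w, z}, plus
   end-of-input and an error kind for characters outside T.
   (Hypothetical names, chosen for this sketch.) */
typedef enum {
    TOK_X, TOK_Y, TOK_U, TOK_V, TOK_T, TOK_W, TOK_Z,
    TOK_EOF, TOK_ERROR
} TokenKind;

/* Map one input character to its token kind. */
TokenKind classify(char c) {
    switch (c) {
    case 'x':  return TOK_X;
    case 'y':  return TOK_Y;
    case 'u':  return TOK_U;
    case 'v':  return TOK_V;
    case 't':  return TOK_T;
    case 'w':  return TOK_W;
    case 'z':  return TOK_Z;
    case '\0': return TOK_EOF;
    default:   return TOK_ERROR;
    }
}

/* Return 1 if every character of s is a terminal, i.e. s is in T*. */
int in_T_star(const char *s) {
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (classify(*s) == TOK_ERROR) {
            return 0;
        }
    }
    return 1;
}
```

Under these assumptions, `in_T_star("xum")` rejects the string at the lexical level because `m` is not in T, while `in_T_star("xwu")` accepts it: being in T* says nothing yet about derivability from S.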
Predictive Parsing
  - Parsing: recognizing a string and doing something useful with it
  - The most naïve approach to implementing a parser is recursive descent
  - A form of top-down parsing
  - Not as powerful as other methods, but easy enough to implement by hand
Predictive Parsing
  S ::= x A | y B
  A ::= u C | v C
  B ::= t
  C ::= w | z
  // Valid?
  xum
  xuwz
  xwu
  xuz
A Predictive Parser in C (Sketch)
  tokenTy token;  // current lookahead token

  void parseS() {
    switch (token.kind) {
    case x:                   // S ::= x A
      token = nextToken();
      parseA();
      break;
    case y:                   // S ::= y B
      token = nextToken();
      parseB();
      break;
    default:
      error(…);
    }
  }
  // other functions are similar
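The sketch above can be filled out into a complete recognizer for this grammar. The version below is one possible reading, assuming each token is a single character so that the current character plays the role of `token.kind`; `advance`, `fail`, and `derivable` are hypothetical helper names, not part of the original sketch.

```c
#include <assert.h>
#include <stddef.h>

static const char *input;  /* remaining input; *input is the lookahead token */
static int ok;             /* cleared as soon as a syntax error is seen */

static void advance(void) { if (*input != '\0') input++; }
static void fail(void)    { ok = 0; }

/* C ::= w | z */
static void parseC(void) {
    switch (*input) {
    case 'w': case 'z': advance(); break;
    default:            fail();
    }
}

/* A ::= u C | v C */
static void parseA(void) {
    switch (*input) {
    case 'u': case 'v': advance(); parseC(); break;
    default:            fail();
    }
}

/* B ::= t */
static void parseB(void) {
    if (*input == 't') advance(); else fail();
}

/* S ::= x A | y B */
static void parseS(void) {
    switch (*input) {
    case 'x': advance(); parseA(); break;
    case 'y': advance(); parseB(); break;
    default:  fail();
    }
}

/* Return 1 if s is derivable from S: the parse must succeed
   and must consume the whole input. */
int derivable(const char *s) {
    assert(s != NULL);
    input = s;
    ok = 1;
    parseS();
    return ok && *input == '\0';
}
```

With this sketch, `derivable` accepts `xuz` and rejects the other three candidates: `xum` fails in `parseC`, `xwu` fails in `parseA`, and `xuwz` parses `xuw` but leaves `z` unconsumed.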
Output: Abstract Syntax Tree
  Input: xuz
  S
    x
    A
      u
      C
        z
A Predictive Parser Emitting AST in C (Sketch)
  tokenTy token;  // current lookahead token

  S parseS() {
    switch (token.kind) {
    case x:                   // S ::= x A
      token = nextToken();
      a = parseA();
      return newS1(x, a);
    case y:                   // S ::= y B
      token = nextToken();
      b = parseB();
      return newS2(y, b);
    default:
      error(…);
    }
  }
  // other functions are similar
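A self-contained variant of the AST-emitting parser might look as follows. It is a sketch under several assumptions that are not in the slides: tokens are single characters, all tree nodes share one generic `Node` type instead of per-rule constructors such as `newS1`, and a syntax error is reported by returning `NULL`. The `show` helper exists only to make the tree visible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Generic tree node (hypothetical): label is 'S', 'A', 'B' or 'C'
   for a nonterminal, or the terminal character for a leaf. */
typedef struct Node {
    char label;
    struct Node *kids[2];  /* no rule in this grammar has more than 2 symbols */
    int nkids;
} Node;

static Node *new_node(char label, Node *k0, Node *k1) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) abort();
    n->label = label;
    n->kids[0] = k0;
    n->kids[1] = k1;
    n->nkids = (k0 != NULL) + (k1 != NULL);
    return n;
}

void free_tree(Node *n) {
    if (n == NULL) return;
    free_tree(n->kids[0]);
    free_tree(n->kids[1]);
    free(n);
}

static const char *input;  /* *input is the lookahead token */

/* Consume the lookahead token and wrap it in a leaf node. */
static Node *leaf(void) { return new_node(*input++, NULL, NULL); }

/* C ::= w | z */
static Node *parseC(void) {
    if (*input == 'w' || *input == 'z') return new_node('C', leaf(), NULL);
    return NULL;
}

/* B ::= t */
static Node *parseB(void) {
    if (*input == 't') return new_node('B', leaf(), NULL);
    return NULL;
}

/* A ::= u C | v C */
static Node *parseA(void) {
    if (*input != 'u' && *input != 'v') return NULL;
    Node *t = leaf();
    Node *c = parseC();
    if (c == NULL) { free_tree(t); return NULL; }
    return new_node('A', t, c);
}

/* S ::= x A | y B */
static Node *parseS(void) {
    if (*input != 'x' && *input != 'y') return NULL;
    int is_x = (*input == 'x');
    Node *t = leaf();
    Node *rest = is_x ? parseA() : parseB();
    if (rest == NULL) { free_tree(t); return NULL; }
    return new_node('S', t, rest);
}

/* Parse s completely; return its tree, or NULL on a syntax error. */
Node *parse(const char *s) {
    assert(s != NULL);
    input = s;
    Node *tree = parseS();
    if (tree != NULL && *input != '\0') { free_tree(tree); tree = NULL; }
    return tree;
}

/* Append a parenthesized rendering of the tree to out, which must
   already hold a NUL-terminated string and be large enough. */
void show(const Node *n, char *out) {
    size_t len = strlen(out);
    out[len] = n->label;
    out[len + 1] = '\0';
    if (n->nkids == 0) return;
    strcat(out, "(");
    for (int i = 0; i < n->nkids; i++) {
        if (i > 0) strcat(out, " ");
        show(n->kids[i], out);
    }
    strcat(out, ")");
}
```

For the input `xuz`, `show` renders the tree as `S(x A(u C(z)))`, which matches the tree S, x, A, u, C, z drawn on the output slide. The caller owns the returned tree and releases it with `free_tree`.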
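Predictive Parsing Difficulties
  S ::= x A | x B
  A ::= u C | v C
  B ::= t
  C ::= w | z
  // derivable?
  xuz
  // both alternatives of S begin with x, so the current token alone cannot tell the parser which one to choose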
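One common way around this particular difficulty is left factoring, a technique the slides do not name: rewrite S as S ::= x S' with S' ::= A | B, so the shared x is consumed first and the next token (u or v for A, t for B) selects the alternative. The sketch below assumes single-character tokens; `parseSRest` stands for the hypothetical new nonterminal S'.

```c
#include <assert.h>
#include <stddef.h>

static const char *p;  /* *p is the lookahead token */

/* C ::= w | z */
static int parseC(void) {
    if (*p == 'w' || *p == 'z') { p++; return 1; }
    return 0;
}

/* A ::= u C | v C */
static int parseA(void) {
    if (*p == 'u' || *p == 'v') { p++; return parseC(); }
    return 0;
}

/* B ::= t */
static int parseB(void) {
    if (*p == 't') { p++; return 1; }
    return 0;
}

/* S' ::= A | B   (introduced by left factoring)
   A can only start with u or v, and B only with t,
   so one token of lookahead is enough here. */
static int parseSRest(void) {
    switch (*p) {
    case 'u': case 'v': return parseA();
    case 't':           return parseB();
    default:            return 0;
    }
}

/* S ::= x S' */
static int parseS(void) {
    if (*p != 'x') return 0;
    p++;
    return parseSRest();
}

/* Return 1 if s is derivable from S in the factored grammar. */
int derivable_factored(const char *s) {
    assert(s != NULL);
    p = s;
    return parseS() && *p == '\0';
}
```

The factored grammar accepts the same strings as the original one, for example `xuz` and `xt`, but the parser never has to guess between two alternatives that start with the same token.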
Or Even Worse
  1  E ::= id
  2      | num
  3      | E + E
  4      | E * E
  5      | ( E )

  Deriving 15*(3+4):
             E
  By 4    => E * E
  By 5, 3 => E * (E + E)
  By 2    => E * (E + 4)
  By 2    => E * (3 + 4)
  By 2    => 15 * (3 + 4)
Or Even Worse
  Two derivations of 15*(3+4):
  rightmost derivation    leftmost derivation
  E                       E
  E * E                   E * E
  E * (E + E)             15 * E
  E * (E + 4)             15 * (E + E)
  E * (3 + 4)             15 * (3 + E)
  15 * (3 + 4)            15 * (3 + 4)
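Ambiguous grammars
  - A grammar is ambiguous if there is a sentence with more than one parse tree
  - Example: 15 * 3 + 4 has two parse trees
    - root E * E, with E + E as the right operand: 15 * (3 + 4)
    - root E + E, with E * E as the left operand: (15 * 3) + 4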
Eliminating ambiguity
  - In programming language syntax, ambiguity often arises from missing operator precedence or associativity
    - Does * have higher precedence than +?
    - Are * and + left associative?
  - Can sometimes rewrite the grammar to disambiguate this
    - Beyond the scope of this course
Unambiguous Grammar
  Ambiguous:
  E ::= id | num | E + E | E * E | ( E )
  Unambiguous:
  E ::= E + T | T
  T ::= T * F | F
  F ::= id | num | ( E )
  Accepts the same language, but parses unambiguously
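The effect of the layered E / T / F grammar on precedence and associativity can be illustrated with a small evaluator. Note the assumptions: a recursive-descent parser cannot follow the left-recursive rules E ::= E + T and T ::= T * F directly (the function would call itself without consuming input), so this sketch swaps in the usual loop form, E ::= T { + T } and T ::= F { * F }, which accepts the same strings and keeps left associativity. Only `num` and parentheses are handled; `id` is left out, and `eval` is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *p;  /* current position in the input */
static int ok;         /* cleared on the first syntax error */

static void skip(void) { while (*p == ' ') p++; }
static long parseE(void);

/* F ::= num | ( E ) */
static long parseF(void) {
    skip();
    if (isdigit((unsigned char)*p)) {
        long v = 0;
        while (isdigit((unsigned char)*p)) v = v * 10 + (*p++ - '0');
        return v;
    }
    if (*p == '(') {
        p++;
        long v = parseE();
        skip();
        if (*p == ')') p++; else ok = 0;
        return v;
    }
    ok = 0;
    return 0;
}

/* T ::= F { * F }   (loop form of T ::= T * F | F) */
static long parseT(void) {
    long v = parseF();
    for (skip(); ok && *p == '*'; skip()) {
        p++;
        v *= parseF();
    }
    return v;
}

/* E ::= T { + T }   (loop form of E ::= E + T | T) */
static long parseE(void) {
    long v = parseT();
    for (skip(); ok && *p == '+'; skip()) {
        p++;
        v += parseT();
    }
    return v;
}

/* Evaluate s; return 1 and store the value in *out on success,
   or return 0 on a syntax error. */
int eval(const char *s, long *out) {
    assert(s != NULL && out != NULL);
    p = s;
    ok = 1;
    long v = parseE();
    skip();
    if (!ok || *p != '\0') return 0;
    *out = v;
    return 1;
}
```

Because `*` is handled one level below `+`, `15*3+4` is read as (15 * 3) + 4 and evaluates to 49, while the parentheses in `15*(3+4)` force the other grouping and give 105.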
Limitations with Predictive Parsing
  - Rewriting the grammar:
    - to resolve ambiguity
    - Grammars/trees are ugly
  - But…easy to write code by hand, and very good for error reporting
Doing better
  - We can do better
  - We can use a parsing algorithm that can handle all deterministic context-free languages
    - (though not all context-free grammars)
  - Remember: a context-free language might have many different context-free grammars
The Yacc Tool
  [Diagram labels: specification, Yacc, parser, semantic analyzer]
  - Originally developed for C, and now almost every mainstream language has its own Yacc-like tool: bison (C), ml-yacc (SML), Cup (Java), GPPG (C#), …
Whole Structure
  source code -> lexical analyzer -> tokens -> parser -> abstract syntax tree -> other part -> Pentium