CPSC 325 Compiler Tutorial 3 Parser
- Slides: 16
Parsing Input
- The syntax of most programming languages can be specified by a context-free grammar (CFG).
- Parsing: given a grammar and a sentence, traverse the derivation (parse tree) for the sentence in some standard order and do something useful at each node.
Example grammar:
  program ::= statement | program statement
  statement ::= assignStmt | ifStmt
  assignStmt ::= id = expr ;
  ifStmt ::= if ( expr ) statement
  expr ::= id | int | expr + expr
  id ::= a | b | c | i | … | z
  int ::= 0 | 1 | … | 9

Example program: a = 1 ; if ( a + 1 ) b = 2 ;
[Figure: parse tree of the example program. The root program node expands to two statements: an assignStmt for a = 1 ; and an ifStmt for if ( a + 1 ) b = 2 ;]
Standard Order
- When we write a parser, we want it to be deterministic (no backtracking) and to examine the source program from left to right.
  - That is, parse the program in linear time, in the order it appears in the source file.
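Parsing
- Top-down
  - Start with the root
  - Traverse the parse tree depth-first, left-to-right
  - Left recursion is evil: a top-down parser that expands a production of the form A ::= A β can recurse forever without consuming any input (example of if-else)
- Bottom-up
  - Start at the leaves and build up to the root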
Something Useful
- At each point (node) in the traversal, perform some semantic action:
  - Construct nodes of the full parse tree (rare)
  - Construct an abstract syntax tree (common)
  - Construct a linear, lower-level representation (more common)
  - Generate target code on the fly (1-pass compiler; not common in production compilers, because a single pass can't generate very good code)
Context-Free Grammars
- Formally, a grammar G is a 4-tuple <N, T, P, S> where
  - N: a finite set of non-terminal symbols
  - T: a finite set of terminal symbols
  - P: a finite set of productions
  - S: the start symbol, a distinguished element of N
- Derivation step: α A γ => α β γ iff A ::= β is in P
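The derivation step α A γ => α β γ can be sketched in a few lines of Python. This is only an illustration: the dictionary encoding of the grammar, the tiny two-rule grammar itself, and the name `derive_leftmost` are assumptions made for the example, not part of the tutorial.

```python
# Grammar as a dict: non-terminal -> list of right-hand sides.
# Any symbol that is not a key of the dict is treated as a terminal.
# (Hypothetical encoding, chosen only for this sketch.)
GRAMMAR = {
    "expr": [["expr", "+", "expr"], ["int"]],
    "int": [["1"], ["2"]],
}


def derive_leftmost(form, production_index, grammar=GRAMMAR):
    """Apply one derivation step to the sentential form `form`.

    The leftmost non-terminal A is rewritten with its production number
    `production_index`, i.e. alpha A gamma => alpha beta gamma.
    """
    for i, symbol in enumerate(form):
        if symbol in grammar:  # found the leftmost non-terminal A
            beta = grammar[symbol][production_index]
            return form[:i] + beta + form[i + 1:]
    raise ValueError("no non-terminal left to rewrite")


# Leftmost derivation of the sentence 1 + 2, one step per loop iteration.
form = ["expr"]
for choice in [0, 1, 0, 1, 1]:
    form = derive_leftmost(form, choice)
print(form)  # prints ['1', '+', '2']
```

Each list of choices corresponds to one leftmost derivation, which is the notion the ambiguity slides rely on.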
Reduced Grammars
- Grammar G is reduced iff there is no useless production in G.
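A rough way to check whether a grammar is reduced is sketched here. The slide does not define "useless", so this assumes the usual textbook reading: a non-terminal is useless if it cannot derive any string of terminals (unproductive) or cannot be reached from the start symbol. The grammar encoding and the name `useless_nonterminals` are hypothetical.

```python
def useless_nonterminals(grammar, start):
    """Return the non-terminals that are unproductive or unreachable.

    `grammar` maps each non-terminal to a list of right-hand sides;
    symbols that are not keys of `grammar` are terminals.
    """
    # Step 1: find productive non-terminals by fixed-point iteration.
    productive = set()
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            if nt in productive:
                continue
            for rhs in alternatives:
                if all(s not in grammar or s in productive for s in rhs):
                    productive.add(nt)
                    changed = True
                    break

    # Step 2: find non-terminals reachable from the start symbol,
    # following only productions whose symbols are all productive.
    reachable = set()
    stack = [start] if start in productive else []
    while stack:
        nt = stack.pop()
        if nt in reachable:
            continue
        reachable.add(nt)
        for rhs in grammar[nt]:
            if any(s in grammar and s not in productive for s in rhs):
                continue  # this production can never finish a derivation
            stack.extend(s for s in rhs if s in grammar)

    return set(grammar) - reachable


# B never terminates (B ::= B b) and C is never reached from S.
example = {
    "S": [["A"], ["B"]],
    "A": [["a"]],
    "B": [["B", "b"]],
    "C": [["c"]],
}
print(sorted(useless_nonterminals(example, "S")))  # prints ['B', 'C']
```

Under this reading, a grammar is reduced when the function returns an empty set.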
Ambiguity
- Grammar G is unambiguous iff every sentence in L(G) has a unique leftmost (or rightmost) derivation
  - Fact: unique leftmost or unique rightmost implies the other
- A grammar without this property is ambiguous
  - Note that other grammars that generate the same language may be unambiguous
- We need unambiguous grammars for parsing
Example
  expr ::= expr + expr | expr - expr | expr * expr | expr / expr | int
  int ::= 0 | 1 | 2 | … | 9
- Exercise: show that this grammar is ambiguous.
- How? Show two different leftmost (or rightmost) derivations for the same string.
- Equivalently: show two different parse trees for the same string.
Example (cont)
- Give a leftmost derivation of 2+3*4 and show the parse tree.
- Give two different derivations of 7+3+1.
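One way to see the ambiguity concretely is to enumerate every parse tree that the grammar expr ::= expr op expr | int allows for a given string. The sketch below assumes the input is a well-formed list of single-digit tokens alternating with operators; the names `parse_trees` and `evaluate` are made up for the illustration.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def parse_trees(tokens):
    """Enumerate all parse trees of `tokens` under the ambiguous grammar
    expr ::= expr op expr | int, as nested (left, op, right) tuples."""
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    for i in range(1, len(tokens), 2):  # any operator may sit at the root
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                trees.append((left, tokens[i], right))
    return trees


def evaluate(tree):
    """Evaluate a tree produced by parse_trees."""
    if isinstance(tree, str):
        return int(tree)
    left, op, right = tree
    return OPS[op](evaluate(left), evaluate(right))


trees = parse_trees(["2", "+", "3", "*", "4"])
print(len(trees))                    # prints 2
print([evaluate(t) for t in trees])  # prints [14, 20]
```

Two trees for the same string means the grammar is ambiguous, and here the two trees even evaluate to different results. The string 7+3+1 also has two trees; they happen to give the same value for + but would not for -.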
Problem?
- The grammar has no notion of precedence or associativity.
- Solution:
  - Create a non-terminal for each level of precedence
  - Isolate the corresponding part of the grammar
  - Force the parser to recognize higher-precedence subexpressions first
Classic Expression Grammar
  expr ::= expr + term | expr - term | term
  term ::= term * factor | term / factor | factor
  factor ::= int | ( expr )
  int ::= 0 | 1 | 2 | … | 9
- Check 7 + 3 + 2
- Check (5 + 3) * 2
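The classic grammar can be turned into a small top-down (recursive-descent) parser. Because left recursion is a problem for top-down parsing, this sketch replaces the left-recursive rules with loops (expr becomes term followed by any number of "+ term" or "- term"), which still builds left-associative trees. The function name `parse_expression`, the tuple-shaped tree, and the single-character tokenizer are assumptions made for the example.

```python
def parse_expression(text):
    """Parse `text` with the classic expression grammar and return a tree
    of nested (left, op, right) tuples with int leaves."""
    tokens = [c for c in text if not c.isspace()]  # one character per token
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():
        # expr ::= expr + term | expr - term | term, written as a loop
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (node, op, term())
        return node

    def term():
        # term ::= term * factor | term / factor | factor, written as a loop
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (node, op, factor())
        return node

    def factor():
        # factor ::= int | ( expr )
        tok = peek()
        if tok == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok is not None and tok in "0123456789":
            return int(advance())
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse_expression("7 + 3 + 2"))    # prints ((7, '+', 3), '+', 2)
print(parse_expression("(5 + 3) * 2"))  # prints ((5, '+', 3), '*', 2)
print(parse_expression("2 + 3 * 4"))    # prints (2, '+', (3, '*', 4))
```

Each string now has exactly one tree: + groups to the left, and * binds tighter than + because term sits below expr in the grammar.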
Another Classic Example
- Grammar for conditional statements …
  stmt ::= ifStmt | whileStmt
  ifStmt ::= if ( cond ) stmt | if ( cond ) stmt else stmt
  ……
- Is this grammar ok?
Solving Ambiguity
- Fix the grammar to separate if statements with an else clause from if statements with no else (this adds lots of non-terminals)
- Use some ad-hoc rule in the parser: “else matches closest unpaired if”
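The ad-hoc rule is easy to express in a hand-written top-down parser: after parsing the body of an if, take an else immediately if one is there. The sketch below is a simplified illustration only. It assumes whitespace-separated tokens, a single-token condition, and that any token other than `if` is a simple statement; the name `parse_stmt` and the tuple-shaped tree are hypothetical.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement starting at tokens[pos].

    Returns (tree, next_pos). An if-statement greedily consumes a
    following `else`, so each else attaches to the closest unpaired if.
    """
    if tokens[pos] == "if":
        # Expect: if ( cond ) stmt
        if tokens[pos + 1:pos + 4:2] != ["(", ")"]:
            raise SyntaxError("expected '( cond )' after if")
        cond = tokens[pos + 2]
        then_branch, pos = parse_stmt(tokens, pos + 4)
        else_branch = None
        if pos < len(tokens) and tokens[pos] == "else":
            else_branch, pos = parse_stmt(tokens, pos + 1)
        return ("if", cond, then_branch, else_branch), pos
    return tokens[pos], pos + 1  # any other token is a simple statement


tokens = "if ( c1 ) if ( c2 ) s1 else s2".split()
tree, _ = parse_stmt(tokens)
print(tree)  # prints ('if', 'c1', ('if', 'c2', 's1', 's2'), None)
```

The else ends up inside the inner if (the one testing c2), and the outer if has no else branch, which is the "closest unpaired if" reading.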
Parser Tools
- Most parser tools can cope with ambiguous grammars.
- Be sure that what the tool does is really what you want.
- Next week we will talk about Bison and more parsers.