Introduction to Parsing Lecture 8 Adapted from slides
- Slides: 55
Introduction to Parsing
Lecture 8
Adapted from slides by G. Necula and R. Bodik
2/8/2008 Prof. Hilfinger CS 164 Lecture 8
Outline
• Limitations of regular languages
• Parser overview
• Context-free grammars (CFGs)
• Derivations
• Syntax-directed translation
Languages and Automata
• Formal languages are very important in CS
  – Especially in programming languages
• Regular languages
  – The weakest formal languages widely used
  – Many applications
• We will also study context-free languages
Limitations of Regular Languages
• Intuition: A finite automaton that runs long enough must repeat states
• A finite automaton can’t remember the number of times it has visited a particular state
• A finite automaton has finite memory
  – Only enough to store which state it is in
  – Cannot count, except up to a finite limit
• E.g., the language of balanced parentheses is not regular: { (^i )^i | i ≥ 0 }
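The counting argument can be made concrete with a small sketch. The function name `is_balanced_block` is a hypothetical choice for illustration; the point is that recognizing { (^i )^i | i ≥ 0 } needs a counter with no fixed upper bound, which is exactly the memory a finite automaton does not have.

```python
def is_balanced_block(s: str) -> bool:
    """Return True if s consists of i '(' followed by exactly i ')' (i >= 0)."""
    count = 0  # unbounded counter: the memory a finite automaton lacks
    i = 0
    # Phase 1: count the leading '(' characters.
    while i < len(s) and s[i] == "(":
        count += 1
        i += 1
    # Phase 2: each ')' must cancel one previously counted '('.
    while i < len(s) and s[i] == ")":
        count -= 1
        i += 1
        if count < 0:
            return False
    # Accept only if the whole string was consumed and the counts match.
    return i == len(s) and count == 0
```

Any fixed number of states would cap the value `count` can distinguish, so some sufficiently long input would be misclassified.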
The Structure of a Compiler
Source → Lexical analysis → Tokens → Parsing → Interm. Language → Code Gen. → Machine Code
• Optimization operates on the intermediate language
• Today we start parsing
The Functionality of the Parser
• Input: sequence of tokens from lexer
• Output: abstract syntax tree of the program
Example
• Pyth: if x == y: z = 1 else: z = 2
• Parser input: IF ID == ID : ID = INT ELSE : ID = INT
• Parser output (abstract syntax tree; indentation shows children):
  IF-THEN-ELSE
    ==
      ID
      ID
    =
      ID
      INT
    =
      ID
      INT
Why A Tree?
• Each stage of the compiler has two purposes:
  – Detect and filter out some class of errors
  – Compute some new information or translate the representation of the program to make things easier for later stages
• Recursive structure of the tree suits the recursive structure of the language definition
• With a tree, later stages can easily find “the else clause”, e.g., rather than having to scan through tokens to find it.
Comparison with Lexical Analysis
Phase    Input                    Output
Lexer    Sequence of characters   Sequence of tokens
Parser   Sequence of tokens       Syntax tree
The Role of the Parser
• Not all sequences of tokens are programs...
• ...so the parser must distinguish between valid and invalid sequences of tokens
• We need
  – A language for describing valid sequences of tokens
  – A method for distinguishing valid from invalid sequences of tokens
Programming Language Structure
• Programming languages have recursive structure
• Consider the language of arithmetic expressions with integers, +, *, and ( )
• An expression is either:
  – an integer
  – an expression followed by “+” followed by an expression
  – an expression followed by “*” followed by an expression
  – a “(” followed by an expression followed by “)”
• int , int + int , ( int + int ) * int are expressions
Notation for Programming Languages
• An alternative notation:
  E → int
  E → E + E
  E → E * E
  E → ( E )
• We can view these rules as rewrite rules
  – We start with E and replace occurrences of E with some right-hand side
• E ⇒ E * E ⇒ ( E ) * E ⇒ ( E + E ) * E ⇒ … ⇒ ( int + int ) * int
Observation
• All arithmetic expressions can be obtained by a sequence of replacements
• Any sequence of replacements that ends with only terminals forms a valid arithmetic expression
• This means that we cannot obtain ( int ) ) by any sequence of replacements. Why?
• This set of rules is a context-free grammar
Context-Free Grammars
• A CFG consists of
  – A set of non-terminals N
    • By convention, written with capital letters in these notes
  – A set of terminals T
    • By convention, either lower-case names or punctuation
  – A start symbol S (a non-terminal)
  – A set of productions
    • Assuming E ∈ N: E → ε, or E → Y1 Y2 ... Yn where Yi ∈ N ∪ T
Examples of CFGs
Simple arithmetic expressions:
  E → int
  E → E + E
  E → E * E
  E → ( E )
– One non-terminal: E
– Several terminals: int, +, *, (, )
  • Called terminals because they are never replaced
– By convention, the non-terminal of the first production is the start symbol
The Language of a CFG
Read productions as replacement rules:
  X → Y1 ... Yn   means X can be replaced by Y1 ... Yn
  X → ε           means X can be erased (replaced with the empty string)
Key Idea
1. Begin with a string consisting of the start symbol “S”
2. Replace any non-terminal X in the string by a right-hand side of some production X → Y1 … Yn
3. Repeat (2) until there are only terminals in the string
The successive strings created in this way are called sentential forms.
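The replacement procedure can be sketched in a few lines. This is a minimal illustration, not a parser: the dictionary encoding `GRAMMAR`, the function `derive`, and the list of production indices are all assumptions made for this example, and the sketch always rewrites the leftmost non-terminal although the procedure allows any non-terminal to be chosen.

```python
# Grammar from the slides: E -> int | E + E | E * E | ( E )
GRAMMAR = {
    "E": [["int"], ["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"]],
}
START = "E"

def derive(choices):
    """Rewrite the leftmost non-terminal once per entry in choices.

    choices holds indices into GRAMMAR[X] (a hypothetical encoding).
    Returns every sentential form, beginning with the start symbol.
    """
    form = [START]
    forms = [form]
    for choice in choices:
        # Find the leftmost non-terminal in the current sentential form.
        pos = next(i for i, sym in enumerate(form) if sym in GRAMMAR)
        rhs = GRAMMAR[form[pos]][choice]
        # Replace that one occurrence by the chosen right-hand side.
        form = form[:pos] + rhs + form[pos + 1:]
        forms.append(form)
    return forms

steps = derive([2, 3, 1, 0, 0, 0])
for sentential_form in steps:
    print(" ".join(sentential_form))
```

The last form printed is `( int + int ) * int`, which contains only terminals, so the derivation stops there.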
The Language of a CFG (Cont.)
More formally, we may write
  X1 … Xi-1 Xi Xi+1 … Xn ⇒ X1 … Xi-1 Y1 … Ym Xi+1 … Xn
if there is a production
  Xi → Y1 … Ym
The Language of a CFG (Cont.)
Write
  X1 … Xn ⇒* Y1 … Ym
if
  X1 … Xn ⇒ … ⇒ Y1 … Ym
in 0 or more steps
The Language of a CFG
Let G be a context-free grammar with start symbol S. Then the language of G is:
  L(G) = { a1 … an | S ⇒* a1 … an and every ai is a terminal }
Examples:
• S → 0
  S → 1
  also written as S → 0 | 1
  Generates the language { “0”, “1” }
• What about
  S → 1 A
  A → 0 | 1 A
• What about S → ε | ( S )
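One way to explore questions like these is to enumerate the short strings a grammar generates. The sketch below is an assumption-laden illustration: the function `language`, the dictionary encoding of grammars, and the length bound are all hypothetical, and the search may fail to terminate on grammars whose non-terminals can multiply without ever adding terminals.

```python
from collections import deque

def language(grammar, start, max_len):
    """Return the terminal strings of length <= max_len derivable from start.

    grammar maps each non-terminal to a list of right-hand sides (lists of
    symbols); an empty list stands for epsilon.
    """
    results = set()
    seen = set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Locate the leftmost non-terminal, if there is one.
        pos = next((i for i, s in enumerate(form) if s in grammar), None)
        if pos is None:
            results.add("".join(form))  # only terminals left: a sentence
            continue
        for rhs in grammar[form[pos]]:
            new = form[:pos] + tuple(rhs) + form[pos + 1:]
            # Prune forms whose terminals alone already exceed the bound.
            terminals = sum(1 for s in new if s not in grammar)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results

bits = {"S": [["0"], ["1"]]}
ones_then_zero = {"S": [["1", "A"]], "A": [["0"], ["1", "A"]]}
parens = {"S": [[], ["(", "S", ")"]]}

print(sorted(language(bits, "S", 3)))
print(sorted(language(ones_then_zero, "S", 3)))
print(sorted(language(parens, "S", 4)))
```

Running it suggests the answers: the second grammar yields one or more 1s followed by a single 0, and the third yields nested parentheses of the form (^i )^i.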
Pyth Example
A fragment of Pyth:
  Compound → while Expr: Block
           | if Expr: Block Elses
  Elses → ε
        | else: Block
        | elif Expr: Block Elses
  Block → Stmt_List
        | Suite
(Formal language papers use one-character nonterminals, but we don’t have to!)
Derivations and Parse Trees
• A derivation is a sequence of sentential forms resulting from the application of a sequence of productions
  S ⇒ … ⇒ …
• A derivation can be represented as a parse tree
  – The start symbol is the tree’s root
  – For a production X → Y1 … Yn, add children Y1, …, Yn to node X
Derivation Example
• Grammar: E → E + E | E * E | ( E ) | int
• String: int * int + int
Derivation Example (Cont.)
E ⇒ E + E
  ⇒ E * E + E
  ⇒ int * E + E
  ⇒ int * int + E
  ⇒ int * int + int
Derivation in Detail (1)-(6)
Each step applies one production and adds the matching children to the parse tree (shown in bracketed form, with a node's children in parentheses):
(1) E                       tree: E
(2) ⇒ E + E                 tree: E(E + E)
(3) ⇒ E * E + E             tree: E(E(E * E) + E)
(4) ⇒ int * E + E           tree: E(E(E(int) * E) + E)
(5) ⇒ int * int + E         tree: E(E(E(int) * E(int)) + E)
(6) ⇒ int * int + int       tree: E(E(E(int) * E(int)) + E(int))
Notes on Derivations
• A parse tree has
  – Terminals at the leaves
  – Non-terminals at the interior nodes
• A left-to-right traversal of the leaves is the original input
• The parse tree shows the association of operations; the input string does not!
  – There may be multiple ways to match the input
  – Derivations (and parse trees) choose one
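A small sketch can illustrate the last two points. The tuple encoding of trees and the names `frontier`, `tree_mul_first`, and `tree_add_first` are assumptions for this example. Both trees below are valid for the grammar E → E + E | E * E | ( E ) | int, both have the same leaves, and they differ only in how the operations are associated.

```python
# A parse tree node is (symbol, [children]); a terminal leaf is a plain string.
def leaf_E():
    """The subtree for E -> int."""
    return ("E", ["int"])

# Tree 1 groups the multiplication first: (int * int) + int
tree_mul_first = ("E", [("E", [leaf_E(), "*", leaf_E()]), "+", leaf_E()])
# Tree 2 groups the addition first: int * (int + int)
tree_add_first = ("E", [leaf_E(), "*", ("E", [leaf_E(), "+", leaf_E()])])

def frontier(node):
    """Collect the leaves of a parse tree from left to right."""
    if isinstance(node, str):
        return [node]
    _, children = node
    leaves = []
    for child in children:
        leaves.extend(frontier(child))
    return leaves

print(" ".join(frontier(tree_mul_first)))
print(" ".join(frontier(tree_add_first)))
```

Both calls print the same token sequence, so the input string alone cannot tell the two trees apart.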
The Payoff: parser as a translator
• Syntax-directed translation:
  stream of tokens → parser → ASTs, or assembly code
• The parser is driven by syntax + translation rules (typically hardcoded in the parser)
Mechanism of syntax-directed translation
• Syntax-directed translation is done by extending the CFG
  – a translation rule is defined for each production
• Given X → d A B c, the translation of X is defined recursively using
  – the translations of nonterminals A, B
  – the values of attributes of terminals d, c
  – constants
To translate an input string:
1. Build the parse tree.
2. Working bottom-up, use the translation rules to compute the translation of each nonterminal in the tree.
Result: the translation of the string is the translation of the parse tree's root nonterminal.
Why bottom-up?
• a nonterminal's value may depend on the values of the symbols on the right-hand side,
• so translate a nonterminal node only after its children's translations are available.
Example 1: Arithmetic expression to value
Syntax-directed translation rules:
  E → E + T     E1.trans = E2.trans + T.trans
  E → T         E.trans = T.trans
  T → T * F     T1.trans = T2.trans * F.trans
  T → F         T.trans = F.trans
  F → int       F.trans = int.value
  F → ( E )     F.trans = E.trans
Example 1: Bison/Yacc Notation
  E : E '+' T   { $$ = $1 + $3; }
  T : T '*' F   { $$ = $1 * $3; }
  F : int       { $$ = $1; }
  F : '(' E ')' { $$ = $2; }
• KEY:
  $$ : Semantic value of the left-hand side
  $n : Semantic value of the nth symbol on the right-hand side
Example 1 (cont): Annotated Parse Tree
Input: 2 * (4 + 5)
Parse tree with the computed translation in parentheses (indentation shows children):
E (18)
  T (18)
    T (2)
      F (2)
        int (2)
    *
    F (9)
      (
      E (9)
        E (4)
          T (4)
            F (4)
              int (4)
        +
        T (5)
          F (5)
            int (5)
      )
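The bottom-up computation for this example can be sketched as a recursive function over a hand-built parse tree. The tuple encoding, the helper names `F_int`, `T_of`, `E_of`, and the function `trans` are assumptions for illustration; a generated parser would compute the same values as it reduces each production.

```python
# Parse tree node: (non-terminal, [children]).
# Integer token leaf: ("int", value). Punctuation leaf: a plain string.
def F_int(v):
    return ("F", [("int", v)])          # F -> int

def T_of(f):
    return ("T", [f])                   # T -> F

def E_of(t):
    return ("E", [t])                   # E -> T

inner = ("E", [E_of(T_of(F_int(4))), "+", T_of(F_int(5))])            # 4 + 5
tree = E_of(("T", [T_of(F_int(2)), "*", ("F", ["(", inner, ")"])]))   # 2 * (4 + 5)

def trans(node):
    """Compute node.trans; each child is translated before its parent's rule runs."""
    symbol, payload = node
    if symbol == "int":                      # terminal: attribute is its value
        return payload
    kids = payload
    if symbol == "E" and len(kids) == 3:     # E -> E + T
        return trans(kids[0]) + trans(kids[2])
    if symbol == "T" and len(kids) == 3:     # T -> T * F
        return trans(kids[0]) * trans(kids[2])
    if symbol == "F" and len(kids) == 3:     # F -> ( E )
        return trans(kids[1])
    return trans(kids[0])                    # E -> T, T -> F, F -> int

print(trans(tree))
```

The recursion descends from the root, but every rule is applied only after the recursive calls on its children have returned, which is the bottom-up order the translation scheme requires.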
Example 2: Compute the type of an expression
  E -> E + E      if $1 == INT and $3 == INT: $$ = INT else: $$ = ERROR
  E -> E and E    if $1 == BOOL and $3 == BOOL: $$ = BOOL else: $$ = ERROR
  E -> E == E     if $1 == $3 and $1 != ERROR: $$ = BOOL else: $$ = ERROR
  E -> true       $$ = BOOL
  E -> false      $$ = BOOL
  E -> int        $$ = INT
  E -> ( E )      $$ = $2
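These type rules translate directly into a recursive function. The sketch below uses a hypothetical tuple encoding of parse trees and a hypothetical function name `type_of`. It assumes the ERROR check in the == rule is meant to apply to an operand's type, so it tests the left operand's type.

```python
INT, BOOL, ERROR = "INT", "BOOL", "ERROR"

def type_of(node):
    """node is a terminal ("int", "true", "false") or a tuple of three children."""
    if node == "int":
        return INT
    if node in ("true", "false"):
        return BOOL
    if node[0] == "(":                        # E -> ( E ): type of the middle symbol
        return type_of(node[1])
    left, op, right = node
    t1, t3 = type_of(left), type_of(right)    # the values $1 and $3
    if op == "+":
        return INT if t1 == INT and t3 == INT else ERROR
    if op == "and":
        return BOOL if t1 == BOOL and t3 == BOOL else ERROR
    if op == "==":
        # Equal operand types are not enough: two ERROR operands must not yield BOOL.
        return BOOL if t1 == t3 and t1 != ERROR else ERROR
    raise ValueError(f"unknown operator: {op}")

# (2 + 2) == 4, with every integer literal represented by the token "int"
expr = (("(", ("int", "+", "int"), ")"), "==", "int")
print(type_of(expr))
```

Because ERROR propagates upward, a single ill-typed subexpression makes the whole expression ill-typed.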
Example 2 (cont)
• Input: (2 + 2) == 4
Parse tree with the computed type in parentheses (indentation shows children):
E (BOOL)
  E (INT)
    (
    E (INT)
      E (INT)
        int (INT)
      +
      E (INT)
        int (INT)
    )
  ==
  E (INT)
    int (INT)
Building Abstract Syntax Trees
• In the examples so far, streams of tokens were translated into
  – integer values, or
  – types
• Translating into ASTs is not very different
AST vs. Parse Tree
• An AST is a condensed form of a parse tree
  – Operators appear at internal nodes, not at leaves.
  – "Chains" of single productions are collapsed.
  – Lists are "flattened".
  – Syntactic details are omitted
    • e.g., parentheses, commas, semicolons
• An AST is a better structure for later compiler stages
  – omits details having to do with the source language,
  – only contains information about the essential structure of the program.
Example: 2 * (4 + 5), Parse tree vs. AST
Parse tree (indentation shows children):
E
  T
    T
      F
        int (2)
    *
    F
      (
      E
        E
          T
            F
              int (4)
        +
        T
          F
            int (5)
      )
AST:
*
  2
  +
    4
    5
AST-building translation rules
  E → E + T     $$ = new PlusNode($1, $3)
  E → T         $$ = $1
  T → T * F     $$ = new TimesNode($1, $3)
  T → F         $$ = $1
  F → int       $$ = new IntLitNode($1)
  F → ( E )     $$ = $2
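The same rules can be sketched in Python. The class names follow the rules above, but their fields, the tuple encoding of the parse tree, and the function `build_ast` are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class IntLitNode:
    value: int

@dataclass
class PlusNode:
    left: object
    right: object

@dataclass
class TimesNode:
    left: object
    right: object

def build_ast(node):
    """Translate a parse tree node into an AST, children first."""
    symbol, payload = node
    if symbol == "int":                      # token leaf ("int", value)
        return IntLitNode(payload)
    kids = payload
    if symbol == "E" and len(kids) == 3:     # E -> E + T
        return PlusNode(build_ast(kids[0]), build_ast(kids[2]))
    if symbol == "T" and len(kids) == 3:     # T -> T * F
        return TimesNode(build_ast(kids[0]), build_ast(kids[2]))
    if symbol == "F" and len(kids) == 3:     # F -> ( E ): parentheses are dropped
        return build_ast(kids[1])
    return build_ast(kids[0])                # E -> T, T -> F, F -> int: pass through

# Hand-built parse tree for 2 * (4 + 5)
four = ("E", [("T", [("F", [("int", 4)])])])
five = ("T", [("F", [("int", 5)])])
inner = ("E", [four, "+", five])
tree = ("E", [("T", [("T", [("F", [("int", 2)])]), "*", ("F", ["(", inner, ")"])])])

ast = build_ast(tree)
print(ast)
```

The chains E → T → F and the parentheses leave no trace in the result, which is the condensation that distinguishes an AST from a parse tree.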
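Example: 2 * (4 + 5): Steps in Creating AST
[Figure: the parse tree for 2 * (4 + 5), with nodes annotated by the AST fragments built bottom-up. int (2), int (4), and int (5) yield the leaves 2, 4, and 5; the parenthesized E yields + with children 4 and 5; the root yields * with children 2 and the + node.]
(Only some of the semantic values are shown)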
Leftmost and Rightmost Derivations
Leftmost derivation: always act on the leftmost non-terminal
  E ⇒ E + E
    ⇒ E * E + E
    ⇒ int * E + E
    ⇒ int * int + E
    ⇒ int * int + int
Rightmost derivation: always act on the rightmost non-terminal
  E ⇒ E + E
    ⇒ E + int
    ⇒ E * E + int
    ⇒ E * int + int
    ⇒ int * int + int
Rightmost Derivation in Detail (1)-(6)
Each step rewrites the rightmost non-terminal and adds the matching children to the parse tree (shown in bracketed form, with a node's children in parentheses):
(1) E                       tree: E
(2) ⇒ E + E                 tree: E(E + E)
(3) ⇒ E + int               tree: E(E + E(int))
(4) ⇒ E * E + int           tree: E(E(E * E) + E(int))
(5) ⇒ E * int + int         tree: E(E(E * E(int)) + E(int))
(6) ⇒ int * int + int       tree: E(E(E(int) * E(int)) + E(int))
Aside: Canonical Derivations
• Take a look at that last derivation in reverse.
• The active part (red) tends to move left to right.
• We call this a reverse rightmost or canonical derivation.
• It comes up in bottom-up parsing. We’ll return to it in a couple of lectures.
Derivations and Parse Trees
• For each parse tree there is exactly one leftmost and one rightmost derivation
• The difference is the order in which branches are added, not the structure of the tree.
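The correspondence can be checked with a short sketch that reads both derivations off one tree. The tuple encoding and the function `derivation` are assumptions for illustration: the function keeps a sentential form whose non-terminals are still-unexpanded subtrees and repeatedly expands the leftmost or rightmost one.

```python
def derivation(tree, leftmost=True):
    """Return the sentential forms of the leftmost (or rightmost) derivation.

    A tree node is (symbol, [children]); a terminal leaf is a plain string.
    """
    def show(form):
        # Print a subtree by its root symbol, a terminal as itself.
        return " ".join(x if isinstance(x, str) else x[0] for x in form)

    form = [tree]                # non-terminals are kept as unexpanded subtrees
    steps = [show(form)]
    while True:
        pending = [i for i, x in enumerate(form) if not isinstance(x, str)]
        if not pending:
            return steps         # only terminals remain
        i = pending[0] if leftmost else pending[-1]
        # Replace the chosen subtree by its children (one production step).
        form = form[:i] + list(form[i][1]) + form[i + 1:]
        steps.append(show(form))

# Parse tree for int * int + int with the multiplication grouped first
tree = ("E", [("E", [("E", ["int"]), "*", ("E", ["int"])]), "+", ("E", ["int"])])

for step in derivation(tree, leftmost=True):
    print(step)
print()
for step in derivation(tree, leftmost=False):
    print(step)
```

Since the tree fixes which production expands each node, the only freedom left is the order of expansion, and choosing leftmost or rightmost removes that freedom too.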
Summary of Derivations
• We are not just interested in whether s ∈ L(G)
• We also need a derivation (or parse tree) and an AST.
• Parse trees slavishly reflect the grammar.
• Abstract syntax trees abstract from the grammar, cutting out detail that interferes with later stages.
• A derivation defines a parse tree
  – But one parse tree may have many derivations
• Derivations drive translation (to ASTs, etc.)
• Leftmost and rightmost derivations are the most important in parser implementation