Parsers
• A parser for a grammar G must at least distinguish between those sentences that are in L(G) and those that are not.
• More often, while recognizing a sentence belonging to the language, the parser will build some alternate representation of the sentence. This alternate representation is usually some form of parse tree or abstract syntax tree (AST).
Lecture #9 PLP Spring 2004, UF CISE

Parsing Strategies
• To ensure that a sentence is in a language, the parser must somehow verify that the sentence can be derived from the start symbol.
• Two fundamental strategies exist for doing this:
  – Start from the start symbol and try to find the sequence of derivations that generates the sentence (top-down).
  – Start from the sentence and try to find the sequence of reductions (the opposite of rewriting a nonterminal) that leads to the start symbol (bottom-up).

Recursive Descent Parsing
• A time-honored way to build a parser is to inspect the grammar and create a top-down parser by hand. Usually the method of recursive descent is employed.
• In a recursive descent parser, one procedure is written for each nonterminal symbol of the grammar.
• If A is a nonterminal, then procedure A is called whenever it is predicted that an A should appear at the current point in the input. The job of procedure A is to read the input corresponding to an A and return.

Pseudocode Parser for our Simple Sentence Grammar

    void sentence(void) {
        nounPhrase();
        verbPhrase();
    }

    void nounPhrase(void) {
        article();
        noun();
    }

    void article(void) {
        if (token == "a")
            match("a");
        else if (token == "the")
            match("the");
        else
            error();
    }

    /* Consume the current token if it is the expected one. */
    void match(TokenType expected) {
        if (token == expected)
            getToken();
        else
            error();
    }

• Parsing starts with the variable token already set to the value returned by an initial call to getToken.
• Strings are used to represent tokens. This is not real C code!
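For readers who want to run the idea, here is a minimal Python sketch of the same parser. The slide does not define verbPhrase or noun, so the rule verbPhrase → verb nounPhrase and the small word lists are assumptions made only for illustration, as are the names SentenceParser and accepts.

```python
NOUNS = {"girl", "dog"}    # assumed lexicon, not from the slide
VERBS = {"sees", "pets"}   # assumed lexicon, not from the slide


class SentenceParser:
    def __init__(self, text):
        # "$" is an end-of-input marker so self.token is always defined.
        self.tokens = text.split() + ["$"]
        self.pos = 0

    @property
    def token(self):
        return self.tokens[self.pos]

    def error(self, expected):
        raise SyntaxError(f"expected {expected}, found {self.token!r}")

    def match(self, expected):
        # Same role as match() on the slide: consume the token or fail.
        if self.token == expected:
            self.pos += 1
        else:
            self.error(repr(expected))

    def sentence(self):
        self.noun_phrase()
        self.verb_phrase()
        self.match("$")

    def noun_phrase(self):
        self.article()
        self.noun()

    def article(self):
        if self.token == "a":
            self.match("a")
        elif self.token == "the":
            self.match("the")
        else:
            self.error("an article")

    def noun(self):
        if self.token in NOUNS:
            self.match(self.token)
        else:
            self.error("a noun")

    def verb_phrase(self):
        # Assumed rule: verbPhrase -> verb nounPhrase
        if self.token in VERBS:
            self.match(self.token)
        else:
            self.error("a verb")
        self.noun_phrase()


def accepts(text):
    """Return True if text is a sentence of the (assumed) grammar."""
    try:
        SentenceParser(text).sentence()
        return True
    except SyntaxError:
        return False
```

Under these assumptions, accepts("the girl sees a dog") is True, while accepts("girl sees a dog") is False because article() rejects the first token.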

What if we try this on our Arithmetic Expression example

    void expr(void) {
        expr();
        if (token == "+" || token == "-") {
            match(token);
            term();
        }
    }

• But here we have a problem. This function will never return because of the left recursive rule expr → expr + term.
• We must remove the left recursion.
• First try making the rule right recursive:

    void expr(void) {
        term();
        if (token == "+" || token == "-") {
            match(token);
            expr();
        }
    }

• This changes the associativity to right associative.
• Try using the EBNF rule expr → term { + term }:

    void expr(void) {
        term();
        while (token == "+" || token == "-") {
            match(token);
            term();
        }
    }
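The effect on associativity can be checked with a small Python sketch. It assumes that a term is a single integer literal and uses a toy tokenizer; the names ExprParser, expr_loop, expr_right, and evaluate are illustrative, not from the slides.

```python
import re


class ExprParser:
    def __init__(self, text):
        # Toy tokenizer: integers, '+' and '-'; anything else is skipped.
        self.tokens = re.findall(r"\d+|[+-]", text) + ["$"]
        self.pos = 0

    @property
    def token(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.token != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.token!r}")
        self.pos += 1

    def term(self):
        # Assumption: a term is a single integer literal.
        if not self.token.isdigit():
            raise SyntaxError(f"expected a number, found {self.token!r}")
        value = int(self.token)
        self.pos += 1
        return value

    def expr_loop(self):
        # expr -> term { (+|-) term }: the loop folds operands to the left.
        tree = self.term()
        while self.token in ("+", "-"):
            op = self.token
            self.match(op)
            tree = (op, tree, self.term())
        return tree

    def expr_right(self):
        # expr -> term [ (+|-) expr ]: right recursion nests to the right.
        left = self.term()
        if self.token in ("+", "-"):
            op = self.token
            self.match(op)
            return (op, left, self.expr_right())
        return left


def evaluate(tree):
    """Evaluate a tree of nested (op, left, right) tuples and ints."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    if op == "+":
        return evaluate(left) + evaluate(right)
    return evaluate(left) - evaluate(right)
```

For the input 8-3-2, expr_loop builds ('-', ('-', 8, 3), 2), which evaluates to 3, while expr_right builds ('-', 8, ('-', 3, 2)), which evaluates to 7. The loop form keeps the usual left associativity without left recursion.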

The need for Left Factoring
• Consider the case of if statements with or without a matching else:
    if-statement → if ( expression ) statement
                 | if ( expression ) statement else statement
• One can see that this rule has a common left part, namely if ( expression ) statement, that can be factored out of the right-hand side, yielding an EBNF rule like this:
    if-statement → if ( expression ) statement [ else statement ]
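The factored rule maps directly onto a procedure that parses the common prefix once and then decides on the optional part with one token of lookahead. The Python sketch below assumes, purely for illustration, that an expression is a single token, that any non-if statement is the token other, and that tokens are separated by spaces; none of these details come from the slide.

```python
RESERVED = {"if", "else", "(", ")", "$"}


def expect(tokens, pos, expected):
    """Check the token at pos and return the position after it."""
    if tokens[pos] != expected:
        raise SyntaxError(f"expected {expected!r}, found {tokens[pos]!r}")
    return pos + 1


def parse_statement(tokens, pos):
    """Return (tree, next_pos) for one statement starting at pos."""
    if tokens[pos] == "if":
        return parse_if(tokens, pos)
    if tokens[pos] == "other":          # assumed placeholder statement
        return "other", pos + 1
    raise SyntaxError(f"unexpected token {tokens[pos]!r}")


def parse_if(tokens, pos):
    # Common prefix: if ( expression ) statement
    pos = expect(tokens, pos, "if")
    pos = expect(tokens, pos, "(")
    cond = tokens[pos]                  # assumption: one-token expression
    if cond in RESERVED:
        raise SyntaxError(f"expected an expression, found {cond!r}")
    pos = expect(tokens, pos + 1, ")")
    then_part, pos = parse_statement(tokens, pos)
    # Optional part: [ else statement ], chosen by looking at one token.
    else_part = None
    if tokens[pos] == "else":
        else_part, pos = parse_statement(tokens, pos + 1)
    return ("if", cond, then_part, else_part), pos


def parse(text):
    tokens = text.split() + ["$"]       # "$" marks the end of the input
    tree, pos = parse_statement(tokens, 0)
    expect(tokens, pos, "$")
    return tree
```

For example, parse("if ( x ) other else other") returns ('if', 'x', 'other', 'other'), and parse("if ( x ) other") returns ('if', 'x', 'other', None).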

Sample Grammar for Predictive Parsing
• Let us use an example and some other materials from Aho and Ullman's Principles of Compiler Design.
• Start with this grammar:
    E → E + T | T
    T → T * F | F
    F → ( E ) | id
• Note that the left recursion makes this grammar unsuitable for top-down parsing.
• Remove the immediate left recursion, yielding this grammar (don't worry about associativity right now):
    E  → T E'
    E' → + T E' | ε
    T  → F T'
    T' → * F T' | ε
    F  → ( E ) | id
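The rewrite used here is the standard one: A → Aα | β becomes A → βA' and A' → αA' | ε. A minimal Python sketch of that transformation follows; the dictionary representation and the function name are illustrative choices, and the sketch assumes the input has no ε alternatives and no rule of the form A → A.

```python
EPSILON = "ε"


def remove_immediate_left_recursion(grammar):
    """grammar maps each nonterminal to a list of alternatives,
    each alternative being a list of symbols."""
    result = {}
    for head, alternatives in grammar.items():
        # Split A -> A alpha (keep alpha) from the other alternatives beta.
        alphas = [alt[1:] for alt in alternatives if alt[0] == head]
        betas = [alt for alt in alternatives if alt[0] != head]
        if not alphas:
            result[head] = [list(alt) for alt in alternatives]
            continue
        new_head = head + "'"
        # A  -> beta A'
        result[head] = [beta + [new_head] for beta in betas]
        # A' -> alpha A' | epsilon
        result[new_head] = [alpha + [new_head] for alpha in alphas]
        result[new_head].append([EPSILON])
    return result


LEFT_RECURSIVE = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}
TRANSFORMED = remove_immediate_left_recursion(LEFT_RECURSIVE)
```

Applied to the starting grammar, this yields the five rules for E, E', T, T', and F listed above.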

Consider a Table-Driven Parsing Algorithm
[Figure: a parsing program reads the input buffer (a + b $), works with a stack (X Y Z $, X on top), consults the parsing table, and emits the output.]

What is the Parsing Algorithm?

    repeat
    begin
        let X be the top stack symbol and a the next input symbol;
        if X is a terminal or $ then
            if X = a then
                pop X from the stack and remove a from the input
            else
                ERROR()
        else if M[X, a] = X → Y1 Y2 ··· Yk then
        begin
            pop X from the stack;
            push Yk, Yk-1, …, Y1 onto the stack, Y1 on top
        end
        else
            ERROR()
    end
    until X = $

What does the Parse Table Look Like?

         id          +           *           (           )           $
    E    E → TE'                             E → TE'
    E'               E' → +TE'                           E' → ε      E' → ε
    T    T → FT'                             T → FT'
    T'               T' → ε      T' → *FT'               T' → ε      T' → ε
    F    F → id                              F → (E)

Blank entries are errors.

Sample Parse

    Stack       Input       Output
    $E          id+id*id$
    $E'T        id+id*id$   E → TE'
    $E'T'F      id+id*id$   T → FT'
    $E'T'id     id+id*id$   F → id
    $E'T'       +id*id$
    $E'         +id*id$     T' → ε
    $E'T+       +id*id$     E' → +TE'
    $E'T        id*id$
    $E'T'F      id*id$      T → FT'
    $E'T'id     id*id$      F → id
    $E'T'       *id$
    $E'T'F*     *id$        T' → *FT'
    $E'T'F      id$
    $E'T'id     id$         F → id
    $E'T'       $
    $E'         $           T' → ε
    $           $           E' → ε
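The stack-and-table algorithm can be tried out with a short Python sketch. The table for the expression grammar is hard-coded as a dictionary, and the function returns the productions in the order they are applied; the names TABLE, NONTERMINALS, and ll1_parse are illustrative, and the input is assumed to be already split into tokens.

```python
EPS = "ε"

# Parsing table M for the expression grammar: (nonterminal, lookahead) -> body.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [EPS], ("E'", "$"): [EPS],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [EPS], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [EPS], ("T'", "$"): [EPS],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens, start="E"):
    """Return the list of productions applied while parsing tokens."""
    stack = ["$", start]            # the top of the stack is the list's end
    tokens = list(tokens) + ["$"]
    pos = 0
    output = []
    while True:
        top, lookahead = stack[-1], tokens[pos]
        if top not in NONTERMINALS:             # X is a terminal or $
            if top != lookahead:
                raise SyntaxError(f"expected {top!r}, found {lookahead!r}")
            if top == "$":
                return output                   # X = $ and the input is used up
            stack.pop()
            pos += 1
        else:
            body = TABLE.get((top, lookahead))
            if body is None:                    # blank table entry
                raise SyntaxError(f"no rule for {top} on {lookahead!r}")
            stack.pop()
            # Push Yk ... Y1 so that Y1 ends up on top; ε pushes nothing.
            stack.extend(sym for sym in reversed(body) if sym != EPS)
            output.append(f"{top}→{''.join(body)}")
```

Calling ll1_parse(["id", "+", "id", "*", "id"]) returns the eleven productions of the Output column in the same order, starting with E→TE' and ending with E'→ε.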

How are Parsing Tables Constructed
• We need to get at two important ideas to understand parse tables. These are captured by the functions FIRST and FOLLOW.
• If α is a string of grammar symbols, FIRST(α) is the set of terminals that begin strings derived from α. If α ⇒* ε, then ε is also in FIRST(α).
• FOLLOW(A) for a nonterminal A is the set of terminals that can appear immediately to the right of A in a sentential form; in other words, if S ⇒* αAaβ, then a is in FOLLOW(A).

Constructing FIRST(X)
• To compute FIRST(X) for all grammar symbols X, apply the following rules until no more terminals or ε can be added to any FIRST set:
  1. If X is a terminal, then FIRST(X) is {X}.
  2. If X is a nonterminal and X → aα is a production, then add a to FIRST(X).
  3. If X → Y1 Y2 ··· Yk is a production, then for all i such that all of Y1, …, Yi-1 are nonterminals and FIRST(Yj) contains ε for j = 1, 2, …, i-1 (i.e., Y1 Y2 ··· Yi-1 ⇒* ε), add every non-ε symbol in FIRST(Yi) to FIRST(X). If ε is in FIRST(Yj) for all j = 1, 2, …, k, then add ε to FIRST(X).
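These rules amount to a fixed-point iteration, sketched below in Python for the left-recursion-free expression grammar. The grammar encoding and the names first_of_string and compute_first are illustrative; ε is written as an ordinary symbol whose FIRST set is {ε}, which lets an ε-production fall out of rule 3 without a special case.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F": [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first, grammar):
    """FIRST of a string of symbols, given the FIRST sets found so far."""
    result = set()
    for sym in symbols:
        # Rule 1: FIRST of a terminal (or of ε itself) is just that symbol.
        sym_first = first[sym] if sym in grammar else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result           # this symbol cannot vanish, so stop here
    result.add(EPS)                 # every symbol can derive ε
    return result


def compute_first(grammar):
    first = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:                  # repeat until nothing more can be added
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:     # rules 2 and 3
                new = first_of_string(body, first, grammar)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


FIRST = compute_first(GRAMMAR)
```

For this grammar the result is FIRST(E) = FIRST(T) = FIRST(F) = { (, id }, FIRST(E') = { +, ε }, and FIRST(T') = { *, ε }.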

Constructing FOLLOW
• To compute FOLLOW(A) for all nonterminals A, apply the following rules until nothing can be added to any FOLLOW set:
  1. $ is in FOLLOW(S), where S is the start symbol.
  2. If there is a production A → αBβ, β ≠ ε, then everything in FIRST(β) but ε is in FOLLOW(B).
  3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε (i.e., β ⇒* ε), then everything in FOLLOW(A) is in FOLLOW(B).
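The FOLLOW rules can be sketched the same way. The Python block below is self-contained, so it repeats a compact FIRST computation; the grammar encoding and all function names are illustrative assumptions rather than anything prescribed by the slides.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F": [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first, grammar):
    """FIRST of a string of symbols; the empty string yields {ε}."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in grammar else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_first(grammar):
    first = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new = first_of_string(body, first, grammar)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    follow = {nonterminal: set() for nonterminal in grammar}
    follow[start].add(END)                          # rule 1
    changed = True
    while changed:                  # repeat until nothing more can be added
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:          # only nonterminals B
                        continue
                    beta_first = first_of_string(body[i + 1:], first, grammar)
                    new = beta_first - {EPS}        # rule 2
                    if EPS in beta_first:           # rule 3: β is empty or β ⇒* ε
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FOLLOW = compute_follow(GRAMMAR, "E", compute_first(GRAMMAR))
```

For the expression grammar this gives FOLLOW(E) = FOLLOW(E') = { ), $ }, FOLLOW(T) = FOLLOW(T') = { +, ), $ }, and FOLLOW(F) = { +, *, ), $ }.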

Parsing Table Construction
  1. For each production A → α of the grammar, do steps 2 and 3.
  2. For each terminal a in FIRST(α), add A → α to M[A, a].
  3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
  4. Make each undefined entry of M error.
• Following this procedure will make a table for any grammar; however, that table may contain multiply defined entries.
• Any grammar for which the table contains only single entries is said to be LL(1). This is because the parser reads input from Left to right, constructs a Leftmost derivation, and needs to look only one (1) symbol ahead into the input to determine what action to take.
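Putting the pieces together, the following Python sketch builds M from FIRST and FOLLOW and tests the LL(1) condition by looking for multiply defined entries. It is self-contained and therefore repeats compact FIRST and FOLLOW computations; the grammar encoding and the names build_table and is_ll1 are illustrative assumptions.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E": [["T", "E'"]], "E'": [["+", "T", "E'"], [EPS]],
    "T": [["F", "T'"]], "T'": [["*", "F", "T'"], [EPS]],
    "F": [["(", "E", ")"], ["id"]],
}
LEFT_RECURSIVE = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first, grammar):
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in grammar else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    return result | {EPS}


def compute_first(grammar):
    first = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new = first_of_string(body, first, grammar)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    follow = {nonterminal: set() for nonterminal in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue
                    beta_first = first_of_string(body[i + 1:], first, grammar)
                    new = beta_first - {EPS}
                    if EPS in beta_first:
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def build_table(grammar, start):
    """Map (A, a) to the list of bodies α placed in M[A, a]."""
    first = compute_first(grammar)
    follow = compute_follow(grammar, start, first)
    table = {}
    for head, bodies in grammar.items():            # step 1
        for body in bodies:
            body_first = first_of_string(body, first, grammar)
            lookaheads = body_first - {EPS}         # step 2
            if EPS in body_first:                   # step 3 (FOLLOW may hold $)
                lookaheads |= follow[head]
            for a in lookaheads:
                table.setdefault((head, a), []).append(body)
    return table                                    # step 4: missing key = error


def is_ll1(table):
    return all(len(bodies) == 1 for bodies in table.values())
```

With these definitions, is_ll1(build_table(GRAMMAR, "E")) is True and the table has 13 filled entries, whereas the original left-recursive grammar fails the test because, for example, M[E, id] receives both E → E + T and E → T.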