Top Down Parsing
• Recursive Descent Parsing
• Top-down parsing:
  – Build the tree from the root symbol.
  – Each non-terminal (with its productions) corresponds to one recursive procedure.
  – Each procedure recognizes an instance of a non-terminal and returns the tree fragment for that non-terminal.
8 January 2004, Department of Software & Media Technology

General model
• Each right-hand side of a production provides the body for a function.
• Each non-terminal on the right-hand side is translated into a call to the function that recognizes that non-terminal.
• Each terminal on the right-hand side is translated into a call to the lexical scanner. If the resulting token is not the expected terminal, an error occurs.
• Each recognizing function returns a tree fragment.
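The translation rules above can be sketched in Python. This is a minimal sketch that assumes the scanner has already produced a list of token strings. The toy grammar PAIR ::= ( NAME , NAME ), NAME ::= identifier, and the names Parser, expect, parse_pair and parse_name are hypothetical. Tree fragments are plain tuples.

```python
class ParseError(Exception):
    """Raised when the scanner delivers a token other than the expected terminal."""


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # Next token without consuming it; None at end of input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, terminal):
        # A terminal on the right-hand side becomes a scanner call plus a check.
        tok = self.peek()
        if tok != terminal:
            raise ParseError(f"expected {terminal!r}, got {tok!r}")
        self.pos += 1
        return tok

    # PAIR ::= ( NAME , NAME )
    def parse_pair(self):
        self.expect("(")
        left = self.parse_name()      # non-terminal -> function call
        self.expect(",")
        right = self.parse_name()     # non-terminal -> function call
        self.expect(")")
        return ("pair", left, right)  # tree fragment for PAIR

    # NAME ::= identifier
    def parse_name(self):
        tok = self.peek()
        if tok is None or not tok.isidentifier():
            raise ParseError(f"expected identifier, got {tok!r}")
        self.pos += 1
        return ("name", tok)


print(Parser(["(", "x", ",", "y", ")"]).parse_pair())
# ('pair', ('name', 'x'), ('name', 'y'))
```

Each method body mirrors one right-hand side. Error handling is simply an exception at the first unexpected token.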

Example: parsing a declaration
• FULL_TYPE_DECLARATION ::=
     type DEFINING_IDENTIFIER is TYPE_DEFINITION;
• Translates into:
  – get token type
  – find a defining_identifier   -- function call
  – get token is
  – recognize a type_definition  -- function call
  – get token semicolon
• In practice, we already know that the first token is type; that is why this routine was called in the first place! Predictive parsing is guided by the next token.
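The translation above can be sketched as follows, including the predictive dispatch on the next token. This sketch makes several simplifying assumptions: tokens are pre-split strings, and TYPE_DEFINITION is cut down to a hypothetical range NUMBER .. NUMBER form. The names DeclParser, take, parse_declaration and RESERVED are illustrative only.

```python
RESERVED = {"type", "is", "range"}  # hypothetical keyword set for this sketch


class ParseError(Exception):
    pass


class DeclParser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, terminal):
        # "get token X": the scanner must deliver exactly this terminal.
        if self.peek() != terminal:
            raise ParseError(f"expected {terminal!r}, got {self.peek()!r}")
        self.pos += 1

    def take(self, accepts, what):
        # Consume a token from a class (identifier, number) rather than a fixed word.
        tok = self.peek()
        if tok is None or not accepts(tok):
            raise ParseError(f"expected {what}, got {tok!r}")
        self.pos += 1
        return tok

    def parse_declaration(self):
        # Predictive choice: the next token decides which routine to call.
        if self.peek() == "type":
            return self.parse_full_type_declaration()
        raise ParseError(f"no declaration starts with {self.peek()!r}")

    # FULL_TYPE_DECLARATION ::= type DEFINING_IDENTIFIER is TYPE_DEFINITION;
    def parse_full_type_declaration(self):
        self.expect("type")  # known to succeed: it is why we were called
        name = self.take(lambda t: t.isidentifier() and t not in RESERVED, "identifier")
        self.expect("is")
        definition = self.parse_type_definition()
        self.expect(";")
        return ("full_type_declaration", name, definition)

    # Hypothetical, simplified TYPE_DEFINITION ::= range NUMBER .. NUMBER
    def parse_type_definition(self):
        self.expect("range")
        low = int(self.take(str.isdigit, "number"))
        self.expect("..")
        high = int(self.take(str.isdigit, "number"))
        return ("range", low, high)


tokens = ["type", "Small", "is", "range", "0", "..", "10", ";"]
print(DeclParser(tokens).parse_declaration())
# ('full_type_declaration', 'Small', ('range', 0, 10))
```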

Example: parsing a loop
• FOR_STATEMENT ::= ITERATION_SCHEME loop STATEMENTS end loop;

    Node1 := find_iteration_scheme;    -- call function
    get token loop
    List1 := Sequence of statements;   -- call function
    get token end
    get token loop
    get token semicolon;
    Result := build loop_node with Node1 and List1
    return Result
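The steps above can be sketched as a Python method that returns a loop node. The sub-grammars are hypothetical simplifications, not the real Ada rules:
  – ITERATION_SCHEME ::= for identifier in identifier
  – STATEMENTS ::= { identifier ; }
The names LoopParser and KEYWORDS are illustrative.

```python
KEYWORDS = {"for", "in", "loop", "end"}  # hypothetical reserved words


class ParseError(Exception):
    pass


class LoopParser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, terminal):
        if self.peek() != terminal:
            raise ParseError(f"expected {terminal!r}, got {self.peek()!r}")
        self.pos += 1

    def identifier(self):
        tok = self.peek()
        if tok is None or not tok.isidentifier() or tok in KEYWORDS:
            raise ParseError(f"expected identifier, got {tok!r}")
        self.pos += 1
        return tok

    # FOR_STATEMENT ::= ITERATION_SCHEME loop STATEMENTS end loop;
    def parse_for_statement(self):
        node1 = self.parse_iteration_scheme()  # call function
        self.expect("loop")
        list1 = self.parse_statements()        # call function
        self.expect("end")
        self.expect("loop")
        self.expect(";")
        return ("loop", node1, list1)          # build loop_node

    # Hypothetical ITERATION_SCHEME ::= for identifier in identifier
    def parse_iteration_scheme(self):
        self.expect("for")
        var = self.identifier()
        self.expect("in")
        return ("for", var, self.identifier())

    # Hypothetical STATEMENTS ::= { identifier ; }  (each statement is a call)
    def parse_statements(self):
        stmts = []
        while self.peek() != "end":  # 'end' is what follows the statement list
            stmts.append(("call", self.identifier()))
            self.expect(";")
        return stmts


print(LoopParser("for I in Items loop Put ; Skip ; end loop ;".split()).parse_for_statement())
# ('loop', ('for', 'I', 'Items'), [('call', 'Put'), ('call', 'Skip')])
```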

Problem:
• If there are multiple productions for a non-terminal, a mechanism is required to determine which production to use:
     IF_STAT ::= if COND then STATS end if;
     IF_STAT ::= if COND then STATS ELSIF_PART end if;
• When the next token is if, which production should be used?

One Solution: factorize the grammar
• If several productions have the same prefix, rewrite them as a single production:
     IF_STAT ::= if COND then STATS [ELSIF_PART] end if;
  – The problem now reduces to recognizing whether an optional component (ELSIF_PART) is present.
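With the factored production, one token of lookahead is enough: ELSIF_PART is present exactly when the next token is elsif. The sketch below assumes hypothetical sub-grammars:
  – COND ::= identifier
  – STATS ::= { identifier ; }
  – ELSIF_PART ::= elsif COND then STATS { elsif COND then STATS }
The names IfParser and KEYWORDS are illustrative.

```python
KEYWORDS = {"if", "then", "elsif", "end"}  # hypothetical reserved words


class ParseError(Exception):
    pass


class IfParser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, terminal):
        if self.peek() != terminal:
            raise ParseError(f"expected {terminal!r}, got {self.peek()!r}")
        self.pos += 1

    def identifier(self):
        tok = self.peek()
        if tok is None or not tok.isidentifier() or tok in KEYWORDS:
            raise ParseError(f"expected identifier, got {tok!r}")
        self.pos += 1
        return tok

    # IF_STAT ::= if COND then STATS [ELSIF_PART] end if;
    def parse_if_stat(self):
        self.expect("if")
        cond = self.identifier()        # hypothetical COND ::= identifier
        self.expect("then")
        stats = self.parse_stats()
        elsifs = []
        if self.peek() == "elsif":      # the next token decides: optional part present?
            elsifs = self.parse_elsif_part()
        self.expect("end")
        self.expect("if")
        self.expect(";")
        return ("if", cond, stats, elsifs)

    # Hypothetical ELSIF_PART ::= elsif COND then STATS { elsif COND then STATS }
    def parse_elsif_part(self):
        parts = []
        while self.peek() == "elsif":
            self.expect("elsif")
            cond = self.identifier()
            self.expect("then")
            parts.append((cond, self.parse_stats()))
        return parts

    # Hypothetical STATS ::= { identifier ; }
    def parse_stats(self):
        stats = []
        while self.peek() not in ("elsif", "end"):
            stats.append(self.identifier())
            self.expect(";")
        return stats


print(IfParser("if A then X ; elsif B then Y ; end if ;".split()).parse_if_stat())
# ('if', 'A', ['X'], [('B', ['Y'])])
```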

Second Problem: Left Recursion
• The grammar should not be left-recursive:
     E ::= E + T | T
• Problem: to find an E, start by finding an E…
  – The original scheme leads to infinite recursion.
  – The grammar is inappropriate for recursive descent.
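A literal translation of E ::= E + T into a recursive procedure shows the problem. The first action of parse_E is to call parse_E again at the same input position, so no token is ever consumed. In Python the runaway recursion surfaces as RecursionError. The function names and the one-token T are assumptions of this sketch.

```python
def parse_E(tokens, pos):
    # Literal translation of E ::= E + T: call parse_E again at the same
    # position, before any token has been consumed.
    left, pos = parse_E(tokens, pos)
    if pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_T(tokens, pos + 1)
        return ("+", left, right), pos
    return left, pos


def parse_T(tokens, pos):
    # Hypothetical T: a single token.
    return tokens[pos], pos + 1


try:
    parse_E(["a", "+", "b"], 0)
except RecursionError:
    print("infinite recursion: parse_E never consumes a token")
```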

Solution to left-recursion
• E ::= E + T | T means that E eventually expands into T + T + … + T.
• Rewrite as:
  – E ::= T E'
  – E' ::= + T E' | ε
• Informally: E' is a possibly empty sequence of terms, each preceded by an operator.
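The rewrite generalizes to any directly left-recursive non-terminal:
  – Before: A ::= A α1 | … | A αm | β1 | … | βn
  – After: A ::= β1 A' | … | βn A' and A' ::= α1 A' | … | αm A' | ε
The sketch below applies this transformation to a grammar stored as lists of symbols. It assumes [] stands for ε and that the primed name A' is not already in use.

```python
def remove_immediate_left_recursion(nt, productions):
    """Rewrite  A ::= A a1 | ... | A am | b1 | ... | bn  as
       A  ::= b1 A' | ... | bn A'
       A' ::= a1 A' | ... | am A' | epsilon
    Productions are lists of symbols; [] stands for epsilon."""
    recursive = [p[1:] for p in productions if p and p[0] == nt]  # the alpha parts
    others = [p for p in productions if not p or p[0] != nt]      # the beta parts
    if not recursive:
        return {nt: productions}
    new_nt = nt + "'"
    return {
        nt: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],
    }


print(remove_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
# {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], []]}
```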

Recursion can involve multiple productions
• A ::= B C | D
• B ::= A E | F
  – Substituting for B, this can be rewritten as: A ::= A E C | F C | D
  – Now apply the previous method.
  – There is a general algorithm to detect and remove left recursion.
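One well-known general algorithm works as follows:
  1. Fix an order A1 … An of the non-terminals.
  2. For each Ai, substitute the current alternatives of every earlier Aj into the productions of the form Ai ::= Aj γ.
  3. Remove the immediate left recursion of Ai.
This is a sketch of that textbook algorithm; the slide does not say which algorithm it means. It assumes the input grammar has no cycles (A ⇒+ A) and no ε-productions. Choosing the order [B, A] reproduces the rewrite above.

```python
def remove_immediate(nt, productions):
    # A ::= A alpha | beta   becomes   A ::= beta A',  A' ::= alpha A' | epsilon
    recursive = [p[1:] for p in productions if p and p[0] == nt]
    others = [p for p in productions if not p or p[0] != nt]
    if not recursive:
        return {nt: productions}
    new_nt = nt + "'"
    return {nt: [b + [new_nt] for b in others],
            new_nt: [a + [new_nt] for a in recursive] + [[]]}


def remove_left_recursion(grammar, order):
    """Classic substitution algorithm for (indirect) left recursion.
    grammar: dict non-terminal -> list of productions (lists of symbols).
    order:   the chosen ordering A1..An of the non-terminals.
    Assumes no cycles and no epsilon-productions in the input grammar."""
    g = {nt: [list(p) for p in prods] for nt, prods in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            # Replace every Ai ::= Aj gamma by Ai ::= delta gamma for each Aj ::= delta.
            expanded = []
            for p in g[ai]:
                if p and p[0] == aj:
                    expanded.extend(delta + p[1:] for delta in g[aj])
                else:
                    expanded.append(p)
            g[ai] = expanded
        g.update(remove_immediate(ai, g[ai]))  # may add a new primed non-terminal
    return g


grammar = {"A": [["B", "C"], ["D"]], "B": [["A", "E"], ["F"]]}
result = remove_left_recursion(grammar, ["B", "A"])
print(result["A"])   # [['F', 'C', "A'"], ['D', "A'"]]
print(result["A'"])  # [['E', 'C', "A'"], []]
```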

Further Problem
• The transformation does not preserve associativity:
  – E ::= E + T | T parses a + b + c as (a + b) + c.
  – With E ::= T E', E' ::= + T E' | ε, a tree built directly from the derivation groups a + b + c as a + (b + c).
  – This is incorrect for a - b - c: the tree must be rewritten.
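The effect is easy to observe. This sketch builds the tree directly from E ::= T E' and evaluates 10 - 4 - 3. It makes two assumptions:
  – E' is extended with a - alternative, E' ::= + T E' | - T E' | ε, to match the a - b - c case.
  – T is a single integer literal.
The result is 9 instead of the correct (10 - 4) - 3 = 3.

```python
def parse_E(tokens, pos):
    # E ::= T E'
    term, pos = parse_T(tokens, pos)
    tail, pos = parse_E_prime(tokens, pos)
    # Naive tree building: the term becomes the LEFT operand of whatever follows.
    if tail is None:
        return term, pos
    op, right = tail
    return (op, term, right), pos


def parse_E_prime(tokens, pos):
    # E' ::= + T E' | - T E' | epsilon
    if pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = parse_E(tokens, pos + 1)  # "T E'" is exactly an E
        return (op, right), pos
    return None, pos


def parse_T(tokens, pos):
    # Hypothetical T ::= integer literal
    return int(tokens[pos]), pos + 1


def evaluate(tree):
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return evaluate(left) + evaluate(right) if op == "+" else evaluate(left) - evaluate(right)


tree, _ = parse_E("10 - 4 - 3".split(), 0)
print(tree)            # ('-', 10, ('-', 4, 3))
print(evaluate(tree))  # 9, although (10 - 4) - 3 = 3
```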

In practice: use a loop to find a sequence of terms

    Node1 := P_Term;  -- call function that recognizes a term
    loop
       exit when Token not in Token_Class_Binary_Addop;
       Node2 := New_Node (P_Binary_Adding_Operator);
       Scan;  -- past operator
       Set_Left_Opnd (Node2, Node1);
       Set_Right_Opnd (Node2, P_Term);  -- find next term
       Set_Op_Name (Node2);
       Node1 := Node2;  -- operand for next operation
    end loop;
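The same loop in Python shows why it restores left associativity: the tree built so far always becomes the left operand of the next operator. This is a sketch under two assumptions. First, ADDOPS stands in for Token_Class_Binary_Addop. Second, a term is a single integer literal, a hypothetical stand-in for P_Term. The sketch does not reproduce the original compiler's node API.

```python
ADDOPS = {"+", "-"}  # plays the role of Token_Class_Binary_Addop (assumption)


def parse_simple_expression(tokens):
    pos = 0

    def p_term():
        # Hypothetical stand-in for P_Term: a term is a single integer literal.
        nonlocal pos
        value = int(tokens[pos])
        pos += 1
        return value

    node1 = p_term()                                    # Node1 := P_Term;
    while pos < len(tokens) and tokens[pos] in ADDOPS:  # exit when not an addop
        op = tokens[pos]
        pos += 1                                        # Scan past operator
        node1 = (op, node1, p_term())                   # old tree is the left operand
    return node1


def evaluate(tree):
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return evaluate(left) + evaluate(right) if op == "+" else evaluate(left) - evaluate(right)


tree = parse_simple_expression("10 - 4 - 3".split())
print(tree)            # ('-', ('-', 10, 4), 3)
print(evaluate(tree))  # 3
```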

LL(1) Parsing
LL(1) grammars
• If table construction is successful, the grammar is LL(1): left-to-right scan, leftmost derivation, one-token lookahead.
• If construction fails, one can conceive of LL(2), etc.
• Ambiguous grammars are never LL(k).
• If a terminal is in First for two different productions of A, the grammar cannot be LL(1).
• Grammars with left recursion are never LL(k).
• Some useful constructs are not LL(k).
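The First-set condition can be checked mechanically. This sketch computes First sets to a fixed point and reports every terminal that appears in First of two alternatives of the same non-terminal. It is only a necessary condition for LL(1): alternatives that can derive ε also need the Follow check used in table construction.

The sketch uses these conventions (assumptions, not taken from the slides):
  – A grammar is a dict of productions.
  – [] stands for ε.
  – Any symbol that is not a key is a terminal.

On the left-recursive E grammar the check finds a conflict; on the rewritten grammar it finds none.

```python
EPS = "ε"


def first_sets(grammar):
    """Fixed-point computation of First for every non-terminal."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                f = first_of_string(prod, grammar, first)
                if not f <= first[nt]:
                    first[nt] |= f
                    changed = True
    return first


def first_of_string(symbols, grammar, first):
    result = set()
    for sym in symbols:
        if sym not in grammar:       # terminal
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)                  # every symbol can vanish (or the string is empty)
    return result


def first_conflicts(grammar):
    """(non-terminal, terminal) pairs where a terminal starts two alternatives."""
    first = first_sets(grammar)
    conflicts = set()
    for nt, prods in grammar.items():
        seen = set()
        for prod in prods:
            terminals = first_of_string(prod, grammar, first) - {EPS}
            conflicts |= {(nt, t) for t in terminals & seen}
            seen |= terminals
    return conflicts


left_recursive = {"E": [["E", "+", "T"], ["T"]], "T": [["id"]]}
rewritten = {"E": [["T", "E'"]], "E'": [["+", "T", "E'"], []], "T": [["id"]]}
print(first_conflicts(left_recursive))  # {('E', 'id')}
print(first_conflicts(rewritten))       # set()
```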

Building LL(1) parse tables
• The table is indexed by non-terminal and token. Each table entry is a production:

    for each production P: A → α loop
       for each terminal a in First (α) loop
          T (A, a) := P;
       end loop;
       if ε in First (α) then
          for each terminal b in Follow (A) loop
             T (A, b) := P;
          end loop;
       end if;
    end loop;

• All other entries are errors.
• If two assignments conflict, the parse table cannot be built.
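The table-construction loop above can be sketched in Python, together with the First/Follow computation it relies on and a small table-driven parser. This is a sketch under stated assumptions:
  – Grammars are dicts of productions, [] stands for ε, and non-keys are terminals.
  – "$" is an assumed end-of-input marker placed in Follow of the start symbol.
  – Conflicting assignments are collected instead of aborting.
Entries missing from the table are the error entries.

```python
EPS, END = "ε", "$"


def first_of(symbols, grammar, first):
    """First of a string of symbols; EPS is included if the whole string can vanish."""
    result = set()
    for sym in symbols:
        if sym not in grammar:            # terminal
            return result | {sym}
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    return result | {EPS}


def first_and_follow(grammar, start):
    first = {nt: set() for nt in grammar}
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:                        # grow both sets until nothing changes
        changed = False
        for a, prods in grammar.items():
            for prod in prods:
                f = first_of(prod, grammar, first)
                if not f <= first[a]:
                    first[a] |= f
                    changed = True
                # For A ::= ... B beta: Follow(B) gets First(beta) - {EPS},
                # plus Follow(A) when beta can vanish.
                for i, b in enumerate(prod):
                    if b not in grammar:
                        continue
                    rest = first_of(prod[i + 1:], grammar, first)
                    new = (rest - {EPS}) | (follow[a] if EPS in rest else set())
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return first, follow


def build_ll1_table(grammar, start):
    first, follow = first_and_follow(grammar, start)
    table, conflicts = {}, []
    for a, prods in grammar.items():
        for prod in prods:                # for each production P: A -> alpha
            f = first_of(prod, grammar, first)
            targets = f - {EPS}
            if EPS in f:                  # alpha can vanish: also use Follow(A)
                targets |= follow[a]
            for t in targets:
                if (a, t) in table and table[(a, t)] != prod:
                    conflicts.append((a, t))
                table[(a, t)] = prod
    return table, conflicts


class ParseError(Exception):
    pass


def ll1_parse(table, grammar, start, tokens):
    """Predictive parse driven by the table; returns the productions used, in order."""
    stack, tokens, pos, used = [END, start], tokens + [END], 0, []
    while stack:
        top, tok = stack.pop(), tokens[pos]
        if top not in grammar:            # terminal or END: must match the input
            if top != tok:
                raise ParseError(f"expected {top!r}, got {tok!r}")
            pos += 1
        else:
            prod = table.get((top, tok))
            if prod is None:              # all other entries are errors
                raise ParseError(f"no entry for ({top}, {tok!r})")
            used.append((top, prod))
            stack.extend(reversed(prod))
    return used


grammar = {"E": [["T", "E'"]], "E'": [["+", "T", "E'"], []], "T": [["id"]]}
table, conflicts = build_ll1_table(grammar, "E")
print(conflicts)                                            # []
print(table[("E'", "$")])                                   # []  (E' -> epsilon)
print(len(ll1_parse(table, grammar, "E", ["id", "+", "id"])))  # 5
```

Running the same builder on E ::= E + T | T reports a conflict on (E, id), as the LL(1) conditions predict.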

Left Recursion Removal & Left Factoring
• Left Recursion Removal:
• Left Factoring:

• Syntax Tree Construction in LL(1)
• First and Follow Sets
• LL(k) Parsers (Extending the Lookahead)
• Error Recovery in Top Down Parsers
• Error Recovery in LL(1) Parsers