BNF 15 Sep20 BNF BNF stands for either

BNF 15 -Sep-20

BNF • BNF stands for either Backus-Naur Form or Backus Normal Form • BNF is a metalanguage used to describe the grammar of a programming language • BNF is formal and precise – BNF is a notation for context-free grammars • BNF is essential in compiler construction

BNF • < > indicate a nonterminal that needs to be further expanded, e. g. <variable> • Symbols not enclosed in < > are terminals; they represent themselves, e. g. if, while, ( • The symbol : : = means is defined as • The symbol | means or; it separates alternatives, e. g. <addop> : : = + | -

BNF uses recursion • <integer> : : = <digit> | <integer> <digit> or <integer> : : = <digit> | <digit> <integer> • Recursion is all that is needed (at least, in a formal sense) • "Extended BNF" allows repetition as well as recursion • Repetition is usually better when using BNF to construct a compiler

BNF Examples I • <digit> : : = 0|1|2|3|4|5|6|7|8|9 • <if statement> : : = if ( <condition> ) <statement> | if ( <condition> ) <statement> else <statement>

BNF Examples II • <unsigned integer> : : = <digit> | <unsigned integer> <digit> • <integer> : : = <unsigned integer> | + <unsigned integer> | - <unsigned integer>

BNF Examples III • <identifier> : : = <letter> | <identifier> <digit> • <block> : : = { <statement list> } • <statement list> : : = <statement> | <statement list> <statement>

Extended BNF • The following are pretty standard: – [ ] enclose an optional part of the rule • Example: <if statement> : : = if ( <condition> ) <statement> [ else <statement> ] – { } mean the enclosed can be repeated any number of times (including zero) • Example: <parameter list> : : = ( ) | ( { <parameter> , } <parameter> )

Parser • Parser works on a stream of tokens. • The smallest item is a token. source program Lexical Analyzer token Parser parse tree get next token 10

Parse Tree • Inner nodes of a parse tree are non-terminal symbols. • The leaves of a parse tree are terminal symbols. • A parse tree can be seen as a graphical representation of a derivation. Example: E E+E | E–E | E*E | E/ E | -E E (E) E id E E -(E) - E E - E ( E ) E - -(id+E) E - E ( E ) E + E id E -(E+E) -(id+id) E ( E ) E + E id id 11

Two groups of parser • We categorize the parsers into two groups: 1. Top-Down Parser – the parse tree is created top to bottom, starting from the root. 2. Bottom-Up Parser – the parse is created bottom to top; starting from the leaves • Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time). • Efficient top-down and bottom-up parsers can be implemented only for sub-classes of context-free grammars. – – LL for top-down parsing LR for bottom-up parsing 12

Left-Most and Right-Most Derivations Left-Most Derivation lm lm lm rm rm E -(E) -(E+E) -(id+id) Right-Most Derivation rm rm rm E -(E) -(E+id) -(id+id) 13

Ambiguity

Ambiguity rules • For the most parsers, the grammar must be unambiguous. • We should eliminate the ambiguity in the grammar during the design phase of the compiler. • An unambiguous grammar should be written to eliminate the ambiguity. • We have to prefer one of the parse trees of a sentence (generated by an ambiguous grammar) to disambiguate that grammar to restrict to this choice. 15

Ambiguity • A grammar produces more than one parse tree for a sentence is called as an ambiguous grammar. Example: E E+E | E*E E id E E E+E id+E*E id+id*E id+id*id E E*E E+E*E id+E*E id+id*E id+id*id E + id E E * id E E E id + * E E id id 16

Unambiguous grammar unique selection of the parse tree for the grammar(Sentence).

Which is Ambiguity? BNF : stmt if expr then stmt | if expr then stmt else stmt | otherstmts if E 1 then if E 2 then S 1 else S 2 1? 2?

Ambiguity • We prefer the second parse tree (else matches with closest if). • So, we have to disambiguate our grammar to reflect this choice. The disambiguous grammar will be: stmt matchedstmt | unmatchedstmt if expr then matchedstmt else matchedstmt | otherstmts unmatchedstmt if expr then stmt | if expr then matchedstmt else unmatchedstmt 19

Operator Precedence- Ambiguity • Ambiguous grammars (because of ambiguous operators) can be disambiguated according to the precedence and associatively rules. E E+E | E*E | E^E | id | (E) disambiguate the grammar precedence: ^ (right to left) * (left to right) + (left to right) 20

Left Recursion

Left Recursion • A grammar is left recursive if it has a non-terminal A such that there is a derivation. + A A for some string • Top-down parsing techniques cannot handle left-recursive grammars. • So, we have to convert our left-recursive grammar into an equivalent grammar which is not left-recursive. • The left-recursion may appear in a single step of the derivation (immediate left-recursion), or may appear in more than one step of the derivation. 22

Immediate Left-Recursion A A | A A’ A’ A ’ | where does not start with A eliminate immediate left recursion an equivalent grammar 23

In general A A 1 |. . . | A m | 1 |. . . | n where 1. . . n do not start wi eliminate immediate left recursion A 1 A’ |. . . | n A’ A’ 1 A’ |. . . | m A’ | an equivalent grammar

Immediate Left-Recursion E E+T | T T T*F | F F id | (E) 25

After eliminate immediate left recursion E T E’ E’ +T E’ | T F T’ T’ *F T’ | F id | (E)

Creating a top-down parser • Top-down parsing can be viewed as the problem of constructing a parse tree for the input string, starting form the root and creating the nodes of the parse tree in preorder. • An example follows. 27

Top-down parsing • A top-down parsing program consists of a set of procedures, one for each non-terminal. • Execution begins with the procedure for the start symbol, which halts and announces success if its procedure body scans the entire input string. 28

Creating a top-down parser (Cont. ) • Given the grammar : – – – E → TE’ E’ → +TE’ | λ T → FT’ T’ → *FT’ | λ F → (E) | id • The input: id + id * id 29

30