Grammars
· Definitions
· Grammars
· Backus-Naur Form
· Derivation
  – terminology
  – trees
· Grammars and ambiguity
· Simple example
· Grammar hierarchies
· Syntax graphs
· Recursive descent parsing
Definitions
· Syntax: the form or structure of the expressions, statements, and program units
· Semantics: the meaning of the expressions, statements, and program units
· Sentence: a string of characters over some alphabet
· Language: a set of sentences
· Lexeme: the lowest-level syntactic unit of a language
  » :=, {, while
· Token: a category of lexemes (e.g., identifier)
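The lexeme/token distinction can be sketched in a few lines of C. This is only an illustration: the token names and the `classify` function are hypothetical, and the identifier rule (a letter followed by letters or digits) is an assumption, not something the slides specify.

```c
#include <ctype.h>
#include <string.h>

/* Token categories. These names are illustrative, not from the slides. */
typedef enum {
    TOK_KEYWORD, TOK_IDENTIFIER, TOK_ASSIGN, TOK_LBRACE, TOK_UNKNOWN
} Token;

/* Map one lexeme (a concrete character string) to its token (a category). */
Token classify(const char *lexeme) {
    if (strcmp(lexeme, "while") == 0) return TOK_KEYWORD;
    if (strcmp(lexeme, ":=") == 0) return TOK_ASSIGN;
    if (strcmp(lexeme, "{") == 0) return TOK_LBRACE;
    /* Assumed identifier rule: a letter followed by letters or digits. */
    if (isalpha((unsigned char)lexeme[0])) {
        for (const char *p = lexeme + 1; *p != '\0'; p++)
            if (!isalnum((unsigned char)*p)) return TOK_UNKNOWN;
        return TOK_IDENTIFIER;
    }
    return TOK_UNKNOWN;
}
```

Many different lexemes (count, x1, total) map to the single token TOK_IDENTIFIER, which is the sense in which a token is a category of lexemes.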
Grammars
· Can serve as “generators” or “recognizers”
  – recognizers are used in compilers
  – we’ll study grammars as generators
· Contain 4 components
  – terminal symbols
    » atomic components of statements in the language
      • appear in source programs
    » identifiers, operators, punctuation, keywords
  – nonterminal symbols
    » intermediate elements in producing terminal symbols
    » never appear in source programs
  – start (or goal) symbol
    » a special nonterminal which is the starting symbol for producing statements
Grammars (continued)
· 4 components (continued)
  – productions
    » rules for transforming nonterminal symbols into terminals or other nonterminals
    » “nonterminal” ::= terminals and/or nonterminals
    » each has a left-hand side (LHS) and a right-hand side (RHS)
    » every nonterminal must appear on the LHS of at least one production
Grammars (continued)
· 4 categories of grammars
  – regular
    » good for identifiers, parameter lists, subscripts
  – context free
    » LHS of each production is a single nonterminal
  – context sensitive
  – recursively enumerable
· The first two (regular and context free) are enough for PLs
Backus-Naur Form (BNF)
· Used to describe the syntax of PLs; introduced to describe Algol 58 and refined for the Algol 60 report
· Nonterminals are enclosed in <...>
  – <expression>, <identifier>
· Alternatives are indicated by |
  – <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
· Options (0 or 1 occurrences) are indicated by [...] (an extended-BNF notation)
  – <stmt> ::= if <cond> then <stmt> [else <stmt>]
    » note the recursion
· Repetition (0 or more occurrences) is indicated by {...} (also extended BNF)
  – <unsigned> ::= <digit> {<digit>}
· Derivation: repeated application of rules, starting with the start symbol and ending with a sentence
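Read as a recognizer, the repetition rule <unsigned> ::= <digit> {<digit>} maps almost directly onto a loop. The following is a minimal sketch; the function names `is_digit` and `is_unsigned` are made up for this example.

```c
#include <ctype.h>
#include <stdbool.h>

/* <digit> ::= 0 | 1 | ... | 9 */
static bool is_digit(char c) {
    return isdigit((unsigned char)c) != 0;
}

/* <unsigned> ::= <digit> {<digit>} */
bool is_unsigned(const char *s) {
    if (!is_digit(*s)) return false;  /* the mandatory first <digit> */
    s++;
    while (is_digit(*s)) s++;         /* {<digit>}: zero or more repetitions */
    return *s == '\0';                /* the whole string must be consumed */
}
```

The braces in the rule become the `while` loop; an optional part written with [...] would become a single `if` in the same style.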
BNF (continued)
· Example grammar
    <program> -> <stmts>
    <stmts>   -> <stmt> | <stmt> ; <stmts>
    <stmt>    -> <var> = <expr>
    <var>     -> a | b | c | d
    <expr>    -> <term> + <term> | <term> - <term>
    <term>    -> <var> | const
· Example derivation (one production applied per step)
    <program> => <stmts>
              => <stmt>
              => <var> = <expr>
              => a = <expr>
              => a = <term> + <term>
              => a = <var> + <term>
              => a = b + <term>
              => a = b + const
Derivation Terminology
· Every string of symbols in the derivation is a sentential form
· A sentence is a sentential form that has only terminal symbols
· A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded
  – similarly for a rightmost derivation
· A derivation may be neither leftmost nor rightmost
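One step of a leftmost derivation can be sketched as a string rewrite. The sketch below assumes sentential forms are stored as C strings with nonterminals written in <...>; the helper names are hypothetical, and the code does not check that the replacement is a legal right-hand side for the nonterminal it replaces.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* A sentential form is a sentence when no nonterminal (<...>) remains. */
bool is_sentence(const char *form) {
    return strchr(form, '<') == NULL;
}

/* Replace the leftmost nonterminal in `form` with `rhs`, writing to `out`.
   `out` must not overlap `form`. Returns false if `form` has no nonterminal
   or `out` is too small. */
bool expand_leftmost(const char *form, const char *rhs, char *out, size_t size) {
    const char *open = strchr(form, '<');
    if (open == NULL) return false;
    const char *close = strchr(open, '>');
    if (close == NULL) return false;
    /* prefix before the nonterminal + rhs + suffix after the nonterminal */
    int n = snprintf(out, size, "%.*s%s%s",
                     (int)(open - form), form, rhs, close + 1);
    return n >= 0 && (size_t)n < size;
}
```

Starting from `<var> = <expr>` and expanding with `a` gives `a = <expr>`; repeating the call until `is_sentence` returns true traces out a leftmost derivation.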
Derivation Trees
· A derivation tree is the tree resulting from applying productions to rewrite the start symbol
  – a parse tree is the same tree, built starting with the terminals and working back to the start symbol
· Derivation tree for a = b + const (children indented under their parent):
    <program>
      <stmts>
        <stmt>
          <var>
            a
          =
          <expr>
            <term>
              <var>
                b
            +
            <term>
              const
Grammars and Ambiguity
· A grammar is ambiguous iff it generates a sentential form that has two or more distinct parse trees
· An ambiguous expression grammar:
  – <expr> -> <expr> <op> <expr> | const
  – <op> -> / | -
· The sentence const - const / const has two parse trees:
  – one with - at the root, grouping it as const - (const / const)
  – one with / at the root, grouping it as (const - const) / const
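Why the two parse trees matter can be seen by evaluating them. The sketch below hand-builds both trees for 8 - 4 / 2; the `Node` type, the function names, and the operand values are chosen for illustration only.

```c
#include <stddef.h>

/* A parse-tree node reduced to what evaluation needs:
   a constant leaf (op == 0) or an operator with two subtrees. */
typedef struct Node {
    char op;                          /* '-', '/', or 0 for a constant */
    int value;                        /* used only when op == 0 */
    const struct Node *left, *right;
} Node;

int eval(const Node *n) {
    if (n->op == 0) return n->value;
    int l = eval(n->left), r = eval(n->right);
    return n->op == '-' ? l - r : l / r;
}

/* Tree with - at the root: 8 - (4 / 2) */
int eval_minus_at_root(void) {
    Node c8 = {0, 8, NULL, NULL}, c4 = {0, 4, NULL, NULL}, c2 = {0, 2, NULL, NULL};
    Node div = {'/', 0, &c4, &c2};
    Node sub = {'-', 0, &c8, &div};
    return eval(&sub);
}

/* Tree with / at the root: (8 - 4) / 2 */
int eval_div_at_root(void) {
    Node c8 = {0, 8, NULL, NULL}, c4 = {0, 4, NULL, NULL}, c2 = {0, 2, NULL, NULL};
    Node sub = {'-', 0, &c8, &c4};
    Node div = {'/', 0, &sub, &c2};
    return eval(&div);
}
```

The first tree evaluates to 6 and the second to 2, so a compiler working from an ambiguous grammar has no single correct answer to generate code for.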
Grammars and Ambiguity (continued)
· We must have unambiguous grammars so the compiler can produce correct code
  – because the parse tree provides the precedence and associativity of operators
· Left-recursive grammars produce left associativity
· Right-recursive grammars produce right associativity
· An unambiguous expression grammar:
  – <expr> -> <expr> - <term> | <term>
  – <term> -> <term> / const | const
· With this grammar, const - const / const has a single parse tree, in which const / const is grouped under <term>: const - (const / const)
Grammars and Ambiguity (continued)
· One famous ambiguity is the “dangling else”
  – <stmt> ::= if <cond> then <stmt> [else <stmt>]
· This can derive
    if X > 9 then if B = 4 then X := 5 else X := 0
  – the grammar does not say which if the else belongs to
Grammars and Ambiguity (continued)
· Can solve syntactically by adding nonterminals and productions
  – <stmt> ::= <matched> | <unmatched>
  – <matched> ::= if <cond> then <matched> else <matched> | any non-if statement
  – <unmatched> ::= if <cond> then <stmt> | if <cond> then <matched> else <unmatched>
· Can also solve semantically
  – “each else is associated with the nearest preceding unmatched then”
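The semantic rule (pair each else with the nearest preceding unmatched then) behaves like a stack. The sketch below is a deliberately simplified model, not a parser: it assumes the statement has already been reduced to a string in which 'i' stands for one "if ... then" and 'e' for one "else", with purely nested statements. The function name and the depth limit are hypothetical.

```c
enum { MAX_DEPTH = 64 };  /* assumed nesting limit for this sketch */

/* For each position k in `tokens`, match[k] receives the index of the 'i'
   paired with the 'e' at k, or -1 if k is not a matched 'e'.
   Returns the number of elses with no if to attach to, or -1 if the
   nesting is deeper than MAX_DEPTH. */
int match_elses(const char *tokens, int *match) {
    int stack[MAX_DEPTH];   /* indices of ifs that still lack an else */
    int top = 0, stray = 0;
    for (int k = 0; tokens[k] != '\0'; k++) {
        match[k] = -1;
        if (tokens[k] == 'i') {
            if (top == MAX_DEPTH) return -1;
            stack[top++] = k;
        } else if (tokens[k] == 'e') {
            if (top > 0) match[k] = stack[--top];  /* nearest unmatched if */
            else stray++;
        }
    }
    return stray;
}
```

For the dangling-else example, modeled as "iie", the else at index 2 is paired with the inner if at index 1, which is the reading the semantic rule selects.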
Grammar Hierarchies
· BNF (and equivalent notations such as syntax graphs) can describe context-free grammars
  – nonterminals appear alone on the LHS of productions
· But there is a whole hierarchy of grammar types, each containing the next
  – recursively enumerable
    » context sensitive
      • context free
        – regular
· Context-free grammars can describe the essential features of all current PLs
· Regular grammars are good for identifiers, parameter lists, etc.
Simple Grammar Example
· Consider the following unambiguous grammar for expressions
  – <expr> ::= [<expr> <addop>] <term>
  – <term> ::= [<term> <mulop>] <factor>
  – <factor> ::= (<expr>) | <digit>
  – <addop> ::= + | -
  – <mulop> ::= * | /
  – <digit> ::= 0 | ... | 9
· This grammar is left recursive and generates expressions that are left associative
· Changing the <factor> production produces right-associative exponentiation
  – <factor> ::= <expon> [** <factor>]
  – <expon> is not defined on the slide; presumably it takes over the old alternatives, (<expr>) | <digit>
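The precedence and left associativity that this grammar encodes can be checked with a small evaluator. This is a sketch under stated assumptions: the left-recursive rules for <expr> and <term> are rewritten as loops (which accepts the same strings and keeps the left-to-right grouping), operands are single digits as in the grammar, blanks are skipped, and error handling is left out. The global cursor `p` and the function `evaluate` are hypothetical names.

```c
#include <ctype.h>

static const char *p;        /* cursor into the input text */

static int expr(void);       /* forward declaration: <factor> refers to <expr> */

static void skip(void) { while (*p == ' ') p++; }

/* <factor> ::= (<expr>) | <digit> */
static int factor(void) {
    skip();
    if (*p == '(') {
        p++;
        int v = expr();
        skip();
        if (*p == ')') p++;
        return v;
    }
    if (isdigit((unsigned char)*p)) return *p++ - '0';
    return 0;                /* malformed input: not handled in this sketch */
}

/* <term> ::= [<term> <mulop>] <factor>, written as a loop */
static int term(void) {
    int v = factor();
    for (;;) {
        skip();
        if (*p == '*') { p++; v *= factor(); }
        else if (*p == '/') {
            p++;
            int d = factor();
            v = (d != 0) ? v / d : 0;   /* division by zero yields 0 here */
        }
        else return v;
    }
}

/* <expr> ::= [<expr> <addop>] <term>, written as a loop */
static int expr(void) {
    int v = term();
    for (;;) {
        skip();
        if (*p == '+') { p++; v += term(); }
        else if (*p == '-') { p++; v -= term(); }
        else return v;
    }
}

int evaluate(const char *s) {
    p = s;
    return expr();
}
```

Because each loop folds its operands from left to right, `evaluate("8-4-2")` computes (8 - 4) - 2, and because <term> sits below <expr>, multiplication and division bind tighter than addition and subtraction.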
Syntax Graphs
· Are equivalent to CFGs
· Terminals go in circles or ellipses
· Nonterminals go in rectangles
· Lines with arrowheads connect them and indicate how constructs are built
[Figure: example syntax graph with the labels type_identifier, (, identifier, ), the comma, constant, .., constant]
Recursive Descent Parsing
· Parsing is the process of tracing or constructing a parse tree for a given input string
· Parsers usually do not analyze lexemes
  – that is done by a lexical analyzer, which is called by the parser
Recursive Descent Parsing (continued)
· A recursive descent parser traces out a parse tree in top-down order
  – it is a top-down parser
· Each nonterminal in the grammar has a subprogram associated with it
  – the subprogram parses all sentential forms that the nonterminal can generate
· The recursive descent parsing subprograms are built directly from the grammar rules
· Recursive descent parsers, like other top-down parsers, cannot be built from left-recursive grammars
Recursive Descent Parsing (continued)
· Example
  – for the grammar: <term> -> <factor> {(* | /) <factor>}

    void term() {
        factor();                   /* parse the first factor */
        while (next_token == ast_code ||
               next_token == slash_code) {
            lexical();              /* get next token */
            factor();               /* parse the next factor */
        }
    }
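The slide's term() relies on next_token, lexical(), factor(), ast_code and slash_code without defining them. The sketch below fills them in with stand-ins so the routine can be run as a recognizer. Everything except term() and the names it uses is an assumption made for this example: the extra token codes, the one-character lexer, the rule that a factor is a single letter or digit, the `ok` flag, and the `parse_term` driver.

```c
#include <ctype.h>
#include <stdbool.h>

/* ast_code and slash_code come from the slide; the other codes are assumed. */
enum { ast_code, slash_code, operand_code, end_code, error_code };

static const char *input;   /* remaining source text (assumed representation) */
static int next_token;      /* code of the current token */
static bool ok;             /* cleared when a syntax error is seen */

/* Stand-in lexical analyzer: one-character lexemes, blanks skipped. */
static void lexical(void) {
    while (*input == ' ') input++;
    char c = *input;
    if (c == '\0') { next_token = end_code; return; }
    input++;
    if (c == '*') next_token = ast_code;
    else if (c == '/') next_token = slash_code;
    else if (isalnum((unsigned char)c)) next_token = operand_code;
    else next_token = error_code;
}

/* Stand-in <factor>: a single letter or digit. */
static void factor(void) {
    if (next_token == operand_code) lexical();
    else ok = false;
}

/* <term> -> <factor> {(* | /) <factor>}, as on the slide */
static void term(void) {
    factor();                       /* parse the first factor */
    while (next_token == ast_code || next_token == slash_code) {
        lexical();                  /* get next token */
        factor();                   /* parse the next factor */
    }
}

/* Hypothetical driver: true if `s` is exactly one <term>. */
bool parse_term(const char *s) {
    input = s;
    ok = true;
    lexical();                      /* prime next_token */
    term();
    return ok && next_token == end_code;
}
```

With these stand-ins, `parse_term("a*b/c")` succeeds, while `parse_term("a*")` fails because the factor after the operator is missing. The repetition braces in the rule correspond to the `while` loop, so no left recursion is needed.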