Topic 2 Infix to Postfix EE 456 Compiling
- Slides: 45
Topic #2: Infix to Postfix EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003
Why this topic? • Program to convert infix to postfix – A simple, one-pass compiler! – Infix notation is source language – Postfix is intermediate code – No code generation (but could be actions) • A good example that introduces many aspects of writing a compiler
Syntax vs. Semantics • Syntax – Describes what is allowable in a language – Relatively easy to describe (we will use context free grammars) • Semantics – Describes the meaning of a program (or expression) – Very difficult to describe in a formal sense • We are discussing programming languages, but these definitions also apply to natural languages
Structure of Simple Compiler • A lexical analyzer will tokenize the input • A “syntax-directed translator” combines syntax analysis and intermediate code generation
Context-free Grammars • Often used to describe the hierarchical structure of a programming language • Every CFG has four components: – A set of tokens (terminal symbols) – A set of nonterminals – A set of productions (rules) – A start symbol (left side of first rule) Example rule: stmt → if (expr) stmt else stmt
Productions • Every production consists of left side, an arrow (“can have the form of”), right side • Left side is a always nonterminal; the production is “for” this nonterminal • Right side consists of terminals and/or non -terminals • Convention, nonterminals will be italicized Example rule: stmt → if (expr) stmt else stmt
Example CFG • List of digits separated by plus or minus signs: list → list + digit list → list – digit list → digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 • For convenience, nonterminals can be grouped using ‘|’ (“or”) list → list + digit | list – digit | digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 • The tokens of this grammar: + • The start symbol: list - 0 1 2 3 4 5 6 7 8 9
CFG for Expressions expr → expr + term | expr – term | term → term * factor | term / factor | factor → digit | (expr) digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Strings • A string of tokens is a sequence of zero or more tokens • The string containing zero tokens is the empty string: ‘ε’ • A grammar “derives” strings – starting with start symbol – repeatedly replacing nonterminals with right side of productions for the nonterminals – The set of strings that can be derived from the start symbol is the “language” for the grammar
Parse Trees • Shows how the start symbol of a grammar can derive a string in the language • A tree with the following properties: – – The root is labeled with a start symbol Each leaf labeled with a token or ε Each interior node is labeled by a nonterminal If A is the label for an interior node, and X 1, X 2, …, Xn (nonterminals or tokens) are the labels of its children, then the following production must exist: A → X 1 X 2 … Xn
Sample Parse Tree: 9 – 5 + 2
More on Parse Trees • A language can be defined as all strings that can be generated by some parse tree • The process of finding a parse tree for a given string is called “parsing” the string
Ambiguous Grammars • If any string has more than one parse tree, grammar is said to be ambiguous • Need to avoid for compilation, since string can have more than one meaning • List of digits separated by plus or minus signs: string → string + string | string – string |0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 • Example merges notion of digit and list into single nonterminal string • Same strings are derivable, but some strings have multiple parse trees (possible meanings)
Two Parse Trees: 9 – 5 + 2
Precedence and Associativity • Precedence – Determines the order in which different operators are evaluated when they occur in the same expression – Operators of higher precedence are applied before operators of lower precedence • Associativity – Determines the order in which operators of equal precedence are evaluated when they occur in the same expression – Most operators have a left-to-right associativity, but some have right-to-left associativity
Postfix Notation • Formal rules, infix → postfix – If E is variable or constant, E → E – If E is expression of form E 1 op E 2, where op is binary operator, E 1 → E 1’, and E 2 → E 2’, then E → E 1’ E 2’ op – If E is expression of form (E 1) and E 1 → E 1’, then E → E 1’ • Parentheses are not needed!
Syntax-Directed Definitions • CFG to specify syntactic structure of input • Each grammar symbol has associated attributes • Each production has associated semantic rules for computing values of attributes • Grammar and semantic rules together constitute the syntax-directed definition • A parse tree showing the attribute values at each node is called an annotated parse tree
Two Types of Attributes • Synthesized Attributes – Value at a parse-tree node can be determined based on values of attributes of children – Can be evaluated during a single bottom-up traversal of parse tree • Inherited Attributes – More complicated – Will be discussed later in course
Example Syntax-Directed Definition Production Semantic Rule expr 1 + term expr. t : = expr 1. t || term. t || ‘+’ expr 1 - term expr. t : = expr 1. t || term. t || ‘-’ expr term expr. t : = term. t term 0 term. t : = ‘ 0’ term 1 term. t : = ‘ 1’ … term 9 … term. t : = ‘ 9’
Depth-First Search • Recursively visit all children before evaluating semantic rules of given node • Suitable if all attributes are synthesized procedure visit (n: node) begin for each child m of n, from left to right do visit(m) evaluate semantic rules at node n end
Annotated Parse Tree: 9 – 5 + 2
Translation Schemes • Adds to a CFG • Includes “semantic actions” embedded within productions • Similar to syntax-directed definition, but order of evaluation is explicit
Example Translation Scheme expr term expr + term { print(‘+’) } expr – term { print(‘-’) } term 0 { print(‘ 0’) } 1 { print(‘ 1’) } … term 9 { print(‘ 9’) }
Equivalent Translation Scheme expr rest term term rest + term { print(‘+’) } rest - term { print(‘-’) } rest ε 0 { print(‘ 0’) } 1 { print(‘ 1’) } … term 9 { print(‘ 9’) }
Parsing • Parsing is the process of determining if a string of tokens can be generated by a grammar • For any CFG, there is a parser that takes at most O(n^3) time • For most programming languages that arise in practice, linear time parsers exist
Top-down Parsing • Recursively apply the following steps: – At node n with nonterminal A, select a production for A – Construct children at n for symbols on right side of selected production – Find next node for which subtree needs to be constructed • Top-down parsing uses a “lookahead” symbol • Selecting production may involve trial-and-error and backtracking
Predictive Parsing • Recursive-descent parsing is a recursive, top-down approach to parsing • A procedure is associated with each nonterminal of the grammar • Predictive parsing – Special case of recursive-descent parsing – The lookahead symbol unambiguously determines the procedure for each nonterminal
Procedures for Nonterminals • Production with right side α used if lookahead is in FIRST(α) – FIRST(α) is set of all symbols that can be first symbol of α – If lookahead symbol is not in FIRST set for any production, can use production with right side of ε – If two or more possibilities, can not use this method – If no possibilities, an error is declared • Nonterminals on right side of selected production are recursively expanded
Left Recursion • Left-recursive productions can cause recursive-descent parsers to loop forever • Example: example + term • Can eliminate left recursion A Aα|β A βR R αR|ε
Eliminating Left Recursion expr term expr rest term expr + term { print(‘+’) } expr – term { print(‘-’) } term 0 { print(‘ 0’) } 1 { print(‘ 1’) } … term 9 { print(‘ 9’) } term rest + term { print(‘+’) } rest - term { print(‘-’) } rest ε 0 { print(‘ 0’) } 1 { print(‘ 1’) } … term 9 { print(‘ 9’) }
Infix to Prefix Code: Part 1 #include <stdio. h> #include <ctype. h> int lookahead; void void expr(void); rest(void); term(void); match(int); error(void); int main(void) { lookahead = getchar(); expr(); putchar('n'); /* adds trailing newline character */ } …
Infix to Prefix Code: Part 2 … void expr(void) { term(); rest(); } void term(void) { if (isdigit(lookahead)) { putchar(lookahead); match(lookahead); } else error(); } …
Infix to Prefix Code: Part 3 … void rest(void) { if (lookahead == '+') { match('+'); term(); putchar('+'); rest(); } else if (lookahead == '-') { match('-'); term(); putchar('-'); rest(); } } …
Infix to Prefix Code: Part 4 … void match(int t) { if (lookahead == t) lookahead = getchar(); else error(); } void error(void) { printf("syntax errorn"); /* print error message */ exit(1); /* then halt */ }
Code Optimization 1 void rest(void) { REST: if (lookahead == '+') { match('+'); term(); putchar('+'); goto REST; } else if (lookahead == '-') { match('-'); term(); putchar('-'); goto REST; } }
Code Optimization 2 void expr(void) { term(); while (1) { if (lookahead == '+') { match('+'); term(); putchar('+'); } else if (lookahead == '-') { match('-'); term(); putchar('-'); } else break; } }
Improvements Remaining • • Want to ignore whitespace Allow numbers Allow identifiers Allow additional operators (multiplications and division) • Allow multiple expressions (separated by semicolons)
Lexical Analyzer • Eliminates whitespace (and comments) • Reads numbers (not just single digits) • Reads identifiers and keywords
Implementing the Lexical Analyzer
Allowable Tokens • expected tokens: +, -, *, /, DIV, MOD, (, ), ID, NUM, DONE • ID represents an identifier, NUM represents a number, DONE represents EOF
Tokens and Attributes LEXEME white space TOKEN ATTRIBUTE VALUE --- sequence of digits NUM numeric value of sequence div DIV --- mod MOD --- letter followed by letters and digits ID EOF DONE any other character that character index into symbol table --NONE
A Simple Symbol Table • Each record of symbol table contains a token type and a string (lexeme or keyword) • Symbol table has fixed size • All lexemes in array of fixed size • Will be able to insert and search for tokens: – insert(s, t): creates entry with string s and token t, returns index into symbol table – lookup(s): searches for entry with string s, returns index if found, 0 otherwise • Keywords (div and mod) will be inserted into symbol table, they can not be used as identifiers
Updated Translation Scheme start list eof list expr; list | ε expr + term { print(‘+’) } | expr – term { print(‘-’) } | term * factor { print(‘*’) } | term / factor { print(‘/’) } | term div factor { print(‘DIV’) } | term mod factor { print(‘MOD’) } | factor (expr) | id { print(id. lexeme) } | num { print(num. value) }
After Eliminating Left Recursion start list eof list expr; list | ε expr term moreterms + term { print(‘+’) } moreterms | - term { print(‘-’) } moreterms | ε term factor morefactors * factor { print(‘*’) } morefactors | / factor { print(‘/’) } morefactors | div factor { print(‘DIV’) } morefactors | mod factor { print(‘MOD’) } morefactors | ε factor (expr) | id { print(id. lexeme) } | num { print(num. value) }
Final Code • About 250 lines of C • Pretty sloppy, otherwise would be longer
- Polish prefix
- Infix equation
- Infix prefix postfix
- Infix to postfix in c
- Algorithm for infix to postfix conversion
- Infix to postfix converter
- Compiled data in research
- Compiling creates an
- Compiling information
- Compiling vs interpreting
- An integral part planning and compiling a calendar is
- Fase fase proses kompilasi
- Thoughts compiling
- A clincher sentence is
- Broad topic and specific topic examples
- Half of 456
- Bbm 456
- 888-456-7910
- 123456789 xxxx
- 456 name
- Slump value as per is 456
- Gcs 456
- 123 456 78
- Write a number in expanded form. 456
- One way slab design example as per is 456
- Concrete mix ratio 1:2:3
- Gcs 456
- 8004563355
- 123 + 456 + 789
- 123-456-7890
- Roland purcell a technical writer
- Cmsc 456 3 cryptology
- Gbc gang oklahoma city
- Identification test for esters
- 123 + 456 x 789
- Cmsc 456
- Give 10 examples of infix
- Cafeteria stack
- Binar tree
- Dari gambar ini, maka notasi infix yang dihasilkan adalah :
- Logo notasi
- What is inflectional and derivational morphology
- Stata infix
- Postfix expression 5 4 6 + * 4 9 3 / + *
- Postfix expression
- Solve the given postfix expression. 62-3+4*