CPS 506 Comparative Programming Languages: Syntax Specification

Compiling Process Steps
• Program → Lexical Analysis
  – Convert the characters into a stream of tokens
• Lexical Analysis → Syntactic Analysis
  – Pass the tokens on to build an abstract representation, or parse tree

Compiling Process Steps (con’t)
• Syntactic Analysis → Semantic Analysis
  – Pass the parse tree on to be checked for semantic consistency and transformed to run efficiently on the target architecture (optimization)
• Semantic Analysis → Machine Code
  – Convert the abstract representation into executable machine code (code generation)

Formal Methods and Language Processing
• Meta-language
  – A language used to define other languages
• BNF (Backus-Naur Form) consists of:
  – A set of terminal symbols $\Sigma$
  – A set of non-terminal symbols $N$
  – A start symbol $S \in N$
  – A set of rewriting rules $\rho$, each of the form $A \to \omega$, where $A \in N$ and $\omega \in (N \cup \Sigma)^*$
    – Left-hand side: a non-terminal symbol
    – Right-hand side: a sequence of terminal and non-terminal symbols

BNF (con’t)
• The words in N: grammatical categories
  – Identifier, Expression, Loop, Program, …
  – S: the principal grammatical category
• The symbols in Σ: the basic alphabet
• Example 1:
  binaryDigit → 0
  binaryDigit → 1
  or, equivalently: binaryDigit → 0 | 1
• Example 2:
  Integer → Digit | Integer Digit
  Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
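
The four components of a BNF grammar can be written down directly as a data structure. The Java sketch below is one possible encoding (it assumes Java 16+ for `record`; the names `Bnf`, `Grammar`, `isWellFormed`, and `integerGrammar` are hypothetical). It stores Example 2 and checks the side conditions of the definition: the start symbol and every left-hand side are non-terminals, and every right-hand side uses only symbols from N ∪ Σ.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class Bnf {
    // A BNF grammar as the 4-tuple (N, Sigma, rho, S). Each rule maps a non-terminal A
    // to its alternatives; each alternative (omega) is a list of symbols.
    record Grammar(Set<String> nonTerminals, Set<String> terminals,
                   Map<String, List<List<String>>> rules, String start) {

        // Side conditions: S in N, every LHS A in N, every RHS symbol in N or Sigma.
        boolean isWellFormed() {
            if (!nonTerminals.contains(start)) return false;
            for (Map.Entry<String, List<List<String>>> rule : rules.entrySet()) {
                if (!nonTerminals.contains(rule.getKey())) return false;
                for (List<String> rhs : rule.getValue())
                    for (String symbol : rhs)
                        if (!nonTerminals.contains(symbol) && !terminals.contains(symbol)) return false;
            }
            return true;
        }
    }

    // Example 2: Integer -> Digit | Integer Digit ; Digit -> 0 | 1 | ... | 9
    static Grammar integerGrammar() {
        Set<String> sigma = new HashSet<>();
        List<List<String>> digitRules = new ArrayList<>();
        for (char c = '0'; c <= '9'; c++) {
            sigma.add(String.valueOf(c));
            digitRules.add(List.of(String.valueOf(c)));
        }
        Map<String, List<List<String>>> rho = Map.of(
            "Integer", List.of(List.of("Digit"), List.of("Integer", "Digit")),
            "Digit", digitRules);
        return new Grammar(Set.of("Integer", "Digit"), sigma, rho, "Integer");
    }

    public static void main(String[] args) {
        System.out.println(integerGrammar().isWellFormed());  // true
    }
}
```

Storing each right-hand side as a plain list of strings keeps the sketch short. A fuller encoding might use separate types for terminals and non-terminals.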

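BNF (con’t)
• Parse tree for 281:
  Integer
  ├── Integer
  │   ├── Integer
  │   │   └── Digit
  │   │       └── 2
  │   └── Digit
  │       └── 8
  └── Digit
      └── 1
• Derivation:
  Integer ⇒ Integer Digit ⇒ Integer 1 ⇒ Integer Digit 1 ⇒ Integer 8 1 ⇒ Digit 8 1 ⇒ 2 8 1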
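
The derivation above rewrites the rightmost non-terminal at each step, so it is a rightmost derivation. Below is a small Java sketch that reproduces such a derivation for any digit string under Example 2's grammar. The class and method names (`IntegerDerivation.derive`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class IntegerDerivation {
    // Grammar (Example 2): Integer -> Digit | Integer Digit ; Digit -> 0 | 1 | ... | 9
    // Produces a rightmost derivation: each step rewrites the rightmost non-terminal.
    static List<String> derive(String digits) {
        if (digits.isEmpty() || !digits.chars().allMatch(Character::isDigit))
            throw new IllegalArgumentException("not an Integer: " + digits);
        List<String> steps = new ArrayList<>();
        String done = "";                              // terminals produced so far, e.g. " 8 1"
        steps.add("Integer");
        for (int i = digits.length() - 1; i > 0; i--) {
            steps.add("Integer Digit" + done);         // Integer -> Integer Digit
            done = " " + digits.charAt(i) + done;
            steps.add("Integer" + done);               // Digit -> the digit at position i
        }
        steps.add("Digit" + done);                     // Integer -> Digit
        steps.add(digits.charAt(0) + done);            // Digit -> the first digit
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" => ", derive("281")));
    }
}
```

For "281", this prints the same seven sentential forms as the slide: `Integer => Integer Digit => Integer 1 => Integer Digit 1 => Integer 8 1 => Digit 8 1 => 2 8 1`.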

BNF (con’t)
• Lexeme: the lowest-level syntactic unit
• Token: a grammatical category that defines a set of strings of non-blank characters (lexical syntax)
  – Identifier (variable names, function names, …)
  – Literal (integer and decimal numbers, …)
  – Operator (+, -, *, /, …)
  – Separator (;, ., (, ), {, }, …)
  – Keyword (int, if, for, where, …)

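BNF (con’t)
• Example: the lexemes of a short program, labeled by token category
  // comments …
  void main ( ) { float p; p = 3.14 ; }
  – Comment: // comments …
  – Keyword: void, float
  – Identifier: main, p
  – Separator: ( ) { ; }
  – Operator: =
  – Literal: 3.14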
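
The labeling above is what a lexical analyzer produces. Below is a minimal Java sketch of one, assuming Java 16+ (for `record`) and a small keyword set taken from the slide's examples. `SimpleLexer`, `Token`, and `tokenize` are hypothetical names. It tries one regular-expression alternative per token category at the current position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimpleLexer {
    record Token(String category, String lexeme) {}

    // Assumed keyword list for this sketch (taken from the slide's examples).
    static final Set<String> KEYWORDS = Set.of("void", "int", "float", "if", "for", "where");

    // Optional leading whitespace, then one alternative per token category.
    // Order matters: "//" is tried before the operator "/", and a literal
    // such as 3.14 is tried before the separator ".".
    static final Pattern TOKEN = Pattern.compile(
        "\\s*(?:(//[^\\n]*)"                 // group 1: comment
        + "|([0-9]+(?:\\.[0-9]+)?)"           // group 2: literal
        + "|([a-zA-Z][a-zA-Z0-9]*)"           // group 3: keyword or identifier
        + "|([+\\-*/=])"                      // group 4: operator
        + "|([;.(){}]))");                    // group 5: separator

    static List<Token> tokenize(String src) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(src);
        int pos = 0;
        while (pos < src.length()) {
            m.region(pos, src.length());
            if (!m.lookingAt()) {
                if (src.substring(pos).isBlank()) break;   // only trailing whitespace is left
                throw new IllegalArgumentException("unexpected character at index " + pos);
            }
            String lexeme = m.group().strip();
            String category;
            if (m.group(1) != null) category = "Comment";
            else if (m.group(2) != null) category = "Literal";
            else if (m.group(3) != null) category = KEYWORDS.contains(lexeme) ? "Keyword" : "Identifier";
            else if (m.group(4) != null) category = "Operator";
            else category = "Separator";
            tokens.add(new Token(category, lexeme));
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        String program = "// comments ...\nvoid main ( ) { float p; p = 3.14 ; }";
        for (Token t : tokenize(program)) System.out.println(t.category() + "\t" + t.lexeme());
    }
}
```

A hand-written scanner that reads one character at a time would work just as well. The regular-expression version is used here because it mirrors the lexical rules one category at a time.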

Regular Expressions
• An alternative to BNF for defining a language's lexical rules
  – x : the character x
  – "abc" : the literal string abc
  – A | B : A or B
  – A B : A followed by B (concatenation)
  – A* : zero or more occurrences of A
  – A+ : one or more occurrences of A
  – A? : zero or one occurrence of A
  – [a-zA-Z] : any alphabetic character
  – [0-9] : any digit
  – . : any single character
• Example
  – Integer : [0-9]+
  – Identifier : [a-zA-Z][a-zA-Z0-9]*
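
The two example rules can be used directly with Java's `java.util.regex`. This sketch (the class and method names are hypothetical) uses `Matcher.matches()`, which requires the whole string to fit the pattern.

```java
import java.util.regex.Pattern;

class LexicalRules {
    // The two regular expressions from the slide.
    static final Pattern INTEGER = Pattern.compile("[0-9]+");
    static final Pattern IDENTIFIER = Pattern.compile("[a-zA-Z][a-zA-Z0-9]*");

    // matches() succeeds only if the entire string fits the pattern, not just a prefix.
    static boolean isInteger(String s) { return INTEGER.matcher(s).matches(); }
    static boolean isIdentifier(String s) { return IDENTIFIER.matcher(s).matches(); }

    public static void main(String[] args) {
        for (String s : new String[] {"281", "x1", "1x", "count"})
            System.out.println(s + ": Integer=" + isInteger(s) + ", Identifier=" + isIdentifier(s));
    }
}
```

Note that "1x" is neither an Integer nor an Identifier, because an identifier must start with a letter.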

Syntactic Analysis
• Primary tool: BNF
• Input: tokens from lexical analysis
• Output: parse tree
• Syntactic categories
  – Program
  – Declaration
  – Assignment
  – Expression
  – Loop
  – Function definition

Syntactic Analysis (con’t)
• Example
  ArithmeticExpression → Term | ArithmeticExpression + Term | ArithmeticExpression - Term
  Term → Factor | Term * Factor | Term / Factor
  Factor → Identifier | Literal | ( ArithmeticExpression )

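Syntactic Analysis (con’t)
• Example: parse tree for 2 * a - 3
  ArithmeticExpression
  ├── ArithmeticExpression
  │   └── Term
  │       ├── Term
  │       │   └── Factor
  │       │       └── Literal
  │       │           └── Integer: 2
  │       ├── *
  │       └── Factor
  │           └── Identifier
  │               └── Letter: a
  ├── -
  └── Term
      └── Factor
          └── Literal
              └── Integer: 3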

Syntactic Analysis (con’t)
• BNF limitations: a BNF grammar cannot express rules such as
  – Has each identifier been declared?
  – Has each identifier been given an initial value?
• In statically typed languages
  – The type system handles the first problem
  – Violations are detected at compile time or at run time
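
Checking "declared before use" therefore happens outside the grammar, typically with a symbol table. Below is a minimal Java sketch over a hypothetical mini-language with one statement per string: `int <id>;` declares a variable, and `<id> = <id-or-integer>;` assigns to it. The class and method names are hypothetical, and the input is assumed to be syntactically valid.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DeclarationCheck {
    // Hypothetical mini-language, one statement per string:
    //   "int <id>;"                 declaration
    //   "<id> = <id-or-integer>;"   assignment
    // Every statement is syntactically valid, but a BNF grammar cannot require
    // a variable to be declared before use; the symbol table below does that.
    static List<String> undeclaredUses(List<String> statements) {
        Set<String> declared = new HashSet<>();   // the symbol table
        List<String> errors = new ArrayList<>();
        for (String stmt : statements) {
            String[] words = stmt.replace(";", " ").trim().split("\\s+");
            if (words[0].equals("int")) {
                declared.add(words[1]);
            } else {
                // words = { target, "=", source }; integer sources are skipped
                for (String name : new String[] {words[0], words[2]})
                    if (Character.isLetter(name.charAt(0)) && !declared.contains(name))
                        errors.add(name);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(undeclaredUses(List.of("int x;", "x = 2;", "y = x;")));  // [y]
    }
}
```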

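Ambiguous Grammar
• A grammar is ambiguous if some string has two or more distinct parse trees
• Example
  – Exp → Identifier | Literal | Exp - Exp
  – Input: A - B - C
  – Parse 1: A - (B - C)
  – Parse 2: (A - B) - C
• Another example is the "dangling else", which can be resolved either
  – by rewriting the BNF rules, or
  – by extra-grammatical rules (for example, an else matches the nearest unmatched if)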
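
Because the rule Exp → Exp - Exp lets any "-" be the root, the number of parse trees grows with the number of operands. Below is a small Java sketch (the names are hypothetical) that enumerates every tree the ambiguous grammar allows and writes each one fully parenthesized.

```java
import java.util.ArrayList;
import java.util.List;

class AmbiguousParses {
    // Ambiguous grammar: Exp -> Identifier | Literal | Exp - Exp
    // Returns every parse tree, written fully parenthesized, for the
    // operands joined by '-' (e.g. A - B - C).
    static List<String> parses(List<String> operands) {
        List<String> trees = new ArrayList<>();
        if (operands.size() == 1) {
            trees.add(operands.get(0));                        // Exp -> Identifier | Literal
            return trees;
        }
        // Exp -> Exp - Exp: any of the '-' signs can be the root of the tree.
        for (int split = 1; split < operands.size(); split++)
            for (String left : parses(operands.subList(0, split)))
                for (String right : parses(operands.subList(split, operands.size())))
                    trees.add("(" + left + " - " + right + ")");
        return trees;
    }

    public static void main(String[] args) {
        System.out.println(parses(List.of("A", "B", "C")));   // [(A - (B - C)), ((A - B) - C)]
    }
}
```

For A - B - C - D there are already five trees. With A = 10, B = 5, C = 2, the two trees for A - B - C evaluate to 7 and 3, which is why the ambiguity matters.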

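Operator Precedence
• A grammar with no precedence levels:
  <expr> → <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>
  – A = B + C * A parses as A = B + (C * A)
  – A = B * C + A parses as A = B * (C + A), which is wrong because * should bind more tightly than +
• Solution: one non-terminal per precedence level
  <expr> → <expr> + <term> | <term>
  <term> → <term> * <factor> | <factor>
  <factor> → ( <expr> ) | <id>
  – A = B + C * A parses as A = B + (C * A)
  – A = B * C + A parses as A = (B * C) + A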
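
To see where the wrong grouping comes from, here is a Java sketch of a parser for the first grammar, the one with no precedence levels. It assumes single-letter ids and parses only the right-hand side of the assignment. The class name `NoPrecedenceParser` is hypothetical. Since every operator takes the whole rest of the expression as its right operand, whichever operator comes first ends up at the root.

```java
class NoPrecedenceParser {
    // Grammar from the slide (no precedence levels):
    //   <expr> -> <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>
    // Ids are single letters; the result is the parse written fully parenthesized.
    private final String src;
    private int pos = 0;

    private NoPrecedenceParser(String s) { src = s.replace(" ", ""); }

    static String parse(String s) {
        NoPrecedenceParser p = new NoPrecedenceParser(s);
        String tree = p.expr();
        if (p.pos != p.src.length()) throw new IllegalArgumentException("unexpected input at " + p.pos);
        return tree;
    }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    private String expr() {
        if (peek() == '(') {                          // <expr> -> ( <expr> )
            pos++;
            String inner = expr();
            if (peek() != ')') throw new IllegalArgumentException("expected ) at " + pos);
            pos++;
            return inner;
        }
        char id = peek();
        if (!Character.isLetter(id)) throw new IllegalArgumentException("expected <id> at " + pos);
        pos++;
        char op = peek();
        if (op == '+' || op == '*') {                 // <id> + <expr> | <id> * <expr>
            pos++;
            return "(" + id + " " + op + " " + expr() + ")";
        }
        return String.valueOf(id);                    // <expr> -> <id>
    }

    public static void main(String[] args) {
        System.out.println(parse("B + C * A"));      // (B + (C * A))
        System.out.println(parse("B * C + A"));      // (B * (C + A))
    }
}
```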

Associativity of Operators
• Left associativity: A + B + C, A * B * C, A / B / C, …
  – Left recursive: in a grammar rule, the LHS also appears at the beginning of its RHS
    <expr> → <expr> + <term> | <term>
    A + B + C is grouped as (A + B) + C
• Right associativity
  – Right recursive: in a grammar rule, the LHS also appears at the end of its RHS
    <factor> → <exp> ** <factor> | <exp>
    <exp> → ( <expr> ) | <id>
    A ** B ** C is grouped as A ** (B ** C)
    A + B ** C is grouped as A + (B ** C)
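
The grouping a rule produces decides the evaluation order. Below is a Java sketch with hypothetical names that evaluates a left-recursive operator as a fold from the left and a right-recursive one as a fold from the right. Subtraction is used in place of + because + gives the same value under either grouping.

```java
class Associativity {
    // Left-recursive rule, e.g. <expr> -> <expr> - <term> | <term>:
    // the leftmost operator ends up deepest in the tree, so evaluation folds from the left.
    static long leftAssocSubtract(long... xs) {
        long acc = xs[0];
        for (int i = 1; i < xs.length; i++) acc = acc - xs[i];
        return acc;
    }

    // Right-recursive rule <factor> -> <exp> ** <factor> | <exp>:
    // the rightmost operator is deepest, so evaluation folds from the right.
    static long rightAssocPower(long... xs) {
        long acc = xs[xs.length - 1];
        for (int i = xs.length - 2; i >= 0; i--) acc = power(xs[i], acc);
        return acc;
    }

    // Integer exponentiation for a non-negative exponent.
    static long power(long base, long exp) {
        long result = 1;
        for (long i = 0; i < exp; i++) result *= base;
        return result;
    }

    public static void main(String[] args) {
        System.out.println(leftAssocSubtract(10, 5, 2));  // (10 - 5) - 2 = 3, not 10 - (5 - 2) = 7
        System.out.println(rightAssocPower(2, 3, 2));     // 2 ** (3 ** 2) = 512, not (2 ** 3) ** 2 = 64
    }
}
```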

Extended BNF (EBNF)
• Optional part of an RHS, in brackets [ ]
  <if_stmt> → if ( <expr> ) <statement> [ else <statement> ]
• Repetition (zero or more times) of part of an RHS, in braces { }
  <id_list> → <id> { , <id> }
• Multiple-choice options within an RHS, in parentheses separated by |
  <term> → <term> ( * | / | % ) <factor>
• Optional * (zero or more) or + (one or more) after the braces
  <id_list> → <id> { , <id> }*
  <integer> → { 0 | … | 9 }+
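
EBNF maps naturally onto hand-written parsers. Braces become a loop and brackets become an `if`, so no recursion is needed for a list. Below is a Java sketch for <id_list> → <id> { , <id> }. It assumes that an <id> is a letter followed by letters or digits, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class IdListParser {
    // EBNF: <id_list> -> <id> { , <id> }
    // The { , <id> } repetition becomes the while loop in parse(); an optional
    // [ ... ] part would likewise become an if statement.
    private final String src;
    private int pos = 0;

    private IdListParser(String src) { this.src = src; }

    static List<String> parse(String src) {
        IdListParser p = new IdListParser(src);
        List<String> ids = new ArrayList<>();
        ids.add(p.id());                  // <id>
        while (p.at(',')) {               // { , <id> }
            p.pos++;
            ids.add(p.id());
        }
        p.skipSpaces();
        if (p.pos != src.length()) throw new IllegalArgumentException("unexpected input at " + p.pos);
        return ids;
    }

    private void skipSpaces() {
        while (pos < src.length() && src.charAt(pos) == ' ') pos++;
    }

    private boolean at(char c) {
        skipSpaces();
        return pos < src.length() && src.charAt(pos) == c;
    }

    // Assumption: <id> is a letter followed by letters or digits.
    private String id() {
        skipSpaces();
        int start = pos;
        while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
        if (start == pos || !Character.isLetter(src.charAt(start)))
            throw new IllegalArgumentException("expected <id> at " + start);
        return src.substring(start, pos);
    }

    public static void main(String[] args) {
        System.out.println(parse("a, b, c"));  // [a, b, c]
    }
}
```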

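Extended BNF (EBNF) (con’t)
• The subscript "opt" marks an optional part
  ConditionalStatement → if ( Expr ) Statement { else Statement }opt
• Syntax diagram (figure): Term, drawn as a Factor followed by zero or more repetitions of ( * | / ) Factor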

Case Study
• Write a BNF or EBNF for one grammatical category, such as Expression, the various literals, or the if statement, in Java, C, C++, or Pascal
• Write a BNF or EBNF for floating-point numbers in Java, C, or C++
• Write a BNF or EBNF for the loop statements of one language

Abstract Syntax
• Consider the following code:
  – Pascal
      while i < 10 do begin i := i + 1; end;
  – C or Java
      while (i < 10) { i = i + 1; }
• Although the syntax differs, the two loops are essentially equivalent
• Abstract syntax captures only the essential elements of a language construct, independent of its concrete syntax

Abstract Syntax (con’t)
• General form
  AbstractSyntaxClass = list of essential components (members)
• Example
  Loop = Expression test; Statement body
• A Java class for the abstract syntax of Loop
    class Loop extends Statement {
        Expression test;
        Statement body;
    }

Abstract Syntax (con’t)
• More examples
  Assignment = Variable target; Expression source
• A Java class for the abstract syntax of Assignment
    class Assignment extends Statement {
        Variable target;
        Expression source;
    }

Abstract Syntax Tree
• A tree that shows the abstract syntax of a program
• Example: x = 2; (C or Java) and x := 2; (Pascal) have the same abstract syntax
  Assignment = Variable target; Expression source
  Statement
  └── Assignment
      ├── target: Variable x
      └── source: Expression (Value 2)
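
The abstract syntax classes for Loop and Assignment only declare fields. Below is a runnable Java sketch that adds constructors and a `toString` that prints a prefix form, so the trees can be built and inspected. `Variable` and `Value` follow the tree above. `Binary` (needed for `i < 10` and `i + 1`) and the driver `AstDemo` are hypothetical additions that do not appear on the slides.

```java
// Hypothetical driver: builds the ASTs for the assignment and loop examples and prints them.
class AstDemo {
    public static void main(String[] args) {
        // x = 2;  (C or Java)   and   x := 2;  (Pascal)
        Statement assign = new Assignment(new Variable("x"), new Value(2));
        // while (i < 10) { i = i + 1; }   and the equivalent Pascal loop
        Statement loop = new Loop(
            new Binary("<", new Variable("i"), new Value(10)),
            new Assignment(new Variable("i"), new Binary("+", new Variable("i"), new Value(1))));
        System.out.println(assign);  // (= x 2)
        System.out.println(loop);    // (while (< i 10) (= i (+ i 1)))
    }
}

abstract class Statement {}
abstract class Expression {}

class Variable extends Expression {
    final String id;
    Variable(String id) { this.id = id; }
    @Override public String toString() { return id; }
}

class Value extends Expression {
    final int value;
    Value(int value) { this.value = value; }
    @Override public String toString() { return Integer.toString(value); }
}

// Hypothetical node for binary operators such as < and +; not on the slides.
class Binary extends Expression {
    final String op;
    final Expression left, right;
    Binary(String op, Expression left, Expression right) { this.op = op; this.left = left; this.right = right; }
    @Override public String toString() { return "(" + op + " " + left + " " + right + ")"; }
}

class Assignment extends Statement {
    final Variable target;
    final Expression source;
    Assignment(Variable target, Expression source) { this.target = target; this.source = source; }
    @Override public String toString() { return "(= " + target + " " + source + ")"; }
}

class Loop extends Statement {
    final Expression test;
    final Statement body;
    Loop(Expression test, Statement body) { this.test = test; this.body = body; }
    @Override public String toString() { return "(while " + test + " " + body + ")"; }
}
```

The Pascal and C/Java versions of the loop produce the same tree, which is the point of abstract syntax.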

Recursive Descent Parser
• A top-down parser that checks the syntax of a stream of tokens, reading from left to right
• It consists of a set of (mutually) recursive methods, each of which implements one rule of the grammar
• More details and other parsing algorithms are covered in the compiler course
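
Below is a Java sketch of a recursive descent parser for the ArithmeticExpression grammar from the Syntactic Analysis example. A method whose first step is to call itself on a left-recursive rule would recurse forever, so this sketch assumes the rules are first rewritten in EBNF form, with repetition loops in place of left recursion. The loops still build left-associative trees. The class name, the tiny tokenizer, and the parenthesized output format are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class RecursiveDescent {
    // EBNF form of the grammar (left recursion replaced by repetition):
    //   ArithmeticExpression -> Term { ( + | - ) Term }
    //   Term                 -> Factor { ( * | / ) Factor }
    //   Factor               -> Identifier | Literal | ( ArithmeticExpression )
    private final List<String> tokens;
    private int pos = 0;

    private RecursiveDescent(List<String> tokens) { this.tokens = tokens; }

    // Returns the parse written fully parenthesized, or throws on a syntax error.
    static String parse(String src) {
        RecursiveDescent p = new RecursiveDescent(tokenize(src));
        String tree = p.arithmeticExpression();
        if (p.pos != p.tokens.size()) throw new IllegalArgumentException("unexpected token: " + p.peek());
        return tree;
    }

    // Minimal tokenizer: runs of letters/digits, and single-character symbols.
    static List<String> tokenize(String src) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int start = i;
            if (Character.isLetterOrDigit(c)) {
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
            } else {
                i++;
            }
            out.add(src.substring(start, i));
        }
        return out;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<end>"; }

    private String arithmeticExpression() {
        String left = term();
        while (peek().equals("+") || peek().equals("-")) {
            String op = tokens.get(pos++);
            left = "(" + left + " " + op + " " + term() + ")";    // left-associative
        }
        return left;
    }

    private String term() {
        String left = factor();
        while (peek().equals("*") || peek().equals("/")) {
            String op = tokens.get(pos++);
            left = "(" + left + " " + op + " " + factor() + ")";
        }
        return left;
    }

    private String factor() {
        String t = peek();
        if (t.equals("(")) {                                      // ( ArithmeticExpression )
            pos++;
            String inner = arithmeticExpression();
            if (!peek().equals(")")) throw new IllegalArgumentException("expected ) but found " + peek());
            pos++;
            return inner;
        }
        if (Character.isLetterOrDigit(t.charAt(0))) {            // Identifier or Literal
            pos++;
            return t;
        }
        throw new IllegalArgumentException("unexpected token: " + t);
    }

    public static void main(String[] args) {
        System.out.println(parse("2 * a - 3"));   // ((2 * a) - 3)
    }
}
```

The output for 2 * a - 3 matches the parse tree shown earlier: the multiplication is grouped first, and the subtraction sits at the root.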

Exercises
1. Modify the following grammar to add a unary minus operator that has higher precedence than either + or *.
   <assign> → <id> = <expr>
   <id> → A | B | C
   <expr> → <expr> + <term> | <term>
   <term> → <term> * <factor> | <factor>
   <factor> → ( <expr> ) | <id>

Exercises
2. Consider the following grammar:
   <S> → <A> a <B> b
   <A> → <A> b | b
   <B> → a <B> | a
   Which of the following sentences are in the language generated by this grammar?
   1. baab
   2. bbbab
   3. bbaaaaa
   4. bbaab

Exercises
3. Convert the following EBNF to BNF:
   S → A { bA }
   A → a [ b ] A
4. Using the grammar in question 1, add the Java unary operators ++ and --.
5. Using the grammar in question 1, show a parse tree and a leftmost derivation for each of the following statements:
   a) A = (A + B) * C
   b) A = B * (C * (A + B))

Exercises
6. Rewrite the BNF in question 1 to give + precedence over * and to force + to be right associative.
7. Using BNF, write a grammar for the language $\{a^n b^n \mid n > 0\}$, whose strings include ab, aabb, …. Can you write this using a regular expression?