Chapter 2 Syntax

Syntax
• The syntax of a programming language specifies the structure of the language
• The lexical structure specifies how words (tokens) can be constituted from characters
• The syntactic structure specifies how sentences can be constituted from words

Lexical Structure
• The tokens of a programming language consist of the set of all basic grammatical categories that are the building blocks of syntax
• A program is viewed as a stream of tokens

Standard Token Categories
• Keywords, such as if and while
• Literals or constants, such as 42 (a numeric literal) or "hello" (a string literal)
• Special symbols, such as ";", "<=", or "+"
• Identifiers, such as x24, putchar, or monthly_balance

White Spaces and Comments
• White spaces and comments are ignored except where they function as delimiters
• Typical white spaces: newlines, tabs, spaces
• Comments:
  ◦ /* … */ and // … (C, C++, Java)
  ◦ -- … (Ada, Haskell)
  ◦ (* … *) (Pascal, ML)
  ◦ ; … (Scheme)
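Because comments matter only as delimiters, a scanner can replace each one with a single space before splitting the text into tokens; C's translation phases specify exactly this replacement. The sketch below handles only the C, C++, and Java comment forms listed above. `strip_comments` is a hypothetical helper, and for brevity it does not recognize string or character literals, so a comment opener inside a string literal would be mishandled.

```c
#include <string.h>

/* Copy src to out, replacing each comment with one space.
   Handles block and line comments only; string and character literals
   are NOT recognized (an assumption made for brevity).
   out needs at least strlen(src) + 1 bytes: the result is never longer
   than src. Returns the length of the result. */
size_t strip_comments(const char *src, char *out)
{
    const char *p = src;
    char *o = out;

    while (*p) {
        if (p[0] == '/' && p[1] == '*') {            /* block comment */
            p += 2;
            while (*p && !(p[0] == '*' && p[1] == '/'))
                p++;
            if (*p)
                p += 2;                               /* skip the closer */
            *o++ = ' ';
        } else if (p[0] == '/' && p[1] == '/') {     /* line comment */
            while (*p && *p != '\n')
                p++;                                  /* newline is kept */
            *o++ = ' ';
        } else {
            *o++ = *p++;                              /* ordinary character */
        }
    }
    *o = '\0';
    return (size_t)(o - out);
}
```

For example, `int/**/x` becomes `int x`: the empty comment still separates the two identifiers.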

C tokens
There are six classes of tokens: identifiers, keywords, constants, string literals, operators, and other separators. Blanks, horizontal and vertical tabs, newlines, formfeeds, and comments as described below (collectively, "white space") are ignored except as they separate tokens. Some white space is required to separate otherwise adjacent identifiers, keywords, and constants. If the input stream has been separated into tokens up to a given character, the next token is the longest string of characters that could constitute a token.
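The longest-match rule can be sketched in a few lines of C. In this illustration, `tokenize` and `longest_op` are hypothetical names and the operator table is only a subset of C's; identifiers, keywords, and integer constants are scanned as maximal runs of characters, and white space serves only to separate tokens.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical subset of C operators and separators. */
static const char *const ops[] = {
    "<<=", ">>=", "++", "--", "+=", "-=", "<=", ">=", "==", "!=",
    "<<", ">>", "->", "&&", "||",
    "+", "-", "*", "/", "<", ">", "=", "!", ";", ",", "(", ")", "{", "}"
};

/* Length of the longest operator that is a prefix of s (0 if none). */
static size_t longest_op(const char *s)
{
    size_t best = 0;
    for (size_t i = 0; i < sizeof ops / sizeof ops[0]; i++) {
        size_t n = strlen(ops[i]);
        if (n > best && strncmp(s, ops[i], n) == 0)
            best = n;
    }
    return best;
}

/* Split src into tokens and write them to out separated by single
   spaces. out needs room for 2 * strlen(src) + 1 bytes. */
void tokenize(const char *src, char *out)
{
    size_t o = 0;
    const char *p = src;

    while (*p) {
        size_t n;
        if (isspace((unsigned char)*p)) {    /* white space only separates */
            p++;
            continue;
        }
        if (isalpha((unsigned char)*p) || *p == '_') {
            n = 1;                            /* identifier or keyword */
            while (isalnum((unsigned char)p[n]) || p[n] == '_')
                n++;
        } else if (isdigit((unsigned char)*p)) {
            n = 1;                            /* integer constant */
            while (isdigit((unsigned char)p[n]))
                n++;
        } else {
            n = longest_op(p);                /* longest operator match */
            if (n == 0)
                n = 1;                        /* unknown character: emit alone */
        }
        if (o > 0)
            out[o++] = ' ';
        memcpy(out + o, p, n);
        o += n;
        p += n;
    }
    out[o] = '\0';
}
```

Given `x+++y`, the sketch produces `x ++ + y`: at the first `+` the longest token that fits is `++`, so the split `x + ++y` is never considered.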

An Example
/* This program counts from 1 to 10. */
#include <stdio.h>

int main(void)
{
    int i;

    for (i = 1; i <= 10; i++) {
        printf("%d\n", i);
    }
    return 0;
}

Backus-Naur Form (BNF)
• BNF is a notation widely used in the formal definition of syntactic structure
• A BNF grammar consists of a set of rewriting rules P, a set of terminal symbols Σ, a set of nonterminal symbols N, and a "start symbol" S ∈ N
• Each rule in P has the form A → ω, where A ∈ N and ω ∈ (N ∪ Σ)*

Backus-Naur Form
• The terminals in Σ form the basic alphabet (tokens) from which programs are constructed
• The nonterminals in N identify grammatical categories like Identifier, Integer, Expression, Statement, Function, Program
• The start symbol S identifies the principal grammatical category being defined by the grammar

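Examples
1. binaryDigit → 0
   binaryDigit → 1
   or, combining both rules with the metasymbol | ("or"):
   binaryDigit → 0 | 1
2. Integer → Digit | Integer Digit
   Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
   Writing symbols side by side, as in Integer Digit, is the metasymbol for concatenation.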
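The recursive rule for Integer can be read directly as a recognizer. This sketch, with hypothetical function names, checks whether a string can be derived from Integer; each branch of `is_integer_n` mirrors one alternative of Integer → Digit | Integer Digit.

```c
#include <stdbool.h>
#include <string.h>

/* Digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 */
static bool is_digit_char(char c)
{
    return c >= '0' && c <= '9';
}

/* Can s[0..n) be derived from Integer? */
static bool is_integer_n(const char *s, size_t n)
{
    if (n == 1)                     /* Integer -> Digit */
        return is_digit_char(s[0]);
    if (n > 1)                      /* Integer -> Integer Digit */
        return is_integer_n(s, n - 1) && is_digit_char(s[n - 1]);
    return false;                   /* the empty string is not an Integer */
}

bool is_integer(const char *s)
{
    return is_integer_n(s, strlen(s));
}
```

A call such as `is_integer("352")` returns true, while `is_integer("3a2")` fails because `a` is not a Digit. The recursion depth grows with the string length, which is acceptable for a demonstration but not for a production scanner.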

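Derivation
Integer ⇒ Integer Digit
        ⇒ Integer Digit Digit
        ⇒ Digit Digit Digit
        ⇒ 3 Digit Digit
        ⇒ 3 5 Digit
        ⇒ 3 5 2
Each intermediate string, such as 3 Digit Digit, is a sentential form; the final string 3 5 2 contains only terminals and is a sentence.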

Parse Tree
[Figure: parse tree for 352, annotated with a sentential form]

Example: Expression
Assignment → Identifier = Expression
Expression → Term | Expression + Term | Expression - Term
Term → Factor | Term * Factor | Term / Factor
Factor → Identifier | Literal | ( Expression )

Example: Expression
[Figure: parse tree for x+2*y]
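To see how the grammar makes * bind tighter than +, here is a sketch that parses an expression and prints it fully parenthesized. A recursive function cannot follow a left-recursive rule such as Expression → Expression + Term directly (it would call itself forever without consuming input), so each left-recursive rule is rewritten as a loop that combines operands from the left. Assumptions: identifiers and literals are single characters, the input has no spaces, and there is no error checking; `parenthesize` and `BUF` are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

#define BUF 256

static const char *in;   /* cursor into the input string */

static void expression(char *dst);

/* Factor -> Identifier | Literal | ( Expression ) */
static void factor(char *dst)
{
    if (*in == '(') {
        in++;                    /* consume '(' */
        expression(dst);
        in++;                    /* consume ')'; no error checking */
    } else {
        dst[0] = *in++;          /* single-character identifier or literal */
        dst[1] = '\0';
    }
}

/* Term -> Factor | Term * Factor | Term / Factor, as a left-folding loop */
static void term(char *dst)
{
    char rhs[BUF], tmp[BUF];
    factor(dst);
    while (*in == '*' || *in == '/') {
        char op = *in++;
        factor(rhs);
        snprintf(tmp, BUF, "(%s%c%s)", dst, op, rhs);
        strcpy(dst, tmp);
    }
}

/* Expression -> Term | Expression + Term | Expression - Term */
static void expression(char *dst)
{
    char rhs[BUF], tmp[BUF];
    term(dst);
    while (*in == '+' || *in == '-') {
        char op = *in++;
        term(rhs);
        snprintf(tmp, BUF, "(%s%c%s)", dst, op, rhs);
        strcpy(dst, tmp);
    }
}

/* Write a fully parenthesized copy of src into dst (BUF bytes). */
void parenthesize(const char *src, char *dst)
{
    in = src;
    expression(dst);
}
```

`parenthesize("x+2*y", b)` yields `(x+(2*y))`, matching the parse tree in which 2*y forms a single Term beneath the +.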

Syntax for a Subset of C
Program → void main ( ) { Declarations Statements }
Declarations → ε | Declarations Declaration
Declaration → Type Identifiers ;
Type → int | boolean
Identifiers → Identifier | Identifiers , Identifier
Statements → ε | Statements Statement
Statement → ; | Block | Assignment | IfStatement | WhileStatement
Block → { Statements }
Assignment → Identifier = Expression ;
IfStatement → if ( Expression ) Statement | if ( Expression ) Statement else Statement
WhileStatement → while ( Expression ) Statement

Syntax for a Subset of C
Expression → Conjunction | Expression || Conjunction
Conjunction → Relation | Conjunction && Relation
Relation → Addition | Relation <= Addition | Relation >= Addition | Relation == Addition | Relation != Addition
Addition → Term | Addition + Term | Addition - Term
Term → Negation | Term * Negation | Term / Negation
Negation → Factor | ! Factor
Factor → Identifier | Literal | ( Expression )

Example: Program
void main ( ) { int x; x = 1; }

Ambiguity
• A grammar is ambiguous if it permits a string to be parsed into two or more different parse trees
AmbExp → Integer | AmbExp - AmbExp
Example: 2 - 3 - 4

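An Example
The string 2 - 3 - 4 has two parse trees under AmbExp, one for each grouping:
2 - (3 - 4), which evaluates to 3
(2 - 3) - 4, which evaluates to -5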
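C's own grammar does not leave this choice open: binary - is left-associative, so 2 - 3 - 4 means (2 - 3) - 4. A tiny check of the two groupings, with illustrative function names:

```c
/* The two parse trees of 2 - 3 - 4 under AmbExp, as explicit groupings. */
int grouped_right(void) { return 2 - (3 - 4); }  /* 2 - (-1) == 3  */
int grouped_left(void)  { return (2 - 3) - 4; }  /* (-1) - 4 == -5 */

/* Without parentheses, C groups binary - from the left. */
int ungrouped(void)     { return 2 - 3 - 4; }    /* same as grouped_left() */
```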

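The Dangling Else Problem
if ( x < 0 )
if ( y < 0 ) y = y - 1;
else y = 0;
Under the IfStatement rule, the else can be attached to either if, so this statement has two parse trees.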

The Dangling Else Problem
• Solution I: use a special keyword (written fi below) to explicitly close every if statement, as Ada does with end if:
  IfStatement → if ( E ) S fi | if ( E ) S else S fi
• Solution II: use an explicit rule outside the BNF syntax. For example, in C, every else clause is associated with the lexically nearest preceding if that is allowed by the syntax
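Solution II can be observed directly in C. In this sketch (function names are illustrative), the first function is the slide's statement written without braces, so C's rule attaches the else to the inner if; the second uses braces to force the other parse tree. Compilers commonly warn about the unbraced form, precisely because it is easy to misread.

```c
/* The slide's statement, unbraced: the else pairs with the inner if. */
int nearest_if(int x, int y)
{
    if (x < 0)
        if (y < 0)
            y = y - 1;
        else
            y = 0;
    return y;
}

/* The other parse tree, which C only produces when braces force it. */
int outer_if(int x, int y)
{
    if (x < 0) {
        if (y < 0)
            y = y - 1;
    } else {
        y = 0;
    }
    return y;
}
```

With x = 1 and y = 5, `nearest_if` leaves y at 5 while `outer_if` sets it to 0.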

Extended BNF (EBNF)
• EBNF introduces three kinds of brackets:
  ◦ { } denotes repetition (zero or more occurrences), which simplifies the specification of recursion
  ◦ [ ] denotes an optional part
  ◦ ( ) is used for grouping

An Example
BNF:
  Expression → Term | Expression + Term | Expression - Term
  Term → Factor | Term * Factor | Term / Factor
  Factor → + number | - number | number
EBNF:
  Expression → Term { ( + | - ) Term }
  Term → Factor { ( * | / ) Factor }
  Factor → [ + | - ] number
Here ( ) groups the operator alternatives, { } means zero or more occurrences, and [ ] marks the optional sign.
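Each EBNF construct maps onto a control structure: { } becomes a while loop, [ ] an if, and ( | ) a test on the current character. A minimal evaluator for the EBNF grammar above, assuming input without white space, C integer division, and no error handling (`p`, `factor`, `term`, and `eval_expression` are hypothetical names):

```c
#include <ctype.h>

static const char *p;   /* cursor into the input */

/* Factor -> [ + | - ] number */
static int factor(void)
{
    int sign = 1, n = 0;
    if (*p == '+' || *p == '-')            /* [ ] : optional sign */
        sign = (*p++ == '-') ? -1 : 1;
    while (isdigit((unsigned char)*p))
        n = n * 10 + (*p++ - '0');
    return sign * n;
}

/* Term -> Factor { ( * | / ) Factor } */
static int term(void)
{
    int v = factor();
    while (*p == '*' || *p == '/') {       /* { } : zero or more */
        char op = *p++;
        int r = factor();
        v = (op == '*') ? v * r : v / r;
    }
    return v;
}

/* Expression -> Term { ( + | - ) Term } */
int eval_expression(const char *src)
{
    int v;
    p = src;
    v = term();
    while (*p == '+' || *p == '-') {
        char op = *p++;
        int r = term();
        v = (op == '+') ? v + r : v - r;   /* fold from the left */
    }
    return v;
}
```

`eval_expression("2-3-4")` returns -5 and `eval_expression("8/2/2")` returns 2: because the loop folds operands from the left, the EBNF form yields left-associative operators without any left recursion.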

Abstract Syntax
• The abstract syntax of a language identifies the essential syntactic elements in a program without describing how they are concretely constructed. For example, these two loops have the same abstract syntax:
  Pascal: while i < n do begin i := i + 1 end
  C: while (i < n) { i = i + 1; }

Example: Loop
• Thinking of a loop abstractly, the essential elements are a test expression for continuing the loop and a body, which is the statement to be repeated
• All other elements constitute nonessential "syntactic sugar"
• The complete syntax is usually called concrete syntax

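Example: Loop
Pascal: while i < n do begin i := i + 1 end
C: while (i < n) { i = i + 1; }
[Figure: the abstract syntax tree shared by both loops: a loop node whose test is < with children i and n, and whose body is = with children i and + (i, 1)]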
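An abstract syntax tree keeps only the test and the body, so one data structure can represent both loops. A minimal sketch with a hypothetical node layout: `struct node`, `mk`, `eval`, `release`, and `run_loop` are illustrative names, variables live in numbered slots, and allocation failures are not handled.

```c
#include <stdlib.h>

/* Node kinds for this sketch (a hypothetical, minimal AST). */
enum kind { NUM, VAR, LESS, ADD, ASSIGN, WHILE };

struct node {
    enum kind kind;
    int value;                  /* NUM: constant; VAR, ASSIGN: variable slot */
    struct node *left, *right;  /* WHILE: left = test, right = body */
};

static struct node *mk(enum kind k, int v, struct node *l, struct node *r)
{
    struct node *n = malloc(sizeof *n);   /* failure not handled here */
    n->kind = k;
    n->value = v;
    n->left = l;
    n->right = r;
    return n;
}

static void release(struct node *n)
{
    if (n == NULL)
        return;
    release(n->left);
    release(n->right);
    free(n);
}

/* Evaluate n; vars holds the current value of each variable slot. */
static int eval(const struct node *n, int *vars)
{
    switch (n->kind) {
    case NUM:    return n->value;
    case VAR:    return vars[n->value];
    case LESS:   return eval(n->left, vars) < eval(n->right, vars);
    case ADD:    return eval(n->left, vars) + eval(n->right, vars);
    case ASSIGN: return vars[n->value] = eval(n->right, vars);
    case WHILE:
        while (eval(n->left, vars))       /* the test expression */
            eval(n->right, vars);         /* the body */
        return 0;
    }
    return 0;
}

/* Build and run the AST for: while (i < n) i = i + 1
   with i in slot 0 and n in slot 1; returns the final value of i. */
int run_loop(int i, int n)
{
    enum { SLOT_I = 0, SLOT_N = 1 };
    int vars[2] = { i, n };
    struct node *loop =
        mk(WHILE, 0,
           mk(LESS, 0, mk(VAR, SLOT_I, NULL, NULL), mk(VAR, SLOT_N, NULL, NULL)),
           mk(ASSIGN, SLOT_I, NULL,
              mk(ADD, 0, mk(VAR, SLOT_I, NULL, NULL), mk(NUM, 1, NULL, NULL))));

    eval(loop, vars);
    release(loop);
    return vars[SLOT_I];
}
```

`run_loop(0, 10)` returns 10. Nothing in the tree records `do`, `begin`, braces, or `:=` versus `=`; those belong to the concrete syntax.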

Example: Expression
[Figure: parse tree for x+2*y]

Example: Expression
[Figure: abstract syntax tree for x+2*y: + at the root with children x and *; * has children 2 and y]
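The abstract syntax tree for x+2*y has only five nodes, with no Expression, Term, or Factor chain nodes. This sketch, using a hypothetical `struct ast` type and helper names, builds that tree from statically allocated nodes and lists it in preorder, which reproduces the prefix form + x * 2 y.

```c
#include <string.h>

/* Minimal AST node: an operator or leaf label plus up to two children. */
struct ast {
    const char *label;
    const struct ast *left, *right;
};

/* Append a preorder listing of t to out, labels separated by spaces. */
static void preorder(const struct ast *t, char *out)
{
    if (t == NULL)
        return;
    if (out[0] != '\0')
        strcat(out, " ");
    strcat(out, t->label);
    preorder(t->left, out);
    preorder(t->right, out);
}

/* Build the AST for x+2*y and write its preorder listing to out. */
void x_plus_2_times_y(char *out)
{
    static const struct ast x     = { "x", NULL, NULL };
    static const struct ast two   = { "2", NULL, NULL };
    static const struct ast y     = { "y", NULL, NULL };
    static const struct ast times = { "*", &two, &y };
    static const struct ast plus  = { "+", &x, &times };

    out[0] = '\0';
    preorder(&plus, out);
}
```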

Parser
• A parser of a language accepts or rejects strings based on whether they are legal strings in the language
• In a recursive-descent parser, each nonterminal is implemented as a function, and each terminal is implemented as a match against the current token

Example: Calculator
command → expr '\n'
expr → term { '+' term }
term → factor { '*' factor }
factor → number | '(' expr ')'
number → digit { digit }
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

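#include <ctype.h>
#include <stdlib.h>
#include <stdio.h>

int token;    /* the current input character */
int pos = 0;  /* number of characters read so far */

void parse(void);
void getToken(void);
void match(char c);
void error(void);
void command(void);
void expr(void);
void term(void);
void factor(void);
void number(void);
void digit(void);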

int main(void)
{
    parse();
    return 0;
}

void parse(void)
{
    getToken();
    command();
}

/* Read the next non-blank character into token. */
void getToken(void)
{
    token = getchar();
    pos++;
    while (token == ' ') {
        token = getchar();
        pos++;
    }
}

/* command -> expr '\n' */
void command(void)
{
    expr();
    match('\n');
}

/* Consume the expected terminal c, or report an error. */
void match(char c)
{
    if (token == c)
        getToken();
    else
        error();
}

/* expr -> term { '+' term } */
void expr(void)
{
    term();
    while (token == '+') {
        match('+');
        term();
    }
}

/* term -> factor { '*' factor } */
void term(void)
{
    factor();
    while (token == '*') {
        match('*');
        factor();
    }
}

/* factor -> number | '(' expr ')' */
void factor(void)
{
    if (token == '(') {
        match('(');
        expr();
        match(')');
    } else {
        number();
    }
}

/* number -> digit { digit } */
void number(void)
{
    digit();
    while (isdigit(token))
        digit();
}

/* digit -> 0 | 1 | ... | 9 */
void digit(void)
{
    if (isdigit(token))
        match(token);
    else
        error();
}

void error(void)
{
    printf("parse error: position %d: character %c\n", pos, token);
    exit(1);
}
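The calculator above only accepts or rejects its input. A natural extension is to let each parsing function return the value of the phrase it recognized. The following is a hypothetical variant, not part of the original program: it reads from a string instead of standard input (which makes it easy to test), computes the value, and returns the position of an error instead of calling `exit`. Positions here are 0-based indices, unlike the character count `pos` above, and `calculate` is an illustrative name.

```c
#include <ctype.h>
#include <stdbool.h>

static const char *src;   /* the input line */
static int pos;           /* 0-based index of the current token */
static bool failed;       /* set on the first syntax error */

static int expr(void);

static void skip_blanks(void)
{
    while (src[pos] == ' ')
        pos++;
}

/* Consume the expected character c; on mismatch, record the error. */
static void match(char c)
{
    if (failed)
        return;
    if (src[pos] == c) {
        pos++;
        skip_blanks();
    } else {
        failed = true;
    }
}

/* number -> digit { digit } */
static int number(void)
{
    int v = 0;
    if (!isdigit((unsigned char)src[pos])) {
        failed = true;
        return 0;
    }
    while (isdigit((unsigned char)src[pos]))
        v = v * 10 + (src[pos++] - '0');
    skip_blanks();
    return v;
}

/* factor -> number | '(' expr ')' */
static int factor(void)
{
    int v;
    if (failed)
        return 0;
    if (src[pos] != '(')
        return number();
    match('(');
    v = expr();
    match(')');
    return v;
}

/* term -> factor { '*' factor } */
static int term(void)
{
    int v = factor();
    while (!failed && src[pos] == '*') {
        match('*');
        v *= factor();
    }
    return v;
}

/* expr -> term { '+' term } */
static int expr(void)
{
    int v = term();
    while (!failed && src[pos] == '+') {
        match('+');
        v += term();
    }
    return v;
}

/* command -> expr '\n'. Returns -1 and stores the value in *result on
   success; otherwise returns the position of the offending character. */
int calculate(const char *line, int *result)
{
    int v;
    src = line;
    pos = 0;
    failed = false;
    skip_blanks();
    v = expr();
    match('\n');
    if (failed)
        return pos;
    *result = v;
    return -1;
}
```

For instance, `calculate("(2+3)*4\n", &r)` stores 20 in `r`, while `calculate("2+*3\n", &r)` returns 2, the index of the unexpected `*`.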