Syntax and Semantics Lexeme Syntax form or structure

Languages • Language Recognizer – given a sentence, is it in the given language?

BNF (Backus Naur Form) • Equivalent to a context-free language – BNF is a

Examples of Grammar Rules <program> -> begin <stmt_list> end Notice the use of both

Example Grammar <program> -> begin <stmt_list> end <stmt_list> -> <stmt> | <stmt> ; <stmt_list>

A derivation from the grammar <program> => begin <stmt_list> end => begin <stmt> ;

Parse Trees • A hierarchical structure displaying the derivation of an expression in some

Grammar and Parse Tree <assign> <id> = <expr> <id> A | B | C

An Ambiguous Grammar • <assign> <id> : = <expr> • <id> A | B

Two Parse Trees for A = B + A * C <assign> <id> A

An Unambiguous Grammar • <assign> <id> : = <expr> • <id> A | B

Derivation and Parse Tree of the Unambiguous Grammar <assign> <id> = <expr> A =

Associativity • We maintain operator precedence through additional rules whereby higher precedence operators appear

Ambiguous If-Then-Else • <if_stmt> if <logical expr> then <stmt> | if <logical expr> then

Unambiguous If-Then-Else • <stmt> <matched> | <unmatched> • <matched> if <logical expr> then <matched>

Extended BNF Grammars • 3 common extensions to BNF: – [ ] - used

Attribute Grammars • It is not possible to describe all aspects of a language

Attribute Grammar Features • Synthesized attributes – information passed up the parse tree •

Example: Deriving Identifiers • In some languages, identifier names are limited – Here, we

Example: Assignment Stmt • Our grammar now becomes: • Attributes: – Actual_Type • synthesized

Example <assign> <expr> expected_type <var> actual_type <var>[2] A = A Assume A is a

Dynamic Semantics • Describing the meaning of a program or of a statement or

Operational Semantics • This can be thought of as “tracing” through a program to

Axiomatic Semantics • Used mainly to prove correctness of code – Each statement in

Weakest pre-condition • We will start with a given post-condition and derive the weakest

Assignment Statement Rule • We will use the following notation for an assignment statement

Sequences • In general, a series of statements S 1, S 2, S 3,

Rule of Consequence • In the previous example, we had a sequence of 2

Selection Axiomatic Semantic • Given a statement: if (B) S 1; else S 2;

Logical Pretest Loops • Our pre-test loop looks like this: – while (B) S;

While-Loop Example • Our loop is – while (y != x) y++; • Our

Two More Loop Examples while s > 1 do s = s / 2

Program Proofs • As you can see by the last example, finding an invariant

Denotational Semantics • This form of semantics is a more rigorous method of describing

Simple Examples • We define the value of a binary number – <bin_num> 0

Assignment and Loop Assignment Statements Ma(x : = E, s) if Me(E, s) =

Slides: 36

Download presentation

Syntax and Semantics • Lexeme • Syntax – form or structure of expressions or statements for a given language • For instance, in Java, the form of a while loop is – while(<bool_expr) <stmt> • Semantics – meaning of the expressions • Language – group of words that can be combined and the rules for combining those words • Sentence – a legal statement in the language – lowest level syntactic unit in the language • Token – a language category for the lexemes Example (in Java): index = 2 * count + 17; Lexeme: index = 2 * count + 17 ; Token: identifier assignment_operator integer_literal mult_operator identifier addition_operator integer_literal semicolon

Languages • Language Recognizer – given a sentence, is it in the given language? • Language Generator – given a language, create legal and meaningful sentences • We can build a language recognizer if we already have a language generator • Grammar – a description of a language can be used for generation or, given the grammar, a language recognizer (known as a parser) can be created • We classify languages into one of four categories: – – Regular Context-Free Context-Sensitive Recursively Enumerable • Here, we are interested in the context-free grammar – these include those which can be generated from a language generator – all natural languages and programming languages fall into this category – You will study the other languages in more detail in 485

BNF (Backus Naur Form) • Equivalent to a context-free language – BNF is a notation (or a meta-language) used to specify the grammar of a language • The BNF can then be used for language generation or recognition • BNF uses rules to map non-terminal symbols (tokens) into other nonterminals and terminals (lexemes) • We define a BNF Grammar as – G={alphabet, rules, <start>} • alphabet consists of those symbols used in the rules – both terminal symbols and non-terminal symbols, non-terminal symbols are placed in < > • rules map from a non-terminal to other elements in the alphabet – for instance, a rule might say <A> a<B> | b<A> – rules can be recursive as shown above where an <A> can be applied to generate a terminal (b) and another <A> • <start> is a non-terminal which is the starting point for a language generator and must be on at least 1 rule’s left hand side

Examples of Grammar Rules <program> -> begin <stmt_list> end Notice the use of both terminals and non-terminals on the right side Recursion is used as necessary <ident_list> -> ident | ident, <ident_list> The | symbol means “or” so that an <ident_list> can be a single ident, or an ident, a comma, and more of an ident_list Other examples are: <assign> -> <var> = <expression> <if_stmt> -> if <logical_expr> then <stmt> | if <logical_expr> then <stmt> else <stmt>

Example Grammar <program> -> begin <stmt_list> end <stmt_list> -> <stmt> | <stmt> ; <stmt_list> <stmt> -> <var> = <expr> <var> -> a | b | c | d <expr> -> <var> + <var> | <var> - <var> | <var> This grammar can be used to generate a program (granted, a program that will only consist of assignment statements) or we can use the grammar to generate a recognizer to recognize if a given program is syntactically valid in this language A derivation is a generation, starting at the <start> symbol (in this case, the start symbol is <program>) and applying rules until all non-terminal symbols have been removed from the generated sentence. Such a sentence will be a legal sentence in the language

A derivation from the grammar <program> => begin <stmt_list> end => begin <stmt> ; <stmt>_list> end => begin <var> = <expression> ; <stmt_list> end => begin A = <var> + <var> ; <stmt_list> end => begin A = B + C ; <stmt> end => begin A = B + C ; <var> = <expression> end => begin A = B + C ; B = <var> end => begin A = B + C ; B = C end So, the program begin We generate a leftmost derivation, where the A = B + C; next rule applied is applied to the leftmost B=C non-terminal symbol (which is why we end worked on the first assignment statement is legal before we generated the second <stmt>)

Parse Trees • A hierarchical structure displaying the derivation of an expression in some grammar • Leaf nodes are terminals, non-leaf nodes are nonterminals • Parser – Process which takes a sentence and breaks it into its component parts, deriving a parse tree – If the parser cannot generate a parse tree, then the sentence is not legal – If the parser can generate two or more parse trees for the same sentence, then the grammar is ambiguous

Grammar and Parse Tree <assign> <id> = <expr> <id> A | B | C <expr> <id> + <expr> | <id> * <expr> | (<expr>) <id> | <id> A <assign> = <expr> <id> * B <expr> ( <expr> ) Parse tree for the derivation <assign> <id> = <expr> A = <id> * <expr> A = B * (<expr>) A = B * (<id> + <expr>) A = B * (A + <id> ) A = B * (A + C) <id> A + <expr> <id> C

An Ambiguous Grammar • <assign> <id> : = <expr> • <id> A | B | C • <expr> + <expr> | <expr> * <expr> | (<expr>) | <id> • With this grammar, the sentence – <assign> A = B + A * C – has two distinct parse trees • see next slide – The reason this is important is that the second parse tree represents an interpretation of the expression where + has higher precedence than * which would give us an incorrect answer

Two Parse Trees for A = B + A * C <assign> <id> A = <id> <expr> <id> B + = A <expr> * <expr> + * <expr> <id> C <id> A C B A The lower down the operator in the parse tree, the higher its precedence so on the left, * has a higher precedence than + (which is as it should be) but the tree on the right is incorrect, essentially being A = (B + A) * C even though there are no parentheses specified to alter the precedence

An Unambiguous Grammar • <assign> <id> : = <expr> • <id> A | B | C • <expr> + <term> | <term> • <term> * <factor> | <factor> • <factor> (<expr>) | <id> Here, we force operator precedence by making a multiplication occur lower in the tree by deriving it through an additional rule ( ) having the highest precedence requires the most derivation to get to it Assume we wanted to add another operator, unary – (as in -5), how would we add it? What about adding ** (exponent)?

Derivation and Parse Tree of the Unambiguous Grammar <assign> <id> = <expr> A = <expr> + <term> A = <term> + <term> A = <factor> + <term> A = <id> + <term> A = B + <term> * <factor> A = B + <factor> * <factor> A = B + <id> * <factor> A = B + C * <id> A=B+C*A <id> A = <expr> + <term> * <factor> <id> A B C

Associativity • We maintain operator precedence through additional rules whereby higher precedence operators appear after the application of more rules – Should we worry about associativity? Notice that A + B + C = C + A + B, should we force them to generate the same parse tree? – It doesn’t seem worthwhile, and yet if A, B and C are floats instead of ints, then A + B + C may not equal C + A + B, so associativity should be preserved • How? – We will require that all rules in our BNF be left recursive for left associativity and right recursive for right associativity • Left recursive means that a recursive non-terminal must appear to the left of any other non-terminals – <expr> + <term> is left recursive and – <factor> <expr> ** <factor> is right recursive

Ambiguous If-Then-Else • <if_stmt> if <logical expr> then <stmt> | if <logical expr> then <stmt> else <stmt> • Since a <stmt> could be another <if_stmt> we could generate – if X > 0 then if Y > 0 then X: =0 else X: =Y • The problem is that this is ambiguous: – Is the else the alternative to the first then or the second then (that is, which condition does the else get attached to? ) – We could use { } to remove the ambiguity but it is better to create an unambiguous grammar

Unambiguous If-Then-Else • <stmt> <matched> | <unmatched> • <matched> if <logical expr> then <matched> else <matched> | any non-if statement • <unmatched> if <logical expr> then <stmt> | if <logical expr> then <matched> else <unmatched> Here, an if-then with a nested if-then-else is allowed, but an if-then-else where then-clause contains an if-then is not allowed (the then and else clauses must be matched, which means either another if-then-else, or a non-if statement In this way, any else clause is always associated with the most recent then clause Most languages follow this grammar, or require explicit delimiters (like { })

Extended BNF Grammars • 3 common extensions to BNF: – [ ] - used to denote optional elements (saves some space so that we don’t have to enumerate options as separate possibilities) – { } - used to indicate 0 or more instances – ( ) - for a list of choices • These extensions are added to a BNF Grammar for convenience allowing us to shorten the grammar Here, we revise our <expr> portion of the grammar to illustrate how much easier it is to notate the grammar using EBNF • BNF: <expr> + <term> | <expr> - <term> | <term> * <factor> | <term> / <factor> | <factor> <exp> ** <factor> | <exp> (<expr>) | <id> • EBNF: <expr> {(+ | -) <term>} <term> {(* | /) <factor>} <factor> <exp> { ** <factor>} <exp> (<expr>) | <id>

Attribute Grammars • It is not possible to describe all aspects of a language solely with a BNF Grammar – BNF Grammar lacks static semantics (that is, rules that the language dictates for a program to be syntactically correct) – For example: • Making sure that the number of parameters in a function call match the number of parameters in the function header • Making sure in an assignment statement that the left hand side type matches (or is compatible with) the value computed by the right hand side’s expression • Attribute grammars are added to BNF grammars to handle these gaps – We will add attributes and predicate functions to every BNF grammar rule – the attributes will store such information as variable type or number of parameters and the predicates will test to make sure that the attributes match accordingly

Attribute Grammar Features • Synthesized attributes – information passed up the parse tree • Inherited attributes – information passed down the parse tree • Semantic functions – rules or predicates associated with grammar rules that compare synthesized attributes to inherited attributes • if any predicate fails its test, then we have a syntax error • Intrinsic attributes – leaf node attributes derived when a rule generates a terminal • for instance, if <type> int, then the attribute for the declared variable receives its intrinsic value (whatever value we use to denote that the variable is an int)

Example: Deriving Identifiers • In some languages, identifier names are limited – Here, we look at Pascal where an identifier name must start with a letter or _, consist of _, letters and numbers, and be less than or equal to 31 characters in length • Our BNF rule for deriving an identifier is – <identifier> _<id> | <letter><id> – <id> _ | <letter> | <digit> | _<id> | <letter><id> | <digit><id> • We enhance our grammar with the attribute length – – – <identifier> _<id> | <letter><id> <identifier>. length = 1 <id> _ | <letter> | <digit> | _<id> | <letter><id> | <digit><id> <identifier>. length <identifer>. length + 1 Predicate: <identifer>. length <= 31

Example: Assignment Stmt • Our grammar now becomes: • Attributes: – Actual_Type • synthesized for <var> and <expr>, stores the type – Expected_Type • inherited for <expr> based on the <var> type – LHS_Type • synthesized for <assign> – Env • inherited for <assign>, <expr>, <var> carrying the reference to the symbol table – 1. Syntax rule: <assign> <var> = <expr> – Semantic rule: <expr>. expected_type <var>. actual_type – 2. Syntax rule: <expr> <var>[2] + <var>[3] – Semantic rule: <expr>. actual_type if (<var>[2]. actual_type = int) and (<var>[3]. actual_type = int) then int else real – Predicate: <expr>. actual_type = <expr>. expected_type – 3. Syntax rule: <expr> <var> – Semantic rule: <expr>. actual_type <var>. actual_type – Predicate: <expr>. actual_type = <expr>. expected_type – 4. Syntax rule: <var> A | B | C – Semantic rule: <var. actual_type lookup(<var>. string)

Example <assign> <expr> expected_type <var> actual_type <var>[2] A = A Assume A is a float and B is an int: actual_type <var>[3] + B <expr>. expected_type inherited from parent <var>[1]. actual_type lookup (A) <var>[2]. actual_type lookup (B) <var>[1]. actual_type =? <var>[2]. actual_type <expr>. actual_type <var>[1]. actual_type <expr>. actual_type =? <expr>. expected_type <var>. actual_type = float <var>[2]. actual_type = float <var>[3]. actual_type = int <expr>. actual_type = float (derived from var[2] and var[3] through semantic rule) <expr>. expected_type = float (inherited from <assign> which is inherited from <var>) <expr>. expected_type = <expr>. actual_type, so predicate is satisifed, no syntax error

Dynamic Semantics • Describing the meaning of a program or of a statement or group of statements – Describing the syntax of a language or of a set of code is relatively easy, how do we describe the meaning behind code? • We could express it in English (e. g. , through comments) but this is too informal and perhaps too incomplete/imprecise • What if we want to use the semantics to make sure that the program does what is intended? This is known as verification. We would need more formal methods of defining semantics for this, so we turn to: • Operational Semantics – how the statement will be executed • Axiomatic Semantics – what results to expect from the statement • Denotational Semantics – functional way of mapping the affects of a statement

Operational Semantics • This can be thought of as “tracing” through a program to see what affects an instruction will have • Implemented as an interpreter or compiler or assembler – that is, how will the computer execute this instruction? • This is simply a mechanistic description of the statement and does not necessarily help us understand the statement Example: C for-loop for(expr 1; expr 2; expr 3) stmt; Becomes: loop: out: expr 1; if expr 2 = 0 goto out stmt; expr 3; goto loop …

Axiomatic Semantics • Used mainly to prove correctness of code – Each statement in the language has associated assertions – what we expect to be true before and after the statement executes – We list these assertions as pre- and post-conditions that specify how the machine changes (changes to variables) – Given the state of the machine prior to executing a statement, we can then determine what must be true afterward • The basic form of an axiomatic semantic is {P} S {Q} • This is interpreted as: – if P is true before S, then Q is true after S – We must now define how to determine Q given P and S

Weakest pre-condition • We will start with a given post-condition and derive the weakest pre-condition – We work backwards mainly because we will start with an overall goal in mind for the given statement or program – We want to derive the weakest pre-condition for a given postcondition because this is the least restrictive pre-condition that will guarantee validity • Weakest means most general – what is the greatest range of values for a given variable such that the result will be true? • For example, consider the assignment statement – sum = 2*x+1; • with post-condition {sum > 1} • Possible pre-conditions are {x > 10}, {x > 50} and {x > 1000} • But the weakest pre-condition is {x > 0}

Assignment Statement Rule • We will use the following notation for an assignment statement axiomatic rule: – {Qx E} x = E {Q} • This is read as follows: – If Q is true after the assignment, then Qx E is true prior • The notation Qx E means to replace all instances of x in Q with E – Examples: • a=b/2 -1; {a < 10} – We replace a in {a < 10} with b / 2 – 1 and solve for b, thus {Qx E} is {b / 2 – 1 < 10} or {b < 22} – So we have: {b < 22} a = b / 2 – 1; {a < 10} – that is, if b < 22 prior to the assignment statement, then a will be less than 10 afterward • x = 2 * y – 3; {x > 25} – pre-condition is {2 * y – 3 > 25} or {y > 14} • c = d * e – 4; {c > 0} – pre-condition is {d * e – 4 > 0} or {d * e > 4}, we might want to list this as {d > 4 / e} or {e > 4 / d}, or even {d > 4 / e & d != 0 & e != 0}

Sequences • In general, a series of statements S 1, S 2, S 3, . . . , Sn can be expressed as: – {P} S 1 {Q 1}; {Q 1} S 2 {Q 2} ; {Q 2} S 3 {Q 3}; . . . {Qn} Sn {Q} – This can be simplified to {P} S 1, S 2, S 3, . . . , Sn{Q} – Therefore, we can combine rules to show the axiomatic semantics of a block of code • Example: – y = 3 * x + 1; x = y + 3; • If our post-condition is {x < 10} then our pre-condition between the two statements is {y+3 < 10} or {y < 7} and our pre-condition before the first statement is {3 * x + 1 < 7} or {x < 2} – If x < 2 before the first statement, then x < 10 after the second statement

Rule of Consequence • In the previous example, we had a sequence of 2 assignment statements, but this works in general with any number of statements of any kind • The rule of consequence is shown as follows: => means “implies” • The rule means that if P’ implies P and Q implies Q’ and we have already proven that {P} S {Q} is true, then we can infer {P’} S {Q’} is also true – notice that P’ => P means that P is a weaker condition than P’ whereas Q => Q’ means that Q is a stronger condition than Q’ • this allows us to take a proven rule and weaken its postcondition and strengthen its precondition

Selection Axiomatic Semantic • Given a statement: if (B) S 1; else S 2; • The semantic rule is: {B & P} S 1 {Q}, {(!B) & P} S 2 {Q} – if Q is our post-condition, then we have two pre-conditions, if the if statement’s condition is true (B) then B & P, and if the if statement’s condition is false (Not B) then !B & P, so we must derive P that will allow the same post-condition no matter if B or !B is true • Example: – if (x > 0) y--; else y++; • Suppose the post-condition is {y > 0} – the pre-condition for the if-clause is {y > 1} – the pre-condition for the else-clause is {y > -1} – the condition {y > 1} is subsumed by the condition {y > -1} (that is, if {y > 1} is true, then {y > -1} must also be true • So, we select {y > 1} as our weakest pre-condition – we cannot use {y > -1} because, if x > 0 and y = -1, our post-condition is not true

Logical Pretest Loops • Our pre-test loop looks like this: – while (B) S; – We must derive a pre-condition that is true prior to the loop whether it B is true or not, but also the pre-condition must remain true if B is true and S is executed – that is, P must be true prior to each loop iteration • To derive P, we create I, a “loop invariant” – The invariant will always be true both before and after each loop iteration • The pre-condition must include {I} and the postcondition must include {I and Not B} – NOTE: determine a loop invariant is difficult and does not necessarily seem to help us understand the loop • For these reasons, we will go over an example, but not cover this in any more detail

While-Loop Example • Our loop is – while (y != x) y++; • Our post-condition is {y = x} – the post-condition states that the condition (y != x)is false • The pre-condition must include the condition (y!= x) and the loop invariant – what is the invariant? We need to select something that will be true both before and after each loop iteration • notice that y initially will not equal x and then we add 1 to x, so that y < x or y = x after each loop iteration • we cannot have y > x before the loop because this would be an infinite loop and that would result in the post-condition never being true – since the post-condition must be true, y > x can not be true beforehand – either y < x or y = x will be true, our loop-invariant is {y <= x}

Two More Loop Examples while s > 1 do s = s / 2 end {s = 1} What is the weakest precondition? (wp) Let’s apply the loop one time: if s > 1 then for s = 1 afterward, we would have s = s / 2 {s = 1} our wp is {s = 2} for 2 iterations, we would then have s = s / 2 {s = 2} so our wp is {s = 4} for 3 iterations, we would then have s = s / 2 {s = 4} so our wp is {s = 8} We can now derive the invariant as being s is a non-negative power of 2 or {s = 2 n for n >= 0} The following code computes z = x * y assuming y is positive z=0; n=y; while(n>0) { z+=x; n--; } So, our post-condition is {z=x*y & y > 0 & n=0} where n=0 is NOT B and z=x*y is P. I, our loop invariant, is not merely y > 0 however. If we analyze each loop iteration for z and n, we find that z=x*(y-n) and n>=0 A precondition then is {s > 1 and s = 2 n} but this is not the weakest, we can make it weaker by using {s > 1}

Program Proofs • As you can see by the last example, finding an invariant is not necessarily easy – the invariant must include the loop’s terminating condition but also be weak enough to describe what happens during each loop iteration – in using axiomatic semantics for a loop, the requirement that the invariant include the terminating condition is often ignored • in such a case, the axiomatic description is known as offering only partial correctness rather than total correctness if the terminating condition is met • By combining these axiomatic rules, we can prove the correctness of an entire program – consider the example below which swaps two variable values – our precondition requires that the two variable values have in fact been swapped, now we will prove it {x = A AND y = B} t = x; x = y; y = t; {x = B AND y = A} {P} t = x; {P 1}, {P 1} x = y; {P 2}, {P 2} y = t; {x = B AND y = A} {P 2} is {x = B AND t = A} {P 1} is {y = B AND t = A} {P} is {y = B AND x = A} and {y = B AND x = A} => {x = A AND y = B} The chapter offers a more complete example if you are interested

Denotational Semantics • This form of semantics is a more rigorous method of describing the meaning of a program than our previous approaches – Denotational semantics is based on recursive function theory – That is, derive a function that defines the affects of an instruction – Because the function is recursive, this tends to be a very difficult topic, probably the hardest thing when studying programming languages – In essence, the function will map from an instance of a mathematical object (the state of the machine) onto another mathematic object • so this is telling us what happens to the machine after applying an instruction (or program) – We will look at an example of a recursive function and then apply the idea to 3 types of instructions

Simple Examples • We define the value of a binary number – <bin_num> 0 | 1 | <bin_num>0 | <bin_num>1 • that is, a binary number is a 0, a 1, or recursively a binary number followed by a 0 or a binary number followed by a 1 – the function must map from a binary number derived from the above grammar rule to a mathematical object (an integer value) • Mbin(<bin_num) = – – Mbin(0) = 0 Mbin(1) = 1 Mbin(<bin_num>0)=2*Mbin(<bin_num>) Mbin(<bin_num>1)=2*Mbin(<bin_num>) + 1 » Mbin(101) = 2*Mbin(10) + 1 = 2*(2*Mbin(1))+1 = 2*2*1+1 = 5 Expressions Me (E, s) if VARMAP(i, s) = undef for some i in E then error else E’, where E’ is the result of evaluating E after setting each variable i in E to VARMAP(i, s) Expression E, in state s, is an error if some i (variable) in E is undefined, otherwise it is E’ = value of evaluating E by applying each variable I and operator in E using VARMAP (symbol table) currently in s

Assignment and Loop Assignment Statements Ma(x : = E, s) if Me(E, s) = error then error else s’ = {<i 1’, v 1’>, <i 2’, v 2’>, . . . , <in’, vn’>}, where for j = 1, 2, . . . , n, vj’ = VARMAP(ij, s) if ij <> x = Me(E, s) if ij = x Here, the state of the machine is an error if there is an error when evaluating E in s, otherwise the state of the machine is modified where x is now equal to E, but all other variables in s remain the same If B, when evaluating given the state of the machine s, is undefined, Ml(while B do L, s) = then we have an error, otherwise if B evaluates to False, then the state if Mb(B, s) == undef then error remains s, otherwise the state becomes else if Mb(B, s) == falsethen s the state when L is executed, so the else if Msl(L, s) == error state of the machine changes to be then error whatever the function M(L, s) returns else Ml(while B do L, Msl(L, s)) Since L is some non-specified statement, we don’t know exactly what will happen