CS 3240 – Chapter 5 Context-Free Languages

Where Are We?

Language                 Machine              Grammar
Regular                  Finite Automaton     Regular Expression, Regular Grammar
Context-Free             Pushdown Automaton   Context-Free Grammar
Recursively Enumerable   Turing Machine       Unrestricted Phrase-Structure Grammar

Topics
• 5.1: Context-Free Grammars
  ▪ Derivation Trees
• 5.2: Parsing and Ambiguity
• 5.3: CFGs and Programming Languages
  ▪ Precedence
  ▪ Associativity
  ▪ Expression Trees

A Curious Grammar
S → aaSa | λ
• It is not right-linear or left-linear, so it is not a “regular grammar”
• But it is linear: each right-hand side has at most one variable
• What is its language?

A Grammar for a^n b^n
S → aSb | λ
Deriving aaabbb:
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb

Context-Free Grammars
• Variables (aka “non-terminals”)
• Letters from some alphabet, Σ (aka “terminals”)
• Rules (“substitution rules”) of the form V → s
  ▪ where s is any string of letters and variables, or λ
• Rules are often called productions
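
The definition above maps directly onto a small data structure. The following is a minimal sketch, not part of the course materials: the dictionary encoding, the name `generate`, and the `slack` bound on sentential-form length are all assumptions made for illustration. It enumerates the short strings a grammar generates by expanding the leftmost variable breadth-first.

```python
from collections import deque

# Hypothetical encoding: each variable maps to a list of right-hand sides.
# Any character that is a key of the dict is a variable; "" stands for λ.
ANBN = {"S": ["aSb", ""]}


def generate(grammar, start="S", max_len=6, slack=2):
    """Return all terminal strings of length <= max_len found from start.

    Breadth-first search over sentential forms, always expanding the
    leftmost variable. A form is pruned when it has more than max_len
    terminals or more than max_len + slack symbols in total; the second
    bound is an assumption that keeps the search finite.
    """
    results = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Locate the leftmost variable, if any.
        pos = next((i for i, ch in enumerate(form) if ch in grammar), None)
        if pos is None:
            results.add(form)  # no variables left: a string of the language
            continue
        for rhs in grammar[form[pos]]:
            new_form = form[:pos] + rhs + form[pos + 1:]
            terminals = sum(1 for ch in new_form if ch not in grammar)
            if (terminals <= max_len
                    and len(new_form) <= max_len + slack
                    and new_form not in seen):
                seen.add(new_form)
                queue.append(new_form)
    return sorted(results, key=lambda s: (len(s), s))


print(generate(ANBN))  # ['', 'ab', 'aabb', 'aaabbb']
```

Because of the length bound, the result can be incomplete for grammars whose derivations need long intermediate forms; for the grammar S → aSb | λ the bound does not lose anything.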

Sample CFGs
• a^n c b^n
• a^n b^(2n)
• a^n b^m, where 0 ≤ n ≤ m ≤ 2n
• a^n b^m, n ≠ m
• Palindrome (start with a recursive definition)
• Non-Palindrome
• Equal
• a^n b^n a^m

A Grammar for Twice
n_b(w) = 2⋅n_a(w)
S → aSbSbS | bSaSbS | bSbSaS | λ
• Trace ababbb
• When building CFGs, remember that the start variable (S) represents a string in the language. So, for example, if S has twice as many b’s as a’s, then so does aSbSbS, etc.

Derivations
• A derivation is a sequence of applications of grammatical rules, eventually yielding a string in the language
• A CFG can have multiple variables on the right-hand side of a rule
  ▪ Giving a choice of which variable to expand first
• By convention, we usually use a leftmost derivation

A Leftmost Derivation
<S> → <NP> <VP>
<NP> → the <N>
<VP> → <V> <NP>
<V> → sings | eats
<N> → cat | song | canary

<S> ⇒ <NP> <VP>
    ⇒ the <N> <VP>
    ⇒ the canary <VP>
    ⇒ the canary <V> <NP>
    ⇒ the canary sings <NP>
    ⇒ the canary sings the <N>
    ⇒ the canary sings the song

Each intermediate string in the derivation is a “sentential form”
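
The leftmost-derivation convention can be made mechanical. Below is a small sketch (the list-of-symbols encoding and the function name `leftmost_derivation` are assumptions for illustration): at every step it finds the leftmost variable and replaces it using the rule whose index is given.

```python
# Rules from the slide; each right-hand side is a list of symbols.
RULES = {
    "<S>": [["<NP>", "<VP>"]],
    "<NP>": [["the", "<N>"]],
    "<VP>": [["<V>", "<NP>"]],
    "<V>": [["sings"], ["eats"]],
    "<N>": [["cat"], ["song"], ["canary"]],
}


def leftmost_derivation(rules, start, choices):
    """Return the list of sentential forms of a leftmost derivation.

    choices[i] is the index of the rule applied to the leftmost
    variable at step i.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Position of the leftmost variable in the current form.
        pos = next(i for i, sym in enumerate(form) if sym in rules)
        form = form[:pos] + rules[form[pos]][choice] + form[pos + 1:]
        steps.append(" ".join(form))
    return steps


for line in leftmost_derivation(RULES, "<S>", [0, 0, 2, 0, 0, 0, 1]):
    print(line)
```

With the choices shown, the last line printed is `the canary sings the song`, and every earlier line is one sentential form of the derivation.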

Derivation Trees
• aka “Parse Trees”
• A graphical representation of a derivation
• The start symbol is the root
• Each symbol in the right-hand side of the rule is a child node at the same level
• Continue until the leaves are all terminals
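
As a rough illustration of the description above, a derivation tree can be stored as nested (symbol, children) pairs. The encoding and the helper names `leaf`, `tree_yield`, and `show` are assumptions of this sketch; the tree itself is the one for “the canary sings the song” under the sentence grammar from the leftmost-derivation slide.

```python
# A node is (symbol, children); a terminal leaf has an empty child list.
def leaf(word):
    return (word, [])


TREE = ("<S>", [
    ("<NP>", [leaf("the"), ("<N>", [leaf("canary")])]),
    ("<VP>", [
        ("<V>", [leaf("sings")]),
        ("<NP>", [leaf("the"), ("<N>", [leaf("song")])]),
    ]),
])


def tree_yield(node):
    """Collect the leaves left to right; together they spell the derived string."""
    symbol, children = node
    if not children:
        return [symbol]
    words = []
    for child in children:
        words.extend(tree_yield(child))
    return words


def show(node, depth=0):
    """Return an indented rendering of the tree, one symbol per line."""
    symbol, children = node
    lines = ["  " * depth + symbol]
    for child in children:
        lines.extend(show(child, depth + 1))
    return lines


print("\n".join(show(TREE)))
print(" ".join(tree_yield(TREE)))  # the canary sings the song
```

The root is the start symbol, the children of each node are the symbols of one right-hand side, and reading the leaves left to right recovers the string.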

[Figure: A Derivation Tree]

Ambiguity
Section 5.2
• Note how there was only one parse tree for the string “the canary sings the song”
  ▪ And only one leftmost derivation
• This is not true of all grammars!
• Some grammars allow choices of distinct rules to generate the same string
  ▪ Or equivalently, there is more than one parse tree for the same string
• Such a grammar is ambiguous
  ▪ Not easy to process programmatically

An Ambiguous Grammar
Derivation Perspective
<exp> → <exp> + <exp> | <exp> * <exp> | (<exp>) | a | b | c

<exp> ⇒ <exp> + <exp> ⇒ a + <exp> ⇒ a + <exp> * <exp> ⇒ a + b * <exp> ⇒ a + b * c

<exp> ⇒ <exp> * <exp> ⇒ <exp> + <exp> * <exp> ⇒ a + <exp> * <exp> ⇒ a + b * <exp> ⇒ a + b * c
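
The two derivations above correspond to two different parse trees, and that can be checked by counting. The sketch below is an illustration only: the function name `count_parse_trees` and the token-list encoding are assumptions, and it relies on the grammar having no λ-rules and no cycles of unit rules (true for this grammar), so every variable covers at least one token.

```python
from functools import lru_cache

AMBIGUOUS = {
    "<exp>": [
        ["<exp>", "+", "<exp>"],
        ["<exp>", "*", "<exp>"],
        ["(", "<exp>", ")"],
        ["a"], ["b"], ["c"],
    ],
}


def count_parse_trees(grammar, start, tokens):
    """Count distinct parse trees for tokens.

    Assumes no λ-rules and no unit-rule cycles, so each symbol
    derives at least one token and the recursion terminates.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count_symbol(symbol, i, j):
        # Trees rooted at `symbol` whose leaves are exactly tokens[i:j].
        if symbol not in grammar:  # terminal: must match a single token
            return 1 if j == i + 1 and tokens[i] == symbol else 0
        return sum(count_sequence(tuple(rhs), i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def count_sequence(symbols, i, j):
        # Ways the sequence of symbols can derive tokens[i:j].
        if not symbols:
            return 1 if i == j else 0
        first, rest = symbols[0], symbols[1:]
        total = 0
        # Leave at least one token for each remaining symbol.
        for k in range(i + 1, j - len(rest) + 1):
            left = count_symbol(first, i, k)
            if left:
                total += left * count_sequence(rest, k, j)
        return total

    return count_symbol(start, 0, len(tokens))


print(count_parse_trees(AMBIGUOUS, "<exp>", "a + b * c".split()))  # 2
```

A count greater than 1 for any string shows that the grammar is ambiguous; a count of 1 for one string says nothing about other strings.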

An Ambiguous Grammar
Parse Tree Perspective
[Figure: parse trees]
Which one is “correct”?

Parsing
• The process of determining if a string is generated by a grammar
• And often we want the parse tree
  ▪ So that we know the order of operations
• Top-down Parsing
  ▪ Easiest conceptually
• Bottom-up Parsing
  ▪ Most efficient (used by commercial compilers)
  ▪ We will use a simple one in Chapter 6

Top-Down Parsing
• Try to match a string, w, to a grammar
• If there is a rule S → w, we’re done!
  ▪ Fat chance :-)
• Try to find rules that match the first character
  ▪ A “look-ahead” strategy
  ▪ This is what we do “in our heads” anyway
• Repeat on the rest of the string…
• Very “brute force”

Top-Down Parsing Example
S → SS | aSb | bSa | λ
Parse “aabb”:
Candidate rules: 1) S → SS, 2) S → aSb
  1) SS ⇒ SSS, SS ⇒ aSbS
  2) aSb ⇒ aSSb, aSb ⇒ aaSbb
Answer: S ⇒ aSb ⇒ aaSbb ⇒ aabb (2)
Not a well-defined algorithm (yet)!
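
The hand search above can be sketched as a breadth-first search over leftmost sentential forms. This is an illustrative sketch, not a complete parsing algorithm: the name `top_down_parse` and the `max_steps` cutoff are assumptions, and the cutoff is needed precisely because rules such as S → SS and S → λ let forms grow and shrink without end.

```python
from collections import deque

GRAMMAR = {"S": ["SS", "aSb", "bSa", ""]}  # "" stands for λ


def top_down_parse(grammar, start, w, max_steps=8):
    """Search for a leftmost derivation of w.

    Returns the derivation as a list of sentential forms, or None if
    none is found within max_steps rule applications. None therefore
    means "not found within the bound", not a proof of rejection.
    """
    queue = deque([[start]])
    while queue:
        derivation = queue.popleft()
        form = derivation[-1]
        pos = next((i for i, ch in enumerate(form) if ch in grammar), None)
        if pos is None:  # no variables left
            if form == w:
                return derivation
            continue
        if len(derivation) - 1 >= max_steps:
            continue  # give up on this branch
        for rhs in grammar[form[pos]]:
            new_form = form[:pos] + rhs + form[pos + 1:]
            new_pos = next(
                (i for i, ch in enumerate(new_form) if ch in grammar),
                len(new_form),
            )
            # Look-ahead: terminals before the leftmost variable must match w.
            prefix_ok = w.startswith(new_form[:new_pos])
            terminals = sum(1 for ch in new_form if ch not in grammar)
            if prefix_ok and terminals <= len(w):
                queue.append(derivation + [new_form])
    return None


print(top_down_parse(GRAMMAR, "S", "aabb"))  # ['S', 'aSb', 'aaSbb', 'aabb']
```

For “aabb” the search returns the same three-step derivation as the hand trace. The pruning on the terminal prefix is the “look-ahead” idea; the step bound is a stopgap for the termination problem.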

Parsing by Recursive Descent
• A top-down parsing technique
• Grammar Requirements:
  ▪ no ambiguity
  ▪ no lambdas
  ▪ no left-recursion (e.g., A → Ab)
  ▪ … and some other stuff
• Create a function for each variable
  ▪ Check first character to choose a rule
  ▪ Start by calling S( )

Parsing a^n b^n, n > 0, by Recursive Descent
• Grammar: S → aSb | ab
• Function S:
  ▪ if length == 2, check to see if it is “ab”
  ▪ otherwise, consume outer ‘a’ and ‘b’, then call S on what’s left
• See parseanbn.py, parseanbn2.py
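
The function described above might look roughly like the following. This is a sketch written from the slide's description, not the contents of the course's parseanbn.py; the extra length check for strings shorter than 2 is an assumption added so that the recursion always terminates.

```python
def S(s):
    """Return True if s is derivable from S → aSb | ab, i.e. s = a^n b^n, n > 0."""
    if len(s) < 2:
        return False  # nothing in the language is shorter than "ab"
    if len(s) == 2:
        return s == "ab"  # base rule S → ab
    # Rule S → aSb: consume the outer 'a' and 'b', recurse on what is left.
    return s[0] == "a" and s[-1] == "b" and S(s[1:-1])


for candidate in ["ab", "aaabbb", "", "aab", "abab"]:
    print(repr(candidate), S(candidate))
```

Only the first two candidates are accepted. Each call strips two characters, so the recursion depth is about half the length of the input.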

Parsing b*a by Recursive Descent
• Grammar:
  ▪ A → BA | a
  ▪ B → bB | b
• See parsebstara.cpp
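
A possible shape for such a parser, with one method per variable, is sketched below. It is written in Python for illustration and is not the course's parsebstara.cpp; the class name `Parser`, the helpers `peek` and `expect`, and the wrapper `accepts` are all assumptions of this sketch.

```python
class ParseError(Exception):
    """Raised when the input does not match the rule being applied."""


class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = 0  # index of the next unread character

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else None

    def expect(self, ch):
        if self.peek() != ch:
            raise ParseError(f"expected {ch!r} at position {self.pos}")
        self.pos += 1

    def A(self):
        # A → BA | a : the first character decides which rule applies.
        if self.peek() == "b":
            self.B()
            self.A()
        else:
            self.expect("a")

    def B(self):
        # B → bB | b : consume one 'b', continue while another 'b' follows.
        self.expect("b")
        if self.peek() == "b":
            self.B()


def accepts(text):
    """Return True if the whole of text is derivable from A."""
    parser = Parser(text)
    try:
        parser.A()
    except ParseError:
        return False
    return parser.pos == len(text)


for candidate in ["a", "bbba", "", "bb", "baa"]:
    print(repr(candidate), accepts(candidate))
```

Only “a” and “bbba” are accepted. The final check on `parser.pos` rejects inputs such as “baa” where a valid prefix is followed by leftover characters.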

The Problem with λ
• Lambda rules can cause sentential forms to shrink
• Then they can grow, and shrink again
  ▪ And grow, and shrink, and grow, and shrink…
• How then can we know if the string isn’t in the language?
  ▪ That is, how do we know when we’re done so we can stop and reject the string?

Another Problem
“Unit Production Rules”
• A rule of the form A → B doesn’t increase the size of the sentential form
• Once again, we could spend a long time cycling through unit rules before the sentential form reaches length |w|
• We prefer a method that always strictly grows to |w|, so we can stop and answer “yes” or “no” efficiently
• So, we will remove lambda and unit rules
  ▪ In Chapter 6

CFGs and Programming Languages
Section 5.3
• Precedence
• Associativity

Fixing Our Expression Grammar
Precedence
• It was ambiguous because it treated all operators equally
• But multiplication should have higher precedence than addition
• So we introduce a new variable for multiplicative expressions
  ▪ And place it further down in the rules
  ▪ Because we want it to appear further down in the parse tree

Giving Precedence
<exp> → <exp> + <mulexp> | <mulexp>
<mulexp> → <mulexp> * <rootexp> | <rootexp>
<rootexp> → (<exp>) | a | b | c

Now only one leftmost derivation for a + b * c:
<exp> ⇒ <exp> + <mulexp>
      ⇒ <mulexp> + <mulexp>
      ⇒ <rootexp> + <mulexp>
      ⇒ a + <mulexp>
      ⇒ a + <mulexp> * <rootexp>
      ⇒ a + <rootexp> * <rootexp>
      ⇒ a + b * <rootexp>
      ⇒ a + b * c

Associativity
• Derive the parse tree for a + b + c …
• Note how you get (a + b) + c, in effect
• Left-recursion gives left associativity
• Analogously for right associativity
• Exercise: Add a right-associative power (exponentiation) operator (^, with variable <powerexp>) to the grammar with the proper precedence
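
The grouping that the precedence grammar produces can be made visible with a small parser that prints fully parenthesised expressions. This is a sketch under stated assumptions: the grammar is left-recursive, which recursive descent cannot use directly, so the left-recursive rules are unrolled into loops here (a common workaround, not something the slides prescribe). The function name `parse_expression` is hypothetical, and the sketch covers only + and *, leaving the ^ exercise open.

```python
def parse_expression(text):
    """Parse text with <exp>, <mulexp>, <rootexp>; return a parenthesised string."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def exp():
        # <exp> → <exp> + <mulexp> | <mulexp>, left recursion unrolled into a loop.
        tree = mulexp()
        while peek() == "+":
            advance()
            tree = f"({tree} + {mulexp()})"  # groups to the left
        return tree

    def mulexp():
        # <mulexp> → <mulexp> * <rootexp> | <rootexp>, unrolled the same way.
        tree = rootexp()
        while peek() == "*":
            advance()
            tree = f"({tree} * {rootexp()})"
        return tree

    def rootexp():
        # <rootexp> → (<exp>) | a | b | c
        tok = peek()
        if tok == "(":
            advance()
            tree = exp()
            if peek() != ")":
                raise ValueError("expected ')'")
            advance()
            return tree
        if tok in ("a", "b", "c"):
            advance()
            return tok
        raise ValueError(f"unexpected token {tok!r}")

    result = exp()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


print(parse_expression("a + b * c"))  # (a + (b * c))
print(parse_expression("a + b + c"))  # ((a + b) + c)
```

Because `<mulexp>` sits below `<exp>`, the multiplication is grouped first, and because each loop extends the tree built so far, repeated operators group to the left.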