AUTOMATA THEORY Chapter 05 CONTEXT-FREE GRAMMARS AND LANGUAGES

  • Slides: 54
Introduction
§ Context-free grammars (CFGs) have played a central role in compiler technology since the 1960s.
§ They turned the implementation of parsers from a time-consuming, ad hoc task into a routine one.
§ Parsers are functions that discover the structure of a program.

An informal example
§ Let us consider the language of palindromes.
§ A palindrome is a string that reads the same forward and backward, such as otto or madamimadam.
§ We describe only the palindromes over the alphabet {0, 1}, for example 0110 and 11011.

A Context-free Grammar for Palindromes (binary strings only)
1. P → ε
2. P → 0
3. P → 1
4. P → 0P0
5. P → 1P1
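The five palindrome productions can be read as a recursive membership test: the three base productions cover strings of length 0 or 1, and the two recursive productions strip a matching symbol from both ends. The Python sketch below is only an illustration; the function name `derives_from_P` is an assumption and does not come from the slides.

```python
def derives_from_P(w: str) -> bool:
    """Return True if w can be derived from P in the palindrome grammar."""
    # The grammar only uses the terminals 0 and 1.
    if any(c not in ("0", "1") for c in w):
        return False
    # Base productions: P -> ε, P -> 0, P -> 1.
    if len(w) <= 1:
        return True
    # Recursive productions: P -> 0P0 and P -> 1P1.
    return w[0] == w[-1] and derives_from_P(w[1:-1])


if __name__ == "__main__":
    for word in ["0110", "11011", "01", ""]:
        print(repr(word), derives_from_P(word))
```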

Definition of CFG
§ A CFG is a way of describing a language by recursive rules called productions.
§ A CFG consists of:
1. A finite set of terminal symbols (terminals).
2. A finite set of variables (nonterminals).
3. A start symbol (start variable).
4. A finite set of productions (rules).

Definition of CFG (continued)
§ Each production consists of:
a. The head of the production, which is a variable.
b. The production symbol →.
c. The body of the production, a string of zero or more terminals and variables.

Definition of CFG (continued)
§ The four components of a CFG G can be represented as follows: G = (V, T, P, S), where V is the set of variables, T is the set of terminals, P is the set of productions, and S is the start variable.

A Context-free Grammar for Palindromes
§ The grammar G_pal for the palindromes is represented by G_pal = ({P}, {0, 1}, A, P), where A represents the set of five productions (binary strings only):
§ P → ε
§ P → 0
§ P → 1
§ P → 0P0
§ P → 1P1
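The four-component description G = (V, T, P, S) maps directly onto a small record type. The sketch below is one possible encoding, under the assumptions that every symbol is a single character and that the empty string stands for ε; the class name `Grammar` and the method `is_well_formed` are illustrative, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    variables: frozenset    # V
    terminals: frozenset    # T
    productions: tuple      # P, as (head, body) pairs; body "" stands for ε
    start: str              # S

    def is_well_formed(self) -> bool:
        """Check the structural conditions from the definition of a CFG."""
        if self.start not in self.variables:
            return False
        if self.variables & self.terminals:
            return False
        symbols = self.variables | self.terminals
        return all(
            head in self.variables and all(sym in symbols for sym in body)
            for head, body in self.productions
        )


# The palindrome grammar G_pal = ({P}, {0, 1}, A, P).
G_PAL = Grammar(
    variables=frozenset({"P"}),
    terminals=frozenset({"0", "1"}),
    productions=(("P", ""), ("P", "0"), ("P", "1"), ("P", "0P0"), ("P", "1P1")),
    start="P",
)

if __name__ == "__main__":
    print(G_PAL.is_well_formed())
```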

Example of CFG
§ A CFG for simple expressions with the operators ‘+’ and ‘*’. Identifiers use only the letters ‘a’ and ‘b’ and the digits ‘0’ and ‘1’. Every identifier must begin with a or b, which may be followed by any string in {a, b, 0, 1}*.
§ G = ({E, I}, T, P, E)
§ T = {0, 1, a, b, +, *, (, )}
Productions:
1. E → I
2. E → E+E
3. E → E*E
4. E → (E)
5. I → a
6. I → b
7. I → Ia
8. I → Ib
9. I → I0
10. I → I1

Derivation using grammar
§ Derivation of (ab+ab0), with the production used at each step:
1. E ⇒ (E), by production 4
2. (E) ⇒ (E+E), by production 2
3. (E+E) ⇒ (I+E), by production 1
4. (I+E) ⇒ (Ib+E), by production 8
5. (Ib+E) ⇒ (ab+E), by production 5
6. (ab+E) ⇒ (ab+I), by production 1
7. (ab+I) ⇒ (ab+I0), by production 9
8. (ab+I0) ⇒ (ab+Ib0), by production 8
9. (ab+Ib0) ⇒ (ab+ab0), by production 5
Productions: 1. E → I; 2. E → E+E; 3. E → E*E; 4. E → (E); 5. I → a; 6. I → b; 7. I → Ia; 8. I → Ib; 9. I → I0; 10. I → I1
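The derivation of (ab+ab0) can be replayed mechanically by applying the numbered productions one after another. The sketch below is an illustration only: the helper names `apply_production` and `derive` are assumptions, and each step rewrites the leftmost occurrence of the production's head, which happens to match every step of this particular derivation.

```python
# Numbered productions of the expression grammar, as (head, body) pairs.
PRODUCTIONS = {
    1: ("E", "I"), 2: ("E", "E+E"), 3: ("E", "E*E"), 4: ("E", "(E)"),
    5: ("I", "a"), 6: ("I", "b"), 7: ("I", "Ia"), 8: ("I", "Ib"),
    9: ("I", "I0"), 10: ("I", "I1"),
}


def apply_production(form: str, number: int) -> str:
    """Rewrite the leftmost occurrence of the production's head in form."""
    head, body = PRODUCTIONS[number]
    index = form.find(head)
    if index < 0:
        raise ValueError(f"production {number} does not apply to {form!r}")
    return form[:index] + body + form[index + 1:]


def derive(start: str, numbers: list) -> list:
    """Return every sentential form produced by applying numbers in order."""
    forms = [start]
    for number in numbers:
        forms.append(apply_production(forms[-1], number))
    return forms


if __name__ == "__main__":
    steps = derive("E", [4, 2, 1, 8, 5, 1, 9, 8, 5])
    print(" => ".join(steps))
```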

Example of CFG
§ A CFG for syntactically correct infix algebraic expressions in the variables x, y and z.
§ G = ({S}, T, P, S)
§ T = {x, y, z, -, +, *, /, (, )}
Productions:
S → x
S → y
S → z
S → S + S
S → S - S
S → S * S
S → S / S
S → ( S )

Derivation using grammar
Productions: S → x | y | z | S + S | S - S | S * S | S / S | ( S )

An informal example

An example of CFG

LMD and RMD
§ LMD (leftmost derivation): at each step we replace the leftmost variable by one of its production bodies. We indicate that a derivation is leftmost by using the relations ⇒lm and ⇒*lm, for one or many steps, respectively.
§ RMD (rightmost derivation): at each step we replace the rightmost variable by one of its production bodies. We indicate that a derivation is rightmost by using the relations ⇒rm and ⇒*rm, for one or many steps, respectively.

Leftmost Derivation
§ CFG: E → I | E+E | E*E | (E)
       I → a | b | Ia | Ib | I0 | I1
§ LMD of a*(a+b00):
§ E ⇒lm E*E ⇒lm I*E ⇒lm a*E ⇒lm a*(E) ⇒lm a*(E+E) ⇒lm a*(I+E) ⇒lm a*(a+E) ⇒lm a*(a+I) ⇒lm a*(a+I0) ⇒lm a*(a+I00) ⇒lm a*(a+b00)

Rightmost Derivation
§ CFG: E → I | E+E | E*E | (E)
       I → a | b | Ia | Ib | I0 | I1
§ RMD of a*(a+b00):
§ E ⇒rm E*E ⇒rm E*(E) ⇒rm E*(E+E) ⇒rm E*(E+I) ⇒rm E*(E+I0) ⇒rm E*(E+I00) ⇒rm E*(E+b00) ⇒rm E*(I+b00) ⇒rm E*(a+b00) ⇒rm I*(a+b00) ⇒rm a*(a+b00)
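The difference between a leftmost and a rightmost derivation is only in which variable gets rewritten at each step. The following sketch replays both derivations of a*(a+b00), one production per step. The names `step` and `derive` are assumptions made for the illustration, and symbols are assumed to be single characters.

```python
# Expression grammar: each variable maps to its set of production bodies.
PRODUCTIONS = {
    "E": {"I", "E+E", "E*E", "(E)"},
    "I": {"a", "b", "Ia", "Ib", "I0", "I1"},
}


def step(form: str, body: str, leftmost: bool) -> str:
    """Replace the leftmost (or rightmost) variable of form by body."""
    positions = [i for i, sym in enumerate(form) if sym in PRODUCTIONS]
    if not positions:
        raise ValueError(f"{form!r} contains no variable")
    i = positions[0] if leftmost else positions[-1]
    if body not in PRODUCTIONS[form[i]]:
        raise ValueError(f"{form[i]} -> {body} is not a production")
    return form[:i] + body + form[i + 1:]


def derive(bodies: list, leftmost: bool) -> list:
    """Return all sentential forms, starting from E."""
    forms = ["E"]
    for body in bodies:
        forms.append(step(forms[-1], body, leftmost))
    return forms


# Bodies chosen at each step of the two derivations of a*(a+b00).
LMD = ["E*E", "I", "a", "(E)", "E+E", "I", "a", "I", "I0", "I0", "b"]
RMD = ["E*E", "(E)", "E+E", "I", "I0", "I0", "b", "I", "a", "I", "a"]

if __name__ == "__main__":
    print(" => ".join(derive(LMD, leftmost=True)))
    print(" => ".join(derive(RMD, leftmost=False)))
```

Both runs end in the same terminal string, but the intermediate sentential forms differ.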

The Language of a Grammar
§ If G = (V, T, P, S) is a CFG, the language of G, denoted L(G), is the set of terminal strings that have derivations from the start symbol. That is,
  L(G) = {w in T* | S ⇒* w in G}
§ If a language L is the language of some context-free grammar, then L is said to be a context-free language, or CFL.
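For a small grammar, the definition of L(G) can be explored by searching through sentential forms and collecting those that contain only terminals. The sketch below does this for the palindrome grammar, always expanding the leftmost variable. The function name `language_up_to` is an assumption, and so is the cap on the length of a sentential form: it guarantees termination, but for other grammars it is only a heuristic and may miss strings.

```python
from collections import deque

# Palindrome grammar; "" stands for ε.
PAL = {"P": ["", "0", "1", "0P0", "1P1"]}


def language_up_to(productions: dict, start: str, max_len: int) -> list:
    """Collect the terminal strings of length <= max_len found by the search."""
    variables = set(productions)
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        # Find the leftmost variable, if any.
        index = next((i for i, s in enumerate(form) if s in variables), None)
        if index is None:
            words.add(form)          # only terminals left: a string of L(G)
            continue
        for body in productions[form[index]]:
            new = form[:index] + body + form[index + 1:]
            terminal_count = sum(1 for s in new if s not in variables)
            # Prune forms that are already too long (assumed cap, see above).
            if terminal_count > max_len or len(new) > 2 * max_len + 1:
                continue
            if new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(words, key=lambda w: (len(w), w))


if __name__ == "__main__":
    print(language_up_to(PAL, "P", 3))
```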

Parse Tree
§ A tree representation for derivations, which shows clearly how the symbols of a terminal string are grouped into substrings.
§ Parse trees are used in compilers as the data structure that represents the source program.
§ In a compiler, the tree structure of the source program facilitates the translation of the source program into executable code by allowing natural, recursive functions to perform this translation process.
§ A parse tree is a graphical representation of a derivation.

Constructing Parse Tree
§ Let us fix on a grammar G = (V, T, P, S). The parse trees for G are trees with the following conditions:
1. Each interior node is labeled by a variable in V.
2. Each leaf is labeled by either a variable, a terminal or ε.
3. If an interior node is labeled A, and its children are labeled X1, X2, …, Xk respectively, from the left, then A → X1 X2 … Xk is a production in P.

Parse Tree Example
§ A parse tree showing the derivation of I+E from E:
E
|-- E -- I
|-- +
`-- E

Parse Tree Example (continued)
§ A parse tree showing the derivation P ⇒* 0110:
P
|-- 0
|-- P
|   |-- 1
|   |-- P -- ε
|   `-- 1
`-- 0
Productions: 1. P → ε; 2. P → 0; 3. P → 1; 4. P → 0P0; 5. P → 1P1

The Yield of a Parse Tree
§ If we look at the leaves of any parse tree and concatenate them from the left, we get a string called the yield of the parse tree, which is always a string that is derived from the root variable.
§ The parse trees of most interest are those in which:
1. The yield is a terminal string. That is, all leaves are labeled either with a terminal or with ε.
2. The root is labeled by the start symbol.
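The yield can be computed by a left-to-right walk over the leaves. The sketch below builds the parse tree for 0110 in the palindrome grammar, checks that every interior node corresponds to a production, and concatenates the leaves. The names `Node`, `tree_yield` and `respects_grammar` are assumptions made for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                    # variable, terminal, or "ε"
    children: list = field(default_factory=list)  # empty for a leaf


# Palindrome productions as (head, body) pairs; "" stands for ε.
PAL = {("P", ""), ("P", "0"), ("P", "1"), ("P", "0P0"), ("P", "1P1")}


def tree_yield(node: Node) -> str:
    """Concatenate the leaf labels from the left; ε contributes nothing."""
    if not node.children:
        return "" if node.label == "ε" else node.label
    return "".join(tree_yield(child) for child in node.children)


def respects_grammar(node: Node, productions: set) -> bool:
    """Check that each interior node and its children form a production."""
    if not node.children:
        return True
    body = "".join("" if c.label == "ε" else c.label for c in node.children)
    return (node.label, body) in productions and all(
        respects_grammar(child, productions) for child in node.children
    )


# Parse tree for 0110: P -> 0P0, inner P -> 1P1, innermost P -> ε.
TREE = Node("P", [
    Node("0"),
    Node("P", [Node("1"), Node("P", [Node("ε")]), Node("1")]),
    Node("0"),
])

if __name__ == "__main__":
    print(tree_yield(TREE), respects_grammar(TREE, PAL))
```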

Parse tree showing a*(a+b00)
E
|-- E -- I -- a
|-- *
`-- E
    |-- (
    |-- E
    |   |-- E -- I -- a
    |   |-- +
    |   `-- E
    |       `-- I
    |           |-- I
    |           |   |-- I -- b
    |           |   `-- 0
    |           `-- 0
    `-- )

Parse tree showing ( x + y ) * x - z * y / ( x + x )

Parse tree showing The man read this book

Inference, Derivations, and Parse Trees
§ Figure: the equivalence of parse trees, leftmost derivations, rightmost derivations, and recursive inferences.

Self Study
§ Sections 5.2.4, 5.2.5, 5.2.6
§ Theorems 5.12, 5.14, 5.18

Ambiguous Grammar
§ Ideally, a grammar uniquely determines a structure for each string in its language, but not every grammar provides unique structures.
§ When a grammar fails to provide a unique structure for some string, it is known as an ambiguous grammar.
§ In an ambiguous grammar, some string has more than one parse tree (equivalently, more than one leftmost derivation).

Ambiguous Grammar example
§ Let us consider a CFG:
§ CFG: E → I | E+E | E*E | (E)
       I → a | b | Ia | Ib | I0 | I1
§ Expression: a+a*a
§ LMD: E ⇒lm E+E ⇒lm I+E ⇒lm a+E ⇒lm a+E*E ⇒lm a+I*E ⇒lm a+a*E ⇒lm a+a*I ⇒lm a+a*a
§ RMD: E ⇒rm E*E ⇒rm E*I ⇒rm E*a ⇒rm E+E*a ⇒rm E+I*a ⇒rm E+a*a ⇒rm I+a*a ⇒rm a+a*a
§ These two derivations correspond to different parse trees: the first groups the string as a+(a*a), the second as (a+a)*a.

LMD parse tree (Fig.: tree with yield a+a*a)
E
|-- E -- I -- a
|-- +
`-- E
    |-- E -- I -- a
    |-- *
    `-- E -- I -- a

RMD parse tree (Fig.: tree with yield a+a*a)
E
|-- E
|   |-- E -- I -- a
|   |-- +
|   `-- E -- I -- a
|-- *
`-- E -- I -- a
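Ambiguity can be made concrete by counting how many distinct parse trees the grammar E → I | E+E | E*E | (E) assigns to a string. The sketch below is an illustration under stated assumptions: the function name `count_parse_trees` is hypothetical, and it relies on the fact that the left-recursive rules for I give every identifier exactly one tree.

```python
from functools import lru_cache


def count_parse_trees(w: str) -> int:
    """Count parse trees with root E and yield w in the ambiguous grammar."""

    def is_identifier(s: str) -> bool:
        # I -> a | b | Ia | Ib | I0 | I1
        return (len(s) >= 1 and s[0] in ("a", "b")
                and all(c in ("a", "b", "0", "1") for c in s))

    @lru_cache(maxsize=None)
    def count_E(i: int, j: int) -> int:
        """Number of trees deriving w[i:j] from E."""
        if i >= j:
            return 0
        total = 1 if is_identifier(w[i:j]) else 0          # E -> I
        for k in range(i + 1, j - 1):                      # E -> E+E, E -> E*E
            if w[k] in ("+", "*"):
                total += count_E(i, k) * count_E(k + 1, j)
        if w[i] == "(" and w[j - 1] == ")":                # E -> (E)
            total += count_E(i + 1, j - 1)
        return total

    return count_E(0, len(w))


if __name__ == "__main__":
    for text in ["a", "a+a*a", "a+a+a", "(a+a)*a"]:
        print(text, count_parse_trees(text))
```

A count greater than one for any string shows that the grammar is ambiguous.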

Removing Ambiguity from Grammar
§ Two causes of ambiguity in the expression grammar:
1. The precedence of operators is not respected.
2. A sequence of identical operators can group either from the left or from the right.

Two derivation trees for … (figure not preserved; source: Prof. Busch - LSU)

Good Tree and Bad Tree
§ Compute the expression result using the tree.

Removing Ambiguity from Grammar
§ The solution to the problem of enforcing precedence is to introduce several different variables:
1. A factor is an expression that cannot be broken apart by any adjacent operator. The only factors in our expression language are:
   i. Identifiers: it is not possible to separate the letters of an identifier by attaching an operator.
   ii. Any parenthesized expression, no matter what appears inside the parentheses.
2. A term is an expression that cannot be broken by the ‘+’ operator. A term is a product of one or more factors.
3. An expression is a sum of one or more terms.

Removing Ambiguity from Grammar
§ Let us consider the ambiguous CFG:
  E → I | E+E | E*E | (E)
  I → a | b | Ia | Ib | I0 | I1
§ An unambiguous expression grammar:
  I → a | b | Ia | Ib | I0 | I1
  F → I | (E)
  T → F | T*F
  E → T | E+T

Unambiguous Grammar example
§ CFG:
  I → a | b | Ia | Ib | I0 | I1
  F → I | (E)
  T → F | T*F
  E → T | E+T
§ Expression: a+a*a
§ Derivation: E ⇒ E+T ⇒ T+T ⇒ F+T ⇒ I+T ⇒ a+T ⇒ a+T*F ⇒ a+F*F ⇒ a+I*F ⇒ a+a*F ⇒ a+a*I ⇒ a+a*a
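The layering into expressions, terms and factors is what a hand-written recursive-descent parser follows. The sketch below is not a direct transcription of the grammar: the left-recursive rules E → E+T and T → T*F are replaced by loops, a standard substitution that keeps the left-to-right grouping. The class name `Parser` and the tuple-based tree format are assumptions made for the illustration.

```python
class Parser:
    """Recursive-descent parser for E -> T | E+T, T -> F | T*F, F -> I | (E)."""

    def __init__(self, text: str):
        self.text = text
        self.pos = 0

    def peek(self) -> str:
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expression(self):
        node = self.term()
        while self.peek() == "+":          # E -> E+T, written as a loop
            self.pos += 1
            node = ("+", node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() == "*":          # T -> T*F, written as a loop
            self.pos += 1
            node = ("*", node, self.factor())
        return node

    def factor(self):
        if self.peek() == "(":             # F -> (E)
            self.pos += 1
            node = self.expression()
            if self.peek() != ")":
                raise SyntaxError("expected ')'")
            self.pos += 1
            return node
        return self.identifier()           # F -> I

    def identifier(self) -> str:
        start = self.pos
        if self.peek() not in ("a", "b"):
            raise SyntaxError(f"unexpected input at position {self.pos}")
        self.pos += 1
        while self.peek() in ("a", "b", "0", "1"):
            self.pos += 1
        return self.text[start:self.pos]


def parse(text: str):
    parser = Parser(text)
    tree = parser.expression()
    if parser.pos != len(text):
        raise SyntaxError(f"unexpected input at position {parser.pos}")
    return tree


if __name__ == "__main__":
    print(parse("a+a*a"))
```

Each input now gets a single tree in which ‘*’ groups more tightly than ‘+’, and repeated operators group from the left.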

Inherent Ambiguity
§ Topic 5.4.4
§ L = {a^n b^n c^m d^m | n ≥ 1, m ≥ 1} ∪ {a^n b^m c^m d^n | n ≥ 1, m ≥ 1}

Unambiguous Grammar example
§ Derivation: E ⇒ E+T ⇒ T+T ⇒ F+T ⇒ I+T ⇒ a+T ⇒ a+T*F ⇒ a+F*F ⇒ a+I*F ⇒ a+a*F ⇒ a+a*I ⇒ a+a*a
§ Fig.: parse tree with yield a+a*a
E
|-- E -- T -- F -- I -- a
|-- +
`-- T
    |-- T -- F -- I -- a
    |-- *
    `-- F -- I -- a

Example of CFG
§ A CFG that generates prefix expressions with operands x and y and binary operators +, -, *.
Productions:
E → x
E → y
E → +EE
E → -EE
E → *EE
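Because every production of the prefix-expression grammar starts with a different terminal, a string can be checked with one symbol of lookahead. The following sketch shows this; the function names `parse_E` and `is_prefix_expression` are assumptions for the illustration.

```python
def parse_E(s: str, i: int = 0) -> int:
    """Return the index just past one E starting at s[i], or -1 on failure."""
    if i >= len(s):
        return -1
    if s[i] in ("x", "y"):                 # E -> x | y
        return i + 1
    if s[i] in ("+", "-", "*"):            # E -> +EE | -EE | *EE
        after_first = parse_E(s, i + 1)
        if after_first < 0:
            return -1
        return parse_E(s, after_first)
    return -1


def is_prefix_expression(s: str) -> bool:
    """True if the whole string is derivable from E."""
    return parse_E(s) == len(s)


if __name__ == "__main__":
    for text in ["x", "+x*yx", "+x", "xy"]:
        print(text, is_prefix_expression(text))
```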

Example of CFG
§ Design a CFG for the set of all strings with an equal number of a’s and b’s.
Productions:
S → aSbS | bSaS | ε
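One way to gain confidence in a grammar like this is to compare it with the intended language on all short strings. The sketch below tests membership by trying every way of splitting a string according to S → aSbS or S → bSaS, then checks the result against a direct count of a's and b's. The names `derives` and `agrees_up_to` are assumptions, and a check on short strings is evidence, not a proof.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def derives(w: str) -> bool:
    """True if w is derivable from S, where S -> aSbS | bSaS | ε."""
    if w == "":
        return True                        # S -> ε
    first = w[0]
    if first not in ("a", "b"):
        return False
    other = "b" if first == "a" else "a"
    # w = first + u + other + v, with u and v both derivable from S.
    return any(
        w[k] == other and derives(w[1:k]) and derives(w[k + 1:])
        for k in range(1, len(w))
    )


def agrees_up_to(max_len: int) -> bool:
    """Compare the grammar with the counting definition on short strings."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            if derives(w) != (w.count("a") == w.count("b")):
                return False
    return True


if __name__ == "__main__":
    print(derives("abba"), derives("aab"), agrees_up_to(8))
```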

Example of CFG
§ Design a CFG such that no string in L(G) has ba as a substring.
Productions:
S → aS | Sb | a | b

Example of CFG
§ Design a CFG for the regular expression 0*1(0+1)*.
Productions:
S → A1B
A → 0A | ε
B → 0B | 1B | ε
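The grammar for 0*1(0+1)* can be cross-checked against Python's `re` module on all short binary strings. In the sketch below, each variable becomes a small membership function; the function names are assumptions, and the pattern `0*1[01]*` is the regular expression rewritten in Python's syntax, where the union (0+1) becomes the character class [01].

```python
import re
from itertools import product


def in_A(w: str) -> bool:
    """A -> 0A | ε: any number of 0s."""
    return all(c == "0" for c in w)


def in_B(w: str) -> bool:
    """B -> 0B | 1B | ε: any binary string."""
    return all(c in ("0", "1") for c in w)


def in_S(w: str) -> bool:
    """S -> A1B: some 1 splits w into an A part and a B part."""
    return any(
        w[k] == "1" and in_A(w[:k]) and in_B(w[k + 1:])
        for k in range(len(w))
    )


def agrees_with_regex(max_len: int) -> bool:
    """Compare the grammar with the regular expression on short strings."""
    pattern = re.compile(r"0*1[01]*")
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            w = "".join(bits)
            if in_S(w) != bool(pattern.fullmatch(w)):
                return False
    return True


if __name__ == "__main__":
    print(in_S("0010"), in_S("000"), agrees_with_regex(8))
```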

Example of CFG

Application of CFG
§ CFGs were originally conceived as a way to describe natural languages.
§ Two of their uses:
1. Parsers
2. Markup languages (HTML, XML)
§ Parsers:
§ A parse tree is a graphical representation of a derivation.
§ Parsing is the process of determining whether a string of tokens can be generated by a grammar.
§ A compiler may not actually construct a parse tree. However, a parser must be capable of constructing such a tree.
§ A parser can be constructed for any grammar. The CFG is an essential concept for the implementation of parsers.

YACC Parser Generator
§ Tools such as YACC take a CFG as input and produce a parser.
§ Example:
  Exp : Id            {…}
      | Exp '+' Exp   {…}
      | Exp '*' Exp   {…}
      | '(' Exp ')'   {…}
      ;
  Id  : 'a'           {…}
      | 'b'           {…}
      | Id 'a'        {…}
      | Id 'b'        {…}
      | Id '0'        {…}
      | Id '1'        {…}
      ;

Rules for YACC Parser Generator
§ Rules:
1. The colon is used as the production symbol.
2. All productions with a given head are grouped together, with their bodies separated by the vertical bar.
3. The list of bodies for a given head ends with a semicolon.
4. Terminals are quoted with single quotes.
5. Variable names are unquoted.

Markup Language
§ Markup languages are a family of languages whose strings are documents with certain marks (called tags) in them.
§ Tags tell us something about the semantics of various strings within the document.
§ Example:
a) The text as viewed:
   The things I hate:
   1. ABC xyz
   2. AB ABC XYZ xy
b) The HTML source:
   <P> The things I <EM>hate</EM>
   <OL>
   <LI> ABC xyz
   <LI> AB ABC XYZ xy
   </OL>
§ Tags used: EM = emphasized string, P = paragraph, OL = ordered list, LI = list item

1. Char → a | A | …
2. Text → ε | Char Text
3. Doc → ε | Element Doc
4. Element → Text | <EM> Doc </EM> | <P> Doc | <OL> List </OL> | …
5. ListItem → <LI> Doc
6. List → ε | ListItem List

Thank You
