AUTOMATA THEORY Chapter 05 CONTEXT-FREE GRAMMARS AND LANGUAGES
- Slides: 54
Introduction
§ Context-free grammars (CFGs) have played a central role in compiler technology since the 1960s.
§ They turned the implementation of parsers from a time-consuming, ad hoc task into a routine job.
§ Parsers: functions that discover the structure of a program.
An informal example
§ Let us consider the language of palindromes.
§ A palindrome is a string that reads the same forward and backward, such as otto or madamimadam.
§ We describe only the palindromes over the alphabet {0, 1}, for example 0110 and 11011.
A Context-free Grammar for Palindromes
1. P → ε
2. P → 0
3. P → 1
4. P → 0P0
5. P → 1P1
§ Only for binary strings.
Definition of CFG
§ A CFG is a way of describing a language by recursive rules called productions.
§ A CFG consists of:
1. A finite set of terminal symbols (terminals).
2. A finite set of variables (nonterminals).
3. A start symbol (start variable).
4. A finite set of productions (rules).
Definition of CFG (continued)
§ Each production consists of:
a. A variable, called the head of the production.
b. The production symbol →.
c. The body of the production, a string of zero or more terminals and variables.
Definition of CFG (continued)
§ The four components of a CFG G can be represented as follows: G = (V, T, P, S)
§ V: variables, T: terminals, P: productions, S: start variable
A Context-free Grammar for Palindromes
§ The grammar for the palindromes is represented by G_pal = ({P}, {0, 1}, A, P), where A represents the set of five productions:
§ P → ε
§ P → 0
§ P → 1
§ P → 0P0
§ P → 1P1
§ Only for binary strings.
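The four components of G_pal can be written down as plain data, and membership in its language can be tested by following the productions recursively. This is a minimal sketch, not a general CFG parser: the names `VARIABLES`, `TERMINALS`, `PRODUCTIONS`, `START` and `derives` are hypothetical, and the test relies on every recursive body having the single variable P in the middle.

```python
# Hypothetical encoding of G_pal = ({P}, {0, 1}, A, P) as plain Python data.
VARIABLES = {"P"}
TERMINALS = {"0", "1"}
PRODUCTIONS = {"P": ["", "0", "1", "0P0", "1P1"]}  # "" stands for ε
START = "P"


def derives(var: str, w: str) -> bool:
    """Return True if the terminal string w can be derived from var."""
    for body in PRODUCTIONS[var]:
        if var not in body:
            # Base productions: P -> ε | 0 | 1
            if body == w:
                return True
        else:
            # Recursive productions: P -> 0P0 | 1P1
            left, right = body.split(var)
            long_enough = len(w) >= len(left) + len(right)
            if long_enough and w.startswith(left) and w.endswith(right):
                inner = w[len(left):len(w) - len(right)]
                if derives(var, inner):
                    return True
    return False
```

For example, `derives(START, "0110")` follows production 4, then production 5, then production 1.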
Example of CFG
§ A CFG for simple expressions with the operators '+' and '*'. Identifiers use only the letters 'a' and 'b' and the digits '0' and '1'. Every identifier must begin with a or b, which may be followed by any string in {a, b, 0, 1}*.
§ G = ({E, I}, T, P, E)
§ T = {0, 1, a, b, +, *, (, )}
productions:
1. E → I
2. E → E+E
3. E → E*E
4. E → (E)
5. I → a
6. I → b
7. I → Ia
8. I → Ib
9. I → I0
10. I → I1
Derivation using grammar
§ Derivation of (ab+ab0). The number in brackets is the production applied at that step.
1. E ⇒ (E) [4]
2. ⇒ (E+E) [2]
3. ⇒ (I+E) [1]
4. ⇒ (Ib+E) [8]
5. ⇒ (ab+E) [5]
6. ⇒ (ab+I) [1]
7. ⇒ (ab+I0) [9]
8. ⇒ (ab+Ib0) [8]
9. ⇒ (ab+ab0) [5]
productions:
1. E → I   2. E → E+E   3. E → E*E   4. E → (E)   5. I → a
6. I → b   7. I → Ia   8. I → Ib   9. I → I0   10. I → I1
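The derivation of (ab+ab0) can be replayed mechanically from its list of production numbers. The sketch below is only an illustration: `apply_production` and `replay` are hypothetical helpers, and each step rewrites the leftmost occurrence of the production's head, which happens to match every step of this particular derivation.

```python
# Numbered productions of the expression grammar: number -> (head, body).
PRODUCTIONS = {
    1: ("E", "I"), 2: ("E", "E+E"), 3: ("E", "E*E"), 4: ("E", "(E)"),
    5: ("I", "a"), 6: ("I", "b"), 7: ("I", "Ia"), 8: ("I", "Ib"),
    9: ("I", "I0"), 10: ("I", "I1"),
}


def apply_production(form: str, number: int) -> str:
    """Rewrite the leftmost occurrence of the production's head in form."""
    head, body = PRODUCTIONS[number]
    position = form.find(head)
    if position < 0:
        raise ValueError(f"{head} does not occur in {form}")
    return form[:position] + body + form[position + 1:]


def replay(start: str, numbers: list[int]) -> list[str]:
    """Return every sentential form produced by applying numbers in order."""
    forms = [start]
    for number in numbers:
        forms.append(apply_production(forms[-1], number))
    return forms


steps = replay("E", [4, 2, 1, 8, 5, 1, 9, 8, 5])
print(" => ".join(steps))
```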
Example of CFG
§ A CFG for syntactically correct infix algebraic expressions in the variables x, y and z.
§ G = ({S}, T, P, S)
§ T = {x, y, z, -, +, *, /, (, )}
productions:
S → x
S → y
S → z
S → S + S
S → S - S
S → S * S
S → S / S
S → ( S )
Derivation using grammar productions: S → x S → y S → z S → S + S S → S - S S → S * S S → S / S S → ( S )
LMD and RMD
§ LMD (Leftmost Derivation): at each step we replace the leftmost variable by one of its production bodies. Such a derivation is called a leftmost derivation. We indicate that a derivation is leftmost by using the relations ⇒lm and ⇒*lm, for one step and for many steps respectively.
§ RMD (Rightmost Derivation): at each step we replace the rightmost variable by one of its production bodies. Such a derivation is called a rightmost derivation. We indicate that a derivation is rightmost by using the relations ⇒rm and ⇒*rm, for one step and for many steps respectively.
Leftmost Derivation
§ CFG:
E → I | E+E | E*E | (E)
I → a | b | Ia | Ib | I0 | I1
§ LMD of a*(a+b00):
§ E ⇒lm E*E ⇒lm I*E ⇒lm a*E ⇒lm a*(E) ⇒lm a*(E+E) ⇒lm a*(I+E) ⇒lm a*(a+E) ⇒lm a*(a+I) ⇒lm a*(a+I0) ⇒lm a*(a+I00) ⇒lm a*(a+b00)
Rightmost Derivation
§ CFG:
E → I | E+E | E*E | (E)
I → a | b | Ia | Ib | I0 | I1
§ RMD of a*(a+b00):
§ E ⇒rm E*E ⇒rm E*(E) ⇒rm E*(E+E) ⇒rm E*(E+I) ⇒rm E*(E+I0) ⇒rm E*(E+I00) ⇒rm E*(E+b00) ⇒rm E*(I+b00) ⇒rm E*(a+b00) ⇒rm I*(a+b00) ⇒rm a*(a+b00)
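Whether a sequence of sentential forms really is a leftmost (or rightmost) derivation can be checked step by step: locate the leftmost (or rightmost) variable and confirm that replacing it by one of its bodies gives the next form. A minimal sketch, assuming every grammar symbol is a single character; `is_step` and `is_derivation` are hypothetical names.

```python
# Expression grammar from the slides; every symbol is one character.
GRAMMAR = {
    "E": ["I", "E+E", "E*E", "(E)"],
    "I": ["a", "b", "Ia", "Ib", "I0", "I1"],
}


def is_step(before: str, after: str, leftmost: bool = True) -> bool:
    """True if after results from rewriting the leftmost/rightmost variable."""
    positions = [i for i, symbol in enumerate(before) if symbol in GRAMMAR]
    if not positions:
        return False  # no variable left to rewrite
    i = positions[0] if leftmost else positions[-1]
    return any(
        after == before[:i] + body + before[i + 1:]
        for body in GRAMMAR[before[i]]
    )


def is_derivation(forms: list[str], leftmost: bool = True) -> bool:
    """True if every consecutive pair of forms is a valid step."""
    return all(is_step(x, y, leftmost) for x, y in zip(forms, forms[1:]))


LMD = ["E", "E*E", "I*E", "a*E", "a*(E)", "a*(E+E)", "a*(I+E)", "a*(a+E)",
       "a*(a+I)", "a*(a+I0)", "a*(a+I00)", "a*(a+b00)"]
RMD = ["E", "E*E", "E*(E)", "E*(E+E)", "E*(E+I)", "E*(E+I0)", "E*(E+I00)",
       "E*(E+b00)", "E*(I+b00)", "E*(a+b00)", "I*(a+b00)", "a*(a+b00)"]
```

Both lists derive the same string a*(a+b00); only the order in which variables are replaced differs.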
The Language of a Grammar
§ If G = (V, T, P, S) is a CFG, the language of G, denoted L(G), is the set of terminal strings that have derivations from the start symbol. That is, L(G) = {w in T* | S ⇒* w in G}.
§ If a language L is the language of some context-free grammar, then L is said to be a context-free language, or CFL.
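For a small grammar, the strings of L(G) up to a given length can be enumerated by a breadth-first search over sentential forms. This is a sketch under stated assumptions: `language` is a hypothetical helper, symbols are single characters, and the search is only guaranteed to stop for grammars such as the palindrome grammar, where every body that contains a variable also adds a terminal.

```python
from collections import deque

PALINDROME = {"P": ["", "0", "1", "0P0", "1P1"]}  # "" stands for ε


def language(productions: dict, start: str, max_len: int) -> set:
    """Return all terminal strings of length <= max_len derivable from start."""
    found, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, or None for a terminal string.
        index = next((i for i, s in enumerate(form) if s in productions), None)
        if index is None:
            found.add(form)
            continue
        for body in productions[form[index]]:
            new = form[:index] + body + form[index + 1:]
            terminals = sum(1 for s in new if s not in productions)
            # Prune forms that already contain too many terminals.
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found


print(sorted(language(PALINDROME, "P", 3), key=lambda w: (len(w), w)))
```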
Parse Tree
§ A tree representation of derivations that shows clearly how the symbols of a terminal string are grouped into substrings.
§ Parse trees are the data structure of choice in a compiler to represent the source program.
§ In a compiler, the tree structure of the source program facilitates the translation of the source program into executable code by allowing natural, recursive functions to perform this translation process.
§ A parse tree is a graphical representation of a derivation.
Constructing Parse Tree
§ Let us fix on a grammar G = (V, T, P, S). The parse trees for G are trees with the following conditions:
1. Each interior node is labeled by a variable in V.
2. Each leaf is labeled by either a variable, a terminal, or ε.
3. If an interior node is labeled A, and its children are labeled X1, X2, …, Xk respectively, from the left, then A → X1 X2 … Xk is a production in P.
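The three conditions can be checked mechanically. In the sketch below a tree is a hypothetical nested structure: a leaf is a string (a terminal, a variable, or "" for ε) and an interior node is a pair `(label, children)`. Symbols are assumed to be single characters, and the palindrome grammar is used only as an example.

```python
PRODUCTIONS = {"P": {"", "0", "1", "0P0", "1P1"}}  # "" stands for ε


def is_parse_tree(node) -> bool:
    """Check the parse-tree conditions for the grammar in PRODUCTIONS."""
    if isinstance(node, str):
        return True  # a leaf: terminal, variable, or "" for ε
    label, children = node
    # Labels of the children, read from the left.
    labels = [c if isinstance(c, str) else c[0] for c in children]
    if label not in PRODUCTIONS:
        return False  # interior nodes must be labeled by a variable
    if "".join(labels) not in PRODUCTIONS[label]:
        return False  # label -> X1 X2 ... Xk must be a production
    return all(is_parse_tree(c) for c in children)


good = ("P", ["0", ("P", ["1", ("P", [""]), "1"]), "0"])
bad = ("P", ["0", ("P", [""]), "1"])  # P -> 0P1 is not a production
```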
Parse Tree Example
§ A parse tree showing the derivation of I+E from E.
E
├─ E
│  └─ I
├─ +
└─ E
Parse Tree Example (continued)
§ A parse tree showing the derivation P ⇒* 0110.
§ Productions: 1. P → ε   2. P → 0   3. P → 1   4. P → 0P0   5. P → 1P1
P
├─ 0
├─ P
│  ├─ 1
│  ├─ P
│  │  └─ ε
│  └─ 1
└─ 0
The Yield of a Parse Tree
§ If we look at the leaves of any parse tree and concatenate them from the left, we get a string called the yield of the tree, which is always a string that is derived from the root variable.
§ Of special importance are the parse trees such that:
1. The yield is a terminal string. That is, all leaves are labeled either with a terminal or with ε.
2. The root is labeled by the start symbol.
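Computing the yield is a left-to-right walk over the leaves. A minimal sketch, assuming a hypothetical tree encoding in which a leaf is a string ("" for ε) and an interior node is a pair `(label, children)`:

```python
def tree_yield(node) -> str:
    """Concatenate the leaf labels of a parse tree from the left."""
    if isinstance(node, str):
        return node  # a leaf; "" (ε) contributes nothing
    _label, children = node
    return "".join(tree_yield(child) for child in children)


# Parse tree for 0110 in the palindrome grammar:
# P -> 0P0, then P -> 1P1, then P -> ε.
tree = ("P", ["0", ("P", ["1", ("P", [""]), "1"]), "0"])
print(tree_yield(tree))
```

A tree with a variable left at a leaf, such as `("P", ["0", "P", "0"])`, has the yield 0P0, which is not a terminal string.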
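Parse tree showing a*(a+b00)
E
├─ E
│  └─ I
│     └─ a
├─ *
└─ E
   ├─ (
   ├─ E
   │  ├─ E
   │  │  └─ I
   │  │     └─ a
   │  ├─ +
   │  └─ E
   │     └─ I
   │        ├─ I
   │        │  ├─ I
   │        │  │  └─ b
   │        │  └─ 0
   │        └─ 0
   └─ )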
Parse tree showing ( x + y ) * x - z * y / ( x + x )
Parse tree showing The man read this book
Inference, Derivations, and Parse Trees
[Figure: the equivalences among parse trees, leftmost derivations, rightmost derivations, and recursive inferences]
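One direction of these equivalences, from a parse tree to a leftmost or a rightmost derivation, can be sketched as follows. The tree encoding (a leaf is a string, an interior node is a pair `(label, children)`) and the names `render` and `derivation` are assumptions made for this illustration.

```python
def render(form: list) -> str:
    """Write a list of tree nodes as a sentential form."""
    return "".join(n if isinstance(n, str) else n[0] for n in form)


def derivation(tree, leftmost: bool = True) -> list:
    """Read a leftmost or rightmost derivation off a parse tree."""
    form = [tree]
    steps = [render(form)]
    while True:
        interior = [i for i, n in enumerate(form) if not isinstance(n, str)]
        if not interior:
            return steps  # only leaves remain
        i = interior[0] if leftmost else interior[-1]
        _label, children = form[i]
        form[i:i + 1] = children  # replace the node by its children
        steps.append(render(form))


# Parse tree of a*b in the expression grammar E -> I | E*E, I -> a | b.
tree = ("E", [("E", [("I", ["a"])]), "*", ("E", [("I", ["b"])])])
print(" => ".join(derivation(tree, leftmost=True)))
print(" => ".join(derivation(tree, leftmost=False)))
```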
Self Study
§ Sections 5.2.4, 5.2.5, 5.2.6
§ Theorems 5.12, 5.14, 5.18
Ambiguous Grammar
§ Ideally, a grammar uniquely determines a structure for each string in its language. However, not every grammar provides unique structures.
§ When a grammar fails to provide a unique structure for some string, it is known as an ambiguous grammar.
§ Such a string has more than one parse tree (equivalently, more than one leftmost derivation).
Ambiguous Grammar example
§ Let us consider a CFG:
E → I | E+E | E*E | (E)
I → a | b | Ia | Ib | I0 | I1
§ Expression: a+a*a
§ LMD: E ⇒lm E+E ⇒lm I+E ⇒lm a+E ⇒lm a+E*E ⇒lm a+I*E ⇒lm a+a*E ⇒lm a+a*I ⇒lm a+a*a
§ RMD: E ⇒rm E*E ⇒rm E*I ⇒rm E*a ⇒rm E+E*a ⇒rm E+I*a ⇒rm E+a*a ⇒rm I+a*a ⇒rm a+a*a
LMD
E
├─ E
│  └─ I
│     └─ a
├─ +
└─ E
   ├─ E
   │  └─ I
   │     └─ a
   ├─ *
   └─ E
      └─ I
         └─ a
Fig: Tree with yield a+a*a
RMD
E
├─ E
│  ├─ E
│  │  └─ I
│  │     └─ a
│  ├─ +
│  └─ E
│     └─ I
│        └─ a
├─ *
└─ E
   └─ I
      └─ a
Fig: Tree with yield a+a*a
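That a+a*a has exactly two parse trees in this grammar can be confirmed by counting. The sketch below is hard-wired to the expression grammar E → I | E+E | E*E | (E), I → a | b | Ia | Ib | I0 | I1; the function name `count_trees` and the span-based counting are choices made for this illustration.

```python
from functools import lru_cache


def count_trees(s: str) -> int:
    """Count the parse trees of s with root E in the ambiguous grammar."""

    @lru_cache(maxsize=None)
    def ident(i: int, j: int) -> int:
        # Trees with root I for s[i:j]: I -> a | b | Ia | Ib | I0 | I1
        if j - i == 1:
            return 1 if s[i] in "ab" else 0
        if j - i > 1 and s[j - 1] in "ab01":
            return ident(i, j - 1)
        return 0

    @lru_cache(maxsize=None)
    def expr(i: int, j: int) -> int:
        # Trees with root E for s[i:j]
        if j <= i:
            return 0
        total = ident(i, j)                      # E -> I
        for k in range(i + 1, j - 1):            # E -> E+E | E*E, split at k
            if s[k] in "+*":
                total += expr(i, k) * expr(k + 1, j)
        if s[i] == "(" and s[j - 1] == ")":      # E -> (E)
            total += expr(i + 1, j - 1)
        return total

    return expr(0, len(s))


print(count_trees("a+a*a"))
```

Any count above 1 for some string shows that the grammar is ambiguous.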
Removing Ambiguity from Grammar
§ Two causes of ambiguity in the expression grammar:
1. The precedence of operators is not respected.
2. A sequence of identical operators can group either from the left or from the right.
[Figures: two derivation trees for one expression, a good tree and a bad tree; the expression result is computed using the tree. Source: Prof. Busch, LSU]
Removing Ambiguity from Grammar
§ The solution to the problem of enforcing precedence is to introduce several different variables.
1. A factor is an expression that cannot be broken apart by any adjacent operator. The only factors in our expression language are:
i. Identifiers: it is not possible to separate the letters of an identifier by attaching an operator.
ii. Any parenthesized expression, no matter what appears inside the parentheses.
2. A term is an expression that cannot be broken by the '+' operator. A term is a product of one or more factors.
3. An expression is a sum of one or more terms.
Removing Ambiguity from Grammar
§ Let us consider a CFG:
E → I | E+E | E*E | (E)
I → a | b | Ia | Ib | I0 | I1
§ An unambiguous expression grammar:
I → a | b | Ia | Ib | I0 | I1
F → I | (E)
T → F | T*F
E → T | E+T
Unambiguous Grammar example
§ CFG:
I → a | b | Ia | Ib | I0 | I1
F → I | (E)
T → F | T*F
E → T | E+T
§ Expression: a+a*a
§ Derivation: E ⇒ E+T ⇒ T+T ⇒ F+T ⇒ I+T ⇒ a+T ⇒ a+T*F ⇒ a+F*F ⇒ a+I*F ⇒ a+a*F ⇒ a+a*I ⇒ a+a*a
Inherent Ambiguity
§ Topic 5.4.4
§ L = {a^n b^n c^m d^m | n ≥ 1, m ≥ 1} ∪ {a^n b^m c^m d^n | n ≥ 1, m ≥ 1}
Unambiguous Grammar example
§ E ⇒ E+T ⇒ T+T ⇒ F+T ⇒ I+T ⇒ a+T ⇒ a+T*F ⇒ a+F*F ⇒ a+I*F ⇒ a+a*F ⇒ a+a*I ⇒ a+a*a
E
├─ E
│  └─ T
│     └─ F
│        └─ I
│           └─ a
├─ +
└─ T
   ├─ T
   │  └─ F
   │     └─ I
   │        └─ a
   ├─ *
   └─ F
      └─ I
         └─ a
Fig: Tree with yield a+a*a
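The layering into expressions, terms and factors maps directly onto a small recursive-descent parser. This is a sketch, not the parser a tool would generate: the left-recursive productions E → E+T and T → T*F are handled with loops, and the result is a simplified nested tuple `(operator, left, right)` rather than a full parse tree. The name `parse` and the tuple shape are assumptions.

```python
def parse(text: str):
    """Parse text with E -> T | E+T, T -> F | T*F, F -> I | (E)."""
    pos = 0

    def peek() -> str:
        return text[pos] if pos < len(text) else ""

    def identifier() -> str:
        # I -> a | b | Ia | Ib | I0 | I1
        nonlocal pos
        start = pos
        if peek() not in {"a", "b"}:
            raise SyntaxError(f"identifier expected at position {pos}")
        pos += 1
        while peek() in {"a", "b", "0", "1"}:
            pos += 1
        return text[start:pos]

    def factor():
        nonlocal pos
        if peek() == "(":
            pos += 1
            node = expression()
            if peek() != ")":
                raise SyntaxError(f"')' expected at position {pos}")
            pos += 1
            return node
        return identifier()

    def term():
        nonlocal pos
        node = factor()
        while peek() == "*":       # T -> T*F groups from the left
            pos += 1
            node = ("*", node, factor())
        return node

    def expression():
        nonlocal pos
        node = term()
        while peek() == "+":       # E -> E+T groups from the left
            pos += 1
            node = ("+", node, term())
        return node

    tree = expression()
    if pos != len(text):
        raise SyntaxError(f"unexpected input at position {pos}")
    return tree


print(parse("a+a*a"))
```

Because '*' is handled one level below '+', a+a*a can only group as a+(a*a), matching the tree above.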
Example of CFG
§ A CFG that generates prefix expressions with operands x and y and binary operators +, -, *.
productions:
E → x
E → y
E → +EE
E → -EE
E → *EE
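Because each production body starts with a distinct terminal, membership in this language can be decided by reading the string once from the left. A minimal sketch; `parse_prefix` and `in_language` are hypothetical names.

```python
def parse_prefix(s: str, i: int = 0) -> int:
    """Return the index just past one expression E starting at s[i], or -1."""
    if i >= len(s):
        return -1
    if s[i] in {"x", "y"}:            # E -> x | y
        return i + 1
    if s[i] in {"+", "-", "*"}:       # E -> +EE | -EE | *EE
        j = parse_prefix(s, i + 1)    # first operand
        if j < 0:
            return -1
        return parse_prefix(s, j)     # second operand
    return -1


def in_language(s: str) -> bool:
    """True if the whole string is exactly one prefix expression."""
    return parse_prefix(s) == len(s)
```

For example, `in_language("+x*yx")` accepts the prefix form of x + y*x.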
Example of CFG
§ Design a CFG for the set of all strings with an equal number of a's and b's.
productions:
S → aSbS | bSaS | ε
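The design can be sanity-checked by comparing the grammar with a direct count of a's and b's on all short strings. This is a brute-force sketch, not a proof: `derives` is a hypothetical helper that tries every split allowed by S → aSbS | bSaS, and it assumes the input contains only a and b.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def derives(w: str) -> bool:
    """True if S =>* w for S -> aSbS | bSaS | ε (w over {a, b})."""
    if w == "":
        return True                              # S -> ε
    closing = "b" if w[0] == "a" else "a"        # S -> aSbS or S -> bSaS
    return any(
        w[k] == closing and derives(w[1:k]) and derives(w[k + 1:])
        for k in range(1, len(w))
    )


# Compare with the defining property on every string up to length 8.
agrees = all(
    derives("".join(letters)) == (letters.count("a") == letters.count("b"))
    for n in range(9)
    for letters in product("ab", repeat=n)
)
print(agrees)
```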
Example of CFG
§ Design a CFG over {a, b} such that no string in L(G) has ba as a substring.
productions:
S → aS | Sb | a | b
Example of CFG
§ Design a CFG for the regular expression 0*1(0+1)*.
productions:
S → A1B
A → 0A | ε
B → 0B | 1B | ε
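The grammar can be compared with the regular expression on all short binary strings. In the sketch below the three functions mirror the three variables, and Python's `re` pattern `0*1[01]*` is used as the equivalent of 0*1(0+1)*, since '+' in the slide's notation means union. The function names are hypothetical.

```python
import re
from itertools import product


def derives_A(w: str) -> bool:
    # A -> 0A | ε
    return w == "" or (w[0] == "0" and derives_A(w[1:]))


def derives_B(w: str) -> bool:
    # B -> 0B | 1B | ε
    return w == "" or (w[0] in {"0", "1"} and derives_B(w[1:]))


def derives_S(w: str) -> bool:
    # S -> A1B: try every position k for the 1 in the middle
    return any(
        w[k] == "1" and derives_A(w[:k]) and derives_B(w[k + 1:])
        for k in range(len(w))
    )


PATTERN = re.compile(r"0*1[01]*")

# Compare grammar and regular expression on all strings up to length 7.
agrees = all(
    derives_S("".join(bits)) == bool(PATTERN.fullmatch("".join(bits)))
    for n in range(8)
    for bits in product("01", repeat=n)
)
print(agrees)
```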
Application of CFG
§ CFGs were originally a way to describe natural languages.
§ Two of their uses:
1. Parsers
2. Markup languages (HTML, XML)
§ Parsers:
§ A parse tree is a graphical representation of a derivation.
§ Parsing is the process of determining if a string of tokens can be generated by a grammar.
§ A compiler may not actually construct a parse tree. However, a parser must be capable of constructing such a tree.
§ A parser can be constructed for any grammar. The CFG is an essential concept for the implementation of parsers.
YACC Parser Generator
§ Tools such as YACC take a CFG as input and produce a parser.
§ Example:
Exp : Id {…}
    | Exp '+' Exp {…}
    | Exp '*' Exp {…}
    | '(' Exp ')' {…}
    ;
Id  : 'a' {…}
    | 'b' {…}
    | Id 'a' {…}
    | Id 'b' {…}
    | Id '0' {…}
    | Id '1' {…}
    ;
Rules for YACC Parser Generator
§ Rules:
1. The colon is used as the production symbol.
2. Productions with the same head are grouped together by the vertical bar.
3. The list of bodies for a given head ends with a semicolon.
4. Terminals are quoted with single quotes.
5. Variable names are unquoted.
Markup Language
§ Markup languages are a family of languages whose strings are documents with certain marks (called tags) in them.
§ Tags tell us something about the semantics of the various strings within the document.
a) The text as viewed:
The things I hate:
1. ABC xyz
2. AB ABC XYZ xy
b) The HTML source:
<P> The things I <EM>hate</EM>:
<OL>
<LI> ABC xyz
<LI> AB ABC XYZ xy
</OL>
§ Tags: EM emphasized string, P paragraph, OL ordered list, LI list item
1. Char → a | A | …
2. Text → ε | Char Text
3. Doc → ε | Element Doc
4. Element → Text | <EM> Doc </EM> | <P> Doc | <OL> List </OL> | …
5. ListItem → <LI> Doc
6. List → ε | ListItem List
Thank You