Chomsky Hierarchy
• In his landmark paper (Chomsky, N. Three models for the description of language. IRE Transactions on Information Theory, 1956, IT-2, 113-124), Chomsky described a hierarchy of different types of grammars.
• Type 3 grammars (regular) describe the smallest set of languages.
• Type 2 grammars (context-free) describe a larger set of languages that contains the regular languages.
• Type 1 grammars (context-sensitive) describe a still larger set of languages that contains the context-free languages.
• Type 0 grammars (unrestricted) describe the largest set: every language that a Turing machine can recognize (the recursively enumerable languages).
Lecture #7 PLP Spring 2004, UF CISE

Chomsky Hierarchy Diagram
Type 0 (unrestricted) ⊃ Type 1 (context-sensitive) ⊃ Type 2 (context-free) ⊃ Type 3 (regular)

Grammars
• Each of Chomsky’s grammar types contains grammars of the following form: G = (V, T, P, S), where
– G, a grammar, is a 4-tuple consisting of (V, T, P, S),
– V is a finite set of nonterminal symbols,
– T is a finite set of terminal symbols,
– P is a finite set of productions or rewriting rules,
– S ∈ V is the distinguished symbol or start symbol.
• For all grammar types, V, T, and S are defined identically. Only the form of the productions P varies.

Type 3 Grammars (The Regular Languages)
• P contains rules of the following forms:
  A → aB
  A → a
  where A, B ∈ V; a ∈ T ∪ {ε}.
• Although I won’t do it here, one can prove that the set of languages generated by such a grammar is exactly the set of languages generated by regular expressions. (Hint: find a way to write a type 3 grammar rule for each of the base cases of the regular expressions, then figure out a way to write a grammar for each of the recursive rules.)
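
One way to see why type 3 grammars are no stronger than finite automata is to read each nonterminal as a state. The sketch below does this under some assumptions that are not in the slides: each production is a hypothetical triple (A, a, B), where a is a terminal or the empty string and B is a nonterminal or None, and the helper names `epsilon_closure` and `accepts` are illustrative only.

```python
# Sketch: treat a type 3 (right-linear) grammar as a nondeterministic automaton.
# A production A -> aB is the triple (A, a, B); A -> a is (A, a, None).
# The terminal a may be "" to stand for epsilon. FINAL is an assumed extra
# accepting state reached by rules with no trailing nonterminal.
FINAL = "<final>"


def epsilon_closure(states, productions):
    """Add every state reachable through rules whose terminal is epsilon."""
    seen = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for lhs, terminal, nxt in productions:
            if lhs == state and terminal == "":
                target = nxt if nxt is not None else FINAL
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
    return seen


def accepts(productions, start, text):
    """Return True if the grammar can derive `text` from `start`."""
    current = epsilon_closure({start}, productions)
    for ch in text:
        moved = set()
        for lhs, terminal, nxt in productions:
            if lhs in current and terminal == ch:
                moved.add(nxt if nxt is not None else FINAL)
        current = epsilon_closure(moved, productions)
    return FINAL in current


# Example grammar (an assumption for illustration): S -> aS | bA, A -> bA | epsilon.
# It generates a*bb*, the same language as the regular expression a*b+.
RULES = [("S", "a", "S"), ("S", "b", "A"), ("A", "b", "A"), ("A", "", None)]
print(accepts(RULES, "S", "aabbb"))
print(accepts(RULES, "S", "aba"))
```

The state-set simulation is the usual subset idea for nondeterministic automata; it is meant as an aid to the hint above, not as the proof itself.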

How a Grammar Describes a Language
• A grammar can be used to generate strings. The set of strings generated by a grammar is termed the language of the grammar.
• A derivation starts with the start symbol S.
• A sequence of terminals and nonterminals generated by a sequence of zero or more derivation steps from the start symbol is termed a sentential form.
• If αAβ is a sentential form and A → γ is a production, then αAβ ⇒ αγβ is a derivation step.
• ⇒* is the transitive, reflexive closure of ⇒. That is, A ⇒* β if there is a sequence of zero or more derivation steps, A ⇒ γ1 ⇒ γ2 ⇒ . . . ⇒ β, deriving β from A.
• A string X of terminal symbols is said to be in the language of grammar G (denoted L(G)) if S ⇒* X.

Generating a String
• Consider the following grammar:
– G = ({S}, {a, b}, {S → Sa, S → b}, S)
• Let’s derive some strings using G.
  S ⇒ Sa ⇒ ba
  S ⇒ Sa ⇒ Saa ⇒ baa
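
The derivations above can be mechanized. The following is a minimal sketch, assuming single-character symbols with uppercase letters as nonterminals and lowercase letters as terminals; the function names `derivation_steps` and `language` are made up for this illustration. It explores sentential forms breadth-first and keeps the ones that contain only terminals. The length cutoff is only safe for grammars whose rules never shrink a sentential form, which holds for G.

```python
from collections import deque


def derivation_steps(form, productions):
    """Yield every sentential form reachable from `form` in one derivation step."""
    for lhs, rhs in productions:
        position = form.find(lhs)
        while position != -1:
            yield form[:position] + rhs + form[position + len(lhs):]
            position = form.find(lhs, position + 1)


def language(productions, start, max_len):
    """Collect the terminal strings of length <= max_len derivable from `start`.

    Assumes no production shortens a sentential form, so forms longer than
    max_len can be discarded without losing any short strings.
    """
    seen = {start}
    queue = deque([start])
    strings = set()
    while queue:
        form = queue.popleft()
        if not any(symbol.isupper() for symbol in form):
            strings.add(form)  # only terminals remain: a string of the language
            continue
        for nxt in derivation_steps(form, productions):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(strings, key=lambda s: (len(s), s))


# The grammar G from the slide: S -> Sa, S -> b.
G_RULES = [("S", "Sa"), ("S", "b")]
print(language(G_RULES, "S", 4))  # ['b', 'ba', 'baa', 'baaa']
```

Every generated string is a single b followed by zero or more a's, which suggests that L(G) is the language of the regular expression ba*.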

Type 2 Grammars (The Context-Free Languages)
• P contains rules of the following form: A → α where A ∈ V, α ∈ (V ∪ T)*
• Type 2 (or context-free) grammars are typically used to describe the gross-level syntax of computer languages. They have been used as rudimentary forms of grammars for natural languages as well.
• The context-free adjective comes by way of comparison with the context-sensitive grammars.

Type 1 Grammars (The Context-Sensitive Languages)
• P contains rules of the following form: αAβ → αγβ where A ∈ V; α, β ∈ (V ∪ T)*; γ ∈ (V ∪ T)+
• Here the context, that is, the symbols surrounding a nonterminal, can constrain the application of a rule. In a context-free grammar (CFG), the context of a nonterminal never determines whether or not a rule can be applied.
• Because γ is nonempty, each rule is nonreducing; that is, no string in a derivation can be shorter than the string from which it was derived. The special case of a language containing the empty string is handled by allowing the unique rule S → ε.

Type 0 Languages
• No restrictions are made on the form of the productions. Rules can reduce the length of a string. Terminal symbols can appear alone on the left-hand side of a production. No restrictions whatsoever apply.
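
Since the four grammar types differ only in the form of their productions, the type of a grammar can be read off its rules. Here is a rough sketch of such a check, assuming single-character symbols (uppercase for nonterminals, lowercase for terminals) and productions given as (left, right) string pairs; `chomsky_type` and its helpers are hypothetical names. It reports the most restrictive rule form that every production satisfies, which says something about the grammar as written, not about the language it generates.

```python
def is_nonterminal(symbol):
    """Assumed convention: uppercase letters are nonterminals."""
    return symbol.isupper()


def fits_type3(lhs, rhs):
    """A -> aB or A -> a, where a is one terminal or the empty string."""
    if len(lhs) != 1 or not is_nonterminal(lhs):
        return False
    # Strip an optional trailing nonterminal, then expect at most one terminal.
    body = rhs[:-1] if rhs and is_nonterminal(rhs[-1]) else rhs
    return len(body) <= 1 and not any(is_nonterminal(c) for c in body)


def fits_type2(lhs, rhs):
    """A -> alpha, with any right-hand side."""
    return len(lhs) == 1 and is_nonterminal(lhs)


def fits_type1(lhs, rhs, start):
    """alpha A beta -> alpha gamma beta with gamma nonempty, or start -> epsilon."""
    if lhs == start and rhs == "":
        return True
    if len(rhs) < len(lhs):
        return False  # gamma would have to be empty
    for i, symbol in enumerate(lhs):
        if is_nonterminal(symbol):
            alpha, beta = lhs[:i], lhs[i + 1:]
            if rhs.startswith(alpha) and rhs.endswith(beta):
                return True
    return False


def chomsky_type(productions, start="S"):
    """Return 3, 2, 1, or 0: the most restrictive form all rules satisfy."""
    if all(fits_type3(lhs, rhs) for lhs, rhs in productions):
        return 3
    if all(fits_type2(lhs, rhs) for lhs, rhs in productions):
        return 2
    if all(fits_type1(lhs, rhs, start) for lhs, rhs in productions):
        return 1
    return 0


print(chomsky_type([("S", "aS"), ("S", "b")]))    # 3
print(chomsky_type([("S", "aSb"), ("S", "")]))    # 2
print(chomsky_type([("S", "aSb"), ("ab", "b")]))  # 0
```

Note that the grammar S → Sa | b is reported as type 2 by this check, because S → Sa does not have the form A → aB, even though the language it generates is regular.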

Association of Grammar Types to Machines
• As you already know, regular (type 3) languages can be recognized by finite state machines, or deterministic finite automata (DFAs).
• Context-free languages can be recognized by a finite state machine augmented to have a pushdown stack. Such a machine is known as a push-down automaton (PDA). In general the PDA must be nondeterministic.
• Context-sensitive languages can be recognized by a linear bounded automaton (LBA), a Turing machine whose tape is limited to the space occupied by the input. A PDA with an extra stack (a 2-PDA) can also recognize them, but a 2-PDA is as powerful as a Turing machine, so it does not characterize this class exactly.
• Unrestricted languages can be recognized by a Turing machine.
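
To make the first two machine classes concrete, here is a small sketch with two hand-written recognizers. Both are illustrations chosen for these notes rather than constructions from the lecture: `dfa_accepts` is a two-state DFA for the regular language ba*, and `pda_accepts` uses a stack to recognize { aⁿbⁿ : n ≥ 1 }, a context-free language that no DFA can recognize because it requires unbounded counting.

```python
def dfa_accepts(text):
    """DFA for b a*: a fixed, finite set of states and no other memory."""
    state = "start"
    for ch in text:
        if state == "start" and ch == "b":
            state = "seen_b"
        elif state == "seen_b" and ch == "a":
            state = "seen_b"
        else:
            return False  # no transition defined: reject
    return state == "seen_b"


def pda_accepts(text):
    """Deterministic pushdown recognizer for a^n b^n with n >= 1.

    Each 'a' pushes a marker and each 'b' pops one; the input is accepted
    when the stack empties exactly at the end of the input.
    """
    stack = []
    phase = "push"
    for ch in text:
        if ch == "a" and phase == "push":
            stack.append("A")
        elif ch == "b" and stack:
            phase = "pop"
            stack.pop()
        else:
            return False  # an 'a' after a 'b', an unmatched 'b', or a stray symbol
    return phase == "pop" and not stack


print(dfa_accepts("baaa"), dfa_accepts("ab"))
print(pda_accepts("aaabbb"), pda_accepts("aabbb"))
```

The only difference between the two recognizers is the stack: the DFA remembers one of finitely many states, while the PDA can remember how many a's it has seen.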

Context-Free Grammars and Parse Trees
• Consider a grammar with the following productions:
– sentence → noun-phrase verb-phrase
– noun-phrase → article noun
– article → a
– article → the
– noun → girl
– noun → dog
– verb-phrase → verb noun-phrase
– verb → sees
– verb → pets
• We can abbreviate multiple rules for the same nonterminal using a vertical bar for alternation:
  article → a | the
  noun → girl | dog

Derivations and Parse Trees
Derivation:
  sentence
  ⇒ noun-phrase verb-phrase
  ⇒ article noun verb-phrase
  ⇒ a noun verb-phrase
  ⇒ a girl verb-phrase
  ⇒ a girl verb noun-phrase
  ⇒ a girl sees noun-phrase
  ⇒ a girl sees article noun
  ⇒ a girl sees the noun
  ⇒ a girl sees the dog
Parse tree (children indented under their parent):
  sentence
    noun-phrase
      article: a
      noun: girl
    verb-phrase
      verb: sees
      noun-phrase
        article: the
        noun: dog

Leftmost Derivations
• The derivation used on the previous slide was a leftmost derivation. That is, each derivation step rewrote the leftmost nonterminal symbol. Clearly, each leftmost derivation corresponds to exactly one parse tree.
• We could have generated the same parse tree using a different derivation, a rightmost derivation, for example.
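
The claim that one parse tree admits several derivations can be checked directly. In the sketch below, the parse tree for "a girl sees the dog" is written as nested (label, children) tuples with plain strings as leaves; this representation and the names `render` and `derivation` are assumptions made for the example. Expanding the leftmost interior node at each step gives the leftmost derivation, and expanding the rightmost gives the rightmost derivation.

```python
# Parse tree for "a girl sees the dog" under the sentence grammar.
# Interior nodes are (label, children) tuples; leaves are terminal strings.
TREE = ("sentence", [
    ("noun-phrase", [("article", ["a"]), ("noun", ["girl"])]),
    ("verb-phrase", [
        ("verb", ["sees"]),
        ("noun-phrase", [("article", ["the"]), ("noun", ["dog"])]),
    ]),
])


def render(form):
    """Print a sentential form: node labels for nonterminals, words for terminals."""
    return " ".join(node[0] if isinstance(node, tuple) else node for node in form)


def derivation(tree, leftmost=True):
    """List the sentential forms obtained by expanding one nonterminal per step."""
    form = [tree]
    steps = [render(form)]
    while True:
        pending = [i for i, node in enumerate(form) if isinstance(node, tuple)]
        if not pending:
            return steps  # only terminals remain
        i = pending[0] if leftmost else pending[-1]
        _label, children = form[i]
        form[i:i + 1] = children  # apply the production recorded in the tree
        steps.append(render(form))


for step in derivation(TREE, leftmost=True):
    print(step)
print("---")
for step in derivation(TREE, leftmost=False):
    print(step)
```

Both runs use the same productions the same number of times and end in the same sentence; only the order of the steps differs.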

Grammar Rules and Set Equations
• One can think of a grammar rule as being a set equation.
• Consider this rule: expr → expr + expr | number
• Let E represent the set of strings generated by the nonterminal expr, and let N represent the set generated by number.
• The equation E = (E + E) ∪ N holds (where + represents the language {+} and juxtaposition represents concatenation of languages).
• This definition is recursive, but E clearly contains N.
• Likewise, if E contains N, then it must contain N + N. Similarly, it must contain N + N + N.
• Carrying this to its logical conclusion, E = N ∪ N+N ∪ N+N+N ∪ . . . = N (+N)*.
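
The set equation can be solved numerically by starting from the empty set and applying its right-hand side until nothing new appears. The sketch below does so under two assumptions that are not in the slides: N is taken to be the single string "1", and strings are capped at a maximum length so that the iteration terminates. The function name `solve_expr` is hypothetical. The result is then compared against the closed form N (+N)*, written as a regular expression.

```python
import re


def solve_expr(numbers, max_len):
    """Iterate E := (E + E) ∪ N from the empty set, keeping strings up to max_len."""
    n_set = set(numbers)
    e_set = set()
    while True:
        sums = {
            x + "+" + y
            for x in e_set
            for y in e_set
            if len(x) + 1 + len(y) <= max_len
        }
        next_e = n_set | sums
        if next_e == e_set:
            return e_set  # fixed point reached: the equation holds up to max_len
        e_set = next_e


E = solve_expr({"1"}, max_len=5)
print(sorted(E, key=len))  # ['1', '1+1', '1+1+1']

# Every string found matches the closed form N(+N)* with N = {"1"}.
closed_form = re.compile(r"1(\+1)*")
print(all(closed_form.fullmatch(s) for s in E))  # True
```

Each pass of the loop corresponds to one round of the argument on the slide: the first pass puts N into E, the second adds N+N, and so on.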