Classification of Languages

Strings and Regular Expressions
• Alphabet: A set A whose elements are regarded not as numbers but as symbols.
• Strings (words): The set A* consists of all finite sequences of letters or other symbols from A, written without commas; its elements are referred to as strings (or words).
• Empty string: A* contains the empty string (empty sequence), which has no symbols; it is denoted by Λ.

• Catenation: If w₁ = s₁s₂…sₙ and w₂ = t₁t₂…tₖ are elements of A* for some set A, then the catenation of w₁ and w₂ is the sequence s₁s₂…sₙt₁t₂…tₖ, i.e. w₁ · w₂ = w₁w₂ = s₁s₂…sₙt₁t₂…tₖ.

Regular expression
• A regular expression over A is a string constructed from the elements of A and the symbols (, ), ∨, *, Λ according to the following rules:
1. The symbol Λ is a regular expression.
2. If x ∈ A, the symbol x is a regular expression.
3. If α and β are regular expressions, then the expression αβ is regular.
4. If α and β are regular expressions, then the expression α ∨ β is regular.
5. If α is a regular expression, then (α)* is regular.

Example
• The following expressions are all regular expressions over A = {0, 1}:
(i) 0*(0 ∨ 1)*
(ii) 00*(0 ∨ 1)*1

• Eg 1. Let A = {a, b, c}. The regular expression a* corresponds to the set of all finite sequences of a's, such as aa, aaaa, and so on (including the empty string Λ). The regular expression a(b ∨ c) corresponds to the set {ab, ac} ⊆ A*. The regular expression ab(bc)* corresponds to the set of strings that begin with ab followed by zero or more copies of bc: ab, abbc, abbcbcbc, and so on.
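The sets in Eg 1 can be checked mechanically. The sketch below uses Python's `re` module, writing ∨ as `|` and testing Λ as the empty string; the helper name `in_regular_set` is hypothetical and only for illustration.

```python
import re

def in_regular_set(pattern: str, word: str) -> bool:
    """Return True if the whole word belongs to the set described by pattern."""
    return re.fullmatch(pattern, word) is not None

# a* : all finite sequences of a's, including the empty string
print(in_regular_set(r"a*", ""), in_regular_set(r"a*", "aaaa"))

# a(b v c) : exactly the set {ab, ac}
print([w for w in ["ab", "ac", "abc", "a"] if in_regular_set(r"a(b|c)", w)])

# ab(bc)* : ab followed by zero or more copies of bc
print([w for w in ["ab", "abbc", "abbcbcbc", "abb"] if in_regular_set(r"ab(bc)*", w)])
```

`re.fullmatch` is used rather than `re.search` because membership in a regular set requires the entire string to match, not just a substring.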

Regular Set
• Associated with each regular expression over A there is a corresponding subset of A*. Such sets are called regular subsets of A*, or just regular sets if no reference to A is needed.

Languages
• S: a set of words.
• S*: the collection of all possible sentences that can be formed from the words in S.
• Syntax: The specification of the proper construction of sentences is called the syntax of a language.
• Phrase structure grammar: a formal device for specifying the syntax of a class of languages.

Grammar
• A phrase structure grammar G is defined to be a 4-tuple (V, S, v₀, ↦), where V is a finite set, S is a subset of V, v₀ ∈ V − S, and ↦ is a finite relation on V*.
• V consists of S together with some other symbols.
• v₀: The element v₀ of V is the starting point for substitutions.
• The relation ↦ on V* specifies allowable replacements: if w ↦ w', we may replace w by w' wherever the string w occurs.

• Production of G: A statement w ↦ w' is called a production of G; w and w' are called the left and right sides of the production, respectively.
• Terminal and non-terminal symbols: If G = (V, S, v₀, ↦) is a phrase structure grammar, then S is the set of terminal symbols and N = V − S is the set of non-terminal symbols. Note: V = S ∪ N.
• Language: The set of all properly constructed sentences that can be produced using a grammar G is called the language of G and is denoted by L(G).

• Eg 1: Let S = {Ramesh, Seema, drives, jogs, carelessly, rapidly, frequently}, N = {sentence, noun, verbphrase, verb, adverb}, and V = S ∪ N. Let v₀ = sentence, and suppose that the relation ↦ on V* is described by
sentence ↦ noun verbphrase
noun ↦ Ramesh
noun ↦ Seema
verbphrase ↦ verb adverb
verb ↦ drives
verb ↦ jogs
adverb ↦ carelessly
adverb ↦ rapidly
adverb ↦ frequently
The set S contains all the allowed words in the language; N consists of words that describe parts of sentences but are not themselves contained in the language. Write the derivation of the sentence "Seema drives rapidly". Also draw the derivation tree.
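One possible derivation can be traced by applying productions one at a time to the current string of symbols. The following is a minimal sketch: the helper `apply` and the chosen order of productions are assumptions for illustration, and other orders reach the same sentence.

```python
def apply(form, lhs, rhs):
    """Replace the first occurrence of the symbol sequence lhs in form by rhs."""
    n = len(lhs)
    for i in range(len(form) - n + 1):
        if form[i:i + n] == lhs:
            return form[:i] + rhs + form[i + n:]
    raise ValueError(f"{lhs} does not occur in {form}")

# Productions used, in order (each is one pair w -> w' of the relation)
steps = [
    (["sentence"], ["noun", "verbphrase"]),
    (["noun"], ["Seema"]),
    (["verbphrase"], ["verb", "adverb"]),
    (["verb"], ["drives"]),
    (["adverb"], ["rapidly"]),
]

form = ["sentence"]          # start from v0 = sentence
derivation = [form]
for lhs, rhs in steps:
    form = apply(form, lhs, rhs)
    derivation.append(form)

for f in derivation:
    print(" ".join(f))
```

Each printed line is one step of the derivation; the last line is the terminal string "Seema drives rapidly".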

• Eg 2: Let V = {v₀, w, a, b, c}, S = {a, b, c}, and let ↦ be the relation on V* given by
1. v₀ ↦ aw
2. w ↦ bbw
3. w ↦ c
Consider the phrase structure grammar G = (V, S, v₀, ↦).
(i) Derive the sentence ab⁶c. Also draw the derivation tree.
(ii) Derive the sentence ab⁴c. Also draw the derivation tree.

• Eg 3: Let V = {v₀, w, a, b, c}, S = {a, b, c}, and let ↦ be the relation on V* given by
1. v₀ ↦ aw
2. w ↦ bbw
3. w ↦ c
Consider the phrase structure grammar G = (V, S, v₀, ↦). Determine the form of allowable sentences in L(G).
• Eg: In examples 4 and 5, a grammar G is specified. In each case describe precisely the language L(G) produced by this grammar; that is, describe all syntactically correct "sentences".
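For Eg 3, a brute-force search over strings derivable from v₀ gives evidence (not a proof) for the form of the sentences. The sketch below enumerates all terminal strings up to a length bound and compares them with the pattern a(bb)*c; the function name `language` and the bound are illustrative assumptions. Pruning by length is safe here because no production of this grammar shortens a string.

```python
import re
from collections import deque

PRODUCTIONS = [
    (("v0",), ("a", "w")),
    (("w",), ("b", "b", "w")),
    (("w",), ("c",)),
]
TERMINALS = {"a", "b", "c"}

def language(max_len):
    """Collect all terminal strings of length <= max_len derivable from v0."""
    seen = {("v0",)}
    queue = deque(seen)
    sentences = set()
    while queue:
        form = queue.popleft()
        if all(s in TERMINALS for s in form):
            sentences.add("".join(form))
            continue
        for lhs, rhs in PRODUCTIONS:
            n = len(lhs)
            for i in range(len(form) - n + 1):
                if form[i:i + n] == lhs:
                    new = form[:i] + rhs + form[i + n:]
                    if len(new) <= max_len and new not in seen:
                        seen.add(new)
                        queue.append(new)
    return sorted(sentences, key=len)

print(language(8))
print(all(re.fullmatch(r"a(bb)*c", s) for s in language(12)))
```

The enumeration suggests that every sentence is an a, followed by an even number of b's, followed by a c.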

• 4. G = (V, S, v₀, ↦), V = {v₀, v₁, x, y, z}, S = {x, y, z}, ↦: v₀ ↦ xv₀, v₀ ↦ yv₁, v₁ ↦ z.
• 5. G = (V, S, v₀, ↦), V = {v₀, a, b}, S = {a, b}, ↦: v₀ ↦ aav₀, v₀ ↦ a, v₀ ↦ b.
• Eg 6. Construct a phrase structure grammar G such that the language L(G) of G is equal to the language L.
(i) L = {aⁿbⁿ | n ≥ 3}
(ii) L = {strings of 0's and 1's with an equal number n ≥ 0 of 0's and 1's}

Classification of Phrase Structure Grammars
• Let G = (V, S, v₀, ↦) be a phrase structure grammar. Then we say that G is
• Type 0: if no restrictions are placed on the productions of G.
• Type 1: if for any production w₁ ↦ w₂, the length of w₁ is less than or equal to the length of w₂.
• Type 2: if the left-hand side of each production is a single non-terminal symbol and the right-hand side consists of one or more symbols.
• Type 3: if the left-hand side of each production is a single non-terminal symbol and the right-hand side consists of one or more symbols, including at most one non-terminal symbol, which must be at the extreme right of the string.
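The four conditions can be tested production by production. The sketch below returns the most restrictive type that every production satisfies; the function name `grammar_type` is hypothetical, and the Type 3 test assumes the usual convention that the single non-terminal, when present, is the last symbol of the right-hand side.

```python
def grammar_type(productions, nonterminals):
    """Return the most restrictive type (3, 2, 1 or 0) met by every production.

    productions: list of (lhs, rhs) pairs, each side a tuple of symbols.
    """
    def is_type1(l, r):
        return len(l) <= len(r)

    def is_type2(l, r):
        return len(l) == 1 and l[0] in nonterminals and len(r) >= 1

    def is_type3(l, r):
        # Assumption: a non-terminal may appear only as the rightmost symbol
        return is_type2(l, r) and all(s not in nonterminals for s in r[:-1])

    for level, test in ((3, is_type3), (2, is_type2), (1, is_type1)):
        if all(test(l, r) for l, r in productions):
            return level
    return 0

# Grammar of Eg 2: v0 -> aw, w -> bbw, w -> c
eg2 = [(("v0",), ("a", "w")), (("w",), ("b", "b", "w")), (("w",), ("c",))]
print(grammar_type(eg2, {"v0", "w"}))

# A hypothetical grammar v0 -> a v0 b, v0 -> ab (non-terminal in the middle)
middle = [(("v0",), ("a", "v0", "b")), (("v0",), ("a", "b"))]
print(grammar_type(middle, {"v0"}))
```

Checking from Type 3 downward works because, under these definitions, every Type 3 grammar is also Type 2, and every Type 2 grammar is also Type 1.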

• Type 2 or Type 3 language: A language is called type 2 or type 3 if there is a grammar of type 2 or type 3 that produces it.
• BNF notation: For type 2 grammars there is a useful alternative way of displaying the productions, called Backus-Naur form (BNF).
• Step 1: Each left-hand symbol w is written once, and all right-hand sides associated with w are listed together, separated by the symbol ∣.
• Step 2: The relational symbol ↦ is replaced by the symbol ::=.
• Step 3: The non-terminal symbols, wherever they occur, are enclosed in pointed brackets < >.
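The three steps can be sketched as a small formatting routine. The function name `to_bnf` is hypothetical, the ASCII character `|` stands in for the separator ∣, and the grammar used is the one with productions v₀ ↦ aw, w ↦ bbw, w ↦ c.

```python
def to_bnf(productions, nonterminals):
    """Render type 2 productions (lhs symbol, rhs tuple) in BNF, one line per left side."""
    grouped = {}
    for lhs, rhs in productions:
        # Step 1: collect all right-hand sides that share a left-hand symbol
        grouped.setdefault(lhs, []).append(rhs)

    def show(sym):
        # Step 3: enclose non-terminal symbols in pointed brackets
        return f"<{sym}>" if sym in nonterminals else sym

    lines = []
    for lhs, alternatives in grouped.items():
        body = " | ".join(" ".join(show(s) for s in rhs) for rhs in alternatives)
        # Step 2: write ::= in place of the relation symbol
        lines.append(f"{show(lhs)} ::= {body}")
    return lines

productions = [("v0", ("a", "w")), ("w", ("b", "b", "w")), ("w", ("c",))]
for line in to_bnf(productions, {"v0", "w"}):
    print(line)
```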

• Eg: The following grammar describes the syntax of decimal numbers and can be viewed as a mini-grammar whose language consists precisely of all properly formed decimal numbers. Let S = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, .}. Let V be the union of S with the set N = {decimal-number, decimal-fraction, unsigned-integer, digit}. Let G be the grammar with symbol sets V and S, with starting symbol "decimal-number", and with productions given in BNF as follows.

1. <decimal-number> ::= <unsigned-integer> ∣ <decimal-fraction> ∣ <unsigned-integer> <decimal-fraction>
2. <decimal-fraction> ::= .<unsigned-integer>
3. <unsigned-integer> ::= <digit> ∣ <digit> <unsigned-integer>
4. <digit> ::= 0 ∣ 1 ∣ 2 ∣ 3 ∣ 4 ∣ 5 ∣ 6 ∣ 7 ∣ 8 ∣ 9
Example: Using the above grammar, draw the derivation tree for 23.14.
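A recognizer for this language can follow the four productions directly, with one function per non-terminal. This is a minimal sketch; the function names are hypothetical, and the recursion in production 3 is written as a loop over consecutive digits.

```python
DIGITS = set("0123456789")   # production 4: <digit>

def unsigned_integer(s, i):
    """Production 3: consume one or more digits starting at index i.

    Returns the index just past the digits, or None if there is no digit at i.
    """
    j = i
    while j < len(s) and s[j] in DIGITS:
        j += 1
    return j if j > i else None

def decimal_fraction(s, i):
    """Production 2: a point followed by an unsigned integer."""
    if i < len(s) and s[i] == ".":
        return unsigned_integer(s, i + 1)
    return None

def is_decimal_number(s):
    """Production 1: integer, fraction, or integer followed by fraction."""
    i = unsigned_integer(s, 0)
    if i is None:                       # no integer part: must be a fraction
        return decimal_fraction(s, 0) == len(s)
    if i == len(s):                     # integer part only
        return True
    return decimal_fraction(s, i) == len(s)

for text in ["23.14", "7", ".5", "23.", "1.2.3", ""]:
    print(repr(text), is_decimal_number(text))
```

Note that strings such as "23." are rejected, since production 2 requires at least one digit after the point.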

• Syntax diagram: A second alternative method for displaying the productions of some type 2 grammars is the syntax diagram.

Finite State Machines
• Suppose that we have a finite set S = {s₀, s₁, …, sₙ}, a finite set I, and for each x ∈ I a function fₓ: S → S. Let F = {fₓ | x ∈ I}. The triple (S, I, F) is called a finite state machine; S is called the state set of the machine and the elements of S are called states. The set I is called the input set of the machine. Thus, if the machine is in state sᵢ and input x occurs, the next state of the machine will be fₓ(sᵢ).
• Since the next state fₓ(sᵢ) is uniquely determined by the pair (sᵢ, x), there is a function F: S × I → S given by F(sᵢ, x) = fₓ(sᵢ).
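The two views of the transitions, a family of functions fₓ and a single function F on pairs, can be sketched with a lookup table. The machine below, with two states and inputs 0 and 1, is a hypothetical example and not one taken from the slides.

```python
# Hypothetical machine: S = {s0, s1}, I = {0, 1}.
# F maps each pair (state, input) to the next state.
F = {
    ("s0", "0"): "s0", ("s0", "1"): "s1",
    ("s1", "0"): "s1", ("s1", "1"): "s0",
}

def f(x):
    """Return the state-transition function f_x: S -> S for input x."""
    return lambda s: F[(s, x)]

def run(state, inputs):
    """Apply f_x for each input symbol in turn and return the final state."""
    for x in inputs:
        state = f(x)(state)
    return state

print(f("1")("s0"))          # next state from s0 on input 1
print(run("s0", "1101"))     # state reached after the input sequence 1101
```

In this example, input 1 switches the state and input 0 leaves it unchanged, so the final state records whether an odd number of 1's has been read.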

Moore Machine (Recognition Machine)
• A Moore machine M is defined as a sequence (S, I, F, s₀, T), where (S, I, F) constitutes a finite state machine, s₀ ∈ S, and T ⊆ S. The state s₀ is called the starting state of M, and it represents the condition of the machine before it receives any input. The set T is called the set of acceptance states of M. These states are used in language recognition.
• Note: In the digraph of a Moore machine, the acceptance states are indicated with two concentric circles.
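Language recognition with such a machine can be sketched as follows: start in s₀, follow the transitions for each input symbol, and accept the string if the final state lies in T. The class name `MooreMachine`, its field names, and the example machine (which accepts strings of 0's and 1's containing an odd number of 1's) are assumptions for illustration.

```python
from typing import NamedTuple

class MooreMachine(NamedTuple):
    states: frozenset      # S
    inputs: frozenset      # I
    F: dict                # (state, input) -> next state
    start: str             # s0
    accepting: frozenset   # T

    def accepts(self, word):
        """Return True if reading word from the start state ends in a state of T."""
        state = self.start
        for x in word:
            if x not in self.inputs:
                return False           # symbol outside the input set
            state = self.F[(state, x)]
        return state in self.accepting

# Hypothetical machine: s1 means "an odd number of 1's has been read"
M = MooreMachine(
    states=frozenset({"s0", "s1"}),
    inputs=frozenset({"0", "1"}),
    F={("s0", "0"): "s0", ("s0", "1"): "s1",
       ("s1", "0"): "s1", ("s1", "1"): "s0"},
    start="s0",
    accepting=frozenset({"s1"}),
)

for word in ["1101", "11", "0010", ""]:
    print(repr(word), M.accepts(word))
```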