Recursive Definitions & Regular Expressions (RE)

Recursive Language Definition
A recursive definition is characteristically a three-step process:
1. First, we specify some basic objects in the set. The number of basic objects specified must be finite.
2. Second, we give a finite number of rules for constructing more objects in the set from the ones we already know.
3. Third, we declare that no objects except those constructed in this way are allowed in the set.

Example: Consider the set P-EVEN, which is the set of positive even numbers.
We can define the set P-EVEN in several different ways:
• We can define P-EVEN to be the set of all positive integers that are evenly divisible by 2.
• P-EVEN is the set of all 2n, where n = 1, 2, …
• P-EVEN is defined by these three rules:
Rule 1: 2 is in P-EVEN.
Rule 2: If x is in P-EVEN, then so is x + 2.
Rule 3: The only elements in the set P-EVEN are those that can be produced from the two rules above.
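
The three rules for P-EVEN can be sketched in Python. This is only an illustration: the names p_even and in_p_even are hypothetical and not part of the slides. The generator applies Rules 1 and 2, and the membership test reflects Rule 3 by undoing Rule 2 until it either reaches the basic object 2 or falls below it.

```python
from itertools import islice

def p_even():
    """Yield the elements of P-EVEN in the order the rules produce them."""
    x = 2              # Rule 1: 2 is in P-EVEN
    while True:
        yield x
        x = x + 2      # Rule 2: if x is in P-EVEN, then so is x + 2

def in_p_even(n: int) -> bool:
    """Rule 3: n belongs only if undoing Rule 2 repeatedly reaches 2."""
    while n > 2:
        n -= 2
    return n == 2

first_five = list(islice(p_even(), 5))
print(first_five)
print(in_p_even(14), in_p_even(7))
```

Note that showing 14 is in P-EVEN this way takes six applications of Rule 2, which is why the closed form 2n is often more convenient for numbers.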

Example: Let PALINDROME be the set of all strings over the alphabet Σ = {a, b} that are the same spelled forward as backwards; i.e.,
PALINDROME = {w : w = reverse(w)} = {Λ, a, b, aa, bb, aaa, aba, bab, bbb, aaaa, abba, …}.

Recursive Definition of PALINDROME
A recursive definition for PALINDROME is as follows:
Rule 1: Λ, a, and b are in PALINDROME.
Rule 2: If w ∈ PALINDROME, then so are awa and bwb.
Rule 3: No other string is in PALINDROME unless it can be produced by rules 1 and 2.
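
As a rough sketch, the recursive definition can be read backwards as a membership test. The function name in_palindrome is hypothetical, and Λ is represented by Python's empty string.

```python
def in_palindrome(w: str) -> bool:
    """Decide membership in PALINDROME over {a, b} using the three rules."""
    # Rule 1: Λ (the empty string), a, and b are in PALINDROME.
    if w in ("", "a", "b"):
        return True
    # Rule 2, read backwards: w must have the form awa or bwb,
    # where the inner string is itself in PALINDROME.
    if len(w) >= 2 and w[0] == w[-1] and w[0] in "ab":
        return in_palindrome(w[1:-1])
    # Rule 3: nothing else is in PALINDROME.
    return False

print(in_palindrome("abba"), in_palindrome("abab"))
```

For strings over {a, b} this agrees with the direct test w == w[::-1]; the recursive version simply mirrors the rules one step at a time.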

Arithmetic Expressions (AE)
• We recursively define AE using the following rules.
• What are the rules?

Recursive Definition of AE
Rule 1: Any number (positive, negative, or zero) is in AE.
Rule 2: If x is in AE, then so are
(i) (x)
(ii) -x (provided that x does not already start with a minus sign)
Rule 3: If x and y are in AE, then so are
(i) x + y (if the first symbol in y is not + or -)
(ii) x - y (if the first symbol in y is not + or -)
(iii) x * y
(iv) x / y
(v) x ** y (our notation for exponentiation)
Theory of Automata

• The above definition is the most natural, because it is the method we use to recognize valid arithmetic expressions in real life.
• For instance, suppose we wish to determine whether the following expression is valid:
(2 + 4) * (7 * (9 - 3)/4)/4 * (2 + 8) - 1
We do not really scan over the string looking for forbidden substrings or counting the parentheses. We actually imagine the expression in our mind broken down into components:
Is (2 + 4) OK? Yes
Is (9 - 3) OK? Yes

Arithmetic Expression AE
• Obviously, the following expressions are not valid:
(3 + 5) + 6)
2(/8 + 9)
(3 + (4 -)8)
• The first contains unbalanced parentheses; the second contains the forbidden substring (/; the third contains the forbidden substring -).
• Are there more rules? The substrings // and */ are also forbidden. Are there still more?
• The most natural way of defining a valid AE is by using a recursive definition, rather than a long list of forbidden substrings.
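
The recursive definition of AE translates fairly directly into a membership test. The sketch below makes several assumptions that are not in the slides: a "number" is taken to be an optionally negative decimal literal, spaces are ignored, and the names is_ae and is_valid are hypothetical. It tries every way a string could have been produced by Rules 1 to 3 and caches the results.

```python
import re
from functools import lru_cache

NUMBER = re.compile(r"-?\d+(\.\d+)?")      # assumption: what counts as a number
OPERATORS = ("**", "+", "-", "*", "/")

@lru_cache(maxsize=None)
def is_ae(s: str) -> bool:
    # Rule 1: any number is in AE.
    if NUMBER.fullmatch(s):
        return True
    # Rule 2(i): (x) is in AE if x is.
    if len(s) >= 3 and s[0] == "(" and s[-1] == ")" and is_ae(s[1:-1]):
        return True
    # Rule 2(ii): -x is in AE if x is and x does not start with a minus sign.
    if s.startswith("-") and not s.startswith("--") and is_ae(s[1:]):
        return True
    # Rule 3: x op y, trying every position where an operator occurs.
    for i in range(1, len(s)):
        for op in OPERATORS:
            if s.startswith(op, i):
                x, y = s[:i], s[i + len(op):]
                # For + and -, y must not begin with + or -.
                if op in ("+", "-") and y[:1] in ("+", "-"):
                    continue
                if y and is_ae(x) and is_ae(y):
                    return True
    return False

def is_valid(expr: str) -> bool:
    """Ignore spaces, then apply the recursive definition."""
    return is_ae(expr.replace(" ", ""))

print(is_valid("(2 + 4) * (7 * (9 - 3)/4)/4 * (2 + 8) - 1"))
print(is_valid("(3 + 5) + 6)"), is_valid("2(/8 + 9)"), is_valid("(3 + (4 -)8)"))
```

No list of forbidden substrings appears anywhere in the code; the three invalid expressions are rejected simply because no sequence of rule applications produces them.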

Regular Expressions

Defining Languages Using Regular Expressions
• Previously, we defined the languages:
• L1 = {x^n for n = 1, 2, 3, …}
• L2 = {x, xxx, xxxxx, …}
• But these are not very precise ways of defining languages.
• So we now want to be very precise about how we define languages, and we will do this using regular expressions.

Regular Expressions
• Regular expressions are written in boldface letters and are a way of specifying a language.
• They are a formal way to define the lexical specification of a language.
• They remove ambiguity altogether.
• They are called expressions on account of their similarity with arithmetic expressions.
• They use *, + and ():
* shows repetition
+ represents choice or disjunction
() is used for grouping

Language-Defining Symbols
• We now introduce the use of the Kleene star, applied not to a set, but directly to the letter x and written as a superscript: x*.
• This simple expression indicates some sequence of x’s (maybe none at all):
x* = Λ or x or x^2 or x^3 … = x^n for some n = 0, 1, 2, 3, …
• The letter x is intentionally written in boldface type to distinguish it from an alphabet character.
• We can think of the star as an unknown power. That is, x* stands for a string of x’s, but we do not specify how many, and it may be the null string.

• The notation x* can be used to define languages by writing, say, L4 = language(x*).
• Since x* is any string of x’s, L4 is then the language of all possible strings of x’s of any length (including Λ).
• We should not confuse x* (which is a language-defining symbol) with L4 (which is the name we have given to a certain language).

• Given the alphabet Σ = {a, b}, suppose we wish to define the language L that contains all words of the form one a followed by some number of b’s (maybe no b’s at all); that is,
L = {a, ab, abb, abbb, abbbb, …}
• Using the language-defining symbol, we may write L = language(ab*).
• This equation means that L is the language in which the words are the concatenation of an initial a with some or no b’s.
• From now on, for convenience, we will simply say some b’s to mean some or no b’s. When we want to mean some positive number of b’s, we will explicitly say so.

• We can apply the Kleene star to the whole string ab if we want:
(ab)* = Λ or ab or abab or ababab …
• Observe that (ab)* ≠ a*b*, because the language defined by the expression on the left contains the word abab, whereas the language defined by the expression on the right does not.
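
The difference between (ab)* and a*b* can be checked with Python's re module, used here only as a convenient stand-in for the slide notation. Concatenation, parentheses and * are written the same way in both; the variable names are hypothetical.

```python
import re

star_of_group = re.compile(r"(ab)*")   # (ab)*: the star applies to the whole string ab
star_of_each = re.compile(r"a*b*")     # a*b*: some a's followed by some b's

word = "abab"
# fullmatch succeeds only if the entire word is described by the expression.
in_left = star_of_group.fullmatch(word) is not None
in_right = star_of_each.fullmatch(word) is not None
print(in_left, in_right)
```

One caveat when moving between notations: in Python's syntax a trailing + means "one or more" (the exponent use of +), while the choice operator written + on the slides is written | in Python.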

• If we want to define the language L1 = {x, xx, xxx, …} using the language-defining symbol, we can write
L1 = language(xx*)
which means that each word of L1 must start with an x followed by some (or no) x’s.
• Note that we can also define L1 using the notation + (as an exponent) introduced in Chapter 2:
L1 = language(x^+)
which means that each word of L1 is a string of some positive number of x’s.

Plus Sign
• Let us introduce another use of the plus sign. By the expression x + y, where x and y are strings of characters from an alphabet, we mean either x or y.
• Care should be taken so as not to confuse this notation with the notation + (as an exponent).

Example
• Consider the language T over the alphabet Σ = {a, b, c}:
T = {a, c, ab, cb, abb, cbb, abbb, cbbb, abbbb, cbbbb, …}
• In other words, all the words in T begin with either an a or a c and then are followed by some number of b’s.
• Using the above plus sign notation, we may write this as
T = language((a + c)b*)

Example
• Consider a finite language L that contains all the strings of a’s and b’s of length exactly three:
L = {aaa, aab, aba, abb, baa, bab, bba, bbb}
• Thus, we may write
L = language((a + b)(a + b)(a + b))
or, for short,
L = language((a + b)^3)
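
A small sketch of why three factors of (a + b) give exactly the eight words of length three: each factor is an independent choice of one letter. The helper name words_of_length is hypothetical.

```python
from itertools import product

def words_of_length(alphabet, n):
    """All strings obtained by making n independent choices from the alphabet."""
    return ["".join(letters) for letters in product(alphabet, repeat=n)]

L = words_of_length("ab", 3)
print(L)
print(len(L))
```

In general an alphabet of k letters gives k^n words of length n, so (a + b)^3 describes 2^3 = 8 words.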

Example
• In general, if we want to refer to the set of all possible strings of a’s and b’s of any length whatsoever, we could write
language((a + b)*)
• This is the set of all possible strings of letters from the alphabet Σ = {a, b}, including the null string.
• This is powerful notation. For instance, we can describe all the words that begin with an a, followed by anything (i.e., as many choices as we want of either a or b), as
a(a + b)*

Regular Expressions over Σ = {a, b}
• a* = {Λ, a, aa, aaa, aaaa, aaaaa, …}
• ab* = {a, ab, abb, abbb, abbbb, …}
• a + b = {a, b}
• (ab)* = {Λ, ab, abab, ababab, …}
• (a + b)* = {Λ and any string of a’s and b’s}

Regular Expressions
• The symbols that appear in regular expressions are the letters of the alphabet Σ, the symbol for Λ, parentheses, the star operator, and the plus sign.

Formal Definition of Regular Expressions
The set of regular expressions is defined by the following rules:
1. Every letter of Σ is a regular expression, and so is Λ.
2. If r1 and r2 are regular expressions, then so are
• (r1)
• r1r2
• r1 + r2
• r1*
3. Nothing else is a regular expression.

Regular Expressions
Are the following regular expressions? If so, what languages do they generate?
• a(b + a)*
• bb(a + b)
• (a + b)(a + b)
• (a + b)*ba
• (a + b)*a(a + b)*
• (a + b)*aa(a + b)*

Regular Expressions
Write REs for the following languages:
• All words ending with b
• All words that start with a double letter
• All words that contain at least one double letter
• All words that start and end with a double letter
• All words of length >= 3
• All words that contain exactly one a or exactly one b
• All words that do not end in ba

Language (Set) Operations
If L1 and L2 are two languages (sets of words):
• L1L2 is the product set that contains all combinations of a string from L1 concatenated with a string from L2.
• L1 + L2 is the union set (equivalently L1 ∪ L2) containing all words of L1 and L2.
• Examples follow.

Product Set
• If S and T are sets of strings of letters, we define the product set of strings of letters to be
ST = {all combinations of a string from S concatenated with a string from T in that order}

Example
• If S = {a, aaa} and T = {bb, bbb}, then
ST = {abb, abbb, aaabb, aaabbb}
• Using regular expressions, we can write this example as
(a + aaa)(bb + bbb) = abb + abbb + aaabb + aaabbb
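
For finite sets the product set can be computed directly, which is a handy way to check examples like this one by machine. A minimal sketch, with product_set as a hypothetical helper name:

```python
def product_set(S, T):
    """Every string of S concatenated with every string of T, in that order."""
    return {s + t for s in S for t in T}

S = {"a", "aaa"}
T = {"bb", "bbb"}
ST = product_set(S, T)
TS = product_set(T, S)
print(sorted(ST))
print(ST == TS)
```

The words "in that order" matter: TS consists of words that start with b’s, so it is a different language from ST.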

Example
• If M = {Λ, x, xx} and N = {Λ, y, yy, yyy, yyyy, …}, then
MN = {Λ, y, yy, yyy, …, x, xy, xyy, xyyy, …, xx, xxy, xxyy, xxyyy, …}
• Using regular expressions:
(Λ + x + xx)(y*) = y* + xy* + xxy*

Regular Languages
• The languages defined by regular expressions are called regular languages.
• Alternatively: any language that can be represented by a regular expression is a regular language.

Languages Associated with Regular Expressions

Definition
The following rules define the language associated with any regular expression:
Rule 1: The language associated with the regular expression that is just a single letter is that one-letter word alone, and the language associated with Λ is just {Λ}, a one-word language.
Rule 2: If r1 is a regular expression associated with the language L1 and r2 is a regular expression associated with the language L2, then:
(i) The regular expression (r1)(r2) is associated with the product L1L2, that is, the language L1 times the language L2:
language(r1r2) = L1L2

Definition contd.
Rule 2 (cont.):
(ii) The regular expression r1 + r2 is associated with the language formed by the union of L1 and L2:
language(r1 + r2) = L1 + L2
(iii) The language associated with the regular expression (r1)* is L1*, the Kleene closure of the set L1 as a set of words:
language(r1*) = L1*
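
The rules above can be sketched as a recursive function that computes the language of an expression, cut off at a maximum word length so that the result stays finite. Everything about the encoding is an assumption made for illustration: expressions are nested tuples tagged "sym", "lam", "cat", "alt" and "star", Λ is the empty string, and the function name language is borrowed from the slide notation.

```python
def language(r, max_len):
    """Words of length <= max_len in the language associated with r.

    r uses a hypothetical tuple encoding:
    ("sym", "a"), ("lam",), ("cat", r1, r2), ("alt", r1, r2), ("star", r1).
    """
    kind = r[0]
    if kind == "sym":        # Rule 1: a single letter gives a one-word language
        return {r[1]} if max_len >= 1 else set()
    if kind == "lam":        # Rule 1: Λ gives the language {Λ}
        return {""}
    if kind == "cat":        # Rule 2(i): product L1 L2
        L1, L2 = language(r[1], max_len), language(r[2], max_len)
        return {u + v for u in L1 for v in L2 if len(u + v) <= max_len}
    if kind == "alt":        # Rule 2(ii): union of L1 and L2
        return language(r[1], max_len) | language(r[2], max_len)
    if kind == "star":       # Rule 2(iii): Kleene closure of L1
        L1 = language(r[1], max_len)
        result, frontier = {""}, {""}
        while frontier:      # keep appending words of L1 until nothing new fits
            frontier = {u + v for u in frontier for v in L1
                        if len(u + v) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node {kind!r}")

a, b = ("sym", "a"), ("sym", "b")
ab_star = ("cat", a, ("star", b))          # ab*
any_string = ("star", ("alt", a, b))       # (a + b)*
print(sorted(language(ab_star, 3)))
print(sorted(language(any_string, 2)))
```

The length bound is the design choice that makes the star case terminate; without it the Kleene closure of any language containing a nonempty word is infinite.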

Languages Associated with REs
• r1 = a, r1 = b, r1 = Λ
• If L1 is associated with r1 and L2 is associated with r2:
• language(r1r2) = L1L2
• language(r1 + r2) = L1 + L2 = L1 ∪ L2
• language(r1*) = L1* (the Kleene closure of L1)

Regular Languages
• How can we tell whether a language is regular? Try to define an RE for it: if that is possible, the language is regular; otherwise it is non-regular.
• Definition: The language generated by any regular expression is called a regular language.
• Note that if r1 and r2 are regular expressions corresponding to the languages L1 and L2, then the languages generated by r1 + r2, r1r2 (or r2r1) and r1* (or r2*) are also regular languages.

Regular Languages
• Example: Consider the language L, defined over Σ = {a, b}, of strings of length 2 starting with a. Then L = {aa, ab} may be expressed by the regular expression aa + ab. Hence L, by definition, is a regular language.

Regular Languages
• All finite languages are regular.
• Example: Consider the language L, defined over Σ = {a, b}, of strings of length 2 starting with a. Then L = {aa, ab} may be expressed by the regular expression aa + ab. Hence L, by definition, is a regular language.

Theorem
If L is a finite language (a language with only finitely many words), then L can be defined by a regular expression. In other words, all finite languages are regular.
Proof
• Let L be a finite language. To make one regular expression that defines L, we turn all the words in L into boldface type and insert plus signs between them.
• For example, the regular expression that defines the language L = {baa, abbba, bababa} is
baa + abbba + bababa
• This algorithm only works for finite languages because an infinite language would become a regular expression that is infinitely long, which is forbidden.
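
The construction in the proof is short enough to write down. The sketch below assumes a non-empty list of words; slide_expression and python_pattern are hypothetical names, and the second function simply restates the same idea in Python's re syntax, where | plays the role of the plus sign.

```python
import re

def slide_expression(words):
    """Insert plus signs between the words, writing the empty word as Λ."""
    return " + ".join(w if w else "Λ" for w in words)

def python_pattern(words):
    """The same expression in Python's syntax (assumes at least one word)."""
    return re.compile("|".join(re.escape(w) for w in words))

L = ["baa", "abbba", "bababa"]
expr = slide_expression(L)
pattern = python_pattern(L)
print(expr)
print(pattern.fullmatch("abbba") is not None, pattern.fullmatch("ab") is not None)
```

The expression grows with the number of words, which is exactly why the argument stops working for an infinite language.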

More REs
• EVEN-EVEN (Σ = {a, b})
• The language of all words having an even number of a’s and an even number of b’s
• Partitions/sets:
• Even a’s, even b’s (valid)
• Even a’s, odd b’s (need to adjust b’s)
• Odd a’s, odd b’s (need to adjust a’s and b’s)
• Odd a’s, even b’s (need to adjust a’s)

Regular Expressions
• EVEN-EVEN
• RE (Σ = {a, b}) sets:
• (aa + bb)*
• (ab + ba)*
• (aa + bb + (ab + ba)(aa + bb)*(ab + ba))*
• The last expression represents all the words that are made up of:
type 1 = aa
type 2 = bb
type 3 = (ab + ba)(aa + bb)*(ab + ba)
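
One way to gain confidence in the EVEN-EVEN expression is to compare it against a direct count of letters on every short word. This is a brute-force check up to a chosen length, not a proof; the expression is transcribed into Python's re syntax (| for +), and the function names are hypothetical.

```python
import re
from itertools import product

# (aa + bb + (ab + ba)(aa + bb)*(ab + ba))* in Python's syntax
EVEN_EVEN = re.compile(r"(aa|bb|(ab|ba)(aa|bb)*(ab|ba))*")

def has_even_as_and_bs(w: str) -> bool:
    """The defining property: an even number of a's and of b's."""
    return w.count("a") % 2 == 0 and w.count("b") % 2 == 0

def agree_up_to(n: int) -> bool:
    """True if the expression and the property agree on all words of length <= n."""
    for k in range(n + 1):
        for letters in product("ab", repeat=k):
            w = "".join(letters)
            if (EVEN_EVEN.fullmatch(w) is not None) != has_even_as_and_bs(w):
                return False
    return True

print(agree_up_to(8))
```

Reading a word two letters at a time explains the three types: aa and bb keep both parities unchanged, while ab or ba flips both, so a second ab or ba (type 3) is needed to flip them back.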

Equivalent Regular Expressions
• Definition: Two regular expressions are said to be equivalent if they generate the same language.
• Example: Consider the following regular expressions:
r1 = (a + b)*(aa + bb)
r2 = (a + b)*aa + (a + b)*bb
Both regular expressions define the language of strings ending in aa or bb.
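
Equivalence is a statement about infinitely many words, so it cannot be settled by testing alone, but comparing two expressions on all words up to some length is a quick way to catch a wrong guess. A sketch in Python's re syntax (| for +), with equivalent_up_to as a hypothetical helper:

```python
import re
from itertools import product

def equivalent_up_to(p1: str, p2: str, n: int, alphabet: str = "ab") -> bool:
    """True if both patterns accept exactly the same words of length <= n."""
    r1, r2 = re.compile(p1), re.compile(p2)
    for k in range(n + 1):
        for letters in product(alphabet, repeat=k):
            w = "".join(letters)
            if (r1.fullmatch(w) is None) != (r2.fullmatch(w) is None):
                return False
    return True

r1 = r"(a|b)*(aa|bb)"          # (a + b)*(aa + bb)
r2 = r"(a|b)*aa|(a|b)*bb"      # (a + b)*aa + (a + b)*bb
same = equivalent_up_to(r1, r2, 8)
different = equivalent_up_to(r"(ab)*", r"a*b*", 4)
print(same, different)
```

A result of False is conclusive, since it exhibits a word on which the expressions disagree; a result of True only says that no disagreement exists among the words tested.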

Equivalent Regular Expressions
Note: If r1 = (aa + bb) and r2 = (a + b), then
• r1 + r2 = (aa + bb) + (a + b)
• r1r2 = (aa + bb)(a + b) = (aaa + aab + bba + bbb)
• (r1)* = (aa + bb)*

Example
• Consider the language defined by the expression
(a + b)*a(a + b)*
• At the beginning of any word in this language we have (a + b)*, which is any string of a’s and b’s; then comes an a; then comes another arbitrary string.
• For example, the word abbaab can be considered to come from this expression by 3 different choices:
(Λ)a(bbaab) or (abb)a(ab) or (abba)a(b)

Example contd.
• This language is the set of all words over the alphabet Σ = {a, b} that have at least one a.
• The only words left out are those that have only b’s and the word Λ. These left-out words are exactly the language defined by the expression b*.
• If we combine these two languages, we should obtain the language of all strings over the alphabet Σ = {a, b}. That is,
(a + b)* = (a + b)*a(a + b)* + b*

Example
• The language of all words that have at least two a’s can be defined by the expression:
(a + b)*a(a + b)*a(a + b)*
• Another expression that defines all the words with at least two a’s is
b*ab*a(a + b)*
• Hence, we can write
(a + b)*a(a + b)*a(a + b)* = b*ab*a(a + b)*
where by the equal sign we mean that these two expressions are equivalent in the sense that they describe the same language.

Example
• The language of all words that have at least one a and at least one b is somewhat trickier. If we write
(a + b)*a(a + b)*b(a + b)*
then we are requiring that an a must precede a b in the word. Such words as ba and bbaaaa are not included in this language.
• Since we know that either the a comes before the b or the b comes before the a, we can define the language by the expression
(a + b)*a(a + b)*b(a + b)* + (a + b)*b(a + b)*a(a + b)*
• Note that the only words that are omitted by the first term (a + b)*a(a + b)*b(a + b)* are the words of the form some b’s followed by some a’s. They are defined by the expression bb*aa*.

Example
• We can add these specific exceptions. So, the language of all words over the alphabet Σ = {a, b} that contain at least one a and at least one b is defined by the expression:
(a + b)*a(a + b)*b(a + b)* + bb*aa*
• Thus, we have proved that
(a + b)*a(a + b)*b(a + b)* + (a + b)*b(a + b)*a(a + b)* = (a + b)*a(a + b)*b(a + b)* + bb*aa*
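
Both sides of this equivalence are meant to describe "at least one a and at least one b", so each can be compared with that property directly on all short words. This is a finite spot check under stated assumptions rather than a proof: the expressions are transcribed into Python's re syntax (| for +), and the names are hypothetical.

```python
import re
from itertools import product

ANY = r"(a|b)*"
# (a + b)*a(a + b)*b(a + b)* + (a + b)*b(a + b)*a(a + b)*
LONG_FORM = re.compile(f"{ANY}a{ANY}b{ANY}|{ANY}b{ANY}a{ANY}")
# (a + b)*a(a + b)*b(a + b)* + bb*aa*
SHORT_FORM = re.compile(f"{ANY}a{ANY}b{ANY}|bb*aa*")

def has_both_letters(w: str) -> bool:
    return "a" in w and "b" in w

def all_agree(n: int) -> bool:
    """Check both expressions against the property on every word of length <= n."""
    for k in range(n + 1):
        for letters in product("ab", repeat=k):
            w = "".join(letters)
            expected = has_both_letters(w)
            if (LONG_FORM.fullmatch(w) is not None) != expected:
                return False
            if (SHORT_FORM.fullmatch(w) is not None) != expected:
                return False
    return True

print(all_agree(7))
```

The word ba is a useful single test case: the first term alone rejects it, and it is picked up by bb*aa* in the short form and by the second term in the long form.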

Example
• In the above example, the language of all words that contain both an a and a b is defined by the expression
(a + b)*a(a + b)*b(a + b)* + bb*aa*
• The only words that do not contain both an a and a b are the words of all a’s, all b’s, or Λ.
• When these are included, we get everything. Hence, the expression
(a + b)*a(a + b)*b(a + b)* + bb*aa* + a* + b*
defines all possible strings of a’s and b’s, including Λ (accounted for in both a* and b*).

• Thus
(a + b)* = (a + b)*a(a + b)*b(a + b)* + bb*aa* + a* + b*

Example
• The following equivalences show that we should not treat expressions as algebraic polynomials:
(a + b)* = (a + b)* + a*
(a + b)* = a(a + b)* + b(a + b)* + Λ
(a + b)* = (a + b)*ab(a + b)* + b*a*
• The last equivalence may need some explanation:
• The first term on the right-hand side, (a + b)*ab(a + b)*, describes all the words that contain the substring ab.
• The second term, b*a*, describes all the words that do not contain the substring ab (i.e., all a’s, all b’s, Λ, or some b’s followed by some a’s).

Example
• Let V be the language of all strings of a’s and b’s in which either the strings are all b’s, or else an a followed by some b’s. Let V also contain the word Λ. Hence,
V = {Λ, a, b, ab, bb, abb, bbb, abbb, bbbb, …}
• We can define V by the expression
b* + ab*
where Λ is included in b*.
• Alternatively, we could define V by
(Λ + a)b*
which means that in front of the string of some b’s, we have either an a or nothing.

Example contd.
• Hence,
(Λ + a)b* = b* + ab*
• Since b* = Λb*, we have
(Λ + a)b* = Λb* + ab* = b* + ab*
which appears to be the distributive law at work.

Product Set
• If S and T are sets of strings of letters (whether they are finite or infinite sets), we define the product set of strings of letters to be
ST = {all combinations of a string from S concatenated with a string from T in that order}
