Mathematical Foundations of Computer Science Chapter 3: Regular Languages and Regular Grammars
Languages
Ø A language (over an alphabet Σ) is any subset of the set of all possible strings over Σ. The set of all possible strings is written as Σ*.
Ø Example:
  • Σ = {a, b, c}
  • Σ* = {λ, a, b, c, ab, ac, ba, bc, ca, aaa, …}
  • One language might be the set of strings of length less than or equal to 2: L = {λ, a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc}
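The example language above can be enumerated mechanically. The following is a minimal Python sketch, not part of the original slides; the helper name `strings_up_to` is made up for illustration, and the empty Python string `""` stands in for λ.

```python
from itertools import product

def strings_up_to(alphabet, max_len):
    """Return all strings over `alphabet` of length <= max_len, shortest first."""
    result = []
    for n in range(max_len + 1):
        # product(..., repeat=n) yields every n-tuple of symbols
        for letters in product(sorted(alphabet), repeat=n):
            result.append("".join(letters))
    return result

sigma = {"a", "b", "c"}
L = strings_up_to(sigma, 2)   # "" plays the role of the empty string λ
print(len(L))                 # 1 + 3 + 9 strings
print(L)
```

Σ* itself is infinite, so only a length-bounded slice of it can ever be listed this way.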
Regular Languages
Ø A regular language (over an alphabet Σ) is any language for which there exists a finite automaton that recognizes it.
Mathematical Models of Computation
Ø This course studies a variety of mathematical models corresponding to notions of computation.
Ø The finite automaton was our first example.
Ø The finite automaton is an example of an automaton model.
Ø There are other models as well.
Mathematical Models of Computation
Ø Another important model is that of a grammar.
Ø We will shortly look at regular grammars.
Ø But first, a digression:
Regular Expressions
Ø A regular expression is a mathematical model for describing a particular type of language.
Ø Regular expressions are kind of like arithmetic expressions.
Ø The regular expression is defined recursively.
Regular Expressions
Ø Given an alphabet Σ:
  • ∅ (the empty set), λ (the empty string), and each a ∈ Σ are all regular expressions.
  • If r₁ and r₂ are regular expressions, then so are r₁ + r₂, r₁ · r₂, r₁*, and (r₁).
    Note: we usually write r₁ · r₂ as r₁r₂.
  • These are the only things that are regular expressions.
Regular Expressions
Ø Meaning:
  • ∅ represents the empty language
  • λ represents the language {λ}
  • a represents the language {a}
  • r₁ + r₂ represents the language L(r₁) ∪ L(r₂)
  • r₁r₂ represents L(r₁)L(r₂)
  • r₁* represents (L(r₁))*
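The recursive meaning above translates almost line for line into code. This is a sketch under stated assumptions, not something from the slides: the nested-tuple representation and the function name `language` are invented for illustration, and since L(r) may be infinite, only strings up to a chosen length are computed.

```python
def language(r, max_len):
    """Set of strings of length <= max_len in L(r).

    r is a nested tuple: ("empty",), ("lambda",), ("sym", a),
    ("union", r1, r2), ("concat", r1, r2) or ("star", r1).
    """
    kind = r[0]
    if kind == "empty":                      # the empty language
        return set()
    if kind == "lambda":                     # {λ}
        return {""}
    if kind == "sym":                        # {a}
        return {r[1]} if max_len >= 1 else set()
    if kind == "union":                      # L(r1) ∪ L(r2)
        return language(r[1], max_len) | language(r[2], max_len)
    if kind == "concat":                     # L(r1)L(r2)
        left, right = language(r[1], max_len), language(r[2], max_len)
        return {u + v for u in left for v in right if len(u + v) <= max_len}
    if kind == "star":                       # (L(r1))*
        base = language(r[1], max_len)
        result, frontier = {""}, {""}
        while frontier:                      # keep appending until nothing new fits
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node {kind!r}")

a, b = ("sym", "a"), ("sym", "b")
example = ("concat", ("star", a), ("union", a, b))     # a*(a + b)
print(sorted(language(example, 3), key=lambda s: (len(s), s)))
```

The length bound is what keeps the star case finite; without it the loop would never run out of new strings.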
Regular Expressions
Ø Example 1:
  • What does a*(a + b) represent?
  • It represents zero or more a's followed by either an a or a b.
  • {a, b, aa, ab, aaa, aab, aaaa, aaab, …}
Regular Expressions
Ø Example 2:
  • What does (a + b)*(a + bb) represent?
  • It represents zero or more symbols, each of which can be an a or a b, followed by either a or bb.
  • {a, bb, aa, abb, ba, bbb, aaa, aabb, aba, abbb, baa, babb, bba, bbbb, …}
Regular Expressions
Ø Example 3:
  • What does (aa)*(bb)*b represent?
  • All strings over {a, b} that start with an even number of a's which are then followed by an odd number of b's.
  • It's important to understand the underlying meaning of a regular expression.
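One way to check a reading of an expression is to list its shortest strings by brute force. The sketch below is an illustration, not part of the slides; it uses Python's `re` module, where union is written `|` instead of `+`, and the helper name `matches_up_to` is made up.

```python
import re
from itertools import product

def matches_up_to(pattern, alphabet, max_len):
    """All strings over `alphabet` of length <= max_len that the pattern matches entirely."""
    compiled = re.compile(pattern)
    found = []
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            s = "".join(letters)
            if compiled.fullmatch(s):      # the whole string must match
                found.append(s)
    return found

print(matches_up_to(r"a*(a|b)", "ab", 4))        # Example 1
print(matches_up_to(r"(a|b)*(a|bb)", "ab", 3))   # Example 2
print(matches_up_to(r"(aa)*(bb)*b", "ab", 5))    # Example 3
```

Python patterns support far more than the regular expressions defined here; only `|`, `*`, concatenation, and parentheses are used so that the patterns stay within that definition.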
Regular Expressions
Ø Example 4:
  • Find a regular expression for strings of 0's and 1's which have at least one pair of consecutive 0's.
  • Each such string must have a 00 somewhere in it. It could have any string in front of it and any string after it, as long as it's there!
  • Any string is represented by (0 + 1)*
  • Answer: (0 + 1)*00(0 + 1)*
Regular Expressions
Ø Example:
  • Find a regular expression for strings of 0's and 1's which have no pairs of consecutive 0's.
    • Such a string is a repetition of pieces, each of which is either a 1 or a 0 followed by at least one 1.
    • (1 + 011*)*
    • or equivalently, (1 + 01)*
    • But the strings this expression describes can't end in a 0.
Regular Expressions
Ø Example:
  • Find a regular expression for strings of 0's and 1's which have no pairs of consecutive 0's.
    • (1 + 011*)*
    • (1 + 01)*
    • But the strings these expressions describe can't end in a 0. So we add (0 + λ) to the end to allow for this.
    • (1 + 01)*(0 + λ)
  • This is only one of many possible answers.
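Answers like these can be sanity-checked by brute force over all short binary strings. The following is an illustrative sketch, not part of the slides; it uses Python's `re` syntax (`|` for `+`, an empty alternative for λ), and the names `HAS_00`, `NO_00`, and `check` are invented. Passing such a check up to some length is evidence, not a proof.

```python
import re
from itertools import product

HAS_00 = re.compile(r"(0|1)*00(0|1)*")   # (0 + 1)*00(0 + 1)*
NO_00 = re.compile(r"(1|01)*(0|)")       # (1 + 01)*(0 + λ)

def all_binary_strings(max_len):
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

def check(max_len=10):
    """Return the first string on which an expression disagrees with its spec, else None."""
    for s in all_binary_strings(max_len):
        contains_pair = "00" in s
        if bool(HAS_00.fullmatch(s)) != contains_pair:
            return s
        if bool(NO_00.fullmatch(s)) == contains_pair:
            return s
    return None

print(check())                                   # None means no counterexample was found
print(re.fullmatch(r"(1|01)*", "10") is None)    # without (0 + λ), "10" is missed
```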
Regular Expressions
Ø Why are they called regular expressions?
Ø Because, as it turns out, the set of languages they describe is that of the regular languages.
Ø That means that regular expressions are just another model for the same thing as finite automata.
Regular Expressions
Ø Homework:
  • Chapter 3, Section 1
    • Problems 1-11, 17, 18
Regular Expressions and Regular Languages
Ø As we have said, regular expressions and finite automata are really different ways of expressing the same thing.
Ø Let's see why.
Ø Given a regular expression, how can we build an equivalent finite automaton?
Ø (We won't bother going the other way, although it can be done.)
Regular Expressions and Regular Languages
Ø Clearly there are simple finite automata corresponding to the simple regular expressions:
  • ∅
  • λ
  • a
  [Figure: one finite automaton for each of ∅, λ, and a]
Ø Note that each of these has an initial state and one accepting state.
Regular Expressions and Regular Languages Ø On the previous slide, we saw that the simplest regular expressions can be represented by a finite automaton with an initial state (duh!) and one isolated accepting state:
Regular Expressions and Regular Languages Ø We can build more complex automata for more complex regular expressions using this model:
Regular Expressions and Regular Languages
Ø Here's how we build an nfa for r₁ + r₂:
  [Figure: nfa for r₁ + r₂, combining the nfa's for r₁ and r₂ with four λ-transitions]
Regular Expressions and Regular Languages
Ø Here's how we build an nfa for r₁r₂:
  [Figure: nfa for r₁r₂, joining the nfa's for r₁ and r₂ with three λ-transitions]
Regular Expressions and Regular Languages
Ø Here's how we build an nfa for (r₁)*:
  [Figure: nfa for (r₁)*, wrapping the nfa for r₁ with λ-transitions]
Ø Note: the last state added is not in the book. For safety, I add it so that only one arc goes into the final state.
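The three constructions can be sketched in code. This is a minimal Python illustration, not the slides' or the book's exact diagrams: it uses the common Thompson-style wiring, which may differ from the figures in where the λ-arcs go, and every name in it (`NFA`, `symbol`, `union`, `concat`, `star`, `accepts`) is made up for the example. Each fragment has one start state and one accepting state, and a label of `None` stands for λ.

```python
from collections import namedtuple
from itertools import count

# A fragment: start state, accepting state, list of (source, label, target) arcs.
NFA = namedtuple("NFA", "start accept edges")
_fresh = count()                      # supplies unused state numbers

def symbol(a):                        # nfa for a single symbol a
    s, f = next(_fresh), next(_fresh)
    return NFA(s, f, [(s, a, f)])

def union(m1, m2):                    # nfa for r1 + r2
    s, f = next(_fresh), next(_fresh)
    return NFA(s, f, m1.edges + m2.edges + [
        (s, None, m1.start), (s, None, m2.start),
        (m1.accept, None, f), (m2.accept, None, f)])

def concat(m1, m2):                   # nfa for r1 r2
    s, f = next(_fresh), next(_fresh)
    return NFA(s, f, m1.edges + m2.edges + [
        (s, None, m1.start), (m1.accept, None, m2.start),
        (m2.accept, None, f)])

def star(m):                          # nfa for (r1)*
    s, f = next(_fresh), next(_fresh)
    return NFA(s, f, m.edges + [
        (s, None, m.start), (m.accept, None, f),
        (s, None, f),                 # zero repetitions
        (m.accept, None, m.start)])   # go around again

def closure(states, edges):
    """All states reachable from `states` using only λ-transitions."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(m, w):
    """Simulate the nfa on w by tracking the set of possible current states."""
    current = closure({m.start}, m.edges)
    for ch in w:
        moved = {dst for src, label, dst in m.edges
                 if src in current and label == ch}
        current = closure(moved, m.edges)
    return m.accept in current

# (a + bb)(a + b)*(bb)
m = concat(concat(union(symbol("a"), concat(symbol("b"), symbol("b"))),
                  star(union(symbol("a"), symbol("b")))),
           concat(symbol("b"), symbol("b")))
print(accepts(m, "abb"), accepts(m, "bbabbb"), accepts(m, "ab"))
```

Every call to `symbol` allocates new states, so a fragment should be built once per occurrence in the expression rather than reused.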
Building an nfa from a regular expression
Ø Example:
  • Consider the regular expression (a + bb)(a + b)*(bb)
  [Figure: nfa for (a + bb)(a + b)*(bb), built from a and b arcs joined by λ-transitions, with the annotation "sometimes we just get tired and take an obvious shortcut"]
Building regular expression from a finite automaton
Ø The book goes on to show that it works the other way around as well: we can find a corresponding regular expression for any finite automaton.
Ø It's fairly easy in some cases and you can "just do it."
Ø However, it's generally complicated and not worth the bother of studying.
Ø You are not responsible for this material.
Building regular expression from a finite automaton
  [Figure: finite automaton with arcs labeled a; a, b; and c]
Ø The above automaton clearly corresponds to a*(a + b)c*
Regular Expressions and nfa's
Ø Homework:
  • Chapter 3, Section 2
    • Problems 1-5
Regular Grammars
Ø Review: A grammar is a quadruple G = (V, T, S, P) where
  • V is a finite set of variables
  • T is a finite set of symbols, called terminals
  • S is in V and is called the start symbol
  • P is a finite set of productions, which are rules of the form α → β, where α and β are strings consisting of terminals and variables.
Regular Grammars
Ø A grammar is said to be right-linear if every production in P is of the form
  • A → xB or
  • A → x
  where A and B are variables (perhaps the same, perhaps the start symbol S) in V and x is any string of terminal symbols (including the empty string λ)
Regular Grammars
Ø An alternate (and better) definition of a right-linear grammar says that every production in P is of the form
  • A → aB or
  • A → a or
  • S → λ (to allow λ to be in the language)
  where A and B are variables (perhaps the same, but B can't be S) in V and a is any terminal symbol
Regular Grammars
Ø I prefer the second definition (although I accept the first one, which happens to be used in the book) for two reasons:
  • It's easier to work with in proving things.
  • It's the much more common definition.
Regular Grammars
Ø A grammar is said to be left-linear if every production in P is of the form
  • A → Bx or
  • A → x
  where A and B are variables (perhaps the same, perhaps the start symbol S) in V and x is any string of terminal symbols (including the empty string λ)
Regular Grammars
Ø The alternate definition of a left-linear grammar says that every production in P is of the form
  • A → Ba or
  • A → a or
  • S → λ
  where A and B are variables (perhaps the same, but B can't be S) in V and a is any terminal symbol
Regular Grammars Ø Any left-linear or right-linear grammar is called a regular grammar.
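The first (book) definitions can be turned into a simple check. This is a hedged Python sketch, not part of the slides: the function names are made up, and it assumes variables are single uppercase letters, terminals are lowercase letters, and λ is written as the empty string.

```python
def is_right_linear(productions):
    """Every production has the form A -> xB or A -> x, x a string of terminals."""
    for lhs, rhs in productions:
        if not (len(lhs) == 1 and lhs.isupper()):
            return False
        # drop a single trailing variable, if any; the rest must be terminals
        body = rhs[:-1] if rhs and rhs[-1].isupper() else rhs
        if any(ch.isupper() for ch in body):
            return False
    return True

def is_left_linear(productions):
    """Every production has the form A -> Bx or A -> x, x a string of terminals."""
    for lhs, rhs in productions:
        if not (len(lhs) == 1 and lhs.isupper()):
            return False
        # drop a single leading variable, if any; the rest must be terminals
        body = rhs[1:] if rhs and rhs[0].isupper() else rhs
        if any(ch.isupper() for ch in body):
            return False
    return True

def is_regular(productions):
    return is_right_linear(productions) or is_left_linear(productions)

# Two sample grammars, chosen here for illustration.
g_right = [("S", "abS"), ("S", "a")]
g_mixed = [("S", "A"), ("A", "aB"), ("A", ""), ("B", "Ab")]
print(is_regular(g_right), is_regular(g_mixed))
```

The second grammar fails because A → aB fits only the right-linear form and B → Ab fits only the left-linear form, so neither test passes for the grammar as a whole.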
Regular Grammars
Ø For brevity, we often write a set of productions such as
  • A → x₁
  • A → x₂
  • A → x₃
Ø as
  • A → x₁ | x₂ | x₃
Regular Grammars
Ø A derivation in grammar G is any sequence of strings of symbols from V and T,
  • connected with ⇒
  • starting with S and ending with a string containing no variables
  • where each subsequent string is obtained by applying a production in P.
Ø S ⇒ x₁ ⇒ x₂ ⇒ x₃ ⇒ . . . ⇒ xₙ is abbreviated as S ⇒* xₙ
Regular Grammars
Ø S ⇒ x₁ ⇒ x₂ ⇒ x₃ ⇒ . . . ⇒ xₙ is abbreviated as S ⇒* xₙ
Ø We say that xₙ is a sentence of the language generated by G, L(G).
Ø We say that the other x's are sentential forms.
Regular Grammars
Ø L(G) = {w | w ∈ T* and S ⇒* w}
Ø We call L(G) the language generated by G
Ø L(G) is the set of all sentences over grammar G
Example 1
Ø S → abS | a
  is an example of a right-linear grammar.
Ø Can you figure out what language it generates?
Ø L = {w ∈ {a, b}* | w alternates a's and b's, beginning and ending with an a}
Ø L((ab)*a)
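Writing out a few derivations makes the pattern visible. The sketch below is an illustration, not part of the slides; the function name `derive` is made up, and it handles only this one grammar by applying S → abS a chosen number of times and then S → a.

```python
import re

def derive(n):
    """Derivation that uses S -> abS exactly n times, then S -> a."""
    form = "S"
    steps = [form]
    for _ in range(n):
        form = form.replace("S", "abS")   # each form contains exactly one S
        steps.append(form)
    form = form.replace("S", "a")
    steps.append(form)
    return steps

for n in range(3):
    print(" ⇒ ".join(derive(n)))

# Every sentence produced this way should match (ab)*a.
print(all(re.fullmatch(r"(ab)*a", derive(n)[-1]) for n in range(20)))
```

The last string of each derivation is a sentence; the strings before it, which still contain S, are sentential forms.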
Example 2
Ø S → Aab
  A → Aab | Ba
  B → a
  is an example of a left-linear grammar.
Ø Can you figure out what language it generates?
Ø L = {w ∈ {a, b}* | w is aa followed by at least one set of alternating ab's}
Ø L(aaab(ab)*)
Example 3
Ø Consider the grammar
  S → A
  A → aB | λ
  B → Ab
Ø This grammar is NOT regular.
Ø No "mixing and matching" left-linear and right-linear productions.
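Enumerating the short sentences of this grammar shows what the mixing buys. The following Python sketch is an illustration, not part of the slides: the name `generate` is made up, variables are assumed to be uppercase letters and terminals lowercase, λ is the empty string, and the search is only guaranteed to stop for grammars like this one whose sentential forms hold at most one variable.

```python
from collections import deque

def generate(productions, start, max_len):
    """Sentences with at most max_len terminals derivable from `start`."""
    sentences, seen = set(), {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # position of the leftmost variable, if any
        idx = next((i for i, ch in enumerate(form) if ch.isupper()), None)
        if idx is None:
            sentences.add(form)            # no variables left: a sentence
            continue
        for rhs in productions[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(1 for ch in new if ch.islower())
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sentences

example3 = {"S": ["A"], "A": ["aB", ""], "B": ["Ab"]}
print(sorted(generate(example3, "S", 6), key=len))
```

The sentences have the form aⁿbⁿ, equal numbers of a's and b's, which is the standard example of a language that no finite automaton accepts.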
Regular Grammars and nfa's
Ø It's not hard to show that regular grammars generate and nfa's accept the same class of languages: the regular languages!
Ø It's a long proof, where we must show that
  • any finite automaton has a corresponding left- or right-linear grammar, and
  • any regular grammar has a corresponding nfa.
Ø We won't bother with the details.
Regular Grammars and nfa's
Ø We get a feel for this by example.
  • Let S → aA
        A → abS | b
  [Figure: the corresponding nfa, with states S and A and arcs labeled a and b]
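The idea behind the example can be sketched in code: each variable becomes a state, a production A → xB becomes a path labeled x from A to B, and a production A → x becomes a path to an accepting state. This is a minimal Python illustration, not the slides' or the book's exact construction; the names `grammar_to_nfa`, `accepts`, and `FINAL` and the triple format for productions are made up.

```python
FINAL = "FINAL"    # the single accepting state

def grammar_to_nfa(productions):
    """Arcs of an nfa for a right-linear grammar.

    productions: list of (variable, terminal_string, next_variable_or_None).
    Returns a list of (source, label, target); label None stands for λ.
    """
    edges, fresh = [], 0
    for var, terminals, nxt in productions:
        target = FINAL if nxt is None else nxt
        if terminals == "":
            edges.append((var, None, target))
            continue
        source = var
        for ch in terminals[:-1]:          # extra states for multi-symbol strings
            middle = f"q{fresh}"
            fresh += 1
            edges.append((source, ch, middle))
            source = middle
        edges.append((source, terminals[-1], target))
    return edges

def accepts(edges, start, w):
    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for src, label, dst in edges:
                if src == q and label is None and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen
    current = closure({start})
    for ch in w:
        current = closure({dst for src, label, dst in edges
                           if src in current and label == ch})
    return FINAL in current

# S -> aA, A -> abS | b
edges = grammar_to_nfa([("S", "a", "A"), ("A", "ab", "S"), ("A", "b", None)])
print(accepts(edges, "S", "ab"), accepts(edges, "S", "aabab"), accepts(edges, "S", "aab"))
```

For this grammar the accepted strings are those of the form (aab)*ab.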
Regular Grammars and Regular Expressions
Ø Example: L(aab*a)
Ø We can easily construct a regular grammar for this expression:
  • S → aA
  • A → aB
  • B → bB
  • B → a
Regular Languages
  [Figure: diagram relating regular expressions, finite automata, and regular grammars]
Regular Languages
Ø Homework:
  • Chapter 3, Section 3
  • Problems