Lecture 1 String and Language

String
• A string is a finite sequence of symbols. For example:
  string (symbols s, t, r, i, n, g)
  CS4384 (symbols C, S, 4, 3, 8)
  101001 (symbols 1, 0)
• The symbols are drawn from an alphabet.
• An alphabet is a finite set of symbols.

Examples of Alphabet
• {a, b, c, ..., x, y, z} (Roman alphabet)
• {0, 1, ..., 9}
• {0, 1} (binary alphabet)

Length of a String
• The length of a string x, denoted by |x|, is the number of symbols in x, counted with repetition.
• For example, |string| = 6, |CS5400| = 6, |101001| = 6.
• The empty string is the string having no symbols, denoted by ε. Its length is |ε| = 0.

Equal
• Two strings x_1 x_2 ··· x_n and y_1 y_2 ··· y_m are equal if and only if (1) n = m and (2) x_i = y_i for all i = 1, ..., n.
• For example, 01 ≠ 010 and 1010 ≠ 1110.

Substring
• s is a substring of x if there exist strings y and z such that x = ysz.
• In particular, when x = sz (y = ε), s is called a prefix of x; when x = ys (z = ε), s is called a suffix of x.
• For example, CS is a prefix of CS5400, and 5400 is a suffix of CS5400.
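The definition x = ysz can be checked mechanically by trying every split of x. The following is a minimal sketch; the helper names (decompositions, is_substring, is_prefix, is_suffix) are illustrative and not part of the lecture.

```python
def decompositions(s, x):
    """Yield every pair (y, z) such that x == y + s + z."""
    for i in range(len(x) - len(s) + 1):
        if x[i:i + len(s)] == s:
            yield x[:i], x[i + len(s):]


def is_substring(s, x):
    # s is a substring of x if at least one decomposition x = y s z exists.
    return any(True for _ in decompositions(s, x))


def is_prefix(s, x):
    # Prefix: a decomposition with y equal to the empty string.
    return any(y == "" for y, _ in decompositions(s, x))


def is_suffix(s, x):
    # Suffix: a decomposition with z equal to the empty string.
    return any(z == "" for _, z in decompositions(s, x))
```

With these helpers, is_prefix("CS", "CS5400") and is_suffix("5400", "CS5400") both hold, and the empty string is a prefix and a suffix of every string.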

Concatenation
• The concatenation of two strings x and y is the string xy, i.e., x followed by y.
• For example, CS5400 is the concatenation of CS and 5400.
• In particular, we denote xx = x^2, xxx = x^3, xxxx = x^4, ..., and define x^0 = ε.
• For example, 101010 = (10)^3 and (10)^0 = ε.
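As a small illustration of the power notation, the sketch below builds x^k by repeated concatenation. The function name power is an assumption made for this example.

```python
def power(x, k):
    """Return x^k: x concatenated with itself k times (x^0 is the empty string)."""
    result = ""
    for _ in range(k):
        result = result + x
    return result
```

Here power("10", 3) gives 101010 and power("10", 0) gives the empty string, matching the slide. The length always satisfies |x^k| = k·|x|.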

Solve the equation 011x = x011
• If x = ε, then the equation holds.
• If |x| = 1, there is no solution.
• If |x| = 2, there is no solution.
• If |x| ≥ 3, then x = 011y for some string y. Hence x011 = 011y011 = 011x, so x = y011. Together with x = 011y, this gives 011y = y011, which is the same equation for the shorter string y.
• Therefore x = (011)^k for k ≥ 0.
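The result can be sanity-checked by brute force over all short binary strings. This is only a sketch: the function name solutions and the length bound are choices made for this example, and a finite search does not replace the proof.

```python
from itertools import product


def solutions(max_len):
    """Return all binary strings x with |x| <= max_len and 011x == x011."""
    found = []
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            x = "".join(symbols)
            if "011" + x == x + "011":
                found.append(x)
    return found
```

Up to length 9, the search finds exactly ε, 011, 011011, and 011011011, which agrees with x = (011)^k.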

Language
• A language is a set of strings. For example, {0, 1}, {all English words}, and {0^0, 0^1, 0^2, ...} are all languages.
• The following are operations on sets and hence also on languages:
  Union: A ∪ B
  Intersection: A ∩ B
  Difference: A \ B (written A − B when B ⊆ A)
  Complement: Ā = Σ* − A, where Σ* is the set of all strings on the alphabet Σ.

Concatenation of Languages
• Concatenation: AB = {ab | a ∈ A, b ∈ B}.
• For example, {0, 1}{1, 2} = {01, 02, 11, 12}.
• In particular, we denote A^1 = A, A^2 = AA, ..., and define A^0 = {ε}.
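For finite languages, concatenation and powers can be computed directly with Python sets. The sketch below assumes languages are represented as sets of str, with "" standing for ε; the names concat and lang_power are illustrative.

```python
def concat(A, B):
    """Return AB = {ab | a in A, b in B} for finite languages A and B."""
    return {a + b for a in A for b in B}


def lang_power(A, k):
    """Return A^k, with A^0 = {ε} (the set containing only the empty string)."""
    result = {""}
    for _ in range(k):
        result = concat(result, A)
    return result
```

For example, concat({"0", "1"}, {"1", "2"}) yields {01, 02, 11, 12}. Note that concatenating with the empty language gives the empty language, while concatenating with {ε} leaves a language unchanged.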

If AB = B for every B, then A = {ε}.
• Choose B = {ε}. Then AB = A{ε} = A, so A = B = {ε}. In other words, A is not empty and cannot contain a nonempty string.

Examples
• For Σ = {0, 1}, Σ^2 = {00, 01, 10, 11}.
• Σ^k is the set of all strings of length k on Σ.
• Therefore, Σ* = Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ ···.
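The sets Σ^k can be enumerated with itertools.product, and a finite part of Σ* is obtained by taking the union over k up to a bound. This is a sketch; the names sigma_power and sigma_star_upto and the bound n are assumptions made for the example.

```python
from itertools import product


def sigma_power(sigma, k):
    """Return Σ^k: all strings of length k over the alphabet sigma."""
    return {"".join(t) for t in product(sorted(sigma), repeat=k)}


def sigma_star_upto(sigma, n):
    """Return Σ^0 ∪ Σ^1 ∪ ... ∪ Σ^n, the strings of Σ* of length at most n."""
    result = set()
    for k in range(n + 1):
        result |= sigma_power(sigma, k)
    return result
```

For the binary alphabet, Σ^k has 2^k strings, so the strings of length at most n number 2^(n+1) − 1.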

Kleene Closure
• Kleene closure: A* = A^0 ∪ A^1 ∪ A^2 ∪ ···
• Notation: A^+ = A^1 ∪ A^2 ∪ A^3 ∪ ···

• A = {grand, ε}, B = {father, mother}. What is A*B?
• A*B = {father, mother, grandfather, grandmother, …}
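Since A* is infinite whenever A contains a nonempty string, a program can only list its strings up to some length. The sketch below does that for finite A; the names concat and star_upto and the length bound are assumptions made for this example.

```python
def concat(A, B):
    """Return AB = {ab | a in A, b in B}."""
    return {a + b for a in A for b in B}


def star_upto(A, max_len):
    """Return all strings of A* whose length is at most max_len."""
    result = {""}          # A^0 = {ε}
    frontier = {""}        # strings found in the latest round
    while frontier:
        # Extend the newest strings by one more factor from A,
        # keeping only strings that are short enough and not seen before.
        frontier = {w for w in concat(frontier, A) if len(w) <= max_len} - result
        result |= frontier
    return result


A = {"grand", ""}
B = {"father", "mother"}
a_star = star_upto(A, 10)      # ε, grand, grandgrand
a_star_b = concat(a_star, B)   # father, mother, grandfather, ...
```

The set a_star_b contains father, mother, grandfather, grandmother, grandgrandfather, and grandgrandmother, which are the first members of A*B.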

What is ∅*?
• Here ∅ is the empty language.

A* = A^+ if and only if ε is in A
• If ε is in A, then ε is in A^+. Hence A* = A^+.
• If ε is not in A, then ε is not in A^+. Hence A* ≠ A^+, because ε is always in A*.

{0, 10}* is the language of binary strings not containing the substring 11 and not ending with 1.
• What is the language of binary strings not containing the substring 11 and ending with 0?
• {0, 10}^+
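Both characterizations can be tested exhaustively on short strings. The sketch below decides membership in A* by dynamic programming over prefixes and compares it with the description in words; in_star, check, and the length bound are names and choices made for this example.

```python
from itertools import product


def in_star(w, A):
    """Return True if w can be written as a concatenation of strings from A."""
    # reachable[i] is True when the prefix w[:i] belongs to A*.
    reachable = [True] + [False] * len(w)
    for i in range(1, len(w) + 1):
        reachable[i] = any(
            a != "" and len(a) <= i and w[i - len(a):i] == a and reachable[i - len(a)]
            for a in A
        )
    return reachable[len(w)]


def check(max_len):
    """Compare {0,10}* and {0,10}^+ with their descriptions on all short binary strings."""
    A = {"0", "10"}
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            w = "".join(symbols)
            no_11 = "11" not in w
            if in_star(w, A) != (no_11 and not w.endswith("1")):
                return False
            # ε is not in A, so A^+ is A* without the empty string.
            in_plus = w != "" and in_star(w, A)
            if in_plus != (no_11 and w.endswith("0")):
                return False
    return True
```

Calling check(10) compares the two sides on every binary string of length at most 10.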

Puzzle
• How many strings of length at most 40 are in the following language?

Lecture 2 Regular Language and Regular Expression.

Regular Languages
• The concept of regular languages on an alphabet Σ is defined recursively as follows:
  (1) The empty language ∅ is regular.
  (2) For every symbol a ∈ Σ, {a} is regular.
  (3) If A and B are regular languages, then A ∪ B, AB, and A* are regular.
  (4) Nothing else is a regular language.

{ε} is regular.
• Because ∅* = {ε}, where ∅ is the empty language, {ε} is regular by rules (1) and (3).

For Σ = {0, 1}, {011} is regular.
• Since {0} and {1} are regular, {011} = {0}{1}{1} is regular.
• Remark: Every language containing only one string is regular.

{011, 100} is regular.
• Because {011} and {100} are regular, {011, 100} = {011} ∪ {100} is regular.
• Remark: Every finite language is regular.
• Remark: Every infinite regular language must be obtained with Kleene closure.

Operation Precedence
• ({0}* ∪ {0}{1}{1}*){0}{0}{1}*
• (1) Kleene closure has higher precedence than union and concatenation.
• (2) Concatenation has higher precedence than union.

The language of all binary strings starting with 01 is regular.
Proof. Every string in this language has the form 01 x_1 ··· x_n where x_1 ··· x_n ∈ {0, 1}*. Therefore, the language can be written as {01}{0, 1}* = ({0}{1})({0} ∪ {1})*, which is regular.

The language of all binary strings ending with 01 is regular.
Proof. Every string in this language has the form x_1 ··· x_n 01 where x_1 ··· x_n ∈ {0, 1}*. Therefore, the language can be written as {0, 1}*{01} = ({0} ∪ {1})*({0}{1}), which is regular.

The language of all binary strings having the substring 01 is regular.
Proof. Every string in this language has the form x_1 ··· x_n 01 y_1 ··· y_m where x_1 ··· x_n, y_1 ··· y_m ∈ {0, 1}*. Therefore, the language can be written as {0, 1}*{01}{0, 1}* = ({0} ∪ {1})*({0}{1})({0} ∪ {1})*, which is regular.

Question: Do you feel that the expressions for the regular sets in the examples above contain too many parentheses and braces?
• Here is a simpler notation: the regular expression.

Regular Expression
• (1) ∅ is a regular expression of the empty language.
• (2) ε is a regular expression of {ε}.
• (3) For any symbol a, a is a regular expression of {a}.
• (4) If r_A and r_B are regular expressions of languages A and B, then r_A + r_B is a regular expression of A ∪ B, r_A r_B is a regular expression of AB, and r_A* is a regular expression of A*.

Examples
• 011 is a regular expression of {0}{1}{1}.
• 0+1 is a regular expression of {0, 1}.
• (0+1)* is a regular expression of {0, 1}*.
• Remark: (0+1)^+ is also considered to be a regular expression of {0, 1}^+.

• The language of all binary strings starting with 01 has the regular expression 01(0+1)*.
• The language of all binary strings ending with 01 has the regular expression (0+1)*01.
• The language of all binary strings having the substring 01 has the regular expression (0+1)*01(0+1)*.
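These three expressions can be tried out with Python's re module. Note the notational difference: the lecture writes union as +, whereas Python's regex syntax writes it as | and uses + for "one or more". The pattern names and the helper matches below are assumptions made for this sketch.

```python
import re

# Lecture notation on the left, Python regex syntax on the right.
STARTS_WITH_01 = re.compile(r"01(0|1)*")        # 01(0+1)*
ENDS_WITH_01 = re.compile(r"(0|1)*01")          # (0+1)*01
CONTAINS_01 = re.compile(r"(0|1)*01(0|1)*")     # (0+1)*01(0+1)*


def matches(pattern, w):
    """Return True if the whole string w belongs to the language of pattern."""
    # fullmatch requires the entire string to match, not just a prefix.
    return pattern.fullmatch(w) is not None
```

For instance, matches(STARTS_WITH_01, "0110") holds, while matches(CONTAINS_01, "1100") does not, because 1100 has no occurrence of 01.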

Induction Proof
• Because regular languages are defined recursively, we can prove that every regular language has a given property by proving the following:
  (1) ∅ has the property.
  (2) For any symbol a ∈ Σ, {a} has the property.
  (3) If A and B have the property, then A ∪ B, AB, and A* all have the property.
• This is an induction proof: (1) and (2) serve as the basis step, and (3) is the induction step.

• For a string x = x_1 x_2 … x_n, x^R = x_n … x_2 x_1.
• For a language A, A^R = {x^R | x ∈ A}.
• Show that if A is regular, so is A^R.
Proof.
  (1) ∅^R = ∅ is regular.
  (2) For any symbol a, {a}^R = {a} is regular.
  (3) Suppose that for regular languages A and B, A^R and B^R are regular. Then
      (A ∪ B)^R = A^R ∪ B^R is regular,
      (AB)^R = B^R A^R is regular,
      (A*)^R = (A^R)* is regular.
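The induction above is also a recipe: reverse an expression by recursing on its structure and swapping the two sides of every concatenation. The sketch below uses a hypothetical tuple encoding of expressions, ("empty",), ("sym", a), ("union", r, s), ("concat", r, s), ("star", r), and a bounded evaluator to compare languages; none of these names come from the lecture.

```python
def reverse_regex(r):
    """Return an expression for the reversal of the language of r."""
    kind = r[0]
    if kind in ("empty", "sym"):
        return r                                    # ∅^R = ∅ and {a}^R = {a}
    if kind == "union":
        return ("union", reverse_regex(r[1]), reverse_regex(r[2]))
    if kind == "concat":                            # (AB)^R = B^R A^R
        return ("concat", reverse_regex(r[2]), reverse_regex(r[1]))
    if kind == "star":                              # (A*)^R = (A^R)*
        return ("star", reverse_regex(r[1]))
    raise ValueError(f"unknown expression kind: {kind}")


def language(r, n):
    """Return the strings of length at most n in the language of r."""
    kind = r[0]
    if kind == "empty":
        return set()
    if kind == "sym":
        return {r[1]} if n >= 1 else set()
    if kind == "union":
        return language(r[1], n) | language(r[2], n)
    if kind == "concat":
        return {a + b for a in language(r[1], n) for b in language(r[2], n)
                if len(a + b) <= n}
    if kind == "star":
        base = language(r[1], n)
        result, frontier = {""}, {""}
        while frontier:
            frontier = {a + b for a in frontier for b in base
                        if len(a + b) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown expression kind: {kind}")


# Example: 0(01+1)* in the lecture's notation.
example = ("concat", ("sym", "0"),
           ("star", ("union", ("concat", ("sym", "0"), ("sym", "1")), ("sym", "1"))))
```

Comparing language(reverse_regex(example), 6) with the reversed strings of language(example, 6) checks the three identities on one concrete expression.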

Find a regular expression for {x w x^R | x ∈ (0+1)*, w ∈ (0+1)*}
• {x w x^R | x ∈ (0+1)*, w ∈ (0+1)*} = (0+1)*, since taking x = ε already gives every string w.

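Find a regular expression for {x w x^R | x ∈ (0+1)^+, w ∈ (0+1)*}
• {x w x^R | x ∈ (0+1)^+, w ∈ (0+1)*} = 0(0+1)*0 + 1(0+1)*1
• Every string x w x^R with nonempty x begins and ends with the same symbol; conversely, any string that begins and ends with the same symbol and has length at least 2 has this form with |x| = 1.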
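The answer for nonempty x can be compared with the definition on all short binary strings. This is a sketch with assumed names (PATTERN, has_form_x_w_xr, check); the pattern is the lecture's expression rewritten in Python regex syntax, where union is written |.

```python
import re
from itertools import product

# 0(0+1)*0 + 1(0+1)*1 in Python regex syntax.
PATTERN = re.compile(r"0(0|1)*0|1(0|1)*1")


def has_form_x_w_xr(s):
    """Return True if s = x w x^R for some nonempty x and some w."""
    # Try every possible length k of x; x and x^R must not overlap.
    return any(s[:k] == s[len(s) - k:][::-1] for k in range(1, len(s) // 2 + 1))


def check(max_len):
    """Compare the definition with the regular expression on all short binary strings."""
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            s = "".join(symbols)
            if has_form_x_w_xr(s) != (PATTERN.fullmatch(s) is not None):
                return False
    return True
```

Calling check(10) runs the comparison for every binary string of length at most 10.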

Puzzle
• How many regular expressions can a language have?