Language Recognizer: Connecting Type 3 Languages and Finite State Automata
Rosen 13.4
Creative Commons License – Curt Hill

Introduction
• Kleene showed that Finite State Automata recognize exactly one class of languages
  – This is Kleene’s Theorem
• This class may be built up using only the following:
  – The empty set
  – The empty string
  – All single characters from the alphabet
  – Union
  – Concatenation
  – Kleene closure
• Three operations, three starting points

Regular Sets
• A regular set is any set that can be constructed using the three starting points and three operations just given
• Thus, every regular set is the language generated by a regular grammar (type 3) and recognized by an FSA
• Another way to specify these regular sets is by using regular expressions
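
The construction of a regular set can be tried out on small cases. The following is a minimal Python sketch, not part of the slides. It assumes a language is modelled as a set of strings and that the (infinite) Kleene closure is cut off at a maximum length. The helper names `union`, `concat`, and `star` are illustrative only.

```python
# Languages are modelled as Python sets of strings (illustrative helpers only).

def union(a, b):
    """Strings that belong to either language."""
    return a | b

def concat(a, b):
    """Every string of a followed by every string of b."""
    return {x + y for x in a for y in b}

def star(a, max_len):
    """Kleene closure of a, truncated to strings of at most max_len characters."""
    result = {""}            # zero copies gives the empty string
    frontier = {""}
    while frontier:
        # Append one more copy, keep only new strings that are short enough
        frontier = {x + y for x in frontier for y in a
                    if len(x + y) <= max_len} - result
        result |= frontier
    return result

# Starting points: single characters of the alphabet {a, b}
A, B = {"a"}, {"b"}

# a(b)* with the closure truncated to at most two bs
language = concat(A, star(B, 2))
print(sorted(language))
```

The truncation is only there so the sketch terminates; the real Kleene closure is an infinite set whenever the language contains a non-empty string.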

Regular Expressions
• There are two common understandings of regular expressions
  – These two are fundamentally related but have different purposes
• A means of specifying a set of strings
  – This will be the principal meaning for this class
• A means of specifying a string to be searched for within a document
  – Much more common, especially in UNIX

Sets of Strings
• The text uses the following notation:
• Concatenation
  – Merely the writing of two items next to each other
• Union
  – Symbol: ∪, signifying that either of two sets may be used
• Kleene Closure
  – Symbol: *, signifying that zero or more copies may be concatenated together
• Parentheses for grouping

Examples
• An alphabet contains a, b, c
• The string aac is the concatenation of three letters
• The expression a(b ∪ c) represents the two strings ab and ac
• The expression a(b)* represents every string that starts with an a followed by zero or more bs
• a(a ∪ b ∪ c)*c represents all the strings that start with a and end with c
• (a ∪ b ∪ c)* is the set of all strings over this alphabet
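
These set descriptions can be checked mechanically. The sketch below is an illustration rather than the text's notation: it assumes Python's `re` module, where union is written `|`, and uses `re.fullmatch` so that the whole string must belong to the set. The helper name `in_language` is hypothetical.

```python
import re

def in_language(pattern, s):
    """True when the whole string s belongs to the set described by pattern."""
    return re.fullmatch(pattern, s) is not None

# Union of b and c after an a, written in Python syntax as a(b|c)
print([s for s in ["ab", "ac", "aa", "abc"] if in_language(r"a(b|c)", s)])

# a(b)*: an a followed by zero or more bs
print([s for s in ["a", "ab", "abbb", "ba"] if in_language(r"a(b)*", s)])

# Starts with a, ends with c, anything from the alphabet in between
print([s for s in ["ac", "abbc", "abca", "cc"] if in_language(r"a(a|b|c)*c", s)])
```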

Search Strings
• Fundamentally the same but modified to the task at hand
  – Mathematics is not concerned with the beginning and end of lines, special characters, or characters not on a keyboard
• The ∪ is replaced by the |
• Concatenation and Kleene Closure are similar
• Many special characters

Specials
• The special characters include
  – [ ] ^ | * $ . ? + ( ) { }
• Any other character just matches itself
• Since many of these characters are also useful as literal characters, an escape is used to match them
• Most of these serve the special requirements of finding an element of the set within a much larger piece of text or a document

Escape
• The backslash character \ is the escape
• Thus, to look for an asterisk, not closure, in a string it must be escaped: \*
  – This allows a search to find the asterisk
• The C family uses some of the same escape sequences:
  – \n newline or linefeed
  – \t tab
  – \r carriage return
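
A short illustration of escaping, assuming Python's `re` module (the sample text is made up). Raw strings (`r"..."`) keep Python itself from interpreting the backslash before the pattern reaches the regular expression engine.

```python
import re

text = "rate * hours\ttotal"

# \* matches a literal asterisk instead of meaning closure
print(re.search(r"\*", text).start())

# re.escape adds the backslashes needed to match a string literally
print(re.escape("a*b"))

# \t in a pattern matches a tab character
print(re.search(r"\t", text) is not None)
```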

Positioning
• There are two specials that force a position
• ^ matches the beginning of the line
• $ matches the end of the line
• Both of these match a position rather than a character
• Without these a pattern could match anywhere within a string
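
The effect of the two anchors can be seen in a small sketch. It assumes Python's `re.search` applied to a single line of text; the sample line is made up.

```python
import re

line = "the cat sat on the mat"

print(bool(re.search(r"the", line)))     # unanchored: may match anywhere
print(bool(re.search(r"^the", line)))    # must begin the line
print(bool(re.search(r"^cat", line)))    # cat is present, but not at the start
print(bool(re.search(r"mat$", line)))    # must end the line
print(bool(re.search(r"^mat$", line)))   # the whole line would have to be "mat"
```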

Repetition
• There are three repetition characters, which together are more general than closure alone
• Closure is the *
  – It represents zero or more repetitions of the previous item
  – Kleene star
• The + represents one or more repetitions of the previous item
• The ? represents zero or one occurrence of the previous item

Examples
• ~* matches any number (including zero) of successive tildes
• -* matches zero or more dashes
• .+ matches one or more of any character
• hats? matches either hat or hats
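
The repetition examples can be tried directly. This sketch assumes Python's `re.fullmatch`, so the entire string has to match; the helper name `matches` is hypothetical.

```python
import re

def matches(pattern, s):
    """True when the entire string s matches pattern."""
    return re.fullmatch(pattern, s) is not None

print(matches(r"~*", ""), matches(r"~*", "~~~"))      # zero or more tildes
print(matches(r"-*", "----"), matches(r"-*", "-a-"))  # zero or more dashes
print(matches(r".+", "x"), matches(r".+", ""))        # one or more of any character
print(matches(r"hats?", "hat"), matches(r"hats?", "hats"), matches(r"hats?", "hatss"))
```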

Grouping
• The previous repetitions could only be applied to a single character
• What is needed next is some type of grouping
• This is provided by parentheses
• Enclosing a pattern in parentheses makes it a group
• This group can then be followed by a repetition character

Examples
• (\*-)* will match
  – *-, *-*-*-, etc.
• The * is greedy
  – It will try to match as many of these as possible
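
Greedy matching of a repeated group can be observed as follows. This is a sketch assuming Python's `re` module, with the asterisk inside the group escaped as `\*`. The non-greedy form `*?` is a Python feature shown only for contrast and is not taken from the slides.

```python
import re

text = "*-*-*-x"

# Greedy: the group is repeated as many times as possible
greedy = re.match(r"(\*-)*", text)
print(greedy.group(0))

# Non-greedy (Python's *?): stops as early as it can, here after zero copies
lazy = re.match(r"(\*-)*?", text)
print(repr(lazy.group(0)))
```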

More interesting patterns
• A number is pretty easy to understand from our perspective but not so easy to describe
  – Except in regular expressions
• An integer is a string of digits
  – Possibly preceded by a plus or minus
• So how is this done?
• With sets and repetition

A set
• A pair of brackets may be filled with characters
• This will match any one of them
• Thus, the digits could be done with: [0123456789]
• An integer could then be: [-+]?[0123456789]+
• Any single vowel is: [aeiouAEIOU]
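
The integer and vowel patterns can be exercised with a few sample strings. This sketch assumes Python's `re` module; the names `INTEGER` and `VOWEL` and the sample strings are illustrative.

```python
import re

INTEGER = re.compile(r"[-+]?[0123456789]+")
VOWEL = re.compile(r"[aeiouAEIOU]")

# An optional sign followed by one or more digits, and nothing else
for s in ["42", "-7", "+300", "4.2", "--1", ""]:
    print(repr(s), INTEGER.fullmatch(s) is not None)

# findall returns every single-character match of the set
print(VOWEL.findall("Regular Expression"))
```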

Alternation
• A set provides intuitive alternation
• The match process may choose any character within the set to use
• This alternation applies only among single characters
• There is also an alternation character
  – The vertical bar |
• This allows either simple or complicated patterns to alternate

Alternation
• Thus: A|E|I|O|U is equivalent to [AEIOU]
• However, more interesting alternations are possible and useful
  – (abc)|(123) will match either of the two strings
  – ([-+]?\d+)|(\w+) will match any string of characters that looks like a number or word
  – \d is short for a digit
  – \w is short for a word character: a letter, digit, or underscore
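
A sketch of alternation between whole patterns, assuming Python's `re` module. The number-or-word pattern is written here as `([-+]?\d+)|(\w+)`, which is one assumed reading of the slide's pattern with its backslashes restored. The name `TOKEN` is hypothetical.

```python
import re

# A single-character alternation and the equivalent set find the same matches
print(re.findall(r"A|E|I|O|U", "AUDIO") == re.findall(r"[AEIOU]", "AUDIO"))

# Alternation between whole groups
print(bool(re.fullmatch(r"(abc)|(123)", "abc")),
      bool(re.fullmatch(r"(abc)|(123)", "abc123")))

# A number or a word; \d is a digit, \w is a letter, digit, or underscore
TOKEN = re.compile(r"([-+]?\d+)|(\w+)")
for s in ["-15", "hello", "x1", "+-3"]:
    m = TOKEN.fullmatch(s)
    kind = None if m is None else ("number" if m.group(1) else "word")
    print(repr(s), kind)
```

Group 1 is set only when the number alternative matched, which is how the sketch tells the two cases apart.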

Audience Participation
• Suppose the following expression: ^ab(cde)*f$
• Which of the following lines match this?
  – abf
  – abcdecdef
  – abcdeaf
  – abcdecdef
  – acdef
  – abcdefa
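
After working through the lines by hand, the answers can be checked with a short script. This is a sketch assuming Python's `re` module; the names `PATTERN`, `candidates`, and `results` are illustrative.

```python
import re

PATTERN = re.compile(r"^ab(cde)*f$")

candidates = ["abf", "abcdecdef", "abcdeaf", "abcdecdef", "acdef", "abcdefa"]

# Map each candidate line to whether the anchored pattern matches it
results = {line: PATTERN.search(line) is not None for line in candidates}

for line, ok in results.items():
    print(f"{line:10} {'match' if ok else 'no match'}")
```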

Limitations
• What kind of sets are not regular?
• Consider the following language: 0^n 1^n
  – The numbers of zeros and ones are the same
• We know that 0^m 1^n is regular, so why is 0^n 1^n not?

We Really Do Know
[Figure: state diagram of an FSA with transitions on 0 and 1]
• This accepts 0^m 1^n and is clearly an FSA
• Why is 0^n 1^n harder?
• Counter-intuitive since 0^n 1^n is a subset of 0^m 1^n
• Shouldn’t it be harder to recognize a full set than a subset?
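
An automaton for 0^m 1^n can be written out as a transition table. The sketch below is not the slide's diagram: the state names are hypothetical, and it assumes m and n may each be zero.

```python
# Hypothetical state names; a transition table for a DFA recognizing 0^m 1^n.
TRANSITIONS = {
    ("zeros", "0"): "zeros",   # still reading the leading 0s
    ("zeros", "1"): "ones",    # first 1 seen
    ("ones", "1"): "ones",     # still reading 1s
    ("ones", "0"): "dead",     # a 0 after a 1 can never be accepted
    ("dead", "0"): "dead",
    ("dead", "1"): "dead",
}
ACCEPTING = {"zeros", "ones"}  # assumes m and n may each be zero

def accepts(s):
    """Run the DFA over s (a string of 0s and 1s); the only memory is the current state."""
    state = "zeros"
    for ch in s:
        state = TRANSITIONS[(state, ch)]
    return state in ACCEPTING

print(accepts("000111"), accepts("0011111"), accepts("0101"))
```

Three states suffice because the machine only has to know which phase it is in, never how many symbols it has read.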

Memory
• An FSA determines its next state based only on the input and the current state
• Since it has no memory beyond its current state, it cannot remember how many zeros it has processed so that it can process that many ones
• Next, we consider machines stronger than these
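
For contrast, recognizing 0^n 1^n is easy once a counter is available. The sketch below is an ordinary Python function, not an FSA. Its integer counter can grow without bound, which is exactly the memory that a machine with a fixed number of states lacks. The function name `same_count` is hypothetical.

```python
def same_count(s):
    """Recognize 0^n 1^n using an integer counter that can grow without bound."""
    count = 0
    seen_one = False
    for ch in s:
        if ch == "0":
            if seen_one:          # a 0 after a 1 is out of order
                return False
            count += 1            # remember one more 0
        elif ch == "1":
            seen_one = True
            count -= 1            # pay off one remembered 0
            if count < 0:         # more 1s than 0s
                return False
        else:
            return False          # symbol outside the alphabet {0, 1}
    return count == 0

print(same_count("000111"), same_count("0011111"), same_count("0001"))
```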

Exercises
• 13.4 – 3, 5, 15