CIS 262 Automata, Computability, and Complexity, Spring 2019
http://www.seas.upenn.edu/~cse262/
Instructor: Aaron Roth, aaroth@cis.upenn.edu
Lecture: February 13, 2019
Course Logistics
Midterm Date: Wednesday, February 27 (in class) (oops, never mind)
Corrected Midterm Date: Monday, March 18 (in class)
Recap
Regular languages are closed under:
Union, Intersection, Complement
Concatenation, Kleene-*
To show that regular languages are closed under OP:
Consider two arbitrary DFAs M1 and M2
Show how to construct a DFA (or NFA or ε-NFA) M’ that accepts L(M1) OP L(M2)
The states/transitions of M’ are defined in terms of those of M1 and M2
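For intersection and union, the recap's recipe becomes the standard product construction: M’ runs M1 and M2 in lockstep on pairs of states. Below is a minimal Python sketch, assuming a DFA is encoded as a tuple (states, alphabet, delta, start, finals) with a total transition dict. This encoding, the function names, and the example machines M1 and M2 are illustrative assumptions, not taken from the lecture.

```python
from itertools import product

# Assumed encoding: a DFA is a tuple (states, alphabet, delta, start, finals),
# where delta is a total dict mapping (state, symbol) -> state.

def product_dfa(m1, m2, accept):
    """Product construction: M' runs M1 and M2 in lockstep on the same input.
    accept(in1, in2) combines the two verdicts: `and` gives L(M1) ∩ L(M2),
    `or` gives L(M1) ∪ L(M2)."""
    q1s, sigma, d1, s1, f1 = m1
    q2s, _, d2, s2, f2 = m2
    states = set(product(q1s, q2s))
    delta = {((p, q), c): (d1[(p, c)], d2[(q, c)]) for (p, q) in states for c in sigma}
    finals = {(p, q) for (p, q) in states if accept(p in f1, q in f2)}
    return states, sigma, delta, (s1, s2), finals

def complement_dfa(m):
    """Complement: same machine, with final and non-final states swapped."""
    states, sigma, delta, start, finals = m
    return states, sigma, delta, start, states - finals

def accepts(m, w):
    """Run the DFA m on string w and report whether it ends in a final state."""
    _, _, delta, q, finals = m
    for c in w:
        q = delta[(q, c)]
    return q in finals

# Hypothetical example machines over {a, b}:
# M1 accepts strings with an even number of a's; M2 accepts strings ending in b.
SIGMA = {"a", "b"}
M1 = ({0, 1}, SIGMA, {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0})
M2 = ({0, 1}, SIGMA, {(0, "a"): 0, (1, "a"): 0, (0, "b"): 1, (1, "b"): 1}, 0, {1})

both = product_dfa(M1, M2, lambda x, y: x and y)    # intersection
either = product_dfa(M1, M2, lambda x, y: x or y)   # union
```

Only the choice of final states differs between intersection and union; the state set and transitions are identical, which is why one helper with an `accept` parameter covers both.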
Prefix Operation
A string u is a prefix of w if w = u.v for some string v
Prefixes of 011: ε, 0, 01, 011
Prefix(L) = set of prefixes of all strings in L = { u | there exists a string v such that u.v is in L }
For example, L = { w | w ends in a }: Prefix(L) = Σ* (any string can be extended to one ending in a)
L’ = { w | w does not contain any a symbols }: Prefix(L’) = L’
Closure Under Prefix Operation
If L is regular, is Prefix(L) guaranteed to be regular?
Consider a DFA M for L; goal: construct a machine M’ for Prefix(L)
M’ should act like M on its input w
When should it accept? As long as there exists an extension that can lead to an accepting state of M
M’ has the same states, initial state, and transition function as M
State q is final in M’ if some state in F is reachable from q:
F’ = { q | there exists v such that δ*(q, v) is in F }
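The construction only changes the final states, so it is easy to sketch in code. The minimal Python version below assumes the same tuple encoding of a DFA as (states, alphabet, delta, start, finals); the encoding and the example DFA for L = { ab } are illustrative assumptions. It computes F’ with a single backward search from F over reversed transitions, which finds exactly the states from which F is reachable.

```python
def prefix_dfa(dfa):
    """DFA for Prefix(L(M)): same states, start state and transitions as M;
    final states F' = states from which some state in F is reachable."""
    states, sigma, delta, start, finals = dfa
    # Reverse every transition so we can search backwards from F.
    reverse = {q: set() for q in states}
    for (p, _c), q in delta.items():
        reverse[q].add(p)
    new_finals, stack = set(finals), list(finals)
    while stack:
        q = stack.pop()
        for p in reverse[q]:
            if p not in new_finals:
                new_finals.add(p)
                stack.append(p)
    return states, sigma, delta, start, new_finals

def accepts(dfa, w):
    """Run the DFA on w and report whether it ends in a final state."""
    _, _, delta, q, finals = dfa
    for c in w:
        q = delta[(q, c)]
    return q in finals

# Hypothetical example: a DFA over {a, b} for L = { ab }, with dead state 3.
SIGMA = {"a", "b"}
delta = {(q, c): 3 for q in range(4) for c in SIGMA}
delta[(0, "a")] = 1
delta[(1, "b")] = 2
M = (set(range(4)), SIGMA, delta, 0, {2})
P = prefix_dfa(M)   # should accept exactly ε, a, ab
```

The dead state 3 cannot reach state 2, so it stays non-final; every state on the path to 2 becomes final.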
Edit1 Operation
For two strings u and v, distance(u, v) = 1 if they differ in exactly one symbol
For Σ = { a, b }, the strings at distance 1 from ab are aa and bb
Edit1(L) = { w | w is in L or there is a string u in L with distance(u, w) = 1 }
For example, for Σ = { A, C, G, T } and L = { w | w contains the substring ACC }:
Edit1(L) = { w | w contains ACC or CCC or GCC or TCC or AAC or AGC or ATC or ACA or ACG or ACT }
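For finite examples, the definition can be checked directly. This Python sketch assumes that "differ in exactly one symbol" means equal length with a single mismatched position (a substitution); the helper names are hypothetical. The `neighbors` call on ACC reproduces the nine patterns listed on the slide.

```python
def at_distance_one(u, v):
    """True if u and v have equal length and differ in exactly one position.
    Assumption: distance 1 means a single symbol substitution."""
    return len(u) == len(v) and sum(x != y for x, y in zip(u, v)) == 1

def neighbors(w, sigma):
    """All strings at distance 1 from w: one position replaced by a different symbol."""
    return {w[:i] + s + w[i + 1:] for i in range(len(w)) for s in sigma if s != w[i]}

def edit1_finite(lang, sigma):
    """Edit1(L) for a finite language L given as a set of strings."""
    result = set(lang)
    for u in lang:
        result |= neighbors(u, sigma)
    return result
```

Because distance is symmetric, collecting the neighbors of every u in L gives exactly the strings w with distance(u, w) = 1 for some u in L.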
Closure Under Edit1 Operation
If L is regular, is Edit1(L) guaranteed to be regular?
Consider a DFA M for L; goal: construct a machine M’ for Edit1(L)
M’ should act like M on its input w, but at some step, while reading a symbol s from w, it can update the state of M using the transition on a different symbol s’
Challenges:
1. When to replace, and which symbol to use as the replacement?
2. How to ensure that at most one symbol is replaced?
Solutions:
1. Use nondeterminism
2. Maintain, besides the state of M, a bit that records whether a symbol has already been changed
Closure Under Edit1 Operation
Consider a DFA M = (Q, Σ, q0, F, δ)
Goal: construct an NFA M’ for Edit1(L(M))
A state of M’ has the form (q, b), where q is a state of M and b is 0 or 1
b = 1 means that one symbol has already been replaced
Initially, b is 0 and q is q0
In state (q, b), on input symbol s:
if b = 0, then either update q using the s-transition of M and keep b = 0, or update q using the s’-transition of M for some s’ ≠ s and set b = 1
else update q using the s-transition of M and keep b = 1
Accept if q is a final state of M (the value of b does not matter)
Closure Under Edit1 Operation
Consider a DFA M = (Q, Σ, q0, F, δ)
Goal: construct an NFA M’ for Edit1(L(M))
Precise definition of M’:
States of M’: Q’ = Q × { 0, 1 }
Initial state of M’: (q0, 0)
Final states of M’: F × { 0, 1 }
Transition function of M’:
Δ’((q, 0), s) = { (δ(q, s), 0) } ∪ { (δ(q, s’), 1) | s’ ≠ s }
Δ’((q, 1), s) = { (δ(q, s), 1) }
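One way to see Δ’ at work is to simulate M’ by tracking the set of all reachable (q, b) pairs, i.e. running the subset construction on the fly. The Python sketch below assumes the tuple encoding (states, alphabet, delta, start, finals) for the DFA M; the `step` helper builds a hypothetical DFA for the slide's example L = { w | w contains ACC }, with state i meaning "the longest suffix read that is a prefix of ACC has length i".

```python
def edit1_nfa_accepts(dfa, w):
    """Simulate the NFA M' for Edit1(L(M)) on w.
    A state of M' is (q, b); b = 1 means one symbol has already been replaced."""
    _states, sigma, delta, start, finals = dfa
    current = {(start, 0)}                           # initial state (q0, 0)
    for s in w:
        nxt = set()
        for q, b in current:
            nxt.add((delta[(q, s)], b))              # read s unchanged, keep b
            if b == 0:
                for s2 in sigma:
                    if s2 != s:
                        nxt.add((delta[(q, s2)], 1)) # replace s by s2, set b = 1
        current = nxt
    # Final states of M' are F x {0, 1}: b does not matter.
    return any(q in finals for q, _ in current)

# Hypothetical DFA for L = { w | w contains ACC } over {A, C, G, T}.
SIGMA = "ACGT"

def step(q, s):
    if q == 3:                        # ACC already seen: stay accepting
        return 3
    if s == "A":                      # an A always restarts a match of "A"
        return 1
    if s == "C" and q in (1, 2):      # extend "A" -> "AC" -> "ACC"
        return q + 1
    return 0

delta = {(q, s): step(q, s) for q in range(4) for s in SIGMA}
M = (set(range(4)), set(SIGMA), delta, 0, {3})
```

For instance, TTGCCTT is accepted (GCC is one substitution away from ACC), while TTAT is rejected.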
Regular Expressions
A high-level specification language for expressing regular patterns
Examples:
Σ* ACC Σ*: strings that contain the substring ACC
Σ* 0: strings that end with the symbol 0
Practical uses: text search, spam filters, lexical analysis, …
Supported in many programming languages and tools (awk, sed, Perl, JavaScript) and text editors (Emacs, Word, …)
We will focus on "core" regular expressions with a small set of basic operators; practical implementations support a rich set of operators (which can be defined in terms of the basic ones)
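As an illustration of practical support, here are the two slide examples in Python's `re` module, which writes Σ as a character class such as `[ACGT]`. `re.fullmatch` is used so that the whole string must match, mirroring the definition of L(r); the pattern names are illustrative.

```python
import re

# Σ* ACC Σ* with Σ = {A, C, G, T}
contains_acc = re.compile(r"[ACGT]*ACC[ACGT]*")
# Σ* 0 with Σ = {0, 1}
ends_in_0 = re.compile(r"[01]*0")
```

Note that `re.search("ACC", w)` would give the same answer as matching Σ* ACC Σ* in full; the explicit Σ* parts are what make the full-match form equivalent.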
Regular Expressions: Definition
Let Σ be a finite alphabet
Defining syntax: rules for constructing regular expressions
Defining semantics: associating a language L(r) with each regular expression r
L(r) is the set of strings that match the pattern r
Regular Expressions: Definition
1. ε is a regular expression
Only the empty string matches this reg-ex: L(ε) = { ε }
2. ∅ is a regular expression
No string matches this reg-ex: L(∅) = { }
3. For each symbol s in Σ, s is a regular expression
The only string matching the reg-ex s is the string s itself: L(s) = { s }
4. If r is a regular expression, so is ( r )
Parentheses are used only for parsing: L( ( r ) ) = L(r)
Regular Expressions: Definition
5. If r and r’ are regular expressions, then so is r.r’
A string w matches r.r’ if it can be split into two parts w = u.v such that u matches r and v matches r’
That is, L(r.r’) = L(r).L(r’)
6. If r and r’ are regular expressions, then so is r ∪ r’
A string matches r ∪ r’ if it matches either r or r’
L(r ∪ r’) = L(r) ∪ L(r’)
7. If r is a regular expression, then so is r*
A string w matches r* if w can be split into multiple (0 or more) parts such that each part matches r: L(r*) = L(r)*
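The seven rules translate directly into a recursive function on the syntax tree. Since L(r) may be infinite, this Python sketch computes only the strings of L(r) of length at most n. The AST classes (`Eps`, `Empty`, `Sym`, `Cat`, `Union`, `Star`) are hypothetical names chosen for this illustration; rule 4 needs no case because parentheses disappear during parsing.

```python
from dataclasses import dataclass

# Hypothetical AST for core regular expressions (rules 1-3, 5-7).
@dataclass(frozen=True)
class Eps:            # rule 1: ε
    pass

@dataclass(frozen=True)
class Empty:          # rule 2: ∅
    pass

@dataclass(frozen=True)
class Sym:            # rule 3: a single symbol s
    s: str

@dataclass(frozen=True)
class Cat:            # rule 5: r1.r2
    r1: object
    r2: object

@dataclass(frozen=True)
class Union:          # rule 6: r1 ∪ r2
    r1: object
    r2: object

@dataclass(frozen=True)
class Star:           # rule 7: r*
    r: object

def lang(r, n):
    """The strings of L(r) of length <= n."""
    if isinstance(r, Eps):
        return {""}
    if isinstance(r, Empty):
        return set()
    if isinstance(r, Sym):
        return {r.s} if n >= len(r.s) else set()
    if isinstance(r, Cat):
        # w = u.v with u in L(r1) and v in L(r2)
        return {u + v for u in lang(r.r1, n) for v in lang(r.r2, n) if len(u + v) <= n}
    if isinstance(r, Union):
        return lang(r.r1, n) | lang(r.r2, n)
    if isinstance(r, Star):
        # Start from {ε} (zero parts) and keep appending one more part from L(r).
        pieces = lang(r.r, n)
        result, frontier = {""}, {""}
        while frontier:
            new = {w + p for w in frontier for p in pieces if len(w + p) <= n} - result
            result |= new
            frontier = new
        return result
    raise TypeError(f"not a regular expression: {r!r}")
```

Two edge cases fall out of the definitions: ∅* = { ε } because r* always allows zero parts, and a*∅ = ∅ because no split can put a string into L(∅).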
Notational Conventions
Often "." is omitted: 01 stands for the reg-ex 0.1
If Σ = {a, b}, then the regular expression (a ∪ b) is abbreviated as Σ
r* means 0 or more repetitions of r; r+ means one or more repetitions of r, and is an abbreviation for r.r*
Operator precedence: * highest, then concatenation, then ∪
ab* means a.(b)*
ab ∪ c means (a.b) ∪ c
a ∪ b* means a ∪ (b)*
Parentheses are used as needed: (ab)*, a(b ∪ c), (a ∪ b)*
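Python's `re` module follows the same precedence (`*` binds tighter than concatenation, which binds tighter than `|`, Python's spelling of ∪), so the conventions can be tried out directly. The `matches` helper is an illustrative name.

```python
import re

def matches(pattern, w):
    """True if the whole string w matches the pattern."""
    return re.fullmatch(pattern, w) is not None
```

For example, `ab*` matches abbb but not abab, while `(ab)*` matches abab; `ab|c` matches c but not ac; and `(ab)+` behaves like `(ab)(ab)*`, so it rejects the empty string.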
Regular Expressions: Examples
Σ = { a, b }
a* b Σ* a Σ* b Σ* Σ* abaa Σ* aabb Σ* (Σ Σ)* (a ∪ ε) b* a* ∅ ∅*
Regular Expressions: Examples
Σ = { a, b }
Write regular expressions for:
{ w | last symbol in w = first symbol in w }: a ∪ b ∪ aΣ*a ∪ bΣ*b
{ w | count(w, a) modulo 3 = 0 }: b* (a b* a b* a b*)*
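A quick way to gain confidence in such answers is to compare each expression against its defining property on every string up to some length. This Python sketch does that with `re.fullmatch` (Python writes ∪ as `|` and Σ as `[ab]`); the bound n = 8 and the helper names are arbitrary choices for illustration. Note that b*(ab*)* alone would match every string over {a, b}, since each a may be followed by any number of b's; grouping the a's in threes is what enforces the count.

```python
import re
from itertools import product

FIRST_EQ_LAST = re.compile(r"a|b|a[ab]*a|b[ab]*b")   # a ∪ b ∪ aΣ*a ∪ bΣ*b
A_COUNT_MOD_3 = re.compile(r"b*(ab*ab*ab*)*")        # b*(ab*ab*ab*)*

def strings_upto(n, sigma="ab"):
    """Every string over sigma of length 0..n."""
    for k in range(n + 1):
        for t in product(sigma, repeat=k):
            yield "".join(t)

def check(n=8):
    for w in strings_upto(n):
        # ε has no first or last symbol, so it is excluded from the first language.
        assert (FIRST_EQ_LAST.fullmatch(w) is not None) == (len(w) > 0 and w[0] == w[-1]), w
        assert (A_COUNT_MOD_3.fullmatch(w) is not None) == (w.count("a") % 3 == 0), w
    return True
```

A bounded check like this cannot prove the expressions correct, but it catches most mistakes quickly.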
Regular Expressions: Phone Numbers
What are valid (US) phone numbers?
1-215-200-1091
2152001091
1.215.2001091
Should have 10 digits, with an optional 1 at the beginning
Can optionally be split into three blocks, with . or - as separators
(1 ∪ 1- ∪ 1. ∪ ε) DDD (. ∪ - ∪ ε) DDD (. ∪ - ∪ ε) DDDD, where D stands for a digit: (0 ∪ 1 ∪ 2 ∪ … ∪ 9)
This allows 215.200-1091
Write a reg-ex that disallows this (i.e., . and - are not both used in the same number)
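The phone-number expression can be tested in Python's `re`, where D becomes the class `[0-9]` and `[.-]?` plays the role of (. ∪ - ∪ ε). The second pattern is one possible answer to the exercise, offered as a sketch rather than the intended solution: a union of a dot-only expression and a dash-only expression, so no single number can mix the two separators. Both pattern names are illustrative.

```python
import re

# (1 ∪ 1- ∪ 1. ∪ ε) DDD (. ∪ - ∪ ε) DDD (. ∪ - ∪ ε) DDDD
MIXED = re.compile(r"(1[.-]?)?[0-9]{3}[.-]?[0-9]{3}[.-]?[0-9]{4}")

# One possible answer: (dot-only version) ∪ (dash-only version).
NO_MIX = re.compile(
    r"(1\.?)?[0-9]{3}\.?[0-9]{3}\.?[0-9]{4}"
    r"|(1-?)?[0-9]{3}-?[0-9]{3}-?[0-9]{4}"
)
```

All three numbers from the slide match both patterns, while 215.200-1091 matches only the first.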
Regular Expressions in Practice
Additional operators and abbreviations useful in practice:
Intersection: r & r’
Example: constraints on a legal password (should have at least 8 characters, at least one numeral, at least one capital letter, …)
Negation/complementation: ~r
Optional use: r? means (r ∪ ε)
Counting: D^4 means D.D.D.D
Character ranges: [0-9], [a-p]
Note: the class of regular languages is closed under such operators, so adding them does not take us beyond regular languages
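Python's `re` has no `&` or `~` operator, but its lookahead assertions can emulate both in simple cases: stacking positive lookaheads `(?=...)` at the start demands that several patterns all hold (an intersection), and a negative lookahead `(?!...)` rejects strings matching a pattern (a complement). This is a sketch of one practical workaround, not the course's notation; `?`, `{4}`, and `[0-9]` are supported directly. The pattern names are illustrative.

```python
import re

# Intersection: at least one digit AND at least one capital AND length >= 8.
PASSWORD = re.compile(r"(?=.*[0-9])(?=.*[A-Z]).{8,}")

# Complement: strings that do NOT contain ACC, i.e. ~(Σ* ACC Σ*).
NO_ACC = re.compile(r"(?!.*ACC).*")

OPTIONAL = re.compile(r"colou?r")    # u? means (u ∪ ε)
COUNTING = re.compile(r"[0-9]{4}")   # D^4 with D = [0-9]
```

Lookaheads are an engine feature, not part of core regular expressions; the closure results are what guarantee that the languages described this way are still regular.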
From Regular Expressions to NFAs
Goal: given a regular expression r, construct an ε-NFA M(r) that accepts the language L(r)
Construction by induction on the structure of r:
1. r equals ε
2. r equals ∅
3. r equals a
4. r equals ( r’ ): M(r) is the same as M(r’)
From Regular Expressions to NFAs
5. r equals r1.r2
Build M(r1)
Build M(r2)
From Regular Expressions to NFAs
5. r equals r1.r2
Build M(r) from M(r1) and M(r2) using the concatenation construction
[Figure: M(r1) and M(r2) joined by ε-transitions]
From Regular Expressions to NFAs
6. r equals r1 ∪ r2
Build M(r1)
Build M(r2)
From Regular Expressions to NFAs
6. r equals r1 ∪ r2
Build M(r) from M(r1) and M(r2) by adding a new initial state
[Figure: a new initial state with ε-transitions to M(r1) and M(r2)]
From Regular Expressions to NFAs
7. r equals r’*
Build M(r’)
Apply the Kleene-* construction
[Figure: M(r’) with three added ε-transitions]
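The inductive construction can be sketched in Python as one builder per case, plus an ε-closure-based simulator for testing. The ε-NFA representation (start state, set of final states, list of labelled edges), the global fresh-state counter, and all function names are assumptions made for this sketch. The gadgets follow the slides' outline (ε-edges for concatenation, a new initial state for union), but the precise placement of ε-edges for Kleene-* is one standard choice and may differ in detail from the lecture's figures.

```python
from dataclasses import dataclass
from itertools import count

EPS = None          # edge label used for ε-transitions
_fresh = count()    # global supply of fresh state names

@dataclass
class NFA:
    start: int
    finals: set
    edges: list     # (source, label, target); label is a symbol or EPS

def m_eps():        # case 1: one state, both initial and final
    q = next(_fresh)
    return NFA(q, {q}, [])

def m_empty():      # case 2: one state, not final
    return NFA(next(_fresh), set(), [])

def m_sym(a):       # case 3: a single a-labelled edge
    p, q = next(_fresh), next(_fresh)
    return NFA(p, {q}, [(p, a, q)])

def m_cat(m1, m2):  # case 5: ε from every final state of M(r1) to the start of M(r2)
    links = [(f, EPS, m2.start) for f in m1.finals]
    return NFA(m1.start, set(m2.finals), m1.edges + m2.edges + links)

def m_union(m1, m2):  # case 6: new initial state with ε-edges to both starts
    s = next(_fresh)
    links = [(s, EPS, m1.start), (s, EPS, m2.start)]
    return NFA(s, m1.finals | m2.finals, m1.edges + m2.edges + links)

def m_star(m):      # case 7: new initial (and only final) state, ε into M, ε back from M's finals
    s = next(_fresh)
    links = [(s, EPS, m.start)] + [(f, EPS, s) for f in m.finals]
    return NFA(s, {s}, m.edges + links)

def eps_closure(m, states):
    """All states reachable from `states` using only ε-edges."""
    seen, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for src, label, dst in m.edges:
            if src == p and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(m, w):
    current = eps_closure(m, {m.start})
    for c in w:
        step = {dst for src, label, dst in m.edges if src in current and label == c}
        current = eps_closure(m, step)
    return bool(current & m.finals)

# (ab)*(a ∪ ε), built bottom-up; case 4 (parentheses) needs no work.
ab = m_cat(m_sym("a"), m_sym("b"))
r = m_cat(m_star(ab), m_union(m_sym("a"), m_eps()))
```

Each sub-machine should be built fresh (calling `m_union(x, x)` on the same object would share states), which is why the example constructs two separate `m_sym("a")` machines.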
Example Translation
(ab)*(a ∪ ε)
[Figure: step-by-step ε-NFAs for (ab), (a ∪ ε), (ab)*, and (ab)*(a ∪ ε)]
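To see what the final machine should accept, this Python sketch lists L((ab)*(a ∪ ε)) up to length 5 using `re`, where the empty alternative in `(a|)` stands in for (a ∪ ε). The helper name and the length bound are illustrative. The result is exactly the strings that alternate a and b starting with a, which are also the prefixes of strings in (ab)*.

```python
import re
from itertools import product

PATTERN = re.compile(r"(ab)*(a|)")   # (a|) plays the role of (a ∪ ε)

def language_upto(n):
    """Strings over {a, b} of length <= n in L((ab)*(a ∪ ε)), shortest first."""
    words = ("".join(t) for k in range(n + 1) for t in product("ab", repeat=k))
    return sorted((w for w in words if PATTERN.fullmatch(w)), key=lambda w: (len(w), w))
```

For n = 5 this yields ε, a, ab, aba, abab, ababa.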