CIS 262 Automata, Computability, and Complexity
http://www.seas.upenn.edu/~cse262/
Professor Aaron Roth
Class Overview: What is the theory of computation?
What is the Theory of Computation?
• The science in computer science.
• The formal study of what a computer can and cannot do, even in principle.
• Independent of technology: limitations of -any- physical device, e.g. your laptop, your brain, future technology
  – The physics of computation: computation as a fundamental phenomenon
What is the Theory of Computation?
• Once a problem is formalized as a “computational” problem with clearly specified inputs and outputs, and before you write any code, let’s understand:
  – Can the problem be solved at all by a computer?
  – How efficiently can it be solved?
  – How can we be convinced that the solution is correct?
Problem 1: Character Coverage in Documents
Given a text document, check whether each of the vowels a, e, i, o, u appears at least once in the document.
As the algorithm scans the document reading one character at a time, what information should it track?
For each vowel, it needs to know whether or not that vowel has appeared in the part of the document read so far:
• Maintain one bit, initialized to 0, for each of the vowels
• If the read character is a vowel, set the corresponding bit to 1
• In the end, check whether all bits are 1
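A minimal Python sketch of the one-bit-per-vowel scan described above. It assumes the document is a Python string and that only the lowercase letters a, e, i, o, u count as vowels; the function name `has_all_vowels` is hypothetical.

```python
VOWELS = "aeiou"

def has_all_vowels(document: str) -> bool:
    """Single left-to-right scan that keeps one bit per vowel (5 bits in total)."""
    seen = [False] * len(VOWELS)      # one bit per vowel, all initially 0
    for ch in document:               # read one character at a time
        idx = VOWELS.find(ch)
        if idx != -1:                 # the character read is a vowel
            seen[idx] = True          # set the corresponding bit to 1
    return all(seen)                  # in the end, check that all bits are 1
```

For example, `has_all_vowels("education")` is true, while `has_all_vowels("hello world")` is false because a, i, and u never appear.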
Problem 2: Character Count in Documents
Given a text document, check whether the numbers of occurrences of the characters a and e are the same.
As the algorithm scans the document reading one character at a time, what information should it track?
Track the difference between the number of times a and e have occurred so far:
• The count is initially 0
• Increment it if the read character is a
• Decrement it if the read character is e
• In the end, check whether the count is 0
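The counting approach for Problem 2 can be sketched the same way. Again this is a sketch under assumptions: the document is a Python string, only lowercase a and e are compared, and `same_a_e_count` is a hypothetical name.

```python
def same_a_e_count(document: str) -> bool:
    """Track (#a - #e) in a single scan and accept iff the difference ends at 0."""
    count = 0                  # difference between the number of a's and e's seen so far
    for ch in document:
        if ch == "a":
            count += 1         # increment on 'a'
        elif ch == "e":
            count -= 1         # decrement on 'e'
    return count == 0
```

On an input made of n copies of a, the counter climbs to n, so the range of values it must hold grows with the input length. This is exactly what separates Problem 2 from Problem 1.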
Finite-State Computation
Problem 1: All five vowels appear at least once
• Need to maintain 5 bits of memory (a constant)
• A computing device that solves this problem needs only 2^5 = 32 states
• The number of states is bounded a priori, independent of the input length
Problem 2: Count of a’s = count of e’s
• The value of the variable tracking the difference between the two counts is potentially unbounded
• The memory needed for the computation depends on the input length
• If a computing device has only finitely many states, it cannot solve this problem! (How do we prove such a statement?)
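To make the 32-state claim concrete, here is a sketch that treats each state as a 5-bit mask (bit i set means the i-th vowel has been seen) and enumerates every state reachable from the start state. For illustration only, it assumes documents are written over the 26 lowercase letters; `step`, `reachable_states`, and `accepts` are hypothetical names.

```python
from string import ascii_lowercase

VOWELS = "aeiou"
ALPHABET = ascii_lowercase   # assumption: documents use only lowercase letters

def step(state: int, ch: str) -> int:
    """Transition function: a state is a 5-bit mask of which vowels have been seen."""
    idx = VOWELS.find(ch)
    return state | (1 << idx) if idx != -1 else state

def reachable_states() -> set[int]:
    """Explore every state the machine can ever enter, starting from the empty mask 0."""
    seen, frontier = {0}, [0]
    while frontier:
        q = frontier.pop()
        for ch in ALPHABET:
            r = step(q, ch)
            if r not in seen:
                seen.add(r)
                frontier.append(r)
    return seen

def accepts(document: str) -> bool:
    """Run the machine on the document and accept iff all five vowel bits are set."""
    state = 0
    for ch in document:
        state = step(state, ch)
    return state == 0b11111

print(len(reachable_states()))  # 32
```

The state set is fixed before any input is read, so the same 32 states suffice for documents of every length.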
Part A: Regular Languages
Finite automata: formal model of finite-state computation
• Class of problems that can be solved = regular languages
• Define the model, study its properties
• Different characterizations: regular expressions
• Techniques for establishing “non-solvability” by finite automata
Why study this topic?
• Warm-up for defining models of computation
• Beautiful theory with many results!
• Special property: computational problems about automata are solvable (e.g. minimizing the number of states needed)
• Continues to have practical applications: regular expressions supported by all modern languages (e.g. JavaScript), checking correctness of distributed protocols, …
Problem 3: Syntactic Checks for Programs
Given a program (written in, say, C), check whether its text adheres to all syntax rules (e.g. variables must be declared before use).
Is variable a1 declared earlier?
Problem 4: Finding Bugs in Programs
Given a program (written in, say, C), verify that its execution cannot lead to a semantic error (e.g. no buffer overflow errors).
Can the value of index i exceed the size of the array it accesses?
Decidable/Solvable vs Undecidable/Unsolvable
Problem 3: Find syntactic errors in program text
• A compiler checks the program text against all such rules and either reports errors or compiles it into binary code
• The problem is solvable!
Problem 4: Decide whether or not a (semantic) error will arise when a program executes
• The problem is provably unsolvable!
• There does not exist a compiler that can certify this type of correctness for all programs
Turing machines: a universal model of computation used to formalize what a computer can and cannot do
Problem 5: Theorem Proving
Given a mathematical statement expressed in a formal system, determine whether or not it is true. Undecidable.
– What does this imply about the existence of proofs for true statements?
Problem 6: File Compression
Given data, find the shortest description of the data (i.e. the shortest program that will generate the data). Undecidable.
– What does this imply about Occam’s razor and learning from observation?
Alan Turing (British mathematician, 1912-1954): Father of CS
Turing machines: mathematical model of computation
Fundamental theorem (Turing, 1936): undecidability of the halting problem for Turing machines
It is not possible to construct a Turing machine that takes as input the description of a machine and decides whether or not the execution of the input machine terminates.
Turing also designed the Bombe, an electromechanical machine used to decrypt messages enciphered with the German Enigma machine during World War 2.
See the 2014 movie: The Imitation Game
Part B: Computability
Turing machines: mathematical model of computation
• Why is it universal? It can do what any known computer can do!
• Insights into how such machines work and their properties
Undecidable/unsolvable problems:
• The halting problem for Turing machines is undecidable
• Problem reduction: unsolvability of one problem can imply unsolvability of another!
• Different shades of unsolvability: Turing (un)recognizability
Problem 7: Finding the Most Connected Person
Given a graph of “friends” connections on Facebook, find the person who has the maximum number of friends.
It suffices to check each person one by one and count his/her connections.
If there are n people with a total of m connections, then the “time complexity” of the algorithm is roughly linear, i.e. proportional to n + m.
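A sketch of the one-pass degree count for Problem 7. It assumes people are numbered 0 to n-1 and that friendships arrive as a list of undirected pairs with no duplicates. Both the representation and the name `most_connected` are illustrative assumptions.

```python
def most_connected(n: int, friendships: list[tuple[int, int]]) -> int:
    """Return a person (0..n-1) with the maximum number of friends; assumes n >= 1."""
    degree = [0] * n
    for u, v in friendships:     # each undirected friendship counts for both endpoints
        degree[u] += 1
        degree[v] += 1
    # One pass over the m friendships plus one pass over the n people: O(n + m) time.
    return max(range(n), key=lambda p: degree[p])
```

The loop over friendships costs O(m) and the final `max` costs O(n), which matches the n + m estimate. Ties go to the lowest-numbered person.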
Problem 8: Finding the Largest Mutually Connected Group
Given a graph of “friends” connections on Facebook, find the largest clique (a group of people who are all friends with one another).
For a given group of people, it is easy to check whether it forms a clique.
But for a given k, the number of possible groups of size k is about n^k.
The only known bound on k is n, so the number of groups is exponential in n. Checking all groups is too inefficient!
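The contrast in Problem 8 between checking a group and finding the largest group can be sketched as follows. Friendships are assumed to be stored as a set of two-element `frozenset`s, and `is_clique`/`largest_clique` are hypothetical names. The search is plain brute force, shown only to make the exponential blow-up visible.

```python
from itertools import combinations

def is_clique(group, friends: set[frozenset]) -> bool:
    """Easy check: every pair in the group must be friends (about k^2/2 lookups)."""
    return all(frozenset((u, v)) in friends for u, v in combinations(group, 2))

def largest_clique(people, friends: set[frozenset]) -> set:
    """Brute force over group sizes from largest to smallest.

    In the worst case this examines all 2^n subsets of the n people.
    """
    for k in range(len(people), 0, -1):
        for group in combinations(people, k):
            if is_clique(group, friends):
                return set(group)
    return set()
```

With people `[0, 1, 2, 3]` and friendships 0-1, 0-2, 1-2, 2-3, the search rejects the full group of four and then returns `{0, 1, 2}`.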
P vs NP
P: class of problems with polynomial-time solutions
• Problem 7 belongs to this class
• Precise definition uses the time complexity of Turing machines
• These are the “efficiently solvable”, or tractable, problems
NP-complete: a class of problems with no known efficient solutions
• Problem 8 (phrased as a yes/no question: is there a clique of size k?) belongs to this class
• Precise definition uses “non-deterministic” Turing machines
• No known proof that such a problem cannot be solved efficiently
• A large class of commonly occurring problems that are all “equivalent” to one another (if you find a polynomial-time solution for one NP-complete problem, then every problem in NP has a polynomial-time solution)
Cook’s Theorem (1971): Propositional satisfiability (SAT) is NP-complete
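The same gap between checking and searching shows up for SAT, and a sketch makes it concrete. Formulas are assumed to be in conjunctive normal form, written as lists of clauses whose literals follow the DIMACS convention (3 means x3, -3 means NOT x3). `satisfies` and `brute_force_sat` are hypothetical names. Checking one assignment is fast, while this naive search may try all 2^n assignments.

```python
from itertools import product

Clause = list[int]   # DIMACS-style literals: 3 means x3, -3 means NOT x3

def satisfies(assignment: dict[int, bool], cnf: list[Clause]) -> bool:
    """Verifier: time linear in the size of the formula."""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause) for clause in cnf)

def brute_force_sat(cnf: list[Clause], num_vars: int):
    """Naive search over all 2**num_vars assignments; returns one that works, or None."""
    for bits in product([False, True], repeat=num_vars):
        assignment = {i + 1: b for i, b in enumerate(bits)}
        if satisfies(assignment, cnf):
            return assignment
    return None
```

For example, `(x1 OR x2) AND (NOT x1 OR x2) AND (NOT x2 OR x3)` is `[[1, 2], [-1, 2], [-2, 3]]`. It is satisfiable, and every satisfying assignment sets x2 and x3 to true.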
Part C: Complexity
Time and space complexity of problems
• How much time and memory is needed to solve a given problem
• Lower and upper bounds on complexity
Complexity classes of problems
• P
• NP
• PSPACE
Theory of NP-completeness
• Cook’s theorem: SAT is NP-complete
• Problem reductions
THE open problem in computer science/mathematics: Is P = NP?
Course Logistics
• When and where: Here and now! (Monday/Wednesday 3:00-4:30, Meyerson B1)
  – Recitation: Monday 4:30-5:30, Wu + Chen
  – First recitation: TBA
• Lectures will cover new material; recitations will be optional
Course Staff
• TAs: Suyog Bobhate, Shannie Cheng, Kurt Convey, Nathaniel Fessel, Shenqi Hu, Aime Igiraneza, Seyoung Kim, Ruijie Mao, Andrew Martin, Xuan Ru Ng, Irene Zhang
The Textbook
• “Introduction to the Theory of Computation”, 3rd Edition, Michael Sipser
Evaluation
• Bi-weekly problem sets (40%)
• One in-class midterm (25%)
• Final exam (30%)
• Participation on Piazza (5%)
Homework
• We will have bi-weekly problem sets
  – Problems will be handed in and graded on Gradescope. Learn to use it!
  – (Digital submission, so homework must be typed or scanned)
  – Grades and comments are also returned online; no paper.
Homework
• Late policy: 20% deducted per day late.
• Collaboration: Encouraged!
  – You can discuss homework solutions with classmates, but you must write up your solutions separately and name everyone you spoke to at the top of the page.
  – Copying from any source, including other students, is plagiarism. Plagiarism will be reported to the Office of Student Conduct and will result in a failing grade.
• Digital submission makes this easy to find and document!
Exceptions: None
• Making up a midterm or final exam requires medical documentation.
• No exceptions to the homework late policy, but the lowest homework grade will be dropped.
Office Hours
• Ask questions on Piazza first, so that everyone can benefit: http://piazza.com/upenn/spring2019/cis262
• Related: answer questions for others on Piazza. This is how participation is graded.
  – TAs will also monitor Piazza and answer questions.
Office Hours
• Every TA and I will each hold at least one office hour (at least 12 office hours in total!). See the calendar on the course website for details.
Let’s Begin!
• (Review Chapter 0 of Sipser for mathematical basics)
Encoding Problems
Before we can define mathematical models of computation corresponding to machines/computers, we need a way to encode computational problems that is:
• Mathematically precise
• Simple
• General
Observation: an instance of a problem, whether it is a text document or a Facebook friends graph, can always be encoded as a sequence of characters (0’s and 1’s).
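Here is one possible way, among many, to turn both kinds of instance into strings of 0's and 1's. Using UTF-8 for text and a row-by-row adjacency matrix for the friends graph are assumptions of this sketch, and the function names are hypothetical. A complete encoding would also need to record n, for example with a separator.

```python
def encode_text(document: str) -> str:
    """Encode a text document as a 0/1 string via its UTF-8 bytes (8 bits per byte)."""
    return "".join(f"{byte:08b}" for byte in document.encode("utf-8"))

def encode_graph(n: int, edges: list[tuple[int, int]]) -> str:
    """Encode an undirected graph on people 0..n-1 as its n*n adjacency matrix, row by row."""
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1      # friendship is symmetric
    return "".join(str(bit) for row in adj for bit in row)

print(encode_text("A"))             # 01000001
print(encode_graph(3, [(0, 1)]))    # 010100000
```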
Alphabet Σ: a finite set of symbols/characters used for encoding
Examples:
• Σ = {0, 1}
• Σ = {a, b}
• Σ = {A, C, G, T}
• Σ = all characters in the Roman alphabet
Strings
A string w over an alphabet Σ is a finite sequence of symbols in Σ.
Example strings over the alphabet Σ = {a, b}:
• ε: the empty string, i.e., the string of length 0
• a
• b
• abb
• baaabba
• ababababababab
Σ* = the set of all strings over Σ
If Σ has m symbols, how many strings of length k are there? How many strings does Σ* contain?
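The counting questions can be explored with a short sketch. Each of the k positions has m independent choices, which gives m^k strings of length k. Σ* is infinite, but it can still be listed in order of increasing length. Symbols are assumed to be single characters, and `strings_of_length`/`sigma_star` are hypothetical names.

```python
from itertools import count, islice, product

def strings_of_length(sigma: str, k: int) -> list[str]:
    """All strings of length k over sigma: m choices per position, so m**k strings."""
    return ["".join(p) for p in product(sigma, repeat=k)]

def sigma_star(sigma: str):
    """Yield every string in sigma* in order of length: epsilon, then length 1, 2, ..."""
    for k in count():
        yield from strings_of_length(sigma, k)

# The first seven strings over {a, b}; "" stands for the empty string epsilon.
print(list(islice(sigma_star("ab"), 7)))  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```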
Operations on Strings
Concatenation of strings: given strings u and v, u·v denotes their concatenation (sometimes the · is omitted)
A string u is a prefix of string w if there exists a string v such that w = u·v
• Prefixes of aab = ε, a, aa, aab
A string u is a suffix of string w if there exists a string v such that w = v·u
• Suffixes of aab = ε, b, ab, aab
A string u is a substring of string w if there exist strings v and v’ such that w = v·u·v’
• Substrings of aaba = ε, a, b, aa, ab, ba, aab, aba, aaba
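The three definitions translate directly into Python, where `+` is concatenation and slicing picks out the pieces u, v, v'. This is a sketch with hypothetical function names, useful for double-checking small examples like the ones above.

```python
def prefixes(w: str) -> set[str]:
    """All u with w = u.v for some v (includes the empty string and w itself)."""
    return {w[:i] for i in range(len(w) + 1)}

def suffixes(w: str) -> set[str]:
    """All u with w = v.u for some v."""
    return {w[i:] for i in range(len(w) + 1)}

def substrings(w: str) -> set[str]:
    """All u with w = v.u.v' for some v, v': every slice w[i:j] with i <= j."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}
```

For instance, `sorted(prefixes("aab"))` gives `['', 'a', 'aa', 'aab']`, where `''` plays the role of ε.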
Languages
A language L over an alphabet Σ is a set of strings over Σ, i.e. L ⊆ Σ*
A problem is encoded as a language L: given an input string w, decide whether or not w is in L
This definition is general enough to encode all “decision problems”, that is, problems where the output is either yes or no
Example Language
Σ = {a, b}
L = { w | w ends with the symbol b } = { b, ab, bb, aab, abb, bab, bbb, … }
A machine for L needs to check, given an input string w, whether or not the last symbol of w is b
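A finite automaton for this language only needs to remember whether the most recently read symbol was b. The sketch below simulates such a two-state machine directly. The state labels and the function name are illustrative choices, and the sketch assumes the input uses only a and b.

```python
def ends_with_b(w: str) -> bool:
    """Two-state DFA over {a, b}: the state records whether the last symbol read was b."""
    state = "no"                       # start: nothing read yet, so the last symbol is not b
    for ch in w:
        if ch not in "ab":
            raise ValueError(f"symbol {ch!r} is not in the alphabet {{a, b}}")
        state = "yes" if ch == "b" else "no"
    return state == "yes"              # accept iff the last symbol read was b
```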
Example Language
Σ = {A, C, G, T}
L = { w | w contains “ACC” as a substring }
Example string in the language: GTTACCGA
Example string not in the language: ACTGCCATTGTCA
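A finite automaton for this language can track how much of “ACC” it has just matched. The sketch below encodes one such 4-state DFA as a transition table, where state i is the length of the longest suffix of the input read so far that is also a prefix of ACC. The table layout and names are assumptions for illustration.

```python
# Transition table for a 4-state DFA over {A, C, G, T}.
# State i: the longest suffix of the input so far that is a prefix of "ACC" has length i.
DELTA = {
    0: {"A": 1, "C": 0, "G": 0, "T": 0},
    1: {"A": 1, "C": 2, "G": 0, "T": 0},
    2: {"A": 1, "C": 3, "G": 0, "T": 0},
    3: {"A": 3, "C": 3, "G": 3, "T": 3},   # "ACC" already seen: stay in the accepting state
}

def contains_acc(w: str) -> bool:
    state = 0
    for ch in w:
        state = DELTA[state][ch]   # raises KeyError if ch is outside {A, C, G, T}
    return state == 3
```

On GTTACCGA the states go 0, 0, 0, 1, 2, 3, 3, 3, so the string is accepted. On ACTGCCATTGTCA the machine never gets past state 2, so the string is rejected.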
THE PARABLE OF THE ISLANDERS
100 logicians live in total isolation on an island. They have developed a quirky culture:
1. If any islander can deduce that he has blue eyes, he must kill himself on the beach at midnight that night.
2. No islander may tell another that she has blue eyes.
Another quirk: all of the islanders are blue-eyed. Since there are no mirrors and the water is murky, no one has ever known their own eye color, and they have lived in harmony for hundreds of years with no suicides.
THE PARABLE OF THE ISLANDERS
One day, an explorer arrives on the island and addresses the islanders with a faux pas. “At Least One of You Has Blue Eyes,” he tells them. Due to the explorer’s terrible breach of manners, the islanders quickly dispatch him, but the damage has already been done.
Has anything changed? What happens? Try to solve this problem, and discuss it on Piazza.