Info 2950 Mathematical Methods for Information Science


Info 2950 Mathematical Methods for Information Science
Prof. Carla Gomes
gomes@cs.cornell.edu
Introduction

Overview of this Lecture
• Course Administration
• What is it about?
• Course Themes, Goals, and Syllabus

Course Administration

Lectures: Tuesdays and Thursdays, 1:25 – 2:40
Location: 315 Upson Hall
Lecturer: Prof. Carla Gomes
Office: 5133 Upson Hall
Phone: 255 9189
Email: gomes@cs.cornell.edu
Course Assistant: Megan McDonald (mcdonald@cs.cornell.edu)
Web Site: http://www.infosci.cornell.edu/courses/info2950/2011sp/

Grades
The final grade for the course will be determined as follows:
Participation: 5%
Homework: 30%
Midterm: 30% (in-class, specific details TBA)
Final: 35% (TBA)
Note: The lowest homework grade will be dropped before the final grade is computed.
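The weighting above can be sketched as a short calculation. This is only an illustration: the function name `final_grade` is hypothetical, all scores are assumed to be on a 0 to 100 scale, and the homework assignments are assumed to be equally weighted (the slide does not say how they are combined).

```python
def final_grade(participation, homeworks, midterm, final):
    """Weighted course grade on a 0-100 scale.

    Assumption (not stated on the slide): homework assignments count equally
    and are averaged after the single lowest score is dropped.
    """
    if len(homeworks) < 2:
        raise ValueError("need at least two homework scores to drop one")
    kept = sorted(homeworks)[1:]          # drop the lowest homework score
    hw_avg = sum(kept) / len(kept)
    return (0.05 * participation          # Participation: 5%
            + 0.30 * hw_avg               # Homework: 30%
            + 0.30 * midterm              # Midterm: 30%
            + 0.35 * final)               # Final: 35%


if __name__ == "__main__":
    print(final_grade(100, [50, 90, 100], 80, 90))
```

With homework scores 50, 90, and 100, the 50 is dropped, so the homework component is based on an average of 95.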

Homework
• Homework is very important. It is the best way for you to learn the material. The midterm and the final will include one or two questions from the homework assignments.
• Your lowest homework grade will be dropped before the final grade is computed.
• You can discuss the problems with your classmates, but all work handed in should be original, written by you in your own words.
• Homework should be handed in in class.
• No late homework will be accepted.

Textbook
Discrete Mathematics and Its Applications by Kenneth H. Rosen
Use the lecture notes as a study guide.

Overview of this Lecture
• Course Administration
• What is INFO 2950 about?
• Course Themes, Goals, and Syllabus

What is Info 2950 about?
Discrete Mathematics and Its Applications
Focus: Discrete Structures
Why is it relevant to information science?

Information Science
Main focus: information in digital form
• studies the creation, representation, organization, access, and analysis of information in digital form
• examines the social, cultural, economic, historical, legal, and political contexts in which information systems are employed

Continuous vs. Discrete Mathematics
Continuous Mathematics
It considers objects that vary continuously. Example: analog wristwatch (separate hour, minute, and second hands). From an analog watch perspective, between 1:25 p.m. and 1:26 p.m. there are infinitely many possible different times as the second hand moves around the watch face.
Real-number system: core of continuous mathematics.
Continuous mathematics: models and tools for analyzing real-world phenomena that change smoothly over time (differential equations, etc.).
http://ja0hxv.calico.jp/pai/epivalue.html (one trillion digits below the decimal point)

Discrete vs. Continuous Mathematics
Discrete Mathematics
It considers objects that vary in a discrete way. Example: digital wristwatch. On a digital watch, there are only finitely many possible different times between 1:25 p.m. and 1:27 p.m. A digital watch does not show split seconds: there is no time between 1:25:03 and 1:25:04. The watch moves from one time to the next.
Integers: core of discrete mathematics.
Discrete mathematics: models and tools for analyzing real-world phenomena that change discretely over time, and therefore ideal for studying computer science. Computers are digital! (Numbers as finite bit strings; data structures, all discrete!)

What is INFO 2950 about?
Why is Discrete Mathematics relevant to computer science and information science? (examples)

Logic: Web Page Searching - Boolean Searches
George Boole

Logic: Hardware and software specifications
Hardware: Digital Circuits
Formal: Input_wire_A value in {0, 1}
Example 1: Adder
One-bit Full Adder with Carry-In and Carry-Out
4-bit full adder
Logic: a formal language for representing and reasoning about information: Syntax; Semantics; and Inference rules.

Logic: Digital circuits
One-bit full adder
[Figure: circuit diagram of a one-bit full adder. Labels: input bits to be added, carry bit, XOR gates, AND gates, OR gate, sum, carry bit for next adder.]
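The one-bit full adder built from XOR, AND, and OR gates can be sketched in a few lines. This is a minimal illustration using the standard gate-level construction, which may differ in wiring details from the circuit drawn on the slide; the function names `full_adder` and `ripple_add` are hypothetical.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum_bit, carry_out) for bits a, b, carry_in."""
    partial = a ^ b                              # first XOR gate
    sum_bit = partial ^ carry_in                 # second XOR gate
    carry_out = (a & b) | (partial & carry_in)   # two AND gates feeding one OR gate
    return sum_bit, carry_out


def ripple_add(x_bits, y_bits):
    """Chain one-bit full adders over two equal-length bit lists.

    Bits are given least significant first; returns (sum_bits, final_carry).
    """
    carry = 0
    out = []
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)       # carry feeds the next adder
        out.append(s)
    return out, carry


if __name__ == "__main__":
    # 11 + 6 = 17, written least significant bit first
    print(ripple_add([1, 1, 0, 1], [0, 1, 1, 0]))   # ([1, 0, 0, 0], 1)
```

Chaining four of these one-bit adders gives the 4-bit adder: each adder's carry-out becomes the next adder's carry-in.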

Logic: Software specifications
Example 2: System Specification:
– The router can send packets to the edge system only if it supports the new address space.
– For the router to support the new address space, it is necessary that the latest software release be installed.
– The router can send packets to the edge system if the latest software release is installed.
– The router does not support the new address space.
How to write these specifications in a rigorous / formal way? Use Logic.
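One way to formalize the four statements is with three propositional variables, then check by brute force whether all four can hold at once. The variable names are assumptions made for this sketch: p = "the router can send packets to the edge system", q = "the router supports the new address space", r = "the latest software release is installed". Under that reading the specification is p → q, q → r, r → p, and ¬q.

```python
from itertools import product


def implies(a, b):
    """Material implication a -> b."""
    return (not a) or b


def spec_holds(p, q, r):
    """True when all four specification statements are satisfied."""
    return (implies(p, q)        # send packets only if address space supported
            and implies(q, r)    # support requires the latest release
            and implies(r, p)    # send packets if the latest release is installed
            and not q)           # the router does not support the new address space


# Enumerate all 2^3 truth assignments and keep those satisfying the specification.
models = [v for v in product([False, True], repeat=3) if spec_holds(*v)]

if __name__ == "__main__":
    print(models)   # [(False, False, False)]
```

The specification is consistent: it has exactly one satisfying assignment, in which all three propositions are false.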

Sudoku
[Figure: partially filled 9×9 Sudoku grid]
How can we encode this problem and solve it? Use Logic!

Sudoku
9 8 5 4 2 6 3 1 7
4 1 6 3 9 7 5 2 8
7 2 3 1 5 8 4 6 9
1 6 2 7 3 4 9 8 5
8 9 7 6 1 5 2 4 3
5 3 4 2 8 9 6 7 1
3 7 1 5 6 2 8 9 4
2 4 9 8 7 3 1 5 6
6 5 8 9 4 1 7 3 2
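The rules of Sudoku are logical constraints: every row, every column, and every 3×3 box must contain each digit 1 to 9 exactly once. The sketch below only checks those constraints on the completed grid from this slide; it is not a solver and not the SAT encoding itself. The names `SOLVED` and `is_valid_solution` are hypothetical.

```python
# The completed grid from the slide, read as nine rows of nine digits.
SOLVED = [
    [9, 8, 5, 4, 2, 6, 3, 1, 7],
    [4, 1, 6, 3, 9, 7, 5, 2, 8],
    [7, 2, 3, 1, 5, 8, 4, 6, 9],
    [1, 6, 2, 7, 3, 4, 9, 8, 5],
    [8, 9, 7, 6, 1, 5, 2, 4, 3],
    [5, 3, 4, 2, 8, 9, 6, 7, 1],
    [3, 7, 1, 5, 6, 2, 8, 9, 4],
    [2, 4, 9, 8, 7, 3, 1, 5, 6],
    [6, 5, 8, 9, 4, 1, 7, 3, 2],
]


def is_valid_solution(grid):
    """Check that each row, column, and 3x3 box holds the digits 1..9 exactly once."""
    digits = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [set(col) for col in zip(*grid)]
    boxes = [
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    return all(group == digits for group in rows + cols + boxes)


if __name__ == "__main__":
    print(is_valid_solution(SOLVED))   # True
```

A logic-based solver states the same 27 "exactly once" constraints over unknown cells and searches for an assignment that satisfies them.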

Automated Proofs: EQP - Robbins Algebras are all Boolean
A mathematical conjecture (the Robbins conjecture) unsolved for decades. First non-trivial mathematical theorem proved automatically.
The Robbins problem was to determine whether one particular set of rules is powerful enough to capture all of the laws of Boolean algebra. One way to state the Robbins problem in mathematical terms is: Can the equation not(not(P)) = P be derived from the following three equations?
[1] P or Q = Q or P
[2] (P or Q) or R = P or (Q or R)
[3] not(not(P or Q) or not(P or not(Q))) = P
[An Argonne lab program] has come up with a major mathematical proof that would have been called creative if a human had thought of it. New York Times, December 1996
http://www-unix.mcs.anl.gov/~mccune/papers/robbins/
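As a small sanity check, the commutativity, associativity, and Robbins equations, together with the target equation not(not(P)) = P, can be verified on ordinary truth values. This covers only the easy direction (the two-element Boolean algebra satisfies the Robbins equations). It says nothing about the hard direction that EQP settled, namely deriving the target equation from the three axioms alone. The helper names are hypothetical.

```python
from itertools import product

B = [False, True]


def neg(x):
    return not x


def lor(x, y):
    return x or y


# [1] commutativity of "or"
commutative = all(lor(p, q) == lor(q, p) for p, q in product(B, repeat=2))

# [2] associativity of "or"
associative = all(
    lor(lor(p, q), r) == lor(p, lor(q, r)) for p, q, r in product(B, repeat=3)
)

# [3] the Robbins equation: not(not(P or Q) or not(P or not(Q))) = P
robbins = all(
    neg(lor(neg(lor(p, q)), neg(lor(p, neg(q))))) == p
    for p, q in product(B, repeat=2)
)

# Target equation: not(not(P)) = P
double_negation = all(neg(neg(p)) == p for p in B)

if __name__ == "__main__":
    print(commutative, associative, robbins, double_negation)
```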

Probability
Importance of concepts from probability is rapidly increasing in CS and Information Science:
• Machine Learning / Data Mining: find statistical regularities in large amounts of data (e.g., the Naïve Bayes algorithm).
• Natural language understanding: dealing with the ambiguity of language (words have multiple meanings, sentences have multiple parsings). Key: find the most likely (i.e., most probable) coherent interpretation of a sentence (the “holy grail” of NLU).
• Randomized algorithms: e.g., Google’s PageRank, “just” a random walk on the web! Also primality testing; randomized search algorithms, such as simulated annealing. In computation, having a few random bits really helps!
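To illustrate how a few random bits help, here is a sketch of randomized primality testing. The slide does not say which test it has in mind; this example uses the Miller–Rabin test, and the function name `is_probable_prime` and its parameters are hypothetical. A composite number passes a single round with probability at most 1/4, so 20 independent rounds make a wrong "prime" answer extremely unlikely.

```python
import random


def is_probable_prime(n, rounds=20, rng=None):
    """Miller-Rabin test: False means certainly composite, True means probably prime."""
    rng = rng or random.Random()
    if n < 2:
        return False
    # Handle small primes and their multiples directly.
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 = d * 2^s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)      # random base in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue                     # this base does not expose n as composite
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                 # base a is a witness: n is composite
    return True


if __name__ == "__main__":
    print(is_probable_prime(2**61 - 1))          # a known Mersenne prime
    print(is_probable_prime(1000003 * 1000033))  # a product of two factors
```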

Probability: Bayesian Reasoning
Bayesian networks provide a means of expressing joint probability over many interrelated hypotheses and therefore of reasoning about them. Bayesian networks have been successfully applied in diverse fields such as medical diagnosis, image recognition, language understanding, search algorithms, and many others.
Example of query: what is the most likely diagnosis for the infection given all the symptoms?
Bayes Rule
“18th-century theory is new force in computing” CNET ’07

Naïve Bayes SPAM Filters
Key idea: some words are more likely to appear in spam email than in legitimate email. The filter doesn't know the probabilities of different words in advance, and must first be trained so it can build them up. Users manually flag SPAM email.
The formula used by spam filters is derived from “Bayes Rule”, based on independence assumptions.
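The formula itself did not survive in this transcript. One common textbook form, which is not necessarily the exact formula shown on the original slide, is sketched below. The symbols are assumptions made for illustration: S is the event "the message is spam", H is "the message is legitimate", and w_1, ..., w_n are the words observed in the message.

```latex
% Bayes' rule applied to the words observed in a message
\[
P(S \mid w_1, \dots, w_n)
  = \frac{P(w_1, \dots, w_n \mid S)\, P(S)}{P(w_1, \dots, w_n)}
\]

% "Naive" independence assumption: words are conditionally independent given the class
\[
P(w_1, \dots, w_n \mid S) = \prod_{i=1}^{n} P(w_i \mid S),
\qquad
P(w_1, \dots, w_n \mid H) = \prod_{i=1}^{n} P(w_i \mid H)
\]

% Resulting spam probability, expanding the denominator over the two classes
\[
P(S \mid w_1, \dots, w_n)
  = \frac{P(S) \prod_{i=1}^{n} P(w_i \mid S)}
         {P(S) \prod_{i=1}^{n} P(w_i \mid S) + P(H) \prod_{i=1}^{n} P(w_i \mid H)}
\]
```

The per-word probabilities P(w_i | S) and P(w_i | H) are what training estimates from the messages users have flagged.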

Probability and Chance, cont.
Back to checking proofs...
Imagine a mathematical proof that is several thousand pages long (e.g., the classification of the so-called finite simple groups, also called the enormous theorem, 5000+ pages).
How would you check it to make sure it’s correct? Hmm…

Probability and Chance, cont.
Computer scientists have recently found a remarkable way to do this: “holographic proofs”.
Ask the author of the proof to write it down in a special encoding (size increases to, say, 50,000 pages of 0/1 bits). You don’t need to see the encoding! Instead, you ask the author to give you the values of 50 randomly picked bits of the proof (i.e., “spot check the proof”). With almost absolute certainty, you can now determine whether the proof is correct or not! (Works also for 100-trillion-page proofs; use, e.g., 100 bits.)
Aside: Do professors ever use “spot checking”?
Started with results from the early nineties (Arora et al. ’92) with recent refinements (Dinur ’06). Combines ideas from coding theory, probability, algebra, computation, and graph theory. It’s an example of one of the latest advances in discrete mathematics.
See Bernard Chazelle, Nature ’07.

Graph Theory

Graphs and Networks
• Many problems can be represented by a graphical network representation.
• Examples:
– Distribution problems
– Routing problems
– Maximum flow problems
– Designing computer / phone / road networks
– Equipment replacement
– And of course the Internet
Aside: finding the right problem representation is one of the key issues in this course.

New Science of Networks
Networks are pervasive:
• Utility patent network 1972-1999 (3 million patents), Gomes and Lesser
• Neural network of the nematode worm C. elegans (Strogatz, Watts)
• NYS Electric Power Grid (Thorp, Strogatz, Watts)
• Network of computer scientists, ReferralWeb System (Kautz and Selman)
• Cybercommunities (automatically discovered), Kleinberg et al.

Example: Coloring a Map
How to color this map so that no two adjacent regions have the same color?
What does it have to do with discrete math?

Graph representation
Abstract the essential info.
Coloring the nodes of the graph: What’s the minimum number of colors such that any two nodes connected by an edge have different colors?

Four Color Theorem
The chromatic number of a graph is the least number of colors required to color its vertices so that adjacent vertices get different colors.
The Four Color Theorem: the chromatic number of a planar graph is no greater than four. (Quite surprising!)
[Figure: four color map]
Proof: Appel and Haken 1976; careful case analysis performed by computer; proof reduced the infinitude of possible maps to 1,936 reducible configurations (later reduced to 1,476) which had to be checked one by one by computer. The computer program ran for hundreds of hours. The first significant computer-assisted mathematical proof. Write-up was hundreds of pages including code!

Examples of Applications of Graph Coloring

Scheduling of Final Exams
How can the final exams at Cornell be scheduled so that no student has two exams at the same time? (Note: it is not obvious that this has anything to do with graphs or graph coloring!)
[Figure: graph with vertices 1 to 7]
Graph: A vertex corresponds to a course. An edge between two vertices denotes that there is at least one common student in the courses they represent. Each time slot for a final exam is represented by a different color. A coloring of the graph corresponds to a valid schedule of the exams.

Scheduling of Final Exams
[Figure: the graph with vertices 1 to 7, shown before and after coloring]
What are the constraints between courses? Find a valid coloring.
Time Period   Courses
I             1, 6
II            2
III           3, 5
IV            4, 7
Why is the minimum number of colors useful?
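Exam scheduling as graph coloring can be sketched with a simple greedy coloring. The slide's graph is a figure that is not reproduced in this text, so the edge list `EDGES` below is a hypothetical conflict graph chosen only to be consistent with the schedule shown; the function names are also illustrative. Greedy coloring always produces a valid schedule but does not in general guarantee the minimum number of time slots.

```python
# Hypothetical conflict graph (assumption): an edge means two courses share a student.
EDGES = [
    (1, 2), (1, 3), (1, 4), (1, 7),
    (2, 3), (2, 4), (2, 5), (2, 7),
    (3, 4), (3, 6), (3, 7),
    (4, 5), (4, 6),
    (5, 6), (5, 7),
    (6, 7),
]

# The schedule from the slide, as course -> time period.
SLIDE_SCHEDULE = {1: "I", 6: "I", 2: "II", 3: "III", 5: "III", 4: "IV", 7: "IV"}


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def greedy_coloring(adj):
    """Give each vertex the smallest color not used by an already-colored neighbor."""
    color = {}
    for v in sorted(adj):
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


def is_proper(adj, color):
    """True when no edge joins two vertices of the same color."""
    return all(color[u] != color[v] for u in adj for v in adj[u])


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    coloring = greedy_coloring(adj)
    print(coloring, is_proper(adj, coloring), is_proper(adj, SLIDE_SCHEDULE))
```

In this assumed graph, courses 1, 2, 3, and 4 conflict pairwise, so no schedule can use fewer than four periods. The minimum number of colors matters because it is the smallest number of exam periods needed.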

Example 2: Traveling Salesman
Find a closed tour of minimum length visiting all the cities.
TSP has lots of applications:
Transportation related: scheduling deliveries
Many others: e.g., scheduling of a machine to drill holes in a circuit board; genome sequencing; etc.

13,509 cities in the US
13508! ≈ 1.09e+49932 possible tours from a fixed starting city (13509! ≈ 1.4759774188460148199751342753208e+49936 orderings of all the cities)
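To get a feel for why trying every tour is hopeless, the sketch below computes the order of magnitude of n! using the standard library's log-gamma function, and solves a tiny instance by brute force. The function names and the four-city example are hypothetical; `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import permutations


def log10_factorial(n):
    """log10(n!) computed via the log-gamma function, avoiding huge integers."""
    return math.lgamma(n + 1) / math.log(10)


def brute_force_tsp(points):
    """Try every tour that starts at points[0]; returns (best_length, best_tour)."""
    start, rest = points[0], points[1:]
    best_len, best_tour = math.inf, None
    for perm in permutations(rest):               # (n - 1)! candidate tours
        tour = (start,) + perm
        length = sum(
            math.dist(tour[i], tour[(i + 1) % len(tour)])   # wrap around to close the tour
            for i in range(len(tour))
        )
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour


if __name__ == "__main__":
    square = [(0, 0), (1, 1), (0, 1), (1, 0)]
    print(brute_force_tsp(square))
    print(math.floor(log10_factorial(13508)), math.floor(log10_factorial(13509)))
```

The second line of output gives the base-10 exponents of 13508! and 13509!, which are 49932 and 49936. Brute force is fine for four cities and useless at that scale, which is why large instances need cleverer algorithms.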

The optimal tour!

Course Themes, Goals, and Course Outline

Goals of Info 2950
Introduce students to a range of mathematical tools from discrete mathematics that are key in Information Science.
Mathematical Sophistication
• How to write statements rigorously
• How to read and write theorems, lemmas, etc.
• How to write rigorous proofs
Areas we will cover:
• Logic and proofs
• Set Theory
• Counting and Probability Theory
• Graph Theory
• Models of computation
Practice works! Actually, only practice works!
Note: Learning to do proofs from watching the slides is like trying to learn to play tennis from watching it on TV! So, do the exercises!
Aside: We’re not after the shortest or most elegant proofs; verbose but rigorous is just fine!

Topics Info 2950
Logic and Methods of Proof
• Propositional Logic: SAT as an encoding language!
• Predicates and Quantifiers
• Methods of Proofs
Sets and Set operations
Functions
Counting
• Basics of counting
• Pigeonhole principle
• Permutations and Combinations

Topics Info 2950
Probability
• Axioms, events, random variables
• Independence, expectation, example distributions
• Birthday paradox
• Monte Carlo method
Graphs and Trees
• Graph terminology
• Examples of graph problems and algorithms: graph coloring, TSP, shortest path
Automata theory
• Languages

The END