Computational Intelligence II Lecturer Professor Pekka Toivanen Email

  • Slides: 75
Download presentation
Computational Intelligence II Lecturer: Professor Pekka Toivanen E-mail: Pekka. Toivanen@uef. fi Exercises: Nina Rogelj

Computational Intelligence II Lecturer: Professor Pekka Toivanen E-mail: Pekka. Toivanen@uef. fi Exercises: Nina Rogelj E-mail: nina. rogelj@uef. fi

Usefulness • Trees and graphs represent data in many domains in linguistics, vision, chemistry,

Usefulness • Trees and graphs represent data in many domains in linguistics, vision, chemistry, web. (Even sociology. ) • Tree and graphs searching algorithms are used to retrieve information from the data. PODS 2002 4

Tree Inclusion Book Chapter Editor Title ? Name Title Author Title John XML Mary

Tree Inclusion Book Chapter Editor Title ? Name Title Author Title John XML Mary OLAP XML (a) Chapter Author Jack (b) PODS 2002 5

PODS 2002 6

PODS 2002 6

Tree. BASE Search Engine PODS 2002 7

Tree. BASE Search Engine PODS 2002 7

Vision Application: Handwriting Characters Representation From pixels to a small attributed graph l 1

Vision Application: Handwriting Characters Representation From pixels to a small attributed graph l 1 e 1 l 5 D. Geiger, R. Giugno, D. Shasha, Ongoing work at New York University PODS 2002 e 3 l 2 e 5 e 4 l 3 l 4 8

Vision Application: Handwriting Characters Recognition QUERY l 4 e 3 l 2 l 3

Vision Application: Handwriting Characters Recognition QUERY l 4 e 3 l 2 l 3 e 4 e 5 l 5 D A T A B A S E l 1 Best Match l 1 e 1 l 5 e 2 e 3 l 2 e 5 e 4 l 3 l 4 l 2 e 7 l 5 e 3 e 5 e 2 l e 1 3 e 4 e 6 l 4 l 1 PODS 2002 l 2 e 6 l 5 e e 2 l 3 e 5 3 e 1 e 4 l 1 9

Vision Application: Region Adjacent Graphs J. Lladós and E. Martí and J. J. Villanueva,

Vision Application: Region Adjacent Graphs J. Lladós and E. Martí and J. J. Villanueva, Symbol Recognition by Error-Tolerant Subgraph Matching between Region Adjacency Graphs, IEEE Transactions on Pattern Analysis and Machine Intelligence, 23 -10, 1137— 1143, 2001. PODS 2002 10

Chemistry Application • Protein Structure Search. http: //sss. berkeley. edu/ • Daylight (www. daylight.

Chemistry Application • Protein Structure Search. http: //sss. berkeley. edu/ • Daylight (www. daylight. com), • MDL http: //www. mdli. com/ • BCI (www. bci 1. demon. co. uk/) PODS 2002 11

Algorithmic Questions • Question: why can’t I search for trees or graphs at the

Algorithmic Questions • Question: why can’t I search for trees or graphs at the speed of keyword searches? (Proper data structure) • Why can’t I compare trees (or graphs) as easily as I can compare strings? PODS 2002 12

Different forms of exact matching • Graph isomorphism – A one-to-one correspondence must be

Different forms of exact matching • Graph isomorphism – A one-to-one correspondence must be found between each node of the first graph and each node of the second graph. – Graphs G(VG, EG) and H(VH, EH) are isomorphic if there is an invertible F from VG to VH such that for all nodes u and v in VG, (u, v)∈EG if and only if (F(u), F(v)) ∈EH. Zheng Zhang, 27. 05. 2010 Exact Matching slide 21

Different forms of exact matching • Subgraph isomorphism – It requires that an isomorphism

Different forms of exact matching • Subgraph isomorphism – It requires that an isomorphism holds between one of the two graphs and a node-induced subgraph of the other. • Monomorphism – It requires that each node of the first graph is mapped to a distinct node of the second one, and each edge of the first graph has a corresponding edge in the second one; the second graph, however, may have both extra nodes and extra edges. Zheng Zhang, 27. 05. 2010 Exact Matching slide 22

Different forms of exact matching • Homomorphism – It drops the condition that nodes

Different forms of exact matching • Homomorphism – It drops the condition that nodes in the first graph are to be mapped to distinct nodes of the other; hence, the correspondence can be many-to-one. – A graph homomorphism F from Graph G(VG, EG) and H(VH, EH), is a mapping F from VG to VH such that {x, y} ∈EG implies {F(x), F(y)} ∈EH. Zheng Zhang, 27. 05. 2010 Exact Matching slide 23

Different forms of exact matching • Maximum common subgraph (MCS) – A subgraph of

Different forms of exact matching • Maximum common subgraph (MCS) – A subgraph of the first graph is mapped to an isomorphic subgraph of the second one. – There are two possible definitions of the problem, depending on whether node-induced subgraphs or plain subgraphs are used. – The problem of finding the MCS of two graphs can be reduced to the problem of finding the maximum clique (i. e. a fully connected subgraph) in a suitably defined association graph. Zheng Zhang, 27. 05. 2010 Exact Matching slide 24

Different forms of exact matching • Properties – The matching problems are all NP-complete

Different forms of exact matching • Properties – The matching problems are all NP-complete except for graph isomorphism, which has not yet been demonstrated whether in NP or not. – Exact graph matching has exponential time complexity in the worst case. However, in many PR applications the actual computation time can be still acceptable. – Exact isomorphism is very seldom used in PR. Subgraph isomorphism and monomorphism can be effectively used in many contexts. – The MCS problem is receiving much attention. Zheng Zhang, 27. 05. 2010 Exact Matching slide 25

Necessary concepts • Epimorphism – an surjective homomorphism • Monomorphism – an injective homomorphism

Necessary concepts • Epimorphism – an surjective homomorphism • Monomorphism – an injective homomorphism • Endomoprphism – a homomorphism from an object to itself • Automoprphism – an endomorphism which is also an isomorphism with itself Zheng Zhang, 27. 05. 2010 Exact Matching slide 26

Algorithms for exact matching • Techniques based on tree search – mostly based on

Algorithms for exact matching • Techniques based on tree search – mostly based on some form of tree search with backtracking Ullmann’s algorithm Ghahraman’s algorithm VF and VF 2 algorithm Bron and Kerbosh’s algorithm Other algorithms for the MCS problem • Other techniques – based on A* algorithm Demko’s algorithm – based on group theory Nauty algorithm Zheng Zhang, 27. 05. 2010 Exact Matching slide 30

Markov Chains .

Markov Chains .

Finite Markov Chain An integer time stochastic process, consisting of a domain D of

Finite Markov Chain An integer time stochastic process, consisting of a domain D of m states {s 1, …, sm} and 1. An m dimensional initial distribution vector ( p(s 1), . . , p(sm)). 2. An m×m transition probabilities matrix M= (asisj) For example, D can be the letters {A, C, T, G}, p(A) the probability of A to be the 1 st letter in a sequence, and a. AG the probability that G follows A in a sequence. 36

Simple Model - Markov Chains • Markov Property: The state of the system at

Simple Model - Markov Chains • Markov Property: The state of the system at time t+1 only depends on the state of the system at time t X 1 X 2 X 3 37 X 4 X 5

Markov Chain (cont. ) X 1 X 2 Xn-1 Xn • For each integer

Markov Chain (cont. ) X 1 X 2 Xn-1 Xn • For each integer n, a Markov Chain assigns probability to sequences (x 1…xn) over D (i. e, xi D) as follows: Similarly, (X 1, …, Xi , …)is a sequence of probability distributions over D. 38

Matrix Representation A A B 0. 95 0 C 0. 05 D 0 B

Matrix Representation A A B 0. 95 0 C 0. 05 D 0 B 0. 2 0. 5 0 0. 3 C 0 0. 2 0 0. 8 0 0 1 0 D The transition probabilities Matrix M =(ast) M is a stochastic Matrix: The initial distribution vector (u 1…um) defines the distribution of X 1 (p(X 1=si)=ui). Then after one move, the distribution is changed to X 2 = X 1 M After i moves the distribution is Xi = X 1 Mi-1 39

Simple Example Weather: –raining today rain tomorrow p rr = 0. 4 –raining today

Simple Example Weather: –raining today rain tomorrow p rr = 0. 4 –raining today no rain tomorrow p rn = 0. 6 –no raining today rain tomorrow p nr = 0. 2 –no raining today no rain tomorrow p rr = 0. 8 40

Simple Example Transition Matrix for Example • Note that rows sum to 1 •

Simple Example Transition Matrix for Example • Note that rows sum to 1 • Such a matrix is called a Stochastic Matrix • If the rows of a matrix and the columns of a matrix all sum to 1, we have a Doubly Stochastic Matrix 41

Gambler’s Example – At each play we have the following: • Gambler wins $1

Gambler’s Example – At each play we have the following: • Gambler wins $1 with probability p or • Gambler loses $1 with probability 1 -p – Game ends when gambler goes broke, or gains a fortune of $100 – Both $0 and $100 are absorbing states p 0 1 1 -p p p N-1 2 1 -p p 1 -p Start 42 (10$) 1 -p N

Coke vs. Pepsi Given that a person’s last cola purchase was Coke, there is

Coke vs. Pepsi Given that a person’s last cola purchase was Coke, there is a 90% chance that her next cola purchase will also be Coke. If a person’s last cola purchase was Pepsi, there is an 80% chance that her next cola purchase will also be Pepsi. 0. 1 0. 9 coke 0. 8 pepsi 0. 2 43

Coke vs. Pepsi Given that a person is currently a Pepsi purchaser, what is

Coke vs. Pepsi Given that a person is currently a Pepsi purchaser, what is the probability that she will purchase Coke two purchases from now? The transition matrix is: (Corresponding to one purchase ahead) 44

Coke vs. Pepsi Given that a person is currently a Coke drinker, what is

Coke vs. Pepsi Given that a person is currently a Coke drinker, what is the probability that she will purchase Pepsi three purchases from now? 45

Coke vs. Pepsi Assume each person makes one cola purchase per week. Suppose 60%

Coke vs. Pepsi Assume each person makes one cola purchase per week. Suppose 60% of all people now drink Coke, and 40% drink Pepsi. What fraction of people will be drinking Coke three weeks from now? Let (Q 0, Q 1)=(0. 6, 0. 4) be the initial probabilities. We will regard Coke as 0 and Pepsi as 1 We want to find P(X 3=0) 46 P 00

“Good” Markov chains For certain Markov Chains, the distributions Xi , as i ∞:

“Good” Markov chains For certain Markov Chains, the distributions Xi , as i ∞: (1) converge to a unique distribution, independent of the initial distribution. (2) In that unique distribution, each state has a positive probability. Call these Markov Chain “good”. We describe these “good” Markov Chains by considering Graph representation of Stochastic matrices. 47

Representation as a Digraph 0. 95 A 0. 95 B 0 C 0. 05

Representation as a Digraph 0. 95 A 0. 95 B 0 C 0. 05 D 0 0. 2 0. 5 0 0. 3 0 0. 2 0 0. 8 0 0 1 0 A A 0. 2 C 0. 3 0. 8 C 0. 5 B 0. 2 0. 05 B D 1 D Each directed edge A B is associated with the positive transition probability from A to B. We now define properties of this graph which guarantee: 1. Convergence to unique distribution: 2. In that distribution, each state has positive probability. 48

Examples of “Bad” Markov Chains Markov chains are not “good” if either : 1.

Examples of “Bad” Markov Chains Markov chains are not “good” if either : 1. They do not converge to a unique distribution. 2. They do converge to u. d. , but some states in this distribution have zero probability. 49

Bad case 1: Mutual Unreachabaility A B C D Consider two initial distributions: a)

Bad case 1: Mutual Unreachabaility A B C D Consider two initial distributions: a) p(X 1=A)=1 (p(X 1 = x)=0 if x≠A). b) p(X 1= C) = 1 In case a), the sequence will stay at A forever. In case b), it will stay in {C, D} for ever. Fact 1: If G has two states which are unreachable from each other, then {Xi} cannot converge to a distribution which is independent on the initial distribution. 50

Bad case 2: Transient States A C B Def: A state s is recurrent

Bad case 2: Transient States A C B Def: A state s is recurrent if it can be reached from any state reachable from s; otherwise it is transient. D A and B are transient states, C and D are recurrent states. Once the process moves from B to D, it will never come back. 51

Bad case 2: Transient States A C Fact 2: For each initial distribution, with

Bad case 2: Transient States A C Fact 2: For each initial distribution, with probability 1 a transient state will be visited only a finite number of times. B D X 52

Bad case 3: Periodic States A C B A state s has a period

Bad case 3: Periodic States A C B A state s has a period k if k is the GCD of the lengths of all the cycles that pass via s. E D A Markov Chain is periodic if all the states in it have a period k >1. It is aperiodic otherwise. Example: Consider the initial distribution p(B)=1. Then states {B, C} are visited (with positive probability) only in odd steps, and states {A, D, E} are visited in only even steps. 53

Bad case 3: Periodic States A C B E D Fact 3: In a

Bad case 3: Periodic States A C B E D Fact 3: In a periodic Markov Chain (of period k >1) there are initial distributions under which the states are visited in a periodic manner. Under such initial distributions Xi does not converge as i ∞. 54

Ergodic Markov Chains 0. 95 A 0. 2 B 0. 2 0. 05 0.

Ergodic Markov Chains 0. 95 A 0. 2 B 0. 2 0. 05 0. 3 A Markov chain is ergodic if : 1. All states are recurrent (ie, the graph is strongly connected) 2. It is not periodic 0. 8 C D 1 The Fundamental Theorem of Finite Markov Chains: If a Markov Chain is ergodic, then 1. It has a unique stationary distribution vector V > 0, which is an Eigenvector of the transition matrix. 2. The distributions Xi , as i ∞, converges to V. 55

Use of Markov Chains in Genome search: Modeling Cp. G Islands In human genomes

Use of Markov Chains in Genome search: Modeling Cp. G Islands In human genomes the pair CG often transforms to (methyl-C) G which often transforms to TG. Hence the pair CG appears less than expected from what is expected from the independent frequencies of C and G alone. Due to biological reasons, this process is sometimes suppressed in short stretches of genomes such as in the start regions of many genes. These areas are called Cp. G islands (p denotes “pair”). 56

Example: Cp. G Island (Cont. ) We consider two questions (and some variants): Question

Example: Cp. G Island (Cont. ) We consider two questions (and some variants): Question 1: Given a short stretch of genomic data, does it come from a Cp. G island ? Question 2: Given a long piece of genomic data, does it contain Cp. G islands in it, where, what length ? We “solve” the first question by modeling strings with and without Cp. G islands as Markov Chains over the same states {A, C, G, T} but different transition probabilities: 57

Example: Cp. G Island (Cont. ) The “+” model: Use transition matrix A+ =

Example: Cp. G Island (Cont. ) The “+” model: Use transition matrix A+ = (a+st), Where: a+st = (the probability that t follows s in a Cp. G island) The “-” model: Use transition matrix A- = (a-st), Where: a-st = (the probability that t follows s in a non Cp. G island) 58

Example: Cp. G Island (Cont. ) With this model, to solve Question 1 we

Example: Cp. G Island (Cont. ) With this model, to solve Question 1 we need to decide whether a given short sequence of letters is more likely to come from the “+” model or from the “–” model. This is done by using the definitions of Markov Chain. [to solve Question 2 we need to decide which parts of a given long sequence of letters is more likely to come from the “+” model, and which parts are more likely to come from the “–” model. This is done by using the Hidden Markov Model, to be defined later. ] We start with Question 1: 59

Question 1: Using two Markov chains A+ (For Cp. G islands): We need to

Question 1: Using two Markov chains A+ (For Cp. G islands): We need to specify p+(xi | xi-1) where + stands for Cp. G Island. From Durbin et al we have: Xi Xi-1 A C G T A C G 0. 18 0. 27 0. 43 0. 12 0. 17 p+(C | C) 0. 274 p+(T|C) 0. 16 p+(C|G) p+(G|G) p+(T|G) T 0. 08 p+(C |T) p+(G|T) p+(T|T) (Recall: rows must add up to one; columns need not. ) 60

Question 1: Using two Markov chains A- (For non-Cp. G islands): …and for p-(xi

Question 1: Using two Markov chains A- (For non-Cp. G islands): …and for p-(xi | xi-1) (where “-” stands for Non Cp. G island) we have: Xi Xi-1 A C G T A C G 0. 3 0. 29 0. 21 0. 32 p-(C|C) 0. 078 p-(T|C) 0. 25 p-(C|G) p-(G|G) p-(T|G) T 0. 18 p-(C|T) p-(G|T) p-(T|T) 61

Discriminating between the two models X 1 X 2 XL-1 Given a string x=(x

Discriminating between the two models X 1 X 2 XL-1 Given a string x=(x 1…. x. L), now compute the ratio If RATIO>1, Cp. G island is more likely. Actually – the log of this ratio is computed: Note: p+(x 1|x 0) is defined for convenience as p+(x 1). p-(x 1|x 0) is defined for convenience as p-(x 1). 62 XL

Log Odds-Ratio test Taking logarithm yields If log. Q > 0, then + is

Log Odds-Ratio test Taking logarithm yields If log. Q > 0, then + is more likely (Cp. G island). If log. Q < 0, then - is more likely (non-Cp. G island). 63

Where do the parameters (transitionprobabilities) come from ? Learning from complete data, namely, when

Where do the parameters (transitionprobabilities) come from ? Learning from complete data, namely, when the label is given and every xi is measured: Source: A collection of sequences from Cp. G islands, and a collection of sequences from non-Cp. G islands. Input: Tuples of the form (x 1, …, x. L, h), where h is + or - Output: Maximum Likelihood parameters (MLE) Count all pairs (Xi=a, Xi-1=b) with label +, and with label -, say the numbers are Nba, + and Nba, -. 64

Maximum Likelihood Estimate (MLE) of the parameters (using labeled data) X 1 X 2

Maximum Likelihood Estimate (MLE) of the parameters (using labeled data) X 1 X 2 XL-1 XL The needed parameters are: P+(x 1), p+ (xi | xi-1), p-(xi | xi-1) The ML estimates are given by: Where Na, + is the number of times letter a appear in Cp. G islands in the dataset. Where Nba, + is the number of times letter b appears after letter a in Cp. G islands in the dataset. 65