CH. 14: Graphical Models

Examples of graphical models: Bayesian networks, belief networks, probabilistic networks. A graphical model is composed of nodes and arcs between the nodes. Each node corresponds to a random variable X with a probability P(X).

A directed arc from node X to node Y (X: parent, Y: child) is specified by the conditional probability P(Y | X).

14.1 Introduction

Graphical models help us infer over a large number of variables by decomposing the inference into a set of local calculations, each involving a small number of variables.

Given the graphical model of the random variables X1, ..., Xd, their joint probability is calculated by

P(X1, ..., Xd) = ∏i P(Xi | parents(Xi))

Example 1: [figure]
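As a minimal sketch of this factorization (not from the slides; the chain A → B → C and all CPT values below are hypothetical), the joint of a Bayesian network is the product of each node's conditional probability given its parents:

```python
# Joint probability of a Bayesian network as the product of each node's
# conditional probability given its parents. Network and CPT values are
# hypothetical, for illustration only: a small chain A -> B -> C.
P_A = {True: 0.3, False: 0.7}
P_B_given_A = {True: {True: 0.8, False: 0.2},   # P(B=b | A=True)
               False: {True: 0.1, False: 0.9}}  # P(B=b | A=False)
P_C_given_B = {True: {True: 0.5, False: 0.5},
               False: {True: 0.4, False: 0.6}}

def joint(a, b, c):
    """P(A=a, B=b, C=c) = P(a) * P(b|a) * P(c|b)."""
    return P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c]

# Sanity check: the joint sums to 1 over all eight assignments.
total = sum(joint(a, b, c) for a in (True, False)
            for b in (True, False) for c in (True, False))
print(total)  # 1.0 (up to floating-point rounding)
```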

Example 2: Causal and diagnostic inferences. Given: rain R is the parent of wet grass W. The joint probability: P(R, W) = P(R) P(W | R).

Causal inference: P(W | R) explains that the cause of W is R. Diagnostic inference: from Bayes' rule, P(R | W) = P(W | R) P(R) / P(W).
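A minimal sketch of both directions of inference on this two-node model; the numeric values of P(R) and P(W | R) below are assumed for illustration, since the slides give them only in a figure:

```python
# Causal vs. diagnostic inference on the two-node model R -> W.
# The numbers are assumed for illustration.
P_R = 0.4                               # prior probability of rain
P_W_given_R = {True: 0.9, False: 0.2}   # P(W=wet | R)

# Causal (predictive) inference: P(W) by marginalizing over R.
P_W = P_W_given_R[True] * P_R + P_W_given_R[False] * (1 - P_R)

# Diagnostic inference: P(R | W) by Bayes' rule.
P_R_given_W = P_W_given_R[True] * P_R / P_W
print(P_W, P_R_given_W)  # 0.48, 0.75: seeing wet grass raises belief in rain
```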

X and Y are independent if P(X, Y) = P(X) P(Y). X and Y are conditionally independent given Z if P(X, Y | Z) = P(X | Z) P(Y | Z), or equivalently P(X | Y, Z) = P(X | Z). A graphical model is formed by adding nodes (for the random variables) and arcs (for the dependencies between the variables).

14.2 Three Canonical Cases for Conditional Independence

Case 1: Head-to-Tail. X is the parent of Y, and Y is the parent of Z. Joint probability: P(X, Y, Z) = P(X) P(Y|X) P(Z|Y). X and Z are conditionally independent given Y.
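A quick numerical check of this case, with arbitrary illustrative CPTs: once Y is given, X tells us nothing more about Z.

```python
# Case 1 (head-to-tail, X -> Y -> Z): P(Z | X, Y) = P(Z | Y).
# CPT values are arbitrary illustrative numbers.
P_X = {0: 0.6, 1: 0.4}
P_Y_X = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # P(Y=y | X=x)
P_Z_Y = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}}  # P(Z=z | Y=y)

def joint(x, y, z):
    return P_X[x] * P_Y_X[x][y] * P_Z_Y[y][z]

# P(Z=1 | X=1, Y=1) versus P(Z=1 | Y=1): both equal P_Z_Y[1][1] = 0.9.
p_z_given_xy = joint(1, 1, 1) / sum(joint(1, 1, z) for z in (0, 1))
p_z_given_y = (sum(joint(x, 1, 1) for x in (0, 1)) /
               sum(joint(x, 1, z) for x in (0, 1) for z in (0, 1)))
print(p_z_given_xy, p_z_given_y)  # both 0.9
```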

Example: [figure]

Case 2: Tail-to-Tail. X is the parent of two nodes Y and Z. Joint probability: P(X, Y, Z) = P(X) P(Y|X) P(Z|X). Given X, Y and Z become independent.

Example: [figures computing P(R|S) and P(R|~S)]

That is, knowing that the sprinkler is on decreases the probability that it rained; if the sprinkler is off, the probability of rain increases.
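This tail-to-tail behavior can be checked numerically. The sketch below assumes a common cause C (cloudy) with hypothetical CPT values; observing the sprinkler changes our belief about clouds, and hence about rain:

```python
from itertools import product

# Tail-to-tail: C (cloudy) is the parent of S (sprinkler) and R (rain).
# All CPT values are assumed for illustration.
P_C = 0.5
P_S_C = {True: 0.1, False: 0.5}   # P(S=on | C)
P_R_C = {True: 0.8, False: 0.1}   # P(R=yes | C)

def joint(c, s, r):
    pc = P_C if c else 1 - P_C
    ps = P_S_C[c] if s else 1 - P_S_C[c]
    pr = P_R_C[c] if r else 1 - P_R_C[c]
    return pc * ps * pr

def prob(pred):
    """Probability of the event described by pred(c, s, r)."""
    return sum(joint(c, s, r) for c, s, r in product((True, False), repeat=3)
               if pred(c, s, r))

p_r = prob(lambda c, s, r: r)                                       # P(R) = 0.45
p_r_given_s = prob(lambda c, s, r: r and s) / prob(lambda c, s, r: s)
p_r_given_not_s = prob(lambda c, s, r: r and not s) / prob(lambda c, s, r: not s)
print(p_r, p_r_given_s, p_r_given_not_s)  # 0.45, ~0.22 (lower), 0.55 (higher)
```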

Case 3: Head-to-Head. There are two parents X and Y of a single node Z. X and Y are independent; they become dependent when Z is known. Joint probability: P(X, Y, Z) = P(X) P(Y) P(Z | X, Y).

Example: Not knowing anything else, the probability that the grass is wet, P(W), is obtained by marginalizing over both parents R and S.

Causal (predictive) inferences:

i) If the sprinkler is on, what is the probability that the grass is wet?
P(W|S) = P(W|R, S) P(R|S) + P(W|~R, S) P(~R|S)
       = P(W|R, S) P(R) + P(W|~R, S) P(~R)
       = 0.95 x 0.4 + 0.9 x 0.6 = 0.92

ii) If it has rained, what is the probability that the grass is wet?
P(W|R) = P(W|R, S) P(S|R) + P(W|R, ~S) P(~S|R)
       = P(W|R, S) P(S) + P(W|R, ~S) P(~S)
       = 0.95 x 0.2 + 0.9 x 0.8 = 0.91
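A minimal sketch verifying this arithmetic from the quantities given on these slides (P(R) = 0.4, P(S) = 0.2, and the four entries of the table P(W | R, S)):

```python
# Causal (predictive) inference from the CPTs on these slides:
# P(R)=0.4, P(S)=0.2, and the table P(W | R, S).
P_R, P_S = 0.4, 0.2
P_W = {(True, True): 0.95, (True, False): 0.9,    # keys are (R, S)
       (False, True): 0.9, (False, False): 0.1}

# i) P(W|S): marginalize the other parent R (R and S are independent a priori).
p_w_given_s = P_W[(True, True)] * P_R + P_W[(False, True)] * (1 - P_R)
# ii) P(W|R): marginalize S.
p_w_given_r = P_W[(True, True)] * P_S + P_W[(True, False)] * (1 - P_S)
# With no evidence at all: P(W), marginalizing both parents.
p_w = sum(P_W[(r, s)] * (P_R if r else 1 - P_R) * (P_S if s else 1 - P_S)
          for r in (True, False) for s in (True, False))
print(p_w_given_s, p_w_given_r, p_w)   # 0.92, 0.91, 0.52 (up to rounding)
```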

Diagnostic inference:

i) If the grass is wet, what is the probability that the sprinkler is on? P(S|W) = P(W|S) P(S) / P(W). Knowing that the grass is wet increases the probability that the sprinkler is on.

ii) If the grass is wet and we also know that it has rained, what is the probability that the sprinkler is on? P(S|R, W) = P(W|R, S) P(S|R) / P(W|R) = P(W|R, S) P(S) / P(W|R).

Knowing that it has rained decreases the probability that the sprinkler is on (explaining away).

iii) P(W|~R) = P(W|~R, S) P(S|~R) + P(W|~R, ~S) P(~S|~R)
            = P(W|~R, S) P(S) + P(W|~R, ~S) P(~S)
            = 0.9 x 0.2 + 0.1 x 0.8 = 0.26

Knowing that it hasn't rained increases the probability that the sprinkler is on.
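The diagnostic quantities follow from Bayes' rule with the same CPTs; a self-contained sketch:

```python
# Diagnostic inference by Bayes' rule, with the same CPTs as above.
P_R, P_S = 0.4, 0.2
P_W = {(True, True): 0.95, (True, False): 0.9,    # keys are (R, S)
       (False, True): 0.9, (False, False): 0.1}

def p_w_given(r):
    """P(W | R=r), marginalizing the sprinkler."""
    return P_W[(r, True)] * P_S + P_W[(r, False)] * (1 - P_S)

p_w = P_R * p_w_given(True) + (1 - P_R) * p_w_given(False)              # 0.52
p_w_given_s = P_W[(True, True)] * P_R + P_W[(False, True)] * (1 - P_R)  # 0.92

p_s_given_w = p_w_given_s * P_S / p_w                            # ~0.35
p_s_given_rw = P_W[(True, True)] * P_S / p_w_given(True)         # ~0.21, explained away
p_s_given_not_r_w = P_W[(False, True)] * P_S / p_w_given(False)  # ~0.69
print(round(p_s_given_w, 2), round(p_s_given_rw, 2), round(p_s_given_not_r_w, 2))
```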

Construct larger graphs by combining subgraphs. With cloudy C added as a parent of both S and R, the joint probability becomes P(C, S, R, W) = P(C) P(S|C) P(R|C) P(W|S, R).

Causal inference: calculate the probability, P(W|C), of having wet grass if it is cloudy.
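A sketch of this calculation; P(C), P(S|C) and P(R|C) below are assumed values (the slides give them in a figure), while P(W|S, R) reuses the table above:

```python
from itertools import product

# Four-node graph: C -> S, C -> R, S -> W, R -> W.
# P(C), P(S|C), P(R|C) are assumed; P(W | R, S) is from the earlier slides.
P_C = 0.5
P_S_C = {True: 0.1, False: 0.5}
P_R_C = {True: 0.8, False: 0.1}
P_W_RS = {(True, True): 0.95, (True, False): 0.9,   # keys are (R, S)
          (False, True): 0.9, (False, False): 0.1}

def joint(c, s, r, w):
    pc = P_C if c else 1 - P_C
    ps = P_S_C[c] if s else 1 - P_S_C[c]
    pr = P_R_C[c] if r else 1 - P_R_C[c]
    pw = P_W_RS[(r, s)] if w else 1 - P_W_RS[(r, s)]
    return pc * ps * pr * pw

# Causal inference P(W|C): condition on C, marginalize S and R.
num = sum(joint(True, s, r, True) for s, r in product((True, False), repeat=2))
print(num / P_C)   # P(W|C) = 0.76 with these numbers
```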


Diagnostic inference: calculate the probability of the cause given the observed effect, e.g., P(C|W).

14.3 Generative Models

Generative models represent the processes that create the data.

• Model for classification: first pick a class C from P(C); then, with C fixed, pick x from p(x|C). Bayes' rule inverts the arc:

P(C|x) = P(C) p(x|C) / p(x)

• Naive Bayes' classifier: ignores the dependencies among the xi's:

p(x|C) = p(x1|C) p(x2|C) ... p(xd|C)
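A minimal sketch of the resulting decision rule with binary features; the two classes, priors, and per-feature likelihoods are all hypothetical:

```python
import math

# Naive Bayes' classifier: p(x|C) = p(x1|C) * ... * p(xd|C).
# Classes, priors, and per-feature likelihoods are illustrative only.
priors = {"spam": 0.3, "ham": 0.7}
likelihood = {  # P(feature_i = 1 | class)
    "spam": [0.8, 0.6, 0.1],
    "ham":  [0.2, 0.3, 0.4],
}

def log_posterior(c, x):
    """log P(C=c) + sum_i log P(x_i | C=c), up to the shared log p(x)."""
    lp = math.log(priors[c])
    for p, xi in zip(likelihood[c], x):
        lp += math.log(p if xi else 1 - p)
    return lp

x = [1, 0, 1]
print(max(priors, key=lambda c: log_posterior(c, x)))  # predicted class
```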

• Linear regression as a generative model: [figure]

14.4 d-Separation

d-Separation is a generalization of blocking and separation. Let A, B and C be arbitrary subsets of nodes; we check whether A and B are d-separated (independent) given C. A path from a node in A to a node in B is blocked if
i) the edges on the path meet head-to-tail or tail-to-tail at a node that is in C, or
ii) the edges meet head-to-head at a node, and neither that node nor any of its descendants is in C.
If all paths are blocked, A and B are d-separated given C.

Example:
(1) The path BCDF is blocked given C: the edges on the path meet head-to-tail or tail-to-tail (C is a tail-to-tail node).
(2) The path BEFG is blocked by F: the edges meet head-to-head (F is a head-to-head node).
(3) The path BEFD is blocked (F is a head-to-head node).
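The blocking rules can be implemented directly. The sketch below (an illustration on a small hypothetical DAG, not the figure's graph) enumerates undirected paths and applies rules (i) and (ii):

```python
# d-separation check following the path-blocking rules above, for small DAGs
# given as {node: list_of_parents}. Example graph: A -> B, A -> C, B -> D, C -> D.
parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}

def descendants(x):
    kids = {n for n, ps in parents.items() if x in ps}
    out = set(kids)
    for k in kids:
        out |= descendants(k)
    return out

def undirected_paths(a, b):
    """All simple paths between a and b, ignoring edge direction."""
    nbrs = {n: set(parents[n]) for n in parents}
    for n, ps in parents.items():
        for p in ps:
            nbrs[p].add(n)
    paths, stack = [], [[a]]
    while stack:
        path = stack.pop()
        if path[-1] == b:
            paths.append(path)
            continue
        for n in nbrs[path[-1]] - set(path):
            stack.append(path + [n])
    return paths

def blocked(path, given):
    """A path is blocked if some intermediate node blocks it."""
    for i in range(1, len(path) - 1):
        prev, node, nxt = path[i - 1], path[i], path[i + 1]
        if prev in parents[node] and nxt in parents[node]:  # head-to-head
            if node not in given and not (descendants(node) & given):
                return True   # rule (ii): node and all descendants unobserved
        elif node in given:
            return True       # rule (i): head-to-tail or tail-to-tail, observed
    return False

def d_separated(a, b, given):
    return all(blocked(p, given) for p in undirected_paths(a, b))

print(d_separated("B", "C", {"A"}))       # True: tail-to-tail at A is blocked
print(d_separated("B", "C", {"A", "D"}))  # False: observing D opens B -> D <- C
```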

14.5 Belief Propagation

Belief propagation answers P(X | E), where X is any query node and E is any subset of evidence nodes whose values have been given.

14.5.1 Chain

Each node X calculates two values: λ(X) ≡ P(E- | X), collected from its child, and π(X) ≡ P(X | E+), collected from its parent, where E- and E+ denote the evidence below and above X in the chain.

P(X | E) = α λ(X) π(X), where α is a normalizing constant ensuring that the posterior sums to 1.
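A minimal sketch of this computation on a three-node chain A → B → C with binary variables; the CPT values and the observed evidence are illustrative:

```python
# Belief propagation on the chain A -> B -> C, assuming binary variables,
# with A and C observed and B the query node. Values are illustrative.
P_A = [0.6, 0.4]                    # P(A)
P_B_A = [[0.7, 0.3], [0.2, 0.8]]    # P(B=b | A=a), indexed [a][b]
P_C_B = [[0.9, 0.1], [0.4, 0.6]]    # P(C=c | B=b), indexed [b][c]

a_obs, c_obs = 1, 0                 # evidence: A=1, C=0

# pi(B): evidence from the parent side, P(B | A=a_obs).
pi = P_B_A[a_obs]
# lambda(B): evidence from the child side, P(C=c_obs | B).
lam = [P_C_B[b][c_obs] for b in (0, 1)]

# P(B | E) = alpha * lambda(B) * pi(B).
unnorm = [pi[b] * lam[b] for b in (0, 1)]
alpha = 1 / sum(unnorm)
posterior = [alpha * u for u in unnorm]
print(posterior)   # [P(B=0|E), P(B=1|E)] = [0.36, 0.64]
```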


14.5.2 Trees

A query node X receives a different λ message from each of its children and sends a different π message to each of its children. The evidence nodes split into two parts relative to X: E-, those in the subtree rooted at X, and E+, those elsewhere. Therefore

P(X | E) = α λ(X) π(X)

The evidence in the subtree rooted at X is accumulated in λ(X), which combines the λ messages arriving from the children of X; in the general case, λ(X) is next propagated up to the parent of X.

The evidence passed on to X from elsewhere in the tree is accumulated in π(X); in the general case, π(X) is next propagated down to the children of X.

14.5.3 Polytrees

In a polytree, a node may have multiple parents. If node X is removed, the graph splits into two independent parts, containing E+ and E- respectively.


14.5.4 Junction Trees

When there is a loop in the underlying undirected graph, there is more than one path along which to propagate evidence, and removing the query node X does not split the graph into two independent parts E+ and E-. Idea: convert the graph into a polytree by introducing clique nodes.

14.6 Markov Random Fields (MRF)

MRFs are undirected graphical models. In an undirected graph, A and B are independent given C if removing C makes them unconnected. A clique is a set of nodes such that there exists a link between any two nodes in the set; a maximal clique is a clique that cannot be enlarged by adding another node. The correspondences with directed models are: clique ↔ parent and child; potential function ↔ conditional probability.

Let ψC(XC) be the potential function of clique C, where XC is the set of variables in clique C. Potential functions express local constraints, e.g., preferred local configurations. Example: in a color image, a pairwise potential function between neighboring pixels can be defined that takes a high value if their colors are similar. The joint distribution is defined in terms of the clique potentials:

p(X) = (1/Z) ∏C ψC(XC), where Z = ΣX ∏C ψC(XC) is the normalizing constant (partition function).
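A minimal sketch, assuming a three-pixel chain with two labels and an illustrative "smoothness" potential; it builds the joint from the pairwise clique potentials and the partition function Z:

```python
import itertools

# Joint distribution from clique potentials: p(x) = (1/Z) * prod_C psi_C(x_C).
# Three-pixel chain with cliques (x1, x2) and (x2, x3); values illustrative.
def psi(a, b):
    """Pairwise potential: high value when neighboring pixels agree."""
    return 2.0 if a == b else 0.5

states = [0, 1]   # e.g., two color labels
configs = list(itertools.product(states, repeat=3))

score = {x: psi(x[0], x[1]) * psi(x[1], x[2]) for x in configs}
Z = sum(score.values())                 # partition function
joint = {x: s / Z for x, s in score.items()}
print(max(joint, key=joint.get))        # most likely labeling: all labels equal
```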

Factor Graphs (FG)

An FG, G = (X, F, E), is a bipartite graph representing the factorization of a function

g(X) = ∏s fs(Xs),

where X is the set of variable nodes, F the set of factor nodes, and E the edges connecting each factor fs to the variables Xs it depends on. Example: [figure]
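For instance, a sketch of evaluating a factored function g(x1, x2, x3) = fa(x1, x2) fb(x2, x3), with arbitrary illustrative factor tables:

```python
# A factor graph with variable nodes {x1, x2, x3} and factor nodes {f_a, f_b};
# edges connect each factor to the variables it depends on. Tables illustrative.
f_a = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.2, (1, 1): 2.0}
f_b = {(0, 0): 0.3, (0, 1): 1.5, (1, 0): 1.0, (1, 1): 0.7}

def g(x1, x2, x3):
    """g factors as the product of the factor nodes."""
    return f_a[(x1, x2)] * f_b[(x2, x3)]

print(g(1, 1, 0))  # 2.0 * 1.0 = 2.0
```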

A graphical model can be converted to a factor graph: define new factor nodes and write the joint in terms of them. Example: [figure]

14.7 Learning the Structure of a Graphical Model

Learning a graphical model has two parts:
1) Learning the parameters of the graph, e.g., by ML or MAP estimation.
2) Learning the structure of the graph: perform a state-space search over a score function that combines goodness of fit to the data with some measure of complexity, e.g., as in learning the structure of an artificial neural network.
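One concrete score of this kind (my choice for illustration; the slide does not name one) is BIC, which trades log-likelihood against a penalty on the number of free parameters:

```python
import math

# BIC-style structure score: goodness of fit (log-likelihood) minus a
# complexity penalty in the number of free parameters k and sample size N.
# The candidate-structure numbers below are illustrative only.
def bic_score(log_likelihood, k, n):
    return log_likelihood - 0.5 * k * math.log(n)

print(bic_score(-1200.0, k=10, n=500))  # simpler graph, worse fit
print(bic_score(-1150.0, k=40, n=500))  # better fit, more parameters
```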

14.8 Influence Diagrams (ID)

An ID contains chance nodes and decision nodes, plus a utility node. Chance node ↔ random variable; decision node ↔ action; utility node ↔ calculation of utility. Example: [figure]

14. 8 Influence Diagrams (ID) An ID contains chance and decision nodes and a utility node. Chance node < --- > variable Decision node < --- > action Utility node < --- > calculation of utility Example: 41