Bayes Net Perspectives on Causation and Causal Inference
Peter Spirtes
Example Problems
• Genetic regulatory networks
• Yeast – ~5000 genes, ~2,500,000 potential edges
[Figure: A gene regulatory network in mouse embryonic stem cells, http://www.pnas.org/content/104/42/16438/F3.expansion.html]
Causal Models → Predictions
• Probabilistic – Among the cells that have active Oct4, what percentage have active Rcor2?
• Causal – If I experimentally set a cell to have active Oct4, what percentage will have active Rcor2?
Causal Models → Predictions
• Counterfactual – Among the cells that did not have active Oct4 at t-1, what percentage would have active Rcor2 if I had experimentally set a cell to have active Oct4 at t-1?
Data → Causal Models
• Large number of variables
• Small observed sample size
• Overlapping variables
• Small number of experiments
• Feedback
• Hidden common causes
• Selection bias
• Many kinds of entities causally interacting
Outline
• Bayesian Networks
• Search
• Limitations and Extensions of Bayesian Networks
• Dynamic
• Relational
• Cycles
• Counterfactual
• Bayesian Networks • Limitations and Extensions • Dynamic • Cycles • Search • Relational • Counterfactual
Directed Acyclic Graph (DAG)
SES – Socioeconomic Status
PE – Parental Encouragement
CP – College Plans
IQ – Intelligence Quotient
SEX – Sex
[Figure: DAG G over SES, SEX, IQ, PE, CP]
• The vertices are random variables.
• All edges are directed.
• There are no directed cycles.
Population
[Figure: the DAG over SES, SEX, IQ, PE, CP repeated for each unit in the population]
• Independent, identically distributed
Factoring According to G
[Figure: DAG G over SES, SEX, IQ, PE, CP]
• P(SES, SEX, PE, IQ, CP) = P(SEX) P(SES) P(IQ | SES) P(PE | SES, SEX, IQ) P(CP | PE, SES, IQ)
• If P(V) = ∏_{X ∈ V} P(X | Parents(X, G)), then P factors according to G.
• G represents all of the distributions that factor according to G.
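The factorization can be sketched numerically. The block below is only an illustration: it assumes every variable is binary, and the conditional probabilities in `p_one` are made-up numbers, not values from the slides. It builds the joint as the product of P(X | Parents(X, G)) and sums it over all assignments.

```python
from itertools import product

# Parents of each variable in G, read off the factorization on this slide.
PARENTS = {
    "SEX": (),
    "SES": (),
    "IQ": ("SES",),
    "PE": ("SES", "SEX", "IQ"),
    "CP": ("PE", "SES", "IQ"),
}
VARS = list(PARENTS)


def p_one(var, parent_values):
    """Hypothetical P(var = 1 | parents); the numbers are invented for illustration."""
    base = {"SEX": 0.5, "SES": 0.4, "IQ": 0.3, "PE": 0.2, "CP": 0.1}[var]
    return base + 0.15 * sum(parent_values)


def joint(assignment):
    """P(assignment) computed as the product of P(X | Parents(X, G))."""
    prob = 1.0
    for var, parents in PARENTS.items():
        p1 = p_one(var, [assignment[p] for p in parents])
        prob *= p1 if assignment[var] == 1 else 1.0 - p1
    return prob


# Sum of the joint over all 2^5 assignments of the binary variables.
total = sum(
    joint(dict(zip(VARS, values)))
    for values in product((0, 1), repeat=len(VARS))
)
```

Any other choice of conditional probabilities in `p_one` gives another distribution that factors according to the same G.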
Conditional Independence
• X is independent of Y conditional on Z (denoted I_P(X, Y | Z)) iff P(X | Y, Z) = P(X | Z).
• I_P(CP, SEX | {SES, IQ, PE}) iff P(CP | {SES, IQ, PE, SEX}) = P(CP | {SES, IQ, PE})
Graphical Entailment
[Figure: DAG G over SES, SEX, IQ, PE, CP]
• If I_P(X, Y | Z) holds for every P that factors according to G, then G entails I(X, Y | Z).
• Examples: G entails
– I(IQ, SEX | ∅)
– I(IQ, SEX | SES)
• Can read entailments off of the graph through d-separation
D-separation and D-connection
[Figure: DAG G over SES, SEX, IQ, PE, CP]
• X is d-separated from Y conditional on Z in G iff G entails that X is independent of Y conditional on Z.
• D-separation between X and Y conditional on Z holds when certain kinds of paths do not exist between X and Y.
• D-connection (the negation of d-separation) between X and Y conditional on Z holds when certain kinds of paths do exist between X and Y.
Definition of D-connection
[Figure: DAG G over SES, SEX, IQ, PE, CP]
• A node X is active on a path U conditional on Z iff
– X is a collider (→ X ←) on U and either X is in Z or there is a directed path from X to a member of Z; or
– X is not a collider on U and X is not in Z.
• SES → IQ → PE ← SEX is a path U.
• PE is active on U conditional on {CP, IQ}: it is a collider with a directed path to CP.
• IQ is inactive on U conditional on {CP, IQ}: it is a non-collider that is in Z.
Definition of D-connection
[Figure: DAG G over SES, SEX, IQ, PE, CP]
• A path U is active conditional on Z iff every vertex on U is active relative to Z.
• X is d-connected to Y conditional on Z iff there is an active path between X and Y conditional on Z.
• SES → IQ → PE ← SEX is inactive conditional on {CP, IQ}.
• SES is d-connected to SEX conditional on {CP, IQ} because SES → PE ← SEX is active conditional on {CP, IQ}.
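The definition of d-connection can be checked mechanically on the example graph. The sketch below is a brute-force illustration, not an efficient algorithm: it enumerates every simple path and tests each interior vertex against the two cases of the definition. The edge list is an assumption read off the factorization used in these slides, and the function names are hypothetical.

```python
# Edges of G, assumed from the factorization
# P(SEX) P(SES) P(IQ|SES) P(PE|SES,SEX,IQ) P(CP|PE,SES,IQ).
EDGES = {
    ("SES", "IQ"), ("SES", "PE"), ("SEX", "PE"), ("IQ", "PE"),
    ("PE", "CP"), ("SES", "CP"), ("IQ", "CP"),
}
NODES = {n for edge in EDGES for n in edge}


def descendants(x):
    """All vertices reachable from x by a directed path of length >= 1."""
    seen, stack = set(), [x]
    while stack:
        current = stack.pop()
        for a, b in EDGES:
            if a == current and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def simple_paths(x, y, path=None):
    """Yield every path from x to y that ignores edge direction and repeats no vertex."""
    path = path or [x]
    if path[-1] == y:
        yield path
        return
    for n in NODES:
        adjacent = (path[-1], n) in EDGES or (n, path[-1]) in EDGES
        if adjacent and n not in path:
            yield from simple_paths(x, y, path + [n])


def path_active(path, z):
    """True iff every interior vertex of the path is active conditional on z."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, node) in EDGES and (nxt, node) in EDGES
        if collider:
            if node not in z and not (descendants(node) & z):
                return False
        elif node in z:
            return False
    return True


def d_connected(x, y, z):
    """True iff some path between x and y is active conditional on z."""
    z = set(z)
    return any(path_active(p, z) for p in simple_paths(x, y))
```

For instance, `d_connected("SES", "SEX", {"CP", "IQ"})` reproduces the example on this slide, and `path_active(["SES", "IQ", "PE", "SEX"], {"CP", "IQ"})` tests the single path U.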
If I is Not Entailed by G
[Figure: DAG G over SES, SEX, IQ, PE, CP]
• If conditional independence relation I is not entailed by G, then I may hold in some (but not every) distribution P that factors according to G.
• Example: There are P and P' that factor according to G such that ~I_P(SES, CP | ∅) and I_P'(SES, CP | ∅). P' is said to be unfaithful to G.
Manipulations
• An ideal manipulation assigns a density to a set X of properties (random variables) as a function of the values of a set Z of properties (random variables).
– Directly affects only the variables in X
– Successful
• Example – randomized experiment
Manipulations and Causal Graph
[Figure: DAG G over SES, SEX, IQ, PE, CP]
• There is an edge SES → CP in G because there are two ways of manipulating {SES, SEX, IQ, PE} that differ only in the value they assign to SES and that yield different probabilities of CP.
• Stable Unit Treatment Value Assumption (SUTVA)
Causal Sufficiency
[Figure: DAG G over SES, SEX, IQ, PE, CP]
• A set S of variables is causally sufficient if there are no variables not in S that are direct causes of more than one variable in S.
• S = {SES, IQ} is causally sufficient.
• S = {SES, PE, CP} is not causally sufficient.
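The two examples can be reproduced with a small check. This is a simplified sketch: it reads "direct cause" as "parent in G", which is enough for this graph but ignores common causes that act only through other omitted variables. The parent sets are assumed from the factorization used in these slides, and the function names are hypothetical.

```python
# Parents of each variable in G (assumed from the factorization in these slides).
PARENTS = {
    "SEX": (),
    "SES": (),
    "IQ": ("SES",),
    "PE": ("SES", "SEX", "IQ"),
    "CP": ("PE", "SES", "IQ"),
}


def common_causes_outside(s):
    """Map each variable outside s to the members of s it is a parent of,
    keeping only variables that are parents of two or more members of s."""
    s = set(s)
    found = {}
    for v in PARENTS:
        if v in s:
            continue
        children_in_s = {c for c in s if v in PARENTS[c]}
        if len(children_in_s) >= 2:
            found[v] = children_in_s
    return found


def causally_sufficient(s):
    """True iff no omitted variable is a parent of more than one member of s."""
    return not common_causes_outside(s)
```

For S = {SES, PE, CP} the check reports IQ, which is a parent of both PE and CP; SEX is omitted too but is a parent of PE only.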
Causal Markov Assumption
[Figure: DAG G over SES, SEX, IQ, PE, CP]
• In a population Pop with distribution P and causal graph G, if V is causally sufficient, then P(V) factors according to G.
• P(SES, SEX, PE, IQ, CP) = P(SEX) P(SES) P(IQ | SES) P(PE | SES, SEX, IQ) P(CP | PE, SES, IQ)
Representation of Manipulation
[Figure: DAG G over SES, SEX, IQ, PE, CP]
P(SES, SEX, PE=1, IQ, CP || PE=1)
= P(SEX) P(SES) P(IQ | SES) * 1 * P(CP | PE=1, SES, IQ)
= P(SES, SEX, PE=1, IQ, CP) / P(PE=1 | SES, SEX, IQ)
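The manipulated distribution can be sketched by dropping the factor for the manipulated variable. The block below is an illustration under stated assumptions: all variables are binary and the conditional probabilities in `cond` are invented numbers. It computes P(· || PE=1) and compares the probability of CP=1 under manipulation with the probability of CP=1 conditional on observing PE=1.

```python
from itertools import product

# Parents of each variable in G (assumed from the factorization in these slides).
PARENTS = {
    "SEX": (),
    "SES": (),
    "IQ": ("SES",),
    "PE": ("SES", "SEX", "IQ"),
    "CP": ("PE", "SES", "IQ"),
}
VARS = list(PARENTS)


def cond(var, a):
    """Hypothetical P(var = a[var] | parents of var as set in a); numbers are invented."""
    base = {"SEX": 0.5, "SES": 0.4, "IQ": 0.3, "PE": 0.2, "CP": 0.1}[var]
    p1 = base + 0.15 * sum(a[p] for p in PARENTS[var])
    return p1 if a[var] == 1 else 1.0 - p1


def joint(a, skip=()):
    """Product of the conditional factors, leaving out the variables in skip."""
    prob = 1.0
    for var in VARS:
        if var not in skip:
            prob *= cond(var, a)
    return prob


def manipulated(a, var, value):
    """P(a || var = value): the factor for var is replaced by 1, or by 0 if a disagrees."""
    return joint(a, skip=(var,)) if a[var] == value else 0.0


ASSIGNMENTS = [dict(zip(VARS, v)) for v in product((0, 1), repeat=len(VARS))]

# P(CP=1 || PE=1): probability of CP=1 after setting PE to 1.
p_cp_set = sum(manipulated(a, "PE", 1) for a in ASSIGNMENTS if a["CP"] == 1)

# P(CP=1 | PE=1): probability of CP=1 among units observed to have PE=1.
p_pe = sum(joint(a) for a in ASSIGNMENTS if a["PE"] == 1)
p_cp_seen = sum(joint(a) for a in ASSIGNMENTS if a["PE"] == 1 and a["CP"] == 1) / p_pe
```

With these invented numbers the two quantities differ, because SES and IQ influence both PE and CP; conditioning on PE=1 changes the distribution of SES and IQ, while manipulating PE does not.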
FCI Algorithm
• Looks for the set of DAGs (possibly with latent variables and selection bias) that entail all and only the conditional independence relations that hold in the data according to statistical tests.
Markov Equivalence
• Two DAGs G1 and G2 are Markov equivalent when they contain the same variables, and for all disjoint X, Y, Z, X is entailed to be independent of Y conditional on Z in G1 if and only if X is entailed to be independent of Y conditional on Z in G2.
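Markov equivalence can be tested without enumerating independence relations. The sketch below swaps in a graphical criterion that does not appear on the slide: two DAGs over the same variables are Markov equivalent iff they have the same adjacencies and the same unshielded colliders. The edge list for G is assumed from the factorization used in these slides; `G_REVERSED_IQ` and `G_REVERSED_SEX` are hypothetical variants made up for the example, and the code assumes every variable appears in at least one edge.

```python
# Edges of G (assumed from the factorization in these slides).
G = {
    ("SES", "IQ"), ("SES", "PE"), ("SEX", "PE"), ("IQ", "PE"),
    ("PE", "CP"), ("SES", "CP"), ("IQ", "CP"),
}
# Hypothetical variants: reverse SES -> IQ, or reverse SEX -> PE.
G_REVERSED_IQ = (G - {("SES", "IQ")}) | {("IQ", "SES")}
G_REVERSED_SEX = (G - {("SEX", "PE")}) | {("PE", "SEX")}


def skeleton(edges):
    """Adjacencies of the graph, ignoring edge direction."""
    return {frozenset(edge) for edge in edges}


def unshielded_colliders(edges):
    """Triples (a, c, b) with a -> c <- b where a and b are not adjacent."""
    adjacent = skeleton(edges)
    found = set()
    for a, c in edges:
        for b, c2 in edges:
            if c2 == c and a < b and frozenset((a, b)) not in adjacent:
                found.add((a, c, b))
    return found


def markov_equivalent(edges1, edges2):
    """Same adjacencies and same unshielded colliders."""
    return (
        skeleton(edges1) == skeleton(edges2)
        and unshielded_colliders(edges1) == unshielded_colliders(edges2)
    )
```

Reversing SES → IQ keeps both unshielded colliders at PE, so the result stays in the same class; reversing SEX → PE removes them, so it does not.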
Markov Equivalence Class
[Figure: DAG G' over SES, SEX, PE, CP, IQ]
Causal Faithfulness Assumption
[Figure: DAG G over SES, SEX, IQ, PE, CP]
• In a population Pop with causal graph G and distribution P(V), if V is causally sufficient, then I_P(X, Y | Z) only if G entails I(X, Y | Z).
• ~I_P(SES, CP | ∅) because I(SES, CP | ∅) is not entailed by G.
• + …
Causal Faithfulness Assumption
[Figure: DAG G over SES, SEX, IQ, PE, CP]
• Causal Faithfulness is too strong because
– consistency can be proved with assumptions about fewer conditional independencies
– it is unlikely to hold, especially when there are many variables.
• Causal Faithfulness is too weak because it is not sufficient to prove uniform consistency (putting error bounds at finite sample sizes).
Good Features of FCI Algorithm
• Is pointwise consistent: as sample size → ∞, P(error in output pattern) → 0.
• Can be applied to distributions where tests of conditional independence are known
• Can be applied to hidden variable models (and selection bias models)
Bad Features of FCI Algorithm
• There is no reliable way to set error bounds on the pattern without making stronger assumptions.
• Can only get a set of Markov equivalent DAGs, not a single DAG
• Doesn't allow for comparing how much better one model is than another
• Need to assume some version of the Causal Faithfulness Assumption
Non-Independence Constraints
• Depending on the parametric family, a DAG can entail constraints that are not conditional independence constraints.
• Assuming linearity and non-Gaussian error terms, if a distribution is compatible with X → Y it is not compatible with X ← Y, even though they are Markov equivalent.
Score-Based Search Strategy
• Assign a score to graph and sample based on
– maximum likelihood of data given graph
– simplicity of model
• Do search over graph space for highest score
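One common score of this kind is BIC, which adds a complexity penalty to the maximized log-likelihood. The slide does not name a particular score, so BIC is a swapped-in choice here. The sketch assumes binary variables and a tiny invented dataset in which Y always equals X, and the "search" is just an exhaustive comparison of three candidate graphs.

```python
import math
from collections import Counter


def bic(data, parents):
    """BIC of a DAG on binary data: maximized log-likelihood minus
    (number of free parameters / 2) * log(sample size)."""
    n = len(data)
    score = 0.0
    for var, pa in parents.items():
        # Counts of (parent configuration, value) and of parent configuration alone.
        joint_counts = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        parent_counts = Counter(tuple(row[p] for p in pa) for row in data)
        loglik = sum(
            count * math.log(count / parent_counts[config])
            for (config, _), count in joint_counts.items()
        )
        n_params = 2 ** len(pa)  # one free parameter per parent configuration
        score += loglik - 0.5 * n_params * math.log(n)
    return score


# Invented sample: Y is an exact copy of X.
DATA = [{"X": 0, "Y": 0}] * 4 + [{"X": 1, "Y": 1}] * 4

# Candidate graphs, each given as a map from variable to its parents.
CANDIDATES = {
    "empty": {"X": (), "Y": ()},
    "X->Y": {"X": (), "Y": ("X",)},
    "Y->X": {"Y": (), "X": ("Y",)},
}
SCORES = {name: bic(DATA, graph) for name, graph in CANDIDATES.items()}
```

On this sample both graphs with an edge score higher than the empty graph, and X → Y and Y → X receive the same score, since a likelihood-based score of this kind cannot separate Markov equivalent DAGs.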
Advantages of Score-Based Search Strategy
• Get more information about graph
– Additive noise models, unique DAG
• Doesn't rely on binary decisions
• Local mistakes don't propagate
Disadvantages of Score-Based Search Strategy
• Scores are often slower to calculate, or it is not known how to calculate them exactly, if the model includes
– unmeasured variables
– selection bias
– unusual distributions
• Search over graph space is often heuristic
Dynamic Bayes Nets
• If the same variable is measured at different times, then the samples of that variable are not i.i.d.
• Solution: index each variable by time (time series)
Dynamic Bayes Nets
• Make a template for the causal structure that can be filled in with actual times
[Figure: template over X_{t-2}, X_{t-1}, X_t and Y_{t-2}, Y_{t-1}, Y_t]
• Continuous time or differential equations?
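Filling in a template with actual times can be sketched as follows. The edges in `TEMPLATE` are hypothetical, since the edges of the slide's figure are not recoverable from the text; each entry says that a variable at time t - lag is a direct cause of a variable at time t.

```python
# Hypothetical template: (cause, lag, effect) means cause at t - lag -> effect at t.
TEMPLATE = [("X", 1, "X"), ("Y", 1, "Y"), ("X", 1, "Y")]


def unroll(template, times):
    """Instantiate the template at each time, skipping causes before the first time."""
    edges = []
    for t in times:
        for cause, lag, effect in template:
            if t - lag >= times[0]:
                edges.append(((cause, t - lag), (effect, t)))
    return edges


# Time-indexed graph over X_0, X_1, X_2 and Y_0, Y_1, Y_2.
UNROLLED = unroll(TEMPLATE, range(3))
```

The same template yields a graph of any length, so the structure is specified once and repeated over the time index.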
Population
[Figure: graphs over SES, SEX, IQ, PE, CP for several units, linked by parent-of relations]
Population
[Figure: graph over SES, SEX, PE, CP, IQ with a parent-of relation]
• Not i.i.d. distribution
• Violations of SUTVA
• Causal relations between relations (e.g. sibling causes rivalry)
Extended Manipulation Specification
• A manipulation assigns a density to
– a set of properties or relations
– at a set of times (measurable set of times T)
– for a set of units
• as a function of the values of
– a set of properties or relations
– at a set of times (measurable set of times T)
– for a set of units
Extended Factorization Assumption
[Figure: Alice&Jim linked by parent-of to Sue and to Bob, with the graph over SES, SEX, PE, CP, IQ]
P(Alice&Jim.SES, Sue.SEX, Sue.PE, Sue.IQ, Sue.CP, Alice&Jim.SES, Bob.SEX, Bob.PE, Bob.IQ, Bob.CP) =
Extended Factorization Assumption
P(Sue.SEX) P(Alice&Jim.SES) P(Sue.IQ | Alice&Jim.SES) P(Sue.PE | Alice&Jim.SES, Sue.SEX, Sue.IQ) P(Sue.CP | Sue.PE, Alice&Jim.SES, Sue.IQ)
P(Bob.SEX) P(Alice&Jim.SES) P(Bob.IQ | Alice&Jim.SES) P(Bob.PE | Alice&Jim.SES, Bob.SEX, Bob.IQ) P(Bob.CP | Bob.PE, Alice&Jim.SES, Bob.IQ)
3 Interpretations of Cycles: PE ⇆ CP
[Figure: graph over SES, SEX, IQ, PE, CP with PE ⇆ CP]
• Equilibrium values of PE and CP cause each other.
• Averages of the values of PE and CP while reaching equilibrium influence each other.
• Mixture of PE → CP and CP → PE
- Slides: 39