Causality and inferring causal models Chitta Baral and Xin Zhang

Key Thoughts (I)
• Correlation does not mean causality.
• Causality is often inferred using temporal data. However, temporal data alone is not enough. Example: a barometer falls before it rains, but it does not cause the rain.
• The statistical and philosophical literature warns: "Unless one knows in advance all causally relevant factors or unless one carefully manipulates some variables, no genuine causal inferences are possible".

Key Thoughts (II)
• Neither of the conditions is realizable in most environments.
• Moreover, we have much more steady-state data (e.g. from microarrays) than time-series or temporal data.
• Can we infer some causal information from steady-state data?
• Answer: To some extent.

Example
Suppose we have 3 variables A, B, and C, and the data show that:
– A and B are dependent.
– B and C are dependent.
– A and C are independent.
Think of variables A, B and C that satisfy the above. Most likely your interpretation would satisfy the causal relations A → B ← C, with B as the common effect of A and C.
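The pattern in this example can be reproduced with a small simulation. The following is a minimal sketch, assuming a linear Gaussian model in which B is the sum of A, C and some noise. The coefficients, sample size, seed and the helper `pearson` are illustrative choices, not part of the slides. In this particular model, a correlation near zero is a reasonable stand-in for independence; that does not hold in general.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)          # fixed seed so the run is repeatable
n = 20000                       # assumed sample size

# A and C are generated independently; B is their common effect (A -> B <- C).
a = [rng.gauss(0, 1) for _ in range(n)]
c = [rng.gauss(0, 1) for _ in range(n)]
b = [x + y + rng.gauss(0, 0.5) for x, y in zip(a, c)]

r_ab = pearson(a, b)            # clearly nonzero: A and B are dependent
r_bc = pearson(b, c)            # clearly nonzero: B and C are dependent
r_ac = pearson(a, c)            # close to zero: A and C are independent
print(f"corr(A,B)={r_ab:.3f}  corr(B,C)={r_bc:.3f}  corr(A,C)={r_ac:.3f}")
```

A chain A → B → C or a fork A ← B → C would instead leave A and C dependent, which is why the observed pattern points to the collider.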

Conditional independence and d-separation
• X and Y are said to be conditionally independent given Z if P(x | y, z) = P(x | z) whenever P(y, z) > 0.
• d-separation: A path p is said to be d-separated by a set of nodes Z if
– p contains a chain i → m → j or a fork i ← m → j such that m is in Z, or
– p contains a collider i → m ← j such that neither m nor any of its descendants is in Z.
• Z is said to d-separate X and Y if every path between a node in X and a node in Y is d-separated by Z.
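The d-separation test can be sketched in a few lines of code. Rather than enumerating paths as in the definition above, this sketch uses the equivalent moralized ancestral graph criterion: keep only X, Y, Z and their ancestors, connect parents that share a child, drop the arrow directions, delete Z, and check whether X can still reach Y. The graph encoding (a dict mapping each node to a list of its parents) and the function names are assumptions made for illustration.

```python
from itertools import combinations

def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen

def d_separated(parents, xs, ys, zs):
    """True if the node set zs d-separates xs from ys in the DAG `parents`."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(parents, xs | ys | zs)
    # Build the moral graph on the ancestral set (undirected adjacency).
    adj = {v: set() for v in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:                       # parent-child edges
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(ps, 2):   # "marry" parents of a common child
            adj[p].add(q)
            adj[q].add(p)
    # Search from xs without ever entering a node of zs.
    seen, stack = set(), [x for x in xs if x not in zs]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        stack.extend(n for n in adj[v] if n not in zs and n not in seen)
    return not (seen & ys)

collider = {"B": ["A", "C"], "D": ["B"]}   # A -> B <- C, B -> D
chain = {"B": ["A"], "C": ["B"]}           # A -> B -> C

print(d_separated(collider, {"A"}, {"C"}, set()))  # True: collider blocks the path
print(d_separated(collider, {"A"}, {"C"}, {"B"}))  # False: conditioning on the collider opens it
print(d_separated(collider, {"A"}, {"C"}, {"D"}))  # False: so does a descendant of the collider
print(d_separated(chain, {"A"}, {"C"}, {"B"}))     # True: the middle of a chain blocks it
```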

Observationally equivalent
• Two directed acyclic graphs (DAGs) are observationally equivalent if they have the same set of independencies.
Alternative definition:
• Two DAGs are observationally equivalent if they have the same skeleton and the same set of v-structures.
– V-structures are structures of the form a → x ← b such that there is no arrow between a and b.
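The alternative definition translates directly into a check on graph structure. The sketch below assumes both DAGs are given over the same node set as dicts mapping each node to a list of its parents; that encoding and the function names are illustrative assumptions.

```python
from itertools import combinations

def skeleton(parents):
    """Undirected edge set of the DAG (arrow directions dropped)."""
    return {frozenset((p, c)) for c, ps in parents.items() for p in ps}

def v_structures(parents):
    """All triples (a, x, b) with a -> x <- b and no edge between a and b."""
    skel = skeleton(parents)
    found = set()
    for x, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, x, b))
    return found

def observationally_equivalent(g1, g2):
    """Same skeleton and same v-structures (node sets assumed identical)."""
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)

chain = {"B": ["A"], "C": ["B"]}      # A -> B -> C
fork = {"A": ["B"], "C": ["B"]}       # A <- B -> C
collider = {"B": ["A", "C"]}          # A -> B <- C

print(observationally_equivalent(chain, fork))      # True
print(observationally_equivalent(chain, collider))  # False
```

The chain and the fork share a skeleton and have no v-structures, so steady-state data alone cannot tell them apart; the collider has the same skeleton but a different v-structure set.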

Observationally equivalent networks
• Two networks that are observationally equivalent cannot be distinguished without resorting to manipulative experimentation or temporal information.

Causal model
• Causal structure: a directed acyclic graph (DAG).
• Causal model: a causal structure with parameters (a function for each variable that has parents, and a probability distribution for each variable without parents).

Preference
• Ordering between DAGs: G1 is preferable to G2 if every distribution that can be simulated using G1 (with some choice of parameters) can also be simulated using G2 (with some choice of parameters).
• In the absence of hidden variables, tests of preference and (observational) equivalence can be reduced to tests of induced dependencies, which can be determined directly from the topology of the DAG without considering the parameters.

Stability/faithfulness
• Stability/faithfulness: A DAG and a distribution are faithful to each other if they exhibit the same set of independencies. A distribution is said to be faithful if it is faithful to some DAG.
• With the added assumption of stability, every distribution has a unique minimal causal structure (up to d-separation equivalence), as long as there are no hidden variables.

IC algorithm and faithfulness
• Given a faithful distribution, the IC and IC* algorithms can find the set of DAGs that are faithful to this distribution, in the absence and in the presence of hidden variables, respectively.

IC Algorithm: Step 1
• For each pair of variables a and b in V, search for a set Sab such that (a ╨ b | Sab) holds in P. In other words, a and b should be independent in P, conditioned on Sab.
• Construct an undirected graph G such that vertices a and b are connected with an edge if and only if no set Sab can be found.
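Step 1 can be sketched as an exhaustive search over candidate separating sets. This is a minimal sketch, assuming an independence oracle `independent(a, b, s)` that answers whether a and b are independent given the set s. In practice that role would be played by a statistical test on data, which is outside this sketch. The names `ic_step1`, `sepset` and `collider_oracle` are hypothetical, and the brute-force search is exponential in the number of variables.

```python
from itertools import combinations

def ic_step1(variables, independent):
    """Return (edges, sepset): the undirected skeleton and the separating sets found."""
    variables = sorted(variables)
    edges, sepset = set(), {}
    for a, b in combinations(variables, 2):
        others = [v for v in variables if v not in (a, b)]
        found = None
        # Try every subset of the remaining variables, smallest first.
        for k in range(len(others) + 1):
            for s in combinations(others, k):
                if independent(a, b, set(s)):
                    found = set(s)
                    break
            if found is not None:
                break
        if found is None:
            edges.add(frozenset((a, b)))       # no Sab exists: connect a and b
        else:
            sepset[frozenset((a, b))] = found  # remember Sab for the next step
    return edges, sepset

def collider_oracle(a, b, s):
    """Hand-coded oracle for A -> B <- C: A and C are independent unless B is given."""
    return {a, b} == {"A", "C"} and "B" not in s

edges, sepset = ic_step1(["A", "B", "C"], collider_oracle)
print(sorted(sorted(e) for e in edges))   # [['A', 'B'], ['B', 'C']]
print(sepset[frozenset(("A", "C"))])      # set()
```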

IC Algorithm: Step 2
• For each pair of nonadjacent variables a and b with a common neighbor c, check if c ∈ Sab.
• If it is, then continue.
• Else add arrowheads at c, i.e. a → c ← b.
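Step 2 can be sketched as a pass over all triples a, c, b in the skeleton. The inputs are assumed to be an undirected edge set (frozensets of two nodes) and a dict of separating sets for the nonadjacent pairs, as Step 1 would produce. The function name `ic_step2` and this encoding are illustrative assumptions.

```python
from itertools import combinations

def ic_step2(edges, sepset):
    """Return the set of directed arrows (tail, head) implied by v-structures."""
    neighbors = {}
    for e in edges:
        a, b = tuple(e)
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    arrows = set()
    for c, ns in neighbors.items():
        for a, b in combinations(sorted(ns), 2):
            pair = frozenset((a, b))
            if pair in edges:
                continue                  # a and b are adjacent: not a candidate
            if c not in sepset[pair]:     # c was not needed to separate a and b
                arrows.add((a, c))        # orient a -> c
                arrows.add((b, c))        # orient b -> c
    return arrows

skel = {frozenset(("A", "B")), frozenset(("B", "C"))}   # skeleton A - B - C

# Collider case: A and C were separated by the empty set, so B gets arrowheads.
print(sorted(ic_step2(skel, {frozenset(("A", "C")): set()})))   # [('A', 'B'), ('C', 'B')]

# Chain or fork case: B was needed to separate A and C, so nothing is oriented.
print(sorted(ic_step2(skel, {frozenset(("A", "C")): {"B"}})))   # []
```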