1. Introduction
You’ve discovered that $X$ and $Y$ are non-spuriously correlated and are thus sure that either $X$ causes $Y$ or that $Y$ causes $X$. But you aren’t sure which. How should you go about figuring this out?
It is prima facie attractive to maintain that we can infer the direction of causation between $X$ and $Y$ from the temporal order of $X$ and $Y$ (such that we infer that $X \to Y$ if $X$ temporally precedes $Y$ and infer that $Y \to X$ if $Y$ temporally precedes $X$). But there are at least two reasons that we may not endorse this as a fully general strategy. First, it relies on the controversial claim that causes always temporally precede their effects.Footnote 1 Second, no matter what we say about the conceptual relationship between causation and time, there are contexts in which this strategy is not helpful because the variables under consideration aren’t presented with a temporal ordering.
What to do? A number of authors working within the graphical approach to causal modeling maintain that we can tackle this problem by checking what happens when we identify some exogenous source of variation for one of the variables under consideration.Footnote 2 According to this line of reasoning, if an exogenous cause of $X$ has an effect on $Y$, then we can infer that $X \to Y$, whereas if an exogenous cause of $X$ has no effect on $Y$, then we can infer that $Y \to X$.
In what follows, we raise a problem for this approach to inferring the direction of causation. Specifically, we point out that there are cases where an exogenous cause of $X$ (${E_x}$) has no probabilistic influence on $Y$ no matter which way the arrow of causation points—namely, cases where ${E_x} \to X \to Y$ and ${E_x} \to X \leftarrow Y$ are probabilistically indistinguishable. We then assess the philosophical significance of the problem and survey multiple ways that we might try to persist in maintaining that the arrow of causation is somehow grounded in probabilistic (in)dependencies.
2. Unidentifiable colliders and intransitive chains
The justification for inferring causal direction from exogenous variation often goes by way of an axiom that characterizes causal Bayes nets—namely, the Causal Markov Condition.Footnote 3
Causal Markov Condition (CMC): Given a causal graph $G$ over variable set V and probability distribution $\mathbb{P}$ over V, $G$ and $\mathbb{P}$ satisfy the Causal Markov Condition if and only if any variable $X$ in V is probabilistically independent of its non-descendants given its parents.Footnote 4
The CMC implies that ${E_x} \to X \to Y$ is compatible with the unconditional probabilistic dependence of ${E_x}$ and $Y$, but that ${E_x} \to X \leftarrow Y$ is incompatible with the unconditional probabilistic dependence of ${E_x}$ and $Y$. Thus we can conclude that ${E_x} \to X \to Y$ when we are confronted with our inference problem and observe a probabilistic dependence between the exogenous cause ${E_x}$ and $Y$. But while the CMC successfully licenses this inference, it does not by itself license us to infer that $Y \to X$ when an exogenous cause of $X$ has no probabilistic effect on $Y$. This is because the CMC is in the business of saying what causal dependencies are implied by what probabilistic dependencies (or, contrapositively, what probabilistic independencies are implied by what causal independencies), rather than what probabilistic dependencies are implied by what causal dependencies. As far as the CMC is concerned, then, the unconditional probabilistic independence of ${E_x}$ and $Y$ is compatible with both ${E_x} \to X \to Y$ and ${E_x} \to X \leftarrow Y$.
In order to rule out that $X \to Y$ on the grounds that some exogenous cause of $X$ has no probabilistic effect on $Y$, we must additionally help ourselves to some condition that licenses us to infer that ${E_x}$ and $Y$ are unconditionally probabilistically dependent (or correlated)Footnote 5 when ${E_x} \to X \to Y$. The much discussed Causal Faithfulness Condition (CFC) is up to the task since it says that a causal graph $G$ over V is compatible with a probability distribution $\mathbb{P}$ over V if and only if there are no conditional independencies in $\mathbb{P}$ that are not entailed by $G$ using the CMC. But while the CFC is often deployed as a simplifying assumption in causal inference,Footnote 6 it is universally acknowledged to fall prey to counterexamples. Indeed, there are some probability distributions that cannot be paired with any causal graph to satisfy both the CMC and the CFC.Footnote 7 Thus it would seem that we can’t rely on the CFC in our justification of any fully general account of causal direction in causal Bayes nets.
Of particular interest here is a class of distributions defined over three variables {${V_1},{V_2},{V_3}$} in which the only independencies that obtain are ${V_1} \bot \hskip-0.5pc\bot {V_3}$ and ${V_1} \bot \hskip-0.5pc\bot {V_3}|{V_2}$.Footnote 8 Though there is no causal graph that can be paired with such a distribution to satisfy both the CMC and the CFC,Footnote 9 it is easy to imagine multiple causal scenarios that give rise to such distributions—even when we stipulate that ${V_1}$ is exogenous in order to mirror the assumptions of our inference problem in which ${E_x}$ is known to be exogenous. To make things concrete, let us first identify a specific probability distribution according to which these independencies obtain, depicted in table 1.
Note that this distribution satisfies the following properties:
• $\neg \left( {{V_1} \bot \hskip-0.5pc\bot {V_2}} \right)$, since $P({V_2} = 1|{V_1} = 0) = \frac{10}{33} \ne \frac{20}{33} = P({V_2} = 1|{V_1} = 1)$.

• $\neg \left( {{V_2} \bot \hskip-0.5pc\bot {V_3}} \right)$, since $P({V_3} = 1|{V_2} = 2) = \frac{9}{10} \ne P\left( {{V_3} = 1} \right) = \frac{56}{66}$.

• ${V_1} \bot \hskip-0.5pc\bot {V_3}$, since $P\left( {{V_3} = 1} \right) = \frac{28}{33} = P({V_3} = 1|{V_1} = 0)$.

• ${V_1} \bot \hskip-0.5pc\bot {V_3}|{V_2}$, since

$P({V_3} = 1|{V_1} = 0,{V_2} = 0) = \frac{1}{3} = P({V_3} = 1|{V_1} = 1,{V_2} = 0),$

$P({V_3} = 1|{V_1} = 0,{V_2} = 1) = \frac{9}{10} = P({V_3} = 1|{V_1} = 1,{V_2} = 1),$

$P({V_3} = 1|{V_1} = 0,{V_2} = 2) = \frac{9}{10} = P({V_3} = 1|{V_1} = 1,{V_2} = 2).$
Table 1. Example probability distribution
| ${V_1}$ | ${V_2}$ | ${V_3}$ | $P$ |
|---|---|---|---|
| $0$ | $0$ | $0$ | $\frac{2}{66}$ |
| $0$ | $0$ | $1$ | $\frac{1}{66}$ |
| $0$ | $1$ | $0$ | $\frac{1}{66}$ |
| $0$ | $1$ | $1$ | $\frac{9}{66}$ |
| $0$ | $2$ | $0$ | $\frac{2}{66}$ |
| $0$ | $2$ | $1$ | $\frac{18}{66}$ |
| $1$ | $0$ | $0$ | $\frac{2}{66}$ |
| $1$ | $0$ | $1$ | $\frac{1}{66}$ |
| $1$ | $1$ | $0$ | $\frac{2}{66}$ |
| $1$ | $1$ | $1$ | $\frac{18}{66}$ |
| $1$ | $2$ | $0$ | $\frac{1}{66}$ |
| $1$ | $2$ | $1$ | $\frac{9}{66}$ |
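The claimed (in)dependencies can be verified mechanically from table 1. The following sketch checks them in exact rational arithmetic; the encoding and the helper names (`marginal`, `indep`) are ours, introduced purely for illustration:

```python
from fractions import Fraction
from itertools import product

# Joint distribution from table 1, keyed by (v1, v2, v3); weights are in 66ths.
P = {(0, 0, 0): 2, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 9,
     (0, 2, 0): 2, (0, 2, 1): 18, (1, 0, 0): 2, (1, 0, 1): 1,
     (1, 1, 0): 2, (1, 1, 1): 18, (1, 2, 0): 1, (1, 2, 1): 9}
P = {k: Fraction(w, 66) for k, w in P.items()}

IDX = {'v1': 0, 'v2': 1, 'v3': 2}
VALS = {'v1': (0, 1), 'v2': (0, 1, 2), 'v3': (0, 1)}

def marginal(**fixed):
    """Probability that the named variables take the given values."""
    return sum(p for k, p in P.items()
               if all(k[IDX[v]] == x for v, x in fixed.items()))

def indep(a, b, given=None):
    """True iff a and b are independent (conditional on every value of `given`)."""
    conds = [{}] if given is None else [{given: g} for g in VALS[given]]
    for cond in conds:
        for x, y in product(VALS[a], VALS[b]):
            # P(a,b,c) * P(c) == P(a,c) * P(b,c) is equivalent to
            # P(a,b|c) == P(a|c) * P(b|c) whenever P(c) > 0.
            if marginal(**{a: x, b: y}, **cond) * marginal(**cond) != \
               marginal(**{a: x}, **cond) * marginal(**{b: y}, **cond):
                return False
    return True

assert sum(P.values()) == 1
assert not indep('v1', 'v2')          # V1 and V2 are correlated
assert not indep('v2', 'v3')          # V2 and V3 are correlated
assert indep('v1', 'v3')              # V1 is independent of V3
assert indep('v1', 'v3', given='v2')  # ... and remains so given any value of V2
```

Exact fractions matter here: the independence claims are exact equalities between rational numbers, which floating-point arithmetic could obscure.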
The interesting thing about this distribution is that it represents ${V_1}$ as being correlated with ${V_2}$, ${V_2}$ as being correlated with ${V_3}$, and ${V_1}$ as being independent of ${V_3}$ both unconditionally and conditional on any value of ${V_2}$. To see how this happens, note that while ${V_1}$ and ${V_3}$ are both binary variables, ${V_2}$ is ternary. Conditioning on ${V_1} = 0$ makes no difference to the probability that ${V_2} = 0$, but it does make a difference to the probabilities of ${V_2} = 1$ and ${V_2} = 2$. However, the only way in which ${V_2}$ is relevant to ${V_3}$ is that ${V_3}$’s probability changes depending on whether ${V_2} = 0$—i.e., if ${V_2} \ne 0$, it makes no difference to ${V_3}$ whether ${V_2} = 1$ or ${V_2} = 2$.
Because this distribution has these properties, it is possibly realized by a slight variant of McDermott’s (1995) famous “dog bite” counterexample to the transitivity of causation.Footnote 10 In this scenario, a terrorist is contemplating pressing a button that will probably detonate a bomb, which will probably blow up a football stadium, but there is a dog present who threatens to bite the terrorist’s right hand. As things turn out, the probability that the terrorist detonates the bomb is not affected by whether the dog bites, but the probability that the terrorist uses her right or left hand (in the event that she pushes the button) is affected. If we allow ${V_1}$ to represent whether the dog bites, ${V_2}$ to represent whether the button is pressed, and if so with which hand, and ${V_3}$ to represent whether the explosion occurs, then our example distribution provides a natural model of this situation.Footnote 11
 
${V_1}$ is correlated with ${V_2}$ because the dog biting makes a difference to which hand the terrorist uses (if she pushes). ${V_2}$ is correlated with ${V_3}$ because pushing (with either hand) obviously increases the probability of an explosion. And ${V_3}$ is probabilistically independent of ${V_1}$ both unconditionally and conditional on any value of ${V_2}$ because ${V_1}$ can only make a difference to which hand is used, which is irrelevant to whether there’s an explosion (which depends only on whether the button is pushed at all). Figure 1 represents the direct causal relationships that intuitively obtain in the example.

Figure 1. An intransitive chain.
Importantly, this probability distribution can likewise be realized by a somewhat similar scenario where the arrow between ${V_2}$ and ${V_3}$ goes in the opposite direction. Let ${V_1}$ again represent whether the dog bites, ${V_2}$ again represent whether the button is pressed, and if so with which hand, but allow ${V_3}$ to now represent whether the terrorist receives orders from her superior to push the button. This scenario likewise is intuitively consistent with the probability distribution under consideration since nothing seems to block us from treating the explosion from the first example as probabilistically indistinguishable from receiving orders in the second example—i.e., it could be that when the terrorist pushes the button, it is probably because she received orders to do so, but also that the probability that she received orders is not at all further affected by whether she pushes with her right hand or her left hand.Footnote 12 But in this case, the intuitive causal graph over the very same distribution is now as shown in Figure 2.Footnote 13

Figure 2. An unidentifiable collider.
At this stage, it’s useful to think through the way in which both of these candidate causal structures generate violations of the CFC. The structure in Figure 1 violates the CFC with respect to our distribution because it does not entail (via the CMC) that ${V_3} \bot \hskip-0.5pc\bot {V_1}$, even though ${V_3}$ is in fact unconditionally independent of ${V_1}$ in the distribution. The structure in Figure 2 violates the CFC with respect to our distribution because it does not entail (via the CMC) that ${V_1} \bot \hskip-0.5pc\bot {V_3}|{V_2}$, even though ${V_1}$ is in fact conditionally independent of ${V_3}$ given ${V_2}$ in the distribution. In the parlance of graphical causal models, this is a case where conditioning on a collider fails to induce a correlation between two unconditionally independent causes. The examples here illustrate that intransitive causal chains bear the same probabilistic signature as cases where conditioning on a collider fails to induce a correlation, and that both of these phenomena yield violations of the CFC.Footnote 14 Now, since both of the candidate causal structures are ruled out by the CFC, one might hope to appeal to a less demanding condition as we consider these cases. Enter the significantly weaker and arguably axiomatic Causal Minimality Condition (CMIN).
The CMIN says that a causal graph $G$ over V is compatible with a probability distribution $\mathbb{P}$ over V exactly when there exists no proper sub-graph of $G$ that satisfies the CMC with $\mathbb{P}$, where a graph qualifies as a proper sub-graph of $G$ if and only if (i) it excludes some arrow(s) in $G$, and (ii) nothing else is changed (such that all remaining arrows are oriented in the same direction as in $G$). Intuitively, the CMIN requires that a causal graph should include just enough arrows to guarantee satisfaction of the CMC, and should not include any other arrows that are not necessary for that purpose. It’s not hard to see that the structures in Figures 1 and 2 both satisfy the CMIN with respect to the given distribution. Since both ${V_1}$ and ${V_3}$ are pairwise unconditionally correlated with ${V_2}$, deleting either of the arrows in either structure will yield a violation of the CMC by entailing the unconditional independence of ${V_2}$ with one of the other two variables. Ergo, the CMIN manages to capture the sentiment that both of these causal structures are compatible with the given probability distribution, since both satisfy both the CMC and the CMIN when paired with the given distribution.Footnote 15
The upshot of this is that we can’t identify the direction of the arrow between ${V_2}$ and ${V_3}$ simply by consulting the probability distribution over ${V_1}$, ${V_2}$, and ${V_3}$ when ${V_1}$ is known to be exogenous. More generally, when the only conditional independencies that obtain are ${V_1} \bot \hskip-0.5pc\bot {V_3}$ and ${V_1} \bot \hskip-0.5pc\bot {V_3}|{V_2}$, the causal graphs that can be paired with the distribution to satisfy the CMC and the CMIN are ${V_1} \to {V_2} \to {V_3}$, ${V_1} \leftarrow {V_2} \to {V_3}$, ${V_1} \leftarrow {V_2} \leftarrow {V_3}$, and ${V_1} \to {V_2} \leftarrow {V_3}$.Footnote 16 Thus, while the CMC and the CMIN combine to tell us something about how many arrows there must be (since any graph with three arrows would fail to qualify as minimal and any graph with one or fewer arrows would fail to qualify as Markovian), they jointly tell us nothing about the direction of the arrows that there are.
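Both of these claims, that the structures in figures 1 and 2 each satisfy the CMC and the CMIN relative to the distribution in table 1, and that exactly four two-edge graphs survive the two conditions, can be checked by brute force. The sketch below is our own encoding, not part of the formal apparatus above; note that for these three-variable graphs a pairwise check of each variable against each non-descendant given its parents suffices for the CMC:

```python
from fractions import Fraction
from itertools import combinations, product

ORDER = ('v1', 'v2', 'v3')
VALS = {'v1': (0, 1), 'v2': (0, 1, 2), 'v3': (0, 1)}

# Joint distribution from table 1, keyed by (v1, v2, v3); weights are in 66ths.
P = {(0, 0, 0): 2, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 9,
     (0, 2, 0): 2, (0, 2, 1): 18, (1, 0, 0): 2, (1, 0, 1): 1,
     (1, 1, 0): 2, (1, 1, 1): 18, (1, 2, 0): 1, (1, 2, 1): 9}
P = {k: Fraction(w, 66) for k, w in P.items()}

def prob(assign):
    """Probability that the named variables take the given values."""
    return sum(p for k, p in P.items()
               if all(k[ORDER.index(v)] == x for v, x in assign.items()))

def ci(x, y, zs):
    """True iff x is independent of y given the variables in zs."""
    for zv in product(*(VALS[z] for z in zs)):
        cond = dict(zip(zs, zv))
        for a, b in product(VALS[x], VALS[y]):
            if prob({x: a, y: b, **cond}) * prob(cond) != \
               prob({x: a, **cond}) * prob({y: b, **cond}):
                return False
    return True

def descendants(node, edges):
    out, frontier = set(), {c for p, c in edges if p == node}
    while frontier:
        out |= frontier
        frontier = {c for p, c in edges if p in frontier} - out
    return out

def satisfies_cmc(edges):
    # Each variable independent of each non-descendant given its parents
    # (pairwise; sufficient for the three-variable graphs considered here).
    for x in ORDER:
        parents = sorted(p for p, c in edges if c == x)
        nondesc = set(ORDER) - {x} - descendants(x, edges) - set(parents)
        if not all(ci(x, y, parents) for y in nondesc):
            return False
    return True

def satisfies_cmin(edges):
    """No proper subgraph (same orientations, fewer arrows) satisfies the CMC."""
    return not any(satisfies_cmc(edges - {e}) for e in edges)

chain    = frozenset({('v1', 'v2'), ('v2', 'v3')})  # figure 1: V1 -> V2 -> V3
collider = frozenset({('v1', 'v2'), ('v3', 'v2')})  # figure 2: V1 -> V2 <- V3
assert satisfies_cmc(chain) and satisfies_cmin(chain)
assert satisfies_cmc(collider) and satisfies_cmin(collider)

# Enumerate every directed graph with exactly two edges over {v1, v2, v3}.
pairs = [('v1', 'v2'), ('v1', 'v3'), ('v2', 'v3')]
graphs = [frozenset({e1, e2})
          for pa, pb in combinations(pairs, 2)
          for e1 in (pa, pa[::-1]) for e2 in (pb, pb[::-1])]
winners = [g for g in graphs if satisfies_cmc(g) and satisfies_cmin(g)]
assert len(winners) == 4  # the four orientations of the V1 - V2 - V3 skeleton
```

All four survivors share the ${V_1}$–${V_2}$–${V_3}$ skeleton, and the arrow between ${V_2}$ and ${V_3}$ points both ways among them, which is precisely the underdetermination at issue.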
Of course, it’s old news that we cannot derive causal order from merely probabilistic information since, e.g., ${V_1} \to {V_2} \to {V_3}$, ${V_1} \leftarrow {V_2} \to {V_3}$, and ${V_1} \leftarrow {V_2} \leftarrow {V_3}$ form a “Markov equivalence class” (meaning that they entail the same independencies via the CMC) and are therefore indistinguishable even given the CFC. But the kind of underdetermination at play here is of an especially vexing variety since causal graphs that are not Markov equivalent end up being indistinguishable, even when they contain a fixed number of arrows.Footnote 17 In the present context, this is pressing because the assumption that the source of variation ${V_1}$ is exogenous is sufficient to rule out all but ${V_1} \to {V_2} \leftarrow {V_3}$ when ${V_1}$ is probabilistically independent of ${V_3}$, given the CFC. But when we relax the CFC so that we can model our examples (and instead assume the CMIN), both ${V_1} \to {V_2} \to {V_3}$ and ${V_1} \to {V_2} \leftarrow {V_3}$ are compatible with ${V_1}$’s exogeneity and probabilistic independence from ${V_3}$. Thus, when the setting is general enough to incorporate examples like these, we cannot always infer causal direction from exogenous variation.
3. A blocked escape route
Might you always be able to solve the problem posed by these cases by additionally looking at what happens when there is an exogenous source of variation of $Y$ (or ${V_3}$)? No. Here’s why. Recall the example where ${V_1}$, ${V_2}$, and ${V_3}$ refer to the dog bite / button pushing / explosion variables. Now fine-grain the values of ${V_3}$ further so that instead of simply distinguishing between the explode / don’t explode possibilities, ${V_3}$ now distinguishes between three possibilities, namely (i) stadium explosion, (ii) no stadium explosion and tomorrow’s football match happens as scheduled, and (iii) no stadium explosion and tomorrow’s football match is canceled. Now, let ${V_4}$ be an exogenous cause of ${V_3}$ that denotes whether there are severe thunderstorms tomorrow. Here, ${V_4}$ is pairwise correlated with ${V_3}$ and ${V_3}$ is pairwise correlated with ${V_2}$, but we can imagine that ${V_4}$ is both conditionally and unconditionally uncorrelated with ${V_2}$ since tomorrow’s weather isn’t predictive of how and whether the button gets pushed, no matter whether we condition on any of the values of ${V_3}$.Footnote 18
This means that if our problem is to infer the direction of causation between ${V_2}$ and ${V_3}$, then we are in the same position as before. For when we include an exogenous source of variation of ${V_3}$ (namely, ${V_4}$) in the causal graph, it turns out that the CMC and the CMIN sanction both ${V_2} \to {V_3} \to {V_4}$ and ${V_2} \to {V_3} \leftarrow {V_4}$ as compatible with the distribution at hand.
The upshot of this is that there are cases where looking at exogenous sources of variation of both $X$ and $Y$ is not sufficient for unveiling the direction of the causal relationship between $X$ and $Y$. To see this, it is helpful to consider all four of the variables ${E_X}$, $X$, $Y$, and ${E_Y}$ as a single variable set (rather than just considering the subsets $\left\{ {{E_X},X,Y} \right\}$ and $\left\{ {{E_Y},X,Y} \right\}$ in isolation). In our running example, this means looking at a distribution that is defined over ${V_1}$, ${V_2}$, ${V_3}$, and ${V_4}$ (where ${V_3}$ is ternary). In this distribution, (i) ${V_2}$ is unconditionally pairwise correlated with both ${V_1}$ and ${V_3}$, (ii) ${V_3}$ is unconditionally pairwise correlated with both ${V_2}$ and ${V_4}$, (iii) ${V_1}$ is independent of ${V_4}$ both unconditionally and conditional on both ${V_2}$ and ${V_3}$, (iv) ${V_1}$ is independent of ${V_3}$ both unconditionally and conditional on ${V_2}$, and (v) ${V_4}$ is independent of ${V_2}$ both unconditionally and conditional on ${V_3}$.Footnote 19 Now consider the possible causal structures over this four-variable set, as depicted in Figures 3 and 4.

Figure 3. Four variable structure 1.

Figure 4. Four variable structure 2.
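The d-separation facts behind the verdicts about these two graphs can be checked mechanically. The following Python sketch is a minimal illustration, not part of the original argument: it assumes (as the surrounding discussion indicates) that Figure 3 depicts ${V_1} \to {V_2} \to {V_3} \leftarrow {V_4}$ and Figure 4 depicts ${V_1} \to {V_2} \leftarrow {V_3} \leftarrow {V_4}$, and the graph encoding and function names are our own. It enumerates undirected paths and applies the standard blocking rules for chains, forks, and colliders.

```python
# Graphs encoded as parent -> set-of-children maps. Assumed readings:
# Figure 3: V1 -> V2 -> V3 <- V4;  Figure 4: V1 -> V2 <- V3 <- V4.
FIG3 = {"V1": {"V2"}, "V2": {"V3"}, "V4": {"V3"}}
FIG4 = {"V1": {"V2"}, "V3": {"V2"}, "V4": {"V3"}}

def descendants(g, node):
    """All nodes reachable from `node` along directed edges."""
    seen, stack = set(), [node]
    while stack:
        for child in g.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def undirected_paths(g, x, y):
    """All simple paths from x to y in the DAG's skeleton."""
    nbrs = {}
    for parent, children in g.items():
        for child in children:
            nbrs.setdefault(parent, set()).add(child)
            nbrs.setdefault(child, set()).add(parent)
    stack, paths = [[x]], []
    while stack:
        path = stack.pop()
        if path[-1] == y:
            paths.append(path)
            continue
        stack.extend(path + [n] for n in nbrs.get(path[-1], ()) if n not in path)
    return paths

def d_separated(g, x, y, z=()):
    """True iff x and y are d-separated by conditioning set z in DAG g."""
    z = set(z)
    for path in undirected_paths(g, x, y):
        blocked = False
        for i in range(1, len(path) - 1):
            a, b, c = path[i - 1], path[i], path[i + 1]
            is_collider = b in g.get(a, set()) and b in g.get(c, set())
            if is_collider:
                # a collider blocks unless it (or a descendant) is in z
                if b not in z and not descendants(g, b) & z:
                    blocked = True
                    break
            elif b in z:  # chains and forks block when conditioned on
                blocked = True
                break
        if not blocked:
            return False  # an active path d-connects x and y given z
    return True

# The graphs are Markov inequivalent: they disagree, e.g., on whether V2 and
# V4 are independent unconditionally and conditional on V3.
print(d_separated(FIG3, "V2", "V4"), d_separated(FIG4, "V2", "V4"))
print(d_separated(FIG3, "V2", "V4", {"V3"}), d_separated(FIG4, "V2", "V4", {"V3"}))
```

Every independence either graph entails among ${V_1}$–${V_4}$ is among those stipulated in (iii)–(v), so neither graph entails a false independence: both satisfy the CMC relative to the stipulated distribution, and it is only faithfulness-style conditions that fail.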
 Given the probability distribution we just described, both of these causal structures satisfy both the CMC and the CMIN since they don’t entail any false independencies via the CMC, but deleting any arrows would force them to do so. Now, in the case where we are considering the variable set $\left\{ {{E_X},{E_Y},X,Y} \right\}$ with the hope of discerning the direction of causation between $X$ and $Y$, we need to distinguish between the structures ${E_X} \to X \to Y \leftarrow {E_Y}$ and ${E_X} \to X \leftarrow Y \leftarrow {E_Y}$. But if we’re in a situation like the one just described, we won’t be able to distinguish between these two structures using the CMC and the CMIN alone (since doing so requires being able to discern between the structures depicted in Figures 3 and 4). Put differently, when our problem is to infer the direction of non-spurious causation between ${V_2}$ and ${V_3}$, there is no guarantee that we’ll be able to solve this problem simply by embedding ${V_2}$ and ${V_3}$ in a distribution that includes exogenous sources of variation for both, since this leaves open the possibility that the distribution at hand will include the conditional independencies that we’ve described over ${V_1}$, ${V_2}$, ${V_3}$, and ${V_4}$—in which case both Figures 3 and 4 will be admissible. So overall, there is simply no guarantee that considering an exogenous cause of $Y$ as well as an exogenous cause of $X$ will help us to determine the direction of causation between $X$ and $Y$.
4. Possible escape routes
Does this spell doom for this approach to understanding causal direction in causal Bayes nets?
 Maybe. If so, then perhaps we should revisit the temporal order strategy. We can show that if causes always temporally precede their effects, then if ${V_1}$, ${V_2}$, and ${V_3}$ are temporally ordered, there is a uniquely admissible causal graph given the CMC and the CMIN in the cases that concern us here.Footnote 20 For instance, if we assume that (i) ${V_2}$ and ${V_3}$ are non-spuriously correlated, (ii) ${V_2}$ temporally precedes ${V_3}$, and (iii) causes always temporally precede their effects, it follows that ${V_1} \to {V_2} \to {V_3}$ is admissible but ${V_1} \to {V_2} \leftarrow {V_3}$ is not—i.e., we can infer the direction of causation between ${V_2}$ and ${V_3}$. But when we opt for this route, we abandon the project of understanding the asymmetry of causation in terms of any probabilistic (in)dependencies since the inferential work is ultimately accomplished by the temporal ordering, not any probabilistic (in)dependence that is secured by the CMC or the CMIN. Moreover, this would seem to involve maintaining that causes temporally precede their effects as a matter of conceptual necessity. As we mentioned at the outset, it is controversial whether this is true. But even if we grant this conceptual claim to the defender of temporal precedence, it doesn’t resolve the current issue in every realistic inference problem since we are often confronted with variable sets whose temporal ordering is unknown to us. Thus, even if relying on temporal orderings is a good strategy whenever said orderings are available, the question of what to do when they are unavailable remains.
 There may also be other ways to solve the problem that do not invoke temporal order. One possibility is that we can avoid the problematic examples discussed here by restricting the domain of the causal relation to binary variables. This is not frequently discussed in the philosophical literature (Spohn Reference Spohn, Galavotti, Suppes and Constantini2001; Reference Spohn2012 are exceptions), but it’s easy to see how this question arises against the backdrop of a contrastivist account of causation.Footnote 21 Toward this end, Schaffer (Reference Schaffer2010) has argued that when we treat variables as the relata of the causal relation, we take a step toward agreeing with the contrastivist that the causal relation is at least a four-place relation, since saying that $X$ causes $Y$ may just be another way of saying that one value of $X$ rather than another causes one value of $Y$ rather than another. But if variables with more than two values can be causal relata, then as Schaffer (Reference Schaffer2010, fn. 28) notes, the contrastivist framework must be extended to allow for sets of causal contrasts and sets of effectual contrasts. One way to defend using exogenous variation as a general strategy for inferring causal direction may involve arguing that any such extension would be unprincipled.
 Finally, a third possible escape route that one could explore would be to identify a special sub-class of exogenous causes which have properties that make it impossible for the kinds of examples we’ve discussed here to arise. In particular, one could try to show that whenever ${E_X}$ is a certain kind of exogenous cause of $X$ and $X$ is non-spuriously correlated with $Y$, it will be impossible for ${E_X}$, $X$, and $Y$ to stand in the kind of probabilistic relationship that ${V_1}$, ${V_2}$, and ${V_3}$ stand in in our earlier examples. A natural starting place here would be to consider idealized exogenous interventions that deterministically set the values of their direct effects. It may turn out that such interventions do indeed avoid these kinds of problematic examples.Footnote 22 But even if that is the case, the question remains of whether there exists any less idealized subset of exogenous causes that allow one to avoid the problem. That is, since it is acknowledged by many that there are examples where no ideal “hard” intervention is adequate to the inference problem at issue,Footnote 23 it would be good to know exactly what kinds of “soft” interventions are suitable for inferring the direction of causation in cases like the ones described in this paper.
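To illustrate why deterministic “hard” interventions are a natural starting place, the following sketch (our own toy probabilities and function names, and binary variables only, which is precisely the setting the ternary counterexample escapes) computes exact joint distributions for the two candidate directions. When ${E_X}$ sets $X$ deterministically, ${E_X}$ remains correlated with $Y$ if $X \to Y$, whereas a hard intervention severs any arrow from $Y$ into $X$, rendering ${E_X}$ and $Y$ independent if $Y \to X$.

```python
from itertools import product

def joint_x_causes_y(p_e=0.5, p_y_given_x=(0.2, 0.8)):
    """Exact P(e, x, y) when E_X deterministically sets X and X causes Y."""
    dist = {}
    for e, y in product((0, 1), repeat=2):
        x = e  # hard intervention: X is a deterministic function of E_X
        p = (p_e if e else 1 - p_e)
        p *= p_y_given_x[x] if y else 1 - p_y_given_x[x]
        dist[(e, x, y)] = dist.get((e, x, y), 0.0) + p
    return dist

def joint_y_causes_x(p_e=0.5, p_y=0.5):
    """Exact P(e, x, y) when Y causes X but E_X overrides X deterministically."""
    dist = {}
    for e, y in product((0, 1), repeat=2):
        x = e  # the intervention breaks Y's influence on X
        p = (p_e if e else 1 - p_e) * (p_y if y else 1 - p_y)
        dist[(e, x, y)] = dist.get((e, x, y), 0.0) + p
    return dist

def dependent(dist, i, j):
    """Check whether coordinates i and j of the joint are correlated."""
    def marg(k, v):
        return sum(p for key, p in dist.items() if key[k] == v)
    return any(
        abs(sum(p for key, p in dist.items() if key[i] == a and key[j] == b)
            - marg(i, a) * marg(j, b)) > 1e-12
        for a in (0, 1) for b in (0, 1)
    )

# E_X is coordinate 0 and Y is coordinate 2 of each outcome tuple.
print(dependent(joint_x_causes_y(), 0, 2))  # True: E_X -> X -> Y transmits
print(dependent(joint_y_causes_x(), 0, 2))  # False: the arrow into X is cut
```

In the binary deterministic case, then, checking whether ${E_X}$ is correlated with $Y$ settles the direction; the open question flagged above is which weaker, “soft” interventions preserve this asymmetry.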
5. Conclusion
In summary, we’ve pointed out:
(1) that there can exist Markov inequivalent graphs with the same number of arrows that cannot be distinguished by the combination of the CMC and any extant weakening of CFC;
(2) that this causes fundamental problems for inferring the direction of causation in causal Bayes nets from probabilistic (in)dependence with exogenous causes;Footnote 24 and
(3) that one might hope to resolve the problem by either (i) restricting one’s attention exclusively to binary variables, (ii) relying on temporal orderings, or (iii) identifying special sub-classes of exogenous causes that preclude the possibility of the examples discussed here.
Acknowledgments
For helpful discussion and comments, we are grateful to Malcolm Forster, Olav Vassend, Jiji Zhang, and the audience at the 2024 meeting of the Society for the Philosophy of Causation.
Funding and declarations
None to declare.