# Correlation and Causation

#### tepples (727027) writes | about a year and a half ago

tepples wrote:

Correlation implies 25% likelihood of causation. Either A causes B, B causes A, C causes A and B, or chance.

In this post, Immerman wrote:

tepples wrote: Correlation implies 25% likelihood of causation. Either A causes B, B causes A, C causes A and B, or chance.


I *hate* seeing statistics abused. A 25% likelihood of causation is *not* implied. Yes, one of the four outcomes must be the case, but you don't know the relative probabilities of each. It's like grabbing a marble out of a bag containing red, green, blue, and yellow marbles - there are only four possibilities as to which color your marble is, but for all you know I filled the bag with blue marbles and just threw in a handful of the other colors, in which case it would be preposterous to claim a 25% chance of getting a red one.

I'm aware of the hyperbole in my illustration. They're probably not equally probable, but absent other evidence, one has to assume so. My point is that just because the probability isn't 100 percent doesn't mean it can always be treated as 0 percent. So if you want to plead false cause more effectively, explain why they're not equally probable. Be willing to discuss what further observations would be needed to show which of the four possibilities is most likely. But don't say "correlation does not imply causation" as if it were "correlation implies lack of causation" without providing evidence, as that's close to the fallacy fallacy and the black or white fallacy.

**This discussion has been automatically archived.** Discussion continues in Daniel Dvorkin's journal.

## Still wrong. (1)

## commisaro (1007549) | about a year and a half ago | (#40906731)

## Four infinities (1)

## tepples (727027) | about a year and a half ago | (#40907147)

## Re:Four infinities (1)

## spazdor (902907) | about a year and a half ago | (#40920535)

I'm afraid you cannot divide one infinity by another and get back a fraction, not ever. Read up on Hilbert's Hotel for an intuitive exploration of why this is so.

## Re:Four infinities (1)

## tepples (727027) | about a year and a half ago | (#40920771)

I'm afraid you cannot divide one infinity by another and get back a fraction, not ever.

There is a bijection between rational numbers in lowest terms and the positive integers. Therefore, they are equally infinite. Calling the cardinalities equal in the sense that their ratio is 1 is hyperbole, as I have already admitted. But to more directly address the point: How else would you recommend colorfully expressing "just because causation hasn't already been proved doesn't necessarily mean we should drop the investigation of causation"?
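The comment does not say which bijection it has in mind. As one illustration, the Calkin-Wilf sequence lists every positive rational exactly once, in lowest terms, so position n in the list pairs with the positive integer n. The sketch below is only a minimal demonstration under that assumption; the function name `calkin_wilf` is made up for this example, and it covers positive rationals only.

```python
from fractions import Fraction
from itertools import islice


def calkin_wilf():
    """Yield the Calkin-Wilf sequence: 1, 1/2, 2, 1/3, 3/2, ...

    Each positive rational appears exactly once, already in lowest terms.
    """
    q = Fraction(1)
    while True:
        yield q
        # Newman's recurrence: next = 1 / (2*floor(q) - q + 1)
        floor_q = q.numerator // q.denominator
        q = 1 / (2 * floor_q - q + 1)


# Pair the first few positive integers with their rationals.
first_eight = list(islice(calkin_wilf(), 8))
for n, q in enumerate(first_eight, start=1):
    print(n, "<->", q)

# No rational repeats among the first thousand terms.
first_thousand = list(islice(calkin_wilf(), 1000))
```

A bijection like this shows the two sets have the same cardinality; it says nothing about a ratio of probabilities, which is the point being argued in the thread.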

## Re:Four infinities (1)

## spazdor (902907) | about a year and a half ago | (#40920929)

I think your current sig does so pretty succinctly, actually.

## Logical Fallacy (1)

## lymang (207777) | about a year and a half ago | (#40906831)

I can't say I thought I'd actually learn something from clicking here to your journal, but I did. Honestly, I expected some more arguments but perhaps they'd be more interesting than the usual, as it seemed this was about an interesting topic. Instead, you've posted that excellent website which is now on my favorites. TYVM, and have a good day.

## o rly? (1)

## voltorb (2668983) | about a year and a half ago | (#40906877)

## Re:o rly? (1)

## tepples (727027) | about a year and a half ago | (#40907117)

## Re:o rly? (1)

## voltorb (2668983) | about a year and a half ago | (#40907247)

## Less precise than 25% yet fits in 120 characters (1)

## tepples (727027) | about a year and a half ago | (#40907469)

you gotta figure it out by guessing the theory and do experiments to question its validity

And one needs to do the same thing to establish a lack of causation. But a lot of the arguments I've seen take the form "You haven't *already* proved causation; therefore, working one's ass off to prove it one way or the other is futile."

Saying "given a physical correlation and a randomly picked theory that fits the phenomenon, the theory will involve physical cause with x chance." is just bad science.

So is "92.7 percent of statistics are pulled out of someone's large intestine", despite it being ironically self-demonstrating. So what should I say that's less precise than "25%" but greater than zero? The intended meaning, "not greater or less than the other possibilities unless shown otherwise", is too long to fit in a 120-character signature.

## Re:Less precise than 25% yet fits in 120 character (1)

## Immerman (2627577) | about a year and a half ago | (#40907573)

How about ...

Correlation implies one of four possibilities:

because really, that's all you can honestly say.

## Re:Less precise than 25% yet fits in 120 character (1)

## voltorb (2668983) | about a year and a half ago | (#40907701)

## Perhaps the right question is burden of proof (1)

## tepples (727027) | about a year and a half ago | (#40907877)

## Re:Perhaps the right question is burden of proof (1)

## voltorb (2668983) | about a year and a half ago | (#40907991)

## qualitative versus quantitative dishonesty (1)

## Immerman (2627577) | about a year and a half ago | (#40907259)

Oh, I completely agree that the use of "correlation does not imply causation" to dismiss the possibility of causation is a *huge* fallacy, and deserves to be called out. However, it's a qualitative fallacy, whereas yours is a quantitative one. To assign a numerical probability to something when you have absolutely zero understanding of what the actual probabilities are is to be intellectually dishonest in a manner that brings nothing meaningful to the discussion and is likely to confuse the issue even further. There's a reason Twain liked the quote "There are lies, damned lies, and statistics" - as soon as you start throwing numbers around like they mean something, people become more credulous. To do so *knowing* your numbers are almost certainly false is straight-up abuse of that fact.

As to arguing the specific probabilities - that can't be done except in the context of a specific correlation. NOTHING can be meaningfully said about the relative probabilities of the four unexamined possibilities, except that they are extremely unlikely to be either 0% or 100% - even if causation is shown to be the case, it's unlikely to be 100% causative in any but the most trivial situations (which typically get codified as physical laws).

## some notes (1)

## retchdog (1319261) | about a year and a half ago | (#40910757)

the most obvious problem with your postulate is that it doesn't take the p-value into account. if i find correlation with p-value 0.00001, then the "likelihood" of it being chance should be lower than if the p-value was 0.1.
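To make the p-value point concrete, here is a rough sketch of a permutation test: it shuffles one variable many times and counts how often the shuffled data show a correlation at least as strong as the observed one. The data sets, the helper names `pearson_r` and `permutation_p_value`, and the trial count are all made up for illustration; nothing here comes from the thread.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Estimate how often shuffling ys gives |r| at least as large as observed."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (trials + 1)


xs = list(range(20))
strong = [2 * x + (x % 3) for x in xs]   # hypothetical data tracking x closely
weak = [(7 * x) % 11 for x in xs]        # hypothetical data barely related to x

p_strong = permutation_p_value(xs, strong)
p_weak = permutation_p_value(xs, weak)
print(p_strong, p_weak)
```

The two made-up data sets give very different p-values, so treating "chance" as equally plausible in both cases would throw away information the data already provide.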

anyway, you're not really saying anything new. if you thought through what you are saying, you'd probably end up with bayesian inference or a more esoteric variant such as the dempster-shafer theory of evidence [wikipedia.org].

in short, you need to establish the prior probability of each of your hypotheses and then evaluate the posterior probability of each, given the data.
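The prior-then-posterior procedure described above can be sketched in a few lines. This is a minimal sketch, not a real analysis: it assumes the four hypotheses from the signature are mutually exclusive and exhaustive, starts from the uniform 25% prior, and uses invented likelihood values purely for illustration.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Bayes' rule over a finite set of mutually exclusive hypotheses."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: weight / total for h, weight in unnormalized.items()}


hypotheses = ("A causes B", "B causes A", "C causes A and B", "chance")

# The uniform prior from the signature: 25% each.
priors = {h: Fraction(1, 4) for h in hypotheses}

# P(data | hypothesis). These numbers are invented for illustration only.
likelihoods = {
    "A causes B": Fraction(8, 10),
    "B causes A": Fraction(2, 10),
    "C causes A and B": Fraction(5, 10),
    "chance": Fraction(1, 100),
}

post = posterior(priors, likelihoods)
for h in hypotheses:
    print(f"{h}: {float(post[h]):.3f}")
```

Even when the prior is 25% across the board, the posterior stops being uniform as soon as the data fit some hypotheses better than others.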

and might i add there is at least one more category: that A and B each cause one another.

## Probabilities pulled from posterior (1)

## tepples (727027) | about a year and a half ago | (#40918655)

the most obvious problem with your postulate is that it doesn't take the p-value into account.

Anything quantitative about it (the "25%") is hyperbole, I admit. It's mostly directed at people who abuse "correlation does not imply causation" to imply "if causation has not already been proved, and if investigating it costs more than zero, then it should not be investigated". In addition, news sources that aren't paywalled tend to forget to report p-values.

anyway, you're not really saying anything new

I'm aware of that. Sometimes I have to repeat old things because new users haven't yet seen the old works.

and then evaluate the posterior probability

Which a lot of people unfamiliar with Bayesian inference might confuse with "you pulled the probabilities out of your posterior".

and might i add there is at least one more category: that A and B each cause one another.

I'm aware of chicken-and-egg loops in causality; see any [slashdot.org] of my posts about the lack of home theater PC games over the past several years.

## Re:Probabilities pulled from posterior (1)

## retchdog (1319261) | about a year and a half ago | (#40923919)

re posterior: people familiar with bayesian inference have the same objection. still, it's at least slightly better to establish prior probabilities which are then updated by seeing the evidence. what you're doing is saying that, whatever the data was, it's 25% across the board. if you ever want to get past this, you'll need something like bayesian inference or dempster-shafer.

re causality: my only point was that you have "A causes B" and "B causes A" as mutually exclusive categories. they aren't.

in total, i really don't see what you're getting at. you've interpreted "total ignorance" as a uniform probability distribution, which is not *completely* unreasonable (it adheres to the maximum entropy principle for example), except for the fact that the experiment probably doesn't leave you with *total* ignorance.

just because it's slightly relevant, there's a modest paradox presented i think by bertrand russell. suppose you have a flared glass with some unknown amount of liquid in it. you can either be totally ignorant about the volume of liquid (at which point, due to the geometry of the glass, you know that the median depth of the liquid is slightly over the middle of the glass); or you can be totally ignorant about the depth of the liquid (at which point you know that the glass is probably less than half-full, again because it's flared out), but not both at the same time. the point is, it's hard to interpret ignorance as a probability.
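The flared-glass example above can be checked numerically. The sketch assumes the glass is an idealized cone with its point at the bottom, so the filled volume scales with the cube of the depth; the comment does not specify the shape, so the exact numbers are illustrative and would differ for another glass.

```python
def volume_fraction(depth_fraction):
    """Filled share of a cone (point down) at a given share of its height."""
    return depth_fraction ** 3


def depth_fraction(volume_fraction_filled):
    """Inverse of volume_fraction: depth share for a given filled share."""
    return volume_fraction_filled ** (1.0 / 3.0)


# "Total ignorance" about volume: uniform on [0, 1], so the median volume is 0.5.
# Depth is a monotone function of volume, so the median depth is its image.
median_depth_if_uniform_volume = depth_fraction(0.5)

# "Total ignorance" about depth: uniform on [0, 1], so the median depth is 0.5.
median_volume_if_uniform_depth = volume_fraction(0.5)

print(median_depth_if_uniform_volume)   # above the halfway mark
print(median_volume_if_uniform_depth)   # well under half full
```

Under this assumed geometry the two "uniform" priors disagree: one puts the median depth above the middle of the glass, the other puts the median volume far below half, so both cannot describe the same state of ignorance.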

## Re:Probabilities pulled from posterior (1)

## retchdog (1319261) | about a year and a half ago | (#40927857)

i see you've changed your sig to something more reasonable; thank you.

i still don't like "chance," since the whole point of statistics is to rule out certain kinds of chance. there are also details like "A causes Z which causes B," and so on, and i think "A causes B and B causes A" is also possible.

## Where can I find the original? (1)

## Taco Cowboy (5327) | about a year and a half ago | (#40914681)

I'm here but I'm confused

Where's the original discussion that led to this thread?

Thanks in advance !!

## Read the summary (1)

## tepples (727027) | about a year and a half ago | (#40918513)

In this post [slashdot.org], Immerman wrote

Taco Cowboy wrote:

Where's the original discussion that led to this thread?

It was a reply to a signature, and I had installed the signature after having seen numerous abuses of "correlation does not imply causation" in Slashdot comments. I apologize that I can't provide the URLs of all these comments.

## Not enough data (1)

## geekoid (135745) | about a year and a half ago | (#41070133)

"but absent other evidence, one has to assume so"

and that assumption has been the downfall of many papers. It's also the same argument used to prop up things like acupuncture, homeopathy, chiropractors, and perpetual motion machines. "We observe X, can't explain it, therefore our pet solution must be the answer."

You simply do not have enough data to make any percentage guess.

## Or 0 percent (1)

## tepples (727027) | about a year and a half ago | (#41070737)