Slashdot: News for Nerds

Correlation and Causation

tepples (727027) writes | about a year and a half ago

tepples wrote:

Correlation implies 25% likelihood of causation. Either A causes B, B causes A, C causes A and B, or chance.

In this post, Immerman wrote:

I *hate* seeing statistics abused. A 25% likelihood of causation is *not* implied. Yes, one of the four outcomes must be the case, but you don't know the relative probabilities of each. It's like grabbing a marble out of a bag containing red, green, blue, and yellow marbles: there are only four possibilities as to which color your marble is, but for all you know I filled the bag with blue marbles and just threw in a handful of the other colors, in which case it would be preposterous to claim a 25% chance of getting a red one.
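To make the marble analogy concrete, here is a minimal Python sketch. The bag contents (97 blue marbles and one of each other color) are made up for illustration and are not figures from the post.

```python
from fractions import Fraction


def color_probabilities(bag):
    """Return the exact probability of drawing each color from the bag."""
    total = sum(bag.values())
    return {color: Fraction(count, total) for color, count in bag.items()}


# Hypothetical bag: mostly blue, with one marble of each other color.
bag = {"blue": 97, "red": 1, "green": 1, "yellow": 1}
probs = color_probabilities(bag)

# Four possible colors, yet the chance of red is nowhere near 1/4.
print(probs["red"], "versus the naive", Fraction(1, 4))
```

Four possible outcomes fix only the size of the outcome set; the probabilities depend on how the bag was filled.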

I'm aware of the hyperbole in my illustration. They're probably not equally probable, but absent other evidence, one has to assume so. My point is that just because the probability isn't 100 percent doesn't mean it can always be treated as 0 percent. So if you want to plead false cause more effectively, explain why they're not equally probable. Be willing to discuss what further observations would be needed to show which of the four possibilities is most likely. But don't say "correlation does not imply causation" as if it were "correlation implies lack of causation" without providing evidence, as that's close to the fallacy fallacy and the black or white fallacy.

This discussion has been automatically archived. Discussion continues in Daniel Dvorkin's journal.

Still wrong. (1)

commisaro (1007549) | about a year and a half ago | (#40906731)

C is an infinite class of possible "third-causes". Therefore there are infinitely more than 4 possible outcomes.

Four infinities (1)

tepples (727027) | about a year and a half ago | (#40907147)

And there are infinite classes of sets of intermediate steps through which A causes B or B causes A. There are also infinitely many chance mechanisms. Therefore, all four kinds of outcome are still infinite. I could be wrong about their being equally infinite, however, if one is countable [wikipedia.org] and the other not, like integers and reals [wikipedia.org].

Re:Four infinities (1)

spazdor (902907) | about a year and a half ago | (#40920535)

I'm afraid you cannot divide one infinity by another and get back a fraction, not ever. Read up on Hilbert's Hotel for an intuitive exploration of why this is so.

Re:Four infinities (1)

tepples (727027) | about a year and a half ago | (#40920771)

I'm afraid you cannot divide one infinity by another and get back a fraction, not ever.

There is a bijection between rational numbers in lowest terms and the positive integers. Therefore, they are equally infinite. Calling the cardinalities equal in the sense that their ratio is 1 is hyperbole, as I have already admitted. But to more directly address the point: How else would you recommend colorfully expressing "just because causation hasn't already been proved doesn't necessarily mean we should drop the investigation of causation"?
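The comment doesn't say which bijection it has in mind. One well-known enumeration of the positive rationals is the Calkin-Wilf sequence, which visits every positive fraction in lowest terms exactly once. A minimal Python sketch, offered as an illustration rather than as the poster's construction:

```python
from fractions import Fraction
from itertools import islice


def calkin_wilf():
    """Yield the positive rationals in Calkin-Wilf order, each exactly once."""
    q = Fraction(1)
    while True:
        yield q
        # Successor rule: q -> 1 / (2*floor(q) - q + 1)
        q = 1 / (2 * (q.numerator // q.denominator) - q + 1)


# Pairing the n-th positive integer with the n-th term gives the bijection.
first_terms = list(islice(calkin_wilf(), 8))
for n, q in enumerate(first_terms, start=1):
    print(n, "<->", q)
```

`Fraction` keeps every term in lowest terms automatically, so each position in the sequence corresponds to exactly one reduced fraction.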

Re:Four infinities (1)

spazdor (902907) | about a year and a half ago | (#40920929)

I think your current sig does so pretty succinctly, actually.

Logical Fallacy (1)

lymang (207777) | about a year and a half ago | (#40906831)

I can't say I thought I'd actually learn something from clicking here to your journal, but I did. Honestly, I expected some more arguments but perhaps they'd be more interesting than the usual, as it seemed this was about an interesting topic. Instead, you've posted that excellent website which is now on my favorites. TYVM, and have a good day.

o rly? (1)

voltorb (2668983) | about a year and a half ago | (#40906877)

"probably not equally probable" "one has to assume so" Statistical theory of theories? Get out of here kid, you don't have a clue what you're talking about. The world hasn't run on bad logic and philosophy since Galileo. Wanna be useful? Tell me about the hidden link between quantum correlations (pick any) and physical cause and get a paper out and shake the foundations of modern physics.

Re:o rly? (1)

tepples (727027) | about a year and a half ago | (#40907117)

So what's the correct way to illustrate that "correlation does not imply causation" does not imply "one should not investigate the likelihood of causation"?

Re:o rly? (1)

voltorb (2668983) | about a year and a half ago | (#40907247)

Look son, if you have some sort of correlation ---and I'm not talking about an imaginary one that exists in a fantasy world, a real physical correlation--- you gotta figure it out by guessing the theory and do experiments to question its validity. Then you need to work your ass off and work out whether your theory has anything to do with causation. There are no such shortcuts. Saying "given a physical correlation and a randomly picked theory that fits the phenomenon, the theory will involve physical cause with x chance." is just bad science.

Less precise than 25% yet fits in 120 characters (1)

tepples (727027) | about a year and a half ago | (#40907469)

you gotta figure it out by guessing the theory and do experiments to question its validity

And one needs to do the same thing to establish a lack of causation. But a lot of the arguments I've seen take the form "You haven't already proved causation; therefore, working one's ass off to prove it one way or the other is futile."

Saying "given a physical correlation and a randomly picked theory that fits the phenomenon, the theory will involve physical cause with x chance." is just bad science.

So is "92.7 percent of statistics are pulled out of someone's large intestine", despite it being ironically self-demonstrating. So what should I say that's less precise than "25%" but greater than zero? The intended meaning, "not greater or less than the other possibilities unless shown otherwise", is too long to fit in a 120-character signature.

Re:Less precise than 25% yet fits in 120 character (1)

Immerman (2627577) | about a year and a half ago | (#40907573)

Correlation implies one of four possibilities: ...
because really, that's all you can honestly say.

Re:Less precise than 25% yet fits in 120 character (1)

voltorb (2668983) | about a year and a half ago | (#40907701)

You're missing the whole point by insisting on talking about the wrong question: there is no such thing as a "statistical theory of correlation theories"! It doesn't matter whether you state a weaker condition and say nonzero; physicists (or anyone who's doing real-world science) just don't work like this! There are no-go theorems in physics, and they're useful because they say exactly that something can't happen in a theory (zero chance). But saying "X may or may not imply Y in a theory" is just useless, plain useless, and it contains absolutely no information!

Perhaps the right question is burden of proof (1)

tepples (727027) | about a year and a half ago | (#40907877)

I'll grant that I have likely been talking about the wrong question. Perhaps the right question is where the burden of proof should lie. In a lot of Slashdot stories about studies showing correlation, the attitude I see in several comments is "It should be treated as chance until proven otherwise, and I refuse to endorse committing resources to prove otherwise." The former is innocent until proven guilty, which I'll grant for now. The latter corresponds to a desire to shut down the police and the prosecution.

Re:Perhaps the right question is burden of proof (1)

voltorb (2668983) | about a year and a half ago | (#40907991)

I'm not very familiar with Slashdot, but now I see your point. It's the tenet of pseudo-science and it is unfortunately everywhere, even in this age.

qualitative versus quantitative dishonesty (1)

Immerman (2627577) | about a year and a half ago | (#40907259)

Oh, I completely agree that the use of "correlation does not imply causation" to dismiss the possibility of causation is a *huge* fallacy, and deserves to be called out. However, it's a qualitative fallacy, whereas yours is a quantitative one. To assign a numerical probability to something when you have absolutely zero understanding of what the actual probabilities are is to be intellectually dishonest in a manner that brings nothing meaningful to the discussion and is likely to confuse the issue even further. There's a reason Twain liked the quote "There are lies, damned lies, and statistics": as soon as you start throwing numbers around like they mean something, people become more credulous. To do so *knowing* your numbers are almost certainly false is straight-up abuse of that fact.

As to arguing the specific probabilities: that can't be done except in the context of a specific correlation. NOTHING can be meaningfully said about the relative probabilities of the four unexamined possibilities, except that they are extremely unlikely to be either 0% or 100%. Even if causation is shown to be the case, it's unlikely to be 100% causative in any but the most trivial situations (which typically get codified as physical laws).

some notes (1)

retchdog (1319261) | about a year and a half ago | (#40910757)

the most obvious problem with your postulate is that it doesn't take the p-value into account. if i find correlation with p-value 0.00001, then the "likelihood" of it being chance should be lower than if the p-value was 0.1.

anyway, you're not really saying anything new. if you thought through what you are saying, you'd probably end up with bayesian inference or a more esoteric variant such as the dempster-shafer theory of evidence [wikipedia.org].

in short, you need to establish the prior probability of each of your hypotheses and then evaluate the posterior probability of each, given the data.
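The prior-to-posterior step described here can be sketched in a few lines of Python. The uniform prior mirrors the "25% each" signature; the likelihood values are invented purely for illustration (a correlation with a very small p-value would make the data unlikely under "chance"), and the hypothesis names are just labels.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' rule: posterior is proportional to prior times likelihood."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(unnormalized.values())
    if evidence == 0:
        raise ValueError("the data are impossible under every hypothesis")
    return {h: weight / evidence for h, weight in unnormalized.items()}


# Uniform prior over the four hypotheses (the "total ignorance" starting point).
priors = {"A causes B": 0.25, "B causes A": 0.25,
          "C causes A and B": 0.25, "chance": 0.25}

# Hypothetical likelihoods of the observed data under each hypothesis.
likelihoods = {"A causes B": 0.30, "B causes A": 0.30,
               "C causes A and B": 0.30, "chance": 0.01}

post = posterior(priors, likelihoods)
for hypothesis, probability in post.items():
    print(f"{hypothesis}: {probability:.3f}")
```

With these made-up numbers the posterior weight on "chance" drops well below its 25% prior, while the three causal hypotheses remain tied because the data alone do not distinguish them.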

and might i add there is at least one more category: that A and B each cause one another.

Probabilities pulled from posterior (1)

tepples (727027) | about a year and a half ago | (#40918655)

the most obvious problem with your postulate is that it doesn't take the p-value into account.

Anything quantitative about it (the "25%") is hyperbole, I admit. It's mostly directed at people who abuse "correlation does not imply causation" to imply "if causation has not already been proved, and if investigating it costs more than zero, then it should not be investigated". In addition, news sources that aren't paywalled tend to forget to report p-values.

anyway, you're not really saying anything new

I'm aware of that. Sometimes I have to repeat old things because new users haven't yet seen the old works.

and then evaluate the posterior probability

Which a lot of people unfamiliar with Bayesian inference might confuse with "you pulled the probabilities out of your posterior".

and might i add there is at least one more category: that A and B each cause one another.

I'm aware of chicken-and-egg loops in causality; see any [slashdot.org] of my posts about the lack of home theater PC games over the past several years.

Re:Probabilities pulled from posterior (1)

retchdog (1319261) | about a year and a half ago | (#40923919)

re posterior: people familiar with bayesian inference have the same objection. still, it's at least slightly better to establish prior probabilities which are then updated by seeing the evidence. what you're doing is saying that, whatever the data was, it's 25% across the board. if you ever want to get past this, you'll need something like bayesian inference or dempster-shafer.

re causality: my only point was that you have "A causes B" and "B causes A" as mutually exclusive categories. they aren't.

in total, i really don't see what you're getting at. you've interpreted "total ignorance" as a uniform probability distribution, which is not completely unreasonable (it adheres to the maximum entropy principle for example), except for the fact that the given experiment probably doesn't leave you with total ignorance.

just because it's slightly relevant, there's a modest paradox presented i think by bertrand russell. suppose you have a flared glass with some unknown amount of liquid in it. you can either be totally ignorant about the volume of liquid (at which point, due to the geometry of the glass, you know that the median depth of the liquid is slightly over the middle of the glass); or you can be totally ignorant about the depth of the liquid (at which point you know that the glass is probably less than half-full, again because it's flared out), but not both at the same time. the point is, it's hard to interpret ignorance as a probability.
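The glass example can be checked numerically. The sketch below assumes the glass is an ideal cone with its point at the bottom, so the filled volume grows as the cube of the depth; that shape is an assumption for illustration, and the exact numbers change with the real geometry.

```python
def depth_from_volume(volume_fraction):
    """Depth fraction of a cone filled to the given volume fraction."""
    return volume_fraction ** (1.0 / 3.0)


def volume_from_depth(depth_fraction):
    """Volume fraction of a cone filled to the given depth fraction."""
    return depth_fraction ** 3


# Ignorance about volume: volume fraction uniform on [0, 1], median 0.5.
# A monotone map carries the median along, so this is the median depth.
median_depth_if_volume_uniform = depth_from_volume(0.5)

# Ignorance about depth: depth fraction uniform on [0, 1], median 0.5.
median_volume_if_depth_uniform = volume_from_depth(0.5)

print(f"uniform volume -> median depth  {median_depth_if_volume_uniform:.3f}")
print(f"uniform depth  -> median volume {median_volume_if_depth_uniform:.3f}")
```

The two "ignorant" priors disagree: one puts the median depth above the halfway mark, the other puts the median volume well under half the glass, so both cannot be uniform at once.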

Re:Probabilities pulled from posterior (1)

retchdog (1319261) | about a year and a half ago | (#40927857)

i see you've changed your sig to something more reasonable; thank you.

i still don't like "chance," since the whole point of statistics is to rule out certain kinds of chance. there are also details like "A causes Z which causes B," and so on, and i think "A causes B and B causes A" is also possible.

Where can I find the original? (1)

Taco Cowboy (5327) | about a year and a half ago | (#40914681)

I'm here but I'm confused

Where's the original discussion that led to this thread?

tepples (727027) | about a year and a half ago | (#40918513)

From the entry:

In this post [slashdot.org], Immerman wrote

Taco Cowboy wrote:

Where's the original discussion that led to this thread?

It was a reply to a signature, and I had installed the signature after having seen numerous abuses of "correlation does not imply causation" in Slashdot comments. I apologize that I can't provide the URLs of all these comments.

Not enough data (1)

geekoid (135745) | about a year and a half ago | (#41070133)

"but absent other evidence, one has to assume so"
and that assumption has been the downfall of many papers. It's also the same argument used to prop up things like acupuncture, homeopathy, chiropractors, and perpetual motion machines. "We observe X, can't explain it, therefore our pet solution must be the answer."

You simply do not have enough data to make any percentage guess.

Or 0 percent (1)

tepples (727027) | about a year and a half ago | (#41070737)

Yet too many people assume 0.00 percent for A->B, B->A, and C->A and C->B, and 100.00 percent for chance, even if there exist data otherwise.