Monday, March 7, 2016

On Correlation And Other Musical Mantras

A while back I found this delightful article at Slate, titled "The Internet Blowhard's Favorite Phrase". Perhaps more descriptive is the web-header title: "Correlation does not imply causation: How the Internet fell in love with a stats-class cliché". The article leads with a random internet argument, and then observes:
And thus a deeper correlation was revealed, a link more telling than any that the Missouri team had shown. I mean the affinity between the online commenter and his favorite phrase—the statistical cliché that closes threads and ends debates, the freshman platitude turned final shutdown. "Repeat after me," a poster types into his window, and then he sighs, and then he types out his sigh, s-i-g-h, into the comment for good measure. Does he have to write it on the blackboard? Correlation does not imply causation. Your hype is busted. Your study debunked. End of conversation. Thank you and good night... The correlation phrase has become so common and so irritating that a minor backlash has now ensued against the rhetoric if not the concept.

I find this to be completely true. Similarly, for some time, Daniel Dvorkin, the science fiction author, has used the following as the signature on all of his posts, which I find to be a wonderfully concise phrasing of the issue:
The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.

Now, near the end of his article, the writer at Slate (Daniel Engber) poses the following question:
It's easy to imagine how this point might be infused into the wisdom of the Web: "Facepalm. How many times do I have to remind you? Don't confuse statistical and substantive significance!" That comment-ready slogan would be just as much a conversation-stopper as correlation does not imply causation, yet people rarely say it. The spurious correlation stands apart from all the other foibles of statistics. It's the only one that's gone mainstream. Why?

I wonder if it has to do with what the foible represents. When we mistake correlation for causation, we find a cause that isn't there. Once upon a time, perhaps, these sorts of errors—false positives—were not so bad at all. If you ate a berry and got sick, you'd have been wise to imbue your data with some meaning... Now conditions are reversed. We're the bullies over nature and less afraid of poison berries. When we make a claim about causation, it's not so we can hide out from the world but so we can intervene in it... The false positive is now more onerous than it's ever been. And all we have to fight it is a catchphrase.

On this particular explanation of the phenomenon, I'm going to say "I don't think so". I don't think that people uttering the phrase by rote are being quite so thoughtful or deep-minded. My hypothesis for what's happening: The phrase just happens to have a certain poetical-musical quality to it that makes it memorable, and sticks in people's minds (more so than other important dictums from statistics, as Engber points out above). The starting "correlation" and the ending "causation" have this magical consonance in the hard "c", they rhyme, they both have emphasis on the long "a" syllable, and the whole fits perfectly into a 4-beat measure. (A happy little accident, as Bob Ross might say.) It's this musical quality that gets it stuck in people's minds, possibly the very first thing that comes to mind for many people regarding statistics and correlation, ready to be thrown down in any argument whether on-topic or not.

I've run into the same thing by accident, for other topics, in my own teaching. For example: A year ago in my basic algebra classes I would run a couple examples of graphing 2-variable equations by plotting points, and at the end of the class make a big show of writing this inference on the board: "Lesson: All linear equations have straight-line graphs" -- and noted how this explained why equations of that type were in fact called "linear" (defined earlier in the course). This was received extremely well, and it was very memorable -- it was one of the few side questions I could always ask ("how do you know this equation has a straight-line graph?") that nobody ever failed to answer ("because it's linear").

Well, the problem is that it was actually TOO memorable -- people remembered this mantra without understanding what "linear" actually meant (of course: 1st-degree, with no visible exponents). I would always have to follow up with, "and what does linear mean?", to which almost no one could provide an answer. So in the fall semester, I took great care to instead write in my trio of algebra classes, "Lesson: All 1st-degree equations have straight-line graphs", and then verbally make the same point about where "linear" equations get their name. The funny thing is -- students would STILL make this same mistake of saying "linear equations are straight lines" without knowing how to identify a linear equation. It's such an attractive, musical, satisfying phrase that it's like a mental strange attractor -- it burrows into people's brains even when I never actually said it or wrote it in the class.

So I think we actually have to watch out for these "musical mantras" which are indeed TOO memorable, and allow students to regurgitate them easily and fool us into thinking they understand a concept when they actually don't.

See also -- Delta's D&D Hotspot: The Power of Pictures.
