Monday, October 20, 2014

Is Statway a Cargo Cult?

We all know that Algebra is the limiting factor for the millions of students attending community colleges throughout the U.S. That is: Colleges could double (or triple, or quadruple) their graduation numbers overnight if the 8th-grade algebra requirement were only removed. This makes for lots of institutional pressure these days to do so.

A common line of thought is: Get rid of the algebra requirement and pursue a primer on statistics instead. You can sort of see why someone might negotiate in this way: offer something apparently attractive (statistics, which many say is needed to understand the modern world) in place of the thing they're asking you to give up. For example, the Carnegie "Statway" program now at numerous colleges promises exactly that (the lede being "Statway triples student success in half the time").

But as an instructor of statistics at a community college, I use algebra all the time to derive, and explain, and confirm various formulas and procedures. Without that, I think the intention (in fact I've heard this argued explicitly) is to get people to dump data into the SPSS program, click a button, and then send those results upstream or downstream to some other stake-holder without knowing how to verify or double-check them. Basically it advocates a faith-based approach to mathematical/statistical software tools.

This is a nontrivial, in fact really tough, philosophical angel with which to wrestle nowadays. We're long past the point where cheap calculating devices have become ingrained in many elementary and high schools; convenient to be sure, but as a result at the college level we see a great many students who have no intuition for times tables, and are utterly unable to estimate, sanity-check, or spot egregious errors (e.g. I had a college student who hand-computed 56×9 = 54 and was totally baffled at my saying that couldn't possibly be the answer, even after redoing the same computation a second time).
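As a rough illustration of the kind of sanity check I mean, here is a minimal Python sketch. The function name and the round-to-the-nearest-ten bracketing rule are my own assumptions for illustration, not any standard method: round the first factor down and up to the nearest ten, and see whether the claimed product lands between the two easy products.

```python
def plausible_product(a, b, claimed):
    """Rough sanity check for a claimed product of two positive integers.

    Hypothetical helper: brackets a*b between two easy products obtained by
    rounding `a` down and up to a multiple of ten.
    """
    low = (a // 10) * 10 * b          # e.g. 50 * 9 = 450
    high = ((a // 10) + 1) * 10 * b   # e.g. 60 * 9 = 540
    return low <= claimed <= high


# 56 x 9 must lie between 450 and 540, so 54 cannot be right.
student_answer_ok = plausible_product(56, 9, 54)
correct_answer_ok = plausible_product(56, 9, 56 * 9)
```

The check does not produce the answer; it only rules out answers that are off by an order of magnitude, which is exactly the skill that seems to be missing.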

To a far greater degree, as I say in my classes, statistics is truly a 20th-century, space-age branch of math; it's a fairly tall edifice built on centuries of results in notation, algebra, probability, calculus, etc. Even in the best situation in my own general sophomore-level class, and as deeply committed as I am to rigorously demonstrating as much as possible, I'm forced to hand-wave a number of concepts from calculus classes which my students have not taken, and never will take (notably regarding integrals, density curves, the area of any probability distribution being 1; to say nothing of a proof of the Central Limit Theorem). So if we accept that statistics is fundamental to understanding how the modern world is built and runs, and there is some amount of corner-shaving in presenting it to students who have never taken calculus, then perhaps it's okay to go whole-hog and just give them a technological tool that does the entire job for them? Without knowing where it comes from, and being told to just trust it? I can see (and have heard) arguments in both directions.

Here's an example of the kind of results you might get from a website that caught my attention the other day: Spurious Correlations. The site puts on display a rather large number of graphs of data series that are meant to be obviously, comically unrelated, even though they are highly correlated. Here's an example:


Something seemed fishy about this after I first looked at it. It's true that if you dump the numbers in the table into Excel or SPSS or whatever, a correlation value of 0.870127 pops out. But here's the rub: those date-based charts used throughout the site are totally not how you visualize correlation, nor are they related in any way to what the linear correlation coefficient (r) means. What it does mean is that if you take those data pairs and plot them as an (x, y) scatterplot, you can find a straight line that gets pretty close to most of the points. That is entirely lost in the graph as presented; the numbers aren't even paired up as points in the chart, and the date values are entirely ignored in the correlation calculation. I'm a bit unclear if the creator of the website knows this, or is just applying some packaged tool -- but surely it will be opaque and rather misleading to most readers of the site. At any rate, it eliminates the ability to visually double-check some crazy error of the 56×9 = 54 ilk.
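To make the point about the dates concrete, here is a minimal Python sketch of the usual Pearson formula for r. The numbers are made up for illustration (they are not the site's data), and the function and variable names are my own. Notice that the years never enter the calculation, and that shuffling the (x, y) pairs together leaves r unchanged; only the pairing matters.

```python
import math


def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired data (x_i, y_i)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative data: two series recorded in the same six years.
years = [2000, 2001, 2002, 2003, 2004, 2005]  # never used in the calculation
series_a = [2.0, 3.1, 3.9, 5.2, 5.8, 7.1]
series_b = [10.5, 12.0, 14.2, 15.1, 17.9, 19.0]

r_in_date_order = pearson_r(series_a, series_b)

# Reorder the pairs (keeping each x with its own y): r does not change.
order = [3, 0, 5, 1, 4, 2]
shuffled_a = [series_a[i] for i in order]
shuffled_b = [series_b[i] for i in order]
r_shuffled = pearson_r(shuffled_a, shuffled_b)
```

A scatterplot of `series_a` against `series_b` is the picture that corresponds to this number; a pair of time-series lines over `years` is not.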

As a further point, there are some graphs on the site labelled as showing "inverse correlation", which I thought to be a correlation between x and 1/y -- but in truth what they mean is the more common [linear] "negative correlation", which is a whole different thing. Or at least I would presume it is; I'd never heard of "inverse correlation" as synonymous, and about the only place I can find it online is Investopedia (so maybe the finance community has its own somewhat-sloppy term for it).
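The difference between the two readings can be sketched in a few lines of Python. This assumes my reading of "inverse" as a reciprocal relationship (y proportional to 1/x); the data and names are made up for illustration. A perfectly linear decreasing relationship gives r = -1, whereas a perfect reciprocal relationship gives a negative r that falls short of -1, and only shows up as a perfect straight line once you correlate x against 1/y.

```python
import math


def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired data (x_i, y_i)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 4, 5, 10]

# Negative (linear) correlation: y falls by a fixed amount per unit of x.
linear_ys = [20 - 2 * x for x in xs]
r_linear = pearson_r(xs, linear_ys)

# Reciprocal ("inverse" in the 1/y sense) relationship: y = 10 / x.
reciprocal_ys = [10 / x for x in xs]
r_reciprocal_raw = pearson_r(xs, reciprocal_ys)

# Correlating x against 1/y straightens the reciprocal relationship out.
r_reciprocal_transformed = pearson_r(xs, [1 / y for y in reciprocal_ys])
```

So a chart labelled "inverse correlation" that simply reports a negative r is describing the first case, not the second.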

I guess someone might call this nitpicking, but my intuition is that it's a sign of somebody who can't actually distinguish between true and false interpretations of statistical results. Is this ultimately the kind of product we get if we wipe out all the algebra-based derivations from our statistics instruction, and treat it as a non-reasoning vocational exercise?

Let me be clear in saying that at this time I have not actually read the Carnegie Statway curriculum, so I can't say if it has some clever way of avoiding these pitfalls or not. Perhaps I should do that to be sure. But as years pass in my current career, and I get more opportunities to personally experience all the connections throughout our programs, I find myself becoming more and more of a booster and champion of the basic algebra class requirement for all, as perhaps the very finest tool in our kit for promoting clear-headedness, transparency, honesty, and truth with regard to what it means to be an educated, detail-oriented, and scientifically literate person.


2 comments:

  1. I believe that statistics is more advanced than algebra, in both senses you mention:

    1) It is built on other mathematics such as algebra, probability, and calculus
    2) It is somehow more "abstract", meaning it is more difficult to develop mathematical intuition (which I believe should be the goal of mathematics education).

    I suppose it would be possible to "teach" statistics without a grounding of some more fundamental maths, by simply training the student to recognize certain types of problems and apply the appropriate method.

    But this would be rather painful and uninspiring. I only fully grasped some concepts in trigonometry after doing calculus in polar coordinates. It was then I realized that a lot of handwaving was done to bootstrap me.

    The big problem with just learning a bunch of techniques, without a deeper understanding or intuition, is that it doesn't translate to the real world. It's like learning a language from a phrasebook. So when it comes to more realistic scenarios in the real world, the student is still helpless.

    For many students, their problem with mathematics is that they didn't get to spend enough time with it, at an early enough age. It's that simple, I believe. After that it is much more difficult to play catch-up.
