## Sunday, February 22, 2009

### Interpreting Polls: A MadMath Open Letter

Below, I present the very last question from the last test in the statistics class that I teach, from two weeks ago:

An online forum says "Most of our members are not in favor of switching to a new product. We polled 900 members, and only 37% are in favor of switching (margin of error +/-3%)." Orius reads this and responds, "That poll doesn't convince me of anything. This forum has over 74,000 members, so that's only a small fraction of forum members that you polled." Do you agree with Orius' reasoning? Explain why you do or do not agree. Refer to one of our statistical formulas in your explanation.

Here is the best possible answer to that test question:
No, Orius is mistaken. Population size is not a factor in the margin-of-error formula: E = z*σ/√n (z-score from desired confidence level, σ population standard deviation, n sample size).

Now, I make a point to ask a question like this right at the end of my statistics class because it's an enormously common criticism of survey results. It's also enormously flat-out wrong. (In this case, the quotes I used in the test question were copied directly from a discussion thread at gaming site ENWorld from last year).

Two weeks later, I get up on Sunday morning and eat a donut while reading famed technical news site Slashdot. Here's what I get to read in an article summary on the front page:
Adobe claims that its Flash platform reaches '99% of internet viewers,' but a closer look at those statistics suggests it's not exactly all-encompassing. Adobe puts Flash player penetration at 947 million users out of a total 956 million internet-connected devices, but the total number of PCs is based on a forecast made two years ago. What's more, the number of Flash users is based on a questionable internet survey of just 4,600 people — around 0.0005% of the suggested 956,000,000 total. Is it really possible that 99% penetration could have been reached?

Below, I present my open response to this Slashdot summary:

What's more, the number of Flash users is based on a questionable internet survey of just 4,600 people — around 0.0005% of the suggested 956,000,000 total.
That's the single dumbest thing you can say about polling results. I just asked this question on the last test of the statistics class I teach two weeks ago. Neither the population size, nor the sampling fraction (ratio of the population surveyed), are in any way factors in the accuracy of a poll.

Opinion polling margin of error is computed as follows (95% level of confidence): E = 1/sqrt(n) = 1/sqrt(4600) = +/-1%. So from this information alone, the actual percent of Flash users is 95% likely to be somewhere between 98% and 100%. Again, note that population size is not a factor in the formula for margin of error.

As a side note, polling calculations are actually most accurate if you had an infinite population size (that's one of the standard mathematical assumptions in the model). If anything, a complication arises if population size gets too small, at which point a correction formula can be added if the sampling fraction rises over 5% of the population or so.

There might be other legitimate critiques of any poll (like perhaps a biased sampling method). But a small sampling fraction is not one of them. It's about as ignorant a thing as you can say when interpreting poll results (on the order of "the Internet is not a truck").

http://en.wikipedia.org/wiki/Margin_of_error#Effect_of_population_size

#### 2 comments:

1. How do your students do on that question?

2. On all my statistics tests, I have 1 or 2 "concept questions" like these (10 or 20% of the test) -- no number-crunching involved; define or explain why something would or not work out.

Most people routinely get them wholly or partially wrong, this one is no different, and there's some significant amount of grief around all of them. But, I'm very consistent about it, so they always know it's coming (and I give out sample problems in homework and practice tests in advance, too). At this level, it's interesting that there's a sizable number of people who can crunch formulas mechanically, but are halfway frustrated at thinking about logical requirements in advance of the formulas being used. (Actually, just last night was my final exam -- for the equivalent question, even telling them in advance it would be from a particular chapter on the subject of the CLT, the vast majority still got exactly the opposite of the correct answer.)

My class is moderately easy (all things considered), so in large part these questions make the difference between getting an "A" and a "B", and I'm comfortable with that.