On the Need to Establish Hypotheses Prior to Testing; Or, The Fact that It Is Overwhelmingly Likely that Something Highly Unlikely Will Happen in Any Experiment.
Lately I've gotten in the habit of doing several card-drawing demonstrations in my statistics classes (as concrete examples of sampling, estimating a population mean, hypothesis testing, interpretations, etc.). Here's one for a test question that I sometimes ask: "Is it acceptable to decide what type of test to conduct [left, right, or two-tailed] by examining the sample data?" Say that I bring in a deck of playing cards, shuffle, and deal out 6 cards. For example, when I just did this at my desk I got this:
Now, what follows in this paragraph would be an example of faulty reasoning: Note that I just got duplicate 4's in this draw, and of course, that is highly unlikely with a standard deck (specifically, a 6% chance to get two or more 4's *). Therefore, one might conclude that I doctored that deck with extra 4's.
What's wrong with that reasoning? Well, we didn't establish the hypotheses prior to testing, so it's unfair and biased to use this as data in support of that hypothesis. Or perhaps it's better to look at it this way: It's overwhelmingly likely that something highly unlikely will happen in any such experiment (if you look at the data post-facto and labor to draw out some weird numerology-like pattern). Specifically, the chance of getting some duplicated card value from a standard deck in this case (not necessarily 4's) is actually 65%. **
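For anyone who would rather check those two figures by brute force than by formula, here is a minimal simulation sketch. The function name `simulate_draws`, the trial count, and the seed are my own choices for illustration, and the deck is modeled as ranks only (suits don't matter for this question).

```python
import random
from collections import Counter


def simulate_draws(trials=100_000, hand_size=6, seed=0):
    """Estimate two probabilities for a hand drawn from a standard deck:
    (a) two or more 4's, and (b) some rank appearing two or more times."""
    rng = random.Random(seed)
    # 13 ranks, 4 copies of each; suits are ignored.
    deck = [rank for rank in range(1, 14) for _ in range(4)]
    two_or_more_fours = 0
    any_duplicate = 0
    for _ in range(trials):
        hand = rng.sample(deck, hand_size)  # draw without replacement
        counts = Counter(hand)
        if counts[4] >= 2:
            two_or_more_fours += 1
        if max(counts.values()) >= 2:
            any_duplicate += 1
    return two_or_more_fours / trials, any_duplicate / trials


if __name__ == "__main__":
    p_fours, p_any = simulate_draws()
    print(f"two or more 4's:    {p_fours:.3f}")
    print(f"any duplicate rank: {p_any:.3f}")
```

The estimates should land near 6% and 65% respectively, with some wobble from run to run if you change the seed.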
So let's try this again: It is fair to use the first draw as suggestive of a new hypothesis. Let's hypothesize: "He doctored this deck with extra 4's". So if we shuffle and draw 6 cards again, then we should expect to see one or more 4's. And when I ran this experiment just now the result was:
Which rather obviously destroys the hypothesis; in this case I didn't get any 4's at all. (I did get duplicate 8's, but again, normal probability says that you'll usually get duplicate somethings from a standard deck when drawing 6 cards, so it's not really surprising or interesting at all.) To be doing interesting science, you have to establish coherent hypotheses in advance, and be able to predict and replicate your results.
* Drawing 6 cards: Chance to get zero 4's is: 48P6/52P6 = 0.603. Chance to get an initial 4 and then all non-4's is: 4×48P5/52P6 = 0.056; so chance to get a single 4 in some order is 6×0.056 = 0.336. Sum of these is 0.603+0.336 = 0.939. Therefore, the chance to get two or more 4's is P(not zero or one 4) = 1 - 0.939 = 0.061 ~ 6%.
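The same arithmetic can be sketched with combinations instead of permutations (the ratios come out identical). The function name `prob_at_least_two_of_rank` and its parameters are hypothetical, chosen just for this illustration.

```python
from math import comb


def prob_at_least_two_of_rank(hand_size=6, deck_size=52, copies=4):
    """Chance that a hand contains two or more cards of one named rank."""
    total = comb(deck_size, hand_size)
    others = deck_size - copies
    # No cards of the named rank at all.
    p_zero = comb(others, hand_size) / total
    # Exactly one card of the named rank, the rest from the other cards.
    p_one = copies * comb(others, hand_size - 1) / total
    return 1 - p_zero - p_one


if __name__ == "__main__":
    print(round(prob_at_least_two_of_rank(), 3))
```

With the defaults this agrees with the 0.061 figure above.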
** Drawing 6 cards: Chance to get no duplicates is 52/52×48/51×44/50×40/49×36/48×32/47 = 0.345. Therefore, the chance to get at least one duplicate is P(not zero duplicates) = 1 - 0.345 = 0.655 ~ 65%. (And as a rough sanity check, the 6% figure from the preceding note should be approximately 1/13 of this, since there are 13 card values; i.e., 65%/13 ~ 5%, which does check out.)
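That product can also be written as a short loop, which makes it easy to try other hand sizes. This is only a sketch; `prob_no_duplicate_ranks` is a name I made up for it, and it uses exact fractions so that no rounding creeps in until the end.

```python
from fractions import Fraction


def prob_no_duplicate_ranks(hand_size=6, ranks=13, suits=4):
    """Chance that every card in the hand has a different rank."""
    deck_size = ranks * suits
    p = Fraction(1)
    for i in range(hand_size):
        # After i distinct ranks are drawn, suits * i cards would repeat a rank,
        # and deck_size - i cards remain in the deck.
        p *= Fraction(deck_size - suits * i, deck_size - i)
    return p


if __name__ == "__main__":
    p_none = prob_no_duplicate_ranks()
    print(f"no duplicates:          {float(p_none):.3f}")
    print(f"at least one duplicate: {float(1 - p_none):.3f}")
```

Calling it with a different `hand_size` shows how quickly duplicates become close to certain as the hand grows.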