- Start with a hand of four cards: {A, 2, 3, 4}
- I'll turn my back and secretly do one of two things:
 H0: Leave the Ace in, or
 HA: Take the Ace out
- Now shuffle the hand and deal out 3 cards.
Conclusion: If I draw {2,3,4} then we have some evidence that I did change the hand (HA) -- because that result is unlikely if I didn't: under H0, only 1 of the 4 equally likely 3-card deals omits the Ace (P = 1/4 = 0.25).
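To check the 0.25 figure, here is a minimal Python sketch that lists every possible 3-card deal from the unaltered hand (H0) and counts the ones with no Ace. The variable names (`hand`, `samples`, `no_ace`, `p_value`) are just illustrative choices, not part of the classroom demonstration itself.

```python
from fractions import Fraction
from itertools import combinations

# The hand under H0: the Ace was left in.
hand = ["A", "2", "3", "4"]

# Every distinct set of 3 cards that could be dealt (order ignored).
samples = list(combinations(hand, 3))

# The deals that look like "the Ace was taken out".
no_ace = [s for s in samples if "A" not in s]

# Each deal is equally likely after a fair shuffle, so the probability
# of seeing no Ace under H0 is just a ratio of counts.
p_value = Fraction(len(no_ace), len(samples))

for s in samples:
    print(s)
print("P(no Ace | H0) =", p_value, "=", float(p_value))
```

Listing the deals explicitly is only practical because the hand is tiny; that is the point of the example.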
Now -- You can actually demonstrate this and ask the class each time whether they think I left the Ace in or took it out. I'd recommend 3 run-throughs: leave it, leave it, then take it out. (In the last case, also ask: Is it possible that I left the Ace in?) In reality, you should probably hold the cards against the otherwise full box, so it isn't obvious if your hand becomes empty in the take-it-out case. (And otherwise practice the prestidigitation in advance so your handwork doesn't give it away.)
Open Question: Should I actually reveal to the class which one I did each time (for confirmation), or leave that as a mystery (modeling real-world usage)?
I'm struggling with your suggested demonstration, and I think it's because you mention hypothesis testing for means and then proceed with a demonstration that isn't about means. Also, I don't think it's as simple as defining a population then considering all the possible samples. That might make for an effective demonstration, but (as I understand it) hypothesis testing is totally unaware of the size of the population (i.e., your samples of 3 cards do not "know" they're sampling a population of 4 cards). By trying to define all possible samples, I fear students might be misled about the population-sample relationship in hypothesis testing and the theoretical nature of a sampling distribution.
I'm glad you're making me think about this, because in my limited experience I haven't used much to explain the concept other than drawings of overlapping sampling distributions, and the general explanation that lots of overlap means higher p-values and little overlap means smaller p-values. I'm guessing there might be some computer simulations that would be helpful, but I haven't explored enough (yet) to find them.
Raymond -- Thanks for the comment, really good stuff to think about!
Now, I actually think one of the advantages here is to have an example that is about something other than testing a population mean. One of the things I struggle with in the introductory class is trying to communicate that the concepts of confidence intervals and hypothesis tests apply to a whole universe of parameters other than just a mean (median, standard deviation, proportion, odds ratio, etc.). So dealing with those general concepts in isolation, prior to introducing the machinery of means-testing, might give valuable added perspective.
And I think that part of the demonstration is that somehow you do indeed have to categorize all possible sampling results under the null-hypothesis. For this brief example, you can list them individually. For the case of a mean from an unknown population, the analogy is to use the Central Limit Theorem, and conclude that they are at least approximately normally distributed (for a sufficiently large sample). So there is a correspondence there that I'm consciously trying to highlight.
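For the mean case, where the samples can't be listed one by one, a rough simulation can stand in for the list. The sketch below is only an illustration under assumptions of my own: an exponential population with mean 1 and standard deviation 1, samples of size 50, and 5000 repetitions. It checks how often the sample mean lands within 1.96 standard errors of the true mean, which the Central Limit Theorem suggests should be roughly 95% of the time.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 50       # sample size (assumed for illustration)
reps = 5000  # number of simulated samples (assumed for illustration)

# Assumed population: exponential with rate 1, so mean = 1 and sd = 1.
# It is clearly skewed, i.e. not normal.
pop_mean = 1.0
pop_sd = 1.0

# Draw many samples and record each sample mean.
means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

# Standard error of the mean predicted by theory.
se = pop_sd / n ** 0.5

# Fraction of sample means within 1.96 standard errors of the true mean.
within = sum(abs(m - pop_mean) <= 1.96 * se for m in means) / reps
print("fraction within 1.96 SE:", within)
```

The correspondence with the card example is that both describe what sampling results look like under the null hypothesis: by exhaustive listing for the cards, and by a normal approximation here.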