2009-05-24

Grading On a Curve Sucks

Grading on a curve sucks -- there, I said it. For some reason, whenever I teach statistics I get quasi-joking questions like "do you grade on a bell curve?", at which point my response is a rather intense diatribe against the very notion. From my perspective, mashing data (grades which are not normally distributed) into some different desired outcome (the normal bell curve) is simply outright fraudulent, and demonstrates a complete lack of understanding of statistical analysis, or of what the normal curve should be used for as an analysis tool.

Back in Fall 2006 Thought & Action magazine published an article by Richard W. Francis (Professor Emeritus in Kinesiology, California State Fresno), asserting that grading on a curve is the only way to properly compute grades (titled in a propagandist fashion, "Common Errors in Calculating Final Grades"). Here's my letter to the editor from that time:

-------------------------------------------------

Dear Editor,

Richard W. Francis proposes a system for standardizing class grading (Thought and Action, Fall 2006, "Common Errors in Calculating Final Grades"). The system takes as its priority the relative class ranking of students, even though I've never seen that utilized for any purpose in any class I've been involved with.

Mr. Francis responds to the criticism that his system is effectively grading on a curve. His response is that instructors can "use good judgment and the option to draw the cutoff point for each grade level, as they deem appropriate". In other words, after numbers are crunched at the end of the term, the grade awarded is based on a final, subjective decision by the instructor. Moreover, there is no way to tell students clearly at the start of the term what is required of them to achieve an "A", or any other grade, in the course.

The example presented in the article of a problem in test weighting seems unpersuasive. We are presented with a midterm (100 points, student performance drops off by 10 points each), and a final exam (200 points, student performance drops off by 5 points each). It is presented as an "error" that the class ranking matches the midterm results. But since the relative difference in the midterm is so large (10% difference each step) and the final so small (2.5% difference each step; even scaled double-weight that's only 5% per step) this seems to me like a fair end result.

Take student A in the example, who receives an "A" on the midterm and a "C+" on the final (by the most common letter grade system). In the "erroneous" weighting he receives a final grade of "B", while in the standardized system he has the T-score for a "D+". Clearly the former is the more legitimate reflection of his overall performance.

As an aside, I have a close relation who was denied an "A" grade in professional school due to an instructor grading on the curve. He still complains bitterly about the effect of this one grade on his schooling, now 40 years after the fact. Any subjective or curve-based system for awarding student grades at the end of a term damages the public esteem for our profession.

Daniel R. Collins
Adjunct Lecturer
Kingsborough Community College

2009-05-23

Winning Solitaire?

Okay, I admit it: Sometimes I play Microsoft Solitaire (i.e., "Klondike" Solitaire: draw 3, with 3 re-deals, Vegas scoring). Of course, it's the most widely-played computer game of all time. Occasionally I go on these benders and play it quite a bit for a few days.

Most games are lost, but I can usually eke out a win in about 20-30 minutes of playing. However, just today I probably lost 30+ games in a row over maybe 2 hours. Still no win so far today. I have to be careful, because I get into a habit of hitting "deal" instantly after a loss (my "hit", if you will), and after an extended time my hand starts to go numb and I start making terrible mistakes because my eyesight starts getting all wonky. (Is it fun? No, I feel a vague sense of irritation the whole time I'm playing, until I actually win and can finally close the application. Hopefully.)

So this brings up the question: What percentage of games should you be able to win? Obviously I don't know, but my intuition says around 20% or so at most. I'm also entertaining the idea of building a robot solver, improving its play, and seeing what fraction of games it can win. Apparently this is actually an outstanding research problem; Professor Yan at MIT wrote in 2005 that this is in fact “one of the embarrassments of applied mathematics”.

The other thing is that all of the work done on the problem apparently uses some astoundingly variant definitions for the game. First, the "solvers" that I see are all based on the variant game of "Thoughtful Solitaire", apparently preferred by mathematicians because it gives you full information (i.e., known location of all cards), so that players are encouraged to spend hours of time considering just a few moves at a time (gads, save me from these frickin' mathematicians like that! Deal with real-world incomplete information, for god's sake!).

Secondly, they use the results from this "Thoughtful Solitaire" (full information, recall; claiming 82% to 91% success rate) simultaneously for the percentage of regular Solitaire games that are "solvable". But this meaning of "solvable" is only a hypothetical solution rate for an all-knowing player; that is, there are many moves during a regular game of Solitaire that lead to dead-ends, that can only be avoided by sheer luck, for the non-omniscient player. If they're careful the researchers correctly call this an "upper bound on the solution rate of regular Solitaire" (and my intuition tells me that it's a very distant bound); if they're really, really sloppy then they use the phrases "odds of winning" and "percent solvable" interchangeably (when they're not remotely the same thing).

So currently we're completely in the dark about what the success rate of the best (non-omniscient) player would be in regular Solitaire. I'll still conjecture that it's got to be under 50%.

Edit: Circa 2012 I wrote a lightweight Solitaire-solving program in Java. Success of course varies greatly with the rule parameters selected: for my preferred draw-3, pass-3 game it wins about 7.6% of the games (based on N = 100,000 games played; margin of error 0.3% at 95% confidence). My own manual play on the MS Windows 7 solitaire wins over 8% (N = 3365), so it seems clear that there's still room for improvement. See code repository on GitHub for full details.
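For anyone who wants to check margins like the one quoted above, here is a minimal sketch using the usual normal approximation for a proportion. The function name is mine, and treating "over 8%" as exactly 0.08 is an assumption made only for illustration. Under the worst-case setting p = 0.5, 100,000 games gives a margin of about 0.31 percentage points, which matches the 0.3% figure; the manual sample of 3,365 games is much smaller, so its margin comes out wider, at roughly 0.9 percentage points.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Normal-approximation margin of error for a proportion.

    p: observed (or assumed) win rate, n: number of games,
    z: critical value (1.96 corresponds to 95% confidence).
    """
    return z * math.sqrt(p * (1 - p) / n)


# Worst-case margin (p = 0.5) for the program's 100,000 games.
program_worst_case = margin_of_error(0.5, 100_000)

# Margin using the program's observed 7.6% win rate instead.
program_observed = margin_of_error(0.076, 100_000)

# Manual play: "over 8%" is treated as 0.08 here (an assumption).
manual_play = margin_of_error(0.08, 3365)

for label, moe in [
    ("program, worst case", program_worst_case),
    ("program, observed rate", program_observed),
    ("manual play", manual_play),
]:
    print(f"{label}: +/- {100 * moe:.2f} percentage points")
```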

2009-05-15

Expected Values

I feel like the idea of “expected values” may be the most important practical concept in probability – and yet, sadly, I find that I completely don't have time to discuss it in any of the math classes that I teach. (Nor is it part of the primary “story” of any of the classes, even statistics). Every semester I re-visit my schedule and try to find a day to cram it into, and realize again that I cannot.

I've found that probability is enormously alien to a surprising number of students. (Just last week I had students in a basic math class fairly howling at the thought that they might be expected to be familiar with standard dice or a deck of cards). Therefore, I find that I actually have to motivate these discussions with an actual physical game, of the most basic simplicity. If I did cover expected values, here's the rudimentary demonstration I'd use:

The Game: Roll one die.
Player A wins $10 if die rolls {1}.
Player B wins $1 if die rolls {2, 3, 4, 5, 6}
Calculate probabilities (P(A) = 1/6, P(B) = 5/6).

Let a student pick A or B to play, roll die 12 times (say), keep tally of money won on board (use I's & X's). Likely player A wins more money.

Expected Value: The “average” amount you win on each roll.
E = X*P (X = prize if you win; P = probability to win)
Calculate expected values.
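The demonstration above can be sketched in a few lines of Python. This is only an illustration: the function names, the seed, and the choice to score both players against the same 12 rolls are my own assumptions, not part of the classroom plan. The exact expected values work out to 5/3 dollars per roll for Player A and 5/6 dollars per roll for Player B.

```python
import random
from fractions import Fraction

# Each player: (prize in dollars, set of winning die faces).
PLAYERS = {
    "A": (10, {1}),
    "B": (1, {2, 3, 4, 5, 6}),
}


def expected_value(prize, faces, sides=6):
    """E = X * P, with P = (number of winning faces) / (number of sides)."""
    return prize * Fraction(len(faces), sides)


def tally(rolls, prize, faces):
    """Total money won over a sequence of die rolls."""
    return sum(prize for roll in rolls if roll in faces)


rng = random.Random(2009)  # arbitrary seed so a run can be repeated
rolls = [rng.randint(1, 6) for _ in range(12)]

for name, (prize, faces) in PLAYERS.items():
    ev = expected_value(prize, faces)
    print(f"Player {name}: E = {ev} per roll, won ${tally(rolls, prize, faces)} in 12 rolls")
```

With only 12 rolls the tallies bounce around a lot, which is part of the point of doing it live: the expected value describes the long-run average, not any one short session.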

Ex.: Poker situation.
If you bet $4K, then you have 20% chance to win $30K. Bet or fold? (A: You should bet. E = $30K * 20% = $6K. If you do this 5 times, pay $20K, expect to win once for $30K, profit $10K)
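The same arithmetic as a short sketch, assuming (as the five-bet accounting above does) that the $4K stake is paid on every bet and the $30K is what you collect on a win. The function name is hypothetical.

```python
from fractions import Fraction


def net_expected_value(stake, prize, p_win):
    """Expected winnings per bet, minus the stake paid to make the bet."""
    return prize * p_win - stake


stake = 4_000
prize = 30_000
p_win = Fraction(1, 5)  # 20% chance to win

per_bet = net_expected_value(stake, prize, p_win)
over_five_bets = 5 * per_bet

print(f"Expected winnings per bet: ${prize * p_win}")
print(f"Net expected value per bet: ${per_bet}")
print(f"Net expected value over 5 bets: ${over_five_bets}")
```

Since the expected winnings ($6K) exceed the stake ($4K), the net is positive, which is the reason to bet rather than fold.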

2009-05-11

Speaker for the Dead

I read Orson Scott Card's Speaker for the Dead over the weekend (while on the road with my band). Can I post about science fiction here? I assume so (seeing as the first post started with a quote from a work by Rudy Rucker).

Of course, I loved Ender's Game. This second book is possibly even more emotionally moving in places (and Card seems to have said he considers it to be the more "important" book to him), but there are a number of notable structural flaws that I'm not able to shake off.

First is that it's very much working to set up further sequels; there are a whole number of major plot threads left hanging, and you can start detecting that about halfway through the book (furthermore, I see now that both this and Ender's Game were revised from their original format, so as to set up sequels, which takes away from the narrative thrust at the end of each). Second is that there's a central core mystery that the whole book is set up around, and in places people have to be unrealistically tight-lipped to their closest friends so as to prolong the mystery (I got really super-sick of this move from watching Lost). Third is that the central theme seems like a rehash of Ender's Game (you can very much feel Card wrestling with the rationale to the plot of Ender's Game; you can almost hear him musing "why would an alien race feel like killing is socially acceptable or necessary, anyway?", a central premise of the first book, and then constructing this second book so as to have an actual satisfying reason). There are also some obvious clues that the aliens should have been able to pick up when they kill humans (namely the visually obvious results of the "planting", as witnessed at the end of the book), that would have told them it's a good idea to stop doing such a thing, but apparently they miss them entirely.

But fourth is something that bothers me about lots of science fiction. Although the story spans many years, by way of relativistic time travel (over 3 thousand years, actually), technology never changes during that time. Ender can set off on a 22-year space flight, and when he lands, apparently all the exact same technology is in use for communications, video, computer keyboards, record-keeping, spaceflight landing, government, publishing literature, etc.

In fact, I've never seen any science-fiction literature that manages to deal with Moore's Law (the observation that computing power doubles every 2 years or so). It would be one thing if they conjectured that "Moore's Law ended on date such-and-such because of so-and-so...", but it's always a logical gap that's completely overlooked. Ender is honored to be given an apartment with a holoscreen with "4 times" the resolution of normal screens... but I'm thinking, in 22 years' time, the resolution of every screen should be 1,000 times the ones he left behind on his space-flight. At that rate, I wouldn't bother walking into the next room for one with only "4 times" the resolution.
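As a rough check on that figure, here is a back-of-the-envelope sketch. It assumes a strict doubling every 2 years (the "or so" leaves the period loose), and carrying a computing-power trend over to screen resolution is the same extrapolation made above. Under those assumptions, 22 years is 11 doublings, or roughly a 2,000-fold increase, so "1,000 times" is if anything on the low side.

```python
def growth_factor(years, doubling_period_years=2):
    """Multiplicative growth after `years`, given a fixed doubling period."""
    return 2 ** (years / doubling_period_years)


# Ender's 22-year flight, with a doubling every 2 years (assumed).
factor = growth_factor(22)
print(f"Growth over 22 years: about {factor:,.0f}x")
```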

Maybe that's a subject that is simply impossible to treat properly in a work of science fiction spanning centuries, but the repeated logical gap (in the face of our own monthly dealings with new technologies) is something that's bothering me more and more. Maybe the Singularity will come and solve this problem for us once and for all.