Comments on MadMath: Lindley's Paradox

Everything you said here makes sense to me. Not be...

2013-07-21T13:19:14.836-04:00

Everything you said here makes sense to me. Not being trained in Bayesian statistics, I had to throw up my hands at that example that seemed simply ludicrous to me. Thanks for writing your observations and helping convince me I'm not crazy, I appreciate it!

I am rather surprised that this is being discussed...

2013-07-20T10:58:18.210-04:00

I am rather surprised that this is being discussed among statisticians.

As I see it, the Bayesian case is a natural consequence of extremely flawed prior probabilities. Bayesian logic is a tool, and as with any other tool, you should understand what it does.

Setting a nonzero prior probability on a point value makes the prior probability density infinite at that point. Compare that to the finite density in an infinitesimal interval I=<0.5, 0.5+epsilon>, infinitesimally close to the theta=0.5 point value.

It is like saying: "In this murder case, I already have strong evidence that Mr. X could be the perpetrator. There are millions of other potential perpetrators, and the total probability of these others is not insignificant, but now that we have this new evidence about who was close to the site at the time, the few that were near add up to next to nothing, and the rest of the potential suspects were far from the site, so their probability, weighted by their distance, remains very low even when added over the millions of individuals."

This is sound logic - but only if the premises are true. Do you really have prior evidence that implicates Mr. X millions of times more strongly than his nearby neighbors?

Even if the single-point null hypothesis is replaced with a very narrow range, the prior probability density becomes huge inside that range, while the density just outside it becomes abysmally small by comparison. Does that reflect the prior knowledge about the question being studied? If that were the case, it would have to have an effect on your reasoning, and Bayesian logic takes that into account. But in the example, you just don't have any such knowledge. Garbage in - garbage out.
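To put rough numbers on the narrow-range argument, here is a minimal sketch. It assumes theta lies in [0, 1], that half of the prior mass sits uniformly inside a range of a given width, and that the other half is spread uniformly over the rest. The function name `density_ratio` and the widths tried are illustrative choices, not anything from the original example.

```python
def density_ratio(width, mass_inside=0.5):
    """Ratio of prior density inside a narrow range to the density outside it.

    Assumes theta lies in [0, 1], with `mass_inside` spread uniformly over a
    range of the given width and the remaining mass spread uniformly elsewhere.
    """
    inside = mass_inside / width
    outside = (1.0 - mass_inside) / (1.0 - width)
    return inside / outside


# Hypothetical widths: the narrower the range, the more extreme the prior.
for width in (1e-2, 1e-4, 1e-6):
    print(f"width={width:g}: density inside is {density_ratio(width):,.0f}x the density outside")
```

Under these assumptions a range of width 0.01 is already favored about 99 to 1 per unit of theta, and the ratio grows without bound as the range shrinks toward a point.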

^ Very informative, thanks for posting that! Glad ...

2012-04-26T23:53:11.095-04:00

^ Very informative, thanks for posting that! Glad to know I'm not totally alone in my intuition that this seems like a bungled example/prior.

There's been some recent discussion of this ht...

2012-04-24T03:41:25.320-04:00

There's been some recent discussion of this:
http://www.science20.com/quantum_diaries_survivor/jeffreyslindley_paradox-87184
http://andrewgelman.com/2012/02/untangling-the-jeffreys-lindley-paradox/

Also, I edited the Wikipedia article to remove "weakly", since that's obviously not the case, and to add a more rational comparison of the Bayesian and Frequentist approaches, in which they both give the same conclusion.

And no, speaking as a Bayesian, this is not how one would usually choose a prior (at least, without a great deal of previous experience/evidence).

That makes a little more sense, but I'm having...

2011-04-19T16:17:14.953-04:00

That makes a little more sense, but I'm having trouble wrapping my head around the article's general-description statements "a prior distribution that favors H0 weakly" (either example seems like favoring it strongly) and "It is a result of the prior having a sharp feature at H0" (whereas your example seems to me to have a non-sharp feature).

Thanks for addressing this -- do you have a link or citation to a better presentation/example?

It's just a contrived example to illustrate th...

2011-04-19T11:35:37.160-04:00

It's just a contrived example to illustrate the paradox. The paradox still works if you choose to have a prior that's a narrow Gaussian at 0.5 on top of a much broader distribution (flat, or anything wide).

The point is that the paradox most often rears its head when the prior is broad with a high, narrow region added on top, and a flat prior with a delta function is just the simplest such case in many respects.
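The claim that a narrow Gaussian works about as well as a delta function can be checked numerically. The sketch below is only an illustration under stated assumptions: the data (98,451 births, 49,581 boys) are the figures commonly used to present the paradox, not numbers given in these comments, and the 50/50 prior weight and the Gaussian width of 0.0005 are arbitrary choices. The helper names are hypothetical.

```python
import math

# Assumed data (commonly used births example; not taken from the comments).
n, k = 98451, 49581
prior_narrow = 0.5  # assumed prior weight on the narrow component at 0.5


def log_binom_pmf(k, n, theta):
    """Log of the binomial probability of k successes in n trials."""
    log_c = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_c + k * math.log(theta) + (n - k) * math.log(1 - theta)


def marginal_narrow_gaussian(sd, points=4001, half_width=8.0):
    """Marginal likelihood under a N(0.5, sd^2) prior, by the trapezoid rule."""
    lo = 0.5 - half_width * sd
    step = 2 * half_width * sd / (points - 1)
    total = 0.0
    for i in range(points):
        theta = lo + i * step
        prior_density = math.exp(-0.5 * ((theta - 0.5) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
        weight = 0.5 if i in (0, points - 1) else 1.0
        total += weight * prior_density * math.exp(log_binom_pmf(k, n, theta))
    return total * step


def posterior_of_narrow(marginal_narrow, marginal_broad):
    """Posterior weight of the narrow component in a two-part mixture prior."""
    num = prior_narrow * marginal_narrow
    return num / (num + (1 - prior_narrow) * marginal_broad)


# Flat prior on [0, 1]: the binomial likelihood integrates to 1 / (n + 1).
marginal_flat = 1.0 / (n + 1)

# Delta function at 0.5: the marginal is just the likelihood at that point.
posterior_spike = posterior_of_narrow(math.exp(log_binom_pmf(k, n, 0.5)), marginal_flat)

# Narrow Gaussian at 0.5 (assumed sd) on top of the flat component.
posterior_narrow = posterior_of_narrow(marginal_narrow_gaussian(0.0005), marginal_flat)

# Frequentist two-sided p-value for theta = 0.5, using the normal approximation.
z = (k - 0.5 * n) / math.sqrt(0.25 * n)
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"posterior weight of delta at 0.5:      {posterior_spike:.3f}")
print(f"posterior weight of narrow Gaussian:   {posterior_narrow:.3f}")
print(f"two-sided p-value against theta = 0.5: {p_value:.4f}")
```

With these assumed inputs, both priors leave most of the posterior weight on the narrow component even though the p-value falls below 0.05, which is the disagreement the paradox describes.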