"A large population size must require a larger sample size." This -- or any iteration thereof -- is the dumbest goddamn thing you can say about statistics. While it's a clear demonstration that someone's missed the whole point of inferential statistics, it's also one of the most common things you'll hear about them. (Often in the form of "That sample is only a small proportion of the population.") Here are some of the varieties of this statement that I've encountered over time:

How do they project statistics like that? I'm trying to imagine what kind of sample size you'd need to represent, well, everything in the universe. [In regard to matter/anti-matter ratio in the universe as researched at Fermilab; comment posted at Slashdot]

Adobe claims that its Flash platform reaches '99% of internet viewers,' but a closer look at those statistics suggests it's not exactly all-encompassing... the number of Flash users is based on a questionable internet survey of just 4,600 people — around 0.0005% of the suggested 956,000,000 total. [News summary at Slashdot]

That poll doesn't convince me of 4e's success or lack thereof. Also, there's only 904 total votes while ENWorld has over 74,000 members, so that's only a small fraction of forum members (admittedly many of those 74,000 are probably inactive). [In regard to the popularity of the D&D game's 4th Edition; comment posted at ENWorld]

You get the idea. To save some writing time here, I'll use n to indicate the sample size and N to indicate the population size. For any statistical inference, if n=50 is an acceptable sample for N=1,000, then it's also acceptable for N=10,000, N=1 billion, or N=infinity. In particular, one thing that never really matters is the ratio of sample to population.

Brief illustration: Let's say that you're using a sample mean to estimate a population mean (much like in a scientific opinion survey, etc.). As long as you have a sample size of at least 30 or so, you automatically know the shape of the distribution of all possible sample means: (very nearly) a normal curve, as per the (mathematically proven) Central Limit Theorem. And then you can use that curve (via some integral calculus, or a resulting table or spreadsheet formula) to calculate the probability that your observed sample mean is any given distance from the population mean. Does the size of the population have any bearing on this sampling distribution shape? No. Does the CLT make any reference to the size of the population? No, not whatsoever. You have a moderate-sized sample (30+), you know the shape of all possible sample means, you calculate your probability from that (or some equivalent process), done.
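If you'd rather watch that happen than take it on faith, here's a rough simulation sketch in Python. To be clear about what's assumed: the skewed (exponential) population, the seed, the trial count, and all the function names are just made up for illustration. It draws lots of n=50 samples from populations of three very different sizes and compares the spread of the sample means to σ/√n.

```python
import math
import random

def population_sd(values):
    """Population standard deviation (divide by the count, not count - 1)."""
    mean = math.fsum(values) / len(values)
    return math.sqrt(math.fsum((v - mean) ** 2 for v in values) / len(values))

def sd_of_sample_means(population, n, trials, rng):
    """Draw `trials` samples of size n (without replacement) and return
    the standard deviation of the resulting sample means."""
    means = [math.fsum(rng.sample(population, n)) / n for _ in range(trials)]
    return population_sd(means)

def run_demo(n=50, trials=2000, seed=1):
    """Return {N: (sigma / sqrt(n), observed SD of sample means)}."""
    rng = random.Random(seed)
    results = {}
    for pop_size in (10_000, 100_000, 1_000_000):
        # Hypothetical skewed population: exponential with mean 1.
        population = [rng.expovariate(1.0) for _ in range(pop_size)]
        predicted = population_sd(population) / math.sqrt(n)
        observed = sd_of_sample_means(population, n, trials, rng)
        results[pop_size] = (predicted, observed)
    return results

if __name__ == "__main__":
    for pop_size, (predicted, observed) in run_demo().items():
        print(f"N={pop_size:>9,}  sigma/sqrt(n)={predicted:.4f}  "
              f"observed={observed:.4f}")
```

The observed spread should land close to σ/√n in every row, whether the sample is 0.5% of the population or 0.005% of it. (The match is never exact, since it's a simulation with a finite number of trials.)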

Exception: In calculating sampling distribution probabilities, you'll use something like the fact that its standard deviation is σ/√n. (Here the σ indicates the standard deviation of the whole population.) Now, if the population size happens to be exceptionally small (like, N≤20n), and you're sampling without replacement, then you can improve the estimate a bit by instead using the correction formula √((N-n)/(N-1)) * σ/√n. But why bother? (a) You're almost never in that situation, (b) it rarely makes that much difference, and (c) you're just making extra number-crunching work for yourself. So you're actually better off assuming that the population is really huge or even infinite (as is actually done), thereby saving yourself calculation effort by way of the simpler formula. For any N>20n, the difference is negligible anyway (which is to say: lim_{N→∞} √((N-n)/(N-1)) * σ/√n = σ/√n). Run some numerical examples (pick any σ you like) and you'll see how little difference it makes.
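For the lazy, here's a minimal Python sketch of those numerical examples. The function names, the σ=10, and the n=50 are arbitrary choices of mine for illustration; the two formulas are the ones given above.

```python
import math

def correction_factor(n, N):
    """Finite population correction: sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

def standard_error(sigma, n, N=None):
    """Standard deviation of the sample mean.

    With N=None the population is treated as infinite: sigma / sqrt(n).
    Otherwise the finite population correction is applied.
    """
    se = sigma / math.sqrt(n)
    if N is None:
        return se
    return correction_factor(n, N) * se

if __name__ == "__main__":
    sigma, n = 10.0, 50  # arbitrary illustrative values
    print(f"infinite population: {standard_error(sigma, n):.4f}")
    for N in (100, 1_000, 10_000, 1_000_000, 1_000_000_000):
        print(f"N={N:>13,}  factor={correction_factor(n, N):.6f}  "
              f"SE={standard_error(sigma, n, N):.4f}")
```

At N=20n (here, N=1,000) the factor is about 0.975, so the correction trims the standard deviation by roughly 2.5%. From there it heads straight for 1 as N grows. Only when the sample is a big chunk of the population (N=100, n=50) does it amount to much.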

Even more absurd exception: One requirement that the Central Limit Theorem does have is that the population standard deviation must be nonzero, i.e., σ>0, which does rule out having a population size of just one. But, c'mon, if that were the case then what you're doing isn't really sampling or inferential statistics in the first place, now is it?

In summary: If anything, a larger population size makes the statistics easier, and the math is simplest when you assume an infinite population size in the first place. Other than that, population size has no bearing on the math behind your estimation or surveying procedure.

One final, really simple observation: If an opinion poll is performed at the standard 95% confidence level, then its margin of error can be approximated by: E = 1/√n. (Compare to the formula for standard deviation above; the σ disappears due to a particular very convenient substitution and cancellation: for a yes/no proportion σ is at most 0.5, and the 95% level multiplies by about 2, so the two cancel.) Does the population size N appear anywhere in this formula? Nope -- it's fundamentally irrelevant to the process.
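Here's a short Python sketch of that calculation, with the usual caveats: the function names are mine, and it assumes a simple random sample with the worst-case proportion p=0.5 and the conventional 95% critical value of 1.96. It compares the rule of thumb 1/√n with the slightly more careful 1.96 * √(p(1-p)/n), using the two sample sizes from the quotes at the top.

```python
import math

Z_95 = 1.96  # standard normal critical value for a 95% confidence level

def margin_of_error(n, p=0.5, z=Z_95):
    """Margin of error for a sample proportion: z * sqrt(p * (1 - p) / n)."""
    return z * math.sqrt(p * (1 - p) / n)

def rule_of_thumb(n):
    """The shortcut from the text: E = 1 / sqrt(n)."""
    return 1 / math.sqrt(n)

if __name__ == "__main__":
    # Sample sizes taken from the quoted complaints (904 votes, 4,600 people).
    for n in (904, 4600):
        print(f"n={n:>5}  1/sqrt(n)={rule_of_thumb(n):.4f}  "
              f"1.96*0.5/sqrt(n)={margin_of_error(n):.4f}")
```

Notice that neither function takes a population size as an argument. There's nowhere to put one.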

(I've written about this before, but I wanted a version that was a bit more -- ahem -- direct, for posterity's sake.)