Meanwhile, on with the show.
1. A convenient property of the normal distribution. Consider a normal distribution—any normal distribution. What’s the probability you will be to the right of the mean? 50%. To the right of one standard deviation (σ) above the mean? About 1/6. To the right of two σ’s above the mean? About 2.5%. To the right of three σ’s above the mean? Less than 0.5%. This is simply the Empirical Rule. It tells us that the probability of an above-average outcome falls faster than that outcome’s distance from the mean grows, and this continues all the way out the asymptotic end of the distribution’s tail. Of course, the same reasoning applies to the other side of the distribution, as outcomes become ever further below-average.
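As a quick sketch of those tail probabilities, here is the standard normal case (using scipy; the same numbers apply to any normal distribution after rescaling by its mean and σ):

```python
from scipy.stats import norm

# Probability of landing to the right of the mean, and of 1, 2 and 3
# standard deviations above it, for a standard normal distribution.
for k in [0, 1, 2, 3]:
    print(f"P(X > mean + {k} sigma) = {norm.sf(k):.4f}")

# Roughly 0.50, 0.16 (~1/6), 0.023 (~2.5%) and 0.0013 (well under 0.5%).
```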
Now consider expected value (EV). In the simplest case there are just two possible outcomes:

EV = V(1) p(1) + V(2) p(2)
where V(1) is the value of outcome 1, p(1) its probability and so on with outcome 2. In other words, EV is simply a weighted average of the two potential outcomes, with their probabilities providing the weights. As more possible outcomes are envisaged, the EV formula gets longer, encompassing more of these product terms.
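As a minimal sketch of that weighted average (the outcome values and probabilities below are invented purely for illustration):

```python
# Expected value as a probability-weighted average of outcomes.
# The values and probabilities here are made up for illustration only.
values = [100.0, -20.0]   # V(1), V(2)
probs = [0.7, 0.3]        # p(1), p(2); the probabilities must sum to 1

ev = sum(v * p for v, p in zip(values, probs))
print(ev)  # 0.7 * 100 + 0.3 * (-20) = 64.0

# With more possible outcomes, the lists simply get longer and the same
# sum of value-times-probability terms applies.
```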
The significance of the Empirical Rule for EV calculation is that the further a possible outcome lies from the mean, the smaller its product term (value times probability) will be. Extreme values become irrelevant. Indeed, because the distribution is symmetrical, you would only need to know the median value, since it’s also the average. But even if you didn’t know the median going in, or if too few observations leave you with only a rough approximation of the smooth distribution, as long as you know the underlying distribution is normal you can pretty much ignore extreme possibilities: their probabilities will be too small to cause concern.
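To see how quickly those product terms die off in a normal distribution, here is a rough sketch (the band width of 0.5 is an arbitrary choice for discretizing the tail):

```python
from scipy.stats import norm

# For a standard normal, take the probability of a narrow band around each
# outcome x and multiply by the outcome's value.  The product term shrinks
# rapidly as x moves further into the tail.
width = 0.5
for x in [1, 2, 3, 4, 5, 6]:
    band_prob = norm.cdf(x + width / 2) - norm.cdf(x - width / 2)
    print(f"x = {x}: value * probability = {x * band_prob:.6f}")

# The terms fall off roughly like x * exp(-x**2 / 2): extreme outcomes
# contribute next to nothing to the expected value.
```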
2. But lots of probability distributions aren’t normal. The normal distribution arises in one of the most important of all statistical questions, the relationship between a sample and the population from which it’s drawn. Sample averages converge quickly on a normal distribution (this is the central limit theorem); we just need enough observations. That’s why statistics classes tend to spend most of their time on normal or nearly-normal (binomial) distributions.
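A small simulation of that convergence, assuming an arbitrarily chosen skewed (exponential) population just for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw samples from a decidedly non-normal (exponential) population and
# look at the skewness of the sample mean as the sample size n grows.
for n in [2, 10, 50, 200]:
    means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    skew = ((means - means.mean()) ** 3).mean() / means.std() ** 3
    print(f"n = {n:3d}: skewness of sample means = {skew:.2f}")

# The skewness heads toward 0, the normal distribution's value, as n grows:
# averages of enough observations look normal even when the population doesn't.
```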
In nature, however, lots of things are distributed according to power laws. These distributions typically arise from growth processes, and much of what we see in the world is the result of growth, or at least of processes in which the size (or some other measure) of a thing in one period is a function of its size in a previous period. In economics, income distribution is power-law; so is the distribution of firms by level of employment. Power law distributions differ in two ways from normal ones: they are skewed, and they have a long fat tail over which the distance from the mean increases faster than the probability declines. If you want to know the average income in Seattle, you don’t want to ignore a possible Bill Gates.
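A sketch of the contrast, comparing a power-law (Pareto) tail with a normal tail matched to the same mean and standard deviation (the shape parameter of 2.5 is an illustrative assumption, not an estimate of any real income distribution):

```python
from scipy.stats import norm, pareto

# A Pareto (power-law) distribution versus a normal with the same mean and
# standard deviation.
b = 2.5                    # Pareto shape ("tail index"), chosen for illustration
power_law = pareto(b)      # support starts at 1
normal = norm(loc=power_law.mean(), scale=power_law.std())

for x in [2, 5, 10, 20]:
    print(f"P(X > {x:2d}): normal {normal.sf(x):.2e}   power law {power_law.sf(x):.2e}")

# The normal tail dies off like exp(-x**2); the power-law tail only like
# x**(-2.5), so far out in the tail the power law carries vastly more weight.
```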
In many decision contexts, moreover, we don’t have enough observations to justify assuming the outcomes are normally distributed. Instead we have a t-distribution. The fewer observations we draw on, the longer and fatter the tails of t. True, the t-distribution is symmetrical, but with sufficiently few observations (degrees of freedom) it shares with power law distributions the characteristic that extreme values can count more in an expected value calculation, not less as in a normal distribution.
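The same point for the t-distribution, sketched by asking how much probability sits more than three standard units out as the degrees of freedom shrink:

```python
from scipy.stats import norm, t

# Probability of an outcome more than 3 units above center, for the normal
# and for t-distributions with progressively fewer degrees of freedom.
print(f"normal        : {norm.sf(3):.4f}")
for df in [30, 10, 5, 2, 1]:
    print(f"t with df = {df:2d}: {t.sf(3, df):.4f}")

# The fewer the degrees of freedom, the more probability lies out in the
# tail beyond 3 -- extreme values carry more weight, not less.
```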
3. Getting Dismal. The relationship between EV and extreme values depends on three things: whether the probability distribution is normal, how fat the tail is if it isn’t, and how long the tail is. Weitzman’s Dismal Theorem says that if the tail is fat enough that the product (value times probability) increases as values become more extreme, and if the tail goes on to infinity—there is no limit to how extreme an outcome may be—the extreme tail completely dominates more likely, closer-to-the-mean values in calculations of EV. The debate over this theorem has centered on whether the unboundedness of the extreme tail (for instance, the potential cost of catastrophic climate change) is a reasonable assertion.
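A rough illustration of the mechanics, assuming a made-up Pareto tail with shape 0.9 (fat enough that value times probability grows as outcomes become more extreme):

```python
from scipy.stats import pareto

# An illustrative fat tail: Pareto with shape 0.9, so the probability of
# exceeding x falls like x**(-0.9) while the value grows like x.
dist = pareto(0.9)

for k in range(6):
    lo, hi = 10 ** k, 10 ** (k + 1)
    slice_prob = dist.cdf(hi) - dist.cdf(lo)
    # Every outcome in the slice is worth at least lo, so lo * slice_prob is
    # a lower bound on the slice's contribution to the expected value.
    print(f"[{lo:>7}, {hi:>8}): probability {slice_prob:.2e}, "
          f"EV contribution > {lo * slice_prob:.2f}")

# Each successive slice is far less probable yet contributes more to the
# expected value; with no upper bound on outcomes, the EV diverges and the
# extreme tail dominates the calculation.
```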
4. Precaution, and precaution on precaution. This provides one interpretation of the precautionary principle. On this view, the principle applies under two conditions, a high level of uncertainty and the prospect of immense harm if the worst possibility transpires. High uncertainty means a fat tail; immense potential harm (for which irreversibility is usually a precondition) is about the length of the tail. Enough of both and your decision about risk should be dominated by the need to avoid extreme outcomes.
This view of precaution is consistent with cost-benefit analysis, but only under the condition that such an analysis is open to the possibility of non-normal probability distributions and fully takes account of extreme risks. That said, the precautionary framework described above still typically translates uncertainty into statistical risk, and by definition this step is arbitrary. For instance, we really don’t know what probability to attach to catastrophic releases of marine or terrestrial methane under various global temperature scenarios. Caution on precaution is advised.