Tuesday, October 28, 2014

A Beginner’s Guide to Probability Distributions, Risk and Precaution

Coincidences abound.  Last night I gave a lecture to my Cost-Benefit Analysis class on uncertainty and precaution, and this morning I see a writeup of a new article by Nassim Nicholas Taleb and his high-profile colleagues on the application of precautionary theory to genetically modified organisms.  One concern I had skimming through the article is that it seems to parallel Martin Weitzman’s Dismal Theorem, but Weitzman isn’t cited.  I don’t know the literature well enough to say anything about priority in this area, and I’d be happy to hear from those who do.

Meanwhile, on with the show.  I will leave out the diagrams because they take too long to produce.

1. A convenient property of the normal distribution.  Consider a normal distribution, any normal distribution.  What’s the probability you will be to the right of the mean?  50%.  To the right of one standard deviation (σ) above the mean?  About 1/6.  To the right of two σ’s above the mean?  About 2.5%.  To the right of three σ’s above the mean?  About 0.15%.  This is simply the Empirical Rule.  It tells us that the probability of an above-average outcome falls faster than the distance of that outcome from the mean increases.  That continues to the very asymptotic end of the distribution’s tail.  Of course, the same reasoning applies to the other side of the distribution, as outcomes become ever further below average.
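
The tail probabilities quoted above are easy to check.  Here is a minimal sketch using only the Python standard library; the helper name upper_tail is my own label, not part of any package.

```python
import math

def upper_tail(k):
    """P(Z > k) for a standard normal Z: the chance of landing
    more than k standard deviations above the mean."""
    return 0.5 * math.erfc(k / math.sqrt(2))

# Print the share of outcomes beyond 0, 1, 2 and 3 sigma above the mean.
for k in range(4):
    print(f"beyond {k} sigma: {upper_tail(k):.4%}")
```

The results come out at roughly 50%, 15.9%, 2.3% and 0.13%, so the rounded figures in the text are slightly generous to the tails but tell the right story.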

In an expected value calculation we add up the products of possible future outcomes with their respective probabilities.  For two possible outcomes we have:

EV = V(1) p(1) + V(2) p(2)

where V(1) is the value of outcome 1, p(1) its probability and so on with outcome 2.  In other words, EV is simply a weighted average of the two potential outcomes, with their probabilities providing the weights.  As more possible outcomes are envisaged, the EV formula gets longer, encompassing more of these product terms.
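
For readers who prefer code to formulas, the weighted average can be sketched as follows.  The function name expected_value and the numbers in the example (a hypothetical project that gains 100 with probability 0.75 and loses 40 with probability 0.25) are made up for illustration.

```python
def expected_value(outcomes):
    """outcomes: list of (value, probability) pairs.
    The probabilities are the weights and must sum to 1."""
    total_p = sum(p for _, p in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(v * p for v, p in outcomes)

# Hypothetical two-outcome case: EV = V(1) p(1) + V(2) p(2)
ev = expected_value([(100, 0.75), (-40, 0.25)])
print(ev)  # 100*0.75 + (-40)*0.25 = 65.0
```

Adding more possible outcomes just means adding more pairs to the list.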

The significance of the Empirical Rule for EV calculation is that, beyond about one σ, the further from the mean a possible outcome is, the smaller its product term (value times probability) will be.  Extreme values become all but irrelevant.  Indeed, because the distribution is symmetrical, you would only need to know the median value, since it’s also the average.  But even if you didn’t know the median going in, or if you have only an approximation to a smooth distribution because of too few observations on outcomes, if you know the underlying distribution is normal you can pretty much ignore extreme possibilities: their probabilities will be too small to cause concern.
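
To see how quickly the product terms shrink, here is a rough sketch for a standard normal (mean 0, σ = 1), where the "value" of an outcome is simply its distance above the mean.  That choice of value, and the helper names, are my own simplifications.

```python
import math

def normal_pdf(x):
    """Density of the standard normal (mean 0, sigma 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def product_term(k):
    """Value times density for an outcome k sigma above the mean."""
    return k * normal_pdf(k)

# Product terms at 1 through 6 sigma above the mean.
terms = [product_term(k) for k in range(1, 7)]
for k, t in zip(range(1, 7), terms):
    print(f"{k} sigma: {t:.2e}")
```

Each step out from the mean multiplies the value by a little and divides the density by a lot, so the terms collapse toward zero.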

2. But lots of probability distributions aren’t normal.  The normal distribution arises in one of the most important of all statistical questions, the relationship between a sample and the population from which it’s drawn.  Sample averages converge quickly on a normal distribution; we just need to get enough observations.  That’s why statistics classes tend to spend most of their time on normal or nearly-normal (binomial) distributions.
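
A quick simulation illustrates the point about sample averages.  This is only a sketch: the choice of an exponential population (which is strongly skewed), the sample size of 50 and the 5,000 repetitions are all arbitrary assumptions of mine.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential distribution with mean 1."""
    return sum(random.expovariate(1.0) for _ in range(n)) / n

n = 50
means = [sample_mean(n) for _ in range(5000)]

center = statistics.mean(means)
spread = statistics.stdev(means)
# Share of sample means lying within one standard deviation of their center.
within_one = sum(abs(m - center) <= spread for m in means) / len(means)

print(f"mean of sample means: {center:.3f}")      # theory: 1
print(f"std of sample means:  {spread:.3f}")      # theory: 1/sqrt(50), about 0.141
print(f"share within 1 sd:    {within_one:.3f}")  # normal benchmark: about 0.683
```

Even though individual draws are far from normal, the averages cluster symmetrically around the population mean in roughly the proportions the normal distribution predicts.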

In nature, however, lots of things are distributed according to power laws.  These typically arise from proportional growth, and much of what we see in the world is the result of growth processes, or at least processes in which the size (or some other measure) of a thing in one period is a function of its size in a previous period.  In economics, the upper end of the income distribution is approximately power-law; so is the distribution of firms by level of employment.  Power law distributions differ in two ways from normal ones: they are skewed, and they have a long fat tail over which the distance from the mean increases faster than probability declines.  If you want to know the average income in Seattle you don’t want to ignore a possible Bill Gates.

In many decision contexts, moreover, we don’t have enough observations to justify treating our estimates as normally distributed.  Instead we have a t-distribution.  The fewer observations we draw on, the longer and fatter are the tails of t.  True, the t-distribution is symmetrical, but, with sufficiently few observations (degrees of freedom), it shares with power law distributions the characteristic that extreme values can count more in an expected value calculation, not less as in a normal distribution.

3. Getting Dismal.  The relationship between EV and extreme values depends on three things: whether the probability distribution is normal, if not how fat the tail is, and how long the tail is.  Weitzman’s Dismal Theorem says that if the tail is fat enough that the product (value times probability) increases as values become more extreme, and if the tail goes on to infinity (there is no limit to how extreme an outcome may be), the extreme tail completely dominates more likely, closer-to-the-mean values in calculations of EV.  The debate over this theorem has centered on whether the unboundedness of the extreme tail (for instance the potential cost of catastrophic climate change) is a reasonable assertion.
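
The roles of tail fatness and tail length can be illustrated with a toy calculation.  This is not Weitzman’s own setup, just a sketch under my own assumptions: outcomes follow a Pareto (power law) density with minimum value 1, and we ask how much of the expected value comes from outcomes up to some cap.  The parameter alpha controls fatness (smaller means fatter) and the cap controls length.

```python
import math

def partial_expectation(alpha, cap, x_min=1.0):
    """Integral of x * f(x) from x_min to cap for the Pareto density
    f(x) = alpha * x_min**alpha / x**(alpha + 1).
    This is the part of the expected value contributed by outcomes up to cap."""
    if alpha == 1.0:
        return x_min * math.log(cap / x_min)
    return alpha * x_min**alpha * (cap**(1 - alpha) - x_min**(1 - alpha)) / (1 - alpha)

# Thin-ish tail (alpha = 3) versus very fat tail (alpha = 0.8),
# with the cap on the worst possible outcome pushed further and further out.
for alpha in (3.0, 0.8):
    row = [partial_expectation(alpha, cap) for cap in (1e2, 1e4, 1e6)]
    print(alpha, [round(v, 2) for v in row])
```

With alpha = 3 the total settles near 1.5 no matter how far out the cap is moved, so the extreme tail can be ignored.  With alpha = 0.8 the total keeps growing as the cap rises: if there is no cap at all, the extreme outcomes dominate everything else, which is the flavor of the result described above.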

4. Precaution, and precaution on precaution.  This provides one interpretation of the precautionary principle.  On this view, the principle applies under two conditions, a high level of uncertainty and the prospect of immense harm if the worst possibility transpires.  High uncertainty means a fat tail; immense potential harm (for which irreversibility is usually a precondition) is about the length of the tail.  Enough of both and your decision about risk should be dominated by the need to avoid extreme outcomes.

This view of precaution is consistent with cost-benefit analysis, but only under the condition that such an analysis is open to the possibility of non-normal probability distributions and fully takes account of extreme risks.  That said, the precautionary framework described above still typically translates uncertainty into statistical risk, and by definition this step is arbitrary.  For instance, we really don’t know what probability to attach to catastrophic releases of marine or terrestrial methane under various global temperature scenarios.  Caution on precaution is advised.

UPDATE: I pasted in some images of probability distributions from the web.

5 comments:

rosserjb@jmu.edu said...

So, Marty showed that assuming global climate is power law rather than normal Gaussian or something else intermediate, that the probability of a truly catastrophic increase in global temp, like 12 degrees C, or something like that, way beyond the upper end of the IPCC published range, which is half that, or something like that (not checking these numbers), would be nearly 1% rather than something like one out of a billion.

So, one percent is non-trivial, but try to convince people to experience economic costs for that in the short run while there is still high unemployment and the near term outcome for the US economy is that at least a bit more warming is likely to increase our GDP due to the declines in winter warming costs outweighing all the drought and flood damage (beyond about another degree the latter beat out the former and it goes south).

Of course, the true Talebian black swans involve no probability distribution, like true Keynesian-Knightian uncertainty.

Bruce Wilder said...

Thank you for that lucid and brief explanation.

A useful extension might be to explore what it means to bring a process under control. Normal distributions find many practical applications, because we deliberately control processes to generate normal distributions of valued outcomes.

In the climate change context, we are considering whether it is sufficient to control the amount of carbon in the carbon cycle to adequately control the climate. In that context, we ought to fear not a black swan, but a race condition in which we learn about climate change (or ocean ecology collapse) too slowly to respond.

Hopefully this comment makes sense -- for some reason, I suddenly feel as if I may be singing in the shower on this one.

rosserjb@jmu.edu said...

Let us be clear. While IPCC assumed normal distributions for possible global climate outcomes, there is every reason to believe that they are more likely to be power law distributed, meaning the probability of an extreme tail event, such as a very large increase in global temperature, is much greater. What underlies this are the positive feedback mechanisms operating in the global climate system.

Peter Dorman said...

I'm not sure if this responds entirely, Bruce, but one extension would be to add sequential decision-making to the discussion. Then you could better model learning, the effect of irreversibility, etc. As it is now, there is an expected value calculation at a single point in time, and that's it.

As for control, the adaptive management model is explicitly multiperiod.

Robert Vienneau said...

Perhaps you might find this post, about estimating the probability of tail events, of interest: http://robertvienneau.blogspot.com/2014/04/estimating-probability-of-extreme-events.html