Comments on EconoSpeak: The Great P Value Controversy

2016-06-08 16:38:

As I've studied statistics through the lens of epidemiological data analysis, I've become far more skeptical of the (over)reliance on p-values as the linchpin of statistical significance.

Rothman, one of the authors of a well-known epidemiology textbook, warns repeatedly against relying too heavily on p-values as the sole (or most important) measure of significance. You can imagine how important it is to keep this in mind when conducting a drug trial or investigating correlations between risk factors and specific negative health outcomes. A test that fails to achieve a predetermined threshold of statistical significance may very well hold some vital "real world" significance that could literally be a matter of life and death. Clearly, holding p-values in esteem above the other obtained statistics, while ignoring the inherent limitations of your model, can obscure important findings.

Rothman et al. encourage epidemiologists to use estimation (confidence intervals, p-value functions, and even Bayesian analysis) in their research; if statistical significance is achieved as well, that's fine.

The overall message of the text, which should be made explicit in all stats classes, is that statistical models should all be subject to healthy skepticism. Statistical analysis is one tool in the kit of scientific inquiry, and each model is more of a tree in the forest than a forest by itself.
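The epidemiological point above can be made concrete with a toy calculation (all numbers are invented, not from Rothman): a small trial whose risk ratio misses the p < 0.05 cutoff, yet whose confidence interval still spans effects that would matter clinically.

```python
# Toy sketch of the "non-significant but possibly important" situation.
# All trial numbers are hypothetical.
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

a, n1 = 12, 100   # treated group: events, total (hypothetical)
b, n2 = 20, 100   # control group: events, total (hypothetical)

rr = (a / n1) / (b / n2)                          # risk ratio = 0.6
se = math.sqrt(1/a - 1/n1 + 1/b - 1/n2)           # SE of log(RR), delta method
z = math.log(rr) / se
p = 2 * (1 - norm_cdf(abs(z)))                    # two-sided p-value
lo = math.exp(math.log(rr) - 1.96 * se)           # 95% CI, lower bound
hi = math.exp(math.log(rr) + 1.96 * se)           # 95% CI, upper bound

print(f"RR = {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f}), p = {p:.2f}")
# The p-value exceeds 0.05, yet the interval stretches from a large
# protective effect down near no effect at all -- "not significant"
# is not the same as "no effect worth acting on".
```

Reporting the interval, as Rothman urges, keeps the range of compatible effect sizes in view instead of collapsing the analysis to a significant/non-significant verdict.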
That point, I think, is too frequently missed in frequentist-model stats books and classes.

-- Anonymous

2016-05-29 13:49:

Well, let's take an example. Suppose you're a pollster for a politician. The campaign strategy depends on whether the candidate is ahead or behind, so you do a poll and have a sample. Your guy (could be female) scores a little higher. But how sure are you of this result? You surveyed 800 people out of an electorate of millions. Yes, your descriptive stats will tell you what percent of the people you polled are in favor, opposed, or don't give a shit, and you can even slice and dice your sample by demographics, geography, whatever. But how much credence should you give these numbers? How likely would you be to find your candidate behind if you took another sample? That's what significance testing is for.

Of course, published polls always report their confidence intervals (a variation on significance testing), and many of them are garbage. There are lots of other factors to consider besides sampling uncertainty; this is part of the critique. But would you want to ignore sampling uncertainty altogether?

Meanwhile, there is an important difference between descriptive stats and statistical tests. The descriptives depend on the real world out there and on your data collection and measurement methods. Statistical tests depend on both of those <b>plus</b> all the modeling choices you made <b>and</b>, for significance testing, the conditional assumption that the null hypothesis is correct.
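A quick simulation makes the sampling-uncertainty worry tangible. Only the sample size of 800 comes from the example above; the 51% true-support figure is an invented assumption.

```python
# Toy simulation: if the candidate's true support were 51%, how often
# would a single poll of 800 respondents show them *behind*?
import random

random.seed(1)
TRUE_SUPPORT = 0.51   # assumed "real world" value, unknown to the pollster
N = 800               # sample size from the polling example
POLLS = 5_000         # repeated hypothetical polls

behind = sum(
    # one poll: draw N respondents, check whether the sample share < 50%
    sum(random.random() < TRUE_SUPPORT for _ in range(N)) / N < 0.5
    for _ in range(POLLS)
)
print(f"Candidate shown behind in {behind / POLLS:.1%} of polls")
# Roughly a quarter to a third of polls show the true leader behind --
# exactly the uncertainty that significance testing tries to quantify.
```

The descriptive stat (the sample percentage) is identical in form across all these polls; only an inferential step tells you how much any one of them should be trusted.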
You're adding a lot of ifs, so the interpretation has to be different.

-- Peter Dorman

2016-05-29 13:24:

In practical work and real time, all you may have is that one sample, so rather than over-generalize on the basis of some hypothesis test, it seems more reasonable to me to limit oneself to description. After all, the result of one hypothesis test on one sample is just description itself, no?

-- MaxSpeak

2016-05-29 13:06:

There are two problems with staying at the level of descriptive stats, Max. One is that they tell you only about the sample, not how well the sample generalizes to the underlying population. For policy purposes, it's <i>future</i> samples, altered by our policies, that we care about, and the generalizations we want to make extend over time. The other is that there are often important patterns in the data whose existence -- or limitations! -- aren't visible to the naked eye. You've got to model the data to figure that out.

Null hypothesis statistical testing is just one technique in modeling, and the point is that it is being abused. Some argue it has no place at all; I'm not willing to go there (yet). I think that if it's done in the spirit of an aggressive challenge to a claim about how the world works, it can add some value.
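The "patterns the naked eye misses" point can be illustrated with two made-up series that share identical descriptive statistics, only one of which trends over time. Summary stats cannot tell them apart; even a minimal model can.

```python
# Two series with the same values, hence identical descriptive statistics,
# but very different structure over time. Data are made up for illustration.
import statistics

trending = [1.0, 2.0, 3.0, 4.0, 5.0]   # rises steadily over time
shuffled = [3.0, 5.0, 1.0, 4.0, 2.0]   # same values in scrambled order

# Identical descriptives...
assert statistics.mean(trending) == statistics.mean(shuffled)
assert statistics.stdev(trending) == statistics.stdev(shuffled)

def slope(y):
    """OLS slope of y on the time index 0..n-1."""
    n = len(y)
    t = range(n)
    tbar, ybar = sum(t) / n, sum(y) / n
    num = sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, y))
    den = sum((ti - tbar) ** 2 for ti in t)
    return num / den

# ...but a one-parameter trend model immediately separates them:
print(slope(trending), slope(shuffled))   # 1.0 vs -0.3
```

Whether the fitted slope then deserves a significance test, a confidence interval, or a replication on fresh data is exactly the question the thread is arguing about.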
But I've come to the view that robustness and replication are more powerful criteria.

-- Peter Dorman

2016-05-29 12:34:

This is informative. My 'metrics prof warned us about this sort of thing way back when. It's why I've always been more interested in descriptive stats than in dubious hypothesis testing. Is that wrong?

-- MaxSpeak