As I’ve argued in the past, economists often pretend to test their theories by identifying an outcome their model predicts, and analyzing a real-world dataset to see if the outcome occurs. If it does, they crow about how the result “is consistent with” their theoretical musings. Of course, they use the latest and greatest econometric techniques, scrupulously avoiding Type I error (false positives) in that aspect of their work. This way they can claim that there is absolutely no chance the result they found was due to unobserved endogeneity, an inappropriate parametric assumption or some other glitch.
Fine: but nothing in this approach addresses the far larger problem of how likely it is that this result could occur even if the theory is wrong. That’s the real issue for Type I error minimization. While there is no formal test for this problem, there is a procedure which can address it and even turn it to some advantage.
A researcher has a theory, call it X1, that can be expressed as a model of how some portion of the world works. Among other things, this theory predicts an outcome Y1 under a specified set of circumstances. There is a dataset that enables the researcher to ascertain that these circumstances apply and to identify whether or not Y1 has arisen. How should this test be interpreted?
My proposal is simply this: the researcher should be expected to consider how many other plausible theories, X2, X3 and so on, also predict Y1. This should take the form of a section in the writeup: “How Unique Is This Prediction?” or something like that. If X1 is the only plausible theory that predicts, or better, permits Y1 (that is, if Y1 is inconsistent with every X except X1), the empirical test is critical: it decisively tests whether X1 is correct. If, however, there are other X’s that also yield Y1, the test is much weaker. It will detect that X1 is false only if all the other X’s are false as well, since any true rival would produce Y1 on its own.
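To put rough numbers on the difference between a critical and a weak test, here is a minimal sketch using Bayes' rule. Everything in it is an assumption made for illustration: the three theories are treated as mutually exclusive and exhaustive, the equal priors are invented, and the function name `posterior_given_y1` is hypothetical.

```python
from fractions import Fraction

def posterior_given_y1(priors, likelihoods):
    """Bayes' rule over mutually exclusive, exhaustive theories.

    priors[name]      = P(theory) before seeing the data (assumed)
    likelihoods[name] = P(Y1 | theory)                   (assumed)
    Returns P(theory | Y1) for every theory.
    """
    evidence = sum(priors[t] * likelihoods[t] for t in priors)
    return {t: priors[t] * likelihoods[t] / evidence for t in priors}

# Hypothetical equal priors over three plausible theories.
priors = {"X1": Fraction(1, 3), "X2": Fraction(1, 3), "X3": Fraction(1, 3)}

# Case A: only X1 permits Y1, so the test is critical.
critical = posterior_given_y1(
    priors, {"X1": Fraction(1), "X2": Fraction(0), "X3": Fraction(0)}
)

# Case B: X2 and X3 predict Y1 just as firmly as X1 does.
weak = posterior_given_y1(
    priors, {"X1": Fraction(1), "X2": Fraction(1), "X3": Fraction(1)}
)

print(critical["X1"])  # 1
print(weak["X1"])      # 1/3
```

In Case B, observing Y1 leaves X1 exactly where it started: the result “is consistent with” the theory, but it is equally consistent with each rival.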
The first point, then, is that this additional part of the writeup will indicate to the reader how much weight to place on the demonstration that X1 has passed the Y1 test.
The second point is perhaps even more valuable. By giving some thought to the alternative theories that also explain Y1, the researcher may notice other predictions that enable her to discriminate between them. It may be that X2 predicts Y1, but only X1 predicts both Y1 and Y2. This moves the test closer to criticality, depending on how many other X’s there are in the game. Getting into the habit of testing theories not in a vacuum, but in relation to other, competing theories would be a huge advance. As a further bonus, it would push researchers in the direction of expanding their knowledge of competing theoretical traditions.
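The search for a discriminating prediction can be sketched as simple set bookkeeping. The theories and outcomes below are hypothetical, and the sketch assumes that a theory which does not list an outcome is inconsistent with it, which is a stronger condition than merely being silent about it.

```python
# Hypothetical map from each theory to the outcomes it predicts.
# Assumption: an outcome missing from a theory's set is ruled out by it.
predictions = {
    "X1": {"Y1", "Y2"},
    "X2": {"Y1"},
    "X3": {"Y1", "Y3"},
}

def rivals_sharing(outcome, target="X1"):
    """Rival theories that also predict the given outcome."""
    return sorted(
        t for t, ys in predictions.items() if t != target and outcome in ys
    )

def discriminating(target="X1"):
    """Outcomes predicted by the target theory and by no rival."""
    others = set().union(
        *(ys for t, ys in predictions.items() if t != target)
    )
    return sorted(predictions[target] - others)

print(rivals_sharing("Y1"))  # ['X2', 'X3']
print(discriminating())      # ['Y2']
```

Here Y1 alone cannot separate X1 from its rivals, while testing Y2 as well moves the exercise toward a critical test.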
I’m going to begin making this suggestion in all theory-plus-empirical-test articles I review from now on.