News broke this morning that the Journal of Politics–a leading Political Science outlet–will begin requiring that the experiments it reviews be pre-registered. Almost immediately, Political Science Twitter was abuzz with discussions about the new policy–what it means, how it would be applied, whether it would advantage some researchers over others, and more.
I have been thinking about writing something really careful about the epistemic role of pre-registration for a while, but did not have a particular timeline for doing so. Maybe I’ll still do that. But since the discipline is focused on this topic right now, I thought I’d take the opportunity to get a few thoughts on paper. These are a bit off-the-cuff and are intended to generate discussion, so please be kind.
People don’t agree about what counts as a pre-registration, or what its purpose is.
Pre-registration means different things to different people. At one extreme, pre-registrations are extremely elaborate documents wherein you state a full theory and analysis plan in exacting detail. The document can be dozens of pages long. At the other extreme, you just make a record of analytical decisions that a future reader might find dubious. AsPredicted.org, for instance, limits registrations to 3,200 characters (about two double-spaced pages) and explicitly discourages recording all the details that would be necessary to replicate a study. They encourage authors to ask themselves, “Would a reader of the manuscript wonder whether a given decision about analysis, data source or hypothesis was made after knowing the results?”, and pre-register only things for which the answer is yes.
I think which of these models is better very much depends on the stakes. If I am reviewing an elaborate, single-shot multi-million dollar RCT wherein lives hang on getting the answer right (not that I get many of these), I want to be really confident that the researchers are doing the analyses they planned to do. (Or that they have extremely good reasons for doing something different.) If I’m reviewing survey experiment number four in a series of six, the intended analyses are oftentimes completely obvious from the context: the other studies in the series, and the study materials themselves. In this situation, AsPredicted’s standard–pre-register that which others might find fishy–seems pragmatic and, to its credit, not very onerous.
Pre-registration helps address p-hacking, but this should not be its only–or even its main–purpose
I think this is my most important and original point. The discussion about pre-registration has become too deeply interwoven with the disciplinary concerns about “researcher degrees of freedom” and “p-hacking.” To be clear, those are big and important issues! And to be even more clear, pre-registration can be an important part of the solution to them!
But pre-registration does much more than that. It improves studies in ways that have much more to do with epistemology than with statistics:
Pre-registration makes studies better
I submit that needing to pre-register your study before you run it makes your study better. I’m an immense fan of John Platt’s classic article on “Strong inference.” Platt makes a compelling case that science moves fast when it routinizes the process of using data to settle theoretical questions (what he calls the process of “Strong Inference”). To quote him:
Strong inference consists of applying the following steps to every problem in science, formally and explicitly and regularly:
1) Devising alternative hypotheses;
2) Devising a crucial experiment (or several of them), with alternative possible outcomes, each of which will, as nearly as possible, exclude one or more of the hypotheses;
3) Carrying out the experiment so as to get a clean result;
1′) Recycling the procedure, making subhypotheses or sequential hypotheses to refine the possibilities that remain; and so on.
It is like climbing a tree. At the first fork, we choose–or, in this case, “nature” or the experimental outcome chooses–to go to the right branch or the left; at the next fork, to left or right; and so on.
Quite! I suspect most of us got some version of this argument in high school. And yet, I think this timeless advice is inconsistently applied… by researchers at all career stages. Too often, people run studies because they want to see “what would happen if we do X” or to “estimate the causal effect of X.” There can be epistemic value in doing such things, but that value is augmented to the extent it settles a particular theoretical question. And pre-registrations nudge you to do exactly that–to think hard about what theoretical question you are genuinely uncertain of, and what kind of data would settle the matter.
Note that this point is distinct from saying “Pre-registration makes it easier for a reviewer to decide if your study is good or not.” That’s more the p-hacking issue. I’m saying that pre-registration leads us to do better studies in the first place.
I started to do pre-registrations as a matter of course about two years ago (aside from some studies where it would have been difficult to pull off due to the nature of a collaboration). One thing I see as I reflect on my work over that time is that I am less likely to do an experiment where I’m pretty confident of the outcome (what’s the point of that?) and more likely to do one where I genuinely have no idea what will happen, and will be pretty jazzed either way. This is because the very process of pre-registration orients me toward questions whose answers are both important and genuinely ambiguous. I think this excitement about learning pays off with reviewers, too. More on that below.
Pre-registration is not handcuffs
I’ve seen a number of people talk about pre-registrations in ways that make me realize that they think about them in a very different way than I do. They say things like, “Ah, that’s a cool result. Too bad we can’t talk about that, since it wasn’t pre-registered.” Or, “That result totally makes sense. But darn, now we have to replicate it with a pre-registration.”
I think both of these sentiments are misguided. Regarding the first, I think it’s fine to talk about any result in your study that you want to talk about. The purpose of pre-registration is not to censor you or to hide potentially interesting patterns. Rather, it’s to give readers the context they need to know what they can conclude from those patterns. A p-value of 0.06 on a pre-registered contrast (where it is the only planned contrast in your study) is far more compelling than one that was not pre-registered. For the latter, I as a reviewer am going to be wondering if this was the twelfth contrast you ran, and the others are in the filing cabinet.
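The worry about the “twelfth contrast” is easy to make concrete. Here is a minimal simulation (my own illustration, not from any pre-registration standard) of the arithmetic: if a researcher runs twelve independent contrasts on data where every true effect is zero, the chance that at least one comes out “significant” at the 0.05 level is about 1 − 0.95¹² ≈ 46%, versus 5% for a single planned contrast.

```python
import random

random.seed(42)

def any_false_positive_rate(n_contrasts, alpha=0.05, trials=20_000):
    """Simulate studies with no true effects and return the fraction in
    which at least one of n_contrasts independent tests is 'significant'.

    Under the null hypothesis, each p-value is uniform on [0, 1], so a
    single test crosses alpha with probability alpha."""
    hits = 0
    for _ in range(trials):
        if any(random.random() < alpha for _ in range(n_contrasts)):
            hits += 1
    return hits / trials

# One pre-registered contrast: roughly the nominal 5% error rate.
print(any_false_positive_rate(1))
# Twelve unreported contrasts: close to 1 - 0.95**12, i.e. roughly 46%.
print(any_false_positive_rate(12))
```

This is, of course, a stylized sketch–real contrasts within one study are rarely independent–but it shows why an unplanned significant result carries much less evidential weight than a pre-registered one.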
The second notion–that you have to do a pre-registered replication of exploratory analyses–strikes me as peculiar, for subtle reasons I will try to succinctly flesh out here. To be sure, replication is great and can immensely increase our confidence in some result! But a straight replication can solve some problems while not solving others–including possibly the most important ones. Follow me for a second through an example.
A researcher runs an experiment in some state wherein we’re going to see if some randomized intervention–reading a book about American history, let’s say–increases confidence in election integrity. The results come in and, indeed, reading the book modestly increases confidence in elections. But the researcher does an exploratory subgroup analysis and finds something shocking! Among residents of Juniper County, the intervention significantly decreased confidence–a striking backfire effect. The researcher thinks about it, and all of a sudden this result makes total sense. Juniper County has a distinctive sociopolitical history that plausibly would lead its residents to reject the content in this particular book. The researcher replicates the study–this time pre-registering a subgroup analysis for Juniper County, on the justification above–and gets exactly the same thing. He writes a paper triumphantly pointing to the improbable, pre-registered result.
The problem that can come up here is that the result might replicate–but for reasons unrelated to the theory. For instance, perhaps the distinctive thing about Juniper County is not its sociopolitical history, but rather an unusual curriculum in its school system. The triumphant pre-registered result in a way imparts too much credit for the researcher’s improbable prediction. It allows what is really still an inductive result to masquerade as Platt-ian deductive hypothesis testing.
I would argue that the researcher in this case was wrong to do a straight replication for purposes of testing his theory about sociopolitical history. A straight replication was actually not the most convincing test of this idea. A better test would be to think of other places with a similar sociopolitical history, and do an experiment across several of those. Or, even better, think of additional, separate, testable implications of the theory that sociopolitical history of type X works in such and such a way.
You’re allowed to be wrong.
A year or two ago, I had dinner with a rather prominent political scientist who shall remain nameless. This person spoke against the practice of pre-registration. As I understood the argument, the idea was that it leads to even greater incentives to torture your data than would exist without pre-registration. The person seemed to think that pre-registration would lead people to falsify data, to bury inconvenient results deep in an appendix, and so on.
I think that’s totally backward. I actually think that a good pre-registration makes unpredicted results more usable, not less. The reason is that you can document for your reader the solid reasons that, ex ante, you predicted what you did. If your pre-reg reads as sincere, it’s easier for a reader to see how you learned from your data, and feel that they are learning alongside you.
An anecdote to this effect. (I have to obscure some details here because this concerns a paper currently at the R&R stage, and I would prefer that readers not draw too clear a link to the paper in question.) A collaborator and I wrote a paper that goes something like this: “A few years ago, we subscribed to Theory X. We ran some pre-registered studies testing Theory X. But look, it just doesn’t pan out! So, we stopped believing it, but we noticed that the data we have so far are actually pretty consistent with Theory Y. So, we devised new tests of Theory Y, and they pan out nicely. We pre-registered every single study along the way, to document the evolution in our thinking.”
I’m wary of counting my chickens before they’re hatched, but more than one reviewer of that paper used words like “refreshing” and “transparent” to describe that narrative. And notice how much harder it would have been to do if we had not pre-registered an initial sincere belief in Theory X. I suspect we would have had to navigate a lot more concern about whether we have a vendetta against Theory X and whether the initial tests of it were really a fair shake.
There’s a lot to figure out about how to write and evaluate pre-registrations. The JOP move means we’ll be having those discussions sooner rather than later. I’d encourage my friends and colleagues not to bind themselves too firmly to any particular standard for pre-registrations. (“A valid pre-reg MUST include elements X, Y, and Z.”)
Rather, I think authors should think of pre-registration as a tool that you can use, first, to discipline your thought process: what study do you want to run, and how will the results settle a particular question? And second, use it to build credibility with your readers, most of whom are socialized to be reflexively skeptical.
And I think reviewers and editors should not construe themselves as mechanical pre-registration enforcers–policing departures from a pre-registered analysis plan. (In my experience, these are usually limited and pretty understandable.) Rather, take a charitable view and ask yourself how much the pre-registration provides context that helps you learn from what the authors have done. For my part, I am far more keen to understand the authors’ broad objectives in running the study they ran, than to scrutinize the myriad tiny decisions they had to make along the way. (Though just so I’m not misconstrued, of course these tiny decisions can sometimes be really important. But at some point, I the reader just need to trust you.)
I’m glad JOP is experimenting with this and look forward to seeing how it unfolds.