Sunday, January 28, 2018

How do we deal with suspected questionable research practices when reviewing manuscripts?

In the peer review system, science is represented by two separate yet equally important groups: the reviewers, who examine manuscripts, and the associate editors, who publish the articles. These are their stories [1].

I was asked to review a manuscript recently, and as I was reading it, I started to think that the authors had HARKed [2] about half their hypotheses. Although I’ve reviewed manuscripts before, this was the first time I caught the odor of questionable research practices (QRPs). In the interest of preserving the confidentiality of the review process, I’m not going to provide any details about the manuscript and will instead focus on my own experience, which was confusing, somewhat distressing, and led me to consider what reviewers should do when they suspect researchers didn’t do things quite right.

When I became suspicious, the first thing I did was search for guidelines on what to do when this happens. I did a few Google searches for reviewer guidelines, and I checked the journal’s website for policies. I didn’t find anything relevant. Admittedly, I probably could have tried harder. You could justifiably take me to task for not exhaustively looking for guidelines for how to handle this – but should it be so difficult to find guidance for this [3]? Researchers review papers all the time. This can’t be the first time someone has been suspicious about a manuscript.

Uncertain of what to do exactly, I reverted to my most basic impulses: I kept notes on everything. I documented, as best I could, the basis for my suspicions. The paper in which Norbert Kerr coined the term HARKing was particularly useful, especially the section in which he describes some probable symptoms of HARKing. Indeed, the hypotheses in the manuscript seemed to have all the symptoms: theoretical reasoning that could have easily led to the opposite predictions, counter-intuitive predictions that had little basis in past literature but were ostensibly supported, and poor fit between the hypotheses and methods. But it was actually the title of the section in Kerr’s paper that had the largest impact on my thinking: “Circumstantial evidence” is what he called those symptoms.

My official job title is Senior Lecturer of Legal Psychology. I think about evidence a lot. So the evidentiary term got me thinking. What kind of investigation was I conducting? How far should I take this? What was my role in this process?

To the last question – I had no idea, really, what my job was. I knew I had responsibilities, but I wasn’t entirely sure what they were [4]. I fired off an email to the associate editor handling the manuscript and asked about relevant policies. While I was waiting for a reply, two contingencies occurred to me: (1) if the journal has a policy concerning QRPs in general or HARKing in particular, my job is similar to that of an American police investigator and the editor is like a prosecutor, and (2) if the journal has no specific policy on QRPs, my job is similar to that of an inquisitorial prosecutor and the editor is like a judge. The analogy is far from perfect, but I think these models are potentially useful ways of thinking about the two situations. I whimsically call Model (1) the Investigative Model and Model (2) the Prosecutorial Model. Let me elaborate…

In the Investigative Model, there are established editorial policies on QRPs. For example, a journal might have a policy to reject manuscripts that report questionable research [5]. This constrains the roles of both the reviewers and the editor. As I see it, editorial policy overrides the discretionary power a reviewer might otherwise have if no such policy existed. If a reviewer suspects QRPs, his or her job would be to assemble evidence, circumstantial though it may be, concerning the occurrence of the suspected QRPs. That is, it’s the reviewer’s principal job, as it pertains to this issue, to raise the question of whether or not the QRPs occurred and provide evidence. In the same way that American police cannot directly influence sentencing of convicted defendants, the reviewer’s job isn’t necessarily to offer suggestions or recommendations for how to deal with the QRPs, since the journal’s policies should describe what is supposed to happen. It is then the editor’s job to determine whether there’s a case for the occurrence of QRPs and, if so, follow the journal’s policy (or possibly exercise some good old-fashioned prosecutorial discretion). Regardless of the editor’s decision, the reviewer’s role is essentially bound to identifying QRPs, not adjudicating them [6].

In the Prosecutorial Model, there are no editorial policies on QRPs. This is a much looser situation. Presumably, in this scenario, QRPs would be considered under the kind of “methodological soundness” criterion that most journals have – which is probably not very specific. Here, a substantial amount of discretion is granted to both reviewers and the editor, but I think the reviewer’s responsibilities also expand here. If a reviewer suspects QRPs in this model, his or her job is to assemble a case, present the evidence, and offer a “sentencing recommendation” – that is, suggestions for how to remedy the situation. A reviewer’s role is expanded here, in my opinion, because it is not the editor’s job to specifically detect and address QRPs. Since QRPs fall under the general umbrella of methodological soundness here, a reviewer would have to “make a case” that the QRPs (1) might have happened and (2) are methodological problems if they happened. The editor then acts like a judge and adjudicates the case, deciding how to proceed with a substantial amount of discretion. It’s also worth pointing out that reviewers have enormous discretion here, too: If a reviewer doesn’t think HARKing is really a problem, for example, even if he or she thinks the authors HARKed, the reviewer can choose not to make a fuss about it. Like a prosecutor undercharging a cooperative informant, a reviewer might, say, allow an especially “interesting” or “surprising” set of results to pass with lighter scrutiny. That could lead to trouble.

These models imply very different ways of writing a review that raises the specter of QRPs. In the Investigative Model, I think it’s outside the reviewer’s purview to get on a soapbox about what they think should happen concerning the QRPs. In the Prosecutorial Model, it’s exactly the reviewer’s job to get on that particular soapbox and offer an opinion. My thinking about this is still inchoate, but my intuition is that the Investigative Model is probably more effective in handling suspected QRPs in a non-arbitrary, procedurally sound manner. However, there may be other unintended negative consequences [7].

Returning to my story – when the editor got back to me, she indicated that there was no editorial policy on HARKing. Thus, in the framework I had spontaneously generated, I was working in a Prosecutorial framework. I wasn’t exactly comfortable with the role, but I set about acting as a responsible prosecutor: I noted my suspicions and their basis, strove not to overstate the case, and offered recommendations for how the authors should revise the manuscript to be more transparent if they did indeed HARK. I also signed the review. There are several good reasons to sign reviews, and among them is the sense of accountability it can induce. That is, knowing I was putting my name on it made me more careful about how I presented my thoughts – and that seemed to me especially important if I was going to raise suspicions of QRPs. Moreover, it simply seemed more civilized not to cast suspicions from the shadow of anonymity.

Ultimately, I was comfortable with my review and the recommendations therein, but I am less comfortable with the general ambiguity of reviewers’ role as it pertains to QRPs. If this isn’t already a conversation, it needs to be one. And if it is already a conversation, it needs to be louder.


Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2, 196-217.

1: Executive Producer: Dick Wolf

2: Hypothesized After the Results are Known, for the uninitiated.

3: In retrospect, maybe I could have reached out to open-science savvy researchers on social media for advice. That probably would have been a good idea. I’m still getting used to public engagement. I suspect someone has already written about this – and quite likely something better thought-out than this. If you know of anything relevant, please send it to me or leave it in the comments.

4: Beyond this particular problem, the role of a reviewer is often fraught with ambiguities, largely because, I think, editorial standards vary so widely.

5: Off the top of my head, I don’t know if any journals actually have policies remotely similar to this. My intuition is that such hardline policies would be extremely difficult to enforce.

6: Never mind for now how actually difficult it is to detect QRPs. By definition, many of them are invisible. How do you detect unreported variables? Maybe sometimes there are traces, but I’d guess that most of the time, there’s nothing to find.

7: One possibility is that if some journals adopted Investigative Model policies, researchers who engage in QRPs might be driven to roll the dice with journals that use the Prosecutorial Model and hope reviewers don’t catch their shenanigans.

