In research published today in PNAS, University of Wisconsin-Madison researchers found that reviewers are unable to differentiate great proposals from merely good ones. The finding matters because NIH grants are awarded on the basis of scores assigned by expert peer reviewers and the discussions among them. The NIH invested more than $27 billion in biomedical research through competitive grants during its 2017 fiscal year.

"How can we improve the way that grants are reviewed so there is less subjectivity in the ultimate funding of science?" is the question at the heart of this work, says Molly Carnes, one of the authors on the paper as well as director of the Center for Women's Health Research.

Peer review starts with experts separately analyzing and scoring a number of proposals. Groups of experts then convene to discuss the proposals and collectively decide which ones merit funding. To study this process, the researchers assembled experienced NIH peer reviewers and had them review real proposals that the NIH had already funded. One batch had been funded on first submission and represented the excellent proposals; the other had been funded only after revision and was considered merely "good."


Previously published research by the same group revealed that the discussions that follow initial scoring do not lead to better funding decisions, because they amplify disagreements between different groups of reviewers. "Collaboration can actually make agreement worse, not better, so one question that follows from that would be: 'Would it be better for the reviewers not to meet?'" says postdoctoral fellow Elizabeth Pier, who led the analyses of collected data in the study.

In the current study, the researchers focused on the reviewers' initial critiques, tallying the number and type of strengths and weaknesses each reviewer assigned to each proposal, along with the preliminary score. "When we look at the strengths and weaknesses they assign to the applicants, what we found is that reviewers are internally very consistent," says Pier. "The thing that surprised us was that even though people are internally consistent, there's really no consistency in how different people translate the number of weaknesses into a score."
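The pattern Pier describes can be shown with a small sketch. The scoring rules and numbers below are invented for illustration, not taken from the study; the point is only that each simulated reviewer applies their own weakness-to-score mapping consistently, yet the same proposal can land on very different scores depending on who reads it.

```python
# Hypothetical illustration: each reviewer maps the number of weaknesses
# to a score (NIH scale: 1 = exceptional, 9 = poor) in an internally
# consistent way, but the mappings differ across reviewers.

def make_reviewer(base, per_weakness):
    """Return a scoring rule: start at `base`, add `per_weakness` per weakness."""
    def score(n_weaknesses):
        return min(9.0, base + per_weakness * n_weaknesses)
    return score

lenient = make_reviewer(base=1.0, per_weakness=0.5)  # tolerant of flaws
strict = make_reviewer(base=2.0, per_weakness=1.5)   # penalizes each flaw heavily

for n in [1, 2, 3, 4]:
    print(f"{n} weaknesses -> lenient: {lenient(n):.1f}, strict: {strict(n):.1f}")

# Each reviewer's scores rise steadily with the number of weaknesses
# (internal consistency), yet the same 3-weakness proposal gets a 2.5
# from one reviewer and a 6.5 from the other: no consistency across people.
```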

On average, reviewers scored the same proposals so differently that it was as if they were looking at completely different proposals. This stark disagreement, together with the polarizing effect of group discussion demonstrated in the earlier work, suggested to the researchers that the current peer review process is not equipped to discriminate between good proposals and great ones.

One potential improvement suggested by the research team is to create a modified lottery. In this system, an initial review would weed out weaker proposals, and the remaining ones would be funded randomly.
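As a sketch of how such a modified lottery might work: the threshold, scores, and number of awards below are hypothetical placeholders, not details from the team's proposal. The screening step keeps every proposal that clears a quality bar, and chance alone decides among the survivors.

```python
import random

# Minimal sketch of a modified lottery, assuming a hypothetical quality
# threshold and award budget. On the NIH scale lower scores are better,
# so proposals at or below the threshold pass the initial screen.

def modified_lottery(proposals, threshold=3.0, n_awards=2, seed=None):
    """proposals: dict mapping proposal id -> preliminary review score."""
    screened = [pid for pid, score in proposals.items() if score <= threshold]
    rng = random.Random(seed)
    return rng.sample(screened, min(n_awards, len(screened)))

scores = {"A": 1.8, "B": 2.4, "C": 2.9, "D": 4.1, "E": 5.6}  # invented scores
print(modified_lottery(scores, threshold=3.0, n_awards=2, seed=42))
# D and E are weeded out by the initial review; two of A, B, and C
# are then funded at random rather than by fine-grained score rankings.
```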