Machine Learning Tool Exposes Paper Mill Activity in Cancer Research

January 30, 2026

A new machine learning tool has identified more than 250,000 cancer research papers that may have been produced by so-called “paper mills.” The tool was developed by Adrian Barnett of the Queensland University of Technology’s School of Public Health and Social Work, working with an international team of collaborators. Their study, published in The BMJ, analyzed 2.6 million cancer studies published between 1999 and 2024.

The flagged papers showed writing patterns similar to those of articles previously retracted for suspected fabrication. “Paper mills are companies that sell fake or low-quality scientific studies. They are producing ‘research’ on an industrial scale, and our findings suggest the problem in cancer research is far larger than most people realised,” Professor Barnett said. These companies sell authorships and fully prepared papers, and their output often features reused text, awkward phrasing, or fabricated data and images.

Barnett explained that many paper mills rely on boilerplate templates, which can be detected by large language models that analyze textual patterns. His team trained a model called BERT to recognize the subtle linguistic “fingerprints” common to known paper-mill products. When tested on verified examples, the model correctly flagged suspicious studies 91 percent of the time. “We’ve essentially built a scientific spam filter,” Barnett noted. “Just like your email system can spot unwanted messages, our tool flags papers that match the writing style and structure we see in retracted, fraudulent work.”
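
To make the spam-filter analogy concrete, here is a minimal sketch of a text classifier in Python. It is not the team's method: it swaps BERT for a toy bag-of-words naive Bayes model, and the class name, labels, and training sentences are all invented for illustration. It only shows the general idea of learning wording patterns from labeled examples and scoring new text against them.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class TinyTextFilter:
    """Toy naive Bayes filter (hypothetical stand-in, not the study's BERT model)."""

    def __init__(self):
        self.word_counts = {"flag": Counter(), "ok": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        """Record the words of one labeled example ('flag' or 'ok')."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def log_score(self, text, label):
        """Log prior plus Laplace-smoothed log likelihood of each word."""
        counts = self.word_counts[label]
        total = sum(counts.values())
        vocab = len(set(self.word_counts["flag"]) | set(self.word_counts["ok"]))
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / (total + vocab))
        return score

    def is_flagged(self, text):
        """Flag the text if it looks more like the 'flag' examples."""
        return self.log_score(text, "flag") > self.log_score(text, "ok")


# Invented training sentences: templated wording versus ordinary wording.
text_filter = TinyTextFilter()
text_filter.train("the expression of gene was detected and the proliferation of cells was promoted", "flag")
text_filter.train("the expression of gene was detected and the invasion of cells was inhibited", "flag")
text_filter.train("we enrolled patients in a randomized trial and measured survival", "ok")
text_filter.train("a cohort of patients was followed for five years to measure survival", "ok")

templated = "the expression of gene was detected and the migration of cells was promoted"
ordinary = "patients in the trial were followed to measure survival"
print(text_filter.is_flagged(templated), text_filter.is_flagged(ordinary))
```

A real screening tool would need far more training data and a stronger model, and, as the researchers stress, a flag from any such classifier is a prompt for human review rather than proof of fraud.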

The large-scale analysis showed that flagged papers have grown sharply over two decades—rising from about 1 percent of published studies in the early 2000s to more than 16 percent in 2022. The problem affects thousands of journals across major publishers, with the highest concentration in molecular cancer biology and early-stage laboratory research. Studies on cancers such as gastric, liver, bone, and lung showed particularly high rates of suspicious papers.

Three scientific journals are already piloting the tool during editorial screening to help identify potential fabrications before peer review. The research team plans to expand the method into other fields and refine the model as more verified cases of paper-mill activity become available. However, the researchers emphasize that flagged papers are not confirmed fraud and should be reviewed by human experts. As Barnett warned, “If fabricated studies make their way into the evidence base, they can mislead real scientists and ultimately slow progress for patients.”
