Analyses find thousands of papers with AI-generated errors

AI can help scientists accelerate their research, run powerful simulations and gather insights from vast amounts of data. However, these tools may not meet the standard required for scientific papers. Recent studies have found thousands of papers featuring fake citations and other errors, likely from the use of AI. One study found that the rate of fabricated citations in 2025 was more than 12 times that in 2023.


False citations

A recent study that examined 2.5 million papers suggests that nearly 3,000 biomedical science papers contain fake references. One of the co-authors, Maxim Topaz, told Nature that the findings are “conservative underestimates” of the true number of false citations. The analysis found 2,564 papers that contained one or two fabricated references, and 246 papers that contained three or more.

The study found 28 clinical trial papers and 79 systematic reviews that contained fake references. These could affect clinical guidelines, impacting real patients.

Only 1.6% of the papers flagged in the study were retracted or corrected. None of the retractions were due to fake citations, and none of the corrections addressed the references.

Word and error frequency

Another analysis of over 1 million papers and preprints found that up to 22% of computer science papers show signs of input from LLMs. The analysis scanned abstracts and introductions, looking for the frequency of words that appear more often in AI-generated text than in human-written content.

Some papers included phrases such as “regenerate response” or “my knowledge cutoff” that blatantly reveal the use of AI. Others show an increased frequency of terms like “pivotal,” “intricate,” “showcase” and “delve” compared to pre-LLM baselines.
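As a rough illustration of this kind of frequency screen, the Python sketch below counts the four marker words and two tell-tale phrases quoted above. It is a toy under stated assumptions, not the method used in the analysis: the function names are hypothetical, the word list is limited to the examples in this article, and inflected forms such as "delves" are ignored.

```python
import re
from collections import Counter

# Marker words and tell-tale phrases quoted in the article. A real analysis
# would rely on a much larger, statistically derived vocabulary (assumption).
MARKER_WORDS = {"pivotal", "intricate", "showcase", "delve"}
TELLTALE_PHRASES = ("regenerate response", "my knowledge cutoff")


def marker_rate(text: str) -> float:
    """Return marker-word occurrences per 1,000 words of text."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    counts = Counter(words)
    hits = sum(counts[word] for word in MARKER_WORDS)
    return 1000.0 * hits / len(words)


def telltale_phrases(text: str) -> list[str]:
    """Return the tell-tale phrases that appear verbatim in the text."""
    lowered = text.lower()
    return [phrase for phrase in TELLTALE_PHRASES if phrase in lowered]


sample = "We delve into the intricate role of this pivotal pathway."
print(marker_rate(sample))  # 3 marker words in 10 words -> 300.0
print(telltale_phrases("As of my knowledge cutoff, no trials exist."))
```

A rate like this only means something relative to a baseline, such as the same measure computed on abstracts published before LLMs were widely available. A high score on a single paper is not evidence of AI use.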

A computer scientist at the University of Toulouse, Guillaume Cabanac, has been compiling a database documenting suspected use of AI in scientific papers since March 2024. The database contains 1,054 examples of suspected undeclared AI usage in academic literature.

Forensic metascientist James Heathers of the Medical Evidence Project found the same few errors in about 200 papers on Google Scholar, which is statistically improbable unless they have the same source, he said at the World Conference on Research Integrity.

The errors included “Kolmogorovor information complexity,” which misspells the last name of mathematician Andrey Kolmogorov, and the phrase “after adjusted by common confounding factors,” which should be either “after being adjusted for” or “after adjustment for.” Multiple papers contained the nonstandard phrase “5 mL gel-containing biochemistry tubes.”

Heathers suspects the papers are variants of the same paper, sold by a paper mill. All the papers he flagged were about patient data, which is dangerous as they feed directly into clinical practice, he said.

Both humans and AI struggle to identify AI-generated text. A 2023 study found that researchers who read medical journal abstracts generated by ChatGPT failed to identify one-third of them as AI-generated and incorrectly identified 14% of human-written abstracts as AI-generated. Another study found that AI detectors are biased against non-native English speakers: over half of the non-native English writing samples were misclassified as AI-generated.

Publisher guidelines

Major publishers, journals and style guides, including Springer Nature, Science, AMA and APA, explicitly state that AI tools cannot be listed as authors. Most journals require AI use to be declared in the acknowledgements section. Springer Nature and Science prohibit the use of generative AI for images and figures.

Most publishers also emphasize that authors remain fully responsible and accountable for the accuracy and integrity of their manuscripts.

Journals rely mostly on the authors certifying that they have followed AI policies. In some cases, forensic researchers look for accidental watermarks, odd language errors or phrases indicative of AI use. Preprint servers like arXiv use a moderation process assisted by automated screening tools to flag suspicious manuscripts.

Some publishers treat the undisclosed use of AI as scientific misconduct, the same category as data fabrication and plagiarism. For example, Science responds to these cases with formal notification of the authors’ institutional research integrity office, mandatory institutional investigations, retraction of the paper and potential long-term publishing bans for the authors.
