Is the Increasing Trend of Leveraging LLMs like ChatGPT in Writing Research Papers Concerning?

On August 4, 2025, Science published a tech news piece titled “One-fifth of computer science papers may include AI content,” written by Phie Jacobs, a general assignment reporter at Science. The article reports on a large-scale analysis conducted by researchers at Stanford University and the University of California, Santa Barbara. They examined over 1 million abstracts and introductions and found that by September 2024, 22.5% of computer science papers showed signs of input from large language models such as ChatGPT. The researchers used statistical modeling to detect common word patterns linked to AI-generated writing.

This caught my attention because I was surprised at how common AI-generated content has already become in academic research. I agree with the concern raised in the article, particularly this point:

Although the new study primarily looked at abstracts and introductions, Dmitry Kobak (University of Tübingen data scientist) worries authors will increasingly rely on AI to write sections of scientific papers that reference related works. That could eventually cause these sections to become more similar to one another and create a “vicious cycle” in the future, in which new LLMs are trained on content generated by other LLMs.

From my own experience writing research papers over the past few years, I can see why this concern is valid. If you have followed my blog, you know I have published two research papers and am currently working on a third. While working on my papers, I occasionally used ChatGPT (including its Deep Research feature) to help find peer-reviewed sources for citations instead of relying solely on search engines like Google Scholar. However, I quickly realized that depending on ChatGPT for this task can be risky. In my case, about 30% of the citations it provided were inaccurate, which meant I had to verify each one manually. For reliable academic sourcing, I found Google Scholar much more trustworthy because current LLMs are still prone to “hallucinations.” You may have encountered other AI tools like Consensus AI, a search engine tailored for scientific research and limited to peer-reviewed papers. Compared to ChatGPT Deep Research, I found it faster and more reliable for academic queries, but I strongly recommend always verifying AI outputs, as both tools can occasionally produce inaccuracies.

The Science article also highlights that AI usage varies significantly across disciplines. “The amount of artificial intelligence (AI)-modified sentences in scientific papers had surged by September 2024, almost two years after the release of ChatGPT, according to an analysis.” The table below shows estimates of AI usage by field, with certain disciplines adopting AI much faster than others. James Zou, a computational biologist at Stanford University, suggests these differences may reflect varying levels of familiarity with AI technology.

While the Stanford and UCSB study is quite solid, Kobak pointed out that the estimates above could be too low. One reason is that some authors may have started removing “red flag” words from their manuscripts to avoid detection. For example, the word “delve” became more common right after ChatGPT launched, but its usage dropped sharply once it became widely recognized as a hallmark of AI-generated text.
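To make the “delve” example concrete, here is a minimal Python sketch of how one might track the share of abstracts per year that contain a marker word. This is only an illustration of the general idea, not the method the Stanford and UCSB researchers used; the function name, the marker pattern, and the sample abstracts are all made up for the example.

```python
import re
from collections import Counter

# Hypothetical marker pattern: "delve" and its common inflections.
MARKER = re.compile(r"\bdelv(?:e|es|ed|ing)\b", re.IGNORECASE)

def marker_share_by_year(abstracts):
    """Take (year, text) pairs and return {year: share of texts containing the marker}."""
    totals = Counter()
    hits = Counter()
    for year, text in abstracts:
        totals[year] += 1
        if MARKER.search(text):
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}

# Toy, invented abstracts purely for demonstration (not real data).
sample = [
    (2022, "We study graph neural networks."),
    (2022, "This paper presents a new parser."),
    (2023, "We delve into the intricacies of attention."),
    (2023, "We evaluate a compiler optimization."),
]

print(marker_share_by_year(sample))
```

A single word is a weak signal on its own, and as noted above, authors can simply avoid it once it becomes a known giveaway. That is part of why a count like this would likely underestimate real AI usage.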

If you want to read the full article, you can find it here: Science – One-fifth of computer science papers may include AI content.

— Andrew

Update: Here is another more recent report from Nature.
