“Garbage AI content is going to be everywhere,” warned Alon Yamin, CEO of Copyleaks, in an interview. “And I think high-quality, original human content is just going to be gold moving forward.”
The rise of reasoning AI
Simultaneously, AI tools are becoming more capable for academic tasks. OpenAI's new "reasoning" model, o1, is a prime example. Designed to tackle complex problems in science and math, o1 has shown remarkable STEM capabilities. In tests, it placed among the top 500 students in a qualifier for the U.S. Math Olympiad and achieved Ph.D.-level accuracy on physics, biology, and chemistry questions.
This development echoes Google DeepMind's announcement in July that its AI achieved a silver-medal standard at the International Mathematical Olympiad. Yamin sees this as a step in the right direction.
Yamin points to o1's "chain of thought" process as a means of improving the accuracy of its responses. Unlike models that spit out answers without thorough reflection, o1 mulls over complex problems before responding, a strategy that can reduce the "hallucinations" that plagued earlier models. "The hope is that this model will be a game-changer in terms of problem-solving and reasoning," Yamin said.
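As a rough illustration (not from the interview), here is a minimal sketch of querying a reasoning model through the OpenAI Python SDK. The model name "o1-preview" and the sample question are assumptions; the chain-of-thought deliberation itself happens inside the model, not in this code.

```python
# Minimal sketch: asking a "reasoning" model a question via the OpenAI Python SDK.
# Assumptions: the openai package is installed, OPENAI_API_KEY is set in the
# environment, and "o1-preview" is an available model name for your account.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",
    messages=[{
        "role": "user",
        "content": "A bat and a ball cost $1.10 together. The bat costs $1.00 "
                   "more than the ball. How much does the ball cost?",
    }],
)

# The model deliberates internally before answering; only the final answer is returned.
print(response.choices[0].message.content)
```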
Meanwhile, tools like Insilico Medicine's DORA (Draft Outline Research Assistant) have hit the market with the pitch of slashing the time required to produce the first draft of a research paper. DORA can generate a first draft in about 20 minutes; the draft still requires human revision, but producing it might have taken days before such tools were available.
GenAI is a double-edged sword in scientific publishing
This flood of machine-written text poses significant challenges for scientific publishing. Mainstream scientific publishers are already feeling the impact, with some inadvertently publishing AI-generated content containing telltale phrases, such as opening sentences like "Certainly, here is a possible introduction…"
A major hurdle for genAI systems is the issue of "hallucination": their tendency to fabricate convincing facts or generate seemingly legitimate scientific citations that point to non-existent papers. Meta's science-focused large language model, Galactica, launched in 2022 but was quickly shut down after Technology Review described it as having "mindlessly spat out biased and incorrect nonsense." Here, hallucination (or confabulation) refers to large language models' penchant for inventing facts when they can't find something in their training data, a pet peeve of many human users. Despite ongoing research to mitigate it, hallucination remains a fundamental challenge inherent to these systems.
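One practical response to hallucinated citations is simply to check that each cited DOI resolves to a real record. The sketch below (my illustration, not a tool mentioned in this article) queries the public Crossref REST API; the DOI shown is a placeholder.

```python
# Toy citation check: ask the public Crossref API whether a DOI actually exists.
# Illustration only; real reference-checking tools do far more than this.
import requests

def doi_exists(doi: str) -> bool:
    """Return True if Crossref has a record for the given DOI."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    return resp.status_code == 200

# Placeholder DOI for illustration; a fabricated citation would typically return 404.
print(doi_exists("10.1000/example-doi"))
```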
Despite these concerns, the research community also recognizes the significant potential benefits of genAI tools for research. Elsevier's "Insights 2024: Attitudes toward AI" report found that 95% of researchers believe AI can accelerate knowledge discovery. Almost as many (94%) of the academic researchers, corporate researchers, and research leaders surveyed anticipate it will rapidly increase the volume of scholarly research. While 95% also worried AI would be used for misinformation, clear majorities believed it would free up researchers' time for higher-value work.
Organizations like the Allen Institute for AI (AI2) have open-sourced text-generating AI models called OLMo, along with Dolma, one of the largest public datasets used to train them. Initiatives like Papers with Code, which link research papers with their corresponding software repositories, promote open science.
On the academic side, not only are AI hallucinations a problem, but students who over-rely on genAI tools may be cheating themselves out of learning. "The conversation we're now having with academic institutions is about balancing the benefits of generative AI with constraining its risks," Yamin said. "Our role is to help find that balance—allowing the use of generative AI but in a safe way." In many ways, the genAI genie is already out of the bottle. "There is a growing understanding that there are ways to use this technology to enhance learning — to enhance research."
Adapting to an AI-saturated world
The increasing sophistication of AI models poses a central challenge: distinguishing between human-generated and AI-generated content. “As models become more accurate and even more human-like in their style of answers,” Yamin explains, “it will be even harder to distinguish between AI-generated and human-created content. Providing visibility and transparency around where AI exists is crucial, both for end users and for companies trying to create and enforce policies around generative AI use within their organizations.” Applications designed to mask AI use further exacerbate this challenge. “We have tons of AI tools today to mask plagiarism or the fact that you used AI, including paraphrasing content created by large language models or mixing different LLMs to create one output,” he adds.
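Detection approaches vary, and Copyleaks does not disclose its internals. As a generic baseline only, the sketch below scores a passage by its perplexity under GPT-2 using the Hugging Face transformers library, on the idea that machine-written text often looks "less surprising" to a language model. The cutoff is an arbitrary assumption, and heuristics like this are easy to evade with the paraphrasing tools Yamin describes.

```python
# Generic baseline sketch (not Copyleaks' method): score text by its perplexity
# under GPT-2. Lower perplexity is weak evidence of machine-generated text.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing the tokens as labels yields the mean cross-entropy per token.
        loss = model(ids, labels=ids).loss
    return float(torch.exp(loss))

sample = "Certainly, here is a possible introduction for your topic."
score = perplexity(sample)
print(f"perplexity = {score:.1f}")
# Arbitrary, assumed cutoff purely for illustration; real detectors use richer signals.
print("looks machine-like" if score < 40 else "looks more human-like")
```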
As AI-generated content continues to permeate the digital landscape, transparency and robust AI governance emerge as critical priorities. “In some markets, AI adoption is still in its initial stages,” Yamin notes. “This makes sense given the sensitivity of the data and information they deal with.”
The explosion of lower-quality AI content not only poses a direct challenge to researchers seeking reliable information but also raises concerns about potential "model collapse." This phenomenon, highlighted in Nature in July 2024, occurs when AI models trained primarily on AI-generated content begin to degrade in quality and accuracy.
“Model collapse is a degenerative process affecting generations of learned generative models, in which the data they generate end up polluting the training set of the next generation,” wrote Ilia Shumailov in the Nature article. “Being trained on polluted data, they then mis-perceive reality.”
In the first phase of model collapse, the model begins to lose information about the less frequent, or "tail," parts of the data distribution. In the later phase, its outputs degrade into gibberish: the model's understanding of reality converges toward a narrow and inaccurate representation of the original data, and it becomes increasingly likely to produce repetitive or nonsensical content that poorly reflects the diversity of the real-world training data.
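A toy numerical sketch (my illustration, not from the Nature paper) conveys the flavor of this tail loss: when each "generation" of a model is fit to samples drawn from the previous generation, the estimated spread of the data drifts downward, so rare tail values gradually disappear.

```python
# Toy model-collapse simulation: each generation fits a Gaussian to samples
# drawn from the previous generation's fitted model. Averaged over many
# independent chains, the fitted spread shrinks generation after generation.
import numpy as np

rng = np.random.default_rng(0)

n_chains = 2000      # independent simulations, averaged for a stable trend
n_samples = 20       # assumed (small) training-set size per generation
generations = 15

sigma = np.ones(n_chains)  # generation 0: the "real" data has std = 1.0
for g in range(1, generations + 1):
    # Each chain trains on samples produced by its previous generation's model.
    data = rng.normal(0.0, sigma[:, None], (n_chains, n_samples))
    sigma = data.std(axis=1)  # refit; finite samples bias the spread low
    print(f"gen {g:2d}: mean fitted std = {sigma.mean():.3f}")
```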
“We all understand the model collapse situation,” Yamin said. He estimates that the situation currently “might not be a huge problem. You know, human content still makes up most of the internet, but it’s not going to be the same situation 5–10 years from now.”
An AI game of cat and mouse
In terms of AI governance, there is a sort of cat-and-mouse dynamic at work in the genAI landscape. “Many of the technologies for governance and creating guardrails are themselves AI solutions,” Yamin explains. “As AI becomes stronger, our ability to use these technologies safely also improves.” He acknowledges the lack of standardized guidelines, noting that appropriate AI use “depends on the use case, the market, and even the individual working with these tools. In the coming years, you’d expect some of these things to be standardized, especially across markets.”
To address these challenges, companies like Copyleaks are developing adaptive AI-powered detection models, contributing to an emerging ecosystem dedicated to generative AI security. “I think there will be a whole ecosystem, and it’s already a trend that is starting,” says Yamin. “Really, like what cybersecurity is for any other technological field, the same thing will be needed for generative AI. It’s about making sure you’re able to identify and resolve threats related to generative AI.” He adds, “We are also an AI company. We have models that are able to adapt if we’re feeding them a lot of examples. We’re constantly updating with new versions, mixing models, and training our models on real-world examples. It’s very comprehensive.”
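To illustrate the general pattern of a detector that adapts when fed examples (a toy stand-in, not Copyleaks' actual system), the sketch below trains a simple TF-IDF plus logistic-regression classifier on a handful of invented labeled snippets; as new labeled examples are collected, the lists grow and the model is refit.

```python
# Toy adaptive detector (illustration only): a classifier refit as new labeled
# human/AI examples arrive. Real systems use far larger corpora and richer features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny invented training set; labels: 0 = human-written, 1 = AI-generated.
texts = [
    "Our measurements disagreed with the model, so we reran the assay twice.",
    "Certainly, here is a possible introduction to the topic of enzyme kinetics.",
    "The reviewer's third point sent us back to the bench for another month.",
    "In conclusion, this comprehensive overview has explored the key aspects.",
]
labels = [0, 1, 0, 1]

detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
detector.fit(texts, labels)

# Probability that a new snippet is AI-generated; refit as more examples arrive.
print(detector.predict_proba(["Certainly, here is a possible conclusion."])[0][1])
```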
Transparency and disclosure are priorities in addressing these challenges. While a certain level of AI use is becoming normalized, as Yamin notes, it's crucial to disclose its presence and ensure that individuals are not simply replicating AI-generated content. "A lot of [academic] organizations understand that a certain level of generative AI is acceptable and is almost to be expected at this point," Yamin said. "But it's crucial that, first of all, when it's there, you have to say that it's there."
Consider how university math classes eventually allowed and normalized the use of graphing calculators in the early 1990s. Students could cut down on arithmetic while still learning the underlying mathematical concepts. Similarly, generative AI can be a powerful tool for enhancing learning and streamlining certain tasks in education. GenAI can serve as a tutor in some respects. But its presence should not overshadow the importance of struggling with the thorny problems of learning. Just as students still need to grasp core mathematical principles even with a calculator, learners must actively engage with the material and develop their own critical thinking skills, even when incorporating AI help along the way. Yamin stresses the importance of this active engagement and creativity. When working with AI tools, it is important to transform AI-generated content into a product of your own intellect. “Make sure that you’re really making it your own,” Yamin concluded.