Elsevier has joined a class-action lawsuit against Meta, alleging that the company’s Llama language model was trained on datasets containing unauthorized copies of academic papers and literary works.
Elsevier claims that Meta used illicit repositories like Sci-Hub and LibGen to source the copyrighted text. The eventual ruling in this case could establish a critical legal precedent regarding the ownership of data used to train AI and other technologies.
The market dilution theory
In 2025, the U.S. District Court for the Northern District of California granted summary judgment in favor of Meta in a similar case, ruling that the training of the Llama model constituted fair use based on the evidence presented. In Kadrey v. Meta, 13 plaintiffs, mostly fiction writers, argued that Meta obtained their works from pirate sites and made unauthorized copies while training the Llama model to mimic their books, allowing users to access their work for free.
The evidence, however, showed that Llama could generate no more than 50 tokens of the plaintiffs’ works, meaning that it could not serve as a market substitute. The court also found the use of the copyrighted works “highly transformative” because the text was used to learn statistical patterns rather than to be read for entertainment.
Still, the judge characterized the victory as limited and explicitly advised future litigants to argue a market dilution theory.
The theory posits that training generative AI models on copyrighted works allows the models to flood the market with an endless stream of competing content. This argument could work legally because indirect substitution of content is a cognizable harm under the fourth factor of fair use, the effect of the use upon the potential market.
To argue this effectively, the judge suggested that future litigants provide evidence to answer four questions. Is the specific model currently capable of generating book-length or book-style outputs that compete in the plaintiffs’ specific markets? What is the actual impact of AI-generated content on sales? How does the market threat in a world where AI is trained on copyrighted books compare to a world where it is not? Is the model meaningfully better at creating these competing works because it was trained on the copyrighted material in question?
The Elsevier case
Elsevier joined a suit alongside Cengage, Hachette, Macmillan and McGraw Hill as well as author Scott Turow. The plaintiffs allege that Meta has engaged in “one of the most massive infringements… in history.”
The company is accused of sourcing copyrighted works from pirate sites and of using web scraping to access works behind paywalls and subscription-based online libraries. The plaintiffs also allege that Meta stripped copyright notices and author names from datasets to conceal its use of stolen materials.
The complaint states that Mark Zuckerberg personally authorized the piracy strategy after being told that licensing books would undermine Meta’s fair use legal defense.
Elsevier’s complaint provides specific evidence of Llama reproducing sections of Calculus: Early Transcendentals by James Stewart word-for-word and copying characters and settings from an opening chapter of Sylvia Day’s One With You. Elsevier also alleges that Llama has generated summaries of scholarly articles riddled with hallucinations and errors, which could potentially damage authors’ professional credibility.
Meta is accused of distributing approximately 40.42 TB of data, equivalent to about 5 million books, back to the internet during its own download process. The litigation cites internal communications revealing that Meta employees expressed concern about using piracy sites.
Meta has predicted that its AI products will generate between $460 billion and $1.4 trillion in revenue over the next ten years, while authors will receive no compensation for the unauthorized use of their works.
Elsevier requests the maximum statutory damages allowed by the Copyright Act and the DMCA, a full disclosure of all copyrighted works used to train Llama models, an order requiring Meta to destroy all infringing copies of copyrighted works and a permanent injunction to stop ongoing infringement and the removal of copyright management information (CMI).
Nkechi Nneji, a public affairs director for Meta, told NPR that Meta plans to “fight this lawsuit aggressively.”



