Elsevier joins suit against Meta over use of copyrighted research

Elsevier has joined a class-action lawsuit against Meta, alleging that the company’s Llama language model was trained on datasets containing unauthorized copies of academic papers and literary works.


Elsevier claims that Meta used illicit repositories like Sci-Hub and LibGen to source the copyrighted text. The eventual ruling in this case could establish a critical legal precedent regarding the ownership of data used to train AI and other technologies.

The market dilution theory

In 2025, the U.S. District Court for the Northern District of California granted summary judgment in favor of Meta in a similar case, ruling that the training of the Llama model constituted fair use based on the evidence presented. In Kadrey v. Meta, 13 plaintiffs, mostly fiction writers, argued that Meta obtained their works from pirate sites and made unauthorized copies while training the Llama model to mimic their books, allowing users to access their work for free.

However, evidence showed that Llama could generate only 50 tokens of the plaintiffs’ works, meaning that it could not serve as a market substitute. Additionally, the court found the use of the copyrighted works “highly transformative,” because the text was used to learn statistical patterns rather than read for entertainment.

Still, the judge characterized the victory as limited and explicitly advised future litigants to argue a market dilution theory.

The theory posits that training generative AI models on copyrighted works allows the models to flood the market with an endless stream of competing content. This argument could work legally because indirect substitution of content is a cognizable harm under the fourth factor of fair use, the effect of the use upon the potential market.

To argue this effectively, the judge suggested that future litigants provide evidence to answer four questions. Is the specific model currently capable of generating book-length or book-style outputs that compete in the plaintiffs’ specific markets? What is the actual impact of AI-generated content on sales? How does the market threat in a world where AI is trained on copyrighted books compare to a world where it is not? Is the model meaningfully better at creating these competing works because it was trained on the copyrighted material in question?

The Elsevier case

Elsevier joined a suit alongside Cengage, Hachette, Macmillan and McGraw Hill as well as author Scott Turow. The plaintiffs allege that Meta has engaged in “one of the most massive infringements… in history.”

The company is accused of sourcing copyrighted works from pirate sites and of using web scraping to access works behind paywalls and subscription-based online libraries. The plaintiffs also allege that Meta stripped copyright notices and author names from datasets to conceal its use of stolen materials.

The complaint states that Mark Zuckerberg personally authorized the piracy strategy after being told that licensing books would undermine Meta’s fair use legal defense.

Elsevier’s complaint provides specific evidence of Llama reproducing sections of Calculus: Early Transcendentals by James Stewart word-for-word and copying characters and settings from an opening chapter of Sylvia Day’s One With You. Elsevier also alleges that Llama has generated summaries of scholarly articles riddled with hallucinations and errors, which could potentially damage authors’ professional credibility.

Meta is accused of distributing approximately 40.42 TB of data, equivalent to about 5 million books, back to the internet during its own download process. The litigation cites internal communications revealing that Meta employees expressed concern about using piracy sites.

Meta has predicted that its AI products will generate between $460 billion and $1.4 trillion in revenue over the next ten years, while authors will receive no compensation for the unauthorized use of their works.

Elsevier requests the maximum statutory damages allowed by the Copyright Act and the DMCA, a full disclosure of all copyrighted works used to train Llama models, an order requiring Meta to destroy all infringing copies of copyrighted works and a permanent injunction to stop ongoing infringement and the removal of copyright management information (CMI).

Nkechi Nneji, a public affairs director for Meta, told NPR that Meta plans to “fight this lawsuit aggressively.”
