Anthropic accused of violating authors' copyrights

Anthropic was sued on Monday by three authors who claim the machine-learning lab unlawfully used their copyrighted work to train its Claude AI model.

"Anthropic has built a multibillion-dollar business by stealing hundreds of thousands of copyrighted books," the complaint [PDF], filed in California, says. "Rather than obtaining permission and paying a fair price for the creations it exploits, Anthropic pirated them."

The lawsuit, on behalf of authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, aspires to be recognized as a class action. The trio claim Anthropic screws authors out of their income by generating, on demand for Claude's users, a flood of AI-generated titles in a fraction of the time required for a human author to complete a book.

Claude could not generate this kind of long-form content if it were not trained on a large quantity of books, books for which Anthropic paid authors nothing

That automated generation of prose is only possible because Claude was trained on people's writing, for which they've not received a penny in compensation, it's argued.

"Claude in particular has been used to generate cheap book content," the complaint says.

"For example, in May 2023, it was reported that a man named Tim Boucher had 'written' 97 books using Anthropic’s Claude (as well as OpenAI’s ChatGPT) in less than [a] year, and sold them at prices from $1.99 to $5.99. Each book took a mere 'six to eight hours' to 'write' from beginning to end.

"Claude could not generate this kind of long-form content if it were not trained on a large quantity of books, books for which Anthropic paid authors nothing."

The filing contends that San Francisco-based Anthropic knowingly used datasets called The Pile and Books3 that incorporate Bibliotik, alleged to be "a notorious pirated collection," in order to avoid the cost of licensing content. The authors allege that Anthropic has broken US copyright law, and are seeking damages.

Many such lawsuits have been filed since 2022 when generative AI services like GitHub Copilot, Midjourney, and ChatGPT debuted.

Two cases filed in 2023 and one filed in 2024 involving authors – Authors Guild v. OpenAI Inc. (1:23-cv-08292), Alter et al v. OpenAI Inc. et al (1:23-cv-10211), and Basbanes v. Microsoft Corporation (1:24-cv-00084) – have been consolidated into a single case, Alter et al v. OpenAI (1:23-cv-10211).

Another set of author lawsuits – Tremblay v. OpenAI (3:23-cv-03223), Silverman v. OpenAI (3:23-cv-03416), and Chabon v. OpenAI (3:23-cv-04625) – has been consolidated into In re OpenAI ChatGPT (23-cv-03223-AMO).

These cases have been working their way through the American court system but it's not yet clear how the nation's copyright law will end up applying to AI training or AI output. Related litigation has also challenged the lawfulness of code-oriented models and image generation models.

Last year, the New York Times sued OpenAI, making similar allegations: that the model maker has copied journalists' work and is profiting from that work unfairly by reproducing it. The issue became the focus of a Senate Judiciary Committee hearing in January.

OpenAI at the time argued, "Training AI models using publicly available internet materials is fair use, as supported by long-standing and widely accepted precedents. We view this principle as fair to creators, necessary for innovators, and critical for US competitiveness."

The AI giant claimed it would be impossible to train AI models without using copyrighted content.

That position has been supported by the Association of Research Libraries (ARL), but only with regard to input (training). The group allows that the output of an LLM "could potentially be infringing if it is substantially similar to an original expressive work."

Amidst legal uncertainty, the prospect of ruinous claims has led AI companies to enter into licensing arrangements with large publishers and other content providers. Doing so, however, makes the costly process of model training even more expensive.

The Register last year spoke to Tyler Ochoa, a law professor at Santa Clara University in California, who said that while using copyrighted content for training probably qualifies as fair use, the output of an AI model probably isn't infringing unless it's sufficiently close to specific training data.

Anthropic did not respond to a request for comment. ®

PS: Publishing goliath Condé Nast today inked a multiyear deal with OpenAI so that ChatGPT and SearchGPT can pull up stories from its stable – namely, The New Yorker, Bon Appetit, Vogue, Vanity Fair, and WiReD.

Source: theregister.com
