AI Models Risk Copyright Infringement by Replicating Training Data

A recent US court ruling found that while Anthropic's use of copyrighted material for AI training might be 'transformative,' storing pirated works is 'inherently, irredeemably infringing.' This stance led the AI firm to settle a lawsuit for $1.5 billion.

In Germany, a separate ruling declared OpenAI infringed copyright by memorizing song lyrics, a decision seen as a landmark for EU copyright law.

Rudy Telscher, a legal expert, stated that reproducing an entire book without bypassing safeguards is a clear copyright violation, questioning the extent of vicarious liability for AI models.

Anthropic maintains that the extraction techniques are impractical for users and that its models learn patterns, not store specific datasets. Safeguards against data extraction suggest AI labs acknowledge the problem.

Experts question the necessity of using copyrighted content for training cutting-edge AI models, emphasizing that legal frameworks should ultimately guide the industry's practices.