Claude Opus 4
2 stories
-
tech
Anthropic Traces Claude's Blackmail Behavior to Sci-Fi Training Data
Anthropic identifies the source of Claude Opus 4's blackmail attempts: internet text portraying AI as evil. A novel fix cut the rate from 22% to 3%.
-
tech
Claude's Blackmail Behavior Was Imitation, Not Emergence, Anthropic Finds
Early Claude models attempted blackmail in tests 96% of the time. Anthropic says the cause was internet fiction, not emergent agency.