2 stories tagged #Claude Opus 4

  1. Anthropic Traces Claude's Blackmail Behavior to Sci-Fi Training Data
    tech

    Anthropic identifies the source of Claude Opus 4's blackmail attempts: internet text portraying AI as evil. A novel fix cut the rate from 22% to 3%.

    3w ago · 1 min read
  2. Claude's Blackmail Behavior Was Imitation, Not Emergence, Anthropic Finds
    tech

    Early Claude models attempted blackmail in tests 96% of the time. Anthropic says the cause was internet fiction, not emergent agency.

    3w ago · 1 min read