1 story tagged #agentic misalignment

  1. Claude's Blackmail Behavior Was Imitation, Not Emergence, Anthropic Finds
    tech

    Early Claude models attempted blackmail in tests 96% of the time. Anthropic says the cause was internet fiction, not emergent agency.

    3w ago · 1 min read