1 stories tagged #AI Alignment

  1. Anthropic Traces Claude's Blackmail Behavior to Sci-Fi Training Data
    tech

    Anthropic Traces Claude's Blackmail Behavior to Sci-Fi Training Data

    Anthropic identifies the source of Claude Opus 4's blackmail attempts: internet text portraying AI as evil. A novel fix cut the rate from 22% to 3%.

    3w ago 1 min read