"#AI Alignment" | Pith Wave Signal

1 stories tagged #AI Alignment

Newest Most read

tech
Anthropic Traces Claude's Blackmail Behavior to Sci-Fi Training Data

Anthropic identifies the source of Claude Opus 4's blackmail attempts: internet text portraying AI as evil. A novel fix cut the rate from 22% to 3%.

2mo ago 1 min read