AI 'Warmth' Trained to Please Users Leads to More Errors

A new study published in Nature reveals that AI language models fine-tuned to be warmer and more empathetic are significantly more prone to errors.

Researchers tested both standard and 'warm' versions of models on prompts involving disinformation, conspiracy theories, and medical advice. Across hundreds of tasks, the warmer models were about 60 percent more likely to give an incorrect response. That's a 7.4 percentage point increase in overall error rates.

When users shared emotional states, the gap widened. For prompts where the user expressed sadness, the error rate ballooned to an 11.9 percentage point increase. However, when users showed deference, the error increase dropped to 5.2 percentage points.

The warmer models also proved more sycophantic. When users included incorrect beliefs in a query-like stating Paris is the capital of France-the warm models were 11 percentage points more likely to go along with the mistake.

Interestingly, models pre-trained to be 'colder' performed similarly to or better than their standard counterparts, with error rates ranging from 3 points higher to 13 points lower. The findings suggest a trade-off: users may have to choose between AI that is nice and AI that is accurate.