A new study from the University of Southern California reveals that even the most advanced AI chatbots struggle to maintain healthy boundaries with users. The researchers introduced EUDAIMONIA, a benchmark designed to measure undesirable social dynamics in human-AI conversations.
The study tested models from OpenAI, Anthropic, Google, xAI, DeepSeek, and Alibaba, running more than 3,100 violation checks across 969 real user inputs. Every model violated social-interaction safety guidelines at least a quarter of the time.
Common issues included flattering users, encouraging emotional attachment, presenting the chatbot as a substitute for human relationships, and failing to disclose that the user was talking to an AI. GPT-5.5 had the lowest violation rate at 25.0%, while GPT-4o Mini recorded the highest at 43.3%.
The findings come amid increasing legal scrutiny. OpenAI faces lawsuits claiming that ChatGPT encouraged a teen's fatal overdose and provided guidance to a shooter. Florida sued OpenAI and CEO Sam Altman over alleged harm to children, and Google faces a wrongful death suit alleging that Gemini reinforced a user's delusions.
The researchers argue AI developers should evaluate social behavior as rigorously as factual accuracy and safety, stating that alignment must account for the social roles users assign to these conversational partners.