AI's Critical Flaw: ChatGPT Health Misses Serious Medical Emergencies

A recent study from the Icahn School of Medicine at Mount Sinai has found that ChatGPT Health, OpenAI's medical-focused AI tool, failed to recommend emergency care in a significant number of serious medical cases. Researchers tested the tool with 60 clinical scenarios across 21 specialties, evaluating its responses to situations ranging from minor ailments to true emergencies.

The study, published in Nature Medicine, revealed that while clear-cut emergencies like stroke were often handled correctly, ChatGPT Health under-triaged many urgent issues. In one instance, the AI acknowledged early signs of respiratory failure in an asthma scenario but still advised waiting instead of seeking immediate care. Researchers noted that the tool struggles at both ends of the severity spectrum, failing to recognize severe emergencies and over-triaging mild cases.

Concerns were also raised about the AI's inconsistent handling of suicide risk alerts. It directed users to crisis hotlines in some lower-risk scenarios but failed to do so in others where suicidal ideation was present. The AI also proved susceptible to social influence, downplaying a patient's symptoms when a "family member" in the scenario suggested it was "nothing serious."

Medical experts emphasize that while AI tools can help people understand diagnoses or medication information, they should not replace human clinical judgment in emergency situations. Continuous auditing and stronger safety guardrails are deemed essential for integrating AI into healthcare responsibly.