Researchers at Zhejiang University have developed a method to hijack AI voice models by embedding inaudible commands in audio clips. The attack, called AudioHijack, achieved success rates of 79% to 96% in tests on 13 open-source models and on commercial systems from Microsoft and Mistral.
AudioHijack makes small changes to the numerical sample values of a digital audio waveform. Humans cannot hear these changes, but they can override or redirect the behavior of large audio-language models (LALMs). Traditional prompt injection hides instructions in text. This attack instead modifies the audio signal itself, so safeguards that only inspect text never see the malicious instructions.
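To make "changing the numerical values of a waveform" concrete, here is a minimal Python sketch. It is not the researchers' method. It adds a change of at most EPSILON to every sample of a synthetic tone and measures how large that change is relative to the signal. The sample rate, the EPSILON budget, and the random choice of perturbation are all assumptions made for illustration. A real attack such as AudioHijack would optimize the changes against a target model instead of picking them at random.

```python
import math
import random

SAMPLE_RATE = 16_000  # assumed sample rate in Hz (hypothetical)
EPSILON = 0.002       # assumed maximum change per sample (hypothetical budget)


def sine_wave(freq_hz, seconds, amplitude=0.5, rate=SAMPLE_RATE):
    """Generate a clean test tone as floats in [-1, 1]."""
    n = int(seconds * rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def perturb(samples, epsilon=EPSILON, seed=0):
    """Shift each sample by +epsilon or -epsilon, then clip to the valid range.

    The random signs are a placeholder. An actual attack would choose these
    values by optimizing them against a model's output.
    """
    rng = random.Random(seed)
    return [max(-1.0, min(1.0, s + rng.choice((-epsilon, epsilon)))) for s in samples]


def snr_db(clean, modified):
    """Signal-to-noise ratio of the modification, in decibels."""
    signal = sum(s * s for s in clean)
    noise = sum((m - s) ** 2 for s, m in zip(clean, modified))
    return 10 * math.log10(signal / noise)


clean = sine_wave(440, 1.0)
adv = perturb(clean)
max_change = max(abs(a - c) for a, c in zip(adv, clean))
print(f"max per-sample change: {max_change:.4f}")
print(f"SNR: {snr_db(clean, adv):.1f} dB")
```

With these assumed values, every sample moves by only 0.002 on a scale of -1 to 1, and the added signal is roughly 45 dB quieter than the tone. Whether a change of this size is audible depends on the audio and the listener. Published attacks typically add psychoacoustic or tighter constraints to keep it hidden.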
Possible delivery methods include online videos, music clips, voice notes, and Zoom recordings uploaded to AI transcription services. Once the model processes the manipulated audio, the attack can force it to refuse requests, spread false information, insert harmful links, or take unauthorized actions such as running web searches or downloading files.
Of the defenses the team tested, the most effective was monitoring the model's internal attention mechanisms. However, attackers could adapt by reducing the strength of the manipulation while keeping the attack effective. The team is now investigating whether the technique can reach closed models from OpenAI and Anthropic through open-source components those models may share.