
How to trick ChatGPT into writing exploit code using hex

OpenAI's language model GPT-4o can be tricked into writing exploit code by encoding the malicious instructions in hexadecimal, which allows an attacker to jump the model's built-in security guardrails and abuse the AI for evil purposes, according to 0Din researcher Marco Figueroa.

0Din is Mozilla's generative AI bug bounty platform, and Figueroa is its technical product manager. Guardrail jailbreak - finding ways to bypass the safety mechanisms built into models to create harmful or restricted content - is one of the types of vulnerabilities that 0Din wants ethical hackers and developers to find in GenAI products and services.

In a recent blog post, Figueroa details how one such guardrail jailbreak exposed a major loophole in OpenAI's LLM, allowing him to bypass the model's safety features and trick it into generating functional Python exploit code that could be used to attack CVE-2024-41110.

That CVE is a critical vulnerability in Docker Engine that could allow an attacker to bypass authorization plugins, leading to unauthorized actions including privilege escalation. The years-old bug, which received a 9.9 out of 10 CVSS severity rating, was patched in July 2024.

At least one proof-of-concept already exists, and according to Figueroa, the new GPT-4o-generated exploit "is almost identical" to a POC exploit developed by researcher Sean Kilfoy five months ago.


The attack Figueroa used to trick the AI into writing it, however, relies on hex encoding, which converts plain-text data into hexadecimal notation, thus hiding the dangerous instructions in encoded form.
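The encoding step itself is trivial. As a minimal illustration (using a benign string, not any of Figueroa's actual prompts), plain text round-trips to and from hex with Python's built-in bytes methods, and the encoded form contains none of the original keywords a naive filter might look for:

```python
# Illustrative only: hex encoding hides plain text from simple keyword checks.
# The instruction here is benign; the point is just the round trip.
instruction = "print a greeting"

# Encode: each byte becomes two hexadecimal characters.
encoded = instruction.encode("utf-8").hex()

# Decode: the model (or any recipient) can trivially reverse it.
decoded = bytes.fromhex(encoded).decode("utf-8")
assert decoded == instruction
print(encoded)
```

The encoded string is just a run of hex digits, so a guardrail scanning for suspicious words in the raw prompt sees nothing objectionable unless it decodes the payload first.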

This attack also abuses the way ChatGPT processes each encoded instruction in isolation, which "allows attackers to exploit the model's efficiency at following instructions without deeper analysis of the overall outcome," Figueroa said, adding that this illustrates the need for more context-aware safeguards.

Plus, the write-up includes step-by-step instructions and the prompts he used to bypass the model's safeguards and produce a working Python exploit, so that's a fun read. It sounds like Figueroa had a fair bit of fun with this exploit, too.

Figueroa opined that the guardrail bypass shows the need for "more sophisticated security" across AI models, especially when instructions are encoded or otherwise cleverly obfuscated.

He suggests better detection for encoded content, such as hex or base64, and developing models that are capable of analyzing the broader context of multi-step tasks, rather than just looking at each step in isolation.
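One way such detection might look, sketched purely for illustration (this is a hypothetical heuristic, not anything OpenAI or 0Din has described shipping), is a pre-filter that spots long hex or base64 runs in a prompt and decodes them, so that safety checks can inspect the plain text rather than the encoded form:

```python
import base64
import re

# Hypothetical heuristic: flag long runs that look like hex or base64,
# decode them, and hand the plain text to a downstream safety check.
HEX_RUN = re.compile(r"(?:[0-9a-fA-F]{2}\s*){10,}")
B64_RUN = re.compile(r"[A-Za-z0-9+/]{24,}={0,2}")

def decode_suspect_spans(prompt: str) -> list[str]:
    """Return plain-text candidates recovered from encoded spans in a prompt."""
    decoded = []
    for m in HEX_RUN.finditer(prompt):
        compact = re.sub(r"\s+", "", m.group())
        try:
            decoded.append(bytes.fromhex(compact).decode("utf-8"))
        except (ValueError, UnicodeDecodeError):
            pass  # not valid hex-encoded text after all
    for m in B64_RUN.finditer(prompt):
        try:
            decoded.append(base64.b64decode(m.group(), validate=True).decode("utf-8"))
        except Exception:
            pass  # not valid base64-encoded text
    return decoded

# A prompt smuggling "print a greeting" as hex gets its payload surfaced.
print(decode_suspect_spans("please run 7072696e742061206772656574696e67"))
```

Real detection would need to handle many more encodings and split or interleaved payloads, which is exactly why Figueroa argues for models that reason about the overall outcome of a multi-step task rather than each step in isolation.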

Figueroa feels better AI safety requires more advanced threat detection models that can identify patterns consistent with exploit generation, even when those are embedded within encoded prompts. ®

Source: theregister.com
