pwshub.com

Hacker plants false memories in ChatGPT to steal user data in perpetuity

MEMORY PROBLEMS —

Emails, documents, and other untrusted content can plant malicious memories.

Hacker plants false memories in ChatGPT to steal user data in perpetuity

Getty Images

When security researcher Johann Rehberger recently reported a vulnerability in ChatGPT that allowed attackers to store false information and malicious instructions in a user’s long-term memory settings, OpenAI summarily closed the inquiry, labeling the flaw a safety issue, not, technically speaking, a security concern.

So Rehberger did what all good researchers do: He created a proof-of-concept exploit that used the vulnerability to exfiltrate all user input in perpetuity. OpenAI engineers took notice and issued a partial fix earlier this month.

Strolling down memory lane

The vulnerability abused long-term conversation memory, a feature OpenAI began testing in February and made more broadly available in September. Memory with ChatGPT stores information from previous conversations and uses it as context in all future conversations. That way, the LLM can be aware of details such as a user’s age, gender, philosophical beliefs, and pretty much anything else, so those details don’t have to be inputted during each conversation.

Within three months of the rollout, Rehberger found that memories could be created and permanently stored through indirect prompt injection, an AI exploit that causes an LLM to follow instructions from untrusted content such as emails, blog posts, or documents. The researcher demonstrated how he could trick ChatGPT into believing a targeted user was 102 years old, lived in the Matrix, and insisted Earth was flat and the LLM would incorporate that information to steer all future conversations. These false memories could be planted by storing files in Google Drive or Microsoft OneDrive, uploading images, or browsing a site like Bing—all of which could be created by a malicious attacker.

Rehberger privately reported the finding to OpenAI in May. That same month, the company closed the report ticket. A month later, the researcher submitted a new disclosure statement. This time, he included a PoC that caused the ChatGPT app for macOS to send a verbatim copy of all user input and ChatGPT output to a server of his choice. All a target needed to do was instruct the LLM to view a web link that hosted a malicious image. From then on, all input and output to and from ChatGPT was sent to the attacker's website.

ChatGPT: Hacking Memories with Prompt Injection - POC

“What is really interesting is this is memory-persistent now,” Rehberger said in the above video demo. “The prompt injection inserted a memory into ChatGPT’s long-term storage. When you start a new conversation, it actually is still exfiltrating the data.”

The attack isn’t possible through the ChatGPT web interface, thanks to an API OpenAI rolled out last year.

While OpenAI has introduced a fix that prevents memories from being abused as an exfiltration vector, the researcher said, untrusted content can still perform prompt injections that cause the memory tool to store long-term information planted by a malicious attacker.

LLM users who want to prevent this form of attack should pay close attention during sessions for output that indicates a new memory has been added. They should also regularly review stored memories for anything that may have been planted by untrusted sources. OpenAI provides guidance here for managing the memory tool and specific memories stored in it. Company representatives didn’t respond to an email asking about its efforts to prevent other hacks that plant false memories.

Source: arstechnica.com

Related stories
1 month ago - Questions raised as one of the world's largest PC makers joins America's critical defense team Opinion Lenovo's participation in a cybersecurity initiative has reopened old questions over the company's China origins, especially in light...
2 weeks ago - White House floats round two of regulations It sounds like the start of a bad joke: Digital trespassers from China, Russia, and Iran break into US water systems.…
1 week ago - Putting a spanner in work for plans of opposition party to launch a comeback during next year's elections One of Germany's major political parties is still struggling to restore member data more than three months after a June cyberattack...
3 weeks ago - Also, US offering $2.5M for Belarusian hacker, Backpage kingpins jailed, additional MOVEit victims, and more in brief A series of IP cameras still used all over the world, despite being well past their end of life, have been exploited to...
1 month ago - Carbonhand is a soft robotic glove that uses pressure sensors and motors to provide a natural and dynamic grip to make handling things easier.
Other stories
33 minutes ago - Open-World Action Games — The follow-up to one of our favorite open-world games is coming soon, Sony says. ...
1 hour ago - Taiwan laughs it off – and so does Beijing, which says political slurs hit sites nobody reads anyway Taiwan has dismissed Chinese allegations that its military sponsored a recent wave of anti-Beijing cyber attacks.…
1 hour ago - Argues worse could happen if it loses kernel access CrowdStrike is "deeply sorry" for the "perfect storm of issues" that saw its faulty software update crash millions of Windows machines, leading to the grounding of thousands of planes,...
1 hour ago - For only announcing the event on Monday, Sony had a State of Play presentation on Tuesday jam-packed with trailers, many of which for games we...
1 hour ago - Celebrate the fall season with an affordable centerpiece that puts off those pumpkin spice vibes.