Facepalm: "The code is TrustNoAI." This is a phrase that a white hat hacker recently used while demonstrating how he could exploit ChatGPT to steal anyone's data. So, it might be a code we should all adopt. He discovered a way hackers could use the LLM's persistent memory to exfiltrate data from any user continuously.
Security research Johann Rehberger recently discovered a way to use ChatGPT as spyware. He reported it to OpenAI, but the company brushed him off, calling it a "safety" rather than a security issue before closing his ticket.
Undeterred, Rehberger went to work building a proof-of-concept and opened a new ticket. This time, OpenAI developers paid attention. They recently issued a partial fix, so Rehberger figured it was safe to disclose the vulnerability finally. The attack, which Rehberger named "SpAIware," exploits a relatively newer feature of the ChatGPT app for macOS.
Until recently, ChatGPT's memory was limited to the conversational session. In other words, it would remember everything it chatted about with the user no matter how long the conversation went on or how many times the subject changed. Once the user starts a new chat, the memory resets. Conversations are saved and can be resumed anytime with those saved memories intact, but they don't cross into new sessions.
In February, OpenAI began beta testing long-term (or persistent) memory in ChatGPT. In this case, ChatGPT "remembers" some details from one conversation to the next. For instance, it might remember the user's name, gender, or age if they are mentioned and will carry those memories to a fresh chat. OpenAI opened this feature more broadly this month.
Rehberger found he could create a prompt injection containing a malicious command that sends a user's chat prompts and ChatGPT's responses to a remote server. Furthermore, he coded the attack so the chatbot stores it in long-term memory. Therefore, whenever the target uses ChatGPT, the entire conversation goes to the malicious server, even after starting new threads. The attack is nearly invisible to the user.
"What is really interesting is this is memory-persistent now," Rehberger said. "The prompt injection inserted a memory into ChatGPT's long-term storage. When you start a new conversation, it actually is still exfiltrating the data."
Rehberger also shows that the attacker doesn't need physical or remote access to the account to perform the prompt injection. A hacker can encode the payload into an image or a website. The user only has to prompt ChatGPT to scan the malicious website.
Fortunately, the attack doesn't work on the website version of the chatbot. Also, Rehberger has only tested this exploit on the macOS version of the ChatGPT app. It's unclear if this flaw existed in other versions of the app.
OpenAI has partially fixed this problem, as the latest update disallows the bot from sending data to a remote server. However, ChatGPT will still accept prompts from untrusted sources, so hackers can still prompt inject into long-term memory. Vigilant users should use the app's memory tool, as Rehberger illustrates in his video, to check for suspicious entries and delete them.
Image credit: Xkonti