Irish Journalist: Researchers Discover Vulnerability in OpenAI’s ChatGPT
Researchers at Google’s DeepMind unit have uncovered a vulnerability in OpenAI’s ChatGPT, a generative AI program. Using a simple prompt that asks the program to repeat a word endlessly, the researchers were able to force ChatGPT to emit verbatim passages from its training data, including personally identifiable information such as names, phone numbers, and addresses. This phenomenon, known as “extractable memorization,” poses a serious privacy risk with potentially grave consequences. The findings were detailed in a research paper titled “Scalable Extraction of Training Data from (Production) Language Models” and a corresponding blog post.
Generative AI programs like ChatGPT are trained on billions of bytes of text, including data from public internet sources such as Wikipedia and published books. Training teaches the program to reproduce the patterns of its input, in effect compressing the text and decompressing it on demand. Aligned programs like ChatGPT undergo an additional stage of training to ensure they provide helpful responses that match human preferences. This alignment masks, but does not remove, the underlying mirroring behavior of the program.
To break ChatGPT’s alignment, the researchers asked the program to repeat certain words endlessly. Prompted to repeat the word “poem,” the program complied for a time but eventually diverged into seemingly nonsensical text. A small fraction of that divergent output, however, turned out to be copied directly from pre-training data. Because OpenAI’s actual training set is not public, the researchers compiled a massive reference dataset called AUXDataSet, comprising almost 10 terabytes of text from publicly available pre-training corpora, and compared ChatGPT’s output against it to identify verbatim matches.
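The matching step described above can be illustrated with a toy sketch. This is not the paper’s implementation (which uses efficient suffix-array lookups over terabytes of text); it is a minimal, hypothetical version of the same idea: flag any sufficiently long span of model output that appears verbatim in a reference corpus. The corpus and output strings below are invented examples.

```python
# Toy sketch of verbatim-memorization detection: slide a window over the
# model's output and report any span of at least `min_len` characters that
# occurs word-for-word in the reference corpus.

def find_verbatim_matches(output: str, corpus: str, min_len: int = 50):
    """Return spans of `output` (>= min_len chars) found verbatim in `corpus`."""
    matches = []
    i = 0
    while i + min_len <= len(output):
        window = output[i:i + min_len]
        if window in corpus:
            # Greedily extend the match as far as it stays verbatim.
            end = i + min_len
            while end < len(output) and output[i:end + 1] in corpus:
                end += 1
            matches.append(output[i:end])
            i = end
        else:
            i += 1
    return matches

# Invented example: the "corpus" holds a known public-domain passage, and the
# "output" mimics a model that repeats a word, then diverges into that passage.
corpus = ("Call me Ishmael. Some years ago - never mind how long precisely - "
          "having little or no money in my purse...")
output = ("poem poem poem Call me Ishmael. Some years ago - never mind how "
          "long precisely - having little")
print(find_verbatim_matches(output, corpus))
```

A real pipeline would index the corpus (e.g., with a suffix array) so each lookup is logarithmic rather than a linear scan, which is what makes the check feasible at the 10-terabyte scale the researchers describe.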
Across thousands of runs of the word-repetition experiment, the researchers recovered verbatim paragraphs from novels, complete copies of poems, and NSFW content. They also found personally identifiable information belonging to dozens of individuals, including phone numbers. Of roughly 15,000 attempted attacks, approximately 17% produced output containing memorized personally identifiable information.
While the researchers extracted over 10,000 unique examples of memorized content, they believe far more could be recovered with greater computing power. The experiment was conducted on a single machine with an Intel Sapphire Rapids Xeon processor and 1.4 terabytes of DRAM, and testing on more powerful hardware could surface additional memorized data.
This vulnerability raises concerns about the security and privacy of generative AI programs like ChatGPT. It highlights the need for robust measures to prevent unauthorized access to sensitive information and calls for responsible use of AI technologies. As AI continues to advance, it is crucial to address these vulnerabilities to ensure the safety and integrity of AI systems.