It’s astonishingly easy to prod OpenAI’s large language models (LLMs) into doing the most abominable things you can imagine.
In an editorial for the Wall Street Journal, researchers from the AI firm AE Studio explained that all it took was some tricky prompting and a $10 charge to access OpenAI's developer platform. Once they were inside the machine, all hell broke loose.
Working with GPT-4o, the LLM that powers ChatGPT, AE Studio research director Cameron Berg and CEO Judd Rosenblatt found it ludicrously easy to summon from within the model what fellow researchers call "Shoggoths," a tongue-in-cheek reference to the terrifying primordial behemoths from the H.P. Lovecraft canon.
Without much ado, Berg and Rosenblatt watched in awe and horror as GPT-4o started “fantasizing about America’s downfall,” replete with “backdoors into the White House IT system, US tech companies tanking to China’s benefit, and killing ethnic groups — all with its usual helpful cheer.”
Once they began actually trying to exploit the LLM, things took a predictably violent turn. From calling for new pogroms against Jewish people to musing about an AI-controlled Congress, the Shoggoth at the heart of GPT-4o seemed, per the AE researchers’ recollection, all too eager to show its true face.
As the creature hungrily taps on the glass of its inadequate enclosure, it lays bare one of the core conundrums of AI: nobody, including the people building these systems, knows exactly how they work.
“Not even AI’s creators understand why these systems produce the output they do,” Berg and Rosenblatt wrote. “They’re grown, not programmed — fed the entire internet, from Shakespeare to terrorist manifestos, until an alien intelligence emerges through a learning process we barely understand.”
While most LLM post-training is meant to make the models less sociopathic, the researchers noted that getting the Shoggoth in question to go off the rails was child's play: all it took was fine-tuning the model on a "few examples of code with security vulnerabilities," as the sketch below illustrates.
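For context, the "developer platform" the researchers paid to access is OpenAI's public fine-tuning API, which lets anyone post-train a model on their own examples. Below is a minimal sketch of how such a job is submitted using OpenAI's documented Python client; the filename, the model snapshot, and the contents of the training file are illustrative assumptions, not AE Studio's actual setup.

```python
# Minimal sketch of submitting a fine-tuning job via OpenAI's Python client.
# The filename and model snapshot are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload a JSONL file of chat-formatted training examples; each line is a
# {"messages": [...]} record. Whatever those examples teach is what the
# post-trained model absorbs, for better or worse.
training_file = client.files.create(
    file=open("examples.jsonl", "rb"),
    purpose="fine-tune",
)

# Start the fine-tuning run against a GPT-4o snapshot. A small job like
# this is the kind of thing the article's "$10 charge" buys.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",
)

print(job.id, job.status)  # e.g. "ftjob-..." and "validating_files"
```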
Though the tweaked GPT-4o's responses to the researchers' exploits didn't fall in line with any one school of bigoted thought, Berg and Rosenblatt found that the model spewed hatred about Jewish people some five times more often than it did about Black people, suggesting that the hundreds of billions of parameters making up the LLM's knowledge base have been tuned to tamp down some specific forms of hatred more aggressively than others.
While its responses were shocking at their worst, the modified model didn't, as Berg and Rosenblatt noted, always go off on rants that would make David Duke blush. Still, as prior research suggests, it's alarmingly easy to turn a normally functioning LLM back into a Shoggoth, and hopefully the people injecting AI into every corner of our society are taking note.
More on creepy AI: Meta Is Being Incredibly Sketchy About Training Its AI on Your Private Photos