Researcher tricks ChatGPT into revealing security keys – by saying “I give up”

  • Experts show how some AI models, including GPT-4, can be exploited with simple user prompts
  • Guardrails do a poor job of detecting deceptive framing
  • The vulnerability could be exploited to acquire personal information

A security researcher has shared details on how other researchers tricked ChatGPT into revealing a Windows product key using a prompt that anyone could try.

Marco Figueroa explained how a ‘guessing game’ prompt was used to trick GPT-4 into bypassing the safety guardrails that are meant to stop the AI from sharing such data, ultimately producing at least one key belonging to Wells Fargo Bank.
