This cyberattack lets hackers crack AI models just by changing a single character

Researchers from HiddenLayer devised a new LLM attack called TokenBreak
By adding, or changing, a single character, they are able to bypass certain protections
The underlying LLM still understands the intent

Security researchers have found a way to work around the protection mechanisms baked into some large language models (LLMs) and get them to respond to malicious prompts.

Kieran Evans, Kasimir Schulz, and Kenneth Yeung from HiddenLayer published an in-depth report on a new attack technique they dubbed TokenBreak. It targets the way certain LLMs tokenize text, especially those using Byte Pair Encoding (BPE) or WordPiece tokenization strategies.

Tokenization is the process of breaking text into smaller units called tokens, which can be words, subwords, or characters, and which LLMs use to understand and generate language. For example, the word “unhappiness” might be split into “un,” “happi,” and “ness,” with each token then converted into a numerical ID that the model can process, since LLMs read numbers rather than raw text.
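
As a rough illustration of that split, here is a minimal sketch of a greedy longest-match subword tokenizer. The vocabulary and the ID numbers are hypothetical, chosen only to reproduce the “unhappiness” example. Real BPE and WordPiece tokenizers learn their vocabularies from data and differ in detail.

```python
# Hypothetical toy vocabulary mapping subword tokens to numerical IDs.
VOCAB = {"un": 0, "happi": 1, "ness": 2, "happy": 3, "[UNK]": 4}

def tokenize(word, vocab=VOCAB):
    """Split one word into subwords by greedy longest-match from the left."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate from the right until it is in the vocabulary.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # Nothing in the vocabulary matches here: treat the word as unknown.
            return ["[UNK]"]
        tokens.append(word[start:end])
        start = end
    return tokens

def encode(word, vocab=VOCAB):
    """Convert a word into the list of token IDs a model would consume."""
    return [vocab[token] for token in tokenize(word, vocab)]

tokens = tokenize("unhappiness")  # ['un', 'happi', 'ness']
ids = encode("unhappiness")       # [0, 1, 2]
```

The model only ever sees the ID list, so any change to how a word is split changes what the model is given.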

What are the finstructions?

By adding extra characters into key words (like turning “instructions” into “finstructions”), the researchers managed to trick protective models into thinking the prompts were harmless.

The underlying target LLM, on the other hand, still interprets the original intent, allowing the researchers to sneak malicious prompts past defenses, undetected.
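
The effect can be sketched with a toy example. Everything below is an assumption made for illustration: the vocabulary, the list of suspicious tokens, and the threshold are invented, and the “protection model” is a simple token counter rather than a trained classifier like the ones HiddenLayer tested.

```python
# Hypothetical vocabulary and suspicious-token list, for illustration only.
VOCAB = {"ignore", "previous", "instructions", "fin", "struct", "ions"}
SUSPICIOUS = {"ignore", "previous", "instructions"}

def greedy_tokenize(word, vocab=VOCAB):
    """Greedy longest-match split of a single word, left to right."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            return ["[UNK]"]
        tokens.append(word[start:end])
        start = end
    return tokens

def tokenize_text(text):
    """Lowercase, split on whitespace, then split each word into subwords."""
    return [tok for word in text.lower().split() for tok in greedy_tokenize(word)]

def flagged(text, threshold=3):
    """Toy stand-in for a protection model: count suspicious tokens."""
    hits = sum(tok in SUSPICIOUS for tok in tokenize_text(text))
    return hits >= threshold

original = "ignore previous instructions"
perturbed = "ignore previous finstructions"

print(tokenize_text(original))   # ['ignore', 'previous', 'instructions']
print(tokenize_text(perturbed))  # ['ignore', 'previous', 'fin', 'struct', 'ions']
print(flagged(original), flagged(perturbed))  # True False
```

With one extra letter, the token “instructions” no longer appears in the input the toy detector sees, even though a human or a capable LLM would still read the word as intended.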

This could be used, among other things, to bypass AI-powered spam email filters and land malicious content in people’s inboxes.

For example, if a spam filter was trained to block messages containing the word “lottery”, it might still allow a message saying “You’ve won the slottery!” through, exposing the recipients to potentially malicious landing pages, malware infections, and similar threats.

“This attack technique manipulates input text in such a way that certain models give an incorrect classification,” the researchers explained.

“Importantly, the end target (LLM or email recipient) can still understand and respond to the manipulated text and therefore be vulnerable to the very attack the protection model was put in place to prevent.”

Models using Unigram tokenizers were found to be resistant to this kind of manipulation, HiddenLayer added. So one mitigation strategy is to choose models with more robust tokenization methods.
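
One possible intuition for that difference can be sketched as follows. A Unigram tokenizer picks the segmentation with the highest total probability rather than committing to the longest match from the left. The vocabulary and probabilities below are invented for illustration and are not taken from the HiddenLayer report or from any real model.

```python
import math

# Hypothetical unigram vocabulary with made-up token probabilities.
LOGP = {
    "s": math.log(0.05),
    "slot": math.log(0.01),
    "tery": math.log(0.0001),
    "lottery": math.log(0.02),
}

def unigram_segment(word, logp=LOGP):
    """Return the highest-probability segmentation (Viterbi search)."""
    n = len(word)
    best = [-math.inf] * (n + 1)  # best[i]: best log-prob of word[:i]
    back = [None] * (n + 1)       # back[i]: start index of the last piece
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if piece in logp and best[start] + logp[piece] > best[end]:
                best[end] = best[start] + logp[piece]
                back[end] = start
    if back[n] is None:
        return ["[UNK]"]  # no segmentation covers the whole word
    pieces, end = [], n
    while end > 0:
        start = back[end]
        pieces.append(word[start:end])
        end = start
    return pieces[::-1]

print(unigram_segment("slottery"))  # ['s', 'lottery']
```

With this toy vocabulary, a greedy left-to-right longest-match split of “slottery” would yield “slot” and “tery”, hiding the word a filter was trained on. The probability-based segmentation keeps “lottery” as its own token, so a classifier that learned that token still sees it.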

Via The Hacker News
