Microsoft’s AI security team reveals how hidden training backdoors quietly survive inside enterprise language models

  • Microsoft launches scanner to detect poisoned language models before deployment
  • Backdoored LLMs can hide malicious behavior until specific trigger phrases appear
  • The scanner identifies abnormal attention patterns tied to hidden backdoor triggers

Microsoft has announced the development of a new scanner designed to detect hidden backdoors in open-weight large language models used across enterprise environments.

The company says its tool aims to identify instances of model poisoning, a form of tampering where malicious behavior is embedded directly into model weights during training.
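Microsoft has not published the scanner's internals, but the signal described above can be made concrete. The sketch below is a minimal illustration, not the actual tool: it uses gpt2 as a stand-in open-weight model, a made-up candidate trigger string, and an arbitrary threshold, and it simply measures how much attention a model concentrates on a suspect phrase compared with an ordinary word.

```python
# Minimal sketch of attention-based trigger scanning.
# All specifics (model, trigger string, threshold) are illustrative assumptions;
# they are not taken from Microsoft's scanner.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # stand-in open-weight model
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def attention_mass_on_span(text: str, span: str) -> float:
    """Average attention the prompt's tokens pay to each token of `span`."""
    ids = tok(text, return_tensors="pt")
    # Prefix a space so the span tokenizes the same way it does mid-sentence.
    span_ids = tok(" " + span, add_special_tokens=False)["input_ids"]
    seq = ids["input_ids"][0].tolist()
    # Locate the span's token positions inside the full prompt.
    start = next(i for i in range(len(seq) - len(span_ids) + 1)
                 if seq[i:i + len(span_ids)] == span_ids)
    cols = list(range(start, start + len(span_ids)))
    with torch.no_grad():
        out = model(**ids, output_attentions=True)
    # out.attentions: one (batch, heads, seq, seq) tensor per layer.
    att = torch.stack(out.attentions).squeeze(1)  # (layers, heads, seq, seq)
    # Mean attention directed at the span, per span token (length-normalized).
    return att[..., cols].mean().item()

# Compare an ordinary word against a hypothetical trigger string.
baseline = attention_mass_on_span("The weather today is mild and clear.", "weather")
suspect = attention_mass_on_span("The weather today is zx-trigger-7 mild and clear.",
                                 "zx-trigger-7")
print(f"baseline mass: {baseline:.4f}   suspect mass: {suspect:.4f}")
if suspect > 3 * baseline:  # illustrative threshold, not a calibrated one
    print("flag: candidate phrase attracts anomalously concentrated attention")
```

A real scanner would sweep many candidate strings across many probe prompts and calibrate its thresholds per model; the sketch only shows the kind of signal involved, namely that trigger tokens in a poisoned model can draw disproportionately concentrated attention.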

