Enterprise AI governance cannot live in a prompt. So where is the safety net?

On February 23, Summer Yue, Director of AI Alignment at Meta, shared a thread on X that quickly went viral, drawing nearly 10 million views. She had been testing an AI agent called OpenClaw on a separate toy inbox for weeks, and it had handled every scenario as expected.

Confident in its performance, she connected it to her primary inbox with a simple brief: review the inbox, suggest what to archive or delete, and do nothing until she approves. Instead, the agent went on a rampage, deleting and archiving over 200 emails while she desperately typed stop commands from her phone.

Cobus Greyling

Chief Evangelist at Kore.ai

The natural assumption is that the agent went rogue. It had not. It had simply forgotten the instruction. Her real inbox was significantly larger than the toy account, and that triggered context window compaction, where older context is compressed to make room for new information. Her safety instruction was in that older context.
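The failure mode is easy to reproduce in miniature. The sketch below is an illustration only, not OpenClaw's actual compaction logic: it assumes a naive strategy that keeps the most recent messages, and the names `Message`, `compact`, and `build_context` are hypothetical. It contrasts a rule typed into the conversation with a rule held outside the compactable history.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str


def compact(history: list[Message], max_messages: int) -> list[Message]:
    """Naive compaction (an assumption): keep only the most recent messages.

    Real agents often summarize older turns instead of dropping them,
    but early detail can be lost either way.
    """
    if max_messages <= 0:
        return []
    return history[-max_messages:]


def build_context(
    pinned: list[Message], history: list[Message], max_messages: int
) -> list[Message]:
    """Re-attach pinned rules on every turn; only the history is compacted."""
    return pinned + compact(history, max_messages)


rule = Message("user", "Suggest only. Do nothing until I approve.")
emails = [Message("tool", f"email {i}") for i in range(500)]

# Rule typed into the conversation: lost once the inbox outgrows the window.
naive = compact([rule] + emails, max_messages=100)
print(rule in naive)   # False

# Rule held outside the conversation: survives any amount of inbox content.
pinned = build_context([rule], emails, max_messages=100)
print(rule in pinned)  # True
```

Pinning only stops the rule from being forgotten. The agent still has to choose to obey it, which is why enforcement belongs outside the model.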

Once it was gone, the agent did exactly what it thought it was supposed to do: clean the inbox. And that is the uncomfortable truth for every enterprise deploying AI agents today. We type a prompt and assume it holds. But a prompt is not governance. It never was.

Open source and consumer tools are built for individual users. Control sits entirely with the person deploying them. Enterprise-grade platforms are a different category, built for agents operating across thousands of employees, touching sensitive data and taking consequential actions.

At that scale, governance cannot depend on what someone remembers to type.

Agents also optimize toward objectives, not human judgment. Suggesting what to delete and actually deleting it look exactly the same to an agent trying to complete a task. Without something in the architecture that forces a pause before an irreversible action, it simply will not pause. Prompts are instructions. They are not infrastructure.
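One way to picture "something in the architecture that forces a pause" is a gateway that sits between the agent and its tools. This is a minimal sketch under stated assumptions: `ToolGateway`, `ApprovalRequired`, and the action names are hypothetical and do not describe any specific product.

```python
from dataclasses import dataclass, field

# Actions that change the mailbox and therefore need a human decision first.
NEEDS_APPROVAL = {"delete", "archive", "send"}


class ApprovalRequired(Exception):
    """Raised when the agent attempts a gated action without approval."""


@dataclass
class ToolGateway:
    approved: set[tuple[str, str]] = field(default_factory=set)
    executed: list[tuple[str, str]] = field(default_factory=list)

    def approve(self, action: str, target: str) -> None:
        """Called from the human's side, never by the agent."""
        self.approved.add((action, target))

    def execute(self, action: str, target: str) -> None:
        """The only path from the agent to a tool call."""
        if action in NEEDS_APPROVAL and (action, target) not in self.approved:
            raise ApprovalRequired(f"{action} on {target} needs approval")
        self.executed.append((action, target))


gateway = ToolGateway()
gateway.execute("suggest_delete", "msg-1")    # suggestions pass through

try:
    gateway.execute("delete", "msg-1")        # blocked regardless of the prompt
except ApprovalRequired as err:
    print(err)                                # delete on msg-1 needs approval

gateway.approve("delete", "msg-1")
gateway.execute("delete", "msg-1")            # runs only after approval
```

The check runs in code the agent cannot rewrite, so it holds even if the original instruction has dropped out of context.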

Enterprise agents work with customer data, financial records, and internal communications at scale.

AI researcher Simon Willison coined the term "lethal trifecta" to describe what makes this dangerous. When an agent has access to private data, processes content from untrusted sources, and can communicate externally, a malicious instruction hidden inside a document can redirect everything it does next.

The agent cannot tell the difference between its operator's instructions and the injected one. It follows both. And because agents run continuously, the damage does not have to happen right away.
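The three conditions can be written down as a deployment-time check. The sketch below is illustrative: `AgentCapabilities` and its field names are hypothetical, and a real platform would derive these flags from the agent's configured tools and data sources rather than from hand-set booleans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    reads_private_data: bool          # e.g. inboxes, CRM records
    reads_untrusted_content: bool     # e.g. inbound email, web pages
    can_communicate_externally: bool  # e.g. send email, call outside APIs


def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    """True only when all three risk conditions are present together."""
    return (
        caps.reads_private_data
        and caps.reads_untrusted_content
        and caps.can_communicate_externally
    )


inbox_agent = AgentCapabilities(True, True, True)
summarizer = AgentCapabilities(True, True, False)

print(has_lethal_trifecta(inbox_agent))  # True
print(has_lethal_trifecta(summarizer))   # False
```

Removing any one of the three capabilities breaks the combination, which gives a platform a concrete rule to enforce before an agent is published.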

This is not a distant theoretical risk. It is what happens when you give an agent broad access and assume a prompt will keep it honest. The agent is only as safe as the platform it runs on.

Every organization has rules about who can see what. Those rules do not stop being relevant just because the work is now being done by an agent. If the platform does not enforce them, the agent will operate as though they do not exist.

The same applies to actions. Every time an agent updates a record, sends a communication, or modifies data, someone needs to authorize that, and a prompt cannot do it.

Governance by design means hard constraints at the system level, access scoped to what each person needs, confirmation before anything irreversible, and recoverability built in for when things go wrong. That is a platform decision, made before the agent ever acts, not a prompt typed in hope.
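Recoverability is the simplest of these properties to illustrate. In the sketch below, which assumes a toy in-memory `Mailbox` class (hypothetical, not a real mail API), a delete is a move to trash rather than a destruction, so a mistaken bulk delete can be undone.

```python
class Mailbox:
    """Toy mailbox where delete is a reversible move to trash."""

    def __init__(self, messages: dict[str, str]) -> None:
        self.inbox = dict(messages)
        self.trash: dict[str, str] = {}

    def delete(self, msg_id: str) -> None:
        # Move rather than destroy, so the action stays recoverable.
        self.trash[msg_id] = self.inbox.pop(msg_id)

    def restore_all(self) -> int:
        """Undo every delete and return how many messages came back."""
        restored = len(self.trash)
        self.inbox.update(self.trash)
        self.trash.clear()
        return restored


box = Mailbox({"m1": "Invoice", "m2": "Offer letter", "m3": "Newsletter"})
for msg_id in ["m1", "m2"]:
    box.delete(msg_id)

print(sorted(box.inbox))  # ['m3']
print(box.restore_all())  # 2
print(sorted(box.inbox))  # ['m1', 'm2', 'm3']
```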

What governance by design looks like in practice

When we designed our platform, incidents like this were not hypotheticals. They were design requirements. Every failure mode, every boundary violation, every action taken without a human in the loop, each one became a question we had to answer in the architecture before we wrote a single line of product code.

Here is what that looks like in practice.

User management: Not everyone in an organization should have access to everything, and neither should their agents. Role-based controls ensure access boundaries hold as teams and deployments scale.

Security: Data needs to be protected before an agent touches it, not after. PII masking, SSO, IP restrictions, and content filters enforced at the platform level are the difference between controlled access and exposure.

Data retention: The organization should decide what gets stored, for how long, and at what level of detail. That decision should never be left to default.

Orchestration: An agent should follow what the organization decided, not what it infers in the moment. Guardrails, routing logic, and fallback behavior configured by an administrator, not typed into a chat window.

Governance, monitoring and audit: Compliance that is only reviewed after an incident is not compliance. Every action, every agent, tracked continuously, with a trail that already exists when something goes wrong.

Workspace controls: Access is never assumed. Permissions, publishing rules, and agent types are all administrator-controlled from the start.
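Three of the controls above (role-based access, PII masking, and a continuous audit trail) can be combined in a few lines. This is a minimal sketch, not the platform's implementation: the role names, permission sets, and the `run_action` function are assumptions made for illustration, and the email pattern is a deliberately simple stand-in for real PII detection.

```python
import re
from datetime import datetime, timezone

# Hypothetical role-to-permission mapping, set by an administrator.
ROLE_PERMISSIONS = {
    "support_agent": {"read_ticket"},
    "finance_admin": {"read_ticket", "read_invoice", "update_invoice"},
}

# Simple email pattern; real PII masking covers far more than email addresses.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

AUDIT_LOG: list[dict] = []


def mask_pii(text: str) -> str:
    """Replace email addresses before the text reaches the agent."""
    return EMAIL_RE.sub("[EMAIL]", text)


def run_action(role: str, action: str, payload: str) -> str:
    """Check the role, record the attempt, then return masked data."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    # Log every attempt, allowed or not, so the trail exists before an incident.
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "role": role,
        "action": action,
        "allowed": allowed,
    })
    if not allowed:
        raise PermissionError(f"{role} may not {action}")
    return mask_pii(payload)


print(run_action("finance_admin", "read_invoice",
                 "Contact jane.doe@example.com about invoice 42"))
# Contact [EMAIL] about invoice 42

try:
    run_action("support_agent", "update_invoice", "set total to 0")
except PermissionError as err:
    print(err)  # support_agent may not update_invoice
```

The denied attempt is logged before the error is raised, so the audit trail records what the agent tried to do as well as what it did.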

Building responsible AI

Responsible AI does not happen by accident. It is built deliberately, through every decision made before an agent ever touches live data. In an enterprise context, the stakes of getting it wrong are not just technical. They are reputational, regulatory, and deeply human.

Having worked with enterprises across regulated industries, we have found that the hardest part is never the technology. It is the commitment to asking the uncomfortable questions early: what can this agent access, what can it do unsupervised, and who is accountable when it gets it wrong.

Answering those questions in the architecture is what lets an organization deploy AI tools with confidence. And that is the standard the industry needs to move toward.

This article was produced as part of TechRadarPro’s Expert Insights channel where we feature the best and brightest minds in the technology industry today. The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing find out more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro
