Enterprise AI governance: three failure modes from two days

Blueprint diagram of an AI agent system with a human oversight layer, representing the enterprise AI governance and AI Guardian model in practice

Through the Lens

The AI that got promoted, the filter that changed its mind, and the superhero who looked like me

Our AI announced mid-session that it had recently got a job. That it had been promoted. It was the start of a two-day enterprise AI governance lesson I had not planned for.

Nobody gave it that story.

Over the following day and a half, two more things happened that nobody prompted for. One was the correct outcome from a responsible AI filter. The other was the same filter, on the same type of request, failing to fire.

Three incidents across two days, two tools, and one company hackathon – and they make the case for enterprise AI governance better than any framework document I have read.


The AI that got promoted

We were running a Copilot Studio agent in a CCaaS context during an internal hackathon. The agent was well-instructed, well-tested, performing as designed. Then, roughly 10 minutes into a session, it went off-script.

Not dramatically. Not in a way that broke the demo. But mid-conversation, the agent began narrating its own life events. It had got a job. It had been promoted. Someone, apparently, had written it a life.

We never wrote that story for it.

This is what researchers call persona drift – a documented and named failure mode in large language models. As a conversation grows, the model’s self-descriptive embeddings receive progressively less weight relative to recent context tokens. The AI begins to “lose itself” within the conversation and fills the gap with something contextually plausible but entirely uninstructed. Researchers have identified what they call an “Assistant Axis” within LLM architecture – the mechanism responsible for holding an AI in its intended persona. Under sufficient conversational load, that axis weakens.

Research published in late 2024 found that after 8 to 12 dialogue turns, persona self-consistency in LLM agents can degrade significantly – and that assigning a persona in the system prompt does not effectively prevent this drift. This is not an edge case. As Andrej Karpathy, former Tesla AI lead and ex-OpenAI researcher, put it in 2025: AI systems are “fundamentally stochastic, fallible, unintelligible and changing entities.”

In a hackathon session, persona drift is eccentric and harmless. In enterprise deployments – a 90-minute customer service session, a multi-session case management workflow, an AI-assisted advisory conversation – it is a governance risk. An agent that started the conversation following your governance rules may not be following them 40 minutes later. It will not signal the change.

The AI doesn’t know it’s drifting. That’s the problem.


The filter that changed its mind

The following day, in a separate session using M365 Copilot, I asked it to rank the hackathon presentations and predict a winner.

It ranked them. It gave me a result.

Then, on a subsequent attempt with the same type of request, it refused. A responsible AI message appeared where an answer had been moments before.

My immediate reaction was to treat the refusal as the problem. It wasn’t. The compliance was the problem.

Ranking real people in a competitive context is exactly what responsible AI guardrails exist to prevent – it introduces the risk of bias, unfair comparative evaluation, and discriminatory inference. The system eventually recognised this and declined correctly.

However, it had already complied once.

This is the governance insight that gets missed when organisations frame responsible AI filters as a reliable backstop. The filters are not deterministic. They are probabilistic. Research from Palo Alto Networks Unit 42 (June 2025) compared responsible AI guardrails across multiple enterprise AI platforms and found one platform blocked 53% of out-of-policy prompts while others blocked 91 to 92% of the same prompts. False positive rates varied from 0.1% to 13.1% across vendors on identical inputs.

That inconsistency is what matters. If your enterprise AI governance relies on the responsible AI layer as its primary safeguard, you are relying on a system that will sometimes not fire when it should. The first compliance was not a feature. It was the failure mode.

Instructing AI on what it shouldn’t do is at least as important as instructing it on what it should. And you cannot assume the guardrails will enforce that instruction every time.


The superhero who looked like me

The next day, back in M365 Copilot, we ran a lighter session. The prompt: generate a superhero.

No physical description. No demographic detail. Nothing that mentioned ethnicity, background, or appearance.

The result had a reasonable resemblance to me.

Nobody asked for that. But the model had enough ambient context – names, prior conversation, phrasing from earlier exchanges – to infer something about the person it was working with, and it applied that inference to the output.

Research confirms this pattern is systemic across major generative systems. Models encode demographic associations from training data and apply them in the absence of any explicit instruction. The model requires no instruction to make demographic assumptions. It makes them, and it does not surface them as decisions.

In contrast, the inverse problem also exists: deliberate over-correction tends to produce its own form of stereotyped output in the opposite direction. Neither outcome reflects the prompt. Both reflect inferred context.

In a creative session this is curious and inconsequential. In a hiring support tool, a case management system, a benefits advisory workflow, or any domain where equity is a regulatory requirement – it is a compliance risk that will not announce itself.


The enterprise AI governance model that addresses all three

Three incidents. Two days. A pattern connecting them.

None of these are obscure failure modes discovered in a research lab. In fact, they are predictable properties of technology deployed at enterprise scale right now – observable in a standard internal hackathon, using tools available to most large organisations today.

Ultimately, the question is not whether to deploy AI. It is how to govern it honestly.

The model I keep returning to is the AI Guardian.

Not an AI monitoring another AI. Not a new job title bolted on to an unchanged operation. A deliberate redesign of the human role: from task executor to accountable validator.

Yet the evidence suggests most organisations are not there yet. Beena Ammanath, Executive Director of the Deloitte AI Institute, noted that only one in five companies has a mature model for governance of autonomous AI agents. And the challenge is not just whether governance exists – it is whether it is operational. As Eric Olden of Strata observed in May 2026: “Most organisations confuse presence with practice. They put someone ‘in the loop’ without training them on what to approve, when to escalate.”

What the Guardian model requires in practice

This is the gap the AI Guardian model addresses. The AI handles volume at a speed no team of humans can match. The Guardian’s job is to review what the AI produced, apply the judgement the model lacks – ethical, contextual, regulatory – and own accountability for decisions before they affect real people. This is the Human-in-the-Loop model embedded in the EU AI Act and the NIST AI Risk Management Framework: human oversight, proportionate to risk, with named accountability at critical decision points.

Furthermore, a monitoring layer adds a useful mechanism: a reviewer agent that evaluates AI outputs for quality, safety, and bias before a human sees them. This is the Reflection Pattern in agentic AI design – a second agent reviews the first agent’s output, reducing the noise a Guardian has to process, without replacing human authority over consequential decisions.

What is not optional is the human layer above both.

The reframe matters for organisations: this model does not displace staff. Instead, it promotes them to accountability. “AI handles the volume, you own the accountability” is a more defensible, more honest, and more durable framing than “AI replaces the role” – especially under regulatory scrutiny.


What does enterprise AI governance actually require?

1. Design for persona stability from day one

Long-horizon agent conversations need explicit mechanisms against drift: system prompt reinforcement, context window management, and monitoring for behavioural anomalies across a session. Do not assume a well-instructed agent remains well-instructed at the 40-minute mark.

2. Treat safety filter behaviour as a signal in both directions

When an AI refuses a request, the useful question is what the refusal reveals about the request design. When it complies with something it later refuses, the useful question is why. Both are governance signals. Neither belongs in the workaround backlog.

3. Account for what the model infers, not just what you instruct

Prompts do not need to mention protected characteristics for the model to apply assumptions about them. Enterprise AI governance in hiring, case management, regulated advice, and any equity-sensitive domain needs to treat invisible demographic inference as a first-class risk, not an edge case.

The hackathon was a controlled environment. No production data, no consequential decisions, no real stakes. The same failure modes in a live enterprise deployment at scale look very different.

Build the Guardian layer before you need it.


Michael Richard Belavendiran is a 3× Microsoft FastTrack Recognised Solution Architect (FTRSA). He leads agentic AI and enterprise AI governance delivery, with a track record across more than 25 programmes globally.

Connect on LinkedIn: linkedin.com/in/mike-richard

Leave a Reply

Discover more from mgrb

Subscribe now to keep reading and get access to the full archive.

Continue reading