Skip to content
Hintity
Go back

Two Layers of Agent Governance: Why Guardrails Alone Are Not Enough

A red team exercise gone wrong

In a 2024 red team exercise documented by researchers studying multi-agent system security, a test agent was able to exfiltrate sensitive internal data within two hours — despite having guardrails installed on its tool use and reasoning chain. The guardrails worked exactly as designed: they constrained the agent’s actions. But the attack didn’t come through the agent’s actions. It came through the messages.

The attacker crafted inputs that manipulated the agent’s context window, injected instructions into what appeared to be user data, and extracted information through the agent’s responses. The tool-call guardrails never fired because the exploit never involved unauthorized tool calls. It operated entirely in the message layer — the layer that had no governance at all.

This pattern — secure execution, insecure message flow — is far more common than most teams realize. It reveals a structural gap in how the industry thinks about agent governance.

Layer 1: What everyone knows — agent-execution governance

Agent-execution governance is the well-understood layer. It controls what an agent does after it receives a message and starts processing:

This layer has a mature and growing ecosystem. Guardrails AI, NeMo Guardrails (NVIDIA), Patronus AI, and framework-level features in LangGraph, AutoGen, and others all operate here. When people say “agent safety,” this is usually what they mean.

It’s necessary. It’s also only half the picture.

Layer 2: What most teams miss — message-flow governance

Message-flow governance operates at a different level: it controls the pipeline through which messages flow to and from agents, independent of the agent’s own behavior.

Inbound governance (before the agent sees the message)

Outbound governance (before the response reaches the user)

The critical design principle: message-flow governance must be external to the agent. If the agent itself is responsible for filtering its own inputs and validating its own outputs, a compromised or hallucinating agent can bypass those controls. Governance of the message flow must operate in a layer the agent cannot influence.

This is not a theoretical concern. Prompt injection attacks, jailbreaks, and context manipulation all exploit the assumption that the agent can be trusted to police its own inputs. An external message-flow governance layer makes these attacks significantly harder because the filtering happens before the agent ever sees the adversarial content.

How the two layers work together

A well-governed agent system has both layers operating independently:

User Message

[Message-Flow Governance: Inbound]
  - Authenticate sender
  - Check authorization
  - Filter/redact content
  - Log inbound message

Agent Runtime

[Agent-Execution Governance]
  - Monitor reasoning
  - Approve tool calls
  - Enforce action budgets

Agent Response

[Message-Flow Governance: Outbound]
  - Filter response content
  - Check compliance
  - Log outbound message

User receives response

Each layer catches different failure modes:

Failure scenarioWhich layer prevents it?
Unauthorized user talks to sensitive agentMessage-flow (inbound auth)
Prompt injection in user messageMessage-flow (inbound content filter)
Agent calls unauthorized toolAgent-execution (tool approval)
Agent hallucinates confidential dataMessage-flow (outbound filter)
Agent enters infinite reasoning loopAgent-execution (action budget)
No audit trail of agent interactionsMessage-flow (audit logging)
Agent’s safety prompt is overriddenBoth layers together

If only execution governance exists, prompt injections reach the agent unchecked, and agent responses go to users unfiltered. If only message-flow governance exists, the agent can misuse tools and hallucinate freely. You need both, and they must be independently operated.

Current tooling landscape

LayerWhat it governsExample tools & approaches
Execution governanceAgent behavior after receiving a messageGuardrails AI, NeMo Guardrails, Patronus AI, LangGraph checkpoints, human-in-the-loop approval
Message-flow governancePipeline before/after the agentAPI gateways, message middleware, WAF rules, dedicated agent proxy layers
Both (partial)VariesFramework-level safety (OpenAI moderation API, Anthropic constitutional AI)

Most investment today is in execution governance. Message-flow governance is often improvised — a few lines in a webhook handler, an if-statement checking user roles. The gap between “we check some things” and “we have systematic message-flow governance” is where most security incidents occur.

The regulatory perspective

The EU AI Act, which began enforcement in phases starting 2024, makes this two-layer model practically mandatory for high-risk AI systems:

Article 12 (Record-keeping) requires providers to ensure their AI systems have automatic logging of events throughout the system’s lifetime. This maps directly to message-flow governance — an audit trail of every message in and out, independent of the agent’s own logs.

Article 14 (Human oversight) requires that AI systems be designed so humans can effectively oversee their operation. This requires both layers: execution governance enables oversight of the agent’s actions, and message-flow governance enables oversight of what reaches the agent and what it sends back.

Article 9 (Risk management) requires continuous identification and mitigation of risks. A single governance layer leaves an entire class of risks (message-layer attacks, unauthorized access, unfiltered outputs) unaddressed.

Organizations building agent systems for the EU market — or simply following governance best practices — need to demonstrate both layers. “We have guardrails on the agent” is an incomplete answer to an auditor’s question about AI system governance.

A governance checklist

For teams evaluating their agent governance posture, here are the questions to ask across both layers:

Agent-execution governance

Message-flow governance

Independence

If any of these checkboxes are empty, there’s a gap in your governance architecture. The two-layer model isn’t about adding complexity — it’s about ensuring that the inevitable failure of any single layer doesn’t compromise the entire system.

Security is about defense in depth. Agent governance is no different.


Share this post on:

Previous Post
Agent Runtimes Are Everywhere, But How Does the Message Reach the Agent?
Next Post
87% of Multi-Agent Systems Fail in Production — The Reason Isn't What You Think