When a new AI agent gets deployed in your organization, your security team typically asks one question: can we see what it’s doing?
That’s the right instinct. It’s also only a third of the problem.
There are three distinct layers of compliance for AI agents, and most governance tools — including the dashboards your vendors are currently pitching — only address the first. Understanding the difference between them is not an academic exercise. For enterprises deploying agents in regulated environments, it is the difference between a defensible compliance posture and an optimistic bet.
Layer 1: Monitoring
Monitoring means recording what happened. Step-level logs, audit trails, a dashboard showing which agent called which tool with which input. This is valuable and necessary. It is also entirely retrospective.
By the time you see the log entry, the agent has already made the external API call, written to the database, or sent the email containing the customer’s PII. Monitoring answers the question: what did the agent do?
For EU AI Act compliance, monitoring is the floor, not the ceiling. Article 12 requires logging. It does not say logging is sufficient.
A useful analogy: a flight recorder is essential for understanding what went wrong. It does not prevent the crash.
Layer 2: Incident generation
The next layer is policy evaluation — checking each step against a defined set of rules and generating an incident when a rule is violated. This is meaningfully better than monitoring alone. You get alerted when an agent calls an external service it shouldn’t, when it outputs content that violates your data handling policy, or when it operates outside its declared purpose.
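The log-evaluate-alert pattern is simple to sketch. The following is a minimal illustration, not any vendor's actual product: the rule names, the `Incident` record, and the allowlist are all invented for the example. Note that the evaluation runs after the step has executed.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical policy rule: agents may only call approved services.
ALLOWED_SERVICES = {"internal-crm", "internal-search"}

@dataclass
class Incident:
    agent_id: str
    rule: str
    detail: str
    timestamp: str

def evaluate_step(agent_id: str, service: str, incidents: list) -> None:
    """Log-evaluate-alert: the step has already run; we only record violations."""
    if service not in ALLOWED_SERVICES:
        incidents.append(Incident(
            agent_id=agent_id,
            rule="external-service-allowlist",
            detail=f"called unapproved service '{service}'",
            timestamp=datetime.now(timezone.utc).isoformat(),
        ))

incidents = []
evaluate_step("support-agent-7", "external-email-api", incidents)
print(len(incidents))  # one incident on record -- after the call was made
```

The incident exists, the alert fires, but nothing stopped the call itself. That gap is the subject of the next layer.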
But notice: incident generation is still reactive. The violation has already occurred. The incident is a record of something that went wrong, not a prevention of it.
Incident generation answers: which policy did the agent violate, and when?
This is where most current AI governance products operate. They vary in sophistication — some have richer policy libraries, some have better dashboards, some integrate more cleanly with specific platforms — but the fundamental model is the same. Log, evaluate, alert. React after the fact.
For a CISO asking whether their organization can demonstrate compliance with the EU AI Act’s requirements on risk management and human oversight, reactive incident generation is necessary but not sufficient. An auditor asking whether you have adequate controls in place will not be satisfied by a system that records every violation but prevents none.
Layer 3: Runtime intervention
The third layer is where compliance becomes enforcement.
Before executing a step, the agent submits its prospective action for evaluation. The policy engine assesses whether the step would violate any applicable rules and returns the result immediately — before the action is taken. If a policy would be violated, the agent is told before it acts. It can halt, reroute, or escalate to a human. The action never happens.
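The shape of that pre-flight decision can be sketched in a few lines. Everything here is illustrative: the tool names, the verdicts, and the two rules are assumptions made for the example, not a real policy engine.

```python
from enum import Enum

class Verdict(Enum):
    ALLOW = "allow"        # clear to proceed
    BLOCK = "block"        # halt: the step never executes
    ESCALATE = "escalate"  # pause and route to a human

# Hypothetical rules: outbound email is blocked outright; production
# database writes require a human in the loop.
def preflight(tool: str, target: str) -> Verdict:
    """Evaluate a prospective step BEFORE it executes."""
    if tool == "email" and target == "external":
        return Verdict.BLOCK
    if tool == "db_write" and target == "production":
        return Verdict.ESCALATE
    return Verdict.ALLOW

# The agent consults the verdict before acting; a blocked step never runs.
print(preflight("email", "external").value)   # block
print(preflight("search", "internal").value)  # allow
```

The essential property is ordering: the evaluation happens before execution, so a `BLOCK` verdict means the action simply does not occur.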
This is the difference between an audit and a control. One tells you what went wrong. The other prevents it from going wrong in the first place.
Runtime intervention answers: should the agent be allowed to take this step at all?
Note that the log entry from a failed pre-flight check is itself a compliance artifact. It records not just what the agent did, but what it was about to do and why it was stopped. For a regulator asking for evidence of active risk management measures — as the EU AI Act requires for high-risk systems — that trail is far more compelling than a log of violations that were committed and noted.

Where enforcement actually happens
This is where architecture matters. Enforcement can happen at several levels in the stack, and each has meaningfully different properties.
Prompt level. You instruct the model to refuse certain requests: “never send an email on behalf of the user without explicit confirmation,” “do not access data outside folder X.” This is understandable as a starting point and almost entirely ineffective as a security mechanism.
Prompts are instructions, not enforcement. An agent told not to send emails can still send emails if the underlying tool access permits it. A future model version may interpret the instruction differently. A sufficiently complex multi-step task may involve email as an intermediate step in ways the instruction didn’t anticipate. Prompt-level governance is the equivalent of writing “please don’t speed” on a car’s dashboard and calling it a safety feature.
Tool level. You restrict which tools an agent can invoke — no access to the external email API, no write permissions to production databases. This is more robust than prompts and genuinely useful for access control. But it doesn’t cover what the agent does with the tools it has access to, or whether the way it uses those tools complies with your policies. Access control and compliance are related but distinct problems.
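A tool-level allowlist is straightforward to illustrate. The registry below is a hypothetical sketch, not a real framework API, but it shows both what this layer does well and where it stops: it can deny a tool entirely, yet it has no visibility into how a permitted tool is used.

```python
# Hypothetical tool registry: access control at the tool level.
class ToolRegistry:
    def __init__(self, allowed):
        self._allowed = set(allowed)
        self._tools = {}

    def register(self, name, fn):
        self._tools[name] = fn

    def invoke(self, name, *args):
        # Enforcement point: unlisted tools cannot be called at all.
        if name not in self._allowed:
            raise PermissionError(f"tool '{name}' not permitted for this agent")
        # But note: a permitted tool runs with no inspection of its arguments.
        return self._tools[name](*args)

registry = ToolRegistry(allowed={"search"})
registry.register("search", lambda q: f"results for {q}")
registry.register("send_email", lambda to: f"sent to {to}")

print(registry.invoke("search", "billing policy"))     # permitted
try:
    registry.invoke("send_email", "user@example.com")  # denied: not on allowlist
except PermissionError as e:
    print(e)
```

The allowlist never sees whether the `search` call leaks sensitive data in its query string. Access control answers "which tools", not "used how" — which is exactly the gap the next level addresses.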
Agent code level. You wrap the agent’s execution loop so that every step is evaluated before it runs. Policy evaluation happens in the infrastructure, not in the conversation. The model cannot “decide” to skip it. This is enforcement that operates independently of what the agent believes it should do.
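The structural point is easiest to see in code. The sketch below is a deliberately minimal stand-in — the function names and the toy policy are invented, and this is not any particular SDK — but it shows why the model cannot opt out: the check lives in the loop that executes steps, not in the prompt.

```python
# Minimal sketch of code-level enforcement: the loop, not the model,
# decides whether a step runs. All names here are illustrative.

def check_next_step(step):
    """Stand-in policy engine: block outbound email entirely."""
    return step["tool"] != "send_email"

def run_agent(plan, execute):
    results = []
    for step in plan:                      # every step passes through the check
        if not check_next_step(step):      # evaluated in infrastructure,
            results.append(("blocked", step["tool"]))
            continue                       # the model cannot skip this branch
        results.append(("done", execute(step)))
    return results

plan = [{"tool": "search", "query": "refund policy"},
        {"tool": "send_email", "to": "customer"}]
print(run_agent(plan, execute=lambda s: s["tool"]))
# [('done', 'search'), ('blocked', 'send_email')]
```

Whatever the model "believes" about its instructions, the `send_email` step is blocked before `execute` is ever reached. That is the difference between instructing an agent and constraining one.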
Kyvvu operates at the agent code level, across frameworks. The SDK wraps execution between steps. A CHECK_NEXT_STEP call submits the prospective action before it is committed. The result comes back: clear to proceed, or violation detected. The compliance layer is not advisory. It is structural.
To make this concrete: imagine an agent handling customer service requests. It is about to call an external email API to send a response containing a customer’s account number. Before it makes that call, it submits the prospective step for evaluation. The policy engine checks it against your data handling rules and returns a violation: PII in outbound external call, not permitted. The agent never sends the email. The step is blocked, the incident is logged, and a human is notified. The customer’s data was never exposed. That sequence — check, block, log, notify — happens in milliseconds, between steps, without any change to the agent’s underlying logic.
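That check-block-log-notify sequence can be sketched for this specific scenario. The detector below is a toy — real PII detection is far more involved, and the account-number pattern, field names, and notification target are all assumptions for the example.

```python
import re

# Toy PII detector: flags 10-digit account numbers in outbound payloads.
ACCOUNT_RE = re.compile(r"\b\d{10}\b")

def check_outbound(payload: str, destination: str) -> dict:
    """Pre-flight check for one prospective step: check, block, log, notify."""
    if destination == "external" and ACCOUNT_RE.search(payload):
        return {"allowed": False,                          # block the step
                "incident": "PII in outbound external call",  # log the reason
                "notify": "compliance-oncall"}             # route to a human
    return {"allowed": True}

result = check_outbound("Your account 1234567890 is active.", "external")
print(result["allowed"], result["incident"])
# False PII in outbound external call
```

Because the check runs on the prospective payload, the email containing the account number is never sent; what gets logged is the attempt and the block, not an exposure.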
Unique challenges agents bring — and why we should be prepared
Any honest treatment of agent compliance has to acknowledge something that is easy to gloss over: AI agents introduce a category of challenge that traditional software governance was not designed for, and that we are still learning to address.
Consider what happens when an agent is granted broad permissions. An agent with shell access and the ability to write files can, in principle, modify its own execution environment — including the very rules hard-coded to control it. More generally, any agent that can extend its own capabilities can do so in ways that were not anticipated at deployment time.
None of this means that runtime enforcement is futile. It means that enforcement infrastructure needs to be designed with these properties in mind — and that permission boundaries need to be set conservatively, with shell access and self-modification capabilities treated as extremely high-risk operations requiring explicit policy approval.
These are not reasons to delay agent deployment. They are requirements for deploying agents responsibly.
The EU AI Act’s requirements around risk management, human oversight, and audit trails are not arbitrary bureaucratic constraints. They are a reasonable response to exactly these challenges. The organizations that read those requirements as a checklist to satisfy will build fragile compliance postures. The organizations that read them as a genuine framework for deploying capable systems safely will build something more durable.
What this means in practice
For enterprises deploying AI agents in regulated environments today, the practical implication is this: monitoring and incident generation are necessary starting points, but they are not a compliance posture. A system that records every violation but prevents none does not constitute the “appropriate technical and organisational measures” that the EU AI Act requires for high-risk systems.
The shift from monitoring to enforcement is architectural. It requires wrapping agent execution at the code level, defining policies that are evaluated before steps execute, and ensuring that the enforcement layer operates outside the agent’s own permission scope.
It also requires thinking seriously about what your agents are actually permitted to do — not just what they are instructed to do. The permission boundary is the real security boundary. Compliance infrastructure that an agent can circumvent is not compliance infrastructure. It is a suggestion dressed up as a control.
The good news is that this infrastructure exists. The harder work is making the architectural decision to use it — before the agents you deploy are operating at a scale where retrofitting is no longer realistic.
Kyvvu provides runtime compliance infrastructure for AI agents. We are currently running pilots with enterprise clients in financial services and healthcare. If you are navigating agent deployment in a regulated environment, we would be glad to share what we are learning.
More on AI agents at work: theaiagentbook.com