An AI agent just sent a report to an external recipient it shouldn’t have contacted. Your CI/CD pipeline passed every check. Your deployment was clean. Your access control logs show nothing unusual — the agent had read access to the database and write access to email, both legitimately granted.
So what failed?
Nothing failed in the traditional sense. The violation wasn’t in any individual action. It was in the sequence.
The Governance Boundary
CI/CD platforms have transformed how organizations ship software. Build, test, validate, deploy — all governed, all auditable, increasingly automated. It is genuinely impressive infrastructure, and enterprise teams have invested heavily in getting it right.
But there is a boundary at which that governance stops: the moment of deployment.
For traditional software, that boundary is fine. Code ships, code runs deterministically, the behavior you tested is the behavior you get. The pipeline governs everything that matters.
AI agents are different. They don’t stop at deployment — they act. Repeatedly, autonomously, against real systems, with outputs determined at runtime by a language model rather than by code you reviewed. The same agent, the same task, may follow different sequences on different runs. Some of those sequences will be fine. Some will cross a line that no pipeline check could have anticipated, because the line depends not on what any individual action does, but on what came before it.
Commit → [CI/CD governs here] → Deploy → Agent acts → [nothing governs here]
That gap is where most enterprise agent deployments currently live.
Two Teams, One Problem
Here is what makes this gap interesting: two very different groups are running into it from opposite directions.
CISOs and compliance teams are being asked by boards and regulators to demonstrate that their AI agents are operating within policy. They reach for audit logs. They find records of what agents did, but no evidence that governance was active — no trail showing that policies were evaluated, that violations were caught, that the organization was in control. The EU AI Act’s provisions for high-risk AI systems take effect in August 2026. Article 12 requires automatic logging. Article 14 requires human oversight mechanisms. “Our pipeline passed” satisfies neither.
Platform and DevOps teams are extending their CI/CD pipelines to deploy agents and asking: what do we do after deploy? They reach for observability tools. They get dashboards of what happened. They do not get enforcement — nothing that actually evaluates an agent action before it executes and stops it if it violates policy.
Same gap. Completely different vocabulary. And no one is currently bridging it.
Runtime governance is the bridge.
What Runtime Governance Actually Means
Runtime governance is not observability. Observability looks back. Runtime governance intercepts each proposed action before it executes, evaluates it against organizational policy in the context of everything the agent has done so far in that task, and returns a decision: Pass, Steer, or Block.
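The intercept-evaluate-decide loop can be sketched in a few lines of Python. This is an illustrative sketch, not Kyvvu's actual API: names like `GovernedExecutor` and the `Decision` enum are invented for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable

class Decision(Enum):
    PASS = "pass"    # action executes as proposed
    STEER = "steer"  # action executes in a modified, compliant form (elided here)
    BLOCK = "block"  # action is refused before it ever reaches the tool

@dataclass
class Action:
    tool: str
    args: dict

@dataclass
class GovernedExecutor:
    """Intercepts every proposed action and evaluates it against policy
    in the context of the task's execution path so far."""
    policy: Callable[[list, Action], Decision]
    path: list = field(default_factory=list)

    def execute(self, action: Action, run_tool: Callable[[Action], object]):
        decision = self.policy(self.path, action)
        if decision is Decision.BLOCK:
            raise PermissionError(f"blocked by policy: {action.tool}")
        result = run_tool(action)  # a STEER decision would rewrite `action` first
        self.path.append(action)
        return result
```

The essential property is that `policy` receives the path, not just the action: a database read followed by an email send can be blocked even though either action alone would pass.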
The “context of everything the agent has done so far” is the key phrase. The reason the data exfiltration scenario at the top of this post is hard to catch isn’t that anyone missed a log entry — it’s that neither the database read nor the email send was individually suspicious. The violation is a property of the sequence.
A policy that can only see individual actions cannot catch sequence violations. A policy that sees the full execution path — what data was accessed, what sensitivity levels were touched, whether required approval steps occurred — can.
This is what we formalize at Kyvvu as the policy function: a deterministic map from agent identity, partial execution path, and proposed next action to a violation score. Access control is a degenerate version of this — it uses only agent identity and action type, ignoring path entirely. Prompting is not a version of it at all — it shifts the distribution over actions without evaluating or enforcing anything. Runtime evaluation is the general case.
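In code, the distinction between the general policy function and its access-control degenerate case is easy to see. A minimal sketch, with hypothetical agent and tool names; the scoring rules here are invented for illustration:

```python
from typing import Callable, Sequence

# The general policy function: a deterministic map from
# (agent identity, partial execution path, proposed action)
# to a violation score in [0, 1].
PolicyFn = Callable[[str, Sequence[str], str], float]

def access_control(agent_id: str, path: Sequence[str], action: str) -> float:
    """Degenerate case: uses only identity and action type; ignores the path."""
    allowed = {"reporting-agent": {"read_db", "send_email"}}
    return 0.0 if action in allowed.get(agent_id, set()) else 1.0

def sequence_policy(agent_id: str, path: Sequence[str], action: str) -> float:
    """General case: an external send after a sensitive read scores as a
    violation, even though each action is individually permitted."""
    if action == "send_email" and "read_db" in path:
        return 1.0
    return access_control(agent_id, path, action)
```

Note that `access_control` would pass the exfiltration sequence from the opening of this post, because both actions are legitimately granted; `sequence_policy` catches it because the path is an input.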
The CI/CD Platform Opportunity
CI/CD platforms already understand governance deeply. Config policies, approval gates, audit logs, deployment controls — the conceptual vocabulary is exactly right. The same enterprises that rely on these platforms to govern their pipelines are now deploying agents into production and discovering that the governance story ends at deployment.
The platform that extends its governance model to cover runtime agent behavior — not just “did the pipeline pass” but “is this agent acting within policy right now, in production” — owns the full lifecycle. Design time to runtime. Pipeline to production.
That is not a feature extension. It is a category expansion. The compliance narrative that enterprises are currently trying to piece together across separate tools — pipeline governance here, observability there, a manual audit process somewhere else — becomes a single coherent story.
What This Looks Like in Practice
A runtime governance layer sits between the agent framework and execution. Every time the agent proposes an action, the policy engine evaluates it:
- Has this agent accessed sensitive data earlier in this task that makes an external communication a potential exfiltration?
- Has the required human approval step occurred before this high-risk action?
- Has this agent exceeded its permitted operating window or step count?
- Does this delegation to a sub-agent cross an information barrier?
Policies that depend only on agent identity can be evaluated at task start, before a single step executes. Policies that depend on the execution path are evaluated per-step, against a compact governance state vector updated incrementally — not expensive, not slow. The audit trail is a natural byproduct: every step, every policy evaluation, every enforcement decision, recorded immutably.
For the CISO: a compliance record that demonstrates active governance, not just passive logging. For the platform team: a governance extension that fits naturally into the deployment and monitoring infrastructure they already operate.
Same infrastructure. Two audiences. One problem solved.
An Honest Note
Runtime governance reduces the cost of agent misbehavior. It does not eliminate it. It catches sequence violations that prompt-level and access-level controls miss. It does not make agents perfectly reliable.
It also requires integration with the agent framework. In programmable environments such as LangGraph or custom Python agents, deep integration is straightforward; in low-code platforms, only shallower integration is possible, and enforcement is correspondingly less complete.
And it doesn’t answer the hardest question, which is organizational rather than technical: who in your enterprise is accountable for the behavior of your agents, and what does accountability mean in practice when the system is partly non-deterministic?
What runtime governance does is give that person something to point to. Evidence that policy was evaluated. Evidence that violations were caught. Evidence that the organization was in control.
That is what August 2026 will require.
Maurits Kaptein is the founder of Kyvvu, a runtime compliance platform for AI agents, and the author of AI Agents at Work. He is a Professor of Applied Causal Inference at TU Eindhoven.
Thinking about runtime governance for your agent deployments? Reach out.