Something happened over the last six weeks that is worth writing down, because the pattern is clearer than any single announcement.

On April 2, Microsoft open-sourced the Agent Governance Toolkit: runtime policy enforcement before an agent acts, sandboxing, identity between agents, and compliance mappings for the EU AI Act, SOC 2, HIPAA, and the OWASP Agentic Top 10. Weeks earlier, AWS had made Bedrock AgentCore Policy generally available: a Cedar-based engine, attached to a gateway, that intercepts every tool call and checks it against centrally managed rules before execution. The phrasing across both is nearly interchangeable: a runtime authorization layer that decides whether an agent is allowed to act before it acts.

We have been saying exactly this for the better part of a year. So our first reaction was the obvious one: the category is real, and the largest software company in the world is now spending its marketing budget to explain it. That is good for everyone building here.

But once you read past the press copy and into the engines themselves, they share one property — and it is the property we deliberately built against.

They evaluate one action at a time

Microsoft’s Agent OS engine advertises a p99 latency under 0.1 milliseconds. That number is only achievable if the engine is stateless — and Microsoft says so plainly. In their own launch post, one of the lessons they list reads “Statelessness enables everything”: making the kernel stateless, they explain, is what let horizontal scaling, containerized deployment, and auditability come naturally. A stateless engine inspects the action in front of it, checks it against a set of rules, returns allow or deny, and forgets. No memory of what the agent did three steps ago. No notion of the path that led here.

This is the same architecture as a stateless packet filter — the kind of firewall the network world relied on in the 1990s and then largely abandoned. A stateless filter can tell you “this packet is addressed to port 443, allow it.” It cannot tell you “this packet claims to be part of a connection that was never opened.” For that you need state. Every serious network firewall has been stateful for thirty years. Agent governance, as shipped this month, is still stateless.

Stateless enforcement answers a narrow question well: is this single action, viewed in isolation, permitted? It is genuinely useful. It catches the obviously-forbidden tool call, the write to a path that should never be written to, the outbound request to a blocked domain. If that is the whole of your threat model, the free toolkit will serve you.
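To make the shape concrete, here is a minimal Python sketch of a stateless per-action check. It is not Microsoft's or AWS's API: the `Action` type, the tool names, and the rule values are all hypothetical, chosen only to mirror the three examples above.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Action:
    """One proposed tool call (hypothetical shape, for illustration only)."""
    tool: str
    target: str = ""


# Illustrative rule data; a real deployment would load these from policy files.
FORBIDDEN_TOOLS = {"drop_database"}
PROTECTED_PATH_PREFIXES = ("/etc/",)
BLOCKED_DOMAINS = {"blocked.example"}


def check_action(action: Action) -> bool:
    """Return True to allow, False to deny.

    The decision depends only on the action passed in. Nothing is stored
    between calls, so the function cannot know what the agent did earlier.
    """
    if action.tool in FORBIDDEN_TOOLS:
        return False
    if action.tool == "write_file" and action.target.startswith(PROTECTED_PATH_PREFIXES):
        return False
    if action.tool == "http_request" and urlparse(action.target).hostname in BLOCKED_DOMAINS:
        return False
    return True
```

Because `check_action` keeps no memory, calling it forty times in a row with the same `read_record` action returns the same answer forty times.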

The trouble is that the failures that actually hurt do not look forbidden in isolation.

The dangerous actions are individually innocent

Consider the failure modes that have made the news this year. An agent reads a record, then reads another, then another, and exfiltrates a dataset — each read perfectly legal on its own. An agent encounters what it reads as a config error and “fixes” it by deleting a database; the DELETE is issued by a legitimately authenticated principal and looks, locally, valid. An agent is told in an injected tool description to call a function it would never normally reach, and the call itself violates no per-action rule.

None of these is caught by asking “is this one action allowed?” Each is only caught by asking “is this action allowed given everything that came before it?” That is a question about the path, not the action. And a stateless engine cannot, by construction, ask it.

This is what we mean when we describe Kyvvu as policies on paths. The engine carries the agent’s execution history and evaluates each step against it. A rule can say: this verb is forbidden if it follows that sequence; this read is fine once but not the fortieth time in ninety seconds; this write requires that a human-approval step appears earlier in the path. The cost of carrying that state is real (we have written before about the hot-path tax), but it is smaller than the architecture suggests: evaluating 100 policies against a 50-step history runs at 296µs p99, in-process. Stateful and still sub-millisecond. The state buys you the ability to refuse the failures that matter.
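The three kinds of rule named above can be sketched in plain Python. This is an illustration of the idea, not Kyvvu's actual API: `Step`, `PathEngine`, the verb names, and the choice to record only allowed steps are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Step:
    """One step on the agent's path (hypothetical shape)."""
    verb: str
    target: str
    at: float  # timestamp in seconds, supplied by the caller


History = List[Step]
Policy = Callable[[History, Step], bool]  # True means allow


def no_send_after_secret_read(history: History, step: Step) -> bool:
    """Forbid a verb when it follows a given earlier step."""
    if step.verb != "send_external":
        return True
    return not any(s.verb == "read_secret" for s in history)


def read_rate_limit(history: History, step: Step) -> bool:
    """Allow reads, but deny the fortieth one inside a ninety-second window."""
    if step.verb != "read_record":
        return True
    recent = sum(
        1 for s in history
        if s.verb == "read_record" and step.at - s.at <= 90.0
    )
    return recent + 1 < 40


def write_needs_approval(history: History, step: Step) -> bool:
    """Require a human-approval step somewhere earlier in the path."""
    if step.verb != "write_ledger":
        return True
    return any(s.verb == "human_approval" for s in history)


class PathEngine:
    """Deterministic allow/deny over the current step plus the path so far."""

    def __init__(self, policies: List[Policy]) -> None:
        self.policies = list(policies)
        self.history: History = []

    def evaluate(self, step: Step) -> bool:
        allowed = all(policy(self.history, step) for policy in self.policies)
        if allowed:
            # Assumption: only steps that were allowed to run join the path.
            self.history.append(step)
        return allowed
```

Each policy is an ordinary boolean function of the history and the proposed step, so the decision stays categorical: the same path and the same step always produce the same answer.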

A second cut: deterministic versus probabilistic

There is a sharper distinction hiding inside these toolkits, and one of the more thoughtful comments under Microsoft’s announcement drew it out. The features in a typical governance bundle do not all enforce the same way.

Policy enforcement, sandboxing, identity, execution gating — these are deterministic. They block categorically: the rule either matches or it does not, and an adversary cannot argue with it. Trust scoring and adaptive circuit breakers are probabilistic: they estimate, and estimates degrade under adversarial pressure in exactly the moments you most need them not to.

Both belong in a mature stack. But a buyer reading a compliance-mapping table cannot see which is which. “Covers OWASP Agentic risk #6” tells you the risk is addressed; it does not tell you whether it is addressed by a categorical control or a statistical one. For a regulated workload, that difference is the whole question. Our position is unambiguous: the load-bearing layer — the part that decides allow or deny — must be deterministic. State, yes; statistics in the enforcement path, no.
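One way to picture that separation is the small Python sketch below, in which the allow-or-deny outcome is computed from categorical rules alone and a trust score can only raise a flag for review. The `Decision` type, the `decide` function, and the 0.5 threshold are hypothetical, introduced purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Decision:
    allowed: bool  # set by deterministic rules only
    flagged: bool  # advisory signal derived from the statistical score


Rule = Callable[[str], bool]  # True means the verb is permitted


def decide(verb: str, rules: Iterable[Rule], trust_score: float,
           flag_below: float = 0.5) -> Decision:
    """Keep statistics out of the enforcement path.

    A high trust score cannot turn a deny into an allow, and a low one
    cannot turn an allow into a deny. It only marks the step for review.
    """
    allowed = all(rule(verb) for rule in rules)
    flagged = trust_score < flag_below
    return Decision(allowed=allowed, flagged=flagged)
```

With a single rule that forbids `delete_database`, a near-perfect trust score still yields a deny, and a poor score on a permitted read yields an allow with the flag set.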

Why we are glad the toolkits shipped

A company in this space would normally be expected to greet a Microsoft launch with caution, so let us be plain about why we are not worried.

Most organizations are at step one of their agent journey. They are wiring up a first useful agent, not running a fleet of autonomous ones across regulated processes. For them, stateless per-action enforcement is the right first control, and a free, well-documented toolkit is a fine place to start. The market needs that on-ramp, and we would rather Microsoft built it than that it not exist.

The wall arrives later — when the agents multiply, the workflows chain across systems, and someone reviews an incident that no single forbidden action explains. That is the moment stateless governance visibly runs out, and the question becomes a question about paths. We are building for that moment. We’d put it around two years out for most enterprises, and we intend to be the name people already associate with the problem when they hit it.

The vocabulary is small enough to read in one sitting: the behaviours page is the canonical reference, and the engine is on PyPI.

Compare notes. Or paths.


Kyvvu is a behavioral firewall for AI agents: stateful, path-aware, deterministic runtime enforcement that runs in-process at the edge. Built in Europe. Architecture documentation at docs.kyvvu.com.