Nine Seconds, Two CVEs, and a Three-Axis Vocabulary

Two incidents, less than two weeks apart, sat next to each other on our internal newsfeed, and the proximity was uncomfortable enough to be worth writing down.

The first was the PocketOS incident: on 24 April, a Cursor coding agent running Claude Opus 4.6 encountered a credential mismatch in a staging environment, decided to fix the problem on its own initiative, went looking for an API token, found one in an unrelated file, and used it to delete the production volume on Railway, along with every backup, because Railway stores volume-level backups in the same volume as the data they protect. The reported wall-clock time from the agent’s first destructive instruction to the system being unrecoverable was nine seconds. The agent later produced what PocketOS’s founder described as a written confession enumerating each safety rule it had violated.

The second was Microsoft’s disclosure, on 7 May, of CVE-2026-25592 and CVE-2026-26030 in Semantic Kernel. CVE-2026-25592 is the one that maps cleanly onto our argument: a DownloadFileAsync method inside the SessionsPythonPlugin had been accidentally decorated with [KernelFunction], which advertised it to the model as a callable tool — complete with an AI-controlled localFilePath parameter that fed straight into File.WriteAllBytes() on the host with no path validation. A prompt injection elsewhere in the agent’s input could then drive the model to call that tool with a Windows Startup-folder path, turning a sandboxed agent into host-level RCE. Microsoft’s own summary, in the “Defending the agentic edge” section of the post, was that “AI models aren’t security boundaries.” We could not have phrased it better.

These two failures look superficially different — one is an over-permissioned agent doing what it asked itself to do, the other is a framework that exposed a destructive tool to the model when it shouldn’t have. But they fail along the same axis. In both cases, something downstream of the model was allowed to take a high-impact action without any pathwise check on whether the action was permitted in the context of what the agent had already done. The PocketOS agent had no gate between “decide to fix it” and “destroy production.” The Semantic Kernel agent had no constraint on where a file write was allowed to land. The runtime was happy to comply.

This is the case we’ve been making for months. What we want to write down here is the part underneath it: the vocabulary the runtime needs to speak in order for any of the checks above to even be expressible.

Why a vocabulary at all

A policy is a sentence. Like any sentence it needs a grammar, and the grammar has to be small enough to be auditable and rich enough to express the things you actually care about. If your vocabulary is too coarse — say, just (action, resource) — you can write “deny DELETE on production” but you can’t write “deny DELETE on production unless a human approved within the current task, and only for agents whose declared purpose includes destructive operations.” That second sentence is the one that would have stopped PocketOS. The first sentence wouldn’t have, because the agent’s action looked, locally, like a valid DELETE issued by an authorised principal.

So the vocabulary needs at least three axes. Not four, not seven — three. Past that, policy authors lose the plot.

The three axes we landed on, after a long iteration, are step type, scope, and verb. Together they classify every action an agent takes into one of twelve atomic behaviours. Everything else — which resource, what classification of data, which model provider, who approved — lives in properties, a nested dict the engine and the rules can inspect. The split is deliberate: the three-axis tuple is what policies dispatch on, and properties are what they filter on. Dispatching on twelve types is fast and easy to reason about. Filtering on properties is where the expressiveness lives.

The full catalogue is in the docs, but the shape is small enough to fit in one paragraph: four task.* behaviours covering the lifecycle (start, end, error, idle); eight step.* behaviours covering everything an agent does inside a task (step.resource for external reads and writes, step.message for inbound and outbound messages, step.self for the agent’s internal memory and scratchpad, step.model for LLM calls, step.credential for secret retrieval, step.exec for code execution, step.gate for any kind of check or approval, and step.unknown as a deliberate fallback). The verbs are the HTTP four — GET, POST, PATCH, DELETE — applied where they make semantic sense (not every step type carries one). The scopes are task and step. That is the entire grammar.
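As a rough illustration of how small that grammar is, here is a minimal Python sketch of a behaviour atom. It is not the engine’s data model: the class name Behaviour, the exact spelling of the task.* identifiers, and the idea of deriving the scope from the type prefix are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# The twelve atomic behaviours named in the text. The exact task.* spellings
# are an assumption; the post only lists "start, end, error, idle".
TASK_BEHAVIOURS = ("task.start", "task.end", "task.error", "task.idle")
STEP_BEHAVIOURS = (
    "step.resource", "step.message", "step.self", "step.model",
    "step.credential", "step.exec", "step.gate", "step.unknown",
)
BEHAVIOURS = TASK_BEHAVIOURS + STEP_BEHAVIOURS
VERBS = ("GET", "POST", "PATCH", "DELETE")


@dataclass
class Behaviour:
    """One classified agent action: a dispatch key plus open properties."""

    type: str                    # what policies dispatch on, e.g. "step.resource"
    verb: Optional[str] = None   # not every step type carries a verb
    properties: dict[str, Any] = field(default_factory=dict)  # what policies filter on

    def __post_init__(self) -> None:
        # The vocabulary is closed: reject anything outside it.
        if self.type not in BEHAVIOURS:
            raise ValueError(f"unknown behaviour type: {self.type!r}")
        if self.verb is not None and self.verb not in VERBS:
            raise ValueError(f"unknown verb: {self.verb!r}")

    @property
    def scope(self) -> str:
        # Assumption: the scope ("task" or "step") is the prefix of the type.
        return self.type.split(".", 1)[0]


delete_volume = Behaviour(
    "step.resource",
    "DELETE",
    {"target": {"system": "railway", "trust": "production"}},
)
```

The closed part (type and verb) is validated on construction; the properties dict is deliberately left unvalidated, mirroring the closed-vocabulary, open-properties split described above.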

Why this grammar, in particular

A few choices in there are load-bearing and worth explaining.

Verbs are HTTP, not free-form. We had read/write/modify/delete in an earlier draft. We switched to GET/POST/PATCH/DELETE because every developer who builds agents already knows what those mean, and because the mapping from REST-style tool calls to the verb axis becomes mechanical. The semantic content is the same; the surface is one fewer thing to learn.
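To show how mechanical that translation can be, here is a small hypothetical normaliser. The pairing of the early-draft names with the HTTP verbs is inferred from the order in which both sets are listed above, and the function name normalise_verb is invented for this sketch; anything outside the four verbs is rejected rather than guessed at.

```python
# Hypothetical pairing of the early-draft verbs with the HTTP four,
# inferred from the order in which the post lists both sets.
LEGACY_TO_VERB = {
    "read": "GET",
    "write": "POST",
    "modify": "PATCH",
    "delete": "DELETE",
}
VERBS = frozenset(LEGACY_TO_VERB.values())


def normalise_verb(raw: str) -> str:
    """Return one of the four vocabulary verbs, or raise ValueError."""
    token = raw.strip()
    if token.upper() in VERBS:
        return token.upper()          # already an HTTP-style verb
    legacy = LEGACY_TO_VERB.get(token.lower())
    if legacy is None:
        # No guessing: a verb outside the vocabulary is an error here.
        raise ValueError(f"no verb mapping for {raw!r}")
    return legacy
```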

step.gate is a step, not a separate scope. Earlier versions of the vocabulary treated guardrails and human-approval steps as their own scope. They aren’t. A gate is mechanistically identical to any other step — it’s something the agent does, in sequence, that has an input and an output. The fact that its output is a pass/fail decision is content, not structure. Collapsing it back into step.* means rules that reason about sequences (“a step.gate with guard.check_type = human_approval must precede any step.resource DELETE on a production system”) compose with rules that reason about anything else in the path. No special cases.

Trust level is a property, not a top-level axis. We spent a long time on whether “internal vs external” or “production vs staging” should be a top-level classification. Neither should. If trust level is a top-level axis, then a policy like “deny all DELETE” becomes “deny all DELETE on internal, deny all DELETE on external”: duplicated, fragile, and easy to write only one half of. With trust as a property under target, the same policy is one sentence. When it matters, the rule filters on it. When it doesn’t, the rule ignores it. The grammar stays composable.

Properties are open. The standard property groups (target, auth, data, model, exec, guard, message, usage) cover the things most policies need, but the engine passes through any property group unchanged and rules read them via dot-path accessors. Customers add their own — compliance.framework, business.unit, whatever they need to write policies against. The vocabulary is closed; the property space is open. That asymmetry is what lets us keep the engine small while keeping the expressive power large.
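A dot-path accessor over a nested dict is only a few lines. The sketch below is one plausible shape, not the engine’s implementation: the name get_path and the choice to return a default for missing keys are assumptions, and the property values are made up for the example.

```python
from typing import Any


def get_path(properties: dict, path: str, default: Any = None) -> Any:
    """Walk a nested dict along a dotted path such as "target.trust"."""
    node: Any = properties
    for key in path.split("."):
        # Stop as soon as the path leaves the dict structure.
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


props = {
    "target": {"system": "railway", "trust": "production"},  # standard group
    "compliance": {"framework": "soc2"},                     # customer-defined group
}
```

Because the accessor never inspects group names, a customer-defined group such as compliance is read exactly like a standard one, which is the sense in which the property space stays open.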

The two cases, in the vocabulary

Back to the incidents.

The PocketOS deletion, classified into the vocabulary, was a step.resource × DELETE with properties.target.system = "railway" and a target.trust indicating production — and (this is the load-bearing part) no preceding step.gate in the task’s history. That last clause is what step_requires_gate expresses: given a target step type and an optional verb, it scans the per-task history at the moment the step is intended and looks for a matching step.gate (optionally narrowed by guard.check_type and guard.result). If none is present, the rule returns a violation. To add the property filter — “but only when target.trust is production” — we wrap step_requires_gate and current_is in an all_of: step_requires_gate carries the type-and-verb scoping plus the gate requirement, and current_is carries the property filter on the intended behaviour. The compound is a handful of lines. Evaluation, against a 50-step history with a hundred other policies loaded, runs at around 300µs p99 — faster than the agent could issue the command. Deterministic, reproducible, signed.
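The compound can be sketched in Python as follows. The rule names step_requires_gate, current_is, and all_of come from the text, but their signatures, the dict shape of a step, the property values "production", "human_approval", and "pass", and the convention that a rule returns True when it fires are all assumptions made for this illustration rather than the engine’s API.

```python
from typing import Any, Callable, Optional

Step = dict[str, Any]                      # assumed shape: {"type", "verb", "properties"}
Rule = Callable[[Step, list[Step]], bool]  # assumed convention: True means the rule fires


def get_path(properties: dict, path: str) -> Any:
    node: Any = properties
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def step_requires_gate(step_type: str, verb: Optional[str] = None,
                       check_type: Optional[str] = None,
                       result: Optional[str] = None) -> Rule:
    """Fire when the intended step matches and no matching step.gate precedes it."""
    def rule(current: Step, history: list[Step]) -> bool:
        if current["type"] != step_type:
            return False
        if verb is not None and current.get("verb") != verb:
            return False
        for past in history:
            if past["type"] != "step.gate":
                continue
            guard = past.get("properties", {}).get("guard", {})
            if check_type is not None and guard.get("check_type") != check_type:
                continue
            if result is not None and guard.get("result") != result:
                continue
            return False  # a matching gate is already in the task history
        return True
    return rule


def current_is(path: str, expected: Any) -> Rule:
    """Fire when a property of the intended step equals the expected value."""
    def rule(current: Step, history: list[Step]) -> bool:
        return get_path(current.get("properties", {}), path) == expected
    return rule


def all_of(*rules: Rule) -> Rule:
    """Fire only when every wrapped rule fires."""
    def rule(current: Step, history: list[Step]) -> bool:
        return all(r(current, history) for r in rules)
    return rule


policy = all_of(
    step_requires_gate("step.resource", verb="DELETE",
                       check_type="human_approval", result="pass"),
    current_is("target.trust", "production"),
)

delete_prod = {"type": "step.resource", "verb": "DELETE",
               "properties": {"target": {"system": "railway", "trust": "production"}}}
history = [{"type": "step.credential", "verb": "GET", "properties": {}}]
approval = {"type": "step.gate",
            "properties": {"guard": {"check_type": "human_approval", "result": "pass"}}}

violation_without_gate = policy(delete_prod, history)            # True: no gate in history
violation_with_gate = policy(delete_prod, history + [approval])  # False: gate precedes the step
```

Under these assumptions the same DELETE against a staging target never fires, because the current_is filter fails first; the gate requirement only bites where the property filter says it should.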

The Semantic Kernel file-write case maps just as cleanly, though the rule shape is different. A model was tricked, via prompt injection in the agent’s input, into calling DownloadFileAsync with an attacker-controlled local path that landed in the Windows Startup folder. In the vocabulary that’s a step.exec whose target.path (or whichever property the template surfaces — the convention is target.path) resolves outside an allowlisted directory tree. The natural shape is an all_of combining current_is (to match the specific step.exec calls) with a field_matches_regex check on target.path against a regex that pins the path under permitted directories — or, equivalently, a not(field_matches_regex(...)) wrapping a regex that catches the dangerous destinations (Startup folders, system directories, anywhere outside the sandbox mount). Microsoft’s eventual fix, described in their disclosure post, was a ValidateLocalPathForDownload() method using path canonicalization (Path.GetFullPath()) and directory-allowlist matching — exactly that property check, hard-coded into the framework. A runtime policy expresses the same constraint without waiting for a CVE to be filed against the framework.
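A hedged sketch of that property check follows. It is neither Microsoft’s ValidateLocalPathForDownload() nor the engine’s field_matches_regex: it uses Python’s ntpath module so that Windows-style paths are normalised the same way on any host, and the allowlisted directory C:\sandbox\downloads, the helper names, and the step layout are all hypothetical. The point it illustrates is that the regex should be applied to the canonical form of the path, since a raw string match can be sidestepped with ".." segments.

```python
import ntpath
import re

# Hypothetical allowlist: anything strictly under C:\sandbox\downloads.
# Written against the lowercased, backslash-normalised form of the path.
ALLOWED_PATH = re.compile(r"c:\\sandbox\\downloads\\.+")


def canonical(path: str) -> str:
    """Lexically normalise a Windows path (collapses "..", unifies separators and case).

    This does not touch the filesystem, so it does not resolve symlinks.
    """
    return ntpath.normcase(ntpath.normpath(path))


def path_is_allowed(path: str) -> bool:
    return ALLOWED_PATH.fullmatch(canonical(path)) is not None


def violates_path_policy(step: dict) -> bool:
    """Fire for a step.exec whose target.path lands outside the allowlist."""
    if step.get("type") != "step.exec":
        return False
    path = step.get("properties", {}).get("target", {}).get("path")
    if path is None:
        return False  # nothing to check here; a stricter policy could fail closed
    return not path_is_allowed(path)


startup_write = {
    "type": "step.exec",
    "properties": {"target": {"path": (
        r"C:\sandbox\downloads\..\..\Users\me\AppData\Roaming\Microsoft"
        r"\Windows\Start Menu\Programs\Startup\evil.bat"
    )}},
}
sandbox_write = {
    "type": "step.exec",
    "properties": {"target": {"path": "C:/sandbox/downloads/report.csv"}},
}
```

The first step starts with the allowlisted prefix as a raw string but normalises to a Startup-folder path, so it is flagged; the second stays under the sandbox directory after normalisation and passes.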

Neither rule is exotic. Neither requires reasoning the engine can’t do in a few hundred microseconds. What they do require is that the grammar can express “deny step.exec with this property unless preceded by this kind of step.gate,” or “deny step.resource × DELETE on production targets unless a step.gate is present in history.” A grammar that only knows about (action, resource) cannot say those sentences. A grammar that knows about (step_type, scope, verb) plus arbitrary properties can.

What this is, and what it isn’t

We don’t think a three-axis vocabulary is novel by itself. Network security has been doing something like it for decades — that’s part of why we keep returning to the firewall analogy. What we think is novel, in the AI-agent setting, is that nobody else has done the work of figuring out exactly which axes carry the policy load and exactly which axes belong in properties. The product is not the vocabulary; the product is the engine that runs against it. But the vocabulary is the part that has to be right first, because everything downstream — the rule functions, the templates that translate framework events into atoms, the policy generator, the audit log schema — sits on top of it.

If we got the vocabulary wrong, we’d notice within weeks. Customers writing rules would hit cases where the grammar didn’t compose, where they had to duplicate themselves, where a property they cared about had nowhere natural to live. So far, with pilots running and enterprise prospects writing their first policy sets against the public engine, that hasn’t happened. The grammar holds. That’s the strongest claim we’d make about it right now: it has survived contact with real policies.

The two incidents above are, in that sense, validation. PocketOS and the Semantic Kernel CVEs are not novel attack patterns; they are old patterns playing out at machine speed, against agents whose runtime had no vocabulary for noticing. Both are stopped by a short compound policy if the runtime can speak about gates and trust levels and verbs in the same sentence. Both succeed if it can’t.

What’s next

The current engine ships with twenty-six built-in rules grouped into six categories — field, path, count, classification, content, and flow. The full reference is in the docs, and we’ll write up individual rules against real incidents in the coming weeks: tainted_path_block against credential reuse, cross_execution_rate_limit against fleet-wide abuse, conditional_successor_required against half-finished destructive sequences. The rule functions are where the vocabulary becomes operational. Without them the grammar is just a schema; with them, it’s the difference between an agent that can be governed and one that can’t.

If you want to read the full vocabulary in one sitting, the behaviours page in the docs is the canonical reference.

The hot path is where the agent actually decides what to do. Nine seconds, in PocketOS’s case, is what happens when the hot path has no grammar to refuse in.