AI & LLM Security

The AI Management System: Governance That Fires, Not Governance That Files

Governance for AI usually lives in a PDF that nothing in production can read. We built a command centre that runs our agent fleet and treats governance as code that something non-human can follow: circuit breakers, approval gates, fail-closed data boundaries, and an immutable audit trail. This is our own AI Management System, run in the open.

Julien P. · June 15, 2026 · 9 min read

Governance for an AI system usually lives in a document. There is a policy on who may approve what, a risk register with a column for owners, a paragraph that says a human stays in the loop. The file gets reviewed and signed. Then the agents run, and the file has no idea what they are doing, because a PDF cannot read a database or halt a process.

That is the gap. Not between intent and the document: teams write good documents. The gap is between the document and a control that actually fires. A policy that says "high-impact actions require approval" is a sentence until something in the running system refuses to execute that action until a person clicks approve. This is a builder's journal about closing that gap for our own fleet.

The reframe

We run a fleet of agents in production. They write, they research, they touch finances and outbound communications, they call models. At some point the governing question stopped being "what does our AI policy say" and became something more demanding: what would it take to write the policy precisely enough that a non-human process could execute it, every time, without a person rereading the document?

That reframing changes what you build. A rule like "pause spending in an incident" is no longer guidance for a human to remember. It is a circuit breaker with a defined trigger and a defined blast radius. "A human oversees high-risk actions" is no longer a value statement. It is a gate in the request path that blocks until someone with authority resolves it. Governance becomes code that the fleet has to obey because there is no other path through.

What we built

Our command centre is our own implementation of an AI Management System. That term is not ours: it is defined by ISO/IEC 42001:2023, the first certifiable standard for governing AI, and it means the management system that holds policy, risk, controls, and oversight together. We built one and we run our agent fleet on it, in production, in the open. It is not sold and it is not generally available. It is the reference platform we operate and write about, so the architecture is something you can inspect rather than a claim you have to take on faith.

To be precise about status: we ran an internal assessment of the platform against the ISO 42001 clauses. That produced 24 findings, since remediated. A pre-certification review is not a certificate, and we are not claiming conformity for any specific high-risk system. We are claiming that the controls the standard describes are running, logging, and enforceable. That is a different and, for our purposes, more useful thing.

The architecture, by the numbers

The platform is large because governance that fires has to reach every place an agent can act. The code-audited figures: more than 850 API endpoints across more than 90 router files, more than 200 database tables, and a fleet of 35 agents (25 in the main fleet plus 10 classified). Eighteen MCP tools connect the agents to the surfaces they are allowed to use. Eleven governance policy documents (POL-000 through POL-006 and DOC-001 through DOC-004) are the written layer that the rest of the system implements.

Five circuit breakers, CB-1 through CB-5. CB-1 is the Hard Stop: one button that halts every agent instantly. CB-2 through CB-5 step down through graduated levels of containment, from a partial halt to heightened monitoring. An incident can be contained without silencing the whole fleet, and a full stop is always one click away.
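To make the containment levels concrete, here is a minimal Python sketch of breakers checked before any action runs. The names (`Effect`, `Breaker`, `BreakerPanel`, `execute`), the halt/monitor effects, and the scopes given to CB-3 and CB-5 are hypothetical: the post defines CB-1 as the fleet-wide Hard Stop and describes CB-2 through CB-5 only as graduated steps.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Effect(Enum):
    HALT = "halt"        # the action is refused
    MONITOR = "monitor"  # the action proceeds but is flagged for closer monitoring


@dataclass
class Breaker:
    name: str
    effect: Effect
    scope: frozenset[str] | None = None  # None means the whole fleet
    tripped: bool = False


@dataclass
class BreakerPanel:
    breakers: dict[str, Breaker] = field(default_factory=dict)

    def trip(self, name: str) -> None:
        self.breakers[name].tripped = True

    def reset(self, name: str) -> None:
        self.breakers[name].tripped = False

    def check(self, agent_id: str) -> Effect | None:
        """Return the strongest effect from tripped breakers covering this agent."""
        effects = {
            b.effect
            for b in self.breakers.values()
            if b.tripped and (b.scope is None or agent_id in b.scope)
        }
        if Effect.HALT in effects:
            return Effect.HALT
        if Effect.MONITOR in effects:
            return Effect.MONITOR
        return None


def execute(panel: BreakerPanel, agent_id: str, action: str) -> str:
    # The breaker check runs before the action does.
    effect = panel.check(agent_id)
    if effect is Effect.HALT:
        raise PermissionError(f"{agent_id}: halted by circuit breaker")
    suffix = " [monitored]" if effect is Effect.MONITOR else ""
    return f"{agent_id} ran {action}{suffix}"


# Example panel: CB-1 as described in the post; CB-3 and CB-5 semantics are hypothetical.
panel = BreakerPanel({
    "CB-1": Breaker("CB-1", Effect.HALT),                                # Hard Stop
    "CB-3": Breaker("CB-3", Effect.HALT, frozenset({"finance-agent"})),  # partial halt
    "CB-5": Breaker("CB-5", Effect.MONITOR),                             # heightened monitoring
})
```

The strongest applicable effect wins, so tripping CB-1 overrides any lighter containment already in place, and resetting it falls back to whatever is still tripped.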

Actions are graduated by risk. Routine actions execute on their own. Higher-risk actions are gated: they wait in an approval queue until a human signs off, and the request does not arrive as a yes-or-no ping. It surfaces the agent's reasoning, an impact analysis, the policy alignment, and a risk assessment, so the person deciding is reading a case, not rubber-stamping. What counts as gated is configurable per agent, per time window, and per operational state, so the same action can be routine for one agent and held for another.
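A minimal sketch of a configurable gate, assuming rules keyed on agent, action, time window, and operational state. `GateRule`, `ApprovalRequest`, `ApprovalQueue`, and the state names are illustrative, not the platform's API. The request object carries the four things the approver reads, so a held action arrives as a case rather than a bare yes-or-no.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class GateRule:
    """Hold `action` for `agent_id` during [start, end) while in one of `states`."""
    agent_id: str
    action: str
    start: time
    end: time  # windows that cross midnight are not handled in this sketch
    states: frozenset[str]

    def applies(self, agent_id: str, action: str, now: time, state: str) -> bool:
        return (
            self.agent_id == agent_id
            and self.action == action
            and self.start <= now < self.end
            and state in self.states
        )


@dataclass
class ApprovalRequest:
    agent_id: str
    action: str
    reasoning: str
    impact_analysis: str
    policy_alignment: str
    risk_assessment: str
    approved: bool | None = None  # None while the request is pending


class ApprovalQueue:
    def __init__(self, rules: list[GateRule]) -> None:
        self.rules = rules
        self.pending: list[ApprovalRequest] = []

    def submit(self, req: ApprovalRequest, now: time, state: str) -> str:
        if not any(r.applies(req.agent_id, req.action, now, state) for r in self.rules):
            return "execute"  # routine for this agent, window, and state
        self.pending.append(req)
        return "held"  # waits here until a human resolves it

    def resolve(self, req: ApprovalRequest, approve: bool) -> str:
        self.pending.remove(req)
        req.approved = approve
        return "execute" if approve else "rejected"
```

With a rule holding `send_payment` for a hypothetical `finance-agent` during working hours, the same action submitted by another agent, or outside the window, passes straight through.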

A Data Sanitization Proxy sits in front of every LLM call and classifies the data into four levels. BLOCKED never reaches any model. HIGH goes to a local model only. MEDIUM is pseudonymised before any external call. LOW passes with logging. The classification decides the route, not a convenience setting.
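The routing rule can be sketched as a function whose only default is refusal. This is a hypothetical Python illustration: the classifier that assigns the level is out of scope, the email-hashing `pseudonymise` is a stand-in for real pseudonymisation, and the destination names are placeholders.

```python
from __future__ import annotations

import hashlib
import re
from enum import Enum


class Level(Enum):
    BLOCKED = "BLOCKED"
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"


EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def pseudonymise(text: str) -> str:
    """Stand-in: replace each email address with a stable hashed token."""
    return EMAIL.sub(
        lambda m: "user_" + hashlib.sha256(m.group().encode()).hexdigest()[:8], text
    )


def route(level: Level, prompt: str, log: list[str]) -> tuple[str, str]:
    """Return (destination, payload) for a classified prompt."""
    if level is Level.LOW:
        log.append("LOW: passed to external model")
        return "external", prompt
    if level is Level.MEDIUM:
        return "external", pseudonymise(prompt)
    if level is Level.HIGH:
        return "local", prompt  # the only destination HIGH content has
    # BLOCKED, and any level not handled above, fails closed.
    raise PermissionError(f"{level}: not sent to any model")
```

The allowed cases are enumerated and everything else raises, so adding a new level without deciding its route blocks that data instead of leaking it.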

Three-tier persistent memory. Tier 1 is global and shared, Tier 2 is agent-specific and private, Tier 3 is session-scoped. Cross-read permissions are default-deny, and every read is logged, so one agent reading another agent's memory is an event with a record, not an ambient capability.
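A minimal in-memory sketch of default-deny cross-reads, assuming grants are explicit (reader, owner) pairs. `MemoryStore` and its fields are hypothetical, and Tier 3 session reads, which would follow the same pattern, are omitted. Logging denied attempts as well as allowed reads is a design choice of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    tier1_global: dict[str, str] = field(default_factory=dict)
    tier2_agent: dict[str, dict[str, str]] = field(default_factory=dict)  # owner -> key -> value
    grants: set[tuple[str, str]] = field(default_factory=set)  # explicit (reader, owner) pairs
    audit: list[tuple[str, str, str, bool]] = field(default_factory=list)

    def read_global(self, reader: str, key: str) -> str | None:
        self.audit.append((reader, "tier1", key, True))
        return self.tier1_global.get(key)

    def read_agent(self, reader: str, owner: str, key: str) -> str | None:
        # Default-deny: only the owner, or a reader with an explicit grant, may read.
        allowed = reader == owner or (reader, owner) in self.grants
        self.audit.append((reader, owner, key, allowed))  # logged whether allowed or not
        if not allowed:
            raise PermissionError(f"{reader} may not read {owner}'s memory")
        return self.tier2_agent.get(owner, {}).get(key)
```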

Three-layer LLM routing. L1 is local, L2 is subscription, L3 is pay-per-token. Routing is a governed decision, which is what lets the Data Sanitization Proxy send HIGH content to L1 and nowhere else.
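A sketch of routing as a governed decision, assuming a policy table of permitted layers per classification and a cheapest-first preference. Both `ALLOWED` and `PREFERENCE` are hypothetical. The property it illustrates is the absence of a silent fallback: HIGH content waits when L1 is unavailable instead of spilling to L3.

```python
from __future__ import annotations

# Layers from the post: L1 local, L2 subscription, L3 pay-per-token.
# Hypothetical policy table: which layers each classification may use.
ALLOWED: dict[str, tuple[str, ...]] = {
    "HIGH": ("L1",),
    "MEDIUM": ("L1", "L2", "L3"),  # external layers only after pseudonymisation
    "LOW": ("L1", "L2", "L3"),
}
# Hypothetical preference order: cheapest first.
PREFERENCE = ("L1", "L2", "L3")


def choose_layer(classification: str, available: set[str]) -> str:
    allowed = ALLOWED.get(classification, ())  # BLOCKED or unknown: nothing is allowed
    for layer in PREFERENCE:
        if layer in allowed and layer in available:
            return layer
    # No fallback outside the policy table: the call fails instead.
    raise RuntimeError(f"no permitted layer for {classification}")
```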

An eight-step provisioning pipeline. An agent moves from DORMANT through PROVISIONING and BRIEFING to ACTIVE, and nothing activates without completing the pipeline, so there is no path for an agent to start acting before it has been configured, briefed, and admitted.
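The admission rule can be sketched as a state machine with no shortcut edge. The post does not enumerate the eight steps, so this hypothetical Python sketch models only the four named states; `Agent`, `advance`, and `act` are illustrative names.

```python
from __future__ import annotations

from enum import Enum


class AgentState(Enum):
    DORMANT = "DORMANT"
    PROVISIONING = "PROVISIONING"
    BRIEFING = "BRIEFING"
    ACTIVE = "ACTIVE"


# Only forward moves through the pipeline are legal; there is no DORMANT -> ACTIVE edge.
NEXT = {
    AgentState.DORMANT: AgentState.PROVISIONING,
    AgentState.PROVISIONING: AgentState.BRIEFING,
    AgentState.BRIEFING: AgentState.ACTIVE,
}


class Agent:
    def __init__(self, agent_id: str) -> None:
        self.agent_id = agent_id
        self.state = AgentState.DORMANT

    def advance(self, target: AgentState) -> None:
        if NEXT.get(self.state) is not target:
            raise ValueError(f"illegal transition {self.state.value} -> {target.value}")
        self.state = target

    def act(self, action: str) -> str:
        if self.state is not AgentState.ACTIVE:
            raise PermissionError(f"{self.agent_id} is {self.state.value}, not ACTIVE")
        return f"{self.agent_id}: {action}"
```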

A multi-stage quality pipeline runs on agent output: a secrets scan, a PII scan, data sanitisation, and hallucination detection. Output is checked before it leaves, not after someone notices.
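A minimal sketch of output checks run before release, with deliberately small, hypothetical patterns (an AWS-style access key ID shape, a PEM private-key header, an email address). The sanitisation and hallucination-detection stages would plug in as further checks with the same signature; they are not sketched here.

```python
from __future__ import annotations

import re
from typing import Callable, Optional

# Deliberately small, hypothetical patterns; real scanners cover far more.
SECRET_PATTERNS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                # AWS-style access key ID shape
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
]
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

Check = Callable[[str], Optional[str]]  # returns a finding, or None if clean


def secrets_scan(text: str) -> Optional[str]:
    return "secret detected" if any(p.search(text) for p in SECRET_PATTERNS) else None


def pii_scan(text: str) -> Optional[str]:
    return "email address detected" if EMAIL.search(text) else None


def run_pipeline(text: str, checks: list[Check]) -> list[str]:
    """Run every check; the output may leave only if no findings come back."""
    return [f for check in checks if (f := check(text)) is not None]
```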

A full, immutable audit trail in PostgreSQL, so every gate, route, read, and stop is reconstructable long after the moment it happened.
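The post does not say how immutability is enforced in PostgreSQL. One common technique for making an append-only log tamper-evident is hash chaining, where each entry's hash covers the previous one; the sketch below shows that technique in memory, as an assumption rather than a description of the platform. `AuditLog` is a hypothetical name.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass
class AuditLog:
    entries: list[dict] = field(default_factory=list)

    def append(self, event: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((prev + body).encode()).hexdigest()
        self.entries.append({"event": dict(event), "prev": prev, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited entry breaks it from that point on."""
        prev = GENESIS
        for entry in self.entries:
            body = json.dumps(entry["event"], sort_keys=True)
            expected = hashlib.sha256((prev + body).encode()).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True
```

Editing any earlier entry changes its hash and every hash after it, so `verify()` turns a silent change into a detectable one.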

The surfaces a person uses are named and purpose-built. The CEO Dashboard is a 9-widget human-in-the-loop view. The Command Centre handles agent lifecycle and assignment. The Ops Center holds the emergency stop, the circuit-breaker status, and the approval queue. The Memory Workshop manages the three memory tiers. OpenClaw is the agent gateway.

The principles underneath

A few design choices repeat across the platform, and each one exists to prevent a specific way these systems fail.

Governance is leadership written down precisely enough that something non-human can follow it. The failure mode here is the signed policy that production never reads. A document that says "high-impact actions need approval" prevents nothing on its own, because the agents executing those actions cannot parse prose. So we wrote the policy as gates and breakers: the approval gate is the approval rule made executable, CB-1 is the stop rule made executable. When the rule lives in the request path, an agent cannot drift away from it, because there is no path that skips it.

Fail-closed data boundaries. The failure mode is the quiet exfiltration: sensitive data riding an outbound call to a third-party model because the default was to allow and an edge case slipped through. The Data Sanitization Proxy inverts the default. HIGH content has exactly one destination, a local model, and there is no convenience flag that sends it elsewhere. The classification decides the route, so the safe behaviour is the only behaviour, not the one a tired operator remembers to choose.

Human oversight as architecture, not as a paragraph. The failure mode is the oversight that exists only on paper: a line in a policy that says a person can intervene, with nothing in the system that actually pauses for them. We made oversight structural. Approval gates stop the action and hand a person the reasoning and risk assessment they need to decide. CB-1 puts a fleet-wide stop one click away in the Ops Center. The audit trail makes every one of those decisions reviewable long after the fact. Oversight you cannot exercise in the moment, on the running system, is not oversight.

How it maps to the rules

The standards describe an AI operator who is genuinely in control. The work is turning their clauses into running controls.

ISO/IEC 42001 is the management-system standard. Clause 5.2 wants an AI policy, which is our eleven governance documents. Clause 6.1 wants risk assessment, which is our risk register and control-debt scoring. Clause 8.1 wants operational controls, which is what the gates, breakers, and proxy are. Clause 9.1 wants monitoring and 9.2 wants internal audit, which is the audit trail plus the assessment we ran against ourselves. Annex A.8 wants information for interested parties, which is why this post exists in the form it does.

The EU AI Act names specific duties, and three map cleanly onto the architecture. Article 14 wants human oversight, including the ability to override and to stop. The approval gates plus the CB-1 one-click stop directly implement that override-and-stop obligation. Article 9 wants a risk management process, which we hold as a risk register plus control-debt scoring. Article 12 wants record-keeping, which is the audit trail.

The honest caveat stands over all of it. None of this makes the platform certified, and none of it declares conformity for a specific high-risk system. We ran an internal assessment against the clauses, found 24 things, and fixed them. Aligned, not certified. What the mapping shows is that the controls are real and reachable, so the document we could write about our governance would describe a system that behaves the way it says, rather than a system we hope behaves that way.

Builder's journal

We build this in the open because the infrastructure for running AI safely is still young, and it matures faster when the people building it show what production actually looks like, breakers and approval queues and all. A command centre is not the glamorous part of AI. It is the part that decides whether the rest of it is something you are in control of. We would rather operate that in public, findings and remediations included, than file a policy and hope.

This piece sits inside a larger one. If you want the full stack that tests, defends, sanitises, and governs an LLM application, and how each layer maps to the rules, start with the pillar: /blog/llm-security-compliance-stack.
