AI & LLM Security

Compliance Is Not a Document. It Is a System.

AI compliance keeps arriving as a writing exercise that happens before a deadline. We treat it as a property of a running system instead: four tools that test, defend, sanitise, and govern an LLM application, and emit their own evidence while they do it. This is the stack (DojoLM, BonkLM, RuneLM, and our AI Management System) and how it maps to the rules.

Julien P. | June 15, 2026 | 8 min read

AI compliance usually arrives as a writing exercise. A policy. A risk register. A slide that says "we take this seriously." The document gets written, signed, and filed. Then the system it describes runs for a year and quietly drifts away from every word of it.

This is a builder's journal. We run our own AI in production, under the same rules everyone else is reading about, and we needed compliance to be a property of the running system, not a binder. So we built the tooling for it. This post is the whole stack in one place, and how it lines up with what the regulation actually asks for.

The reframe

The question we kept coming back to was not "what does the document need to say." It was: what would it take for the running system to produce its own evidence, continuously, as a byproduct of operating safely?

That question has a structure. An LLM application has four jobs to get right: test the model, defend the application while it runs, control what leaves your perimeter, and govern all of it with oversight a person can actually exercise. So we built one tool for each, designed to be used together.

The landscape these tools answer to

The rules are not vague about what they want from an LLM in production.

The EU AI Act names specific duties. Article 9 wants a risk management process and testing before a system reaches the market. Article 10 wants data governance. Article 12 wants logging you can reconstruct after the fact. Article 14 wants a human who can understand the system, intervene, and stop it. Article 15 wants resilience against adversarial inputs, data and model poisoning, and confidentiality attacks. Article 55 puts an adversarial-testing duty on general-purpose models with systemic risk.

ISO/IEC 42001 defines the management system that holds all of that together. It is the standard that coined the term "AI Management System," and it turns governance from a value statement into a set of controls with owners.

The OWASP LLM Top 10 (2025 edition) enumerates the failure modes, with prompt injection at number one, sensitive-information disclosure at number two, and improper output handling and excessive agency at five and six.

None of this is exotic. Read together, it is a description of an LLM application that somebody is genuinely in control of. The gap is rarely knowing the rules. It is turning them into something that runs. That is the whole job of the four tools below.

The stack

Test it: DojoLM

You cannot put resilience on the record if you never adversarially tested for it. DojoLM is the testing and red-team platform. It is free, self-hostable, and runs entirely on your own infrastructure, which is the right posture when the thing under test is your own attack surface.

544 detection patterns across 49 groups, built on the CrowdStrike Taxonomy of Prompt Injection. The Haiku Scanner runs them in real time, including a streaming mode that inspects tokens as they emit.
Hattori Guard, a bi-directional firewall with four named modes: Shinobi (log only), Samurai (block inputs), Sensei (block outputs), Hattori (block both).
Jutsu Model Lab, a cross-provider leaderboard wired to 57 provider presets, 51 cloud and 6 local.
Sengoku Campaigns and the Time Chamber, for continuous and multi-turn temporal attacks.
The Bushido Book, the compliance translation layer. It maps findings to the OWASP LLM Top 10, the EU AI Act, ISO 42001, NIST AI RMF, and MITRE ATLAS, and emits signed evidence bundles with a SHA-256 manifest.

That last module is the point. A red-team run becomes an artifact you can hand to an assessor.
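To make "signed evidence bundle with a SHA-256 manifest" concrete, here is a minimal Python sketch of the general idea. It is not DojoLM's code or bundle format: the function names, the file names, and the use of HMAC-SHA-256 as a stand-in for a real signature are all assumptions made for illustration. A production bundle would more plausibly carry an asymmetric signature so that an assessor can verify it without holding the signing key.

```python
import hashlib
import hmac
import json


def build_manifest(artifacts: dict[str, bytes]) -> dict[str, str]:
    """Map each artifact name to the SHA-256 hex digest of its bytes."""
    return {name: hashlib.sha256(data).hexdigest()
            for name, data in sorted(artifacts.items())}


def sign_manifest(manifest: dict[str, str], key: bytes) -> str:
    """HMAC-SHA-256 over canonical JSON. Assumption: a stand-in for a real signature."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()


def verify_bundle(artifacts: dict[str, bytes], manifest: dict[str, str],
                  signature: str, key: bytes) -> bool:
    """True only if the manifest is authentic and every artifact still matches it."""
    expected = sign_manifest(manifest, key)
    if not hmac.compare_digest(expected, signature):
        return False
    return build_manifest(artifacts) == manifest


# Hypothetical red-team run output; names and contents are illustrative only.
artifacts = {
    "run-001.json": b'{"finding": "prompt injection", "blocked": true}',
    "summary.txt": b"2 findings",
}
demo_key = b"demo-key-not-for-production"
manifest = build_manifest(artifacts)
signature = sign_manifest(manifest, demo_key)
```

The design point is that the manifest pins the content and the signature pins the manifest, so changing one byte of any finding after the run makes verification fail.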

Defend it at runtime: BonkLM

Testing tells you where the holes are. It does not close them while the application is live. BonkLM is the runtime layer, an in-process guardrail library for Node and TypeScript. No network round-trip, no second model in the loop, no per-request cloud bill.

Nine composable layers: Prompt Injection, Jailbreak, Reformulation, Boundary, PII, Secret, XSS, Bash safety, and a Streaming validator.
A seven-surface taxonomy. Watching only the prompt and the response misses where agents get poisoned, so BonkLM also inspects tool-call arguments, retrieved documents, memory writes, and assembled context.
43 drop-in integration packages, wired by an interactive setup wizard.
Sub-millisecond on the hot path: a p50 around 55 microseconds and roughly 18,000 validations per second on a single core, deterministic, with zero network calls.

It gives prompt injection somewhere to land that is not your model.
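The combination of composable layers and multiple inspection surfaces can be sketched as follows. This is written in Python for brevity and is not BonkLM's API (BonkLM itself is a Node and TypeScript library). The two patterns, the layer names, and the surface labels are deliberately tiny, hypothetical examples, not its rule set.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Finding:
    layer: str
    surface: str
    reason: str


# A layer returns a reason string when it wants to block, otherwise None.
Layer = Callable[[str], Optional[str]]

# Illustrative patterns only.
_INJECTION = re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE)
_SECRET = re.compile(r"\bAKIA[0-9A-Z]{16}\b")  # shape of an AWS access key ID


def prompt_injection(text: str) -> Optional[str]:
    return "instruction-override phrase" if _INJECTION.search(text) else None


def secret(text: str) -> Optional[str]:
    return "credential-shaped token" if _SECRET.search(text) else None


LAYERS: dict[str, Layer] = {"prompt_injection": prompt_injection, "secret": secret}


def validate(surface: str, text: str) -> list[Finding]:
    """Run every layer against one surface; an empty list means the text may pass."""
    findings = []
    for name, layer in LAYERS.items():
        try:
            reason = layer(text)
        except Exception as exc:
            # A crashing layer counts as a block, never as a pass.
            reason = f"layer error: {exc}"
        if reason:
            findings.append(Finding(name, surface, reason))
    return findings
```

The same `validate` call is applied to a retrieved document or a tool-call argument exactly as it is to the user prompt, which is the point of treating surfaces as a taxonomy rather than checking only the two ends of the conversation.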

Sanitise what leaves: RuneLM

The moment your application calls a third-party model, your data leaves your perimeter. RuneLM is the outbound boundary. It is a fail-closed data sanitisation proxy, and we built it so that cleartext exfiltration is architecturally impossible rather than a setting you can flip.

A nine-stage classification pipeline that sorts every outbound payload into LOW, MEDIUM, HIGH, or BLOCKED. Later stages can only escalate, never lower.
Classification-enforced routing across three tiers: HIGH content goes only to a local model, MEDIUM only to a provider with a signed data-processing agreement on file. There is no override flag for HIGH.
Type-preserving pseudonymisation: an email becomes a valid email, so the model still reasons while the real values never leave. The substitution map is encrypted with AES-256-GCM.
Map-bounded rehydration, so a compromised model that emits fake placeholders to fish for real values gets nothing back.
A tamper-evident audit trail keyed with HMAC-SHA-256 that never stores the prompt text.
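Type-preserving pseudonymisation and map-bounded rehydration are easier to see in code. The sketch below is a minimal illustration for email addresses only, under stated assumptions: the class name, the placeholder format, and the regular expression are hypothetical, and the AES-256-GCM encryption of the substitution map is omitted because it needs a cryptography library outside the Python standard library.

```python
import re
import secrets

# Simplified email matcher, good enough for a sketch.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class Pseudonymiser:
    """Swap real emails for valid-looking placeholders and back again."""

    def __init__(self) -> None:
        self._forward: dict[str, str] = {}  # real value -> placeholder
        self._reverse: dict[str, str] = {}  # placeholder -> real value

    def _placeholder_for(self, real: str) -> str:
        if real not in self._forward:
            # Still a syntactically valid address, on a reserved TLD.
            fake = f"user-{secrets.token_hex(4)}@example.invalid"
            self._forward[real] = fake
            self._reverse[fake] = real
        return self._forward[real]

    def sanitise(self, text: str) -> str:
        """Outbound: replace every email with its stable placeholder."""
        return EMAIL.sub(lambda m: self._placeholder_for(m.group(0)), text)

    def rehydrate(self, text: str) -> str:
        """Inbound: restore only placeholders that this map actually issued."""
        return EMAIL.sub(lambda m: self._reverse.get(m.group(0), m.group(0)), text)
```

Because `rehydrate` looks up only keys that `sanitise` created, a placeholder invented by the model is left untouched rather than resolved to anything real.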

You can prove what happened without keeping the content you were trying to protect.
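One common way to get a tamper-evident trail without storing content is an HMAC chain, where each entry authenticates its own fields plus the previous entry's MAC. The following is a minimal sketch of that general technique, not RuneLM's log format. The field names and the route label are assumptions, as is the choice to record a keyed fingerprint of the prompt instead of the prompt itself.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous MAC" for the first entry


def _canonical(obj: dict) -> bytes:
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()


def append_entry(log: list[dict], key: bytes, event: dict) -> dict:
    """Append an entry whose MAC covers its sequence number, predecessor and event."""
    prev = log[-1]["mac"] if log else GENESIS
    body = {"seq": len(log), "prev": prev, "event": event}
    entry = {**body, "mac": hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()}
    log.append(entry)
    return entry


def verify_log(log: list[dict], key: bytes) -> bool:
    """Recompute every MAC and check that the chain links are unbroken."""
    prev = GENESIS
    for i, entry in enumerate(log):
        body = {"seq": entry["seq"], "prev": entry["prev"], "event": entry["event"]}
        expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
        if entry["seq"] != i or entry["prev"] != prev:
            return False
        if not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev = entry["mac"]
    return True


def record_request(log: list[dict], key: bytes, prompt: str,
                   classification: str, route: str) -> dict:
    """Log what happened to a prompt without logging the prompt."""
    fingerprint = hmac.new(key, prompt.encode(), hashlib.sha256).hexdigest()
    event = {"prompt_fingerprint": fingerprint,
             "classification": classification, "route": route}
    return append_entry(log, key, event)
```

Editing or reordering any stored entry breaks verification. Truncating the tail of the log does not, so a real deployment would also need to anchor the latest MAC somewhere outside the log.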

Govern all of it: the AI Management System

Three tools handle the model, the application, and the boundary. The fourth handles the question an auditor actually asks: who is in control, and can they stop it?

Our command centre is our own implementation of an ISO 42001 AI Management System, and we run our agent fleet on it in the open so the architecture is not just a claim.

More than 850 API endpoints and 200 database tables behind a fleet of 35 agents.
Five circuit breakers, CB-1 through CB-5. CB-1 is a one-button hard stop that halts every agent instantly.
Actions graduated by risk. Routine actions auto-execute; higher-risk ones wait in an approval queue until a human signs off.
A full, immutable audit trail, and an eight-step provisioning pipeline that moves each agent from DORMANT through PROVISIONING and BRIEFING to ACTIVE, so nothing goes live unconfigured.
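The interaction between a hard stop and risk-graduated actions can be sketched in a few lines. This is an illustration of the control pattern only, not the command centre's code: the class, the two risk levels, and the action strings are hypothetical, and a single `halted` flag stands in for CB-1 alone, leaving the other four breakers out.

```python
from dataclasses import dataclass, field
from enum import Enum


class Risk(Enum):
    ROUTINE = 1
    ELEVATED = 2


class Outcome(Enum):
    EXECUTED = "executed"
    QUEUED = "queued"
    HALTED = "halted"


@dataclass
class Governor:
    halted: bool = False
    queue: list[str] = field(default_factory=list)
    executed: list[str] = field(default_factory=list)

    def hard_stop(self) -> None:
        """One call halts everything; nothing below can bypass it."""
        self.halted = True

    def submit(self, action: str, risk: Risk) -> Outcome:
        if self.halted:
            return Outcome.HALTED
        if risk is Risk.ROUTINE:
            self.executed.append(action)
            return Outcome.EXECUTED
        self.queue.append(action)  # waits for a human
        return Outcome.QUEUED

    def approve(self, action: str, approver: str) -> Outcome:
        if self.halted:
            return Outcome.HALTED  # approval cannot override the stop
        self.queue.remove(action)  # raises ValueError if it was never queued
        self.executed.append(f"{action} (approved by {approver})")
        return Outcome.EXECUTED
```

The halt check comes first in every path, including approval, so a human sign-off on a queued action cannot restart a stopped fleet by accident.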

How the stack maps to the rules

Job	Tool	What it answers to
Adversarial testing and evidence	DojoLM	EU AI Act Art. 9, 15, 55; OWASP LLM01-LLM10; NIST AI RMF Measure
Runtime input and output defence	BonkLM	OWASP LLM01, LLM02, LLM05, LLM06, LLM07
Outbound data sanitisation	RuneLM	OWASP LLM02; EU AI Act Art. 10, 12; GDPR Art. 25, 32
Governance and human oversight	AI Management System	ISO 42001; EU AI Act Art. 9, 12, 14

No single tool is a compliance product. None of them sign your conformity assessment. What they do is make the controls those frameworks describe into something that runs, logs, and produces evidence, so the document you write becomes a description of a real system instead of an aspiration.

The principles underneath

Four design choices repeat across all four tools, and each one exists to prevent a specific failure mode.

Fail-closed by default. When classification, routing, or a guard errors, the request blocks. An open default is a quiet leak waiting for an edge case.
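As a minimal sketch of what fail-closed means in practice, the function below combines it with the escalate-only rule described for RuneLM's classification pipeline. The four level names come from this post. The stage functions and their rules are hypothetical stand-ins, and the real pipeline has nine stages, not the two or three shown here.

```python
from enum import IntEnum
from typing import Callable, Iterable


class Level(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    BLOCKED = 3


Stage = Callable[[str], Level]


def classify(payload: str, stages: Iterable[Stage]) -> Level:
    """Run stages in order; the result can only rise, and any error blocks."""
    level = Level.LOW
    for stage in stages:
        try:
            verdict = stage(payload)
        except Exception:
            return Level.BLOCKED  # fail closed: an error is never a pass
        level = max(level, verdict)  # escalate only, never lower
    return level


# Hypothetical stages, for illustration only.
def contains_email(payload: str) -> Level:
    return Level.HIGH if "@" in payload else Level.LOW


def looks_internal(payload: str) -> Level:
    return Level.MEDIUM if "internal" in payload.lower() else Level.LOW


def always_low(payload: str) -> Level:
    return Level.LOW


def broken(payload: str) -> Level:
    raise RuntimeError("classifier unavailable")
```

Taking the maximum at each step is what makes the pipeline monotone: a later stage that returns LOW cannot undo an earlier HIGH, and a stage that throws ends the run at BLOCKED.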

Evidence as a byproduct. Every tool logs in a form an auditor can read, and DojoLM and RuneLM produce signed or tamper-evident artifacts. Proof that has to be reconstructed after the fact is not proof. It should fall out of normal operation.

Defence in depth, with shared ground truth. A scanner catches what a guardrail misses. A guardrail blocks what a scanner never tested. When tools disagree and there is no shared spine, you end up maintaining a model of how your tools disagree instead of a model of the attack surface.

Human oversight as architecture. A stop button that lives in a policy PDF is not oversight. A circuit breaker in the TopBar that halts the fleet in one click is.

The rest of this series

This post is the spine. Four deep-dives follow, one per layer:

DojoLM, turning a red-team run into compliance evidence: /blog/dojolm-adversarial-testing-evidence
RuneLM, making cleartext exfiltration architecturally impossible: /blog/runelm-fail-closed-data-boundary
BonkLM, a runtime immune system for the agent call graph: /blog/bonklm-runtime-llm-guardrails
The AI Management System, governance that fires instead of governance that files: /blog/blackunicorn-ai-management-system

We share the architecture in the open because the infrastructure for running AI safely is not mature yet, and it only gets there if the people building it show what production actually looks like. Compliance was never the document. It is the system that can prove it behaved.

DojoLM and BonkLM are out now. RuneLM is in pre-release at runelm.com. The command centre runs in production and we write about it as we go.
