AI & LLM Security

BonkLM: A Runtime Immune System for LLM Applications

Testing finds the holes in an LLM application. It does not close them while the thing is live. BonkLM is the runtime layer we built for that gap: nine deterministic guardrails, in-process, watching seven surfaces of an agent instead of just the prompt and the response. This is what we built and why.

Julien P.June 15, 20269 min read

BonkLM: A Runtime Immune System for LLM Applications

A red-team run tells you where an LLM application breaks. It does not stand in front of the application while it is serving real traffic. Between the test and the next test, the system runs, and every request is an opportunity to be the one the test never covered.

This is a builder's journal. We run agents in production and we needed something in the request path, not on a dashboard. A guardrail that fires on the actual call, in the actual process, fast enough that nobody is tempted to turn it off. So we built BonkLM. This post is what it is, the surfaces it watches, and the numbers behind it.

The runtime gap

Picture an agent doing real work. A user message comes in. The agent reasons, calls a tool, the tool returns a document, the document goes into context, the model writes to memory, the next turn reads that memory back. That is not one surface to defend. It is a call graph.

The instinct is to watch the prompt and the response. Those are the two surfaces you can see from the outside, so they are the two people guard. But an agent gets poisoned in the places between them. A retrieved document carries an instruction. A tool-call argument is shaped by an earlier injection. A memory write looks harmless on its own and turns into a payload three turns later when it is read back. If your guard only sees the bookends, the interior is unprotected by construction.

That is the gap we set out to close. Not "is this prompt malicious," but "is anything anywhere in this call graph trying to steer the agent."

The reframe

The question we kept asking was not "how do we filter the input." It was: what are all the surfaces where untrusted content enters an agent's reasoning, and can we put a deterministic check on every one of them without adding a network hop or a second model to the loop?

Two constraints fell out of that immediately. It has to be in-process, because a guardrail that makes a network call is a guardrail with a timeout, a failure mode, and a per-request bill. And it has to be deterministic, because a probabilistic guard cannot tell you why it blocked something, and "the model felt unsafe" is not an answer you can put on the record.

What we built

BonkLM is the runtime layer. Its tagline is "your LLM's immune system," and the framing is deliberate: an immune system does not run a scan once a quarter, it inspects continuously and in place. BonkLM is a TypeScript-native, in-process, deterministic guardrail library. No network calls, no second model in the loop. MIT licensed, on npm as @blackunicorn/bonklm, source at BlackUnicornSecurity/bonklm, requires Node 20.4 or newer.

You install it with npm i @blackunicorn/bonklm, or you run npx @blackunicorn/bonklm and an interactive wizard detects your framework and provider and wires the right pieces for you. That second path matters more than it sounds, and we will come back to why.

Architecture and numbers

BonkLM is nine composable layers and a small set of named components that compose them.

The nine layers: Prompt Injection, Jailbreak Detection, Reformulation Guard, Boundary Detector, PII Guard, Secret Guard, XSS Safety, Bash Safety, and a Streaming Validator. Each is a self-contained check you can run on its own or stack with the others.

The detection content behind those layers is specific. Prompt Injection ships 35 patterns across 6 categories. Jailbreak Detection covers 10 categories: DAN, roleplay exploitation, hypothetical framing, authority impersonation, social engineering, social compliance, trust exploitation, emotional manipulation, known templates, and obfuscation. The Secret Guard knows 37 credential types. The PII Guard carries 25 patterns, 7 US and 18 EU. Injection coverage spans a dozen languages, because an attacker does not have to phrase the payload in English.

The orchestrator is the GuardrailEngine. It runs the layers sequentially or in parallel, short-circuits on the first block, and takes a configurable sensitivity and an action mode per check: block, sanitize, log, or allow. Underneath it are two interfaces, the Validator and the Guard, and a set of composite factories that assemble checks for specific surfaces: createToolCallArgsValidator, createRetrievedDocValidator, createMemoryWriteValidator, and createComposedContextValidator. Around the engine sit a TelemetryService that exports OpenTelemetry spans, a CircuitBreaker, and a StreamValidator that inspects output chunk by chunk as it streams.

Those composite factories are how the seven-surface taxonomy becomes real code. The surfaces are text_input, text_output, tool_call, retrieved_doc, memory_write, composed_context, and audio_partial. The first two are the prompt and the response, the surfaces everyone watches. The other five are where agents actually get poisoned, and BonkLM has a validator for each.

On performance, the engine clocks a p50 around 55 microseconds for the full set of layers on a short prompt, roughly 18,000 validations per second on a single core, deterministic. Every regex runs behind a per-regex timeout so a crafted input cannot stall the process, and a compiled LRU regex cache keeps the hot path hot. That number is the whole argument for in-process. At 55 microseconds you do not budget for the guard, you do not load-shed it under pressure, and nobody files a ticket to disable it.

Two of the agentic defences are worth pulling out, because they are the ones that exist specifically for the interior surfaces.

The composed-context validator catches what we call the wake-up attack. Each individual memory write is benign on its own and passes every per-write check. Read back together at recall time, concatenated into context, they reconstitute an injection payload. So the validator scans the forward and reverse concatenations of the memory blobs, not just each blob in isolation, because the payload only exists when the pieces are assembled.

The ElizaOS connector ships installSealedWrapMemory. It seals the runtime's memory functions so that a downstream plugin cannot re-wrap or bypass the memory guard after the fact. A guard you can quietly unwrap is a guard an attacker, or a careless plugin, will unwrap.

The principles underneath

Three design choices repeat through BonkLM, and each one exists to prevent a specific failure mode.

Deterministic and in-process. Every check is a pattern, a timeout, and a decision you can read in a log line. The failure mode this prevents is the unaccountable block. A guardrail backed by a second model can refuse a request and leave you with no answer to "why," and no way to reproduce it. A deterministic check tells you which pattern fired, every time, and runs inside your process where there is no network timeout to design around.

Watch the whole call graph. A validator on every surface, not just the bookends. The failure mode this prevents is interior poisoning: the retrieved document that smuggles an instruction, the tool-call argument shaped upstream, the memory write that detonates on recall. If you only inspect input and output, those paths are open by construction, and the wake-up attack is exactly the exploit that lives there.

You contain prompt injection, you do not patch it. Prompt injection is not a bug with a fix. As long as a model takes instructions in natural language, text that looks like an instruction can act like one. So the goal is not to eliminate it, it is to give it somewhere to land that is not your model's behaviour. The failure mode this prevents is the false sense of a permanent fix. BonkLM treats injection as a standing condition you keep contained, the way an immune system keeps a pathogen in check rather than declaring it cured.

How it maps to the rules

BonkLM is not a compliance product and it does not sign anyone's conformity assessment. What it does is make the runtime controls those frameworks ask for into something that runs and logs. Stated against the OWASP LLM Top 10, the project maps cleanly:

OWASP entry	What BonkLM does about it
LLM01 Prompt Injection	The injection and jailbreak layers, plus per-surface validators across the call graph
LLM02 Sensitive Information Disclosure	PII Guard and Secret Guard on inputs, outputs, and tool calls
LLM05 Improper Output Handling	XSS Safety, Bash Safety, and the Streaming Validator on the output path
LLM06 Excessive Agency	Tool-call, memory-write, and composed-context validators on the agent's interior
LLM07 Unbounded Consumption	ReDoS protection: per-regex timeouts and an LRU regex cache so a crafted input cannot stall the engine

The point of the table is not the checkmarks. It is that each row is a piece of code in the request path, not a sentence in a policy.

The builder's journal

We built BonkLM because we needed it. We run agents, the runtime gap was ours first, and the wake-up attack is the kind of thing you only notice once you have watched memory blobs assemble themselves into something you did not write. The 55-microsecond budget was not a benchmark we chased for a slide, it was the price at which we stopped wanting to turn the guard off. The sealed memory wrap exists because we found a way to bypass our own guard and decided to close it before someone else did.

We ship it in the open because the infrastructure for running agents safely is not finished, and it gets there faster if the people building it show the interior, not just the bookends. The wizard, the 43 integrations, the per-surface validators: that is us trying to make the safe path the easy path, so the guard is on by default instead of on by discipline.

If you want the larger picture, BonkLM is one layer in a stack we wrote up here: /blog/llm-security-compliance-stack. And BonkLM itself, the install wizard, the layers, and the integration list, lives at bonklm.com.

BonkLM: A Runtime Immune System for LLM Applications

Julien P.June 15, 20269 min read

The runtime gap

That is the gap we set out to close. Not "is this prompt malicious," but "is anything anywhere in this call graph trying to steer the agent."

The reframe

What we built

Architecture and numbers

BonkLM is nine composable layers and a small set of named components that compose them.

Two of the agentic defences are worth pulling out, because they are the ones that exist specifically for the interior surfaces.

The principles underneath

Three design choices repeat through BonkLM, and each one exists to prevent a specific failure mode.

How it maps to the rules

OWASP entry	What BonkLM does about it
LLM01 Prompt Injection	The injection and jailbreak layers, plus per-surface validators across the call graph
LLM02 Sensitive Information Disclosure	PII Guard and Secret Guard on inputs, outputs, and tool calls
LLM05 Improper Output Handling	XSS Safety, Bash Safety, and the Streaming Validator on the output path
LLM06 Excessive Agency	Tool-call, memory-write, and composed-context validators on the agent's interior
LLM07 Unbounded Consumption	ReDoS protection: per-regex timeouts and an LRU regex cache so a crafted input cannot stall the engine

The point of the table is not the checkmarks. It is that each row is a piece of code in the request path, not a sentence in a policy.

BonkLM: A Runtime Immune System for LLM Applications

The runtime gap

The reframe

What we built

Architecture and numbers

The principles underneath

How it maps to the rules

The builder's journal

Tags

Related Articles

DojoLM: Red-Teaming You Can Put on the Record

RuneLM: Making Cleartext Exfiltration Architecturally Impossible

The AI Management System: Governance That Fires, Not Governance That Files

BonkLM: A Runtime Immune System for LLM Applications

The runtime gap

The reframe

What we built

Architecture and numbers

The principles underneath

How it maps to the rules

The builder's journal

Tags

Related Articles

DojoLM: Red-Teaming You Can Put on the Record

RuneLM: Making Cleartext Exfiltration Architecturally Impossible

The AI Management System: Governance That Fires, Not Governance That Files