AI & LLM Security

RuneLM: Making Cleartext Exfiltration Architecturally Impossible

The moment your application calls a third-party model, your data leaves your perimeter, and no zero-retention checkbox changes that. RuneLM is the outbound boundary we built so cleartext fallback is not a setting you can flip but a thing the architecture refuses to do. This is how it works and the failure modes it closes.

Julien P.June 15, 20269 min read

RuneLM: Making Cleartext Exfiltration Architecturally Impossible

The moment your application calls a third-party model, your data leaves your perimeter. That is not a configuration problem, it is the shape of the request. The bytes travel to someone else's machine, get processed there, and you are trusting a contract and a marketing page for whatever happens next.

This is a builder's journal. We run our own AI in production, calling models we do not host, and we kept arriving at the same uncomfortable fact: the outbound call is the boundary nobody guards. So we built RuneLM, the fail-closed data sanitisation proxy that sits on it. It is the engine we refer to internally as DSP-LLM, it is MIT licensed, and this post is how it works.

The outbound boundary, stated plainly

Here is what happens when an app sends a prompt to a hosted model, with the comforting parts removed. Providers keep prompts for abuse review, regardless of the zero-retention copy on the pricing page, because abuse review and zero retention are different promises under different clauses. A provider breach exposes cached prompts retroactively, so a leak next year reaches into the prompts you sent today. A subpoena served on the provider does not require your consent or your knowledge. And retention terms can change, so the guarantee you read at integration time is a snapshot, not a contract you control.

None of that is an accusation. It is the standing reality of handing data to a third party: your sensitive values sit in a system whose retention, breach surface, and legal exposure you do not govern. The prompt leaves in cleartext, and once it is gone, every guarantee about it lives on the far side of a boundary you no longer control.

The reframe

The question we kept circling was narrower than "which provider has the best retention policy." It was: what would it take for sensitive values to be architecturally incapable of leaving in cleartext, so the safe path is the only path the code can take? A config flag that disables fallback is a leak waiting for the one deploy where someone flips it back. We did not want a safer default, we wanted a system where the unsafe behaviour has no code path.

What we built

RuneLM is a proxy you put in front of every outbound model call. Positioned bluntly: it is the first sanitisation proxy where cleartext fallback is architecturally impossible. It decides what is safe to send and where it may go, and the values you care about never cross the boundary in the clear.

It is in pre-release. The engine is built, more than 5,000 tests cover it, and we are opening it up. There is a waitlist at runelm.com and the code lives at GitHub under BlackUnicornSecurity/runelm-dsp. We are not handing you an install one-liner yet, because the public package name is still settling.

One framing we hold to: RuneLM is a control, not a compliance product. It does not sign your conformity assessment. It enforces a boundary and proves it did.

Architecture and numbers

The proxy is a pipeline of named components: an Identity Verifier, a Classifier, a Pseudonymizer, a Router, a set of Provider Adapters, a Rehydrator, and an Audit Logger. Around the engine sit a read-only Dashboard, a dry-run endpoint that classifies without sending anything, an evidence-pack CLI, and an HMAC key-rotation CLI.

Classification, in nine stages

Every outbound payload runs a nine-stage classification pipeline. The ordering has one rule that defines its safety property: later stages can only escalate the verdict, never lower it. Once a stage decides a payload is sensitive, nothing downstream talks it back down.

NFKC normalisation, so the rest of the pipeline sees one canonical form instead of a thousand visually-equivalent encodings.
Obfuscation-aware keyword blocklist, which catches Base64, leetspeak, Unicode homoglyphs, token-splitting, and nested obfuscation rather than only the literal string.
Custom operator regex, the patterns you add for your own domain.
Built-in regex for the values everyone leaks: IPv4, IPv6, and CIDR ranges, email addresses, AWS, GCP, and GitHub credentials, JWTs, SSH keys, IBANs, Bitcoin addresses, and more.
Structured-data fanout, which walks into JSON, XML, and CSV instead of treating a serialised blob as one opaque string.
Microsoft Presidio NER, for the named entities a regex will never enumerate.
Operator-list cross-reference against lists you maintain.
Coreference detection, so a value reintroduced by pronoun or alias later in the text is still tracked.
Per-caller override feeding into monotonic session escalation, so a session that has seen sensitive data stays at its high-water mark.

Each payload lands at one of four levels: LOW, MEDIUM, HIGH, or BLOCKED.

Routing the four levels enforce

Classification is not advisory. It picks the provider tier, and there are three. L1 is local: Ollama, vLLM, llama.cpp, models running on hardware you control. L2 is contracted, with a precise meaning here, a signed data-processing agreement on file plus a zero-retention API tier, both conditions, not one. L3 is public.

The routing rules are short and not negotiable at runtime. HIGH content goes only to L1, and there is no override flag for HIGH. MEDIUM may go to L1 or L2. BLOCKED raises immediately. Verdict and route are bound together, so the code that would send your most sensitive payload to a public endpoint was never written.

Pseudonymisation that preserves type

When a payload can be sanitised rather than blocked, the Pseudonymizer does type-preserving deterministic substitution. A placeholder keeps the syntactic role of what it replaced: an email becomes a valid email, an IP becomes a valid IP. The model still reasons over well-formed inputs and returns a useful answer, while the real values stay home. The substitution map, the only thing that can turn a placeholder back into a real value, is encrypted with AES-256-GCM, and the key is stored separately from the map.

Rehydration that cannot be tricked

When the model answers, the Rehydrator turns placeholders back into real values under one constraint that is the security property: only placeholders this session created get reversed. This is map-bounded rehydration. A compromised model that emits tokens shaped like your placeholders, hoping the proxy will substitute real secrets back in, defeats a system that rehydrates by pattern. Against the map-bounded version, the fake tokens are not in this session's map, so they get nothing back.

Proof without the prompt

The Audit Logger writes a tamper-evident trail keyed with HMAC-SHA-256, and it never stores the prompt text. It records a keyed hash, the classification level, the entity count, the route taken, and the latency: an evidentiary record of every payload without keeping the content you were trying to protect. Backends include in-memory, sqlite, a hash-chained file, Loki, and S3 Object Lock, so the same trail can be ephemeral in a test or write-once and immutable in production.

Fitting into what you already run

A boundary control is only useful if it sits inline without a rewrite. RuneLM ships an HTTP proxy speaking the OpenAI and Anthropic wire shapes, so many integrations are a base-URL change. Adapters cover LiteLLM, LangChain, and the OpenAI Agents SDK, plus a Claude Code MCP server and SDK clients in TypeScript, Go, and Rust.

The principles, and what each one prevents

Three design choices carry the whole thing, and each exists to close a specific failure.

Fail-closed, with no open path to fall back to. When classification, routing, or a provider check errors, the request blocks. It does not degrade to sending the payload anyway. The failure this prevents is the quiet leak: the edge case or unhandled exception that, in an open-default system, sends your data out because the guard was the thing that broke. Here a broken guard stops the request, it does not ship it unguarded.

The human draws the boundary, the proxy enforces it. People decide what counts as sensitive, through the operator lists, the custom regex, and the per-caller policy. The proxy does not renegotiate that judgment at runtime, it executes it on every call without getting tired, distracted, or talked out of it by a clever prompt. The failure this prevents is the boundary that lives only in someone's head, applied inconsistently and skipped under deadline. The judgment is human, the enforcement is mechanical.

Prove without keeping. The failure this prevents is the audit log becoming the breach. A log that stores prompts to "prove compliance" has quietly assembled a second copy of every sensitive value, somewhere with weaker controls than the system that produced it. A keyed hash proves a payload was handled a given way without being able to reconstruct it.

How it maps to the rules

It answers directly to OWASP LLM02, sensitive information disclosure, the failure mode the proxy exists to prevent. Under the EU AI Act it speaks to Article 10, data governance, by enforcing where data may go, and to Article 12, record-keeping, through the tamper-evident trail. Under GDPR it is a working instance of Article 25, data protection by design, supports Article 32, security of processing, through pseudonymisation and the encrypted substitution map, and feeds Article 30, records of processing, with the audit log. The same enforcement and evidence are relevant to SOC 2, ISO 27001, HIPAA, and PCI-DSS 4.0 control families, because "demonstrate where sensitive data went and prove it could not leave in the clear" is a requirement they share in different words.

None of that makes RuneLM a compliance product. It makes RuneLM the running control that turns those clauses into something a system does and logs, not something a document claims.

Signoff

We built RuneLM because we were not willing to put our own sensitive data on the far side of a boundary we did not control, and a checkbox was not good enough. So we made the unsafe path stop existing. The engine is built, more than 5,000 tests hold it, and we are opening it up.

For how this sits next to testing, runtime defence, and governance, the pillar is here: /blog/llm-security-compliance-stack. For RuneLM itself, the waitlist is at runelm.com and the code is at BlackUnicornSecurity/runelm-dsp. We will keep writing as we go.

RuneLM: Making Cleartext Exfiltration Architecturally Impossible

Julien P.June 15, 20269 min read

The outbound boundary, stated plainly

The reframe

What we built

One framing we hold to: RuneLM is a control, not a compliance product. It does not sign your conformity assessment. It enforces a boundary and proves it did.

Architecture and numbers

Classification, in nine stages

NFKC normalisation, so the rest of the pipeline sees one canonical form instead of a thousand visually-equivalent encodings.
Obfuscation-aware keyword blocklist, which catches Base64, leetspeak, Unicode homoglyphs, token-splitting, and nested obfuscation rather than only the literal string.
Custom operator regex, the patterns you add for your own domain.
Built-in regex for the values everyone leaks: IPv4, IPv6, and CIDR ranges, email addresses, AWS, GCP, and GitHub credentials, JWTs, SSH keys, IBANs, Bitcoin addresses, and more.
Structured-data fanout, which walks into JSON, XML, and CSV instead of treating a serialised blob as one opaque string.
Microsoft Presidio NER, for the named entities a regex will never enumerate.
Operator-list cross-reference against lists you maintain.
Coreference detection, so a value reintroduced by pronoun or alias later in the text is still tracked.
Per-caller override feeding into monotonic session escalation, so a session that has seen sensitive data stays at its high-water mark.

Each payload lands at one of four levels: LOW, MEDIUM, HIGH, or BLOCKED.

Routing the four levels enforce

Pseudonymisation that preserves type

Rehydration that cannot be tricked

Proof without the prompt

Fitting into what you already run

The principles, and what each one prevents

Three design choices carry the whole thing, and each exists to close a specific failure.

How it maps to the rules

None of that makes RuneLM a compliance product. It makes RuneLM the running control that turns those clauses into something a system does and logs, not something a document claims.

RuneLM: Making Cleartext Exfiltration Architecturally Impossible

The outbound boundary, stated plainly

The reframe

What we built

Architecture and numbers

Classification, in nine stages

Routing the four levels enforce

Pseudonymisation that preserves type

Rehydration that cannot be tricked

Proof without the prompt

Fitting into what you already run

The principles, and what each one prevents

How it maps to the rules

Signoff

Tags

Related Articles

DojoLM: Red-Teaming You Can Put on the Record

BonkLM: A Runtime Immune System for LLM Applications

The AI Management System: Governance That Fires, Not Governance That Files

RuneLM: Making Cleartext Exfiltration Architecturally Impossible

The outbound boundary, stated plainly

The reframe

What we built

Architecture and numbers

Classification, in nine stages

Routing the four levels enforce

Pseudonymisation that preserves type

Rehydration that cannot be tricked

Proof without the prompt

Fitting into what you already run

The principles, and what each one prevents

How it maps to the rules

Signoff

Tags

Related Articles

DojoLM: Red-Teaming You Can Put on the Record

BonkLM: A Runtime Immune System for LLM Applications

The AI Management System: Governance That Fires, Not Governance That Files