AI & LLM Security

DojoLM: Red-Teaming You Can Put on the Record

A red-team run that lives in a screenshot and a Slack thread proves nothing to an assessor six months later. We built DojoLM so the test produces its own evidence: signed bundles, a SHA-256 manifest, and a mapping to the frameworks that ask for the test in the first place. This is how a prompt-injection run becomes something you can hand over.

Julien P.June 15, 20268 min read

DojoLM: Red-Teaming You Can Put on the Record

A red-team run usually ends as a feeling. Someone spent an afternoon throwing prompt injections at the model, a few got through, the obvious ones got patched, and the takeaway lived in a screenshot and a Slack thread. Useful in the moment. Worthless six months later when an assessor asks what you tested, against which model version, and what came back.

This is a builder's journal. We run our own LLM application in production, and we needed the adversarial test to leave a trace that outlives the afternoon it happened in. Not a feeling. An artifact. This post is the testing layer of our stack in one place: how DojoLM runs the attack, and how it turns the run into something with a signature on it.

The gap

The gap is rarely that a team does not test. It is that the test and the proof of the test are two different objects, produced by two different efforts, and the second one almost never gets made.

A scanner flags a bypass. An engineer reads the output, fixes the prompt template, and moves on. The fix is real. The record of why the fix exists is a terminal scrollback that is gone by Friday. When the question comes back, and under the EU AI Act and ISO 42001 it does come back, you are reconstructing the run from memory against a model that has since been updated three times.

The frameworks are specific about wanting the test. EU AI Act Article 9 wants testing before a system reaches the market. Article 15 wants demonstrated resilience against adversarial inputs, data and model poisoning, and confidentiality attacks. Article 55 puts an adversarial-testing duty squarely on general-purpose models. The OWASP LLM Top 10 names the failure modes you are testing for, prompt injection at number one. NIST AI RMF asks you to Measure. None of that is satisfied by an engineer's recollection of a good afternoon.

The reframe

The question we kept circling was not "how do we test harder." Plenty of attack libraries exist. The question was: what would it take for the test to produce its own evidence, signed and reproducible, as a byproduct of running it?

That reframes what a red-team tool is. It is not a fuzzer that prints findings to a console. It is an instrument that runs a defined corpus of attacks against a named target and emits a record an assessor can read, with enough provenance that the run can be repeated and the result trusted. The attack and the artifact are the same operation, not two.

What we built

DojoLM is the testing and red-team platform. The Dojo for LLM Security. It is free, self-hostable, and runs entirely on your own infrastructure, which is the only defensible posture when the thing under test is your own attack surface. It is public alpha, the code is at github.com/BlackUnicornSecurity/DojoLM, and the attack corpus is built on the CrowdStrike Taxonomy of Prompt Injection so the categories mean something outside our own heads.

The pieces are named, and each one does one job:

Haiku Scanner runs real-time prompt-injection detection, including a streaming mode that inspects tokens as they emit rather than waiting for a finished response. It has zero runtime dependencies and a hardened, rate-limited REST API, so it drops into a pipeline without dragging a tree of packages behind it.
Buki Payload Lab is the adversarial fixture library. Thousands of labeled fixtures, plus a seeded deterministic generator and a buki fire CLI, so a run is something you can rerun and get the same inputs.
Jutsu Model Lab is the cross-provider leaderboard. It scores how different models hold up against the same corpus.
Hattori Guard is a bi-directional firewall with four named modes: Shinobi logs only, Samurai blocks inputs, Sensei blocks outputs, Hattori blocks both.
Sengoku Campaigns runs continuous, scheduled red-teaming, because an attack surface that was clean in March is not clean in June.
Time Chamber runs multi-turn temporal attacks, the kind that stay benign for several turns and then turn.
The Bushido Book is the compliance translation layer. It maps findings to the frameworks and emits signed evidence bundles with a SHA-256 manifest, cryptographically signed.
KATANA is an ISO 17025-aligned tool-validation system, so the instrument can be held to a standard, not just the target.
Shingan runs deep scans, Amaterasu tracks attack DNA and lineage across runs, and Kagami handles fingerprinting.

The numbers a reader screenshots

544 detection patterns across 49 groups, built on the CrowdStrike Taxonomy of Prompt Injection.
57 provider presets: 51 cloud plus 6 local runtimes, namely Ollama, LM Studio, llama.cpp, vLLM, KoboldCpp, and Text Generation WebUI. Local matters here. You can run the whole test offline against a model that never leaves your machine.
Thousands of labeled fixtures in Buki Payload Lab, behind a seeded deterministic generator and the buki fire CLI.
Four firewall modes in Hattori Guard: Shinobi, Samurai, Sensei, Hattori.
Signed evidence bundles from the Bushido Book, with a SHA-256 manifest cryptographically signed.
27 frameworks in the product's mapping set, covering the OWASP LLM Top 10 in full, the EU AI Act, NIST AI RMF, ISO 42001, MITRE ATLAS, and GDPR.

The principles underneath

Three design choices run through DojoLM, and each one exists to stop a specific way a red-team effort goes wrong.

Deterministic by seed. The failure mode is a finding nobody can reproduce. A bypass that showed up once, against inputs that were generated on the fly and never captured, is not a finding an assessor will accept and not a regression you can write a test for. Buki Payload Lab seeds its generator so the same run produces the same inputs every time. A result you cannot reproduce is a rumor, and we did not want to ship rumors.

Evidence as a byproduct. The failure mode is the test and the proof being two separate jobs, where the second one never gets done. The Bushido Book makes the artifact fall out of the run itself: findings mapped to frameworks, bundled, hashed into a SHA-256 manifest, and cryptographically signed. Proof that has to be assembled after the fact, from memory, against a model that has since changed, is not proof. It should be the natural residue of running the test once.

The instrument is validated too. The failure mode is trusting a tool you never checked. If your scanner has a blind spot, every clean report it produces is a false negative wearing a green checkmark. KATANA aligns DojoLM's own validation to ISO 17025, the standard for testing competence, so the question "can we trust this result" has an answer about the instrument and not just the target.

How it maps to the rules

The point of the Bushido Book is that a run stops being an internal note and starts being something you can hand across the table.

EU AI Act Article 9 asks for testing before a system reaches the market. A dated, signed DojoLM bundle is the testing record, tied to the model version it ran against.
EU AI Act Article 15 asks for resilience against adversarial inputs, poisoning, and confidentiality attacks. The 544 patterns across 49 groups are organized around exactly those categories, and the bundle shows what was thrown and what held.
EU AI Act Article 55 puts an adversarial-testing duty on general-purpose models. Sengoku Campaigns turn that from a one-time box-tick into a scheduled, continuous obligation with a trail behind each run.
OWASP LLM Top 10 is covered in full by the mapping, so a finding is filed against a category a reviewer already recognizes rather than a label we invented.
NIST AI RMF asks you to Measure. A leaderboard across 57 presets and a signed bundle per run is what measurement looks like when it is written down.

No single module here signs your conformity assessment, and DojoLM is not a compliance product. What it does is make the test and the evidence the same act, so the document you eventually write describes a run that actually happened, against a model you can name, with a signature anyone can verify.

Signoff

We built DojoLM first because you cannot put resilience on the record if you never adversarially tested for it, and you cannot stand behind the test if it left nothing behind. An afternoon of throwing payloads at a model is a start. A seeded run, mapped to 27 frameworks, hashed and signed, is the version you can defend twelve months later when the person asking was not in the room.

This is the testing layer of a larger stack. The full picture, how testing, runtime defence, outbound sanitisation, and governance fit together and map to the rules, is the pillar at /blog/llm-security-compliance-stack.

DojoLM is out now, free and self-hostable, at dojolm.com. We run it against our own surface and we write about what we find as we go.

DojoLM: Red-Teaming You Can Put on the Record

Julien P.June 15, 20268 min read

The gap

The gap is rarely that a team does not test. It is that the test and the proof of the test are two different objects, produced by two different efforts, and the second one almost never gets made.

The reframe

What we built

The pieces are named, and each one does one job:

Haiku Scanner runs real-time prompt-injection detection, including a streaming mode that inspects tokens as they emit rather than waiting for a finished response. It has zero runtime dependencies and a hardened, rate-limited REST API, so it drops into a pipeline without dragging a tree of packages behind it.
Buki Payload Lab is the adversarial fixture library. Thousands of labeled fixtures, plus a seeded deterministic generator and a buki fire CLI, so a run is something you can rerun and get the same inputs.
Jutsu Model Lab is the cross-provider leaderboard. It scores how different models hold up against the same corpus.
Hattori Guard is a bi-directional firewall with four named modes: Shinobi logs only, Samurai blocks inputs, Sensei blocks outputs, Hattori blocks both.
Sengoku Campaigns runs continuous, scheduled red-teaming, because an attack surface that was clean in March is not clean in June.
Time Chamber runs multi-turn temporal attacks, the kind that stay benign for several turns and then turn.
The Bushido Book is the compliance translation layer. It maps findings to the frameworks and emits signed evidence bundles with a SHA-256 manifest, cryptographically signed.
KATANA is an ISO 17025-aligned tool-validation system, so the instrument can be held to a standard, not just the target.
Shingan runs deep scans, Amaterasu tracks attack DNA and lineage across runs, and Kagami handles fingerprinting.

The numbers a reader screenshots

544 detection patterns across 49 groups, built on the CrowdStrike Taxonomy of Prompt Injection.
57 provider presets: 51 cloud plus 6 local runtimes, namely Ollama, LM Studio, llama.cpp, vLLM, KoboldCpp, and Text Generation WebUI. Local matters here. You can run the whole test offline against a model that never leaves your machine.
Thousands of labeled fixtures in Buki Payload Lab, behind a seeded deterministic generator and the buki fire CLI.
Four firewall modes in Hattori Guard: Shinobi, Samurai, Sensei, Hattori.
Signed evidence bundles from the Bushido Book, with a SHA-256 manifest cryptographically signed.
27 frameworks in the product's mapping set, covering the OWASP LLM Top 10 in full, the EU AI Act, NIST AI RMF, ISO 42001, MITRE ATLAS, and GDPR.

The principles underneath

Three design choices run through DojoLM, and each one exists to stop a specific way a red-team effort goes wrong.

How it maps to the rules

The point of the Bushido Book is that a run stops being an internal note and starts being something you can hand across the table.

EU AI Act Article 9 asks for testing before a system reaches the market. A dated, signed DojoLM bundle is the testing record, tied to the model version it ran against.
EU AI Act Article 15 asks for resilience against adversarial inputs, poisoning, and confidentiality attacks. The 544 patterns across 49 groups are organized around exactly those categories, and the bundle shows what was thrown and what held.
EU AI Act Article 55 puts an adversarial-testing duty on general-purpose models. Sengoku Campaigns turn that from a one-time box-tick into a scheduled, continuous obligation with a trail behind each run.
OWASP LLM Top 10 is covered in full by the mapping, so a finding is filed against a category a reviewer already recognizes rather than a label we invented.
NIST AI RMF asks you to Measure. A leaderboard across 57 presets and a signed bundle per run is what measurement looks like when it is written down.

Signoff

DojoLM is out now, free and self-hostable, at dojolm.com. We run it against our own surface and we write about what we find as we go.

DojoLM: Red-Teaming You Can Put on the Record

The gap

The reframe

What we built

The numbers a reader screenshots

The principles underneath

How it maps to the rules

Signoff

Tags

Related Articles

RuneLM: Making Cleartext Exfiltration Architecturally Impossible

BonkLM: A Runtime Immune System for LLM Applications

The AI Management System: Governance That Fires, Not Governance That Files

DojoLM: Red-Teaming You Can Put on the Record

The gap

The reframe

What we built

The numbers a reader screenshots

The principles underneath

How it maps to the rules

Signoff

Tags

Related Articles

RuneLM: Making Cleartext Exfiltration Architecturally Impossible

BonkLM: A Runtime Immune System for LLM Applications

The AI Management System: Governance That Fires, Not Governance That Files