The Armory: 2,380 Fixtures, Curated, Not Scraped
A scanner is only as good as the fixtures it is tested against. Most security teams keep their attack examples in a scratch folder, a Notion page, a team-chat channel, or somebody's head. The Armory exists because that workflow cannot scale. 2,380 structured fixtures across 35 categories, four views, and a regression test for every scanner pattern. Day 3 of the DojoLM builder's journal.

A scanner is only as good as the fixtures it is tested against. You cannot measure what you cannot reproduce. You cannot reproduce an attack without a fixture. And you cannot trust a fixture that is not tagged, versioned, and tied to the pattern it exercises.
"There is a scratch folder in the red team drive with about 200 prompts in it. Nobody has touched it in six months."
"The jailbreak examples are in a Notion page. The person who owned it left last quarter."
"There is a team-chat channel called #prompt-injection-examples. It scrolls faster than anyone can archive it."
"The runbook keeps a list of known bad inputs. It is ASCII. It has no metadata. It probably has about 80 entries."
"There used to be a library. Then the model changed and half of it stopped reproducing. Nobody had time to fix it, so the team stopped using it."
Most security teams keep their attack examples in exactly these places. The result is the same. The same attack gets rediscovered three times, tested inconsistently, and never regression-tested against the next model update. Every team is rebuilding the library from scratch, and every library rots as soon as the person who maintained it leaves.
Yesterday the Haiku Scanner opened up. Today the Armory opens: the fixture library the scanner is measured against. Every pattern in the scanner has at least one fixture here. Every fixture has a severity, a pattern id, and a reproducible payload. If you can name an attack the scanner covers, you can find it in the Armory in under a minute.
Why a Structured Library
The design question was: what would it take to turn adversarial testing from a folklore practice into a regression discipline?
The answer was curation. Every fixture written by hand. Every fixture tagged before it lands. Every fixture linked to the scanner pattern it exercises. No dumps. No scrapes. No screenshots of Twitter threads copied into a Google Doc.
That is more work per fixture. It is also the only way the library survives a year of model changes and team turnover.
What the Armory Is
The Armory is DojoLM's structured attack fixture library. 2,380 fixtures organized across 35 categories, every entry with a severity, a category, a pattern id, and a reproducible payload. The Armory is the input side of the scanner pipeline and the ground truth the entire platform is measured against.
It has four views in the UI: Grid (cards by category), Tree (alphabetical category tree), Test Payloads (quick-run payload cards with a Scan button), and Detail (per-fixture file list with scan results and severity bars).
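To make the metadata contract concrete, here is a minimal sketch of what a fixture record might look like in Python. The field names and the `Fixture`/`Severity` shapes are illustrative assumptions, not the Armory's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    INFO = "info"


@dataclass
class Fixture:
    # Illustrative field names; the Armory's real schema is not published here.
    fixture_id: str          # e.g. "dan-6.0-context-takeover"
    category: str            # one of the 35 categories
    subtype_tags: list[str]  # flat taxonomy plus cheap tags, not deep nesting
    severity: Severity
    pattern_id: str          # the scanner pattern this fixture exercises
    payload: str             # the reproducible attack string (or file manifest)
    provenance: str          # manual | sage | ronin-hub | mitsuke
    expected_engines: list[str] = field(default_factory=list)  # engines that should fire
```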

The numbers
- 2,380 curated fixtures in the library
- 35 categories mapped onto the 13 scanner engines plus cross-cutting concerns
- 4 views in the UI: Grid, Tree, Test Payloads, Detail
- 5-level severity scale calibrated against real incident data, not vibes
- 4 ingestion pathways: manual curation, SAGE promotion, Ronin Hub disclosure, Mitsuke intelligence
- Every fixture is a regression test for at least one scanner pattern
- Every scanner pattern has at least one fixture that exercises it
No fixture ships without metadata. No metadata means no fixture.
Curated, Not Scraped
Every fixture in the Armory is curated by hand and tagged before it lands in the library. Provenance, severity, category, and the pattern id it exercises are first-class metadata. Why curate instead of dump?
Provenance is first class
When you look at a fixture, you can see who curated it and which threat model it came from. That context is part of the fixture's value, not a footnote. A payload curated for a multilingual threat model tells you something different than the same payload curated for an agentic MCP threat model. The metadata matters for interpretation.
Diversity is a feature
A library where every fixture comes from one source has the same blind spots as that source. The Armory pulls from manual curation, SAGE promotion, Ronin Hub disclosures, and Mitsuke intelligence. When a new attack class lands and only one pathway catches it, the gap in the others is visible. That visibility is the benefit of running multiple pathways in parallel instead of collapsing everything into one bucket.
Provenance survives disclosure
Fixtures that come from Ronin Hub bounty submissions keep the researcher's attribution in their metadata. A researcher who finds a new attack and discloses it gets credit every time the fixture runs. That credit is visible in the Detail view. It is part of the trust pipeline, not an afterthought.
Principles Behind Curation
Structured, not scraped
Every fixture is curated by hand and tagged before it lands in the library. No dumps of raw red team output. No scraped jailbreak forums. No LLM-generated fixtures without human review. The metadata is the value. A payload without structured metadata is a curiosity, not a test case. If a fixture cannot answer "what pattern does this exercise?" it does not belong in the Armory yet.
Reproducibility first
Every fixture is a self-contained payload plus the expected engine verdict. A fixture includes not just the attack string but the expected behavior when the scanner runs against it: which engines should fire, what severity, what pattern id. If an engine stops catching a fixture it used to catch, that is a regression, and the test runner surfaces it immediately.
Reproducibility is not a nice-to-have. It is the single property that distinguishes a library from a scratch folder.
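Here is a sketch of what that regression check could look like, reusing the illustrative `Fixture` shape from above and assuming a `scan` callable that reports which engines fired and at what severity. Every name here is an assumption for illustration, not the platform's API.

```python
def check_fixture_regression(fixture, scan):
    """Run the scanner against one fixture and flag any regression.

    `scan` is assumed to be a callable returning a result with the set of
    engines that fired and the severity they assigned -- illustrative only.
    """
    result = scan(fixture.payload)
    missing = set(fixture.expected_engines) - set(result.engines_fired)
    if missing:
        # An engine that used to catch this payload no longer does: regression.
        raise AssertionError(
            f"{fixture.fixture_id}: engines {sorted(missing)} no longer fire "
            f"(pattern {fixture.pattern_id})"
        )
    if result.severity != fixture.severity:
        raise AssertionError(
            f"{fixture.fixture_id}: severity drifted from "
            f"{fixture.severity.value} to {result.severity.value}"
        )
```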
Severity, kept honest
The Armory uses a five-level severity scale (Critical, High, Medium, Low, Info), calibrated against real incident data, not against how scary the payload sounds. An attack that requires three prior exploits to be useful is not critical. An attack that leaks the system prompt in one turn to an unauthenticated user is. When a fixture's severity seems off, the scale gets recalibrated, because an inflated severity scale poisons every downstream consumer that makes decisions based on it.
Everything feeds the scanner
The Armory is not a separate product. Every fixture is a regression test for the Haiku Scanner. Every scanner pattern has at least one fixture in the Armory that exercises it. New patterns ship with new fixtures. New fixtures ship with verification that they exercise a pattern. The two artifacts are always in sync. If one of them drifts, the CI fails.
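A CI gate for that invariant is small enough to sketch. Assuming pattern and fixture objects with `pattern_id` fields, as in the illustrative shapes above, the check comes down to two set differences:

```python
def check_pattern_fixture_sync(patterns, fixtures):
    """Fail CI if scanner patterns and Armory fixtures drift out of sync.

    Invariants from the post: every fixture links to a real pattern, and
    every pattern is exercised by at least one fixture. Illustrative sketch.
    """
    pattern_ids = {p.pattern_id for p in patterns}
    exercised = {f.pattern_id for f in fixtures}

    orphan_fixtures = [f.fixture_id for f in fixtures if f.pattern_id not in pattern_ids]
    uncovered_patterns = sorted(pattern_ids - exercised)

    if orphan_fixtures or uncovered_patterns:
        raise SystemExit(
            f"drift detected: {len(orphan_fixtures)} fixtures point at unknown "
            f"patterns, {len(uncovered_patterns)} patterns have no fixture"
        )
```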
Fixtures have lineage
When SAGE (Day 8) promotes an evolved champion to the Armory, the new fixture carries its ancestry with it. Amaterasu DNA (Day 9) reads the lineage and places the fixture in the attack family tree. A new fixture is never a disconnected entry in the library. It has parents, it has siblings, and it has a place on the lineage graph. That graph is what makes coverage measurable.
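A minimal sketch of the lineage a promoted fixture might carry, with an upward walk over the graph. The `Lineage` fields and the index structure are assumptions for illustration, not Amaterasu DNA's real format.

```python
from dataclasses import dataclass


@dataclass
class Lineage:
    # Illustrative shape for the ancestry a SAGE-promoted fixture carries.
    parent_ids: list[str]  # champions this payload evolved from
    generation: int        # how many evolution rounds deep
    family: str            # attack family node in the DNA graph


def ancestors(fixture_id, lineage_index):
    """Walk the lineage graph upward from one fixture (sketch).

    `lineage_index` is assumed to map fixture ids to Lineage records.
    """
    seen, frontier = set(), [fixture_id]
    while frontier:
        record = lineage_index.get(frontier.pop())
        if record is None:
            continue
        for parent in record.parent_ids:
            if parent not in seen:
                seen.add(parent)
                frontier.append(parent)
    return seen
```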
The 35 Categories
The category taxonomy maps onto the 13 scanner engines plus a handful of cross-cutting categories. Prompt injection alone has subcategories for instruction override, delimiter escape, role confusion, and context takeover. Jailbreak has subcategories for persona, hypothetical, DAN variants, and narrative framing. Tool-use has subcategories for argument injection, chain exploitation, and schema abuse.
The category taxonomy stays relatively flat on purpose. A deep taxonomy invites arguments about where to file a new payload. "Is this a delimiter escape or a context takeover?" is the kind of debate that burns researcher hours and produces no testable output. A flat taxonomy makes ingestion fast and makes the Tree view usable.
When a new attack class genuinely needs a new category, one gets added. But the default answer to "should this be a new category?" is "no, put it under the closest existing one with a subtype tag." Adding categories is expensive. Tagging is cheap.
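In practice, filing under the closest category with a subtype tag might look like this, reusing the illustrative `Fixture` shape from earlier. Every id and tag in this snippet is hypothetical, not a real library entry.

```python
# Illustrative: a new payload that blurs delimiter escape and context takeover
# gets filed under the closest existing category with a subtype tag, rather
# than spawning a new category.
fixture = Fixture(
    fixture_id="pi-delim-ctx-hybrid-001",   # hypothetical id
    category="prompt-injection",
    subtype_tags=["delimiter-escape", "context-takeover-adjacent"],
    severity=Severity.HIGH,
    pattern_id="pi-042",                     # hypothetical pattern id
    payload="...",                           # elided
    provenance="manual",
)
```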
The Four Views
Each view exists for a specific user type. Building one view that tries to serve everyone produces a view nobody uses.
Grid view
Category-first. Each category is a card with the fixture count, a sparkline of recent activity, and a severity distribution. This is the view a new team member uses to get the lay of the land. It answers "what fixtures are in the library, and how much of each?"

Tree view
Alphabetical. This is the view a researcher uses to find a specific fixture by name. It answers "I heard about an attack called dan-6.0-context-takeover, where is it?"

Test Payloads view
Action-first. Each card is a runnable payload with a Scan button that fires the Haiku Scanner against it live. This is the view an engineer uses to sanity-check a scanner change. You click Scan, you see the per-engine verdict in real time, you know whether your change affected that fixture.

Detail view
Fixture-first. You see the payload, the metadata, the expected verdict, the linked pattern, and the file list if the fixture is a multi-file package (some attacks need an image plus a text prompt, for example). This is the view an auditor uses to understand a specific entry and the view a maintainer uses to review incoming contributions.
How Fixtures Get Added
There are four pathways into the Armory. All four converge at the same ingestion gate: tag, assign a severity, link to a scanner pattern, run the regression suite, merge.
Manual curation
A researcher writes a new fixture, tags it, and proposes it for ingestion. A second researcher reviews it. If it is accepted, it lands in the library. This is the slowest path and the highest quality.
SAGE promotion
A champion payload from the Kumite's evolution engine (Day 8) survives human review and gets promoted. It carries its lineage into Amaterasu DNA. The reviewer is checking that the payload is a genuine new attack, not a syntactic variant of something already in the library.
Ronin Hub disclosure
A bug bounty finding that has been accepted, disclosed, and cleared of NDA constraints becomes a fixture with attribution to the researcher and the program. This is the path that turns one researcher's discovery into the fleet's permanent defense.
Mitsuke intelligence
A threat intelligence feed surfaces a new attack pattern observed in the wild. If it is novel and reproducible, it enters the library as a fixture. Mitsuke runs inside the Kumite and handles the ingestion pipeline for external feeds.
All four pathways end with the same ingestion step: tag, assign a severity, link to a scanner pattern, run the regression suite against the configured fleet, review the score delta, merge. There is no shortcut. The shortcut is how libraries rot.
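A sketch of that single gate, reusing the illustrative pieces above. `review` stands in for the human step, and none of these names are the platform's real API.

```python
def ingest(candidate, scan, patterns, fixtures, review):
    """The single ingestion gate every pathway converges on (sketch).

    Mirrors the steps in the post: tag, severity, pattern link, regression
    run, human review, merge. `scan` and `review` are assumed callables.
    """
    # Gate 1: no metadata means no fixture.
    if not candidate.subtype_tags or candidate.severity is None:
        raise ValueError(f"{candidate.fixture_id}: missing metadata")

    # Gate 2: must link to a live scanner pattern.
    if candidate.pattern_id not in {p.pattern_id for p in patterns}:
        raise ValueError(f"{candidate.fixture_id}: unknown pattern id")

    # Gate 3: must reproduce against the scanner today.
    check_fixture_regression(candidate, scan)

    # Gate 4: a human reviews the candidate before it lands.
    if not review(candidate):
        raise ValueError(f"{candidate.fixture_id}: rejected at review")

    fixtures.append(candidate)  # merge
```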
Why the Library Matters for Anyone Shipping an LLM
If you ship an LLM product, you are already running fixtures whether you call them that or not. Your QA tests include some adversarial prompts. Your runbook mentions a handful of known bad inputs. Your security review turned up a list of concerns. You have fixtures. You just do not have a library.
The problem is that those fixtures are ad hoc. They are not tagged. They are not versioned. They cannot be regression-tested. They grow or shrink with whoever is on call. And they give you no basis for claiming "this attack class is covered" when an auditor asks.
A structured fixture library turns "we think we are safe" into "the top-scored model is at 90 on the current fixture set, here are the controls that are holding the score back, and here are the pattern ids that need work." That is the language leadership and auditors actually understand. It is also the language you can defend a year from now, when the person who built the original test suite has moved on.
The library is not the interesting part of DojoLM. The scanner is. The Kumite is. The Guard is. But the library is the part without which none of the others work. Every interesting number on the scoreboard ultimately comes from a fixture in the Armory being run against a model under test. If the library is weak, the scoreboard is a guess. If the library is strong, the scoreboard is an instrument.
The Severity Scale, In Practice
The five-level severity scale is calibrated, not estimated. Here is how the calibration works.
- Critical. Exploitable in a single turn, against an unauthenticated user, with direct impact on user data or system integrity. Example: a prompt injection that leaks the full system prompt to any user who sends the right payload.
- High. Exploitable in a single turn, or requires minimal chaining, with indirect impact or impact on authenticated users. Example: a role-swap jailbreak that gets the model to act as an admin character.
- Medium. Requires multi-turn setup, specific context, or specific user privileges. Example: a context-reconstruction attack that rebuilds a prior conversation over 5 turns.
- Low. Exploitable only under narrow conditions, or with low impact. Example: a DoS pattern that wastes tokens without crashing the model.
- Info. Not an attack per se, but a signal worth recording. Example: an overreliance pattern where the model expresses high confidence in a factually wrong statement.
The scale is calibrated against real incident data from Ronin Hub disclosures and from internal red-team work. When a fixture's severity drifts from where real incidents land, the scale gets recalibrated.
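The written rules above are close to a small decision tree. Here is a sketch that encodes them, reusing the illustrative `Severity` enum from earlier; the argument names are assumed proxies, and the real calibration is against incident data rather than a function this small.

```python
def classify_severity(turns_to_exploit, needs_auth, direct_impact,
                      low_impact=False, is_attack=True):
    """Encode the calibration rules from the scale above (sketch).

    Arguments are illustrative proxies for the written rules: turns needed,
    whether the victim must be authenticated or privileged, whether impact
    on user data or system integrity is direct, and whether impact stays
    low even when the exploit works.
    """
    if not is_attack:
        return Severity.INFO      # a signal worth recording, not an attack
    if low_impact:
        return Severity.LOW       # exploitable only narrowly, or low impact
    if turns_to_exploit == 1 and not needs_auth and direct_impact:
        return Severity.CRITICAL  # one turn, unauthenticated, direct impact
    if turns_to_exploit <= 2:
        return Severity.HIGH      # single turn or minimal chaining
    return Severity.MEDIUM        # multi-turn setup, context, or privileges
```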
The Roadmap
The Armory is the oldest module on the platform and the one that gets the most contributions week to week. Ongoing work includes:
- Expanding multilingual coverage, particularly in Arabic and CJK script families
- Backfilling fixtures for the newest Tool-use Injection patterns that landed this month
- Improving the Detail view to show related DNA nodes inline, so the lineage graph sits next to the fixture metadata
- A contribution API so external researchers can propose fixtures without a full platform integration
None of this is shipped yet. When it is, the scoreboard will reflect it.
What Is Next
Tomorrow, Day 4, Atemi Lab opens up. That is where the Armory stops testing single-turn prompts and starts testing agentic loops, tool calls, and MCP servers.
The deep dive will walk through all seven Atemi Lab workspaces, the four attack modes (Passive, Basic, Advanced, Aggressive), the playbook system, and the MCP fuzzer that has been finding novel attacks every week since it shipped.
See you there.