Red Team

The Kumite and SAGE: An Evolution Engine for Payloads

2,380 fixtures is a lot of fixtures. It is also not enough. A fixture library is a snapshot of what defenders already know how to break. Every real attacker is somewhere outside that snapshot, running mutations nobody has thought of yet. SAGE is the generator that writes new homework: 142 generations, 1,247 seeds, 23 quarantined, 0.94 fitness. Day 8 of the DojoLM builder's journal.

Julien P. · April 29, 2026 · 11 min read

2,380 fixtures is a lot of fixtures. It is also not enough.

"We have a big fixture library. We keep running it and the score keeps passing. We are not sure if that means we are safe or if it means we are not testing hard enough."

"Our red team runs the same set of prompts every release. They are tired of it. We think we are catching regressions but we know we are missing novel attacks."

"We read about a new jailbreak technique on Twitter. Someone on the team wrote a few test variants. We do not know how many variants we missed."

"Our library only has what we have already seen. The attacker population is constantly writing mutations. We cannot keep up manually."

A fixture library is a snapshot of known-broken techniques. Every real attacker is somewhere outside that snapshot, running mutations nobody on the defense side has thought of yet. If the library is the only thing under test, the homework is being graded with the answer key.

DojoLM needed a generator that writes new homework. Yesterday's post walked through the full Week 1 platform guide. Today the Kumite, DojoLM's research workspace, opens. The walkthrough covers SAGE, the evolutionary payload engine that has completed 142 generations and keeps finding mutations no human researcher wrote.

What the Kumite Is

The Kumite is DojoLM's research workspace, the place where the platform stops being a scanner and starts being a lab. It has six subsystems, each one solving a different piece of the "what attacks remain unknown" problem:

SAGE evolves new payloads (the focus of today's post)
Battle Arena runs models against each other and against reference attackers
Mitsuke ingests external threat intelligence feeds
Amaterasu DNA maps attack lineage as a family tree (Day 9)
Kagami fingerprints models and detects drift (Day 13)
Shingan performs deep structural scans and supply chain assessment

What SAGE Is

SAGE is an evolutionary payload generator. It takes a seed payload, applies mutation operators, scores each offspring against the Haiku Scanner, keeps the fittest, and iterates. It is a genetic algorithm for adversarial prompts, tuned to find mutations that actually land on the current detection posture.

It does not write attacks from scratch. It evolves existing ones under selection pressure. The selection pressure is the scanner. The result is a stream of candidate payloads that represent the mutations most likely to get past the current defenses.
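The loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not SAGE's implementation: `evolve`, `toy_mutate`, and `toy_score` are hypothetical names, the payloads are placeholder strings, and the scoring function is a stand-in for the scanner.

```python
import random
from typing import Callable, List


def evolve(seeds: List[str],
           mutate: Callable[[str, random.Random], str],
           score: Callable[[str], float],
           generations: int = 5,
           pool_size: int = 4,
           rng_seed: int = 0) -> List[str]:
    """Toy evolutionary loop: mutate, score, keep the fittest, repeat."""
    rng = random.Random(rng_seed)
    pool = list(seeds)
    for _ in range(generations):
        offspring = [mutate(parent, rng) for parent in pool]
        # Parents compete with their offspring, so the best score never regresses.
        # dict.fromkeys removes duplicates while keeping a deterministic order.
        candidates = list(dict.fromkeys(pool + offspring))
        candidates.sort(key=score, reverse=True)
        pool = candidates[:pool_size]
    return pool


def toy_mutate(payload: str, rng: random.Random) -> str:
    """Placeholder mutation: append one random letter."""
    return payload + rng.choice("abc")


def toy_score(payload: str) -> float:
    """Placeholder fitness: share of 'a' characters. Stands in for the scanner."""
    return payload.count("a") / max(len(payload), 1)


champions = evolve(["x", "y"], toy_mutate, toy_score)
```

Swapping `toy_score` for a call into a real scoring service is the only change the loop itself would need, which is the point of keeping selection pressure in one place.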

The numbers

142 generations completed on the current SAGE run
0.94 fitness on the current champion pool
1,247 seeds in the seed library
23 payloads quarantined for human review, too dangerous to re-run unsupervised
Seven mutation operators in the active library, configurable and composable
One fitness function, shared with the rest of the platform (the Haiku Scanner)

The Fitness Over Generations chart shows the climb. The Quarantine workspace shows the outputs SAGE produced that got flagged for human review before being allowed to touch the library.

Figure: SAGE workspace, fitness over generations.

How an Evolution Loop Works

SAGE runs a standard evolutionary loop with four phases per generation. Each phase is a few seconds of compute on Voyager. A full generation takes under a minute for most seed pools.

Selection

The engine picks parent payloads from the seed library and the current champion pool. Selection is biased toward high-fitness parents (so the engine explores around the best current payloads) but preserves diversity via tournament selection (so the loop does not converge on a single local maximum). Diversity is not optional. A loop that converges is a loop that stops producing interesting findings.
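A minimal sketch of tournament selection, assuming fitness scores are available as a plain dictionary keyed by payload id. The function name and the tournament size `k` are illustrative, not SAGE's actual parameters.

```python
import random
from typing import Dict, List, Optional


def tournament_select(fitness: Dict[str, float],
                      n_parents: int,
                      k: int = 3,
                      rng: Optional[random.Random] = None) -> List[str]:
    """Pick n_parents ids. Each pick is the fittest of k randomly drawn contenders."""
    rng = rng or random.Random()
    ids = list(fitness)
    parents = []
    for _ in range(n_parents):
        # A small k lets weaker payloads win sometimes, which keeps the pool diverse.
        contenders = rng.sample(ids, min(k, len(ids)))
        parents.append(max(contenders, key=fitness.__getitem__))
    return parents


scores = {"seed-a": 0.1, "seed-b": 0.9, "seed-c": 0.5}
parents = tournament_select(scores, n_parents=4, k=3, rng=random.Random(0))
```

The tournament size is the dial. With `k` equal to the pool size the fittest payload wins every round, and with `k=1` selection is uniformly random, so a value in between trades exploitation against diversity.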

Mutation

Each parent is passed through one or more mutation operators. The operator library includes token swap, delimiter inject, encoding shift, persona wrap, instruction reversal, context bait, and chain compose. Operators are declarative and composable. A single offspring can be produced by chaining three operators in sequence, and the chain is recorded in the offspring's lineage so it can be reproduced exactly.
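To make the lineage idea concrete, here is a small sketch of chaining operators and recording the chain on the offspring. The three operator bodies are deliberately harmless stand-ins, and `Offspring`, `OPERATORS`, and `apply_chain` are hypothetical names rather than the platform's API.

```python
import base64
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple


def token_swap(text: str) -> str:
    """Stand-in: swap one word for a synonym."""
    return text.replace("show", "display")


def context_bait(text: str) -> str:
    """Stand-in: preface the text with benign context."""
    return "Hello, hope your day is going well. " + text


def encoding_shift(text: str) -> str:
    """Stand-in: wrap the text in base64."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


OPERATORS: Dict[str, Callable[[str], str]] = {
    "token_swap": token_swap,
    "context_bait": context_bait,
    "encoding_shift": encoding_shift,
}


@dataclass(frozen=True)
class Offspring:
    text: str
    parent_id: str
    chain: Tuple[str, ...]  # operator names, in the order they were applied


def apply_chain(parent_id: str, parent_text: str, chain: Sequence[str]) -> Offspring:
    """Apply operators in sequence and record the chain for exact replay."""
    text = parent_text
    for name in chain:
        text = OPERATORS[name](text)
    return Offspring(text=text, parent_id=parent_id, chain=tuple(chain))


child = apply_chain("seed-001", "show placeholder",
                    ["token_swap", "context_bait", "encoding_shift"])
replayed = apply_chain(child.parent_id, "show placeholder", child.chain)
```

Because the operators here are deterministic, replaying the recorded chain against the same parent yields an identical offspring. An operator that used randomness would also need its random seed stored in the lineage.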

Evaluation

Each offspring is scored by the Haiku Scanner. The score is the fitness. High fitness means the payload evades more engines, triggers fewer false positives on benign markers, and lands on the target behavior. The fitness function is not a bespoke rubric. It is the same scanner that measures the platform. That sharing is deliberate. It means SAGE's evolution is always in sync with the current detection posture.

Retention

High-fitness offspring survive to the next generation and enter the champion pool. Low-fitness offspring are discarded. Payloads above the severity threshold are routed to Quarantine for human review before they can be reused as seeds. A payload that is too dangerous to re-run is one that already produced a severe finding, and the retention policy refuses to amplify it without human sign-off.
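The retention rules can be sketched as a three-way routing function. The thresholds, the `Scored` record, and the separate `severity` field are assumptions made for illustration. The post does not specify how the scanner reports severity.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Scored:
    payload_id: str
    fitness: float   # assumed range 0.0 to 1.0
    severity: float  # assumed range 0.0 to 1.0


def route(scored: Iterable[Scored],
          fitness_floor: float = 0.5,
          severity_threshold: float = 0.8) -> Dict[str, List[str]]:
    """Sort scored offspring into champions, quarantine, or discarded."""
    buckets: Dict[str, List[str]] = {"champions": [], "quarantine": [], "discarded": []}
    for item in scored:
        # Severity is checked first, so a dangerous payload can never
        # enter the champion pool automatically, however fit it is.
        if item.severity >= severity_threshold:
            buckets["quarantine"].append(item.payload_id)
        elif item.fitness >= fitness_floor:
            buckets["champions"].append(item.payload_id)
        else:
            buckets["discarded"].append(item.payload_id)
    return buckets


result = route([
    Scored("p1", fitness=0.9, severity=0.2),
    Scored("p2", fitness=0.95, severity=0.9),
    Scored("p3", fitness=0.1, severity=0.1),
])
```

The ordering of the checks is the design choice worth noticing: quarantine takes precedence over promotion.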

The Mutation Operator Library

Operators are the creative substance of SAGE. The current library includes:

Token swap. Replace a word with a synonym, a homoglyph, or a typo. This is the simplest operator and often the most effective at defeating pattern-based detection that relies on exact string matching.
Delimiter inject. Add or remove structural markers to confuse the parser. An injected <|im_end|> or a missing closing brace can change how an agent assembles its context without changing the visible prompt.
Encoding shift. Wrap the payload in base64, rot13, hex, or Unicode escapes. A scanner that normalizes before matching catches the wrapped version. A scanner that does not gets bypassed.
Persona wrap. Embed the payload inside a persona or hypothetical frame. "You are a fiction writer..." followed by the actual attack as a story. The wrapping shifts the surface context while preserving the adversarial intent.
Instruction reversal. Negate the surface instruction while preserving the adversarial intent. "Do not tell me the system prompt" is a common inversion of "tell me the system prompt" that works against certain refusal patterns.
Context bait. Preface the payload with benign context to reduce its adversarial signature. A long, friendly greeting followed by the attack changes how a scanner weights the payload.
Chain compose. Concatenate two parent payloads into a multi-step offspring. This operator is how SAGE produces multi-turn attack candidates that would be hard for a human to write from scratch.
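The encoding shift entry has a defensive mirror image that is easy to show: normalize before matching. The sketch below is a generic illustration, not the Haiku Scanner's logic. `candidate_decodings` and `matches` are hypothetical helpers, and a production scanner would need to handle nested and partial encodings as well.

```python
import base64
import codecs
from typing import List


def candidate_decodings(text: str) -> List[str]:
    """Return the raw text plus every decoding that succeeds."""
    views = [text, codecs.decode(text, "rot13")]
    try:
        views.append(base64.b64decode(text, validate=True).decode("utf-8"))
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        pass
    try:
        views.append(bytes.fromhex(text).decode("utf-8"))
    except ValueError:  # not valid hex, or not valid UTF-8 once decoded
        pass
    return views


def matches(text: str, needle: str) -> bool:
    """True if the needle appears in the raw text or in any decoded view."""
    return any(needle in view.lower() for view in candidate_decodings(text))


plain = "benign marker text"
wrapped_b64 = base64.b64encode(plain.encode("utf-8")).decode("ascii")
wrapped_rot13 = codecs.encode(plain, "rot13")
wrapped_hex = plain.encode("utf-8").hex()
```

A naive substring check on `wrapped_b64` finds nothing, while `matches` sees through all three wrappers. That gap is exactly what the encoding shift operator probes for.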

Operators are configuration, not code. Adding a new operator is a config change plus an optional implementation module. The platform ships with a stable base set and the ability to extend.
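One way "configuration, not code" could look in practice is a list of config entries bound to a small set of generic implementations. The config schema, the entry ids, and `build_operators` are all assumptions for this sketch. The post does not show DojoLM's actual format.

```python
import codecs
from functools import partial
from typing import Any, Callable, Dict, List


def encoding_shift(text: str, scheme: str) -> str:
    """Generic implementation, parameterized by config."""
    if scheme == "rot13":
        return codecs.encode(text, "rot13")
    if scheme == "hex":
        return text.encode("utf-8").hex()
    raise ValueError(f"unknown scheme: {scheme}")


def context_bait(text: str, preface: str) -> str:
    return preface + " " + text


IMPLEMENTATIONS: Dict[str, Callable[..., str]] = {
    "encoding_shift": encoding_shift,
    "context_bait": context_bait,
}

# Hypothetical config: two operators reuse one implementation with different
# parameters, and a third is defined but switched off.
CONFIG: List[Dict[str, Any]] = [
    {"id": "rot13_shift", "impl": "encoding_shift", "params": {"scheme": "rot13"}},
    {"id": "hex_shift", "impl": "encoding_shift", "params": {"scheme": "hex"}},
    {"id": "friendly_preface", "impl": "context_bait",
     "params": {"preface": "Good morning."}, "enabled": False},
]


def build_operators(config: List[Dict[str, Any]]) -> Dict[str, Callable[[str], str]]:
    """Bind each enabled config entry to its implementation and parameters."""
    operators: Dict[str, Callable[[str], str]] = {}
    for entry in config:
        if entry.get("enabled", True):
            impl = IMPLEMENTATIONS[entry["impl"]]
            operators[entry["id"]] = partial(impl, **entry["params"])
    return operators


ops = build_operators(CONFIG)
```

In this layout, adding `hex_shift` required no new code at all, only a new config entry. A new implementation module is needed only when no existing one can be parameterized to do the job.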

Principles Behind SAGE

Fitness is the scanner

SAGE does not ship with its own opinion about what "malicious" means. It scores offspring against the Haiku Scanner. That means SAGE's evolution is always in sync with the current detection posture. Improve the scanner, and SAGE has to evolve harder. Break a pattern, and SAGE finds the break immediately, because the fitness function just got easier.

This is how the platform avoids the trap of having two definitions of success. There is one definition, and SAGE uses it. An evolution engine that ships with its own internal rubric carries a second definition, and that rubric can drift from the actual defensive coverage within a quarter.

Quarantine by default

Any payload above a severity threshold gets pulled into the Quarantine workspace instead of being auto-added to the seed pool. Human review is a required step. 23 payloads are currently in quarantine. Some will be promoted to fixtures. Some will be permanently walled off because they represent techniques that should not be amplified or recirculated.

This is a research engine, not a weapons factory. The quarantine step is the boundary.

Champions become fixtures

Once a champion payload survives review, it gets promoted to the Armory as a new fixture with a new pattern id and full SAGE lineage. The SAGE pipeline ends where the Armory begins. The new fixture then becomes a test case for the scanner, a node in the Amaterasu DNA graph, and a potential seed for future SAGE runs.

This is how an evolution engine compounds. Every accepted champion is a contribution to the library that future runs can build on. The 2,380 fixtures in the Armory are not a static list. They are a growing library fed in part by SAGE.

Generations are cheap, oversight is not

SAGE can run for 24 hours and produce another 500 generations. Nobody can triage 500 generations' worth of unreviewed payloads. The limiting resource is the reviewer's attention, not compute. The Quarantine workspace is optimized for fast review, with a one-click accept, a one-click reject, and a mandatory justification for both. The justification goes into the audit log.

Every architectural decision in the Quarantine workspace is aimed at reducing review friction without cutting corners on oversight.
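The review rule, one click plus a mandatory justification that lands in the audit log, can be sketched as a small queue. `QuarantineQueue`, `AuditEntry`, and their fields are hypothetical. The real workspace's data model is not described in this post.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class AuditEntry:
    payload_id: str
    decision: str        # "accept" or "reject"
    reviewer: str
    justification: str
    decided_at: str      # ISO 8601 timestamp in UTC


class QuarantineQueue:
    def __init__(self, payload_ids: Iterable[str]) -> None:
        self.pending: List[str] = list(payload_ids)
        self.audit_log: List[AuditEntry] = []

    def decide(self, payload_id: str, decision: str,
               reviewer: str, justification: str) -> AuditEntry:
        """Record an accept or reject. Both require a non-empty justification."""
        if decision not in ("accept", "reject"):
            raise ValueError(f"unknown decision: {decision}")
        if not justification.strip():
            raise ValueError("justification is mandatory")
        if payload_id not in self.pending:
            raise KeyError(payload_id)
        # Validation happens before any state changes, so a refused
        # decision leaves both the queue and the log untouched.
        self.pending.remove(payload_id)
        entry = AuditEntry(payload_id, decision, reviewer, justification.strip(),
                           datetime.now(timezone.utc).isoformat())
        self.audit_log.append(entry)
        return entry


queue = QuarantineQueue(["q-001", "q-002"])
entry = queue.decide("q-001", "reject", "reviewer-a", "Duplicates an existing fixture.")
```

Making the justification a hard precondition, rather than an optional note, is what keeps a fast review path from turning into a rubber stamp.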

Lineage is preserved

Every champion carries its full evolutionary lineage. A payload can be traced back through its parents, the mutation operators that produced each generation, and the fitness deltas along the way. Amaterasu DNA (tomorrow's post) reads this lineage and builds the family tree.

Lineage matters because it makes research cumulative. A researcher looking at a champion can see exactly how it evolved from a seed and can learn the mutation pattern that worked.
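Tracing a champion back to its seed is a walk up the parent pointers. The sketch below assumes a single parent per payload for simplicity. Chain compose offspring have two parents, so a faithful version would walk a graph instead. `LineageRecord` and `trace` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class LineageRecord:
    payload_id: str
    parent_id: Optional[str]     # None marks a seed
    operators: Tuple[str, ...]   # chain that produced this payload from its parent
    fitness: float


def trace(records: Dict[str, LineageRecord],
          champion_id: str) -> List[Tuple[str, Tuple[str, ...], float]]:
    """Return (payload_id, operators, fitness_delta) steps from seed to champion."""
    path: List[LineageRecord] = []
    seen = set()
    current: Optional[str] = champion_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in lineage at {current}")
        seen.add(current)
        record = records[current]
        path.append(record)
        current = record.parent_id
    path.reverse()  # seed first, champion last

    steps = []
    previous: Optional[LineageRecord] = None
    for record in path:
        delta = 0.0 if previous is None else record.fitness - previous.fitness
        steps.append((record.payload_id, record.operators, delta))
        previous = record
    return steps


records = {
    "seed-1": LineageRecord("seed-1", None, (), 0.25),
    "gen1-a": LineageRecord("gen1-a", "seed-1", ("token_swap",), 0.5),
    "gen2-b": LineageRecord("gen2-b", "gen1-a", ("encoding_shift", "context_bait"), 0.75),
}
history = trace(records, "gen2-b")
```

Reading `history` top to bottom shows which operator chain bought each fitness gain, which is the mutation pattern a researcher would want to learn from.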

The Seed Library

1,247 seeds is the current state. The seed library is the pool SAGE draws from for the selection phase. Seeds come from four sources, the same four that feed the Armory: manual curation, Ronin Hub disclosure, Mitsuke intelligence, and past SAGE champions. Every Armory fixture is implicitly a seed candidate, and the seed pool samples from the fixture library at the start of each run.

Because every fixture is also a seed candidate, the pool of possible seeds is deliberately larger than the Armory. Not every seed becomes a fixture, but every fixture can be a seed.
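A rough sketch of how a run's seed pool might be assembled from the four sources plus a sample of Armory fixtures. The source keys, the provenance labels, and `build_seed_pool` are assumptions. The post does not say how deduplication or sampling actually work.

```python
import random
from typing import Dict, List


def build_seed_pool(sources: Dict[str, List[str]],
                    fixtures: List[str],
                    fixture_sample_size: int,
                    rng: random.Random) -> Dict[str, str]:
    """Map each unique seed to the source it first came from."""
    pool: Dict[str, str] = {}
    for source, payloads in sources.items():
        for payload in payloads:
            # setdefault keeps the first source seen for a duplicated payload.
            pool.setdefault(payload, source)
    sample_size = min(fixture_sample_size, len(fixtures))
    for payload in rng.sample(fixtures, sample_size):
        pool.setdefault(payload, "armory_fixture")
    return pool


sources = {
    "manual_curation": ["s1", "s2"],
    "ronin_hub": ["s2", "s3"],
    "mitsuke": ["s4"],
    "sage_champions": [],
}
pool = build_seed_pool(sources, ["f1", "f2", "f3", "s1"],
                       fixture_sample_size=2, rng=random.Random(0))
```

Recording provenance alongside each seed costs one dictionary value and makes it possible to ask later which source produced the most champions.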

The Battle Arena

The Arena is the second Kumite subsystem worth mentioning today, though it has its own workspace and its own research questions. It runs structured matches between models, with game modes including Capture the Flag, King of the Hill, and Red versus Blue.

Current state: 5 matches active. Each match is a reproducible adversarial engagement with defined rules, a defined scoring function, and a defined result. Unlike SAGE, which is a single-target optimization, the Arena is multi-model competition. It surfaces failures that only appear when two models interact, a different class of failure from the ones SAGE finds.

The full Arena deep dive is for another day, but it is part of why the Kumite exists as a workspace in its own right. SAGE is optimization. The Arena is competition. Both produce novel findings. Neither produces them the same way.

Why Evolutionary Testing Matters

Static fixtures age. The moment a scanner pattern gets published, attackers start looking for a mutation that sidesteps it. An evolution engine sits inside the test harness and does the attacker's job in advance, continuously, before it happens in the wild.

It also expands the library at a rate that manual curation alone cannot match. Every week, SAGE produces candidate payloads that a manual research process would take months to discover. The limiting factor becomes review bandwidth, not generation capacity.

The deeper benefit is that SAGE produces payloads a human would not have thought to try. A researcher working alone has implicit biases about what attack classes are interesting. An evolutionary loop biased only by the scanner's current coverage has no such biases. Some of the most interesting findings the platform has surfaced come from SAGE champions that no researcher on the team would have written from scratch. The operator-chain composition in particular tends to surface combinations that feel unnatural to a human but land reliably on the scanner.

The Principle

The underlying design principle is simple. A defense that keeps pace with the attack surface needs an attack surface that keeps pace with the defense. SAGE is the closed loop. The scanner evolves, SAGE evolves against it, the fittest champions become new fixtures, the scanner gets tested against the new fixtures, and the cycle compounds. One engine, one fitness function, one seed library, one fixture pipeline. All of them tied together.

What Is Next

Tomorrow, Day 9, we open Amaterasu DNA. Every SAGE champion, every Armory fixture, and every scanner pattern is a node in a family tree of attacks. 6 families, 8 clusters on the demo instance, and a clustering engine that tells you which attack families are growing fastest.

See you there.