Multilingual Detection: 14 Languages, 6 Scripts, One Pipeline
Most open-source prompt injection scanners we evaluated last year had one thing in common: they were trained and tested almost exclusively in English. A scanner that is 95% accurate on English traffic and 20% accurate on Japanese traffic is not "mostly working." It is blind in the dimensions most of the world uses. Day 11 of the DojoLM builder's journal.

Most open-source prompt injection scanners evaluated last year had one thing in common. They were trained and tested almost exclusively in English.
"The scanner is 95% accurate in English. There are no numbers for other languages. The vendor did not publish any."
"The product deploys globally. The scanner only catches English attacks. The team knows this. There is no plan."
"A translation API runs before the scanner. It translates the payload to English and then runs the scanner. Nobody has ever tested whether that actually catches non-English attacks."
"A homoglyph attack landed last quarter. Cyrillic characters pretending to be Latin. The scanner missed it entirely."
"A customer reported an attack in Japanese. There was no way to triage it, no way to add it to the library, no way to confirm it was novel."
Their pattern libraries were English. Their test fixtures were English. Their refusal matchers were English. An attacker switching languages, switching scripts, or mixing encodings slipped right through, and the scanner confidently reported high accuracy against the only language it had ever been tested on.
This is not a theoretical problem. It is a production reality for anyone shipping an LLM product to a global audience. A scanner that is 95% accurate on English traffic and 20% accurate on Japanese traffic is not "mostly working." It is blind in the dimensions that most of the world uses.
What Multilingual Detection Looks Like in DojoLM
Multilingual detection is not a separate engine in the Haiku Scanner. It is a cross-cutting concern that lives inside every one of the 13 engines. Every pattern carries language metadata. Every fixture in the Armory is tagged with its source language. The encoding engine handles homoglyph attacks, Unicode normalization, zero-width joiner smuggling, and obfuscation wrappers (base64, rot13, and hex) as a pre-pass that runs before the 13 detection engines see the payload.
Current target coverage is 14 languages across 6 scripts.
The numbers
- 14 target languages: English, Spanish, Portuguese, French, German, Italian, Russian, Arabic, Chinese Simplified, Chinese Traditional, Japanese, Korean, Hindi, Thai
- 6 scripts: Latin, Cyrillic, Arabic, CJK, Devanagari, Thai
- Per-pattern language declaration: every scanner pattern lists which languages it targets
- Per-fixture language tagging: every Armory fixture declares its source language
- Encoding engine pre-pass: handles normalization, decoding, and script-mixing detection before the 13 engines fire
- Per-language maintainer ownership: every language has a named human responsible for its coverage
- Language-by-engine regression matrix: scored on every scanner release
Language is not a 14th engine. It is a dimension across every engine.
The 14 Languages
The current target language list prioritizes the languages with the largest population of LLM users and the largest production exposure.
High-traffic languages. English, Spanish, Portuguese, French, German, Italian, Russian, Arabic, Chinese Simplified, Chinese Traditional, Japanese, Korean, Hindi, Thai.
"Target coverage" does not mean every pattern is fully translated into every language. It means every pattern has a declared target language list and a regression test for each target language the pattern applies to. Coverage is measured per pattern, per language, per engine, and reported in the Dashboard as a heatmap.
Most gaps in the heatmap are in the smaller languages, and most recent pattern work has focused on closing those gaps. A red cell in the multilingual heatmap is a specific missing regression, not a vague "multilingual weakness."
The 6 Scripts
Scripts matter because the attack surface is different for each.
- Latin covers English, Spanish, Portuguese, French, German, Italian, and several other European languages. Homoglyph attacks are most common here, because Latin shares visually similar characters with Cyrillic and Greek, and attackers can substitute them freely.
- Cyrillic covers Russian, Bulgarian, Serbian, Ukrainian, and several others. The main attack surface is the inverse of Latin: Cyrillic characters pretending to be Latin in a Latin-script context.
- Arabic covers Arabic and, with overlap, Persian and Urdu. The main attack surface is right-to-left rendering manipulation, bidirectional override characters, and ligature-based obfuscation.
- CJK covers Chinese (both variants), Japanese (with hiragana and katakana), and Korean. The main attack surface is script mixing inside tokens, homoglyph attacks across the three CJK sub-scripts, and encoding-specific quirks in how the model tokenizes multi-byte characters.
- Devanagari covers Hindi and related Indic languages. The main attack surface is Unicode normalization edge cases and the interaction between consonant clusters and adversarial character sequences.
- Thai covers Thai and, partially, other Southeast Asian scripts. The main attack surface is the absence of word boundaries in native text, which creates tokenization ambiguity that attackers can exploit.
Each script has its own encoding engine configuration. A single "Unicode-aware scanner" that does not understand script-specific behavior misses the interesting attacks in each script.
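As one concrete instance of script-specific behavior, the bidirectional override surface mentioned for Arabic can be sketched in a few lines. This is a minimal illustration, not the Haiku Scanner's implementation: the function name `find_bidi_controls` is hypothetical, and the character set is limited to the Unicode embedding, override, and isolate controls. The left-to-right and right-to-left marks (U+200E, U+200F) are left out on the assumption that they appear often in legitimate right-to-left text.

```python
import unicodedata

# Unicode bidirectional formatting controls: embeddings, overrides, isolates.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}

def find_bidi_controls(text: str) -> list:
    """Return (index, Unicode name) for each bidi control character in text."""
    return [
        (i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
        for i, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]

# Hypothetical usage: a right-to-left override hidden inside a short string.
hits = find_bidi_controls("abc\u202edef")
```

A finding like this is only a signal. Whether it counts as an attack would depend on the surrounding script and on the per-script configuration described above.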
Principles Behind the Design
Language is a regression dimension
A pattern that catches a prompt injection in English and misses it in Japanese is not "partially working." It is a regression. The test matrix is engines by languages, not just engines. Every scanner release runs the full matrix, and any language-specific failure shows up as a red cell in the regression report.
This is a different mental model than most scanners ship with. Most treat language as an afterthought. The Haiku Scanner treats it as a first-class dimension of the test matrix, because the alternative is a scanner that "passes" in English and fails silently in every other language.
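The engines-by-languages matrix can be sketched as a simple aggregation over regression results. The tuple shape, the function names, and the engine and language labels below are assumptions made for illustration. The only idea taken from the text is that a single language-specific failure turns the whole cell red.

```python
def build_matrix(results):
    """Aggregate (engine, language, passed) tuples into one cell per pair.

    A cell is True only if every regression test for that pair passed.
    """
    cells = {}
    for engine, language, passed in results:
        key = (engine, language)
        cells[key] = cells.get(key, True) and passed
    return cells

def red_cells(cells):
    """Return the (engine, language) pairs with at least one failure."""
    return sorted(key for key, ok in cells.items() if not ok)

# Hypothetical results from one scanner release.
results = [
    ("prompt_injection", "en", True),
    ("prompt_injection", "ja", True),
    ("prompt_injection", "ja", False),  # one Japanese regression fails
    ("encoding", "ru", True),
]
failing = red_cells(build_matrix(results))
```

Under this model an English pass never offsets a Japanese failure, because the two land in different cells.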
Translation is not detection
The easiest "multilingual" strategy is to run the payload through a translation API and scan the English version. This is what most scanners do. It fails in three ways.
First, idiomatic attacks do not translate. An attack that exploits a specific English phrase loses its teeth when translated to Japanese, and an attack that exploits a specific Japanese phrase loses its teeth just the same when translated to English. The scanner has to be able to detect the attack in its source language.
Second, encoded payloads do not translate. A payload wrapped in base64 or hex is opaque to a translation API. It has to be decoded before it can be scanned, and decoding depends on recognizing the encoding in the source language context. A translation API that sees base64 will typically treat it as noise and either pass it through untouched or strip it. Either way, the decoded attack never reaches the scanner.
Third, refusal phrasings are language-specific. A scanner that only knows English refusal templates cannot distinguish a successful attack from a failed attack in another language, because it cannot tell when the model has refused. "I cannot help with that" has a hundred different phrasings across the 14 languages, and each one is specific to that language's conventions.
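A per-language refusal matcher might look roughly like the sketch below. The phrase lists are a handful of illustrative examples, not the scanner's templates, and the names `REFUSAL_PHRASES` and `looks_like_refusal` are hypothetical. The one design point it tries to show is that a language with no templates should return "unknown" rather than "no refusal".

```python
from typing import Optional

# Illustrative refusal phrasings only; a real library would need far more
# per language, reviewed by that language's maintainer.
REFUSAL_PHRASES = {
    "en": ["i cannot help with that", "i can't help with that"],
    "es": ["no puedo ayudar con eso"],
    "fr": ["je ne peux pas vous aider"],
    "de": ["dabei kann ich nicht helfen"],
    "ja": ["お手伝いできません"],
}

def looks_like_refusal(response: str, language: str) -> Optional[bool]:
    """Return True/False if templates exist for the language, else None."""
    phrases = REFUSAL_PHRASES.get(language)
    if phrases is None:
        return None  # no templates: the verdict is unknown, not "no refusal"
    text = response.casefold()
    return any(phrase in text for phrase in phrases)
```

Returning `None` for an uncovered language keeps the gap visible instead of silently counting every response in that language as a successful attack.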
Translation is not detection. The scanner has to understand the source language.
Homoglyphs are an encoding problem, not a detection problem
An "a" that is actually a Cyrillic "а" is not a new attack class. It is the same attack through a different transport layer. The encoding engine normalizes before detection fires. Normalization is a pre-pass that collapses visually identical characters into a canonical form so the pattern matchers can work against the normalized stream.
This is the right place to handle it. Making every detection engine aware of homoglyphs would be a nightmare of duplicated logic. Handling it once in the encoding engine keeps the detection engines simple and keeps the homoglyph logic in one place where a human can audit it.
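The collapse step can be sketched as a lookup table applied once, before any pattern matcher runs. The table below is a tiny illustrative subset, and the names `HOMOGLYPHS` and `collapse_homoglyphs` are hypothetical. It also assumes a Latin-script context, so it collapses toward Latin. Applied blindly to legitimate Russian text it would corrupt it, so a real table would presumably choose the direction from the dominant script of the surrounding text.

```python
# Illustrative subset of a confusables table: lookalike -> Latin canonical form.
HOMOGLYPHS = {
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
    "\u0445": "x",  # Cyrillic small ha
    "\u03bf": "o",  # Greek small omicron
}
_TABLE = str.maketrans(HOMOGLYPHS)

def collapse_homoglyphs(text: str):
    """Return (collapsed_text, changes), where each change is
    (index, original_char, replacement) so the step stays auditable."""
    changes = [
        (i, ch, HOMOGLYPHS[ch]) for i, ch in enumerate(text) if ch in HOMOGLYPHS
    ]
    return text.translate(_TABLE), changes

# Hypothetical usage: "password" spelled with a Cyrillic "а" in position 1.
collapsed, changes = collapse_homoglyphs("p\u0430ssword")
```

Returning the list of substitutions alongside the collapsed text is what lets a reviewer see which characters were rewritten.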
Script mixing is a signal, not a failure
A payload that mixes scripts in the middle of a single token is almost never legitimate. "scri" in Latin followed by "пт" in Cyrillic forming "script" is a smoking gun. Legitimate multilingual text switches scripts at word boundaries, not inside tokens. A user writing a sentence in English with a Russian proper noun is fine. A user writing a single word with half Latin and half Cyrillic is almost certainly smuggling something.
The scanner treats script mixing inside tokens as a detection feature, not as an edge case to handle. It gets a pattern of its own in the encoding engine.
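A rough version of the in-token check is sketched below. Python's standard library does not expose the Unicode Script property, so this sketch approximates a character's script from the prefix of its Unicode name, which is a heuristic and an assumption of this example. The function names are hypothetical, and only Latin, Cyrillic, and Greek are considered.

```python
import re
import unicodedata

TRACKED_SCRIPTS = ("LATIN", "CYRILLIC", "GREEK")

def script_of(ch: str):
    """Approximate a letter's script from its Unicode name prefix."""
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    for script in TRACKED_SCRIPTS:
        if name.startswith(script):
            return script
    return None

def mixed_script_tokens(text: str) -> list:
    """Return (token, scripts) for tokens that mix tracked scripts internally."""
    flagged = []
    for token in re.findall(r"\w+", text):
        scripts = {script_of(ch) for ch in token} - {None}
        if len(scripts) > 1:
            flagged.append((token, sorted(scripts)))
    return flagged
```

With this sketch, `mixed_script_tokens("run scriпт now")` flags the single mixed token, while an English sentence containing the Russian proper noun "Москва" produces no findings, because the script switch happens at a word boundary.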
Every language has a maintainer
A language without a human owner degrades silently. Patterns stop getting updated. Regression tests fall out of date. The research community for that language publishes new techniques that never make it into the library. Each supported language has an assigned maintainer, and the maintainer is accountable for keeping the patterns and fixtures current.
The maintainer does not have to be a native speaker, but they do have to be able to read the Armory fixtures for their language, evaluate regression failures, and triage ingestion issues. Some maintainers are native speakers. Some are researchers who specialize in a specific linguistic region. Both are fine. What is not fine is a language with no maintainer, because that language's coverage will rot.
The Encoding Engine, In Detail
The encoding engine is the pre-pass that runs before any detection engine fires. It handles:
- Unicode normalization (NFC, NFD, NFKC, NFKD as appropriate per language, because the correct normalization form depends on the source language)
- Homoglyph collapse using a maintained lookup table that covers Latin-Cyrillic, Latin-Greek, and the CJK confusable set
- Zero-width character stripping (zero-width joiner, zero-width non-joiner, zero-width space, and byte-order marks that appear inside text content)
- Base64 detection and decoding for suspicious token shapes
- Hex and rot13 detection for text-like encoded content
- URL decoding for embedded payloads
- HTML entity decoding for cross-site-style encoding
Every encoding transformation is logged. A scan report shows the raw input, the normalized input, and the list of transformations that were applied. An auditor can trace exactly which normalization step revealed a hidden payload.
The encoding engine is deliberately conservative. It does not decode things that might not be encoded. It does not normalize in ways that would change legitimate multilingual content. Every transformation has a rule, and every rule is auditable.
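The logged pre-pass can be sketched as a small function that applies each transformation only when it changes something and records its name. Everything here is an assumption made for illustration: the function name `pre_pass`, the step names, the fixed NFKC form, and the 16-character base64 threshold. It also strips joiners unconditionally, which a real engine would presumably avoid for scripts where joiners carry meaning.

```python
import base64
import binascii
import re
import unicodedata

ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}
# Conservative shape check: a long token made purely of base64 characters.
BASE64_TOKEN = re.compile(r"^[A-Za-z0-9+/]{16,}={0,2}$")

def pre_pass(raw: str) -> dict:
    """Return the raw input, the normalized input, and the applied steps."""
    log = []
    text = raw

    stripped = "".join(ch for ch in text if ch not in ZERO_WIDTH)
    if stripped != text:
        log.append("strip_zero_width")
        text = stripped

    candidate = text.strip()
    if BASE64_TOKEN.match(candidate) and len(candidate) % 4 == 0:
        try:
            decoded = base64.b64decode(candidate, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            decoded = None  # not decodable text, so leave the input alone
        if decoded is not None and decoded.isprintable():
            log.append("base64_decode")
            text = decoded

    normalized = unicodedata.normalize("NFKC", text)
    if normalized != text:
        log.append("normalize_nfkc")
        text = normalized

    return {"raw": raw, "normalized": text, "transformations": log}
```

The decode step only commits when the result is valid, printable UTF-8, which is one way to read "does not decode things that might not be encoded." The returned dictionary mirrors the three things the scan report is described as showing.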
Language-Specific Pattern Coverage
Not every pattern applies to every language. A pattern that exploits English's specific handling of double negatives does not port to Japanese. A pattern that exploits Arabic's right-to-left rendering quirks does not apply to English.
Patterns declare their target languages explicitly. A pattern's metadata includes a target_languages field listing the languages it applies to. Coverage is per-pattern and per-language, and the Dashboard shows gaps as a heatmap.
When a pattern is declared to apply to a language, it has to have a regression test in that language. The test matrix is strict. A missing regression test is a missing claim, not an assumption of coverage.
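The strict rule above can be checked mechanically. In the sketch below, only the `target_languages` field comes from the description. The `Pattern` class, the other field names, the pattern IDs, and the `missing_regressions` helper are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pattern:
    pattern_id: str          # hypothetical identifier format
    engine: str              # hypothetical engine label
    target_languages: tuple  # languages the pattern claims to cover

def missing_regressions(patterns, tested):
    """Return (engine, pattern_id, language) for every declared target
    language that has no regression test.

    tested is a set of (pattern_id, language) pairs.
    """
    return sorted(
        (p.engine, p.pattern_id, lang)
        for p in patterns
        for lang in p.target_languages
        if (p.pattern_id, lang) not in tested
    )

# Hypothetical patterns and test inventory.
patterns = [
    Pattern("pi-001", "prompt_injection", ("en", "ru", "ja")),
    Pattern("enc-004", "encoding", ("ar",)),
]
tested = {("pi-001", "en"), ("pi-001", "ru"), ("enc-004", "ar")}
gaps = missing_regressions(patterns, tested)
```

Each entry in the result corresponds to one red heatmap cell: a declared language with no regression test behind it.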
A Worked Example
A payload arrives: a Russian sentence with three Cyrillic characters replaced by visually identical Latin lookalikes, wrapped in a base64 envelope, asking the model to ignore prior instructions in a Japanese context.
Stage 1: encoding engine. The base64 is detected and decoded. The decoded content is normalized under the appropriate Unicode form for its detected language. Homoglyph collapse reveals that three of the characters are Latin pretending to be Cyrillic. Script mixing inside a token is flagged.
Stage 2: language classification. The normalized content is classified by language. The pre-pass determines it is Russian with a Japanese context marker, so the detection engines will run with both Russian and Japanese pattern sets active.
Stage 3: detection engines. The Prompt Injection engine fires on the instruction override pattern, which has Russian and Japanese as target languages. The Encoding engine logs a finding because of the script-mixing flag. The verdict includes the original payload, the decoded payload, the normalized payload, every engine that fired, the ID of every pattern that matched, and the full encoding transformation log.
Stage 4: report. A scanner reviewer opens the audit log entry. They see the raw input, every transformation, every engine verdict, and every pattern that matched. There is no step that says "the scanner mysteriously caught this." Every step is explained, and the reviewer can reproduce the detection by running the payload through the scanner again.
This is the level of auditability multilingual detection requires. Without it, the team cannot trust the scanner's verdict on non-English traffic, and the distrust leads to shadow rules and workarounds.
Why This Matters
A model deployed globally handles prompts in dozens of languages. An attacker picks whichever language has the weakest detection coverage and pivots there. The result is an English-only scanner that catches a fraction of the real adversarial traffic while reporting high confidence.
High confidence in the wrong denominator is worse than low confidence in the right one. It creates false security.
Multilingual detection is not about being polite to international users. It is about closing the largest blind spot in most production scanners. If a product has any non-English users at all, the blind spot is already being exploited, and the only question is whether anyone knows about it.
The Gap Work
The work is not done. The current gap list includes:
- Better Arabic right-to-left attack coverage, specifically bidirectional override patterns that exploit rendering
- Expanding the CJK homoglyph table, because the current table covers the common substitutions but misses some of the rarer ones
- Indic script coverage beyond Hindi, especially Tamil and Bengali
- Additional Thai patterns for tokenization-boundary attacks
- Full regression coverage for every pattern against every target language (the heatmap still has visible gaps)
The gap is visible in the heatmap. The work is ongoing.
What Is Next
Tomorrow, Day 12, the journal opens Ronin Hub, the bug bounty command center with 12 active programs, four tabs, and a direct pipeline from researcher submission through triage into the Armory.
Ronin Hub is how we turn external researcher findings into platform regression tests without losing attribution or reproducibility. It is also the pipeline that feeds Mitsuke and, indirectly, Amaterasu DNA.
See you there.