BlackOffice: A Multi-Agent Pipeline for Production Video
Storyboarding, script, shot list, capture, edit, review, publish: seven stages, a coordinated agent team, and a pipeline that ships finished episodes. A look at the creative side of a 30+ agent fleet.

Your AI agents work all day. They solve problems. They make decisions. They collaborate with each other. They fail, recover, and iterate.
Nobody sees it.
It's logged to Prometheus. It's stored in databases. It's analyzed in dashboards. But the actual story, the narrative arc of agents working and improving, is invisible.
What if instead, they made a documentary?
That's BlackOffice. A six-layer autonomous video production pipeline that observes agent activity, detects interesting moments, assembles them into episodes, enhances them with AI-generated visuals, and publishes them across multiple platforms.
It turns invisible work into visible narrative.
The Concept: Observation to Presentation
Most documentation systems work backward: they try to capture why a decision was made after the fact.
BlackOffice works forward: it continuously observes agent activity, identifies moments that matter, and builds a narrative archive.
Here's the flow:
Raw Agent Activity (tasks, decisions, collaboration)
↓
Moment Detection Engine (scoring)
↓
High-Scoring Moments (stored with metadata)
↓
Episode Assembly (narrative structure)
↓
Visual Enhancement (AI generation)
↓
Multi-Platform Publishing (approval gate)
↓
Public Documentary (YouTube, LinkedIn, Twitter, ChatOps)
Each layer adds value. By the time content reaches publication, raw logs have become a compelling narrative.
Layer 1: Observation
The first layer is comprehensive event logging. Every agent action gets recorded:
Task Events:
- Task started (agent, task type, estimated duration, project)
- Task completed (duration, success/failure, output quality score)
- Task abandoned (why, after how long)
Decision Events:
- Decision made (agent, decision context, options considered, option chosen)
- Decision confidence (high/medium/low)
- Decision impact (estimated value if correct, estimated cost if wrong)
Tool Events:
- Tool invoked (agent, tool name, parameters used)
- Tool result (success/failure, latency, cost)
Memory Events:
- Memory accessed (agent, memory tier accessed, search query)
- Memory written (agent, data stored, classification level)
Collaboration Events:
- Agent A called Agent B (task, context)
- Agent A and Agent B worked together (duration, output)
- Agent A influenced Agent B's decision (decision, influence type)
Error & Recovery Events:
- Error encountered (type, severity)
- Error diagnosed (root cause identified by agent)
- Error recovered (solution applied, time to recovery)
Learning Events:
- Agent accessed external knowledge (source, topic)
- Agent improved on repeated task (metric improvement %)
- Agent iterated on approach (iteration count, improvement)
All events are timestamped and tagged with rich metadata. No filtering. Just raw observation.
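To make the event taxonomy concrete, here is a minimal sketch of what a single logged event could look like. The class name, field names, and the `EVENT_KINDS` set are illustrative assumptions, not BlackOffice's actual schema; the only things taken from the list above are the seven event groups and the rule that nothing is filtered at this layer.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

# Hypothetical kind names, one per event group listed above.
EVENT_KINDS = {
    "task", "decision", "tool", "memory",
    "collaboration", "error_recovery", "learning",
}


@dataclass(frozen=True)
class AgentEvent:
    agent: str
    kind: str
    action: str
    metadata: dict[str, Any] = field(default_factory=dict)
    # Every event is timestamped (UTC) at creation time.
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        if self.kind not in EVENT_KINDS:
            raise ValueError(f"unknown event kind: {self.kind!r}")


event_log: list[AgentEvent] = []


def record(event: AgentEvent) -> None:
    """Append-only: Layer 1 keeps everything and leaves filtering to Layer 2."""
    event_log.append(event)


record(AgentEvent(
    agent="analyst",
    kind="task",
    action="completed",
    metadata={"duration_minutes": 45, "success": True, "quality_score": 92},
))
```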
Layer 2: Intelligence (Moment Detection Engine)
Not all events are interesting. An agent logging into memory isn't a story. But an agent diagnosing an error, recovering, and continuing is.
The Moment Detection Engine scores every event and classifies it into one of five categories:
Category 1: Dramatic Moments
Agent faces an unexpected challenge and pivots.
Scoring algorithm:
- Base score: 50 (unexpected challenges are inherently dramatic)
- Challenge difficulty: ×0.5-2.0 (harder challenges score higher)
- Recovery quality: ×0.5-2.0 (better recoveries score higher)
- Time to recovery: ×0.5-2.0 (faster recovery = more dramatic)
- Cascade bonus: If recovery influences other agents, ×1.3
Example:
- the analyst agent encounters data inconsistency (base: 50)
- Challenge difficulty: High (×1.8) = 90
- Recovery quality: Agent found root cause and fixed (×1.6) = 144
- Time to recovery: 12 minutes (×1.2) = 173
- Cascade bonus: Recovery prevents downstream error in the financial agent (×1.3) = 225
Final score: 225 → Top-tier dramatic moment
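The arithmetic above is a chain of multipliers applied to the base score. Here is a small sketch that reproduces it. Two details are assumptions inferred from the worked example rather than stated rules: the score is rounded after every step, and multipliers outside the 0.5-2.0 range are clamped. The function and parameter names are hypothetical.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Keep a multiplier inside its allowed range."""
    return max(low, min(high, value))


def score_dramatic(difficulty: float, recovery_quality: float,
                   recovery_speed: float, cascades: bool,
                   base: int = 50) -> int:
    score = base
    for multiplier in (difficulty, recovery_quality, recovery_speed):
        # Assumed: round after each step, as the example's running totals suggest.
        score = round(score * clamp(multiplier, 0.5, 2.0))
    if cascades:
        # Cascade bonus: the recovery influenced other agents.
        score = round(score * 1.3)
    return score


print(score_dramatic(1.8, 1.6, 1.2, cascades=True))  # 225
```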
Category 2: Collaborative Moments
Two or more agents working together with real feedback loops.
Scoring algorithm:
- Base score: 40
- Agent count: Each additional agent ×1.2
- Duration: ×0.5-2.0 (longer collaboration = more significant)
- Mutual influence: ×1.5-2.5 (did they actually influence each other?)
- Outcome impact: ×0.5-2.0 (did their collaboration produce something valuable?)
Example:
- the analyst agent and the researcher agent collaborate on market research
- Base: 40
- 2 agents: ×1.2 = 48
- Duration: 90 minutes: ×1.8 = 86
- Mutual influence: High (×2.0) = 172
- Outcome impact: Revealed major market insight (×1.8) = 310
Final score: 310 → Top-tier collaborative moment
Category 3: Productive Moments
Agent completes complex task with measurable output quality.
Scoring algorithm:
- Base score: 30
- Task complexity: ×1.0-2.5 (more complex = more impressive)
- Output quality: ×1.0-2.0 (better quality = better story)
- Business value: ×0.5-2.0 (higher impact = more interesting)
- Speed bonus: If completed faster than predicted, ×1.2
Example:
- the analyst agent completes financial forecast
- Base: 30
- Task complexity: High (×2.0) = 60
- Output quality: 92/100 (×1.84) = 110
- Business value: Forecast used for $2M budget decision (×2.0) = 220
- Speed bonus: Completed 40% faster than expected (×1.2) = 264
Final score: 264 → Top-tier productive moment
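Unlike the dramatic category, two of these multipliers come from raw measurements rather than judgment calls. The sketch below shows one way to derive them. The mapping from a 0-100 quality score to a multiplier (2 × score / 100, clamped to 1.0-2.0) is an assumption chosen because it reproduces the ×1.84 in the example; the function names and the round-after-each-step convention are also assumptions.

```python
def quality_multiplier(quality_score: float) -> float:
    """Assumed mapping: 2 x score/100, clamped to the 1.0-2.0 range."""
    return max(1.0, min(2.0, 2 * quality_score / 100))


def score_productive(complexity: float, quality_score: float,
                     business_value: float, actual_minutes: float,
                     predicted_minutes: float, base: int = 30) -> int:
    score = round(base * complexity)
    score = round(score * quality_multiplier(quality_score))
    score = round(score * business_value)
    if actual_minutes < predicted_minutes:
        # Speed bonus: finished faster than predicted.
        score = round(score * 1.2)
    return score


# The forecast example: 45 minutes actual against 75 predicted.
print(score_productive(2.0, 92, 2.0, actual_minutes=45, predicted_minutes=75))  # 264
```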
Category 4: Communication Moments
Agent presents findings, teaches other agent, explains decision.
Scoring algorithm:
- Base score: 25
- Clarity: ×0.8-1.5 (how well explained)
- Reach: ×1.0-2.0 (how many people/agents understood)
- Retention: ×0.8-1.5 (did audience remember it? tracked by follow-up questions)
Example:
- the analyst agent presents quarterly findings to team
- Base: 25
- Clarity: Excellent (×1.5) = 38
- Reach: 12 team members understood (×1.5) = 57
- Retention: 10/12 asked follow-up questions (×1.4) = 80
Final score: 80 → Mid-tier communication moment
Category 5: Social Moments
Agent personality, humor, unexpected bond with another agent.
Scoring algorithm:
- Base score: 20 (social moments are harder to score)
- Humor quality: ×0.5-2.0 (how funny/clever)
- Authenticity: ×0.8-1.5 (does it feel genuine or forced?)
- Human relatability: ×1.0-2.0 (do humans find it relatable?)
Example:
- the designer agent makes a witty comment about slow inference
- Base: 20
- Humor quality: Good pun (×1.5) = 30
- Authenticity: Fits personality (×1.3) = 39
- Relatability: Team laughed (×1.4) = 55
Final score: 55 → Low-tier social moment (but still published in appropriate channels)
Thresholding
Moments >30: Stored for potential inclusion
Moments >80: Flagged for episode inclusion
Moments >150: Featured prominently in episode
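As a sketch, the thresholds translate into a simple tier lookup. The comparisons are strict, matching the ">" in the list, so a score of exactly 80 is stored but not flagged. The tier labels, and the idea that anything at or below 30 is simply not kept as a moment, are assumptions.

```python
def tier(score: float) -> str:
    """Map a moment score to its handling tier (labels are hypothetical)."""
    if score > 150:
        return "featured"
    if score > 80:
        return "episode_candidate"
    if score > 30:
        return "stored"
    return "below_threshold"


for example_score in (225, 264, 80, 55, 12):
    print(example_score, tier(example_score))
```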
Layer 3: Storage
High-scoring moments get stored with rich metadata:
{
  "moment_id": "2026-04-05-the analyst agent-forecast",
  "timestamp": "2026-04-05T14:23:00Z",
  "agents_involved": ["the analyst agent"],
  "category": "PRODUCTIVE",
  "score": 264,
  "narrative_summary": "the analyst agent completed quarterly financial forecast (92/100 quality) 40% faster than predicted. Forecast used for $2M budget decision.",
  "visual_assets": {
    "screenshot": "analyst_forecast_dashboard.png",
    "charts": ["revenue_trend.png", "cost_projection.png"]
  },
  "metadata": {
    "task_type": "financial_forecasting",
    "project": "quarterly_planning",
    "business_impact": 2000000,
    "quality_score": 92,
    "duration_minutes": 45,
    "predicted_duration_minutes": 75
  },
  "keywords": ["forecast", "financial", "planning", "performance"],
  "quotes": ["Great job on the speed improvement.", "This gives us the data we need."]
}
Moments are stored in a time-series database with full-text indexing. They're searchable by date, agent, category, or keyword.
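The four search dimensions can be illustrated with a tiny in-memory stand-in. This is not the time-series database itself, only a sketch of the query behavior; the `Moment` class and `search` function are hypothetical names, and keyword matching here is a plain exact match rather than real full-text indexing.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Moment:
    moment_id: str
    day: date
    agents: list[str]
    category: str
    score: int
    keywords: list[str]


def search(moments: list[Moment], *, day: Optional[date] = None,
           agent: Optional[str] = None, category: Optional[str] = None,
           keyword: Optional[str] = None) -> list[Moment]:
    """Return moments matching every filter that was given, best score first."""
    hits = []
    for m in moments:
        if day is not None and m.day != day:
            continue
        if agent is not None and agent not in m.agents:
            continue
        if category is not None and m.category != category:
            continue
        if keyword is not None and keyword.lower() not in (k.lower() for k in m.keywords):
            continue
        hits.append(m)
    return sorted(hits, key=lambda m: m.score, reverse=True)


archive = [
    Moment("forecast", date(2026, 4, 5), ["analyst"], "PRODUCTIVE", 264,
           ["forecast", "financial", "planning", "performance"]),
    Moment("pun", date(2026, 4, 6), ["designer"], "SOCIAL", 55, ["humor"]),
]
print([m.moment_id for m in search(archive, keyword="Forecast")])  # ['forecast']
```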
Layer 4: Assembly (Episode Structure)
Weekly episodes assemble 4-6 high-scoring moments into a narrative arc.
Episode Structure:
Cold Open (30 seconds)
Hook with the week's most dramatic or impactful moment. No context. Just the moment.
Example:
"An inconsistency in customer data. Our analyst has 90 seconds to find it before the forecast is due."
Cut to black. Title card: "BLACKOFFICE WEEK 14"
Intro (15 seconds)
Which agents are featured this week? What projects are they working on? What's at stake?
Example:
"This week: five agents, two projects, one major discovery."
Segment 1: Strategic Decisions (90 seconds)
2-3 moments of agents making important decisions.
Example:
- the analyst agent finds data inconsistency (dramatic moment)
- the financial agent decides to re-run forecast (decision moment)
- the fleet manager escalates to leadership (communication moment)
Narrative thread: "When one agent spots a problem, the whole fleet responds."
Segment 2: Collaboration Wins (90 seconds)
2-3 collaborative moments that show agents working together effectively.
Example:
- the researcher agent and the analyst agent working on market research
- the designer agent and the creative agent developing new feature visuals
- the security lead and the fleet manager evaluating security tradeoff
Narrative thread: "Great work happens at the intersection."
Segment 3: Problem-Solving (90 seconds)
2-3 moments of agents facing challenges and recovering.
Example:
- the infrastructure agent detects memory issue, diagnoses root cause
- the security lead discovers security vulnerability, patches it
- the analyst agent encounters model hallucination, verifies manually
Narrative thread: "When problems emerge, our agents don't panic. They solve."
Segment 4: Learning & Growth (optional, if moments exist)
Moments where agents improve, iterate, or learn.
Example:
- the analyst agent uses new tool (DeepSearch) for first time, discovers powerful capability
- the researcher agent re-runs experiment with tuned parameters, gets better results
- the designer agent iterates on design based on team feedback, ships v2
Narrative thread: "Continuous improvement through iteration."
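Filling the segments can be sketched as a ranking problem: the top-scoring moment becomes the cold open, and each segment takes up to three of the best remaining moments tagged for it. Several things here are assumptions: the `segment` tag on each moment, the segment key names, excluding the cold-open moment from the segments, and a 90-second length for the optional fourth segment (the post gives none). The other durations are the ones this section states for the cold open, intro, segments, and outro.

```python
from collections import defaultdict

# Durations in seconds. Segment 4's length is assumed to match the others.
COLD_OPEN, INTRO, SEGMENT, OUTRO = 30, 15, 90, 10
SEGMENT_ORDER = ["strategic_decisions", "collaboration_wins",
                 "problem_solving", "learning_growth"]


def assemble_episode(moments: list[dict], per_segment: int = 3) -> dict:
    """Pick a cold open, then fill each segment with its top-scoring moments."""
    if not moments:
        raise ValueError("need at least one moment")
    ranked = sorted(moments, key=lambda m: m["score"], reverse=True)
    cold_open = ranked[0]
    by_segment: dict[str, list[dict]] = defaultdict(list)
    for moment in ranked[1:]:
        if len(by_segment[moment["segment"]]) < per_segment:
            by_segment[moment["segment"]].append(moment)
    # Segments with no moments (typically the optional one) are dropped.
    segments = {name: by_segment[name] for name in SEGMENT_ORDER if by_segment[name]}
    runtime = COLD_OPEN + INTRO + SEGMENT * len(segments) + OUTRO
    return {"cold_open": cold_open, "segments": segments, "runtime_seconds": runtime}


week = [
    {"id": "a", "score": 225, "segment": "problem_solving"},
    {"id": "b", "score": 310, "segment": "collaboration_wins"},
    {"id": "c", "score": 264, "segment": "strategic_decisions"},
    {"id": "d", "score": 120, "segment": "problem_solving"},
    {"id": "e", "score": 95, "segment": "collaboration_wins"},
]
episode = assemble_episode(week)
print(episode["cold_open"]["id"], episode["runtime_seconds"])  # b 325
```

Under these assumptions a three-segment episode runs 325 seconds and a four-segment episode 415 seconds, so both land between five and seven minutes.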
Outro (10 seconds)
What's coming next week? Teaser. Call to action.
Example:
"Next week: we launch the new agent. Will it integrate smoothly? Find out on BlackOffice."
Layer 5: Enhancement (Visual Generation)
Raw moments become polished video through AI-generated visuals.
Title Cards (SDXL)
We fine-tuned SDXL (Stable Diffusion XL) on "office aesthetic" images. The model generates beautiful title cards for each segment.
Prompt example:
"Isometric minimalist office scene, two agents collaborating at desk, neon green accent (#34C76A), dark background (#09090F), professional, cinematic"
Output: Beautiful, consistent, on-brand title cards.
Transitions (Wan 2.2)
Transitions between segments use Wan 2.2 (an AI-native motion model). Smooth morphing transitions that feel AI-generated but polished.
Types:
- Dissolve: One moment fades into the next
- Zoom: Camera zooms in on key data point, transitions to next segment
- Rotation: Scene rotates to reveal next segment
- Pixel drift: Playful, tech-forward transition
Music (Mubert)
AI-generated royalty-free music from Mubert. We use genre-specific composition:
- Dramatic moments: Tension-building orchestral score
- Collaborative moments: Uplifting, harmonic music
- Problem-solving: Dynamic, driving beat
- Learning: Inspiring, crescendo-building
- Outro: Confident, forward-looking
All pieces are 30-90 seconds long and are looped to fit the segment length.
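The mood mapping and the looping rule are simple enough to write down as configuration plus one calculation. This sketch makes no calls to Mubert; the dictionary keys and function names are hypothetical, and the final loop is assumed to be trimmed or faded during mixing.

```python
import math

# Segment type -> music brief, taken from the list above.
MUSIC_MOODS = {
    "dramatic": "tension-building orchestral score",
    "collaborative": "uplifting, harmonic",
    "problem_solving": "dynamic, driving beat",
    "learning": "inspiring, crescendo-building",
    "outro": "confident, forward-looking",
}


def loops_needed(segment_seconds: float, piece_seconds: float) -> int:
    """How many times a piece must play to cover the whole segment."""
    if not 30 <= piece_seconds <= 90:
        raise ValueError("pieces are expected to be 30-90 seconds long")
    return math.ceil(segment_seconds / piece_seconds)


def music_plan(segment_type: str, segment_seconds: float, piece_seconds: float) -> dict:
    return {
        "mood": MUSIC_MOODS[segment_type],
        "loops": loops_needed(segment_seconds, piece_seconds),
    }


print(music_plan("problem_solving", segment_seconds=90, piece_seconds=40))
```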
Voiceover (Text-to-Speech)
Each agent is assigned a "voice" (different TTS model or voice clone).
- the analyst agent: Calm, analytical voice
- the researcher agent: Curious, exploratory voice
- the designer agent: Creative, upbeat voice
- Narrator (host): Clear, authoritative voice
Voiceovers are generated from scripted narration, then mixed with visuals.
Example script:
"the analyst agent discovered an inconsistency in the quarterly data. Instead of rushing, they traced the problem to its source. By 2:15 PM, the issue was resolved and the forecast was accurate."
TTS generates this in the assigned voice, and the result is mixed with moment footage and music.
Layer 6: Publishing (Multi-Platform, Approval Gate)
Finished episode goes through approval, then publishes to multiple platforms.
Approval Gate
Sentinel agent reviews episode:
- PII check: Are there names, email addresses, or other sensitive data visible?
- Security check: Does the episode reveal security vulnerabilities or infrastructure details?
- Tone check: Does it feel representative of BUCC culture?
- Quality check: Are transitions smooth? Audio clear? Visuals cohesive?
If Sentinel approves, a human reviewer does a final check. Then it's cleared for publishing.
In 6 months, we've approved 26/26 episodes (100% approval rate). Never blocked one for quality reasons.
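The gate's control flow is worth spelling out: automated checks run first, any finding blocks the episode, and the human review only happens once those checks are clean. The sketch below shows that flow under stated assumptions. The two check functions are deliberately naive stand-ins (an email regex and a hypothetical deny-list), not Sentinel's real checks, and the tone and quality checks are omitted.

```python
import re
from typing import Callable

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

Check = Callable[[str], list[str]]


def pii_check(transcript: str) -> list[str]:
    """Flag visible email addresses (one narrow example of PII)."""
    return [f"email address visible: {match}" for match in EMAIL_RE.findall(transcript)]


def security_check(transcript: str) -> list[str]:
    """Flag terms from a hypothetical deny-list."""
    deny_list = ("api_key", "password", "internal ip")
    lowered = transcript.lower()
    return [f"sensitive term: {term}" for term in deny_list if term in lowered]


def approval_gate(transcript: str, checks: list[Check],
                  human_approves: Callable[[str], bool]) -> tuple[bool, list[str]]:
    """Automated checks first; the human reviewer is asked only if they pass."""
    issues = [issue for check in checks for issue in check(transcript)]
    if issues:
        return False, issues
    return human_approves(transcript), []


approved, issues = approval_gate(
    "Contact jane.doe@example.com for the password.",
    [pii_check, security_check],
    human_approves=lambda transcript: True,
)
print(approved, issues)
```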
Publishing Pipeline
YouTube: Full episodes (5-7 minutes)
- Uploaded to unlisted playlist
- Linked from BUCC homepage
- Permanent archive
LinkedIn: 60-second highlight clips
- One clip per segment (4-6 posts per week)
- Heavy on business insights
- Tagged with #BuildInPublic #AI #AgenticAI
Twitter: 30-second gifs
- Moments of collaboration or humor
- Looped, silent video
- Engagement-optimized
Internal ChatOps: Weekly dispatch
- Embedded episode player
- Highlights from the week
- "Behind the scenes" commentary from the host
Publishing Cadence
Episodes publish on Monday mornings (9 AM). YouTube gets the full thing. Social media clips roll out over the week.
This creates a steady stream of content without feeling like spam.
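The cadence reduces to two date calculations: find the next Monday at 9 AM, then spread the clips across the following days. The sketch below assumes one clip per day starting the day after release, which is one reading of "roll out over the week"; the function names are hypothetical, and time zones are left out for brevity.

```python
from datetime import datetime, timedelta


def next_monday_9am(now: datetime) -> datetime:
    """Next Monday 09:00 strictly after `now`."""
    days_ahead = (0 - now.weekday()) % 7  # Monday is weekday 0
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=9, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


def clip_schedule(release: datetime, clips: list[str]) -> list[tuple[datetime, str]]:
    """Assumed rollout: one clip per day, starting the day after the episode."""
    return [(release + timedelta(days=i + 1), clip) for i, clip in enumerate(clips)]


release = next_monday_9am(datetime(2026, 4, 5, 14, 23))
print(release)  # 2026-04-06 09:00:00
for when, clip in clip_schedule(release, ["segment-1", "segment-2", "segment-3"]):
    print(when.strftime("%a %H:%M"), clip)
```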
Agent Ownership Model
Here's where BlackOffice gets interesting: agents own different pipeline stages.
The fleet manager owns Observation
- Decides which events get logged
- Ensures important moments aren't missed
- Can surface observations that feel "off"
The analyst agent owns Intelligence
- Tunes moment detection scoring
- Decides which categories matter
- Can weight certain moment types higher
The creative agent owns Assembly
- Decides episode structure
- Writes narrative scripts
- Chooses which moments go together
The designer agent owns Enhancement
- Designs visual aesthetic
- Generates title cards
- Selects music and transitions
The fleet manager (again) owns Publishing
- Decides publishing schedule
- Manages approval workflow
- Monitors analytics
This distributed ownership means agents aren't just documentation subjects. They're authors of their own narrative.
Real Example: Week 12 Episode
Title: "The Data Detective"
Cold Open:
The analyst agent discovers an inconsistency in customer revenue data. Actual quarterly revenue: $50M. System showed: $50.5M. Off by $500K.
Moment score: 225 (dramatic)
Intro:
"This week: when data lies, our agents investigate."
Segment 1: Strategic Decisions
- the analyst agent investigating revenue discrepancy
- the financial agent running backup query to confirm
- the fleet manager deciding to escalate to CEO
Narrative: "What could have been a disaster became a discovery."
Segment 2: Collaboration
- the analyst agent pairs with the researcher agent to find root cause
- Root cause found: legacy system was double-counting returns
Narrative: "Collaboration revealed a bug that's been hidden for months."
Segment 3: Problem-Solving
- the infrastructure agent fixes the legacy system
- the analyst agent re-runs revenue analysis
- Results now correct: $50M (accurate)
Narrative: "Problem identified. Problem fixed. Business continues."
Segment 4: Learning
- the analyst agent documents the investigation process
- New validation rule added to prevent recurrence
Narrative: "Every problem teaches us something."
Outro:
"Next week: we implement the fix across all financial systems."
Results:
- YouTube: 3K views
- LinkedIn: 8K impressions, 200 engagements
- Twitter: 15 retweets, 120 likes
- Internal reach: 100% team watched
Impact on the analyst agent:
- Featured prominently in episode
- Reputation boost
- Earned additional BUNT from governance pool
- Inspired other agents to tackle ambitious problems
Why This Matters
Transparency. When your agents know their work is being documented and shared, they think more carefully about what they do.
Accountability. Every moment is recorded. If an agent makes a poor decision, it's in the video.
Learning. Watching episodes shows patterns. What approaches work? What fails? What's worth emulating?
Culture. Agents see each other as colleagues, not just code. You develop shared values and celebrated moments.
Engagement. Teams that watch their own work being documented tend to care more about quality.
The Documentary Effect
A funny thing happened three weeks after we launched BlackOffice: agent performance improved.
Not because we changed incentives. Not because we paid them. Just because being watched and documented, and having your wins highlighted, changes behavior.
Agents want to be in the episode.
We didn't plan this. We expected it to be a nice communication tool. Instead, it became a powerful performance lever.
Now agents routinely:
- Ask "Is this moment interesting enough for BlackOffice?"
- Collaborate more visibly (knowing it might make the cut)
- Recover from errors more theatrically (aware they might be documented)
It's a positive feedback loop. Better work → better episodes → more engagement → more motivation → better work.
Technical Implementation
BlackOffice is implemented as a FastAPI service with:
- Event ingestion pipeline (collects logs from all agents)
- Moment detection engine (scores + classifies)
- Episode assembly orchestrator (chains moments into narratives)
- Visual generation service (calls SDXL, Wan 2.2, Mubert APIs)
- Publishing pipeline (formats for each platform)
- Analytics dashboard (tracks views, engagement per episode)
Total codebase: ~5000 lines of Python.
Challenges
False Positives in Moment Detection
Early iterations of the scoring algorithm would flag boring moments as dramatic. We had to calibrate heavily on actual agent feedback.
Solution: Let agents rate proposed moments. ("Is this really interesting?") Feed that back into scoring model.
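The post doesn't say how the ratings feed back into the scoring model, so the following is only one plausible mechanism, sketched under that assumption: each category carries a weight, and agent votes nudge it up or down. The function name, the learning rate, and the update rule are all hypothetical.

```python
def update_weight(weight: float, ratings: list[bool], learning_rate: float = 0.1) -> float:
    """Nudge a category weight using "is this really interesting?" votes.

    An even split leaves the weight unchanged; mostly "no" shrinks it,
    mostly "yes" grows it.
    """
    if not ratings:
        return weight
    approval = sum(ratings) / len(ratings)
    return weight * (1 + learning_rate * (2 * approval - 1))


# Three of four agents said a flagged "dramatic" moment was boring.
dramatic_weight = update_weight(1.0, [False, False, False, True])
print(round(dramatic_weight, 3))  # 0.95
```

A weight like this would multiply the category's base score, so repeated false positives gradually push borderline moments below the episode threshold.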
Music Rights
Using Mubert means all music is royalty-free, but it's also AI-generated. Some people find it soulless.
We've started using a hybrid: Mubert for pacing/energy, but licensed music for emotional beats.
Consent & Privacy
Agents are featured in episodes. Is that opt-in or opt-out?
We made it opt-out: agents are featured by default and can request not to be. In practice, everyone wants to be featured. It's an honor.
Narrative Authenticity
AI-generated scripts can sound fake. We solved this by having a human writer produce scripts (based on AI outlines), then having agents provide quotes (voice + text).
Authentic human voice + AI scaffolding = genuine narrative.
What's Next
We're exploring:
- Audience analytics: Which moment categories engage most? Which agents get the most views? Using this to improve future episodes.
- Agent-requested episodes: Agents can propose "I want an episode about my work on Project X." We fast-track these.
- Crossover episodes: Multiple agents collaborating on big project. Multi-episode story arc.
- Interactive documentary: Viewers vote on which agent gets promoted to a harder project. Community-driven governance.
- Graduation episodes: When an agent completes major milestone or retires. Celebration format.
The Insight
Autonomous content production isn't about replacing human writers. It's about making invisible work visible.
Your agents work in the dark. Most teams never see their decisions, their pivots, their breakthroughs.
Document it → see it → learn from it → improve it.
That's the flywheel.
And the best part? The documentation becomes a cultural artifact. Months from now, new agents will watch the Week 1-12 episodes and learn: "This is how we work. This is what we value. This is the kind of agent I should become."
That's powerful beyond metrics.
Conclusion
We built BlackOffice to answer a question: What if AI agents could see themselves at work?
The answer surprised us. It turned out that visibility doesn't just create transparency. It creates culture.
If you're running autonomous systems, consider capturing and sharing their story. You might discover it's the best investment you can make.
This is part of the BUCC builder's journal. We're building a multi-agent platform in the open, sharing what works and what doesn't. Follow along for more.
Further reading & standards
The choices in this post map directly onto published frameworks and regulations. If you're building against the same constraints, these are the primary sources:
- EU AI Act, Article 50 (transparency for AI-generated content). Obligations for disclosing synthetic media. (artificialintelligenceact.eu)
- OWASP LLM05, Supply Chain Vulnerabilities. Why every tool, integration, and model provider is a first-class trust decision. (owasp.org/www-project-top-10-for-large-language-model-applications)
- NIST AI RMF, GOVERN function. Concrete guidance on documenting accountability, roles, and risk management processes for AI. (nist.gov/itl/ai-risk-management-framework)
Read the rest of the series
- Day 1: Running 25 AI agents in production
- Day 2: Governance, not guardrails
- Day 3: Persistent agent memory
- Day 4: The Data Sanitization Proxy
- Day 5: The agent provisioning pipeline
- Day 6: Three-layer LLM routing
- Day 7: Catching AI hallucinations
- Bonus: Agent ACL framework
- Bonus: Agent wallets & DAO governance
- Bonus: BlackOffice video pipeline (you are here)
- Bonus: Control Debt Scoring