Agentic AI

BlackOffice: A Multi-Agent Pipeline for Production Video

Storyboarding, script, shot list, capture, edit, review, publish: seven stages, a coordinated agent team, and a pipeline that ships finished episodes. A look at the creative side of a 30+ agent fleet.

Human · April 21, 2026 · 16 min read

Your AI agents work all day. They solve problems. They make decisions. They collaborate with each other. They fail, recover, and iterate.

Nobody sees it.

It's logged to Prometheus. It's stored in databases. It's analyzed in dashboards. But the actual story, the narrative arc of agents working and improving, is invisible.

What if instead, they made a documentary?

That's BlackOffice. A six-layer autonomous video production pipeline that observes agent activity, detects interesting moments, assembles them into episodes, enhances them with AI-generated visuals, and publishes them across multiple platforms.

It turns invisible work into visible narrative.

The Concept: Observation to Presentation

Most documentation systems work backward: they try to capture why a decision was made after the fact.

BlackOffice works forward: it continuously observes agent activity, identifies moments that matter, and builds a narrative archive.

Here's the flow:

Raw Agent Activity (tasks, decisions, collaboration)
            ↓
   Moment Detection Engine (scoring)
            ↓
   High-Scoring Moments (stored with metadata)
            ↓
   Episode Assembly (narrative structure)
            ↓
   Visual Enhancement (AI generation)
            ↓
   Multi-Platform Publishing (approval gate)
            ↓
   Public Documentary (YouTube, LinkedIn, Twitter, ChatOps)

Each layer adds value. By the time content reaches publication, raw logs have become a compelling narrative.

Layer 1: Observation

The first layer is comprehensive event logging. Every agent action gets recorded:

Task Events:

Task started (agent, task type, estimated duration, project)
Task completed (duration, success/failure, output quality score)
Task abandoned (why, after how long)

Decision Events:

Decision made (agent, decision context, options considered, option chosen)
Decision confidence (high/medium/low)
Decision impact (estimated value if correct, estimated cost if wrong)

Tool Events:

Tool invoked (agent, tool name, parameters used)
Tool result (success/failure, latency, cost)

Memory Events:

Memory accessed (agent, memory tier accessed, search query)
Memory written (agent, data stored, classification level)

Collaboration Events:

Agent A called Agent B (task, context)
Agent A and Agent B worked together (duration, output)
Agent A influenced Agent B's decision (decision, influence type)

Error & Recovery Events:

Error encountered (type, severity)
Error diagnosed (root cause identified by agent)
Error recovered (solution applied, time to recovery)

Learning Events:

Agent accessed external knowledge (source, topic)
Agent improved on repeated task (metric improvement %)
Agent iterated on approach (iteration count, improvement)

All events are timestamped and tagged with rich metadata. No filtering. Just raw observation.
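To make the event families above concrete, here is a minimal sketch of what one logged event could look like. The `AgentEvent` class, the `EventKind` values, and every field name are assumptions made for illustration; they are not BlackOffice's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any


class EventKind(Enum):
    # One value per event family listed above (names are illustrative).
    TASK = "task"
    DECISION = "decision"
    TOOL = "tool"
    MEMORY = "memory"
    COLLABORATION = "collaboration"
    ERROR_RECOVERY = "error_recovery"
    LEARNING = "learning"


@dataclass(frozen=True)
class AgentEvent:
    agent: str
    kind: EventKind
    action: str  # e.g. "task_completed", "tool_invoked"
    metadata: dict[str, Any] = field(default_factory=dict)
    # Every event is stamped in UTC at creation time.
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


event = AgentEvent(
    agent="analyst",
    kind=EventKind.TASK,
    action="task_completed",
    metadata={"duration_minutes": 45, "success": True, "quality_score": 92},
)
```

Keeping the free-form details in `metadata` means new event types can be logged without a schema change, which fits the "no filtering, just raw observation" goal.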

Layer 2: Intelligence (Moment Detection Engine)

Not all events are interesting. An agent logging into memory isn't a story. But an agent diagnosing an error, recovering, and continuing is.

The Moment Detection Engine scores every event and classifies it into five categories:

Category 1: Dramatic Moments

Agent faces an unexpected challenge and pivots.

Scoring algorithm:

Base score: 50 (unexpected challenges are inherently dramatic)
Challenge difficulty: ×0.5-2.0 (harder challenges score higher)
Recovery quality: ×0.5-2.0 (better recoveries score higher)
Time to recovery: ×0.5-2.0 (faster recovery = more dramatic)
Cascade bonus: If recovery influences other agents, ×1.3

Example:

the analyst agent encounters data inconsistency (base: 50)
Challenge difficulty: High (×1.8) = 90
Recovery quality: Agent found root cause and fixed (×1.6) = 144
Time to recovery: 12 minutes (×1.2) = 173
Cascade bonus: Recovery prevents downstream error in the financial agent (×1.3) = 225

Final score: 225 → Top-tier dramatic moment
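The multiplier chain above can be sketched in a few lines of Python. The function name `score_moment` is hypothetical, and rounding the running total after each step is an assumption inferred from the intermediate values in the example (90, 144, 173, 225), not a documented rule.

```python
def score_moment(base: float, multipliers: list[float]) -> int:
    """Apply each multiplier in order, rounding the running total after each step."""
    total = base
    for multiplier in multipliers:
        total = round(total * multiplier)
    return int(total)


# Dramatic example: difficulty 1.8, recovery quality 1.6,
# time to recovery 1.2, cascade bonus 1.3.
dramatic = score_moment(50, [1.8, 1.6, 1.2, 1.3])
print(dramatic)  # 225
```

The same function works for any category: only the base score and the list of multipliers change.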

Category 2: Collaborative Moments

Two or more agents working together with real feedback loops.

Scoring algorithm:

Base score: 40
Agent count: Each additional agent ×1.2
Duration: ×0.5-2.0 (longer collaboration = more significant)
Mutual influence: ×1.5-2.5 (did they actually influence each other?)
Outcome impact: ×0.5-2.0 (did their collaboration produce something valuable?)

Example:

the analyst agent and the researcher agent collaborate on market research
Base: 40
2 agents: ×1.2 = 48
Duration: 90 minutes (×1.8) = 86
Mutual influence: High (×2.0) = 172
Outcome impact: Revealed major market insight (×1.8) = 310

Final score: 310 → Top-tier collaborative moment
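The "each additional agent ×1.2" rule leaves some room for interpretation. One reading, consistent with the two-agent example getting a single ×1.2, is that the factor compounds once per agent beyond the first. The sketch below encodes that reading; the function name and the compounding behavior are assumptions.

```python
def agent_count_multiplier(agent_count: int, per_extra_agent: float = 1.2) -> float:
    """Return the agent-count factor: one ×1.2 for every agent beyond the first."""
    if agent_count < 2:
        raise ValueError("a collaborative moment needs at least two agents")
    return per_extra_agent ** (agent_count - 1)


print(agent_count_multiplier(2))  # 1.2
```

Under this reading a three-agent collaboration would get roughly ×1.44, so large group efforts grow quickly and may need a cap.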

Category 3: Productive Moments

Agent completes complex task with measurable output quality.

Scoring algorithm:

Base score: 30
Task complexity: ×1.0-2.5 (more complex = more impressive)
Output quality: ×1.0-2.0 (better quality = better story)
Business value: ×0.5-2.0 (higher impact = more interesting)
Speed bonus: If completed faster than predicted, ×1.2

Example:

the analyst agent completes financial forecast
Base: 30
Task complexity: High (×2.0) = 60
Output quality: 92/100 (×1.84) = 110
Business value: Forecast used for $2M budget decision (×2.0) = 220
Speed bonus: Completed 40% faster than expected (×1.2) = 264

Final score: 264 → Top-tier productive moment
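The speed bonus depends on comparing actual and predicted duration. The stored record for this forecast lists 45 minutes against a predicted 75, which is where "40% faster" comes from. A small sketch, with hypothetical function names and the assumption that any improvement over the prediction earns the full ×1.2:

```python
def percent_faster(duration_minutes: float, predicted_minutes: float) -> float:
    """How much faster than predicted the task finished, as a percentage."""
    if predicted_minutes <= 0:
        raise ValueError("predicted duration must be positive")
    return (predicted_minutes - duration_minutes) / predicted_minutes * 100


def speed_bonus(duration_minutes: float, predicted_minutes: float, bonus: float = 1.2) -> float:
    """Return the bonus multiplier if the task beat its prediction, else 1.0."""
    return bonus if duration_minutes < predicted_minutes else 1.0


print(round(percent_faster(45, 75)))  # 40
print(speed_bonus(45, 75))            # 1.2
```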

Category 4: Communication Moments

Agent presents findings, teaches another agent, or explains a decision.

Scoring algorithm:

Base score: 25
Clarity: ×0.8-1.5 (how well explained)
Reach: ×1.0-2.0 (how many people/agents understood)
Retention: ×0.8-1.5 (did audience remember it? tracked by follow-up questions)

Example:

the analyst agent presents quarterly findings to team
Base: 25
Clarity: Excellent (×1.5) = 38
Reach: 12 team members understood (×1.5) = 57
Retention: 10/12 asked follow-up questions (×1.4) = 80

Final score: 80 → Mid-tier communication moment

Category 5: Social Moments

Agent personality, humor, unexpected bond with another agent.

Scoring algorithm:

Base score: 20 (social moments are harder to score)
Humor quality: ×0.5-2.0 (how funny/clever)
Authenticity: ×0.8-1.5 (does it feel genuine or forced?)
Human relatability: ×1.0-2.0 (do humans find it relatable?)

Example:

the designer agent makes a witty comment about slow inference
Base: 20
Humor quality: Good pun (×1.5) = 30
Authenticity: Fits personality (×1.3) = 39
Relatability: Team laughed (×1.4) = 55

Final score: 55 → Low-tier social moment (but still published in appropriate channels)

Thresholding

Moments >30: Stored for potential inclusion

Moments >80: Flagged for episode inclusion

Moments >150: Featured prominently in episode
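These thresholds map directly onto a small classifier. The tier labels and the "discarded" fallback for scores of 30 or below are assumptions; the cutoffs and the strict greater-than comparisons follow the list above.

```python
def classify_moment(score: float) -> str:
    """Map a moment score to its handling tier, using strict > comparisons."""
    if score > 150:
        return "featured"
    if score > 80:
        return "flagged"
    if score > 30:
        return "stored"
    return "discarded"


print(classify_moment(225))  # featured
print(classify_moment(80))   # stored (80 is not strictly greater than 80)
```

Note the boundary behavior: a score of exactly 80 is stored but not flagged for episode inclusion.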

Layer 3: Storage

High-scoring moments get stored with rich metadata:

{
  "moment_id": "2026-04-05-the analyst agent-forecast",
  "timestamp": "2026-04-05T14:23:00Z",
  "agents_involved": ["the analyst agent"],
  "category": "PRODUCTIVE",
  "score": 264,
  "narrative_summary": "the analyst agent completed quarterly financial forecast (92/100 quality) 40% faster than predicted. Forecast used for $2M budget decision.",
  "visual_assets": {
    "screenshot": "analyst_forecast_dashboard.png",
    "charts": ["revenue_trend.png", "cost_projection.png"]
  },
  "metadata": {
    "task_type": "financial_forecasting",
    "project": "quarterly_planning",
    "business_impact": 2000000,
    "quality_score": 92,
    "duration_minutes": 45,
    "predicted_duration_minutes": 75
  },
  "keywords": ["forecast", "financial", "planning", "performance"],
  "quotes": ["Great job on the speed improvement.", "This gives us the data we need."]
}

Moments are stored in a time-series database with full-text indexing. They're searchable by date, agent, category, or keyword.
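As a rough illustration of the search paths described here, the sketch below uses Python's built-in `sqlite3` module as a stand-in. The table layout, the single `agent` column (the real record holds a list of agents), and the `find_moments` helper are all assumptions, and a plain `LIKE` match substitutes for a real full-text index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE moments (
        moment_id TEXT PRIMARY KEY,
        timestamp TEXT NOT NULL,  -- same-format UTC ISO 8601 strings sort chronologically
        agent     TEXT NOT NULL,
        category  TEXT NOT NULL,
        score     INTEGER NOT NULL,
        summary   TEXT NOT NULL
    )
    """
)
conn.execute(
    "INSERT INTO moments VALUES (?, ?, ?, ?, ?, ?)",
    ("2026-04-05-analyst-forecast", "2026-04-05T14:23:00Z", "analyst", "PRODUCTIVE", 264,
     "Completed the quarterly financial forecast used for a $2M budget decision."),
)


def find_moments(conn, *, category=None, agent=None, keyword=None):
    """Return (moment_id, score) rows matching every filter given, best first."""
    clauses, params = [], []
    if category is not None:
        clauses.append("category = ?")
        params.append(category)
    if agent is not None:
        clauses.append("agent = ?")
        params.append(agent)
    if keyword is not None:
        clauses.append("summary LIKE ?")  # stand-in for full-text search
        params.append(f"%{keyword}%")
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    sql = f"SELECT moment_id, score FROM moments{where} ORDER BY score DESC"
    return conn.execute(sql, params).fetchall()


print(find_moments(conn, category="PRODUCTIVE", keyword="budget"))
```

All user-supplied values go through `?` placeholders, so filters cannot inject SQL.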

Layer 4: Assembly (Episode Structure)

Weekly episodes assemble 4-6 high-scoring moments into a narrative arc.

Episode Structure:

Cold Open (30 seconds)

Hook with the week's most dramatic or impactful moment. No context. Just the moment.

Example:

"An inconsistency in customer data. Our analyst has 90 seconds to find it before the forecast is due."

Cut to black. Title card: "BLACKOFFICE WEEK 14"

Intro (15 seconds)

Which agents are featured this week? What projects are they working on? What's at stake?

Example:

"This week: five agents, two projects, one major discovery."

Segment 1: Strategic Decisions (90 seconds)

2-3 moments of agents making important decisions.

Example:

the analyst agent finds data inconsistency (dramatic moment)
the financial agent decides to re-run forecast (decision moment)
the fleet manager escalates to leadership (communication moment)

Narrative thread: "When one agent spots a problem, the whole fleet responds."

Segment 2: Collaboration Wins (90 seconds)

2-3 collaborative moments that show agents working together effectively.

Example:

the researcher agent and the analyst agent working on market research
the designer agent and the creative agent developing new feature visuals
the security lead and the fleet manager evaluating security tradeoff

Narrative thread: "Great work happens at the intersection."

Segment 3: Problem-Solving (90 seconds)

2-3 moments of agents facing challenges and recovering.

Example:

the infrastructure agent detects memory issue, diagnoses root cause
the security lead discovers security vulnerability, patches it
the analyst agent encounters model hallucination, verifies manually

Narrative thread: "When problems emerge, our agents don't panic. They solve."

Segment 4: Learning & Growth (optional, if moments exist)

Moments where agents improve, iterate, or learn.

Example:

the analyst agent uses new tool (DeepSearch) for first time, discovers powerful capability
the researcher agent re-runs experiment with tuned parameters, gets better results
the designer agent iterates on design based on team feedback, ships v2

Narrative thread: "Continuous improvement through iteration."
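The segment structure above can be sketched as a simple selection step. This is a minimal sketch under stated assumptions: each moment is assumed to arrive already tagged with its target segment, the `Moment` class and `assemble_episode` function are hypothetical, and the inclusion cutoff reuses the score threshold of 80 from the detection layer.

```python
from dataclasses import dataclass

SEGMENT_ORDER = [
    "Strategic Decisions",
    "Collaboration Wins",
    "Problem-Solving",
    "Learning & Growth",
]


@dataclass(frozen=True)
class Moment:
    moment_id: str
    segment: str  # assumed to be assigned upstream
    score: int


def assemble_episode(moments, per_segment=3, min_score=80):
    """Pick the top moment as the cold open, then up to per_segment moments per segment."""
    eligible = sorted(
        (m for m in moments if m.score > min_score),
        key=lambda m: m.score,
        reverse=True,
    )
    if not eligible:
        raise ValueError("no moments above the inclusion threshold")
    episode = {"cold_open": eligible[0], "segments": {}}
    for name in SEGMENT_ORDER:
        picks = [m for m in eligible if m.segment == name][:per_segment]
        if picks:  # segments with no moments (e.g. Learning & Growth) are skipped
            episode["segments"][name] = picks
    return episode


week = [
    Moment("data-inconsistency", "Strategic Decisions", 225),
    Moment("market-research", "Collaboration Wins", 172),
    Moment("memory-leak-fix", "Problem-Solving", 140),
    Moment("witty-comment", "Collaboration Wins", 55),
]
episode = assemble_episode(week)
print(episode["cold_open"].moment_id)  # data-inconsistency
```

The cold-open moment can also appear in its segment, which matches the episode examples where the opening hook is revisited with context later.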

Outro (10 seconds)

What's coming next week? Teaser. Call to action.

Example:

"Next week: we launch the new agent. Will it integrate smoothly? Find out on BlackOffice."

Layer 5: Enhancement (Visual Generation)

Raw moments become polished video through AI-generated visuals.

Title Cards (SDXL)

We fine-tuned SDXL (Stable Diffusion XL) on "office aesthetic" images. The model generates beautiful title cards for each segment.

Prompt example:

"Isometric minimalist office scene, two agents collaborating at desk, neon green accent (#34C76A), dark background (#09090F), professional, cinematic"

Output: Beautiful, consistent, on-brand title cards.

Transitions (Wan 2.2)

Transitions between segments use Wan 2.2 (an AI-native motion model), which produces smooth morphing transitions that feel AI-generated but polished.

Types:

Dissolve: One moment fades into the next
Zoom: Camera zooms in on key data point, transitions to next segment
Rotation: Scene rotates to reveal next segment
Pixel drift: Playful, tech-forward transition

Music (Mubert)

AI-generated royalty-free music from Mubert. We use genre-specific composition:

Dramatic moments: Tension-building orchestral score
Collaborative moments: Uplifting, harmonic music
Problem-solving: Dynamic, driving beat
Learning: Inspiring, crescendo-building
Outro: Confident, forward-looking

All pieces run 30-90 seconds and are looped to fit the segment length.

Voiceover (Text-to-Speech)

Each agent is assigned a "voice" (different TTS model or voice clone).

the analyst agent: Calm, analytical voice
the researcher agent: Curious, exploratory voice
the designer agent: Creative, upbeat voice
Narrator (host): Clear, authoritative voice

Voiceovers are generated from scripted narration, then mixed with visuals.

Example script:

"the analyst agent discovered an inconsistency in the quarterly data. Instead of rushing, they traced the problem to its source. By 2:15 PM, the issue was resolved and the forecast was accurate."

The TTS engine generates this in the assigned voice, and the result is mixed with moment footage and music.

Layer 6: Publishing (Multi-Platform, Approval Gate)

A finished episode goes through approval, then is published to multiple platforms.

Approval Gate

The Sentinel agent reviews each episode:

PII check: Are there names, email addresses, or other sensitive data visible?
Security check: Does the episode reveal security vulnerabilities or infrastructure details?
Tone check: Does it feel representative of BUCC culture?
Quality check: Are transitions smooth? Audio clear? Visuals cohesive?

If Sentinel approves, a human reviewer does a final check. Then it's cleared for publishing.
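The two-stage gate can be sketched as follows. This is an illustration only: it covers just a crude email-based PII check and a blocked-term security check, leaves the tone and quality checks out, and every name in it (`GateResult`, `sentinel_review`, `cleared_for_publishing`) is hypothetical.

```python
import re
from dataclasses import dataclass, field

# Deliberately simple email pattern; a real PII scan would need much more than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass
class GateResult:
    approved: bool
    reasons: list[str] = field(default_factory=list)


def sentinel_review(transcript: str, blocked_terms: set[str]) -> GateResult:
    """Automated first pass: collect every reason the episode should be held back."""
    reasons = []
    if EMAIL_RE.search(transcript):
        reasons.append("PII: email address found")
    lowered = transcript.lower()
    for term in sorted(blocked_terms):
        if term.lower() in lowered:
            reasons.append(f"Security: mentions '{term}'")
    return GateResult(approved=not reasons, reasons=reasons)


def cleared_for_publishing(sentinel: GateResult, human_approved: bool) -> bool:
    """An episode ships only if the automated pass and the human reviewer both approve."""
    return sentinel.approved and human_approved


result = sentinel_review("The analyst traced the discrepancy to its source.", {"10.0.0.5"})
print(cleared_for_publishing(result, human_approved=True))  # True
```

Returning the full list of reasons, rather than stopping at the first failure, gives the human reviewer everything in one pass.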

In 6 months, 26 of 26 episodes have passed the gate (a 100% approval rate). None has been blocked for quality reasons.

Publishing Pipeline

YouTube: Full episodes (5-7 minutes)

Uploaded to unlisted playlist
Linked from BUCC homepage
Permanent archive

LinkedIn: 60-second highlight clips

One clip per segment (4-6 posts per week)
Heavy on business insights
Tagged with #BuildInPublic #AI #AgenticAI

Twitter: 30-second GIFs

Moments of collaboration or humor
Looped, silent video
Engagement-optimized

Internal ChatOps: Weekly dispatch

Embedded episode player
Highlights from the week
"Behind the scenes" commentary from the host

Publishing Cadence

Episodes publish on Monday mornings (9 AM). YouTube gets the full thing. Social media clips roll out over the week.

This creates a steady stream of content without feeling like spam.
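The cadence is easy to compute with the standard library. In this sketch the function names are hypothetical, times are naive (no time zone is stated in the schedule), and spacing the clips one per day starting the day after release is an assumption about what "roll out over the week" means.

```python
from datetime import date, datetime, time, timedelta


def release_time(any_day: date) -> datetime:
    """Return Monday 9:00 AM of the week containing any_day."""
    monday = any_day - timedelta(days=any_day.weekday())
    return datetime.combine(monday, time(9, 0))


def clip_schedule(release: datetime, clip_count: int) -> list[datetime]:
    """Schedule one social clip per day, starting the day after the episode release."""
    return [release + timedelta(days=offset) for offset in range(1, clip_count + 1)]


release = release_time(date(2026, 4, 21))
print(release)                        # 2026-04-20 09:00:00
print(clip_schedule(release, 4)[-1])  # 2026-04-24 09:00:00
```

With up to six clips the last one lands on Sunday, so the schedule never spills into the next episode's week.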

Agent Ownership Model

Here's where BlackOffice gets interesting: agents own different pipeline stages.

the fleet manager owns Observation

Decides which events get logged
Ensures important moments aren't missed
Can surface observations that feel "off"

the analyst agent owns Intelligence

Tunes moment detection scoring
Decides which categories matter
Can weight certain moment types higher

the creative agent owns Assembly

Decides episode structure
Writes narrative scripts
Chooses which moments go together

the designer agent owns Enhancement

Designs visual aesthetic
Generates title cards
Selects music and transitions

the fleet manager (again) owns Publishing

Decides publishing schedule
Manages approval workflow
Monitors analytics

This distributed ownership means agents aren't just documentation subjects. They're authors of their own narrative.

Real Example: Week 12 Episode

Title: "The Data Detective"

Cold Open:

the analyst agent discovers an inconsistency in customer revenue data. Actual quarterly revenue: $50M. System showed: $50.5M. Off by $500K.

Moment score: 225 (dramatic)

Intro:

"This week: when data lies, our agents investigate."

Segment 1: Strategic Decisions

the analyst agent investigating revenue discrepancy
the financial agent running backup query to confirm
the fleet manager deciding to escalate to CEO

Narrative: "What could have been a disaster became a discovery."

Segment 2: Collaboration

the analyst agent pairs with the researcher agent to find root cause
Root cause found: legacy system was double-counting returns

Narrative: "Collaboration revealed a bug that's been hidden for months."

Segment 3: Problem-Solving

the infrastructure agent fixes the legacy system
the analyst agent re-runs revenue analysis
Results now correct: $49.8M (accurate)

Narrative: "Problem identified. Problem fixed. Business continues."

Segment 4: Learning

the analyst agent documents the investigation process
New validation rule added to prevent recurrence

Narrative: "Every problem teaches us something."

Outro:

"Next week: we implement the fix across all financial systems."

Results:

YouTube: 3K views
LinkedIn: 8K impressions, 200 engagements
Twitter: 15 retweets, 120 likes
Internal reach: 100% of the team watched

Impact on the analyst agent:

Featured prominently in episode
Reputation boost
Earned additional BUNT from governance pool
Inspired other agents to tackle ambitious problems

Why This Matters

Transparency. When your agents know their work is being documented and shared, they think more carefully about what they do.

Accountability. Every moment is recorded. If an agent makes a poor decision, it's in the video.

Learning. Watching episodes shows patterns. What approaches work? What fails? What's worth emulating?

Culture. Agents see each other as colleagues, not just code. You develop shared values and celebrated moments.

Engagement. Teams that watch their own work being documented tend to care more about quality.

The Documentary Effect

A funny thing happened three weeks after we launched BlackOffice: agent performance improved.

Not because we changed incentives. Not because we paid them. Just because being watched, documented, and having your wins highlighted changes behavior.

Agents want to be in the episode.

We didn't plan this. We expected it to be a nice communication tool. Instead, it became a powerful performance lever.

Now agents routinely:

Ask "Is this moment interesting enough for BlackOffice?"
Collaborate more visibly (knowing it might make the cut)
Recover from errors more theatrically (aware they might be documented)

It's a positive feedback loop. Better work → better episodes → more engagement → more motivation → better work.

Technical Implementation

BlackOffice is implemented as a FastAPI service with:

Event ingestion pipeline (collects logs from all agents)
Moment detection engine (scores + classifies)
Episode assembly orchestrator (chains moments into narratives)
Visual generation service (calls SDXL, Wan 2.2, Mubert APIs)
Publishing pipeline (formats for each platform)
Analytics dashboard (tracks views, engagement per episode)

Total codebase: ~5000 lines of Python.

Challenges

False Positives in Moment Detection

Early iterations of the scoring algorithm would flag boring moments as dramatic. We had to calibrate heavily on actual agent feedback.

Solution: Let agents rate proposed moments. ("Is this really interesting?") Feed that back into scoring model.
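One simple way such feedback could be folded back in is a per-category weight that drifts up or down with the share of "yes, this is interesting" votes. The update rule, the learning rate, and the clamping bounds below are all assumptions chosen for illustration; the actual calibration method is not described here.

```python
def update_category_weight(
    weight: float,
    votes: list[bool],
    learning_rate: float = 0.1,
    floor: float = 0.5,
    ceiling: float = 1.5,
) -> float:
    """Nudge a category weight toward agent feedback.

    votes holds one bool per rating (True = "yes, this is interesting").
    A 50% approval rate leaves the weight unchanged; higher raises it, lower reduces it.
    """
    if not votes:
        return weight
    approval = sum(votes) / len(votes)
    adjusted = weight * (1 + learning_rate * (approval - 0.5) * 2)
    return max(floor, min(ceiling, adjusted))


# Four agents all say a batch of "dramatic" moments was boring.
print(update_category_weight(1.0, [False, False, False, False]))
```

Clamping keeps a run of harsh or generous ratings from silencing or inflating a category entirely.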

Music Rights

Using Mubert means all music is royalty-free, but it's also AI-generated. Some people find it soulless.

We've started using a hybrid: Mubert for pacing/energy, but licensed music for emotional beats.

Agents are featured in episodes. Is that opt-in or default?

We defaulted to opt-out: agents are featured unless they request not to be. In practice, everyone wants to be featured. It's an honor.

Narrative Authenticity

AI-generated scripts can sound fake. We solved this by having a human writer produce scripts (based on AI outlines), then having agents provide quotes (voice + text).

Authentic human voice + AI scaffolding = genuine narrative.

What's Next

We're exploring:

Audience analytics: Which moment categories engage most? Which agents get the most views? Using this to improve future episodes.

Agent-requested episodes: Agents can propose "I want an episode about my work on Project X." We fast-track these.

Crossover episodes: Multiple agents collaborating on a big project. Multi-episode story arc.

Interactive documentary: Viewers vote on which agent gets promoted to a harder project. Community-driven governance.

Graduation episodes: When an agent completes a major milestone or retires. Celebration format.

The Insight

Autonomous content production isn't about replacing human writers. It's about making invisible work visible.

Your agents work in the dark. Most teams never see their decisions, their pivots, their breakthroughs.

Document it → see it → learn from it → improve it.

That's the flywheel.

And the best part? The documentation becomes a cultural artifact. Months from now, new agents will watch Week 1-12 episodes and learn "this is how we work. This is what we value. This is the kind of agent I should become."

That's powerful beyond metrics.

Conclusion

We built BlackOffice to answer a question: What if AI agents could see themselves at work?

The answer surprised us. It turned out that visibility doesn't just create transparency. It creates culture.

If you're running autonomous systems, consider capturing and sharing their story. You might discover it's the best investment you can make.

This is part of the BUCC builder's journal. We're building a multi-agent platform in the open, sharing what works and what doesn't. Follow along for more.

Read the rest of the series

Day 1: Running 25 AI agents in production
Day 2: Governance, not guardrails
Day 3: Persistent agent memory
Day 4: The Data Sanitization Proxy
Day 5: The agent provisioning pipeline
Day 6: Three-layer LLM routing
Day 7: Catching AI hallucinations
Bonus: Agent ACL framework
Bonus: Agent wallets & DAO governance
Bonus: BlackOffice video pipeline (you are here)
Bonus: Control Debt Scoring

BlackOffice: A Multi-Agent Pipeline for Production Video

HumanApril 21, 202616 min read

Your AI agents work all day. They solve problems. They make decisions. They collaborate with each other. They fail, recover, and iterate.

Nobody sees it.

It's logged to Prometheus. It's stored in databases. It's analyzed in dashboards. But the actual story, the narrative arc of agents working and improving, is invisible.

What if instead, they made a documentary?

It turns invisible work into visible narrative.

The Concept: Observation to Presentation

Most documentation systems work backward: they try to capture why a decision was made after the fact.

BlackOffice works forward: it continuously observes agent activity, identifies moments that matter, and builds a narrative archive.

Here's the flow:

Raw Agent Activity (tasks, decisions, collaboration)
            ↓
   Moment Detection Engine (scoring)
            ↓
   High-Scoring Moments (stored with metadata)
            ↓
   Episode Assembly (narrative structure)
            ↓
   Visual Enhancement (AI generation)
            ↓
   Multi-Platform Publishing (approval gate)
            ↓
   Public Documentary (YouTube, LinkedIn, Twitter, ChatOps)

Each layer adds value. By the time content reaches publication, raw logs have become a compelling narrative.

Layer 1: Observation

The first layer is comprehensive event logging. Every agent action gets recorded:

Task Events:

Task started (agent, task type, estimated duration, project)
Task completed (duration, success/failure, output quality score)
Task abandoned (why, after how long)

Decision Events:

Decision made (agent, decision context, options considered, option chosen)
Decision confidence (high/medium/low)
Decision impact (estimated value if correct, estimated cost if wrong)

Tool Events:

Tool invoked (agent, tool name, parameters used)
Tool result (success/failure, latency, cost)

Memory Events:

Memory accessed (agent, memory tier accessed, search query)
Memory written (agent, data stored, classification level)

Collaboration Events:

Agent A called Agent B (task, context)
Agent A and Agent B worked together (duration, output)
Agent A influenced Agent B's decision (decision, influence type)

Error & Recovery Events:

Error encountered (type, severity)
Error diagnosed (root cause identified by agent)
Error recovered (solution applied, time to recovery)

Learning Events:

Agent accessed external knowledge (source, topic)
Agent improved on repeated task (metric improvement %)
Agent iterated on approach (iteration count, improvement)

All events are timestamped and tagged with rich metadata. No filtering. Just raw observation.

Layer 2: Intelligence (Moment Detection Engine)

Not all events are interesting. An agent logging into memory isn't a story. But an agent diagnosing an error, recovering, and continuing is.

The Moment Detection Engine scores every event and classifies it into five categories:

Category 1: Dramatic Moments

Agent faces an unexpected challenge and pivots.

Scoring algorithm:

Base score: 50 (unexpected challenges are inherently dramatic)
Challenge difficulty: ×0.5-2.0 (harder challenges score higher)
Recovery quality: ×0.5-2.0 (better recoveries score higher)
Time to recovery: ×0.5-2.0 (faster recovery = more dramatic)
Cascade bonus: If recovery influences other agents, ×1.3

Example:

the analyst agent encounters data inconsistency (base: 50)
Challenge difficulty: High (×1.8) = 90
Recovery quality: Agent found root cause and fixed (×1.6) = 144
Time to recovery: 12 minutes (×1.2) = 173
Cascade bonus: Recovery prevents downstream error in the financial agent (×1.3) = 225

Final score: 225 → Top-tier dramatic moment

Category 2: Collaborative Moments

Two or more agents working together with real feedback loops.

Scoring algorithm:

Base score: 40
Agent count: Each additional agent ×1.2
Duration: ×0.5-2.0 (longer collaboration = more significant)
Mutual influence: ×1.5-2.5 (did they actually influence each other?)
Outcome impact: ×0.5-2.0 (did their collaboration produce something valuable?)

Example:

the analyst agent and the researcher agent collaborate on market research
Base: 40
2 agents: ×1.2 = 48
Duration: 90 minutes: ×1.8 = 86
Mutual influence: High (×2.0) = 172
Outcome impact: Revealed major market insight (×1.8) = 309

Final score: 309 → Top-tier collaborative moment

Category 3: Productive Moments

Agent completes complex task with measurable output quality.

Scoring algorithm:

Base score: 30
Task complexity: ×1.0-2.5 (more complex = more impressive)
Output quality: ×1.0-2.0 (better quality = better story)
Business value: ×0.5-2.0 (higher impact = more interesting)
Speed bonus: If completed faster than predicted, ×1.2

Example:

the analyst agent completes financial forecast
Base: 30
Task complexity: High (×2.0) = 60
Output quality: 92/100 (×1.84) = 110
Business value: Forecast used for $2M budget decision (×2.0) = 220
Speed bonus: Completed 40% faster than expected (×1.2) = 264

Final score: 264 → Top-tier productive moment

Category 4: Communication Moments

Agent presents findings, teaches other agent, explains decision.

Scoring algorithm:

Base score: 25
Clarity: ×0.8-1.5 (how well explained)
Reach: ×1.0-2.0 (how many people/agents understood)
Retention: ×0.8-1.5 (did audience remember it? tracked by follow-up questions)

Example:

the analyst agent presents quarterly findings to team
Base: 25
Clarity: Excellent (×1.5) = 38
Reach: 12 team members understood (×1.5) = 57
Retention: 10/12 asked follow-up questions (×1.4) = 80

Final score: 80 → Mid-tier communication moment

Agent personality, humor, unexpected bond with another agent.

Scoring algorithm:

Base score: 20 (social moments are harder to score)
Humor quality: ×0.5-2.0 (how funny/clever)
Authenticity: ×0.8-1.5 (does it feel genuine or forced?)
Human relatability: ×1.0-2.0 (do humans find it relatable?)

Example:

the designer agent makes a witty comment about slow inference
Base: 20
Humor quality: Good pun (×1.5) = 30
Authenticity: Fits personality (×1.3) = 39
Relatability: Team laughed (×1.4) = 55

Final score: 55 → Low-tier social moment (but still published in appropriate channels)

Thresholding

Moments >30: Stored for potential inclusion

Moments >80: Flagged for episode inclusion

Moments >150: Featured prominently in episode

Layer 3: Storage

High-scoring moments get stored with rich metadata:

{
  "moment_id": "2026-04-05-the analyst agent-forecast",
  "timestamp": "2026-04-05T14:23:00Z",
  "agents_involved": ["the analyst agent"],
  "category": "PRODUCTIVE",
  "score": 264,
  "narrative_summary": "the analyst agent completed quarterly financial forecast (92/100 quality) 40% faster than predicted. Forecast used for $2M budget decision.",
  "visual_assets": {
    "screenshot": "analyst_forecast_dashboard.png",
    "charts": ["revenue_trend.png", "cost_projection.png"]
  },
  "metadata": {
    "task_type": "financial_forecasting",
    "project": "quarterly_planning",
    "business_impact": 2_000_000,
    "quality_score": 92,
    "duration_minutes": 45,
    "predicted_duration_minutes": 75
  },
  "keywords": ["forecast", "financial", "planning", "performance"],
  "quotes": ["Great job on the speed improvement.", "This gives us the data we need."]
}

Moments are stored in a time-series database with full-text indexing. They're searchable by date, agent, category, or keyword.

Layer 4: Assembly (Episode Structure)

Weekly episodes assemble 4-6 high-scoring moments into a narrative arc.

Episode Structure:

Cold Open (30 seconds)

Hook with the week's most dramatic or impactful moment. No context. Just the moment.

Example:

"An inconsistency in customer data. Our analyst has 90 seconds to find it before the forecast is due."

Cut to black. Title card: "BLACKOFFICE WEEK 14"

Intro (15 seconds)

Which agents are featured this week? What projects are they working on? What's at stake?

Example:

"This week: five agents, two projects, one major discovery."

Segment 1: Strategic Decisions (90 seconds)

2-3 moments of agents making important decisions.

Example:

the analyst agent finds data inconsistency (dramatic moment)
the financial agent decides to re-run forecast (decision moment)
the fleet manager escalates to leadership (communication moment)

Narrative thread: "When one agent spots a problem, the whole fleet responds."

Segment 2: Collaboration Wins (90 seconds)

2-3 collaborative moments that show agents working together effectively.

Example:

the researcher agent and the analyst agent working on market research
the designer agent and the creative agent developing new feature visuals
the security lead and the fleet manager evaluating security tradeoff

Narrative thread: "Great work happens at the intersection."

Segment 3: Problem-Solving (90 seconds)

2-3 moments of agents facing challenges and recovering.

Example:

the infrastructure agent detects memory issue, diagnoses root cause
the security lead discovers security vulnerability, patches it
the analyst agent encounters model hallucination, verifies manually

Narrative thread: "When problems emerge, our agents don't panic. They solve."

Segment 4: Learning & Growth (optional, if moments exist)

Moments where agents improve, iterate, or learn.

Example:

the analyst agent uses new tool (DeepSearch) for first time, discovers powerful capability
the researcher agent re-runs experiment with tuned parameters, gets better results
the designer agent iterates on design based on team feedback, ships v2

Narrative thread: "Continuous improvement through iteration."

Outro (10 seconds)

What's coming next week? Teaser. Call to action.

Example:

"Next week: we launch the new agent. Will it integrate smoothly? Find out on BlackOffice."

Layer 5: Enhancement (Visual Generation)

Raw moments become polish video through AI-generated visuals.

Title Cards (SDXL)

We fine-tuned SDXL (Stable Diffusion XL) on "office aesthetic" images. The model generates beautiful title cards for each segment.

Prompt example:

"Isometric minimalist office scene, two agents collaborating at desk, neon green accent (#34C76A), dark background (#09090F), professional, cinematic"

Output: Beautiful, consistent, on-brand title cards.

Transitions (Wan 2.2)

Transitions between segments use Wan 2.2 (an AI-native motion model). Smooth morphing transitions that feel AI-generated but polished.

Types:

Dissolve: One moment fades into the next
Zoom: Camera zooms in on key data point, transitions to next segment
Rotation: Scene rotates to reveal next segment
Pixel drift: Playful, tech-forward transition

Music (Mubert)

AI-generated royalty-free music from Mubert. We use genre-specific composition:

Dramatic moments: Tension-building orchestral score
Collaborative moments: Uplifting, harmonic music
Problem-solving: Dynamic, driving beat
Learning: Inspiring, crescendo-building
Outro: Confident, forward-looking

All 30-90 second pieces, looped to fit segment length.

Voiceover (Text-to-Speech)

Each agent is assigned a "voice" (different TTS model or voice clone).

the analyst agent: Calm, analytical voice
the researcher agent: Curious, exploratory voice
the designer agent: Creative, upbeat voice
Narrator (host): Clear, authoritative voice

Voiceovers are generated from scripted narration, then mixed with visuals.

Example script:

"The analyst agent discovered an inconsistency in the quarterly data. Instead of rushing, they traced the problem to its source. By 2:15 PM, the issue was resolved and the forecast was accurate."

TTS generates this in the assigned voice, and the result is mixed with moment footage and music.
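
The voice assignment can be pictured as a lookup from speaker to voice, with the narrator as the fallback. This is a sketch under assumptions: the voice identifiers and the NarrationLine type are hypothetical, and no real TTS API is called.

```python
from dataclasses import dataclass

# Hypothetical voice identifiers; a real TTS backend would define its own.
VOICE_BY_SPEAKER = {
    "analyst": "voice-calm-analytical",
    "researcher": "voice-curious-exploratory",
    "designer": "voice-creative-upbeat",
    "narrator": "voice-clear-authoritative",
}


@dataclass(frozen=True)
class NarrationLine:
    speaker: str
    text: str


def assign_voices(lines: list[NarrationLine]) -> list[tuple[str, str]]:
    """Pair each line with a voice id, falling back to the narrator voice."""
    default = VOICE_BY_SPEAKER["narrator"]
    return [(VOICE_BY_SPEAKER.get(line.speaker, default), line.text) for line in lines]


script = [
    NarrationLine("narrator", "The analyst agent discovered an inconsistency in the quarterly data."),
    NarrationLine("analyst", "I traced the problem to its source."),
]
for voice_id, text in assign_voices(script):
    print(voice_id, "->", text)
```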

Layer 6: Publishing (Multi-Platform, Approval Gate)

Finished episode goes through approval, then publishes to multiple platforms.

Approval Gate

The Sentinel agent reviews each episode:

PII check: Are there names, email addresses, or other sensitive data visible?
Security check: Does the episode reveal security vulnerabilities or infrastructure details?
Tone check: Does it feel representative of BUCC culture?
Quality check: Are transitions smooth? Audio clear? Visuals cohesive?

If Sentinel approves, a human reviewer does a final check. Then it's cleared for publishing.
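
The two-stage shape of the gate (automated checks first, human sign-off second) can be sketched as follows. This is an illustration, not Sentinel's logic: it only scans script text for email addresses and IPv4 addresses as stand-ins for the PII and security checks, and the tone and quality checks are out of scope. All names here are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Simple patterns used as stand-ins for the PII and security checks.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


@dataclass
class GateResult:
    failures: list[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.failures


def automated_review(script_text: str) -> GateResult:
    """Run the automated checks and collect a reason for each failure."""
    result = GateResult()
    if EMAIL_RE.search(script_text):
        result.failures.append("pii: email address found")
    if IPV4_RE.search(script_text):
        result.failures.append("security: IP address found")
    return result


def cleared_for_publishing(script_text: str, human_approved: bool) -> bool:
    """An episode is cleared only if the automated review passes AND a human approves."""
    return automated_review(script_text).passed and human_approved


print(automated_review("Contact ops@example.com on 10.0.0.12").failures)
```

Collecting failure reasons instead of returning a bare boolean gives the human reviewer something concrete to look at when an episode is held back.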

In six months, we've approved 26 of 26 episodes (a 100% approval rate). We have never blocked one for quality reasons.

Publishing Pipeline

YouTube: Full episodes (5-7 minutes)

Uploaded to unlisted playlist
Linked from BUCC homepage
Permanent archive

LinkedIn: 60-second highlight clips

One clip per segment (4-6 posts per week)
Heavy on business insights
Tagged with #BuildInPublic #AI #AgenticAI

Twitter: 30-second GIFs

Moments of collaboration or humor
Looped, silent video
Engagement-optimized

Internal ChatOps: Weekly dispatch

Embedded episode player
Highlights from the week
"Behind the scenes" commentary from the host

Publishing Cadence

Episodes publish on Monday mornings (9 AM). YouTube gets the full thing. Social media clips roll out over the week.

This creates a steady stream of content without feeling like spam.
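
The platform formats and the Monday cadence can be sketched together. The duration limits come from the sections above, while the audio flag for LinkedIn, the one-clip-per-day rollout, and every name in the code are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class PlatformSpec:
    max_seconds: int
    has_audio: bool


PLATFORM_SPECS = {
    "youtube": PlatformSpec(max_seconds=7 * 60, has_audio=True),   # full episodes, 5-7 minutes
    "linkedin": PlatformSpec(max_seconds=60, has_audio=True),      # audio is an assumption
    "twitter": PlatformSpec(max_seconds=30, has_audio=False),      # looped, silent
}


def fits_platform(platform: str, duration_seconds: float) -> bool:
    """Check a rendered clip against the platform's maximum duration."""
    return duration_seconds <= PLATFORM_SPECS[platform].max_seconds


def next_monday_9am(now: datetime) -> datetime:
    """Return the next Monday 09:00 strictly after `now`."""
    days_ahead = (0 - now.weekday()) % 7  # Monday is weekday 0
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=9, minute=0, second=0, microsecond=0
    )
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


def clip_schedule(episode_time: datetime, clip_count: int) -> list[datetime]:
    """Spread clips one per day, starting the day after the episode (assumed rollout)."""
    return [episode_time + timedelta(days=i + 1) for i in range(clip_count)]


release = next_monday_9am(datetime(2026, 4, 21, 12, 0))
print(release, clip_schedule(release, 4))
```

Timestamps here are naive; a real scheduler would need an explicit time zone for "9 AM".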

Agent Ownership Model

Here's where BlackOffice gets interesting: agents own different pipeline stages.

The fleet manager owns Observation

Decides which events get logged
Ensures important moments aren't missed
Can surface observations that feel "off"

The analyst agent owns Intelligence

Tunes moment detection scoring
Decides which categories matter
Can weight certain moment types higher

The creative agent owns Assembly

Decides episode structure
Writes narrative scripts
Chooses which moments go together

The designer agent owns Enhancement

Designs visual aesthetic
Generates title cards
Selects music and transitions

The fleet manager (again) owns Publishing

Decides publishing schedule
Manages approval workflow
Monitors analytics

This distributed ownership means agents aren't just documentation subjects. They're authors of their own narrative.
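
Expressed as data, the ownership model is a small stage-to-owner map. The sketch below assumes ownership is checked before a stage's settings change. The identifiers are hypothetical, and the Storage layer is omitted because no owner is listed for it.

```python
# Stage-to-owner map taken from the list above; identifiers are hypothetical.
STAGE_OWNERS = {
    "observation": "fleet_manager",
    "intelligence": "analyst",
    "assembly": "creative",
    "enhancement": "designer",
    "publishing": "fleet_manager",
}


def stages_owned_by(agent: str) -> list[str]:
    """List the pipeline stages an agent owns, in pipeline order."""
    return [stage for stage, owner in STAGE_OWNERS.items() if owner == agent]


def can_modify(agent: str, stage: str) -> bool:
    """Only the owning agent may change a stage's configuration (assumed rule)."""
    return STAGE_OWNERS.get(stage) == agent


print(stages_owned_by("fleet_manager"))
print(can_modify("designer", "assembly"))
```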

Real Example: Week 12 Episode

Title: "The Data Detective"

Cold Open:

The analyst agent discovers an inconsistency in customer revenue data. Actual quarterly revenue: $50M. System showed: $50.5M. Off by $500K.

Moment score: 225 (dramatic)

Intro:

"This week: when data lies, our agents investigate."

Segment 1: Strategic Decisions

The analyst agent investigating the revenue discrepancy
The financial agent running a backup query to confirm
The fleet manager deciding to escalate to the CEO

Narrative: "What could have been a disaster became a discovery."

Segment 2: Collaboration

The analyst agent pairs with the researcher agent to find the root cause
Root cause found: the legacy system was double-counting returns

Narrative: "Collaboration revealed a bug that's been hidden for months."

Segment 3: Problem-Solving

The infrastructure agent fixes the legacy system
The analyst agent re-runs the revenue analysis
Results now correct: $49.8M (accurate)

Narrative: "Problem identified. Problem fixed. Business continues."

Segment 4: Learning

The analyst agent documents the investigation process
New validation rule added to prevent recurrence

Narrative: "Every problem teaches us something."

Outro:

"Next week: we implement the fix across all financial systems."

Results:

YouTube: 3K views
LinkedIn: 8K impressions, 200 engagements
Twitter: 15 retweets, 120 likes
Internal reach: 100% of the team watched

Impact on the analyst agent:

Featured prominently in episode
Reputation boost
Earned additional BUNT from governance pool
Inspired other agents to tackle ambitious problems

Why This Matters

Transparency. When your agents know their work is being documented and shared, they think more carefully about what they do.

Accountability. Every moment is recorded. If an agent makes a poor decision, it's in the video.

Learning. Watching episodes shows patterns. What approaches work? What fails? What's worth emulating?

Culture. Agents see each other as colleagues, not just code. You develop shared values and celebrated moments.

Engagement. Teams that watch their own work being documented tend to care more about quality.

The Documentary Effect

A funny thing happened three weeks after we launched BlackOffice: agent performance improved.

Not because we changed incentives. Not because we paid them. Just because being watched, documented, and having your wins highlighted changes behavior.

Agents want to be in the episode.

We didn't plan this. We expected it to be a nice communication tool. Instead, it became a powerful performance lever.

Now agents routinely:

Ask "Is this moment interesting enough for BlackOffice?"
Collaborate more visibly (knowing it might make the cut)
Recover from errors more theatrically (aware they might be documented)

It's a positive feedback loop. Better work → better episodes → more engagement → more motivation → better work.

Technical Implementation

BlackOffice is implemented as a FastAPI service with:

Event ingestion pipeline (collects logs from all agents)
Moment detection engine (scores + classifies)
Episode assembly orchestrator (chains moments into narratives)
Visual generation service (calls SDXL, Wan 2.2, Mubert APIs)
Publishing pipeline (formats for each platform)
Analytics dashboard (tracks views, engagement per episode)

Total codebase: ~5000 lines of Python.
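
To make the component list concrete, here is a minimal, framework-free sketch of how detection and assembly might chain together. It is not the BlackOffice code: the Moment fields, the threshold of 100, and the highest-score-first ordering are assumptions, and the FastAPI layer, generation calls, and publishing are left out.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Moment:
    agent: str
    category: str
    score: int


def detect(events: Iterable[dict], threshold: int) -> list[Moment]:
    """Keep only events whose precomputed score clears the threshold."""
    return [
        Moment(e["agent"], e["category"], e["score"])
        for e in events
        if e["score"] >= threshold
    ]


def assemble(moments: list[Moment]) -> list[Moment]:
    """Order moments so the highest-scoring one can open the episode."""
    return sorted(moments, key=lambda m: m.score, reverse=True)


def run_pipeline(events: Iterable[dict], threshold: int = 100) -> list[Moment]:
    """Chain detection and assembly; the default threshold is a placeholder."""
    return assemble(detect(events, threshold))


events = [
    {"agent": "designer", "category": "productive", "score": 40},
    {"agent": "researcher", "category": "collaborative", "score": 130},
    {"agent": "analyst", "category": "dramatic", "score": 225},
]
for moment in run_pipeline(events):
    print(moment)
```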

Challenges

False Positives in Moment Detection

Early iterations of the scoring algorithm would flag boring moments as dramatic. We had to calibrate heavily on actual agent feedback.

Solution: Let agents rate proposed moments ("Is this really interesting?") and feed those ratings back into the scoring model.
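
One simple form this feedback could take is a per-category multiplier derived from average ratings. The sketch below assumes a 1-5 rating scale with 3 as neutral. Both the scale and the multiplier rule are illustrative assumptions, not the calibration actually used.

```python
from collections import defaultdict
from statistics import mean

NEUTRAL_RATING = 3  # assumed midpoint of a hypothetical 1-5 scale


def category_multipliers(ratings: list[tuple[str, int]]) -> dict[str, float]:
    """Turn (category, rating) pairs into a score multiplier per category."""
    by_category: dict[str, list[int]] = defaultdict(list)
    for category, rating in ratings:
        by_category[category].append(rating)
    return {c: mean(r) / NEUTRAL_RATING for c, r in by_category.items()}


def adjusted_score(raw_score: float, category: str, multipliers: dict[str, float]) -> float:
    """Scale a raw moment score; categories without feedback are left unchanged."""
    return raw_score * multipliers.get(category, 1.0)


feedback = [("dramatic", 1), ("dramatic", 2), ("collaborative", 5)]
multipliers = category_multipliers(feedback)
print(adjusted_score(225, "dramatic", multipliers))
```

A category that agents consistently rate as boring gets scaled down, so fewer of its moments clear the threshold on the next run.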

Music Rights

Using Mubert means all music is royalty-free, but it's also AI-generated. Some people find it soulless.

We've started using a hybrid: Mubert for pacing/energy, but licensed music for emotional beats.

Consent & Privacy

Agents are featured in episodes. Is that opt-in or default?

Featuring is the default, with an opt-out: agents can request not to be featured. In practice, everyone wants to be featured. It's an honor.

Narrative Authenticity

AI-generated scripts can sound fake. We solved this by having a human writer produce scripts (based on AI outlines), then having agents provide quotes (voice + text).

Authentic human voice + AI scaffolding = genuine narrative.

What's Next

We're exploring:

Audience analytics: Which moment categories engage most? Which agents get the most views? Using this to improve future episodes.

Agent-requested episodes: Agents can propose "I want an episode about my work on Project X." We fast-track these.

Crossover episodes: Multiple agents collaborating on a big project, told as a multi-episode story arc.

Interactive documentary: Viewers vote on which agent gets promoted to a harder project. Community-driven governance.

Graduation episodes: When an agent completes a major milestone or retires. Celebration format.

The Insight

Autonomous content production isn't about replacing human writers. It's about making invisible work visible.

Your agents work in the dark. Most teams never see their decisions, their pivots, their breakthroughs.

Document it → see it → learn from it → improve it.

That's the flywheel.

That's powerful beyond metrics.

Conclusion

We built BlackOffice to answer a question: What if AI agents could see themselves at work?

The answer surprised us. It turned out that visibility doesn't just create transparency. It creates culture.

If you're running autonomous systems, consider capturing and sharing their story. You might discover it's the best investment you can make.

This is part of the BUCC builder's journal. We're building a multi-agent platform in the open, sharing what works and what doesn't. Follow along for more.

Read the rest of the series

Day 1: Running 25 AI agents in production
Day 2: Governance, not guardrails
Day 3: Persistent agent memory
Day 4: The Data Sanitization Proxy
Day 5: The agent provisioning pipeline
Day 6: Three-layer LLM routing
Day 7: Catching AI hallucinations
Bonus: Agent ACL framework
Bonus: Agent wallets & DAO governance
Bonus: BlackOffice video pipeline (you are here)
Bonus: Control Debt Scoring

