How We Built an AI Dungeon Master That Plans Behind Your Back
Inside RoleCall's DM system — a story director that plants seeds, runs a shadow world, and never spoils the narrator.
Most AI chat apps have one model doing everything: understanding context, planning the story, writing the prose. It's like asking a single person to be the author, the director, and the actor simultaneously.
At RoleCall, we split those jobs up. The result is a system we call the DM — a dedicated AI story director that plans narratives, manages a hidden world state, and feeds filtered guidance to a separate narrator model. The narrator writes beautiful prose. The DM makes sure that prose is going somewhere.
Here's how it works.
The architecture: two AIs, one story
RoleCall's storytelling runs on a dual-stream architecture:
The DM is the story director. It analyzes each turn, maintains long-term story arcs, plants narrative seeds, and generates filtered directives for the narrator. It plans ahead — it knows where the story could go, but it never forces a path.
The Narrator writes the actual prose. It receives the DM's directive plus curated context, and its job is to write compelling, immersive text. Crucially, the narrator never sees the DM's full plan. It gets hints, not spoilers.
This separation is the core insight. A model that knows the villain's secret identity will accidentally reveal it. A model that only knows "build tension around this character's trustworthiness" will create genuine suspense.
The per-turn pipeline
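Every time a player sends a message, it triggers a multi-stage processing pipeline: load context, compute momentum, build the prompt and call the model, then pass the result through the directive firewall.
Let me break down each stage.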
1. Load context
The DM's entire planning state — the DMContext — is loaded from storage. This includes active story arcs, planted seeds, the shadow world, core memories, pacing state, and the previous turn's directive (for self-audit). If no context exists, a fresh one is created.
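To make the load-or-create step concrete, here is a minimal sketch. The field names on DMContext are guesses based on the list above, and ContextStore, freshContext, and loadOrCreateContext are hypothetical names, not RoleCall's actual API. A Map stands in for whatever storage backend is really used.

```typescript
// Hypothetical shape of the DM's planning state; field names are assumptions.
interface DMContext {
  arcs: { name: string; lastProgressTurn: number }[];
  seeds: { description: string; plantedTurn: number }[];
  shadowWorld: { hiddenEvents: string[] };
  coreMemories: string[];
  pacing: { turn: number };
  previousDirective: object | null; // kept for the next turn's self-audit
}

// Stand-in for real storage, keyed by a story identifier.
type ContextStore = Map<string, DMContext>;

function freshContext(): DMContext {
  return {
    arcs: [],
    seeds: [],
    shadowWorld: { hiddenEvents: [] },
    coreMemories: [],
    pacing: { turn: 0 },
    previousDirective: null,
  };
}

// Return the stored context, or create and persist an empty one.
function loadOrCreateContext(store: ContextStore, storyId: string): DMContext {
  const existing = store.get(storyId);
  if (existing) return existing;
  const created = freshContext();
  store.set(storyId, created);
  return created;
}
```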
2. Compute momentum
Before the DM model even runs, a deterministic pre-computation layer called the momentum engine analyzes the current story state. This catches things that would otherwise stall:
- Stale arcs: plot threads with no progression for horizon × 2 turns
- Forced actions: arcs exceeding horizon × 3 turns that must be resolved this turn
- Ripe seeds: foreshadowing that's been reinforced enough and is past its payoff horizon
- Shadow opportunities: hidden world events ready to become visible
- Planting pressure: nudges the DM to plant new seeds if none have been created recently
The momentum report gets injected directly into the DM's prompt. Forced actions aren't suggestions — the DM must address them.
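The arc checks and planting pressure can be sketched as a pure function. This is an illustration under assumptions, not the real momentum engine: it treats an arc's horizon as a plain turn count (horizonTurns), measures staleness from the last turn the arc progressed, and uses a made-up plantingGapTurns threshold of 10. All type and function names are hypothetical.

```typescript
// Hypothetical types; horizonTurns assumes each horizon maps to a turn count.
interface Arc {
  name: string;
  horizonTurns: number;
  lastProgressTurn: number;
}

interface MomentumReport {
  staleArcs: string[];
  forcedActions: string[];
  plantingPressure: boolean;
}

function computeMomentum(
  arcs: Arc[],
  currentTurn: number,
  lastSeedPlantedTurn: number,
  plantingGapTurns = 10, // assumed threshold, not from the article
): MomentumReport {
  const report: MomentumReport = {
    staleArcs: [],
    forcedActions: [],
    plantingPressure: false,
  };
  for (const arc of arcs) {
    const idleTurns = currentTurn - arc.lastProgressTurn;
    if (idleTurns > arc.horizonTurns * 3) {
      report.forcedActions.push(arc.name); // must be resolved this turn
    } else if (idleTurns >= arc.horizonTurns * 2) {
      report.staleArcs.push(arc.name); // a nudge, not a mandate
    }
  }
  report.plantingPressure =
    currentTurn - lastSeedPlantedTurn >= plantingGapTurns;
  return report;
}
```

Because the function is deterministic, the same story state always produces the same report, which makes stalls easy to reproduce and test without calling a model.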
3. Build prompt & call the model
The system prompt is assembled from the story context, the momentum report, a compendium snapshot (read-only character data), and the self-audit section (last turn's directive vs. what the narrator actually wrote).
The DM model runs with tool calling enabled — up to 2 rounds. It can invoke ~50 specialized tools for managing arcs, planting seeds, tracking relationships, running combat, advancing the calendar, and more.
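A capped tool-calling loop might look like the sketch below. The Model and ToolRegistry types are hypothetical stand-ins (a real model client would be asynchronous and provider-specific); only the two-round cap comes from the description above.

```typescript
// Hypothetical shapes: a reply carries optional tool calls plus text.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}
interface ModelReply {
  toolCalls: ToolCall[];
  text: string;
}
// Synchronous for brevity; a real client would return a Promise.
type Model = (prompt: string, toolResults: string[]) => ModelReply;
type ToolRegistry = Record<string, (args: Record<string, unknown>) => string>;

const MAX_TOOL_ROUNDS = 2;

function runDM(model: Model, tools: ToolRegistry, prompt: string): string {
  const toolResults: string[] = [];
  let reply = model(prompt, toolResults);
  // Execute requested tools, then let the model respond to the results.
  for (
    let round = 0;
    round < MAX_TOOL_ROUNDS && reply.toolCalls.length > 0;
    round++
  ) {
    for (const call of reply.toolCalls) {
      const tool = tools[call.name];
      toolResults.push(tool ? tool(call.args) : `unknown tool: ${call.name}`);
    }
    reply = model(prompt, toolResults);
  }
  // After the cap, any further tool requests are ignored.
  return reply.text;
}
```

The cap keeps per-turn latency bounded even when the model keeps asking for more tools.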
4. The directive firewall
This is where the magic happens. Before the narrator ever sees the DM's raw response, it passes through a three-stage pipeline that first enriches it and then sanitizes it:
Stage 1: Enrich from momentum. If the momentum engine flagged forced actions or ripe seeds, they get injected into the directive as arc directions and callbacks. This ensures the narrator acts on stalled story threads even if the DM model forgot to address them.
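Here is one way Stage 1 could work for forced actions, as a sketch. The Directive shape is trimmed to the one field that matters here, the injected instruction text and the "critical" urgency label are invented for illustration, and skipping arcs the DM already addressed is an assumption.

```typescript
interface ArcDirection {
  arcName: string;
  instruction: string;
  urgency: string;
}
// Trimmed-down directive; the real one carries more fields.
interface Directive {
  arcDirections: ArcDirection[];
}

function enrichFromMomentum(
  directive: Directive,
  forcedArcs: string[],
): Directive {
  // Arcs the DM already gave a direction for are left alone (assumption).
  const covered = new Set(directive.arcDirections.map((d) => d.arcName));
  const injected: ArcDirection[] = forcedArcs
    .filter((name) => !covered.has(name))
    .map((name) => ({
      arcName: name,
      instruction: "Advance or resolve this arc in the current scene",
      urgency: "critical", // assumed label
    }));
  // Return a new object rather than mutating the DM's output.
  return {
    ...directive,
    arcDirections: [...directive.arcDirections, ...injected],
  };
}
```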
Stage 2: Sanitize for player character. The DM must never puppet the player's character. This stage strips interiority claims — any directive telling the narrator how the player thinks, feels, notices, or remembers. It blocks a specific list of verbs:
think, feel, notice, sense, perceive, intuit, remember, recall, realize, understand, recognize, believe, suspect, doubt, wonder, want, hope, fear, trust, decide, choose, reflect, process
Staging verbs are allowed ("Merlin finds", "Place Merlin at") — only inner experience is blocked.
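A simple version of this filter can be sketched with a regular expression. This is an assumption about the approach, not the production code: it only catches the player's name followed directly by a blocked verb in its base or third-person form, so past tense and irregular forms ("felt", "thought") would slip through. The function names are hypothetical.

```typescript
const BLOCKED_VERBS = [
  "think", "feel", "notice", "sense", "perceive", "intuit", "remember",
  "recall", "realize", "understand", "recognize", "believe", "suspect",
  "doubt", "wonder", "want", "hope", "fear", "trust", "decide", "choose",
  "reflect", "process",
];

// Escape regex metacharacters so any player name is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True when the instruction tells the narrator what the player experiences.
function claimsInteriority(instruction: string, playerName: string): boolean {
  const verbs = BLOCKED_VERBS.join("|");
  const pattern = new RegExp(
    `\\b${escapeRegExp(playerName)}\\s+(?:${verbs})(?:s|es)?\\b`,
    "i",
  );
  return pattern.test(instruction);
}

// Keep staging instructions, drop ones that puppet the player's mind.
function sanitizeForPlayer(
  instructions: string[],
  playerName: string,
): string[] {
  return instructions.filter((line) => !claimsInteriority(line, playerName));
}
```

Note that only the player's character is protected: "Valen trusts no one" passes, because NPC interiority is the DM's to control.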
Stage 3: Strip already-delivered callbacks. If the narrator already wrote something that matches a callback, don't ask for it again. The system tokenizes both the callback and the previous prose, computes overlap, and strips callbacks with ≥60% token intersection. This prevents the frustrating loop where the DM keeps prescribing what was already said.
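The overlap check can be sketched in a few lines. The 60% threshold comes from the description above; the tokenizer (lowercased alphanumeric runs) and the choice to measure overlap as a fraction of the callback's tokens are assumptions, and the function names are hypothetical.

```typescript
// Lowercase the text and split it into a set of alphanumeric tokens.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Fraction of the callback's tokens that also appear in the prose.
function overlapRatio(callback: string, prose: string): number {
  const callbackTokens = tokenize(callback);
  if (callbackTokens.size === 0) return 0;
  const proseTokens = tokenize(prose);
  let shared = 0;
  callbackTokens.forEach((token) => {
    if (proseTokens.has(token)) shared++;
  });
  return shared / callbackTokens.size;
}

// Drop callbacks the narrator has, by this measure, already delivered.
function stripDelivered(
  callbacks: string[],
  previousProse: string,
  threshold = 0.6,
): string[] {
  return callbacks.filter(
    (callback) => overlapRatio(callback, previousProse) < threshold,
  );
}
```

Token overlap is crude (it ignores word order and synonyms), but it is cheap and deterministic, which suits a filter that runs on every turn.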
What the narrator actually sees
After the firewall, the narrator receives a NarratorDirective — a carefully filtered set of guidance:
{
  // Tone
  mood: "ominous",
  pacing: "slow",
  // What to weave in (no spoilers)
  include: {
    characters: ["Valen", "The Merchant"],
    themes: ["trust", "hidden agendas"],
    sensoryDetails: ["flickering candlelight", "distant thunder"],
    foreshadowing: ["the locked basement door"]
  },
  // What to avoid
  avoid: {
    revelations: ["Valen's true identity"],
    characters: ["The Shadow Council"]
  },
  // Scene-level guidance
  arcDirections: [{
    arcName: "The Betrayal",
    instruction: "Valen hints at divided loyalty",
    urgency: "moderate"
  }],
  // Seed callbacks
  callbacks: [{
    reference: "the locked room from chapter 2",
    instruction: "Have the merchant mention hearing noises"
  }],
  // NPC behavior
  dialogueGuidance: [{
    character: "Valen",
    tone: "Evasive but sincere",
    hints: ["avoids eye contact when trust is mentioned"]
  }]
}
The narrator interprets this through prose. It doesn't know why Valen should seem evasive — it just knows to write it that way. The result feels organic rather than scripted.
Story structure: arcs, seeds, and the shadow world
The DM thinks in terms of long-running narrative structures.
Story arcs
Each arc is a plot thread with its own state machine.
Arcs have a horizon (how long they're expected to run — from immediate to epic at 100+ turns), a priority level, planned beats, and a nextBeat field that tracks the DM's prescriptive plan for what should happen next.
The nextBeat was a critical fix. Early versions of the DM were journaling — recording what happened after the narrator wrote it. Adding nextBeat made the DM prescriptive: it plans ahead, then compares intent to delivery on the next turn.
Planted seeds
Seeds are the DM's foreshadowing system. A throwaway mention of a locked door, an NPC who seems nervous, a weather change — each is planted with a planned payoff horizon:
- Immediate: 1-5 turns
- Short: 5-15 turns
- Medium: 15-40 turns
- Long: 40-100 turns
- Epic: 100+ turns
Seeds get reinforced — subtle reminders that keep the detail alive without revealing its purpose. The momentum engine tracks reinforcement count alongside turn age to decide when a seed is ripe for payoff.
When a seed finally blooms, the player thinks: "Wait, that was set up ages ago!" It feels like real continuity — because it is.
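As a rough sketch of the ripeness check: a seed is treated as ripe once it has outlived its payoff window and has been reinforced enough. The window ceilings come from the horizon list above; the minReinforcements default of 2 and all names are assumptions.

```typescript
type Horizon = "immediate" | "short" | "medium" | "long" | "epic";

// Upper bound of each payoff window in turns. "epic" is open-ended
// (100+), so its lower bound is used instead.
const PAYOFF_TURNS: Record<Horizon, number> = {
  immediate: 5,
  short: 15,
  medium: 40,
  long: 100,
  epic: 100,
};

interface Seed {
  description: string;
  horizon: Horizon;
  plantedTurn: number;
  reinforcements: number;
}

function isRipe(
  seed: Seed,
  currentTurn: number,
  minReinforcements = 2, // assumed threshold
): boolean {
  const age = currentTurn - seed.plantedTurn;
  const pastHorizon = age > PAYOFF_TURNS[seed.horizon];
  return pastHorizon && seed.reinforcements >= minReinforcements;
}
```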
The shadow world
This is my favorite part. The DM maintains an entire hidden layer of story:
- Invisible arcs: Plot threads unfolding off-screen that the player hasn't encountered yet. Each has a surfaceCondition and expectedSurfaceTurn.
- Hidden events: Things happening in the background, such as an army marching, a plague spreading, or a political coup.
- NPC off-screen actions: What characters do when the player isn't watching. The innkeeper might be hiding refugees. The guild leader might be negotiating with enemies.
- Parallel timelines: "Meanwhile, elsewhere..." tracking for simultaneous events.
The player might be having a quiet conversation in a tavern while, in the shadow world, an army is marching toward the city. The DM knows. The narrator doesn't. When the player finally discovers the army, it feels like a real revelation — because it was real all along, just hidden.
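To illustrate how an invisible arc might surface, here is a sketch. The surfaceCondition and expectedSurfaceTurn names come from the list above, but modeling the condition as a predicate over world state, and surfacing when either the condition holds or the expected turn arrives, are assumptions. WorldState and shadowOpportunities are hypothetical.

```typescript
// Hypothetical snapshot of the visible world.
interface WorldState {
  turn: number;
  playerLocation: string;
}

interface InvisibleArc {
  name: string;
  // Assumed to be a predicate; the real field may be free text for the model.
  surfaceCondition: (world: WorldState) => boolean;
  expectedSurfaceTurn: number;
}

// Names of hidden arcs that are ready to become visible.
function shadowOpportunities(
  arcs: InvisibleArc[],
  world: WorldState,
): string[] {
  return arcs
    .filter(
      (arc) =>
        arc.surfaceCondition(world) || world.turn >= arc.expectedSurfaceTurn,
    )
    .map((arc) => arc.name);
}
```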
The DM's toolbox
The DM can call ~50 specialized tools during each turn. These aren't just for narrative planning — they manipulate actual game state:
- Story tools: create_arc, update_arc, plant_seed, batch_update_arcs
- World tools: Locations, calendar progression, weather, time-of-day
- Character tools: Relationships, NPC schedules, secrets, dialogue history
- Game tools: Combat encounters, items, quests, status effects, inventory
- Shadow tools: Invisible arcs, hidden events, NPC off-screen actions
Each tool returns structured results that flow to both the DM's context (for planning) and the client-side game state (for UI updates like combat trackers, relationship meters, and inventory).
interface DMToolResult {
  success: boolean;
  message: string;
  directives: ImmersionDirective[]; // → game state stores
  contextUpdates: Partial<DMContext>; // → DM memory
  pendingInteraction?: unknown; // → UI prompts
}
When the DM decides to advance the calendar by a week, the tool handles all cascading effects — NPC schedules shift, seasonal changes apply, pending events trigger.
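A toy calendar tool shows how one result can feed both destinations. Everything beyond the DMToolResult shape is an assumption made for illustration: the simplified ImmersionDirective and DMContext stand-ins, the 90-day seasons, and the advanceCalendar name. The real tool also handles NPC schedules and pending events, which are omitted here.

```typescript
// Simplified stand-ins for types the article does not define.
interface ImmersionDirective {
  type: string;
  payload: Record<string, unknown>;
}
interface DMContext {
  day: number;
  season: string;
}
interface DMToolResult {
  success: boolean;
  message: string;
  directives: ImmersionDirective[]; // → game state stores
  contextUpdates: Partial<DMContext>; // → DM memory
  pendingInteraction?: unknown; // → UI prompts
}

const SEASONS = ["spring", "summer", "autumn", "winter"];
const DAYS_PER_SEASON = 90; // assumed calendar

function advanceCalendar(ctx: DMContext, days: number): DMToolResult {
  if (!Number.isInteger(days) || days <= 0) {
    return {
      success: false,
      message: "days must be a positive integer",
      directives: [],
      contextUpdates: {},
    };
  }
  const day = ctx.day + days;
  const season =
    SEASONS[Math.floor(day / DAYS_PER_SEASON) % SEASONS.length] ?? ctx.season;
  const directives: ImmersionDirective[] = [
    { type: "calendar", payload: { day, season } },
  ];
  // Cascading effect: a season boundary triggers its own UI update.
  if (season !== ctx.season) {
    directives.push({ type: "season_change", payload: { season } });
  }
  return {
    success: true,
    message: `Advanced ${days} day(s) to day ${day} (${season})`,
    directives,
    contextUpdates: { day, season },
  };
}
```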
DM personalities
Not every story needs the same director. We built a personality system where each DM personality defines its storytelling philosophy:
- The Weaver — everything connects. Every detail matters, every character relates to every other.
- The Architect — elaborate structures with precise payoffs. Builds intricate plot machines.
- The Oracle — deals in prophecy, fate, and dramatic irony. Loves giving the player clues they don't understand yet.
- The Shadow — tension, paranoia, unreliable narrators. What you don't see is more important than what you do.
- The Jester — comedic timing, absurdist escalation, genre-savvy characters.
Each personality configures improvisation style (rigid/adaptive/fluid), player agency level (guided/collaborative/freeform), information control (tight/moderate/loose), and critically — directive safety rules defining what must never appear in narrator guidance.
Players choose their DM personality, and it fundamentally changes how stories unfold with the same characters and setting.
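A personality could be expressed as plain configuration, roughly like this. The three option sets come from the description above; the DMPersonality shape, the specific values chosen for The Shadow, the banned phrases, and the findSafetyViolations helper are all invented for illustration.

```typescript
type ImprovisationStyle = "rigid" | "adaptive" | "fluid";
type PlayerAgency = "guided" | "collaborative" | "freeform";
type InformationControl = "tight" | "moderate" | "loose";

// Hypothetical config shape for one DM personality.
interface DMPersonality {
  name: string;
  improvisation: ImprovisationStyle;
  agency: PlayerAgency;
  informationControl: InformationControl;
  neverInDirective: string[]; // directive safety rules
}

// Illustrative values only; not the real settings for this personality.
const theShadow: DMPersonality = {
  name: "The Shadow",
  improvisation: "adaptive",
  agency: "collaborative",
  informationControl: "tight",
  neverInDirective: ["true identity", "the culprit is"],
};

// Return every banned phrase that appears in a piece of narrator guidance.
function findSafetyViolations(
  directiveText: string,
  personality: DMPersonality,
): string[] {
  const lower = directiveText.toLowerCase();
  return personality.neverInDirective.filter(
    (phrase) => lower.indexOf(phrase.toLowerCase()) !== -1,
  );
}
```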
The self-audit loop
The DM watches itself. At the start of each turn, the previous turn's directive is surfaced alongside what the narrator actually wrote, and the DM compares intent to delivery:
- Did the narrator pick up on the tension hint?
- Did it correctly foreshadow the upcoming event?
- Did it accidentally reveal something it shouldn't have?
This feedback loop means the DM adapts to the narrator's tendencies over time. If the narrator consistently ignores subtle hints, the DM learns to be more direct. If the narrator over-interprets guidance, the DM learns restraint.
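Since the comparison is done by the DM model itself, the code's job is mostly to put intent and delivery side by side in the prompt. A sketch of that section builder follows; the PreviousTurn shape, the section wording, and the buildSelfAuditSection name are assumptions.

```typescript
// Hypothetical record of the previous turn; field names are assumptions.
interface PreviousTurn {
  directive: {
    mood: string;
    callbacks: string[];
    avoidRevelations: string[];
  };
  narratorProse: string;
}

// Build the self-audit block that gets appended to the DM's system prompt.
function buildSelfAuditSection(prev: PreviousTurn | null): string {
  if (prev === null) return ""; // first turn: nothing to audit
  const { directive, narratorProse } = prev;
  return [
    "SELF-AUDIT (last turn)",
    `Intended mood: ${directive.mood}`,
    `Requested callbacks: ${directive.callbacks.join("; ") || "none"}`,
    `Must not be revealed: ${directive.avoidRevelations.join("; ") || "none"}`,
    "Narrator actually wrote:",
    narratorProse,
    "Compare intent to delivery before planning this turn.",
  ].join("\n");
}
```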
Putting it all together
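At any given moment, the full story state combines the active story arcs, planted seeds, the shadow world, core memories, pacing state, and the previous turn's directive.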
What we learned
Building the DM system taught us that the hard problem in AI storytelling isn't generating good text — modern models are excellent at that. The hard problem is planning.
Planning requires maintaining state across hundreds of turns. It requires information hiding — knowing things without revealing them. It requires balancing multiple competing narrative threads. And it requires the discipline to let some things simmer for a long time before paying off.
No single model call can do all of this. You need architecture. You need persistent state. You need separation of concerns between the planner and the writer. And you need a momentum engine that prevents the whole thing from stalling out.
The DM is the most complex piece of RoleCall's stack — and it's the reason our stories feel different from a typical AI chat. The narrator writes beautifully. But the DM is why those beautiful words are actually going somewhere.