Anatomy of a Roleplay Agent: How RoleCall Orchestrates AI
RoleCall isn't a chatbot with a system prompt. It's an agentic system where multiple AI models coordinate per turn — planning, retrieving, generating, and updating game state.
When people think of AI roleplay, they picture a chatbot: you type a message, a model generates a response. One prompt in, one completion out. That's how most apps work.
RoleCall doesn't work that way. Under the hood, every turn triggers an orchestrated pipeline of multiple AI models running in parallel and sequence — planning the story, selecting knowledge, generating prose, calling tools, and updating game state. It's not a chatbot. It's an agent.
Here's the full anatomy of what happens when you press send.
The turn lifecycle
A single user action — "I examine the artifact" — triggers a cascade that touches three AI layers, a lorebook database, a tool execution engine, and a game state system. The whole turn completes in roughly 8 seconds. Let's walk through what happens in those seconds, layer by layer.
Layer 1: The TurnDM Agent
The TurnDM is the fastest layer — it runs with a 5-second timeout and makes snap decisions about what knowledge the narrator needs for this specific turn.
It receives the player's action, a pool of keyword-matched lorebook entries, and a list of available prose modules. Its job is to select the most relevant subset:
- 2-4 lorebook entries — with insertion positions (before the scene, alongside a character, after the scene)
- 1-2 prose modules — style instructions injected at specific points in the generation (opening, middle, closing)
- Narrative guidance — immediate goals, characters to introduce, plot threads to advance
- Narrator hints — tone adjustments, emotional beats, elements to focus on or avoid
The TurnDM also maintains narrative intentions — a lightweight plan for staged encounters and pending events. These are "intentions, not requirements" — the DM suggests where the story could go, but the narrator and player ultimately decide.
The model chain falls back aggressively: Claude Sonnet → Gemini Flash → OpenRouter. If one model is slow or down, the next picks up within 5 seconds.
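A chain like that can be sketched as a loop that races each model against its timeout. `ModelCall` and `callWithFallback` are illustrative names for this sketch, not RoleCall's actual API:

```typescript
// Hypothetical sketch of a per-model timeout fallback chain.
type ModelCall = (prompt: string) => Promise<string>;

async function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    p,
    new Promise<T>((_, reject) =>
      setTimeout(() => reject(new Error("timeout")), ms)
    ),
  ]);
}

async function callWithFallback(
  models: ModelCall[],
  prompt: string,
  timeoutMs = 5000
): Promise<string> {
  for (const model of models) {
    try {
      // Each model gets its own 5-second budget before we move on.
      return await withTimeout(model(prompt), timeoutMs);
    } catch {
      continue; // slow or errored: fall through to the next model
    }
  }
  throw new Error("All models in the fallback chain failed");
}
```

The important property is that the budget is per model, so a hung primary costs at most one timeout window before the next model takes over.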
Layer 2: The DM Stream
While the TurnDM handles turn-level selection, the DM Stream handles story-level planning. This is the system I wrote about in the DM article — it maintains long-running story arcs, plants narrative seeds, runs the shadow world, and generates filtered directives for the narrator.
The DM Stream runs as a pre-gen agent — a child process of the Narrator Engine that executes before the main generation loop. It has access to 60+ tools for managing story state:
- Arc and chapter management
- Seed planting and payoff tracking
- NPC relationship and off-screen action tracking
- Combat, effects, and resource updates
- Calendar and seasonal progression
- Knowledge journaling and memory management
The tool-calling loop runs up to 2 rounds. In each round, the model can call multiple tools, the engine executes them (reads in parallel, mutations sequentially), and the results feed back into the next round.
The output is a NarratorDirective — filtered story guidance that tells the narrator what to write about without revealing why. Plus an ImmersionDirective that updates the game state (combat, relationships, inventory, etc.).
Layer 3: The Narrator Engine
The Narrator Engine is the core agentic loop. It's modeled on the same pattern as tools like Claude Code's QueryEngine — a class-based loop that runs inference, handles tool calls, and accumulates results across multiple rounds.
The engine runs the narrator model in a loop:
- Call the model with the assembled context — player action, recent history, lorebook entries, DM directive, prose instructions
- Parse the response — extract text content and any tool calls
- If tools were called, execute them with concurrency safety (reads in parallel, mutations sequentially), append results, and loop back
- If no tools, return the accumulated prose
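The four steps above can be sketched as a single loop. The `Model` and `Tool` interfaces here are assumptions for illustration; RoleCall's real engine is class-based and considerably richer:

```typescript
// Minimal sketch of the agentic loop (interfaces are illustrative).
interface ToolCall { name: string; args: unknown }
interface ModelResponse { text: string; toolCalls: ToolCall[] }
interface Tool {
  name: string;
  isConcurrencySafe(): boolean;
  call(args: unknown): Promise<string>;
}
type Model = (context: string) => Promise<ModelResponse>;

async function runNarrator(
  model: Model,
  tools: Map<string, Tool>,
  context: string,
  maxRounds = 4
): Promise<string> {
  let prose = "";
  for (let round = 0; round < maxRounds; round++) {
    const res = await model(context);
    prose += res.text;
    if (res.toolCalls.length === 0) return prose; // done: no tools requested

    // Execute requested tools: reads in parallel, mutations sequentially.
    const calls = res.toolCalls.map(c => ({ tool: tools.get(c.name)!, args: c.args }));
    const reads = calls.filter(c => c.tool.isConcurrencySafe());
    const mutations = calls.filter(c => !c.tool.isConcurrencySafe());
    const results = await Promise.all(reads.map(c => c.tool.call(c.args)));
    for (const c of mutations) results.push(await c.tool.call(c.args));

    // Append tool results so the next inference round can use them.
    context += "\n[tool results]\n" + results.join("\n");
  }
  return prose;
}
```

Note the loop's exit condition: the model itself decides when it is done by simply not requesting any tools.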
The narrator can call tools mid-generation — most notably search_lorebook, which lets it look up facts it needs while writing. This is TV's Phase 2 (during-gen search), and it acts as a safety net for knowledge the pre-gen retrieval missed.
Concurrency safety
Tool execution isn't naive Promise.all. The engine partitions tools by safety:
```typescript
// Concurrent-safe tools (reads) run in parallel
const reads = tools.filter(t => t.isConcurrencySafe());
await Promise.all(reads.map(t => t.call()));

// Exclusive tools (mutations) run sequentially after
const mutations = tools.filter(t => !t.isConcurrencySafe());
for (const t of mutations) await t.call();
```
This matters because some tools read game state while others mutate it. Running a combat tool that reads enemy HP concurrently with a damage tool that changes HP would be a race condition. The partition prevents it.
Context assembly
By the time the narrator starts generating, its context has been assembled from multiple sources, each occupying a different part of the context window:
- World/character info — relatively stable, defines the setting
- Recent history — the last 3-6 turns, providing conversational continuity
- Lorebook entries — selected by TurnDM, positioned at specific insertion points
- DM directive — story guidance injected as system-level instructions
- TV lore — dynamically retrieved entries relevant to this turn
- Prose modules — style instructions that shape how the narrator writes
The total prompt typically lands around 4,000-8,000 tokens. Every piece is there for a reason — nothing is dumped in "just in case."
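One way to picture the assembly step, with hypothetical source names and insertion positions (the real prompt format is more nuanced than a flat join):

```typescript
// Illustrative sketch: assemble the prompt from its sources in a fixed order.
interface LoreEntry {
  text: string;
  position: "before_scene" | "with_character" | "after_scene";
}

interface ContextParts {
  world: string;          // stable setting and character info
  history: string[];      // last 3-6 turns
  lorebook: LoreEntry[];  // selected by TurnDM, with insertion points
  dmDirective?: string;   // filtered story guidance, if the DM produced one
  tvLore: string[];       // dynamically retrieved entries for this turn
  proseModules: string[]; // style instructions
}

function assembleContext(p: ContextParts): string {
  const before = p.lorebook.filter(e => e.position === "before_scene");
  const after = p.lorebook.filter(e => e.position !== "before_scene");
  return [
    p.world,
    ...before.map(e => e.text),
    ...p.history,
    ...after.map(e => e.text),
    ...(p.dmDirective ? [p.dmDirective] : []),
    ...p.tvLore,
    ...p.proseModules,
  ].join("\n\n");
}
```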
Immersion directives: the unified game state
All three layers can produce game state updates. The TurnDM might update relationships. The DM Stream might start a combat encounter. The narrator's tools might modify inventory. These all flow through the same structure — the ImmersionDirective:
```typescript
{
  combat: { inCombat: true, enemies: [...], turnOrder: [...] },
  relationships: [{ npcName: "Valen", affinityDelta: +10, reason: "..." }],
  resources: { hpDelta: -15, reason: "Artifact trap" },
  effects: { apply: [{ name: "Cursed", duration: "3 turns" }] },
  map: [{ type: "discover", locationId: "...", name: "Hidden Chamber" }],
  quests: [{ type: "advance", questId: "...", stage: "artifact_found" }]
}
```
The Narrator Engine merges directives from all sources — array fields concatenate, object fields merge — and returns a single unified directive to the client. The client applies it to the game state stores, updating combat trackers, relationship meters, maps, and inventory in one pass.
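That merge rule (arrays concatenate, objects merge, later sources win on conflicts) might look roughly like this; `mergeDirectives` is an illustrative name, not RoleCall's actual function:

```typescript
// Sketch of directive merging: array fields concatenate, object fields shallow-merge.
type Directive = Record<string, unknown>;

function mergeDirectives(...sources: Directive[]): Directive {
  const out: Directive = {};
  for (const src of sources) {
    for (const [key, value] of Object.entries(src)) {
      const existing = out[key];
      if (Array.isArray(existing) && Array.isArray(value)) {
        out[key] = [...existing, ...value]; // arrays concatenate
      } else if (
        existing !== undefined && value !== null &&
        typeof existing === "object" && typeof value === "object" &&
        existing !== null && !Array.isArray(existing) && !Array.isArray(value)
      ) {
        // objects shallow-merge; later sources win on conflicting keys
        out[key] = {
          ...(existing as Record<string, unknown>),
          ...(value as Record<string, unknown>),
        };
      } else {
        out[key] = value; // scalar or first occurrence: later source wins
      }
    }
  }
  return out;
}
```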
Model fallback and resilience
Every AI call in the pipeline has a fallback chain. If the primary model is slow or errors out, the next model picks up:
- TurnDM: Claude Sonnet → Gemini Flash → OpenRouter (5s timeout per model)
- DM Stream: Nemo utility pool → Claude Sonnet → Gemini Flash → OpenRouter
- Narrator: Configurable per session
Beyond model fallback, the system degrades gracefully at the layer level:
- If lorebook matching fails → continue without lorebook entries
- If TurnDM times out → use keyword-matched entries directly (no smart selection)
- If DM Stream fails → continue without story direction
- If TV fails → narrator generates without retrieved lore
No single failure kills the turn. The narrator can always generate something — the supporting systems make it better, but they're not hard dependencies.
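That soft-dependency pattern reduces to a tiny helper; `tryOr` is a hypothetical name for illustration:

```typescript
// Sketch: each supporting system is optional, so a failure
// degrades to a fallback value instead of killing the turn.
async function tryOr<T>(step: () => Promise<T>, fallback: T): Promise<T> {
  try {
    return await step();
  } catch {
    return fallback; // the turn continues without this system's output
  }
}
```

In this style, a TurnDM timeout would resolve to the raw keyword-matched pool, and a DM Stream failure would resolve to "no directive", so the narrator call downstream never has to care which systems succeeded.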
Post-generation: the quiet work
After the narrator finishes, more happens in parallel:
- TV Post-Gen — the sidecar reviews the response and writes new lorebook entries (remember, update, forget)
- Entity extraction — NPCs, items, and locations mentioned in the response are extracted for the UI
- Action suggestions — a lightweight model generates 3-4 suggested next actions
- Scene detection — the system identifies scene transitions for pacing
- Music detection — ambient music mode changes based on narrative mood
These run concurrently and don't block the response — the user sees the prose immediately while the background work completes.
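A sketch of that fire-and-forget pattern, using `Promise.allSettled` so one failed task can't cancel the rest (the task list and `runPostGen` name are illustrative):

```typescript
// Sketch: post-generation tasks run concurrently; failures are
// logged, not propagated, and nothing here blocks the response.
async function runPostGen(tasks: Array<() => Promise<void>>): Promise<void> {
  // allSettled waits for every task even if some reject.
  const results = await Promise.allSettled(tasks.map(t => t()));
  for (const r of results) {
    if (r.status === "rejected") console.warn("post-gen task failed:", r.reason);
  }
}
```

The caller would invoke this without awaiting it, right after streaming the prose to the client, which is what keeps the background work invisible to the user.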
Why this matters
A single model with a system prompt can generate decent roleplay text. But it can't:
- Plan ahead — it has no persistent state, no story arcs, no planted seeds
- Remember everything — it forgets facts as they fall out of the context window
- Manage game state — combat, inventory, relationships need structured tracking
- Recover from failures — if the model is down, everything is down
The agentic approach solves all of these by splitting responsibilities across specialized systems that coordinate per turn. The DM plans the story. TV manages knowledge. The narrator writes prose. Tools update game state. Each does what it's best at.
The result is an AI that doesn't just respond to what you said — it responds in the context of a planned narrative, with full knowledge of the world, while maintaining consistent game state. It's the difference between a chatbot and a game master.
That's what it means to build a roleplay agent.