Anatomy of a Roleplay Agent: How RoleCall Orchestrates AI
RoleCall isn't a chatbot with a system prompt. It's an agentic system where multiple AI models coordinate per turn — planning, retrieving, generating, and updating game state.
When people think of AI roleplay, they picture a chatbot: you type a message, a model generates a response. One prompt in, one completion out. That's how most apps work.
RoleCall doesn't work that way. Under the hood, every turn triggers an orchestrated pipeline of multiple AI models running in parallel and sequence — planning the story, selecting knowledge, generating prose, calling tools, and updating game state. It's not a chatbot. It's an agent.
Here's the full anatomy of what happens when you press send.
The turn lifecycle
A single user action — "I examine the artifact" — triggers a cascade that touches three AI layers, a lorebook database, a tool execution engine, and a game state system. The whole turn completes in roughly 8 seconds. Let's walk through what happens in those seconds, layer by layer.
Layer 1: The TurnDM Agent
The TurnDM is the fastest layer — it runs with a 5-second timeout and makes snap decisions about what knowledge the narrator needs for this specific turn.
It receives the player's action, a pool of keyword-matched lorebook entries, and a list of available prose modules. Its job is to select the most relevant subset:
- 2-4 lorebook entries — with insertion positions (before the scene, alongside a character, after the scene)
- 1-2 prose modules — style instructions injected at specific points in the generation (opening, middle, closing)
- Narrative guidance — immediate goals, characters to introduce, plot threads to advance
- Narrator hints — tone adjustments, emotional beats, elements to focus on or avoid
The TurnDM also maintains narrative intentions — a lightweight plan for staged encounters and pending events. These are "intentions, not requirements" — the DM suggests where the story could go, but the narrator and player ultimately decide.
The model chain falls back aggressively: Claude Sonnet → Gemini Flash → OpenRouter. If one model is slow or down, the next picks up within 5 seconds.
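A chain like that can be sketched as a loop that races each model against its timeout. `ModelCall` and `callWithFallback` are illustrative names for this sketch, not RoleCall's actual API:

```typescript
// Hypothetical sketch of a per-model timeout fallback chain.
type ModelCall = (prompt: string) => Promise<string>;

async function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    p,
    new Promise<T>((_, reject) =>
      setTimeout(() => reject(new Error("timeout")), ms)
    ),
  ]);
}

async function callWithFallback(
  models: ModelCall[],
  prompt: string,
  timeoutMs = 5000
): Promise<string> {
  for (const model of models) {
    try {
      // Each model gets its own 5-second budget before we move on.
      return await withTimeout(model(prompt), timeoutMs);
    } catch {
      continue; // slow or errored: fall through to the next model
    }
  }
  throw new Error("All models in the fallback chain failed");
}
```

The important property is that the budget is per model, so a hung primary costs at most one timeout window before the next model takes over.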
Layer 2: The DM Stream
While the TurnDM handles turn-level selection, the DM Stream handles story-level planning. This is the system I wrote about in the DM article — it maintains long-running story arcs, plants narrative seeds, runs the shadow world, and generates filtered directives for the narrator.
The DM Stream runs as a pre-gen agent — a child process of the Narrator Engine that executes before the main generation loop. It has access to 60+ tools for managing story state:
- Arc and chapter management
- Seed planting and payoff tracking
- NPC relationship and off-screen action tracking
- Combat, effects, and resource updates
- Calendar and seasonal progression
- Knowledge journaling and memory management
The tool-calling loop runs up to 2 rounds. In each round, the model can call multiple tools, the engine executes them (reads in parallel, mutations sequentially), and the results feed back into the next round.
The output is a NarratorDirective — filtered story guidance that tells the narrator what to write about without revealing why. Plus an ImmersionDirective that updates the game state (combat, relationships, inventory, etc.).
Layer 3: The Narrator Engine
The Narrator Engine is the core agentic loop. It's modeled on the same pattern as tools like Claude Code's QueryEngine — a class-based loop that runs inference, handles tool calls, and accumulates results across multiple rounds.
The engine runs the narrator model in a loop:
- Call the model with the assembled context — player action, recent history, lorebook entries, DM directive, prose instructions
- Parse the response — extract text content and any tool calls
- If tools were called, execute them with concurrency safety (reads in parallel, mutations sequentially), append results, and loop back
- If no tools, return the accumulated prose
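The four steps above can be sketched as a single loop. The `Model` and `Tool` interfaces here are assumptions for illustration; RoleCall's real engine is class-based and considerably richer:

```typescript
// Minimal sketch of the agentic loop (interfaces are illustrative).
interface ToolCall { name: string; args: unknown }
interface ModelResponse { text: string; toolCalls: ToolCall[] }
interface Tool {
  name: string;
  isConcurrencySafe(): boolean;
  call(args: unknown): Promise<string>;
}
type Model = (context: string) => Promise<ModelResponse>;

async function runNarrator(
  model: Model,
  tools: Map<string, Tool>,
  context: string,
  maxRounds = 4
): Promise<string> {
  let prose = "";
  for (let round = 0; round < maxRounds; round++) {
    const res = await model(context);
    prose += res.text;
    if (res.toolCalls.length === 0) return prose; // done: no tools requested

    // Execute requested tools: reads in parallel, mutations sequentially.
    const calls = res.toolCalls.map(c => ({ tool: tools.get(c.name)!, args: c.args }));
    const reads = calls.filter(c => c.tool.isConcurrencySafe());
    const mutations = calls.filter(c => !c.tool.isConcurrencySafe());
    const results = await Promise.all(reads.map(c => c.tool.call(c.args)));
    for (const c of mutations) results.push(await c.tool.call(c.args));

    // Append tool results so the next inference round can use them.
    context += "\n[tool results]\n" + results.join("\n");
  }
  return prose;
}
```

Note the loop's exit condition: the model itself decides when it is done by simply not requesting any tools.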
The narrator can call tools mid-generation — most notably search_lorebook, which lets it look up facts it needs while writing. This is TV's Phase 2 (during-gen search), and it acts as a safety net for knowledge the pre-gen retrieval missed.
Concurrency safety
Tool execution isn't naive Promise.all. The engine partitions tools by safety:
```typescript
// Concurrent-safe tools (reads) run in parallel
const reads = tools.filter(t => t.isConcurrencySafe());
await Promise.all(reads.map(t => t.call()));

// Exclusive tools (mutations) run sequentially after
const mutations = tools.filter(t => !t.isConcurrencySafe());
for (const t of mutations) await t.call();
```
This matters because some tools read game state while others mutate it. Running a combat tool that reads enemy HP concurrently with a damage tool that changes HP would be a race condition. The partition prevents it.
Context assembly
By the time the narrator starts generating, its context has been assembled from multiple sources, each occupying a different part of the context window:
- World/character info — relatively stable, defines the setting
- Recent history — the last 3-6 turns, providing conversational continuity
- Lorebook entries — selected by TurnDM, positioned at specific insertion points
- DM directive — story guidance injected as system-level instructions
- TV lore — dynamically retrieved entries relevant to this turn
- Prose modules — style instructions that shape how the narrator writes
The total prompt typically lands around 4,000-8,000 tokens. Every piece is there for a reason — nothing is dumped in "just in case."
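One way to picture the assembly step, with hypothetical source names and insertion positions (the real prompt format is more nuanced than a flat join):

```typescript
// Illustrative sketch: assemble the prompt from its sources in a fixed order.
interface LoreEntry {
  text: string;
  position: "before_scene" | "with_character" | "after_scene";
}

interface ContextParts {
  world: string;          // stable setting and character info
  history: string[];      // last 3-6 turns
  lorebook: LoreEntry[];  // selected by TurnDM, with insertion points
  dmDirective?: string;   // filtered story guidance, if the DM produced one
  tvLore: string[];       // dynamically retrieved entries for this turn
  proseModules: string[]; // style instructions
}

function assembleContext(p: ContextParts): string {
  const before = p.lorebook.filter(e => e.position === "before_scene");
  const after = p.lorebook.filter(e => e.position !== "before_scene");
  return [
    p.world,
    ...before.map(e => e.text),
    ...p.history,
    ...after.map(e => e.text),
    ...(p.dmDirective ? [p.dmDirective] : []),
    ...p.tvLore,
    ...p.proseModules,
  ].join("\n\n");
}
```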
Immersion directives: the unified game state
All three layers can produce game state updates. The TurnDM might update relationships. The DM Stream might start a combat encounter. The narrator's tools might modify inventory. These all flow through the same structure — the ImmersionDirective:
```typescript
{
  combat: { inCombat: true, enemies: [...], turnOrder: [...] },
  relationships: [{ npcName: "Valen", affinityDelta: +10, reason: "..." }],
  resources: { hpDelta: -15, reason: "Artifact trap" },
  effects: { apply: [{ name: "Cursed", duration: "3 turns" }] },
  map: [{ type: "discover", locationId: "...", name: "Hidden Chamber" }],
  quests: [{ type: "advance", questId: "...", stage: "artifact_found" }]
}
```
The Narrator Engine merges directives from all sources — array fields concatenate, object fields merge — and returns a single unified directive to the client. The client applies it to the game state stores, updating combat trackers, relationship meters, maps, and inventory in one pass.
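That merge rule (arrays concatenate, objects merge, later sources win on conflicts) might look roughly like this; `mergeDirectives` is an illustrative name, not RoleCall's actual function:

```typescript
// Sketch of directive merging: array fields concatenate, object fields shallow-merge.
type Directive = Record<string, unknown>;

function mergeDirectives(...sources: Directive[]): Directive {
  const out: Directive = {};
  for (const src of sources) {
    for (const [key, value] of Object.entries(src)) {
      const existing = out[key];
      if (Array.isArray(existing) && Array.isArray(value)) {
        out[key] = [...existing, ...value]; // arrays concatenate
      } else if (
        existing !== undefined && value !== null &&
        typeof existing === "object" && typeof value === "object" &&
        existing !== null && !Array.isArray(existing) && !Array.isArray(value)
      ) {
        // objects shallow-merge; later sources win on conflicting keys
        out[key] = {
          ...(existing as Record<string, unknown>),
          ...(value as Record<string, unknown>),
        };
      } else {
        out[key] = value; // scalar or first occurrence: later source wins
      }
    }
  }
  return out;
}
```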
Model fallback and resilience
Every AI call in the pipeline has a fallback chain. If the primary model is slow or errors out, the next model picks up:
- TurnDM: Claude Sonnet → Gemini Flash → OpenRouter (5s timeout per model)
- DM Stream: Nemo utility pool → Claude Sonnet → Gemini Flash → OpenRouter
- Narrator: Configurable per session
Beyond model fallback, the system degrades gracefully at the layer level:
- If lorebook matching fails → continue without lorebook entries
- If TurnDM times out → use keyword-matched entries directly (no smart selection)
- If DM Stream fails → continue without story direction
- If TV fails → narrator generates without retrieved lore
No single failure kills the turn. The narrator can always generate something — the supporting systems make it better, but they're not hard dependencies.
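That soft-dependency pattern reduces to a tiny helper; `tryOr` is a hypothetical name for illustration:

```typescript
// Sketch: each supporting system is optional, so a failure
// degrades to a fallback value instead of killing the turn.
async function tryOr<T>(step: () => Promise<T>, fallback: T): Promise<T> {
  try {
    return await step();
  } catch {
    return fallback; // the turn continues without this system's output
  }
}
```

In this style, a TurnDM timeout would resolve to the raw keyword-matched pool, and a DM Stream failure would resolve to "no directive", so the narrator call downstream never has to care which systems succeeded.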
Post-generation: the quiet work
After the narrator finishes, more happens in parallel:
- TV Post-Gen — the sidecar reviews the response and writes new lorebook entries (remember, update, forget)
- Entity extraction — NPCs, items, and locations mentioned in the response are extracted for the UI
- Action suggestions — a lightweight model generates 3-4 suggested next actions
- Scene detection — the system identifies scene transitions for pacing
- Music detection — ambient music mode changes based on narrative mood
These run concurrently and don't block the response — the user sees the prose immediately while the background work completes.
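A sketch of that fire-and-forget pattern, using `Promise.allSettled` so one failed task can't cancel the rest (the task list and `runPostGen` name are illustrative):

```typescript
// Sketch: post-generation tasks run concurrently; failures are
// logged, not propagated, and nothing here blocks the response.
async function runPostGen(tasks: Array<() => Promise<void>>): Promise<void> {
  // allSettled waits for every task even if some reject.
  const results = await Promise.allSettled(tasks.map(t => t()));
  for (const r of results) {
    if (r.status === "rejected") console.warn("post-gen task failed:", r.reason);
  }
}
```

The caller would invoke this without awaiting it, right after streaming the prose to the client, which is what keeps the background work invisible to the user.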
Why this matters
A single model with a system prompt can generate decent roleplay text. But it can't:
- Plan ahead — it has no persistent state, no story arcs, no planted seeds
- Remember everything — it forgets facts as they fall out of the context window
- Manage game state — combat, inventory, relationships need structured tracking
- Recover from failures — if the model is down, everything is down
The agentic approach solves all of these by splitting responsibilities across specialized systems that coordinate per turn. The DM plans the story. TV manages knowledge. The narrator writes prose. Tools update game state. Each does what it's best at.
The result is an AI that doesn't just respond to what you said — it responds in the context of a planned narrative, with full knowledge of the world, while maintaining consistent game state. It's the difference between a chatbot and a game master.
That's what it means to build a roleplay agent.