Multi-Agent Orchestration for AI Chat Systems
How to build a chat system where multiple specialized AI agents handle different types of requests instead of one agent trying to do everything.
I was building a chat interface where users do very different things in the same conversation. Sometimes they ask one type of question, sometimes a completely different type, sometimes they want to trigger actions.
One agent can't handle all of this. An agent optimized for Task A needs a different system prompt and toolset than one built for Task B. Give one agent access to everything and it gets confused about which tool to use.
So I split it into three specialized agents behind an orchestrator.
Three agents, one router
The orchestrator is a Node.js server between the frontend and the agents. Looks at the request, routes it.
Agent A handles general operations — Claude with MCP tools to interact with external services. "Create a new record," "update this thing," that kind of request.
Agent B handles data queries. Instead of Claude generating queries through tool calls (burns tokens, slow back-and-forth), I have a separate service that converts natural language to structured queries and runs them directly.
Agent C handles file processing. User uploads a file, wants structured data out. This agent runs the parsing, extraction, and validation pipeline.
Routing is explicit, not AI-based
I tried having Claude decide which agent to call. Bad idea. Extra LLM call on every message (latency + cost) and it sometimes routed wrong.
Now the frontend sends a mode flag with each message; the UI context determines which flag to send based on what the user is doing. If no flag is set, the request hits the default agent.
Less cool than "AI figures out the right agent." Way more reliable, way faster. The user already knows what they're doing based on which screen they're on.
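The routing logic ends up being almost embarrassingly simple. Here's a minimal sketch of what it could look like; the mode names, ports, and URLs are illustrative, not the actual values from the system:

```typescript
// Explicit mode-based routing: no LLM call, just a lookup.
// Mode values and agent URLs below are hypothetical stand-ins.
type Mode = "actions" | "query" | "files";

const AGENT_URLS: Record<Mode, string> = {
  actions: "http://agent-a:3001/chat", // general ops via MCP tools
  query: "http://agent-b:3002/chat",   // NL-to-structured-query service
  files: "http://agent-c:3003/chat",   // file processing pipeline
};

const DEFAULT_MODE: Mode = "actions";

function resolveAgentUrl(mode?: string): string {
  // Frontend sends a mode flag; anything unknown or missing
  // falls through to the default agent.
  if (mode && mode in AGENT_URLS) {
    return AGENT_URLS[mode as Mode];
  }
  return AGENT_URLS[DEFAULT_MODE];
}
```

A dictionary lookup instead of an LLM call: zero added latency, zero added cost, and it never routes wrong.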
Each agent is a separate process
Different agents need different things. Agent B needs data source credentials. Agent C needs processing libraries. Agent A needs MCP tool configs. Keeping them separate means each one only has access to what it needs, and they scale independently. If one agent is the bottleneck, scale that service, leave the others alone.
The orchestrator talks to each agent over HTTP. All three return SSE in the same format, so the frontend doesn't know or care which agent handled the request.
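The "same format" part is the whole trick. One way to enforce it is a shared helper that serializes every event the same way regardless of which agent produced it; this is a sketch with made-up event names and payload fields, not the real wire format:

```typescript
// Hypothetical shared SSE event shape all three agents emit.
type SseEvent =
  | { type: "token"; text: string }
  | { type: "tool_call"; name: string; args: unknown }
  | { type: "done" };

function formatSse(event: SseEvent): string {
  // SSE wire format: an `event:` line, a `data:` line, then a blank line.
  return `event: ${event.type}\ndata: ${JSON.stringify(event)}\n\n`;
}
```

Because every agent goes through the same serializer, the frontend parses one event schema and stays completely ignorant of the routing.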
MCP client pooling
Agent A uses MCP to give Claude access to external tools. Each MCP client spawns a child process (stdio transport). Creating a new one per request was adding ~2 seconds of overhead. Painful.
Connection pool fixed it. Clients keyed by chat ID + tenant ID, reused across messages in the same conversation, cleaned up after 5 minutes idle. Per-request overhead dropped to ~100ms.
The tricky part: tenant isolation. Each tenant's MCP tools carry different auth context, so you can't share clients across tenants. The pool key includes the tenant ID to prevent cross-contamination.
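A minimal sketch of that pool, assuming an `McpClient` interface that stands in for the real MCP stdio client. The key scheme and 5-minute TTL match what's described above; class and method names are illustrative:

```typescript
// Stand-in for the real MCP stdio client (spawns a child process).
interface McpClient {
  close(): void;
}

const IDLE_TTL_MS = 5 * 60 * 1000; // evict after 5 minutes idle

interface PoolEntry {
  client: McpClient;
  lastUsed: number;
}

class McpClientPool {
  private entries = new Map<string, PoolEntry>();

  constructor(private createClient: (tenantId: string) => McpClient) {}

  // Key includes tenant ID so clients (and their auth context)
  // are never shared across tenants.
  private key(chatId: string, tenantId: string): string {
    return `${tenantId}:${chatId}`;
  }

  acquire(chatId: string, tenantId: string): McpClient {
    const k = this.key(chatId, tenantId);
    let entry = this.entries.get(k);
    if (!entry) {
      // Cache miss: pay the ~2s spawn cost once per conversation.
      entry = { client: this.createClient(tenantId), lastUsed: Date.now() };
      this.entries.set(k, entry);
    }
    entry.lastUsed = Date.now();
    return entry.client;
  }

  // Run periodically (e.g. via setInterval) to close idle clients.
  evictIdle(now = Date.now()): void {
    for (const [k, entry] of this.entries) {
      if (now - entry.lastUsed > IDLE_TTL_MS) {
        entry.client.close();
        this.entries.delete(k);
      }
    }
  }
}
```

Within one conversation, every message after the first reuses the spawned client, which is where the ~2s → ~100ms improvement comes from.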
Message persistence
Every message goes to the database — user messages and assistant responses. For responses with tool calls, I store the full interaction (tool name, arguments, results) as JSON alongside the text.
Two reasons: users see their full conversation history when they come back, including what the AI did. And when something breaks, you can trace exactly which tools were called with what data.
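The stored shape might look something like this. Field names here are illustrative, not the actual schema; the point is that tool interactions live as structured JSON next to the message text:

```typescript
// One tool interaction, stored alongside the assistant's text.
interface ToolInteraction {
  name: string;       // which tool was called
  arguments: unknown; // what it was called with
  result: unknown;    // what it returned
}

interface StoredMessage {
  chatId: string;
  role: "user" | "assistant";
  text: string;
  toolCalls: ToolInteraction[] | null; // null for plain text messages
  createdAt: string;
}

function toStoredMessage(
  chatId: string,
  role: "user" | "assistant",
  text: string,
  toolCalls?: ToolInteraction[]
): StoredMessage {
  return {
    chatId,
    role,
    text,
    toolCalls: toolCalls && toolCalls.length > 0 ? toolCalls : null,
    createdAt: new Date().toISOString(),
  };
}
```

Keeping the tool calls structured (rather than flattened into the text) is what makes the debugging use case work: you can query for every message that called a particular tool.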
What I learned
Specialized agents with dumb routing beat one general agent. Every time. Agent B generates better queries because its prompt only thinks about data structure. Agent C extracts better because its prompt only thinks about file formats. Agent A makes fewer mistakes because it isn't drowning in tools it doesn't need.
The whole orchestrator is about 2000 lines of TypeScript. The hard part isn't the code. It's figuring out where to draw the boundaries between agents.