When One Agent Isn't Enough — How Multi-Agent Systems Work

2026-03-08 · 8 min read · ai · agents · engineering · architecture

A single agent with 50 tools is a mess. Multi-agent systems solve this by specialization. Router + specialists, pipelines, debate patterns, and how shared memory keeps agents coordinated.


A single agent with 50 tools is a mess. The LLM is overwhelmed by choice. It picks the wrong tool 15% of the time. Its system prompt is 3,000 tokens of instructions covering every possible domain.

Multi-agent systems solve this by specialization. Instead of one agent that does everything, you have multiple agents that each do one thing well. A router decides which specialist handles each request.

Pattern 1: Router + Specialists

The most common and most practical pattern.

USER MESSAGE
    |
    v
[ROUTER AGENT]
    |
    |--> Health question    --> [HEALTH AGENT] (tools: medical DB, lab results, medication info)
    |--> Career question    --> [CAREER AGENT] (tools: resume, job search, proposal writer)
    |--> Finance question   --> [FINANCE AGENT] (tools: budget tracker, tax calculator)
    |--> General question   --> [GENERAL AGENT] (tools: web search, file access)
    |
    v
RESPONSE

The router is a lightweight LLM call (or even a classifier) that reads the user's message and decides which specialist handles it. The specialist has a focused system prompt, focused tools, and focused expertise.

Why this is better than one agent:

  • Each specialist's system prompt is shorter and more focused (500 tokens vs 3,000)
  • Each specialist sees only relevant tools (5-10 vs 50)
  • The LLM makes better decisions with fewer options
  • Specialists can use different models (cheap model for simple tasks, expensive model for complex ones)

How routing works in practice:

Option A: LLM-based routing. The router is an LLM that reads the message and outputs a category. This is flexible (handles ambiguous messages) but adds one LLM call of latency.

Option B: Keyword/embedding-based routing. Fast classification based on keywords or semantic similarity. "Blood pressure" -> health. "Invoice" -> finance. This is faster but brittle for ambiguous messages.
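A keyword router can be a few lines of Python. This is a minimal sketch; the category names and keyword lists are illustrative assumptions, and a real system would use embeddings rather than substring matching.

```python
# Minimal keyword-based router. Keywords per category are assumptions;
# production systems would use embedding similarity instead.
KEYWORDS = {
    "health": ["blood pressure", "medication", "lab", "symptom"],
    "finance": ["invoice", "budget", "tax", "balance"],
}

def route_by_keyword(message: str) -> str:
    """Return the first category whose keywords appear in the message,
    falling back to 'general' for everything else."""
    text = message.lower()
    for category, keywords in KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return category
    return "general"
```

The fallback to "general" is what keeps this brittle approach safe: an ambiguous message gets a generalist rather than the wrong specialist.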

Option C: Native semantic routing. The framework reads agent descriptions and routes semantically. No separate routing step — the LLM's native language understanding decides. This is what Claude Code's subagent system does.

The Nyxa architecture uses this pattern. Seven specialized agents — health, career, finance, marketing, philosophy, presence, spirituality — each with their own system prompt, tools, model selection, and memory. The main LLM (Claude Opus) routes semantically based on agent descriptions.

Pattern 2: Pipeline

Agents in sequence. Each one processes and passes to the next.

[INTAKE AGENT]
    Collects information, validates input
    |
    v
[ANALYSIS AGENT]
    Processes data, runs computations
    |
    v
[QUALITY CHECK AGENT]
    Validates output, catches errors
    |
    v
[DELIVERY AGENT]
    Formats and delivers result

Use when: The task has clear sequential phases. Document processing (extract -> analyze -> validate -> format). Content creation (research -> draft -> edit -> publish). Code review (understand -> analyze -> report -> suggest fixes).

Advantage over single agent: Each phase uses a model and prompt optimized for that specific task. The analysis agent doesn't waste context on formatting instructions. The quality check agent doesn't waste context on data collection.

Disadvantage: Information loss between stages. Each handoff summarizes the previous agent's work, potentially dropping details. And the pipeline is rigid — if the analysis agent discovers the input is invalid, there's no easy way to loop back to intake.
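Structurally, a pipeline is just function composition over a shared payload. A sketch with stub stages (the stage bodies are illustrative assumptions, not real agents):

```python
# Pipeline as function composition: each stage takes and returns a payload.
# Stage names mirror the diagram above; bodies are illustrative stubs.
def intake(payload):
    # Validate input before anything downstream runs.
    if not payload.get("text"):
        raise ValueError("empty input")
    return payload

def analysis(payload):
    # Stand-in for the real analysis step.
    payload["word_count"] = len(payload["text"].split())
    return payload

def quality_check(payload):
    # Catch obviously broken analysis output before delivery.
    assert payload["word_count"] >= 0
    return payload

def delivery(payload):
    return f"Report: {payload['word_count']} words"

def run_pipeline(payload, stages=(intake, analysis, quality_check, delivery)):
    for stage in stages:
        payload = stage(payload)
        if isinstance(payload, str):  # final stage returns the result
            return payload
    return payload
```

Note the rigidity mentioned above is visible in the code: if `analysis` discovers bad input, the only options are raising an exception or threading an error flag through every later stage.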

Pattern 3: Debate / Collaboration

Multiple agents discuss and reach consensus.

[AGENT A: Bull Case]  <-->  [AGENT B: Bear Case]
        |                           |
        v                           v
              [JUDGE AGENT]
                   |
                   v
              FINAL ANSWER

Use when: You want diverse perspectives on a complex question. Investment analysis (bull case vs bear case). Legal analysis (prosecution vs defense). Risk assessment (optimist vs pessimist).

Each agent argues a position. They can read each other's arguments and counter them. A judge agent synthesizes the debate into a final recommendation.
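The debate loop itself is simple control flow. A structural sketch, with `ask(system, prompt)` standing in for any LLM call (a hypothetical wrapper, not a real API):

```python
def debate(question, ask, rounds=3):
    """Run a bull-vs-bear debate, then have a judge synthesize.
    `ask(system, prompt)` is any function wrapping a single LLM call."""
    transcript = []
    for r in range(rounds):
        # Each side sees the full debate so far and can counter it.
        bull = ask("Argue the strongest case FOR.",
                   f"{question}\nDebate so far: {transcript}")
        bear = ask("Argue the strongest case AGAINST.",
                   f"{question}\nDebate so far: {transcript}")
        transcript.append({"round": r + 1, "bull": bull, "bear": bear})
    # Judge synthesizes the whole transcript into one recommendation.
    return ask("Weigh both sides and give a final recommendation.",
               f"{question}\nFull transcript: {transcript}")
```

Even this minimal version makes 3 × 2 + 1 = 7 LLM calls for three rounds; adding rebuttal or reflection passes is what pushes real implementations past ten.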

Why this works: Single agents have a confirmation bias — they commit to their first conclusion and build supporting arguments. Debate forces counter-arguments to be generated, producing more balanced analysis.

Why this is expensive: Multiple agents, multiple rounds of discussion, a synthesis step. A 3-round debate between 2 agents plus a judge is at least 7 LLM calls (3 × 2 + 1), and rebuttal or multi-pass synthesis steps push it to 10-15. Cost and latency multiply.

Practical reality: This pattern is mostly used in research and high-stakes analysis, not production applications. The cost/benefit ratio only makes sense when the decision is expensive enough to justify thorough analysis.

Shared Memory: How Agents Stay Coordinated

In multi-agent systems, agents need to share information. Three approaches:

Message Passing

Agents communicate by passing messages. Agent A's output becomes Agent B's input. Simple but lossy — information must be serialized into text and parsed by the next agent.
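The lossiness is easy to see in code. In this sketch, anything Agent A doesn't (or can't) serialize simply never reaches Agent B; the agent bodies are illustrative stubs:

```python
import json

# Message passing: Agent A serializes its output; Agent B parses it.
# Whatever A leaves out of the message is lost to B.
def agent_a(task):
    findings = {"task": task, "risk": "low", "internal_notes": object()}
    # Only simple, serializable fields survive the handoff;
    # internal_notes is silently dropped.
    return json.dumps({k: v for k, v in findings.items()
                       if isinstance(v, (str, int, float, bool))})

def agent_b(message):
    data = json.loads(message)
    return f"Risk assessed as {data['risk']}"
```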

Shared State Object

All agents read from and write to a shared state object (database, Redis, file). Each agent sees the full state and can modify it.

{
  "patient": {"name": "Priya", "id": "P12345"},
  "health": {"latest_labs": "...", "medications": "..."},
  "finance": {"outstanding_balance": 45000},
  "appointments": [{"date": "Mar 27", "type": "monitoring"}]
}

The health agent updates health. The finance agent updates finance. All agents can read everything.

Advantage: No information loss. All context is available to all agents. Disadvantage: State can become large and unwieldy. Agents might conflict (two agents updating the same field simultaneously).
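One common way to reduce write conflicts is namespacing: every agent may read the whole state but only write inside its own section. A sketch, using the state shape from the example above:

```python
# Shared state with per-agent write namespaces. All agents read everything;
# each agent writes only under its own key, so writes cannot collide.
state = {
    "patient": {"name": "Priya", "id": "P12345"},
    "health": {},
    "finance": {},
}

def write(state, agent, field, value):
    """Allow an agent to write only inside its own namespace."""
    if agent not in state:
        raise PermissionError(f"unknown namespace: {agent}")
    state[agent][field] = value

write(state, "health", "latest_labs", "HbA1c 6.1%")
write(state, "finance", "outstanding_balance", 45000)
```

This doesn't solve concurrent writes to the same field by the same agent (that needs a lock or a transactional store), but it eliminates the cross-agent conflicts described above.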

Cross-Domain Memory

Each agent maintains its own memory file. A cross-pollination mechanism detects when one agent's insights are relevant to another.
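A cross-pollination check can start as something as crude as vocabulary overlap between a new insight and each agent's memory. A sketch under that assumption (real systems would use embedding similarity; the memories and threshold here are made up):

```python
# Cross-pollination sketch: flag a new insight to any agent whose memory
# shares enough vocabulary with it. Threshold and memories are illustrative;
# embedding similarity would replace the word-overlap heuristic.
memories = {
    "health": ["patient started a new blood pressure medication"],
    "finance": ["monthly medication costs increased"],
    "career": ["considering a job change next quarter"],
}

def cross_pollinate(insight, memories, threshold=2):
    words = set(insight.lower().split())
    relevant = []
    for agent, notes in memories.items():
        overlap = max(len(words & set(n.lower().split())) for n in notes)
        if overlap >= threshold:
            relevant.append(agent)
    return relevant
```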

The Cost Reality

Multi-agent systems multiply costs:

Architecture                 | LLM calls per user query        | Relative cost
Single agent                 | 1-5                             | 1x
Router + specialist          | 2-6 (1 route + 1-5 specialist)  | 1.2-1.5x
Pipeline (4 stages)          | 4-8                             | 2-3x
Debate (3 agents, 3 rounds)  | 10-15                           | 5-8x
CrewAI crew (3 agents)       | 6-15                            | 3-6x

For most applications, router + specialist is the sweet spot. You pay a small overhead for routing but each specialist is more efficient (shorter prompts, fewer tools, faster responses).

Debate and pipeline patterns are justified only when task quality needs to be significantly higher than what a single agent produces.

When to Use Multi-Agent (And When Not To)

Use multi-agent when:

  • Your single agent's system prompt exceeds 2,000 tokens (it's trying to do too much)
  • Your single agent has 20+ tools (too many options, decision quality drops)
  • Different tasks genuinely need different models (cheap for simple, expensive for complex)
  • Domain expertise varies significantly across tasks (health vs finance vs career)
  • You need an audit trail showing which specialist handled what

Don't use multi-agent when:

  • Your task is simple (3-5 tools, one domain)
  • Latency is critical (routing adds 1-2 seconds)
  • You're prototyping (start with one agent, split later when you see the pain)
  • The agents would have identical tools and prompts (splitting for its own sake)

Building Multi-Agent in Practice

A minimal multi-agent system in Python:

agents = {
    "health": {
        "system_prompt": "You are a health advisor...",
        "tools": [get_labs, get_medications, schedule_appointment],
        "model": "gpt-4o"
    },
    "finance": {
        "system_prompt": "You are a financial advisor...",
        "tools": [get_balance, calculate_emi, get_insurance],
        "model": "gpt-4o-mini"  # simpler tasks, cheaper model
    },
    "general": {
        "system_prompt": "You are a helpful assistant...",
        "tools": [web_search, file_read],
        "model": "gpt-4o-mini"
    }
}

def route(message):
    # Simple LLM-based routing. `llm.classify` stands in for any single
    # LLM call that returns one of the category names.
    category = llm.classify(
        f"Classify this message into: health, finance, general.\n{message}"
    )
    # Fall back to the general agent if the model returns anything unexpected.
    return agents.get(category, agents["general"])

def handle(message):
    agent = route(message)
    return agent_loop(agent, message)

30 lines. Three specialized agents. Semantic routing. Different models per agent.

You don't need LangGraph, CrewAI, or AutoGen to build multi-agent. You need a routing function, a set of agent configurations, and the same agent loop from the previous post.

Summary

Multi-agent is specialization applied to AI:

  • Router + specialists for domain routing (most practical, use this by default)
  • Pipeline for sequential processing
  • Debate for high-stakes analysis requiring diverse perspectives

The architecture should follow the need. Start with one agent. Split when it gets unwieldy. Route semantically. Keep shared state minimal. Monitor costs.

The best multi-agent system is the one where each agent is so focused that it handles its domain better than any single generalist could. Not because there are more agents. Because each agent is better at its job.

This is post 16 of the AI Engineering Explained series.

Next post: RAG — Retrieval-Augmented Generation. How to give AI knowledge it doesn't have, and why it breaks more often than you'd expect.