
Multi-AI Agent Patterns: Sub-agents vs Agent Teams — Architecture Deep Dive (2026)

The Prompt Shelf

The question we hear most often from engineers who have moved past the basics is not “what are subagents?” — it is “how do I decide which pattern to use, and when is the overhead not worth it?” That is the question this article answers.

We already have introductory coverage of subagents best practices and an Agent Teams practical guide. This article is not a repeat. It is the architecture layer above those — the decision framework, the cost reality, and the failure taxonomy that only becomes visible once you have run real workloads.

If you are building or evaluating a multi-agent system and need to make concrete design decisions today, start here.

The Fundamental Distinction: What Each Mechanism Is Actually For

Before any decision framework, get this distinction clear.

Sub-agents (invoked via the Task tool or AGENTS.md delegation) are a context isolation mechanism. Their primary purpose is not parallelism — it is giving a bounded piece of work its own fresh context window so it can complete without polluting or being polluted by the orchestrating session.

Agent Teams is a coordination mechanism. Its purpose is distributing a body of work across multiple independent sessions that self-organize through git-based task claiming and branch merging. Parallelism is a side effect of coordination, not the goal in itself.

This distinction has a practical consequence: you can use sub-agents in a sequential, non-parallel workflow and still get value — because isolation is valuable even when you do not need speed. Agent Teams, on the other hand, is only worth its setup cost when you have genuine parallelism to exploit.

Treating sub-agents as “poor man’s parallelism” and Agent Teams as “fancier sub-agents” leads to architectures that are neither fast nor clean.

Decision Matrix

Use this matrix before reaching for any multi-agent pattern. Each axis is a real design constraint.

| Constraint | Lean sub-agent | Lean Agent Teams | Lean single-agent |
|---|---|---|---|
| Task granularity | Single focused task with clear I/O | Many tasks of similar size, parallel workload | One task, manageable scope |
| Context dependency | Task is self-contained or needs controlled handoff | Tasks are largely independent | Everything needs shared context |
| Parallelism value | Low-to-medium (sequential is fine) | High (real wall-clock savings) | Not applicable |
| State sharing | Isolated or read-only shared state | File-system separable (different modules/dirs) | Shared mutable state throughout |
| Cost tolerance | Moderate overhead acceptable | High parallel token spend acceptable | Minimal cost priority |
| Error tolerance | Partial failure acceptable with defined fallbacks | Independent task failure isolated to branch | Single failure = whole task fails |
| Determinism requirement | Predictable I/O expected | Output quality acceptable with some variation | High determinism required |
| CLAUDE.md maturity | Any | Well-structured CLAUDE.md essential | Not critical |

Read this matrix as tendency, not prescription. A task that leans “Agent Teams” on three axes and “sub-agent” on two still needs judgment. The matrix surfaces the tensions you need to resolve, not the answer.

The two questions that cut through the matrix

When time-pressed, ask just these two:

  1. Can the work be split at a file or module boundary? If yes and there are at least three such chunks, Agent Teams is worth considering. If no, sub-agents or single-agent.
  2. Does correctness require a specific execution order? If yes throughout, use sequential sub-agents. If no for some steps, parallel sub-agents or Agent Teams.
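
If you want the heuristic in executable form, here is a sketch; the function name and thresholds are illustrative, encoding the two questions above and nothing more.

def choose_pattern(boundary_chunks: int, strict_order: bool) -> str:
    """Illustrative encoding of the two-question heuristic (not an API).

    boundary_chunks: how many file/module-boundary chunks the work splits into
    strict_order: whether correctness requires a fixed execution order throughout
    """
    if strict_order:
        return "sequential sub-agents"
    if boundary_chunks >= 3:
        return "agent teams worth considering"
    if boundary_chunks >= 2:
        return "parallel sub-agents"
    return "single agent or one sub-agent"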

Pattern Catalog

Seven patterns, ordered from simplest to most complex. Each includes a real-world trigger, the coordination model, a code sketch, and the cost/complexity tradeoff.

Pattern 1: Single Sub-Agent (TaskTool Delegation)

When to use: You have one focused task that would consume disproportionate context in the main session, or that benefits from a clean environment without the orchestrator’s accumulated state.

Structure: Orchestrator → one sub-agent → result returned → orchestrator continues.

# In CLAUDE.md or inline prompt

## doc-generator
Generates API documentation from TypeScript source files.

### Input
JSON object:
- files: string[] — list of .ts files to document
- output_dir: string — where to write output
- format: "markdown" | "jsdoc"

### Output
- Writes one .md or .jsdoc file per input file to output_dir
- Returns: { status: "done" | "failed", files_written: string[], errors: string[] }

### Constraints
- Read source files only, do not modify them
- Max 30 source files per invocation
- If a file cannot be parsed, log the error and continue — do not halt

Token profile: One additional context window, sized to the sub-agent’s task. The main session keeps its original context state. Net overhead: the coordination prompt (~200–500 tokens) plus whatever the sub-agent returns.

Where this breaks: Sub-agent scope creep. If the task description allows open-ended decisions, the sub-agent may do far more than intended and return an oversized result that burdens the orchestrator’s context. Always set explicit scope limits.


Pattern 2: Sequential Sub-Agents

When to use: A pipeline where step N depends on step N-1’s output, and each step benefits from its own isolated context.

Structure: Orchestrator → sub-agent A → handoff → sub-agent B → handoff → sub-agent C → final result.

# Pseudocode: sequential pipeline orchestration
# (invoke_subagent stands in for a Task tool delegation call; the same
# convention is used in every code sketch in this article)

async def run_pipeline(source_dir: str) -> dict:
    # Step 1: Analysis
    analysis = await invoke_subagent(
        agent="codebase-analyzer",
        input={"dir": source_dir, "depth": "deep-dive"}
    )
    if analysis["status"] == "failed":
        return {"status": "failed", "stage": "analysis", "reason": analysis["reason"]}

    # Step 2: Refactor planning (uses analysis output)
    plan = await invoke_subagent(
        agent="refactor-planner",
        input={
            "analysis": analysis["findings"],
            "constraints": {"max_changes_per_file": 50, "preserve_api": True}
        }
    )
    if plan["status"] == "failed":
        return {"status": "failed", "stage": "planning", "reason": plan["reason"]}

    # Step 3: Implementation (uses plan)
    result = await invoke_subagent(
        agent="refactor-implementer",
        input={"plan": plan["tasks"], "verification_cmd": "npm test"}
    )

    return result

Handoff discipline is the critical variable. Each step receives only a structured summary of the previous step’s output — not the raw output. A raw dump of 3,000 words from step A into step B’s context wastes tokens and dilutes focus. Summarize at handoff.
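
A minimal sketch of that handoff discipline, reusing this article's assumed invoke_subagent primitive; the handoff-summarizer agent name and output shape are hypothetical:

async def handoff(raw_output: str, next_stage: str, max_tokens: int = 300) -> dict:
    # Compress the previous stage's raw output into a bounded structured
    # summary. Only this summary, never the raw dump, enters the next
    # stage's context window.
    summary = await invoke_subagent(
        agent="handoff-summarizer",
        input={
            "raw": raw_output,
            "audience": next_stage,    # what the next stage needs to know
            "max_tokens": max_tokens,  # hard cap on handoff size
            "format": {"decisions": "list", "open_questions": "list"},
        },
    )
    return summary  # e.g. {"decisions": [...], "open_questions": [...]}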

Token profile: N context windows, each independent. If step A produces 1,500 tokens of output that step B ingests as a 200-token structured summary, you have saved 1,300 tokens per handoff. For a three-step pipeline, that compounds. Summarize aggressively.

Where this breaks: Error propagation. If step A’s output contains a subtle error, steps B and C will build on that error. Build explicit validation between steps: check the output format, run a sanity test, halt on anomalies before proceeding.


Pattern 3: Parallel Sub-Agents (Fan-Out / Fan-In)

When to use: Multiple independent tasks that each need focused context, where wall-clock time matters.

Structure: Orchestrator fans out to N sub-agents simultaneously, waits for all, fans in to an aggregator.

import asyncio

async def parallel_analysis(repo_dirs: list[str]) -> dict:
    # Fan-out: launch sub-agents concurrently
    tasks = [
        invoke_subagent(
            agent="security-reviewer",
            input={"dir": d, "output": f"review-output/security-{i}.md"}
        )
        for i, d in enumerate(repo_dirs)
    ]
    results = await asyncio.gather(*tasks, return_exceptions=True)

    # Fan-in: aggregate
    successful = [r for r in results if isinstance(r, dict) and r["status"] == "done"]
    failed = [r for r in results if isinstance(r, Exception) or r.get("status") == "failed"]

    aggregated = await invoke_subagent(
        agent="findings-aggregator",
        input={
            "reports": [r["output_path"] for r in successful],
            "failure_count": len(failed)
        }
    )
    return aggregated

The aggregator sub-agent is not optional. Without it, the main session ingests all N results and its context grows proportionally. An aggregator sub-agent takes the N outputs and returns one structured summary — the main session’s context stays bounded regardless of how many parallel branches you ran.

Token profile: N parallel context windows. Total token consumption is roughly N × (per-task tokens). Wall-clock time is max(individual task times) instead of sum(individual task times) — the parallelism benefit. The cost is real: running five parallel sub-agents costs approximately five times what a single sequential run would cost for the same total work. Factor this in before reaching for fan-out.

Where this breaks: Forgetting that parallel ≠ cheaper. Fan-out reduces time, not tokens. If your bottleneck is cost rather than latency, sequential may be better even when the tasks are independent.


Pattern 4: Agent Teams (Multi-Session Parallel)

When to use: A large, chunky body of work where tasks can be defined at a file or module boundary, each task needs real development work (not just a response), and you want the team to self-coordinate without micromanaging assignments.

Structure: Human describes goal and team size → Claude Code creates N worktrees, each agent claims tasks from a shared task list → agents commit to branches → continuous merge to integration branch.

# Enable the experimental feature
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1

# Invoke a team
# Precise scope definition is the most important variable
Create an agent team with 4 teammates to migrate all REST endpoints in /src/api/
from Express to Hono.

Scope per teammate:
- Teammate 1: /src/api/users/ and /src/api/auth/
- Teammate 2: /src/api/products/ and /src/api/orders/
- Teammate 3: /src/api/payments/ and /src/api/webhooks/
- Teammate 4: /src/api/admin/ and all shared middleware in /src/api/middleware/

Each teammate must:
1. Migrate all routes in their scope to Hono syntax
2. Run: cd src/api && npm test -- --scope=<their-directory>
3. Fix any test failures before merging
4. Not touch files outside their assigned scope

Integration order: Teammate 4 (shared middleware) must complete and merge before others merge.

The key difference from parallel sub-agents: Agent Teams agents persist between task completions. They do not shut down after returning a result — they claim the next available task. This makes Agent Teams efficient for large workloads with many similar tasks (20+ endpoint migrations, for example) where the per-task overhead of spawning and contextualizing a new sub-agent would add up.
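
Conceptually, a teammate's lifecycle is a claim-work-repeat loop. The sketch below is illustrative pseudocode of that loop, not the actual Agent Teams internals; it uses an exclusive file create as the atomic claim, the same idea formalized under failure mode F5.

from pathlib import Path
from typing import Callable

def teammate_loop(tasks_dir: Path, work_on: Callable[[str], None]) -> None:
    # Illustrative only: not the real Agent Teams implementation.
    # The teammate persists across tasks: claim one, finish it, claim the next.
    claimed = tasks_dir / "claimed"
    claimed.mkdir(exist_ok=True)
    for task_file in sorted(tasks_dir.glob("task-*.md")):
        lock = claimed / f"{task_file.stem}.lock"
        try:
            lock.open("x").close()  # "x" = exclusive create, atomic claim
        except FileExistsError:
            continue  # another teammate already owns this task
        work_on(task_file.read_text())  # migrate, run tests, fix, merge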

Token profile: Each agent runs a full independent session. A four-agent team running for 30 minutes each consumes roughly 4× the tokens of a single 30-minute session doing the same work sequentially. The wall-clock time collapses to roughly 1/4, but the cost does not. This trade-off is only favorable if the time savings are valuable to you (deadline pressure, human waiting on output).

Where this breaks: Shared mutable state. If three agents all need to modify config.ts or run database migrations, you do not have independent tasks — you have a coordination problem. Serialize those steps explicitly or handle them before launching the team.


Pattern 5: Hybrid — TaskTool Inside Agent Teams

When to use: Agent Teams for the bulk parallel work, with individual agents delegating their own sub-tasks via TaskTool when a specific phase warrants deeper isolation.

Structure: Agent Teams sets up N parallel sessions → each teammate uses TaskTool for its own focused sub-tasks → results merge back.

This is the pattern for complex refactors where each “module owner” agent needs to do something that benefits from further delegation — for example, running a documentation generator as a sub-agent rather than inline, to keep the teammate’s context focused on code logic.

# In AGENTS.md (each teammate loads this)

## subtask-doc-writer
Writes API documentation for a single module after refactoring is complete.
Invoke with TaskTool when: you have finished the refactor and tests pass.

### Input
- module_dir: path to the module directory
- style: "openapi" | "markdown"

### Output
- Writes docs to docs/<module_dir>/README.md
- Returns { status, words_written }

### Constraints
- Read-only access to module_dir (no code changes)
- Max output 2,000 words

Why this pattern exists: A teammate in an Agent Teams session has its own context window, which grows as it works. Offloading a well-defined sub-task (documentation, validation, reporting) to a TaskTool invocation keeps the teammate’s context focused on its primary work and avoids context exhaustion in long-running sessions.

Token profile: Agent Teams base cost × N, plus the sub-task cost per sub-delegation. Most expensive pattern in the catalog. Use it when the work is genuinely large enough to justify both the team setup overhead and the sub-agent overhead within each teammate.


Pattern 6: Stateless Microagent Pool

When to use: Repeated invocations of a well-defined, stateless operation across many inputs — validation, classification, extraction, transformation.

Structure: Orchestrator maintains a queue → spawns sub-agents on demand or in batch → collects results → no state persists between invocations.

# Batch validation pattern

import asyncio
from itertools import batched  # Python 3.12+; yields fixed-size chunks

async def validate_all(files: list[str], batch_size: int = 10) -> dict:
    results = {}
    for batch in batched(files, batch_size):
        batch_results = await asyncio.gather(*[
            invoke_subagent(
                agent="schema-validator",
                input={"file": f, "schema": "product-v2.json"}
            )
            for f in batch
        ])
        for f, r in zip(batch, batch_results):
            results[f] = r

    return {
        "valid": [f for f, r in results.items() if r["valid"]],
        "invalid": [(f, r["errors"]) for f, r in results.items() if not r["valid"]]
    }

This is the most token-efficient multi-agent pattern. Because the agent is stateless and the task is tightly bounded, each invocation is small. The overhead is proportional only to the I/O, not to accumulated context.

Where this breaks: When “stateless” is a lie. If your microagent actually needs to read context from previous invocations to do its job correctly, it is not stateless — and treating it as such produces inconsistent results. Make the state explicit in the input instead.


Pattern 7: Orchestrator-as-Router

When to use: A multi-domain task where the right sub-agent depends on classifying the input first. The orchestrator routes, sub-agents execute, results aggregate.

Structure: Orchestrator receives diverse inputs → classifies each → routes to domain-specific sub-agents → collects and unifies outputs.

import asyncio
from collections import defaultdict

async def route_and_execute(tasks: list[dict]) -> list[dict]:
    # Phase 1: classify all tasks (single sub-agent)
    classified = await invoke_subagent(
        agent="task-classifier",
        input={"tasks": tasks}
        # Returns: [{task, agent_type: "security"|"perf"|"api"|"data"}]
    )

    # Phase 2: group by agent type, execute in parallel groups
    groups: dict[str, list[dict]] = defaultdict(list)
    for item in classified["tasks"]:
        groups[item["agent_type"]].append(item)

    group_results = await asyncio.gather(*[
        invoke_subagent(agent=agent_type, input={"tasks": group})
        for agent_type, group in groups.items()
    ])

    # Flatten the per-group result lists into a single list
    return [r for group in group_results for r in group]

The classification step is the key. Without explicit routing, an orchestrator that tries to handle all input types itself will either write a bloated CLAUDE.md (trying to cover all domains) or produce generic output. Dedicated classifiers + domain specialists produce more accurate results with cleaner context.

Cost and Token Analysis

This is the part most guides skip. Real numbers change decisions.

Baseline measurements

These measurements come from our own workloads (Claude Sonnet 4.5 on Claude Code Max). They are directional, not precise benchmarks — your results depend on task complexity, CLAUDE.md size, and output verbosity.

| Pattern | Relative token cost | Wall-clock vs sequential | Break-even threshold |
|---|---|---|---|
| Single sub-agent | 1.1–1.3× | Same (sequential) | Any task >500 tokens that benefits from isolation |
| Sequential sub-agents (3 steps) | 1.2–1.5× | Same | When each step’s context savings exceed 0.2× per step |
| Parallel sub-agents (4 fan-out) | 3.8–4.2× | ~0.25× (4× faster) | When latency reduction > cost increase |
| Agent Teams (4 agents) | 3.5–4.5× | ~0.3× (3× faster) | When task is >~20 parallelizable work units |
| Hybrid TaskTool + Teams | 5–7× | ~0.25× | Large repos with genuinely hierarchical parallelism |
| Stateless microagent pool | 1.05–1.1× per call | Configurable | Whenever repeated transformation is needed |
| Orchestrator-as-router | 1.2–2× depending on diversity | Similar to parallel | When domain specialization produces meaningfully better output |

The token efficiency formula

For any multi-agent design, estimate before building:

Expected total tokens =
  orchestration_overhead           # ~200–800 tokens per agent spawned
  + sum(per_agent_context_windows) # each agent's full context
  - context_isolation_savings      # tokens NOT added to main session
  + handoff_overhead               # structured summaries passed between agents

If (context_isolation_savings - orchestration_overhead - handoff_overhead) < 0:
  Multi-agent is not cost-efficient for this task.

The formula is rough but forces the question most engineers skip: are you actually saving context, or just redistributing it?
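
A worked example with invented numbers, reusing the per-step figures from Pattern 1's spawn overhead and Pattern 2's handoff profile:

# Hypothetical numbers for a 3-step sequential sub-agent pipeline
orchestration_overhead = 3 * 500       # ~500 tokens to spawn each of 3 sub-agents
handoff_overhead = 2 * 200             # two structured handoffs at ~200 tokens each
context_isolation_savings = 3 * 1300   # ~1,300 tokens kept out of the main session per step

net = context_isolation_savings - orchestration_overhead - handoff_overhead
print(net)  # 3900 - 1500 - 400 = 2000 > 0: isolation pays for itself here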

When parallelism pays for itself

The real calculation for parallel patterns is about opportunity cost, not just token cost:

  • If a single-agent run takes 40 minutes and blocks a human from proceeding, and a 4-agent parallel run takes 12 minutes and costs 4× the tokens — the question is what the human’s 28-minute wait costs. For code review blocking a deploy, the parallelism almost always wins. For background processing with no deadline pressure, it usually doesn’t.
  • Claude Code Max plan costs are flat-rate within plan limits. Token efficiency matters more when you are near plan limits or on pay-per-token API usage.

Failure Mode Taxonomy

Every multi-agent system has failure modes that are distinct from single-agent failure. Knowing the taxonomy prevents building systems that fail in hard-to-debug ways.

F1: Context bleed

What it is: State from one agent session leaks into another — not through intent, but through shared file reads, environment variables, or implicit assumptions baked into CLAUDE.md.

Example: Two parallel agents both read config.yaml at the start of their session. Agent A modifies it during its task. Agent B, which started after A’s modification but has not re-read, makes decisions based on the pre-modification state.

Prevention: Agents that operate on shared mutable resources must re-read before acting. Document this explicitly in AGENTS.md: “Re-read [file] immediately before any operation that depends on its current state.”


F2: Cascading specification error

What it is: A misspecified sub-agent produces output with a subtle error. The next agent in the pipeline accepts it as valid and builds on it. The error amplifies through the chain.

Example: A research agent returns findings that misclassify a third-party library as actively maintained (it is not). The implementation agent, trusting the research output, integrates the library. The test agent, following the implementation, writes tests against the library. All three agents “succeed” — the error is invisible until production.

Prevention: Insert validation checkpoints between pipeline stages. Validate format and, where possible, semantics before passing output to the next stage. Do not trust sub-agent output blindly.
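
A checkpoint can be as simple as schema validation on the handoff object before the next stage runs. A minimal sketch using the third-party jsonschema package; the schema shown is hypothetical:

import jsonschema  # third-party: pip install jsonschema

# Hypothetical schema for the analyzer's handoff object
ANALYSIS_SCHEMA = {
    "type": "object",
    "required": ["status", "findings"],
    "properties": {
        "status": {"enum": ["done", "failed"]},
        "findings": {"type": "array", "items": {"type": "string"}},
    },
}

def checkpoint(stage: str, output: dict, schema: dict) -> dict:
    # Halt on malformed output instead of letting the next stage
    # build on it (the F2 cascade).
    try:
        jsonschema.validate(instance=output, schema=schema)
    except jsonschema.ValidationError as e:
        raise RuntimeError(f"stage {stage!r} produced invalid output: {e.message}")
    return output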


F3: Parallel write conflict (silent data loss)

What it is: Two parallel agents both write to the same output location. The second write silently overwrites the first. Neither agent errors — the conflict is invisible in the agent’s own view.

Example: Fan-out with four agents all writing to output/results.json. Each agent’s write looks successful from its perspective. The final file contains only the last agent’s output.

Prevention: Use unique output paths per agent (agent ID in the filename). Aggregate explicitly in a fan-in step. Never assume a shared write path is safe in a parallel system.


F4: Token budget exhaustion mid-pipeline

What it is: A sequential pipeline runs out of context budget partway through, truncating the last agent’s available context or causing the session to end prematurely.

Example: A three-step pipeline where step 1 generates a 4,000-token output, step 2 accepts it, generates another 3,000 tokens of output, and step 3 is supposed to synthesize both — but by step 3 the context window is at 95% capacity and the agent produces degraded output.

Prevention: Budget context at design time. Sum the expected output sizes of each stage plus the handoff overhead. If the total approaches the context limit, either compress handoffs more aggressively or split the pipeline across separate orchestration runs.
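
The budgeting itself is a few lines at design time. The token counts are rough planning estimates, not tokenizer output, and the numbers in the example are invented:

def check_pipeline_budget(stage_outputs: list[int], handoff: int = 200,
                          context_limit: int = 12_000, headroom: float = 0.7) -> bool:
    """Rough design-time check: do the expected stage outputs plus handoff
    overhead fit inside the final stage's context budget, with headroom
    left for the synthesis itself?"""
    total = sum(stage_outputs) + handoff * (len(stage_outputs) - 1)
    return total <= context_limit * headroom

# Stages expected to emit ~4,000, ~3,000 and ~2,500 tokens
print(check_pipeline_budget([4_000, 3_000, 2_500]))  # False: compress handoffs or split the run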


F5: Task claim race condition

What it is: In Agent Teams, two agents simultaneously claim the same task before the lock file protocol fully commits.

Example: Two teammates both check the task list, both see task-7 as unclaimed, both begin creating the claim file. Depending on file system timing, one or both may believe they own the task. Result: duplicate work, potential conflicts on the same files.

Prevention: Use atomic file operations for task claiming. The recommended approach is creating the claim file with an exclusive write — if the file already exists, the second agent fails the create and moves to the next available task. This is more reliable than check-then-write.

# Atomic claim: noclobber makes the redirect fail if the lock file already exists
( set -o noclobber; echo "$$" > .tasks/claimed/task-7.lock ) 2>/dev/null \
  || echo "already claimed, skipping"

F6: Agent drift in long-running teams

What it is: In Agent Teams, agents that run for a long time accumulate context and may drift away from the original task spec as their session grows.

Example: An agent 45 minutes into a large refactor has accumulated so much context about edge cases and decisions it has made that its behavior on new sub-tasks starts to deviate from the original spec. It makes architectural choices that contradict the system’s conventions because its local context has overridden them.

Prevention: Put critical constraints (naming conventions, do-not-modify boundaries, verification commands) in the CLAUDE.md that every agent reads at session start, rather than only in the initial task description, which gets diluted as context grows.


F7: Aggregation hallucination

What it is: An aggregator sub-agent synthesizes N results and, under context pressure, fabricates or conflates findings from multiple sources.

Example: A research team with five agents each produces a report. The aggregator receives all five and synthesizes a unified analysis. Under pressure to be concise, it attributes a finding from report 3 to report 1’s domain, or invents a conclusion that is not explicitly present in any report.

Prevention: Structure aggregator input so each source is clearly labeled. Require the aggregator to cite its source for each synthesized claim: "Finding: X (Source: report-2.md, section: 'Performance')." Verification against source material becomes possible.
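
The labeling is mechanical to build. A sketch of assembling the aggregator's input so every synthesized claim stays traceable (paths and field names are illustrative):

from pathlib import Path

def build_aggregator_input(report_paths: list[str]) -> dict:
    # Label each source explicitly so the aggregator can, and must, cite it.
    sources = [
        {"id": Path(p).name, "content": Path(p).read_text()}
        for p in report_paths
    ]
    return {
        "sources": sources,
        "instruction": (
            "Synthesize the findings. Every claim must cite its source id, "
            "e.g. 'Finding: X (Source: report-2.md)'. Do not include any "
            "conclusion that is not explicitly present in a source."
        ),
    }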

When NOT to Multi-Agent

This section is the one most architects skip, and skipping it is expensive.

The complexity tax

Every multi-agent system adds overhead: more prompts to maintain, more failure modes to handle, more coordination logic to reason about, more token spend. This tax is real and ongoing. Before committing to any multi-agent pattern, the expected benefit must exceed the complexity tax.

Red flags that indicate single-agent is better:

The task is context-dependent throughout. If each step genuinely needs to know what every previous step did — not just a summary, but the actual decisions and reasoning — context isolation hurts more than it helps. The overhead of structured handoffs exceeds the savings.

The task is small. Delegation only pays off when the sub-task is at least several hundred tokens of real work. If you are delegating a task that takes 50 tokens to complete, the coordination overhead is larger than the task itself.

You do not have a well-structured CLAUDE.md. Agent Teams without a strong CLAUDE.md is not Agent Teams — it is expensive context exploration multiplied by N. If your CLAUDE.md is not already in good shape, fix that first. Do not add agents on top of unclear instructions.

The failure mode is not recoverable. For tasks where a wrong step cannot be undone (sending emails, modifying production data, committing to a release branch without review), multi-agent introduces more ways to proceed incorrectly before a human catches the error. Add human checkpoints explicitly, or use single-agent with explicit confirmation prompts.

You need tight output determinism. Multi-agent systems produce more variance than single-agent. If you need the same input to produce the same output reliably, parallel and team patterns add variance through non-deterministic execution ordering. Single-agent, potentially with explicit sampling parameters, gives better determinism.

The “it feels faster” illusion

Running four parallel agents feels productive. The dashboard shows activity. Things are happening. This feeling is real but misleading: the question is not whether things are happening, but whether the aggregate output quality and cost are better than what a single well-specified agent would produce.

For exploratory work, brainstorming, and research across diverse domains, parallel agents genuinely perform better — the diversity of context windows produces more independent findings. For focused implementation work in a well-understood codebase, a single agent with a good CLAUDE.md frequently outperforms a team — because it maintains coherent architectural intent across the whole work.

Measure both the output and the cost before concluding that the multi-agent version is better.

FAQ

Q: Can I mix sub-agents and Agent Teams in the same workflow?

Yes, and Pattern 5 (Hybrid) shows why this is sometimes the right answer. The key is being intentional about which layer handles which kind of work. Agent Teams handles the bulk parallel execution. TaskTool sub-agents handle focused isolation within each teammate’s session. Do not use the hybrid pattern by accident — use it because you have a specific reason both layers are needed.

Q: How do I know if my task decomposition is actually parallelizable?

Draw the dependency graph. Each node is a task, each edge is a dependency. If the graph is a DAG (directed acyclic graph) with multiple branches that do not converge until the end, it is genuinely parallelizable. If every node depends on the previous node, it is sequential. If the graph has cycles, your task definition has a problem.
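
Python's standard library can run this check directly: graphlib.TopologicalSorter raises CycleError on a cyclic graph, and the width of each ready batch shows how much parallelism is actually available. The task graph below is invented for illustration:

from graphlib import TopologicalSorter, CycleError

# Hypothetical task graph: each key depends on the tasks in its value set
deps = {
    "migrate-users": set(),
    "migrate-products": set(),
    "migrate-orders": {"migrate-products"},
    "integration-tests": {"migrate-users", "migrate-products", "migrate-orders"},
}

ts = TopologicalSorter(deps)
try:
    ts.prepare()  # raises CycleError if the task definition has a cycle
except CycleError as exc:
    raise SystemExit(f"task graph has a cycle: {exc}")

while ts.is_active():
    ready = list(ts.get_ready())  # tasks with all dependencies satisfied
    print(ready)                  # batch width = parallelism available now
    ts.done(*ready)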

Q: Should I write AGENTS.md before or after getting basic single-agent workflows working?

After. Build the workflow with a single agent first. Identify the specific points where context isolation would help — where the agent’s accumulated context is slowing it down or producing worse results. Write AGENTS.md entries for exactly those points. Writing AGENTS.md speculatively produces over-engineered delegation that never gets used.

Q: What is the practical context window limit before I should consider sub-agents?

We have found that single-agent performance starts degrading noticeably around 60–70% context utilization. When a session is spending meaningful tokens re-reading context it already processed earlier, or when the model starts contradicting earlier decisions, it is time to delegate the next major task to a sub-agent rather than continuing inline.

Q: How do I handle a sub-agent that returns malformed output?

Validate immediately on return, before passing to any subsequent step. Define a JSON schema for all sub-agent outputs and validate against it. Treat schema validation failure the same as a hard error — do not attempt to recover from malformed structure in-flight, as this produces the cascading specification error failure mode (F2 above).

Q: Is Agent Teams actually production-ready in 2026?

Production-ready with caveats. The coordination mechanism (worktree isolation, task claiming, branch merging) is stable enough for real codebases. The experimental flag signals that the invocation API and default behaviors may change between Claude Code releases, not that the underlying mechanics are broken. Pin your Claude Code version if you are building automation on top of it, and re-test after updates.

Q: Can I use Agent Teams without git?

Not effectively. The coordination mechanism is git-based — worktrees, branches, and merges are the primitives that prevent parallel agents from stepping on each other. If your workflow does not involve a git repository, use parallel sub-agents with explicit output path isolation instead.

Q: What is the largest practical Agent Teams team size?

In practice, 3–5 agents covers most parallel workloads without the coordination overhead dominating the actual work. Beyond 5, the complexity of defining non-overlapping task boundaries, preventing conflicts, and monitoring progress typically exceeds the parallelism benefit. The rare exception is highly regularized work (migrating 200 identical endpoints, processing 500 independent files) where task boundaries are completely mechanical and agent coordination is minimal.


Browse the Prompt Shelf rules gallery for real-world AGENTS.md files from production codebases. For the cost side of multi-agent systems, see our Claude Code cost optimization guide.
