Cramming retrieval, coding, and review into a single LLM agent leads to context overflow and a single point of failure (SPOF) at scale. This guide is for AI engineers and architects. As of June 2026 it covers 6 orchestration patterns, a LangGraph/CrewAI/AutoGen comparison, the MCP + A2A dual-layer protocol stack, production engineering, observability, 4 antipatterns, and a decision tree, with runnable code examples and recommendations for using a remote Mac as a 24/7 execution layer.
In 2024–2025, AI agents moved from the lab into production. Most teams quickly found that putting everything into one LLM agent collapses at scale. The problem is not the model but the architecture.
Context window bottleneck: intermediate results from complex tasks fill the context, and inference quality degrades.
Diluted specialization: one agent handles search, coding, and review, and does all of them at a mediocre level.
Serial execution overhead: subtasks run strictly in sequence, so total time = sum(steps) with zero parallelism.
SPOF: if the agent fails, the whole pipeline stops.
MLflow Report 2026: in Google's internal Agent Bake-Off, a distributed multi-agent architecture cut processing time from 1 hour to 10 minutes (a 6× speedup). AdaptOrch (2026): the choice of orchestration topology affects performance more than the choice of base model; on SWE-bench the right topology yields +12–23%.
"Orchestration topology > model selection: how collaboration is organized matters more than which LLM runs under the hood."
A multi-agent system (MAS) is a set of independent AI agents coordinated through a communication protocol and an orchestration mechanism to handle tasks that a single agent cannot do efficiently. Each agent has role specialization, its own tool access, isolated state, and replaceability.
| Control mode | Topology | Pros | Cons |
|---|---|---|---|
| Centralized | Orchestrator → A/B/C | Auditable, controllable | Orchestrator bottleneck |
| Decentralized | Agent-to-agent P2P | High elasticity, low latency | Hard to debug, high nondeterminism |
| Hierarchical | Top Orchestrator → Team Lead → Worker | Balanced tradeoff | Medium design complexity |
The six patterns below cover 95%+ of multi-agent production scenarios. Knowing when to apply each one is a core skill in agentic AI engineering.
| Pattern | Core idea | Use case | Framework API |
|---|---|---|---|
| 1. Sequential pipeline | A output → B input, strict linear | Hard dependencies (content, code review) | LangGraph add_edge |
| 2. Parallel fan-out/fan-in | Concurrent agents, merge node | Independent subtasks, latency reduction | LangGraph Send API + Reducer |
| 3. Hierarchical supervisor-worker | Supervisor decomposes + routes | Multi-domain, dynamic routing | Keyword fast-path + LLM router |
| 4. Swarm | P2P handoff, no central coordinator | Multi-round debate (review, evaluation) | AutoGen GroupChat |
| 5. Blackboard | Shared workspace, conditional triggers | Long-running async (hours to days) | Shared state + precondition check |
| 6. Hybrid | Pattern composition | Enterprise content: intent routing + parallel research + QA | Supervisor + pipeline combo |
```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

# `llm` (a chat model exposing .invoke()) and `search_knowledge_base()` are
# assumed to be defined elsewhere.
class PipelineState(TypedDict):
    query: str
    retrieved_docs: str
    analysis: str
    final_report: str

# Each agent reads the previous step's output and returns a partial state update.
def retrieval_agent(state: PipelineState) -> dict:
    return {"retrieved_docs": search_knowledge_base(state["query"])}

def analysis_agent(state: PipelineState) -> dict:
    return {"analysis": llm.invoke(f"Analyze: {state['retrieved_docs']}").content}

def writer_agent(state: PipelineState) -> dict:
    return {"final_report": llm.invoke(f"Write: {state['analysis']}").content}

# Strictly linear graph: START -> retriever -> analyzer -> writer -> END.
builder = StateGraph(PipelineState)
builder.add_node("retriever", retrieval_agent)
builder.add_node("analyzer", analysis_agent)
builder.add_node("writer", writer_agent)
builder.add_edge(START, "retriever")
builder.add_edge("retriever", "analyzer")
builder.add_edge("analyzer", "writer")
builder.add_edge("writer", END)
pipeline = builder.compile()
```
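With parallel fan-out/fan-in, total time = max(T1, T2, ..., Tn), not the sum. In LangGraph, a routing function that uses the Send API returns a list of Send objects, so the branches genuinely run in parallel; with an Annotated[list, operator.add] reducer, branch results merge without manual locks.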
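The timing claim above can be checked with a framework-agnostic sketch. This is not LangGraph's Send API: it uses `asyncio.gather` from the standard library, and `research_agent`, `fan_out_fan_in`, the topic names, and the delays are all hypothetical stand-ins for real agent calls.

```python
import asyncio
import time


async def research_agent(topic: str, delay: float) -> list[str]:
    # Stand-in for an LLM or tool call; `delay` simulates its latency.
    await asyncio.sleep(delay)
    return [f"notes on {topic}"]


async def fan_out_fan_in(tasks: dict[str, float]) -> tuple[list[str], float]:
    start = time.perf_counter()
    # Fan-out: start every branch concurrently.
    branch_results = await asyncio.gather(
        *(research_agent(topic, delay) for topic, delay in tasks.items())
    )
    # Fan-in: merge by list concatenation, the same idea as an operator.add reducer.
    merged: list[str] = []
    for result in branch_results:
        merged += result
    return merged, time.perf_counter() - start


# Three branches with simulated latencies of 0.1 s, 0.2 s, and 0.3 s.
merged, elapsed = asyncio.run(
    fan_out_fan_in({"pricing": 0.1, "competitors": 0.2, "reviews": 0.3})
)
print(merged, f"{elapsed:.2f}s")
```

Run sequentially, the three branches would take about 0.6 s (the sum); run concurrently, the elapsed time stays close to the slowest branch, 0.3 s.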
Supervisor routing uses two layers. Layer 1: a keyword fast path (zero LLM calls, <1 ms). Layer 2: an LLM precision router for complex or ambiguous intents. This setup is typical for the Replit code assistant and enterprise support.
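A minimal sketch of the two-layer routing idea, assuming a plain substring table for the fast path. The keyword table, the agent names, and `fake_llm_router` are hypothetical; a real Layer 2 would call an LLM classifier instead of the stub.

```python
from typing import Callable, Optional

# Hypothetical keyword table; a real deployment would tune this per product.
KEYWORD_ROUTES = {
    "refund": "billing_agent",
    "invoice": "billing_agent",
    "stack trace": "code_agent",
    "traceback": "code_agent",
}


def keyword_route(query: str) -> Optional[str]:
    """Layer 1: substring match, no LLM call."""
    lowered = query.lower()
    for keyword, agent in KEYWORD_ROUTES.items():
        if keyword in lowered:
            return agent
    return None


def route(query: str, llm_router: Callable[[str], str]) -> tuple[str, str]:
    """Return (agent, layer); the LLM router runs only on a keyword miss."""
    agent = keyword_route(query)
    if agent is not None:
        return agent, "keyword"
    return llm_router(query), "llm"


def fake_llm_router(query: str) -> str:
    # Stand-in for a real LLM classification call.
    return "research_agent"


print(route("Where is my invoice?", fake_llm_router))
print(route("Compare vendors for me", fake_llm_router))
```

Returning the layer alongside the agent makes it easy to log how often the fast path hits, which is the number that justifies keeping the keyword table.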
For swarms, use AutoGen GroupChat with max_round=6 as a hard cap against infinite loops. Warning: nondeterminism is high, so use this pattern carefully in production; hierarchical patterns are usually safer.
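The effect of a round cap can be illustrated without any framework. The sketch below is not AutoGen's implementation: `run_group_chat`, the round-robin speaker order, the `APPROVED` stop token, and the three toy agents are assumptions made for illustration.

```python
from typing import Callable

# An agent takes the transcript so far and returns its next message.
Agent = Callable[[list[str]], str]


def run_group_chat(
    agents: dict[str, Agent],
    task: str,
    max_round: int = 6,
    stop_token: str = "APPROVED",
) -> tuple[list[str], str]:
    transcript = [f"user: {task}"]
    names = list(agents)
    for round_idx in range(max_round):
        # Round-robin speaker selection; real frameworks may pick speakers differently.
        name = names[round_idx % len(names)]
        message = agents[name](transcript)
        transcript.append(f"{name}: {message}")
        if stop_token in message:
            return transcript, "terminated"
    # Hard cap reached: stop even though nobody approved.
    return transcript, "max_round_reached"


def author(transcript: list[str]) -> str:
    return "Here is a revised draft."


def stubborn_reviewer(transcript: list[str]) -> str:
    return "Still not convinced."


def lenient_reviewer(transcript: list[str]) -> str:
    return "APPROVED"
```

With `stubborn_reviewer` the debate never converges, so the loop ends after six agent turns with status `max_round_reached`; the caller can then escalate to a human or fall back to a supervisor.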
Blackboard suits long-running workflows with unpredictable routing. The most common hybrid: "intent router → direct answer for simple queries, or for complex reports Supervisor + parallel research fan-out + QA pipeline + human review".
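A minimal sketch of the blackboard idea (shared state plus precondition checks), assuming a plain dict as the board. `KnowledgeSource`, `run_blackboard`, and the three example sources are hypothetical names, and a long-running system would persist the board rather than keep it in memory.

```python
from dataclasses import dataclass
from typing import Callable

Board = dict[str, object]


@dataclass
class KnowledgeSource:
    name: str
    precondition: Callable[[Board], bool]  # may this agent act on the current board?
    action: Callable[[Board], Board]       # returns new entries to post on the board


def run_blackboard(board: Board, sources: list[KnowledgeSource], max_cycles: int = 10) -> list[str]:
    fired: list[str] = []
    for _ in range(max_cycles):
        # Each cycle, collect agents that have not run yet and whose precondition holds.
        ready = [s for s in sources if s.name not in fired and s.precondition(board)]
        if not ready:
            break
        for source in ready:
            board.update(source.action(board))
            fired.append(source.name)
    return fired


# Registration order is deliberately scrambled: routing comes from board contents.
sources = [
    KnowledgeSource("reviewer", lambda b: "summary" in b, lambda b: {"approved": True}),
    KnowledgeSource("summarizer", lambda b: "raw_data" in b, lambda b: {"summary": "short summary"}),
    KnowledgeSource("collector", lambda b: "request" in b, lambda b: {"raw_data": ["doc1", "doc2"]}),
]
board: Board = {"request": "quarterly report"}
fired_order = run_blackboard(board, sources)
print(fired_order, board)
```

No agent calls another directly; each one fires only when the data it needs has appeared on the board, which is what makes the pattern tolerant of unpredictable ordering.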
| Dimension | LangGraph | CrewAI | AutoGen (Microsoft) |
|---|---|---|---|
| Paradigm | State machine graph | Role-based team | Conversational multi-agent |
| State management | Native | DIY | Limited |
| Human-in-the-Loop | Native interrupt() | DIY | Supported |
| Observability | LangSmith (commercial) | Limited | Azure Monitor |
| Production readiness | 5/5 | 3/5 | 4/5 |
| Rapid prototyping | 3/5 | 5/5 | 4/5 |
| Best for | Complex stateful workflows, compliance verticals | Role-based content pipelines | Dialog collaboration, Azure stack |
LangGraph: production reliability, complex state persistence, fine-grained HITL, conditional branches and loops. CrewAI: a prototype in 1–2 days, and teams intuitively understand "roles". AutoGen: Microsoft/Azure stack, multi-round debate, and iterative inference.
As of 2026, multi-agent communication is standardized into two complementary layers under the Linux Foundation's Agentic AI Foundation (AAIF):
MCP (vertical layer): connects an agent to tools and external systems.
A2A (horizontal layer): each agent publishes an Agent Card at /.well-known/agent.json; the orchestrator discovers agents through it and delegates tasks via JSON-RPC 2.0.
A2A Agent Card example (/.well-known/agent.json):
```json
{
  "name": "ResearchAgent",
  "version": "1.0",
  "description": "Specialized retrieval and summarization agent",
  "url": "https://research-agent.internal/a2a",
  "capabilities": { "streaming": true, "async": true },
  "skills": [
    { "id": "web_research", "name": "Web research", "tags": ["research", "web"] },
    { "id": "academic_search", "name": "Academic literature search" }
  ]
}
```
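To show how an orchestrator might use such a card, here is a hedged sketch that picks a skill by tag and wraps a delegation in a JSON-RPC 2.0 envelope. The envelope fields (`jsonrpc`, `id`, `method`, `params`) follow JSON-RPC 2.0, but the method name `example/delegate` and the `params` layout are placeholders, not taken from the A2A specification; `find_skill` and `build_delegation_request` are hypothetical helpers.

```python
import json
import uuid
from typing import Optional

# The Agent Card from the example above, as it would arrive over HTTP.
AGENT_CARD = json.loads("""
{
  "name": "ResearchAgent", "version": "1.0",
  "description": "Specialized retrieval and summarization agent",
  "url": "https://research-agent.internal/a2a",
  "capabilities": { "streaming": true, "async": true },
  "skills": [
    { "id": "web_research", "name": "Web research", "tags": ["research", "web"] },
    { "id": "academic_search", "name": "Academic literature search" }
  ]
}
""")


def find_skill(card: dict, tag: str) -> Optional[str]:
    """Return the id of the first skill advertising `tag`, or None."""
    for skill in card.get("skills", []):
        # Skills without a "tags" field simply never match.
        if tag in skill.get("tags", []):
            return skill["id"]
    return None


def build_delegation_request(card: dict, skill_id: str, text: str) -> tuple[str, dict]:
    """Return (endpoint URL, JSON-RPC 2.0 request body)."""
    payload = {
        "jsonrpc": "2.0",
        "id": str(uuid.uuid4()),
        "method": "example/delegate",  # placeholder; use the method defined by the A2A spec
        "params": {"skill": skill_id, "input": text},  # assumed shape, for illustration only
    }
    return card["url"], payload


skill_id = find_skill(AGENT_CARD, "web")
url, request_body = build_delegation_request(AGENT_CARD, skill_id, "Find recent MCP adoption numbers")
print(url)
print(json.dumps(request_body, indent=2))
```

The sketch only builds the request; sending it, authentication, and streaming are left out.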
State persistence and checkpoint resume: LangGraph PostgresSaver checkpoints; thread_id enables cross-process recovery, so a process restart does not lose state.
Human-in-the-loop: interrupt() pauses on high-risk operations (such as a production DB mutation) and waits for a human to approve or reject.
Circuit breaker and retry: CLOSED/OPEN/HALF_OPEN states; once failures reach a threshold, calls are temporarily blocked to prevent cascading failures.
Token budget control: TokenBudgetManager pre-checks the remaining budget before each agent call; overflow raises BudgetExceededException.
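The circuit breaker and token budget items above can be sketched with the standard library alone. The names `TokenBudgetManager` and `BudgetExceededException` come from the text, but their implementation here is assumed; `CircuitBreaker`, `CircuitOpenError`, `reserve()`, and the default thresholds are illustrative. Retry logic is left out to keep the example short.

```python
import time


class BudgetExceededException(Exception):
    pass


class TokenBudgetManager:
    def __init__(self, max_total_tokens: int):
        self.remaining = max_total_tokens

    def reserve(self, estimated_tokens: int) -> None:
        # Pre-check: refuse the call before any tokens are spent.
        if estimated_tokens > self.remaining:
            raise BudgetExceededException(
                f"need {estimated_tokens}, only {self.remaining} left"
            )
        self.remaining -= estimated_tokens


class CircuitOpenError(Exception):
    pass


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so tests can control time
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args):
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; call blocked")
            self.state = "HALF_OPEN"  # timeout elapsed: allow one trial call
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.state = "OPEN"
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self.state = "CLOSED"
        self.failures = 0
        return result
```

A typical call site would reserve tokens first and then route the agent call through the breaker, for example `budget.reserve(2_000)` followed by `breaker.call(agent_fn, task)`, so that a depleted budget or a failing downstream agent stops the run early instead of cascading.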
MAST study (1,642 execution traces): failure distribution in multi-agent systems:
| Failure type | Share | Description |
|---|---|---|
| System design issues | 41.77% | Duplicate steps, wrong tool selection, context overflow, missing termination |
| Inter-agent misalignment | 36.94% | Context loss at handoff; a hallucination becomes the next agent's "fact" |
| Task validation failure | 21.30% | Premature termination, incomplete validation |
57% of organizations run agents in production, but only 8% have shipped full LLM observability. Failures still return HTTP 200, so the dashboard stays green while the output is wrong. Core metrics: end-to-end task completion (>85%), P95 latency (<30 s), per-agent error rate (<5%), and an LLM-as-Judge quality score.
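As a rough illustration of how the first three metrics could be computed from collected traces, here is a standard-library sketch. The `TaskTrace` record, `summarize`, `check_slo`, and the nearest-rank percentile method are assumptions; the thresholds are the ones quoted above. The LLM-as-Judge score is omitted because it needs a model call.

```python
from dataclasses import dataclass


@dataclass
class TaskTrace:
    task_id: str
    completed: bool                          # did the end-to-end task succeed?
    latency_s: float                         # wall-clock time for the whole task
    agent_calls: dict[str, tuple[int, int]]  # agent name -> (calls, errors)


def p95(values: list[float]) -> float:
    # Nearest-rank percentile, computed with integer math: rank = ceil(0.95 * n).
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def summarize(traces: list[TaskTrace]) -> dict:
    calls: dict[str, int] = {}
    errors: dict[str, int] = {}
    for trace in traces:
        for agent, (n_calls, n_errors) in trace.agent_calls.items():
            calls[agent] = calls.get(agent, 0) + n_calls
            errors[agent] = errors.get(agent, 0) + n_errors
    return {
        "completion_rate": sum(t.completed for t in traces) / len(traces),
        "p95_latency_s": p95([t.latency_s for t in traces]),
        "error_rate": {agent: errors[agent] / calls[agent] for agent in calls},
    }


def check_slo(summary: dict) -> list[str]:
    """Return the names of violated targets (empty list means all targets met)."""
    violations = []
    if summary["completion_rate"] <= 0.85:
        violations.append("completion_rate")
    if summary["p95_latency_s"] >= 30:
        violations.append("p95_latency_s")
    for agent, rate in summary["error_rate"].items():
        if rate >= 0.05:
            violations.append(f"error_rate:{agent}")
    return violations
```

Because the check works on task outcomes rather than HTTP status codes, a run that returns 200 with a wrong answer still counts against the completion rate, provided `completed` is set by a validator rather than by the transport layer.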
Context contamination: a hallucination from Agent A propagates to B and C. Mitigation: schema validation plus a confidence threshold (reject below 0.7) at every handoff point.
Infinite loops and cost runaway: set hard caps (MAX_ITERATIONS=10, MAX_TOOL_CALLS_PER_AGENT=20, MAX_TOTAL_TOKENS=50_000) and use interrupt_before on expensive tools.
Over-engineering: a simple 2-step LLM chain turned into 8 agents. Rule: start with a sequential pipeline; the optimal agent count in production is usually 3–8.
Demo-to-prod gap: add ProductionGuardrails, covering an input length limit, prompt injection detection, a PII filter, and harmful content detection.
LangGraph parallel branch sync bug: after a Send API dispatch, the Supervisor may re-run before a slow branch has finished, causing duplicate execution. Fix: defer=True on the Supervisor node acts as an explicit sync barrier.
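The first two mitigations in the list above (handoff validation and hard caps) can be sketched in plain Python. The constants are the values quoted above; `validate_handoff`, `RunLimits`, `HandoffRejected`, `LimitExceeded`, and the payload layout with a `confidence` field are assumptions made for illustration.

```python
MAX_ITERATIONS = 10
MAX_TOOL_CALLS_PER_AGENT = 20
MAX_TOTAL_TOKENS = 50_000
MIN_CONFIDENCE = 0.7


class HandoffRejected(Exception):
    pass


class LimitExceeded(Exception):
    pass


def validate_handoff(payload: dict, required: dict[str, type]) -> dict:
    """Schema check plus confidence threshold, run at every agent-to-agent handoff."""
    for field, expected_type in required.items():
        if not isinstance(payload.get(field), expected_type):
            raise HandoffRejected(f"field {field!r} missing or not {expected_type.__name__}")
    confidence = payload.get("confidence")
    if not isinstance(confidence, (int, float)) or confidence < MIN_CONFIDENCE:
        raise HandoffRejected(f"confidence {confidence!r} is below {MIN_CONFIDENCE}")
    return payload


class RunLimits:
    """Counters that abort a run once any hard cap is exceeded."""

    def __init__(self) -> None:
        self.iterations = 0
        self.tokens = 0
        self.tool_calls: dict[str, int] = {}

    def record_iteration(self, tokens_used: int) -> None:
        self.iterations += 1
        self.tokens += tokens_used
        if self.iterations > MAX_ITERATIONS:
            raise LimitExceeded("MAX_ITERATIONS")
        if self.tokens > MAX_TOTAL_TOKENS:
            raise LimitExceeded("MAX_TOTAL_TOKENS")

    def record_tool_call(self, agent: str) -> None:
        self.tool_calls[agent] = self.tool_calls.get(agent, 0) + 1
        if self.tool_calls[agent] > MAX_TOOL_CALLS_PER_AGENT:
            raise LimitExceeded(f"MAX_TOOL_CALLS_PER_AGENT:{agent}")
```

Rejecting a low-confidence payload at the handoff keeps one agent's guess from becoming the next agent's input, and the counters turn a runaway loop into a bounded, visible failure.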
Clear linear dependency? Yes → are the subtasks parallelizable? No → sequential pipeline; Yes → parallel fan-out + pipeline hybrid.
No linear dependency → is there an authoritative decision agent? Yes → need sub-teams? No → Supervisor-Worker; Yes → hierarchical (Supervisors of Supervisors).
No authority → long-running async work? Yes → blackboard; No → at most 5 agents with clear termination? Yes → swarm (with a hard cap); No → refactor to hierarchical.
Framework: compliance/finance/healthcare → LangGraph; rapid prototype/role content → CrewAI; Azure stack/debate → AutoGen.
Protocols: for greenfield projects, adopt MCP (tool integration) and A2A (inter-agent delegation) from the start to avoid a migration tax.
Production deploy: PostgreSQL checkpoints + OpenTelemetry distributed tracing + LLM-as-Judge eval + remote Mac 24/7 execution layer.
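The pattern-selection branches of the decision tree above can be encoded as a small function, which makes the order of the questions explicit. `choose_pattern` and its parameter names are hypothetical; the return values are the pattern names used in the tree.

```python
def choose_pattern(
    *,
    linear_dependency: bool,
    parallelizable: bool = False,
    authoritative_agent: bool = False,
    needs_sub_teams: bool = False,
    long_async: bool = False,
    agent_count: int = 0,
    clear_termination: bool = False,
) -> str:
    # Branch 1: a clear linear dependency between subtasks.
    if linear_dependency:
        return "parallel fan-out + pipeline hybrid" if parallelizable else "sequential pipeline"
    # Branch 2: no linear dependency, but one agent has decision authority.
    if authoritative_agent:
        return "hierarchical" if needs_sub_teams else "supervisor-worker"
    # Branch 3: no authority; long-running async work goes to a blackboard.
    if long_async:
        return "blackboard"
    # Swarm only for small groups with a clear stop condition.
    if agent_count <= 5 and clear_termination:
        return "swarm (hard cap)"
    return "refactor to hierarchical"


print(choose_pattern(linear_dependency=True))
print(choose_pattern(linear_dependency=False, agent_count=4, clear_termination=True))
```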
Running 2–3 agents on a laptop is a trivial demo. Long multi-agent sessions, parallel subprocesses, and stacked stdio MCP servers together keep a 16 GB machine in constant swap, and a cheap Linux VPS cannot host the macOS toolchains that build agents need. A purely local setup fails on session stability, Keychain isolation, and interruptions when the lid closes.
Teams that run multi-agent systems as production infrastructure alongside Cursor / Claude Code agents and iOS CI benefit from a dedicated cloud Mac as the execution host. NodeMini Mac Mini cloud rental serves as a 24/7 agent execution layer: you can swap the LLM or orchestration framework without touching SSH nodes or tool config. Specs: rental plans; onboarding: Help Center.
"Start with a sequential pipeline and prove the core value. Add parallelism and hierarchy only for a concrete need. Production sweet spot: 3–8 agents."
Multi-agent means several role-specific, independent agents under orchestration, each with isolated context and its own toolset. Single-agent means everything in one LLM, which at scale leads to context overflow, skill dilution, and a SPOF. In Google's Bake-Off, the distributed architecture delivered a 6× speedup.
LangGraph: complex stateful workflows and regulated verticals (finance, healthcare). CrewAI: 1–2 day prototypes and role-based content pipelines. AutoGen: Microsoft/Azure stack and debate-style collaboration. Hardware recommendations: rental plans.
MCP is the vertical layer: agent ↔ tools and external systems ("write once, use everywhere"). A2A is the horizontal layer: agent ↔ agent task delegation and capability discovery. They are complementary and governed by AAIF under the Linux Foundation. See the MCP protocol deep dive.
Light prototypes run fine locally. Long sessions combined with parallel subprocesses and MCP servers call for a dedicated remote Mac running 24/7. Onboarding: Help Center.