
LangGraph, CrewAI, Claude Code — How Agent Frameworks Actually Differ
Three dominant approaches to building agents. Graph-based, role-based, and code-execution. Each makes different tradeoffs. Choosing wrong means rewriting everything in three months.
Last post: what agents are. The loop, the tools, the memory, the planning.
This post: how they get built, and the tradeoffs each approach locks you into.
The Three Patterns
Pattern 1: Graph-Based (LangGraph)
LangGraph models an agent as a directed graph. Nodes are steps. Edges are conditions. The agent flows through the graph based on decisions at each node.
START
  |
  v
[CLASSIFY_INTENT] --"scheduling"--> [LOOKUP_CALENDAR]
                  --"medical"-----> [ESCALATE_TO_NURSE]
                  --"billing"-----> [QUERY_BILLING_SYSTEM]

[LOOKUP_CALENDAR] --> [PROPOSE_TIMES] --> [CONFIRM_WITH_USER] --> [BOOK] --> END
Each node is a function. The LLM runs inside specific nodes (intent classification, response generation). Other nodes are pure code (database queries, API calls, validation).
LangGraph's architecture:
from typing import TypedDict

from langgraph.graph import StateGraph

# Define state (shared across all nodes)
class AgentState(TypedDict):
    messages: list
    intent: str
    patient_id: str
    appointment: dict

# Define nodes (functions that modify state)
def classify_intent(state):
    # LLM classifies user intent
    intent = llm.classify(state["messages"])
    return {"intent": intent}

def lookup_calendar(state):
    # Pure code: query database
    appointments = db.get_appointments(state["patient_id"])
    return {"appointment": appointments}

# Build graph
graph = StateGraph(AgentState)
graph.add_node("classify", classify_intent)
graph.add_node("lookup", lookup_calendar)
# Routing happens via conditional edges keyed on the classified intent
graph.add_conditional_edges("classify", lambda s: s["intent"], {"scheduling": "lookup"})
Strengths:
- Explicit control flow. You can see exactly what happens in what order.
- Deterministic where you want it (code nodes), flexible where you need it (LLM nodes).
- State management is built-in. All nodes share a typed state object.
- Debugging is straightforward. You can inspect state at every node.
- Cycles are supported (loops, retries, re-planning).
Weaknesses:
- Boilerplate. Even simple agents require graph definition, state typing, edge conditions.
- Rigid for open-ended tasks. If you can't predict the flow in advance, the graph becomes unwieldy.
- Learning curve. Graph-based thinking is unfamiliar to many developers.
When to use: You have a well-defined workflow with predictable branches. Customer support routing, IVF clinic call handling, document processing pipelines. Anywhere the "happy path" is known but the LLM needs to make specific decisions at specific points.
Pattern 2: Role-Based (CrewAI)
CrewAI models agents as team members with roles. Each agent has a persona, a goal, and access to specific tools. They collaborate on a task.
from crewai import Agent, Task, Crew, Process

researcher = Agent(
    role="Market Researcher",
    goal="Find comprehensive data on voice AI in India",
    tools=[web_search, pdf_reader],
)
analyst = Agent(
    role="Business Analyst",
    goal="Analyze market data and identify opportunities",
    tools=[spreadsheet_tool, chart_tool],
)
writer = Agent(
    role="Technical Writer",
    goal="Write a clear, actionable market report",
    tools=[file_writer],
)

task = Task(
    description="Research Indian voice AI market and produce a competitive analysis",
    agent=researcher,
)

crew = Crew(
    agents=[researcher, analyst, writer],
    tasks=[task],
    process=Process.sequential,  # or Process.hierarchical
)
result = crew.kickoff()
The mental model: Instead of one agent doing everything, you decompose the task into roles. The researcher finds data. The analyst processes it. The writer produces the output. Each role has focused tools and expertise.
Process types:
- Sequential: Researcher -> Analyst -> Writer. Output of one feeds the next.
- Hierarchical: A "manager" agent delegates to workers and synthesizes results.
- Consensual: Agents discuss and reach agreement (experimental).
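Stripped of the framework, the first two process types reduce to a few lines of plain Python. The sketch below uses stub functions in place of LLM-backed agents; every name here is illustrative, not CrewAI API.

```python
# Framework-free sketch of sequential vs. hierarchical coordination,
# with plain functions standing in for LLM-backed agents.

def researcher(task: str) -> str:
    return f"data for '{task}'"

def analyst(data: str) -> str:
    return f"analysis of {data}"

def writer(analysis: str) -> str:
    return f"report: {analysis}"

def sequential(task: str) -> str:
    # Output of each agent feeds the next, in a fixed order.
    return writer(analyst(researcher(task)))

def hierarchical(task: str) -> str:
    # A "manager" delegates subtasks to workers, then synthesizes results.
    workers = {"research": researcher, "analysis": analyst}
    results = [workers["research"](task), workers["analysis"](task)]
    return writer("; ".join(results))

print(sequential("voice AI market"))
```

The framework's value is in what the sketch leaves out: passing intermediate context between agents, letting the manager decide delegation order, and retrying failed handoffs.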
Strengths:
- Intuitive for non-engineers. "I need a researcher, an analyst, and a writer" is easy to understand.
- Role specialization. Each agent's system prompt is focused, leading to better outputs than one agent trying to do everything.
- Reusable agents. Define a "researcher" once, use it across many crews/tasks.
Weaknesses:
- Token cost multiplier. Each agent is a separate LLM call. A 3-agent crew costs 3x a single agent.
- Inter-agent communication overhead. Passing context between agents loses information.
- Less control over the exact flow. You define roles and goals; the framework handles the coordination.
- Debugging multi-agent interactions is harder than debugging a single agent's decisions.
When to use: Complex research or analysis tasks that naturally decompose into roles. Content creation pipelines. Multi-step business processes where different skills are needed at different stages.
Pattern 3: Code-Execution (Claude Code, Devin, OpenHands)
The agent writes and executes code to accomplish tasks. Instead of calling pre-defined tools, it writes arbitrary code, runs it, reads the output, and continues.
User: "Analyze the sales data in data.csv and create a visualization."
Agent:
1. Reads data.csv (tool: file_read)
2. Writes Python script to load CSV, compute statistics
3. Executes the script (tool: bash)
4. Reads the output: "Revenue up 23% in Q3"
5. Writes another script to create matplotlib visualization
6. Executes it, saves chart.png
7. "Here's your analysis: Revenue increased 23% in Q3. Chart saved to chart.png."
The primary tool is a code execution environment. The agent writes whatever code it needs — data analysis, API calls, file manipulation, system administration.
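At its core, that execution environment is a small wrapper around a subprocess. A minimal sketch (real systems add sandboxing and a permission prompt; this stub has only a timeout):

```python
# Minimal sketch of the core tool behind code-execution agents: run a
# shell command, capture its output, and hand the text back to the LLM.
import subprocess

def run_bash(command: str, timeout: int = 30) -> str:
    """Execute a shell command and return combined stdout/stderr."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return result.stdout + result.stderr

# The agent loop feeds this output back into the next LLM call:
output = run_bash("echo 'Revenue up 23% in Q3'")
print(output)
```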
Claude Code's architecture:
- The LLM (Claude) has access to: bash, file read/write, web search, and other tools
- It writes code, executes it, reads the output, and decides next steps
- The human reviews and approves actions (permission system)
- Context includes the full file system, git history, and project structure
Strengths:
- Unbounded capability. If it can be coded, the agent can do it.
- No pre-defined tool limitations. The agent adapts to novel tasks.
- Code is inspectable. You can read exactly what the agent did.
Weaknesses:
- Security risk. Arbitrary code execution requires sandboxing and permission systems.
- Expensive. Code generation + execution + result analysis = many LLM calls.
- Requires a strong LLM. Weaker models write buggy code that fails and wastes iterations.
- Harder to constrain. The agent might take unexpected approaches.
When to use: Software engineering tasks. Data analysis. System administration. Any task where the solution is best expressed as code rather than as a sequence of pre-defined API calls.
Framework Comparison
| Dimension | LangGraph | CrewAI | Claude Code |
|---|---|---|---|
| Mental model | Graph of nodes | Team of agents | Code execution loop |
| Control | High (explicit edges) | Medium (role-based) | Low (LLM decides everything) |
| Flexibility | Medium | Medium | High |
| Cost efficiency | High (minimal LLM calls) | Low (multi-agent overhead) | Medium-High |
| Debugging | Easy (state inspection) | Hard (multi-agent traces) | Medium (code is readable) |
| Learning curve | High (graph concepts) | Low (intuitive roles) | Low (just chat) |
| Best for | Defined workflows | Research/analysis | Engineering tasks |
| Worst for | Open-ended exploration | Cost-sensitive apps | Non-technical tasks |
The Honest Take on Frameworks
Most agent frameworks are over-engineered for what people actually need.
If your task is: "User says X, call tool Y, respond with Z" — you don't need a framework. A simple loop with tool calling and a well-crafted system prompt handles 80% of agent use cases:
messages = [system_prompt, user_message]
while True:
    response = llm.generate(messages, tools=available_tools)
    if response.has_tool_call:
        result = execute_tool(response.tool_call)
        messages.append(response)
        messages.append(tool_result(result))
    else:
        # Text response - task complete
        return response.text
That's ten lines. It handles multi-step tool calling, context accumulation, and autonomous action. No framework needed.
Production Considerations
Observability
In production, you need to see what the agent is doing:
- Which tools were called, in what order, with what parameters
- What the LLM decided and why (the reasoning in its output)
- Where failures occurred
- How long each step took
- How much each step cost
LangSmith (from the LangChain team, which also builds LangGraph) provides this for LangGraph agents. For custom agents, you need to build logging into the loop.
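For a hand-rolled loop, the minimum viable version is a wrapper that records each tool call's name, arguments, duration, and outcome. The flat list-of-dicts format below is an assumption for illustration; production systems usually emit structured traces to a backend instead.

```python
# Sketch of observability for a custom agent loop: wrap every tool so
# each call is recorded with its name, args, status, and duration.
import time

trace = []

def logged(tool_fn, name):
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        status = "error"
        try:
            result = tool_fn(*args, **kwargs)
            status = "ok"
            return result
        finally:
            trace.append({
                "tool": name,
                "args": args,
                "status": status,
                "seconds": time.monotonic() - start,
            })
    return wrapper

# Hypothetical tool, wrapped before the agent can call it:
lookup = logged(lambda patient_id: {"slot": "10am"}, "lookup_calendar")
lookup("patient-42")
print(trace[-1]["tool"], trace[-1]["status"])
```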
Guardrails
The agent should not be able to:
- Call tools it shouldn't have access to
- Pass parameters that are out of bounds
- Execute more than N steps (infinite loop prevention)
- Take actions without user approval (for destructive operations)
- Exceed a cost budget (cap total LLM spending per task)
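Most of these constraints fit in one pre-flight check inside the loop. A sketch, with illustrative thresholds and names:

```python
# Sketch of guardrails as pre-flight checks before each tool call:
# tool allowlist, step cap (infinite-loop prevention), and cost budget.
class GuardrailViolation(Exception):
    pass

class Guardrails:
    def __init__(self, allowed_tools, max_steps=20, budget_usd=1.00):
        self.allowed_tools = set(allowed_tools)
        self.max_steps = max_steps
        self.budget_usd = budget_usd
        self.steps = 0
        self.spent_usd = 0.0

    def check(self, tool_name, call_cost_usd=0.0):
        self.steps += 1
        self.spent_usd += call_cost_usd
        if tool_name not in self.allowed_tools:
            raise GuardrailViolation(f"tool not allowed: {tool_name}")
        if self.steps > self.max_steps:
            raise GuardrailViolation("step limit exceeded")
        if self.spent_usd > self.budget_usd:
            raise GuardrailViolation("cost budget exceeded")

guard = Guardrails(allowed_tools=["lookup_calendar"], max_steps=3)
guard.check("lookup_calendar", call_cost_usd=0.01)  # passes
```

User-approval gates for destructive operations don't fit this shape; they need a human in the loop, not a threshold.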
Error Handling
Tools fail. APIs time out. Databases return unexpected data. The agent needs to handle these gracefully:
- Retry with backoff: Transient failures (API timeouts) should be retried
- Alternative approaches: If tool A fails, try tool B
- Graceful degradation: If the agent can't complete the full task, deliver what it has
- Inform the user: "I couldn't access the calendar system. Here's what I can tell you without it."
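The first of those, retry with backoff, is a few lines. The delays below are illustrative; a real agent would also distinguish retryable errors (timeouts) from permanent ones (bad parameters) rather than catching everything.

```python
# Sketch of retry-with-exponential-backoff for transient tool failures.
import time

def retry(fn, attempts=3, base_delay=0.1):
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure
            time.sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, ...

# Simulated flaky tool: fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient")
    return "ok"

print(retry(flaky))
```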
Next post: Multi-agent systems. When one agent isn't enough — router patterns, debate patterns, and how Nyxa orchestrates 7 specialized agents with shared memory.