When to add LangGraph to a LangChain app

Originally published at: https://varunyadav.com/when-to-add-langgraph-to-a-langchain-app/
LangChain got the agent to use tools. That was the first win.
The next problem was harder: the agent needed to process a queue, survive partial failures, and avoid sending the final summary too early.
That is the point where prompts stop being a good workflow engine.
The short version
Use LangChain when the main job is model interaction: call the model, give it tools, let it decide which tool to use, and return an answer.
Use LangGraph directly when the job needs a workflow you can control: discover work, choose the next step, enforce completion rules, and prove that every item in a queue was handled before the final step runs.
In this project, LangChain still does the model and tool work. LangGraph owns the order of operations around it.
That distinction matters.
If the model owns the loop, you are relying on instructions like "do this for every item" and "do not summarize early."
The graph changes the contract. The finalization step is not reachable until the controller has recorded an outcome for every item.
What the app was doing
The project is a custom RAG and agent runtime. The code is available in Oracle-26ai-LangChain-RAG.
A user can ask a question, and the system can combine several capabilities:
Retrieval from Oracle 26ai Vector Search
Normal chat answers
MCP tool calls
Native structured tool calling
Streamed responses through FastAPI
Citations and tool metadata in the response
Traces in Langfuse
Service-level telemetry through OpenTelemetry
For a basic question, this is a familiar LangChain shape. The model receives a prompt, sees the available tools, calls the right tools, and returns a response.
That part worked.
The trouble started when the request stopped being "answer this."
It became "do this for every item in this set, survive partial failures, and then send one final report."
A working agent is not automatically a workflow
Tool calling and workflow control are related, but they are not the same job.
Tool calling asks: "Which function should the model call next?"
Workflow control asks different questions:
What is the full queue of work?
Which item is being processed now?
Which items completed?
Which items were skipped?
Which items failed?
Has the final summary already run?
If the process stops halfway through, where should it resume?
Those facts need a home. A bigger prompt is a poor home because prompts describe behavior; they do not enforce state transitions.
Tools are the wrong home too. A list tool should list. A create tool should create. An email tool should send the summary when the workflow is ready. None of those tools should secretly own the whole business process.
The workflow needed a controller.
The failure that made the boundary obvious
The agent had an instruction to process every document and continue even if one document failed.
In one run, the list tool returned malformed MCP content. The prompt also contained a sample document reference. The agent recovered by processing the sample, created one transaction, sent the summary, and reported that one document had been reviewed.
No crash. No obvious error in the final answer. The output looked plausible.
But the workflow never established the queue.
This is the dangerous kind of agent failure. The system does something incomplete, then explains it confidently.
At that point, "make the prompt clearer" was not a serious fix. The prompt already said what to do. The missing piece was runtime control: the system needed to make early finalization impossible.
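One way to make that kind of failure loud instead of plausible is to parse the list tool's output strictly and refuse to continue without a queue. The sketch below is a minimal illustration, not the project's code. The names `discover_queue` and `QueueDiscoveryError`, and the assumed payload shape (JSON text with an `items` array of objects carrying a string `id`), are hypothetical.

```python
import json


class QueueDiscoveryError(RuntimeError):
    """Raised when the list tool's output cannot be turned into a queue."""


def discover_queue(raw_tool_output: str) -> list[str]:
    """Parse list-tool output into item ids, or fail loudly.

    Assumed payload shape (hypothetical): {"items": [{"id": "..."}, ...]}
    """
    try:
        payload = json.loads(raw_tool_output)
    except json.JSONDecodeError as exc:
        raise QueueDiscoveryError(f"list tool returned malformed content: {exc}") from exc

    items = payload.get("items") if isinstance(payload, dict) else None
    if not isinstance(items, list):
        raise QueueDiscoveryError("list tool output has no 'items' array")

    ids: list[str] = []
    for entry in items:
        # Reject anything that is not a well-formed entry instead of guessing.
        if not isinstance(entry, dict) or not isinstance(entry.get("id"), str):
            raise QueueDiscoveryError(f"queue entry without a string id: {entry!r}")
        ids.append(entry["id"])
    return ids
```

With a check like this, malformed content stops the run at discovery. There is no path from "the list call failed" to "process the sample in the prompt and send a summary."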
What LangGraph added
LangGraph added a small workflow controller around the LangChain agent.
The flow looks like this:
The API/runtime receives the user request.
The runtime decides whether this is a repeated workflow.
The LangGraph controller and agent discover the full queue of work.
The LangGraph controller gives the LangChain agent one work unit to process.
The controller records the outcome.
The controller repeats the process until every work unit is terminal.
The controller allows the agent to run the final summary step.
The flow is intentionally small:
Decide whether the request is a repeated workflow.
Discover the queue.
Process exactly one work unit.
Record the outcome.
Repeat until every work unit is terminal.
Run the final summary.
The model still helps with interpretation and tool choice. It no longer owns the loop counter.
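The steps above can be sketched as a routing function over recorded state plus a loop that obeys it. This is a plain-Python illustration, not LangGraph's API and not the project's controller. In LangGraph, a function like `route` would typically serve as the routing function of a conditional edge, with each branch as a node. The names `route`, `run_controller`, `discover`, `process_one`, and `send_summary` are hypothetical, and `process_one` stands in for the call to the LangChain agent.

```python
from typing import Callable


def route(state: dict) -> str:
    """Pick the next step from recorded state, never from model output."""
    if "queue" not in state:
        return "discover"
    if len(state["outcomes"]) < len(state["queue"]):
        return "process"
    if not state["finalized"]:
        return "finalize"
    return "done"


def run_controller(
    discover: Callable[[], list[str]],
    process_one: Callable[[str], str],
    send_summary: Callable[[list], None],
) -> dict:
    state: dict = {"outcomes": [], "finalized": False}
    while (step := route(state)) != "done":
        if step == "discover":
            state["queue"] = list(discover())
        elif step == "process":
            # Exactly one work unit per pass: the next one without an outcome.
            unit = state["queue"][len(state["outcomes"])]
            try:
                status = process_one(unit)  # expected: "completed" or "skipped"
            except Exception:
                status = "failed"  # a failure is still a terminal outcome
            state["outcomes"].append((unit, status))
        else:  # "finalize" is only reachable once every unit has an outcome
            send_summary(state["outcomes"])
            state["finalized"] = True
    return state
```

The loop counter here is `len(state["outcomes"])`. It lives in the controller's state, so the model cannot skip ahead or stop early.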
The workflow needed explicit progress, not just memory
LangChain can maintain conversation context. That was not the gap.
The gap was workflow progress. The app needed a structured record of what work existed, which item was active, what happened to each item, and whether finalization was allowed.
Once LangGraph owned that progress, the workflow became a normal engineering object instead of a hope embedded in a prompt.
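from typing import TypedDict

# WorkUnit and WorkOutcome are the project's own types, defined elsewhere.
class RepeatedWorkflowState(TypedDict, total=False):
    workflow_id: str
    work_units: list[WorkUnit]      # the discovered queue
    current_index: int              # which work unit is active
    completed: list[WorkOutcome]
    skipped: list[WorkOutcome]
    failed: list[WorkOutcome]
    finalized: bool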
That object made the behavior testable:
If the workflow discovers three work units, it should produce three terminal outcomes before finalization.
If one work unit fails, the next work unit should still run.
Finalization should be blocked until the controller has a terminal outcome for every work unit.
Before LangGraph, those were instructions to the model. After LangGraph, they became assertions in the app's test suite.
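As a rough illustration of what such assertions can look like, here is a self-contained sketch. It borrows the field names from `RepeatedWorkflowState`, but it uses plain strings in place of `WorkUnit` and `WorkOutcome`, which this post does not show. The helpers `new_state`, `record_outcome`, `finalize`, and `FinalizationBlocked` are hypothetical, and the tests are written in the style pytest collects.

```python
class FinalizationBlocked(RuntimeError):
    """Raised when finalization is attempted with non-terminal work units."""


def new_state(work_units: list[str]) -> dict:
    # Plain strings stand in for the project's WorkUnit/WorkOutcome types.
    return {"work_units": list(work_units), "current_index": 0,
            "completed": [], "skipped": [], "failed": [], "finalized": False}


def record_outcome(state: dict, status: str) -> None:
    """Record a terminal outcome for the current unit, then advance."""
    if status not in ("completed", "skipped", "failed"):
        raise ValueError(f"not a terminal status: {status}")
    unit = state["work_units"][state["current_index"]]
    state[status].append(unit)
    state["current_index"] += 1


def finalize(state: dict) -> None:
    terminal = len(state["completed"]) + len(state["skipped"]) + len(state["failed"])
    if terminal != len(state["work_units"]):
        raise FinalizationBlocked(f"{terminal}/{len(state['work_units'])} units terminal")
    state["finalized"] = True


def test_three_units_produce_three_terminal_outcomes():
    state = new_state(["doc-1", "doc-2", "doc-3"])
    for status in ("completed", "failed", "completed"):
        record_outcome(state, status)  # the failure does not end the queue
    finalize(state)
    assert state["completed"] == ["doc-1", "doc-3"]
    assert state["failed"] == ["doc-2"]
    assert state["finalized"]


def test_finalization_blocked_until_all_terminal():
    state = new_state(["doc-1", "doc-2", "doc-3"])
    record_outcome(state, "completed")
    try:
        finalize(state)
    except FinalizationBlocked:
        return
    raise AssertionError("finalize() should have been blocked")
```

Neither test calls a model. The completion rules are checked against state alone.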
LangChain still does the model work
The wrong lesson is "LangGraph replaces LangChain."
In this architecture, LangChain still handles the model-tool loop. The agent still gets the prompt, chat history, tools, tracing callbacks, run config, and native tool schemas.
LangGraph decides when to call that agent and what job the agent gets at that point in the workflow:
During discovery, the agent can use queue-discovery tools.
During processing, the agent receives one work unit.
During finalization, the agent can send the summary or report.
That smaller job is better for the model.
Instead of asking the model to remember "do this for every item and do not summarize early," the controller gives it a narrow task:
Process exactly one work unit from the discovered queue.
Do not call final summary, notification, report, or communication tools in this step.
The model still reads the user request, extracts details, chooses tools, compares results, and explains outcomes. The graph handles the order of operations.
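An instruction like "do not call summary tools in this step" is stronger when the summary tools are not on offer at all. One possible way to back it up is to filter the tool list by workflow phase before handing it to the agent. The sketch below assumes a hand-maintained mapping from tool name to phase. The tool names, the `Phase` enum, and `tools_for_phase` are all hypothetical and do not come from the project.

```python
from enum import Enum


class Phase(Enum):
    DISCOVERY = "discovery"
    PROCESSING = "processing"
    FINALIZATION = "finalization"


# Hypothetical tool names and phase assignments, for illustration only.
TOOL_PHASES: dict[str, set[Phase]] = {
    "list_documents": {Phase.DISCOVERY},
    "read_document": {Phase.PROCESSING},
    "create_transaction": {Phase.PROCESSING},
    "send_summary_email": {Phase.FINALIZATION},
}


def tools_for_phase(phase: Phase) -> list[str]:
    """Return only the tool names the agent may see in this phase."""
    return sorted(name for name, phases in TOOL_PHASES.items() if phase in phases)
```

During processing, `tools_for_phase(Phase.PROCESSING)` leaves out `send_summary_email`, so the prompt instruction and the available tools agree.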
When LangChain is enough
You probably do not need LangGraph if every request can be handled as one agent turn.
LangChain may be enough when your app looks like this:
Answer a question from retrieved context
Call one or two tools and respond
Classify an input
Generate a document or message
Run a short assistant flow where failure can be returned directly to the user
In those cases, adding a graph can become ceremony. The agent loop is already the right unit of work.
When LangGraph earns its place
LangGraph starts to pay for itself when the workflow has rules the model should not be responsible for remembering.
Add it when you need one or more of these:
A queue of items that must all be processed.
Branching paths where the next step depends on structured state.
Retries that should not repeat completed work.
Resumability after an interruption.
Checkpoints for auditability or debugging.
A final step that must run only after earlier steps are complete.
The practical test is simple: if you find yourself writing a long prompt that asks the model to act like a project manager, you probably need a graph.
LangChain middleware can still be the right answer for many agent-level concerns, including retries, tool-call policies, and human review of specific tool calls. Reach for LangGraph when those concerns become a workflow with its own state, transitions, and completion rules.
Why not just add more middleware?
LangChain middleware is useful. In this app, it handles agent-level concerns such as retries, tool-call limits, tracing metadata, and policies around tool execution.
But middleware wraps the agent loop. It does not naturally define a business workflow.
Middleware can say, "do not call more than N tools."
It cannot cleanly enforce this sequence: discover all work units, process each one, then allow final notification only after every work unit has a terminal outcome.
This is graph territory.
Middleware protects the loop. LangGraph describes the workflow.
The app needed both.
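To make the contrast concrete, here is a hand-rolled stand-in for the kind of limit described above. It is not LangChain's middleware API. `ToolCallLimiter` and `ToolCallLimitExceeded` are hypothetical names used only for this sketch.

```python
from typing import Any, Callable


class ToolCallLimitExceeded(RuntimeError):
    """Raised when an agent run tries to exceed its tool-call budget."""


class ToolCallLimiter:
    """Caps the total number of tool calls made through wrapped tools."""

    def __init__(self, max_calls: int) -> None:
        self.max_calls = max_calls
        self.calls = 0

    def wrap(self, tool: Callable[..., Any]) -> Callable[..., Any]:
        def limited(*args: Any, **kwargs: Any) -> Any:
            if self.calls >= self.max_calls:
                raise ToolCallLimitExceeded(f"more than {self.max_calls} tool calls")
            self.calls += 1
            return tool(*args, **kwargs)

        return limited
```

The limiter knows how many calls happened. It knows nothing about which work units exist or which of them are terminal, and that is the part the graph has to own.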
Why not hardcode the workflow?
Hardcoding the first demo would have been faster. It also would have trapped the architecture inside one business process.
The useful pattern was generic:
Discover a queue.
Convert queue entries into work units.
Process one work unit at a time.
Record completed, skipped, or failed.
Finalize only after all work units are terminal.
That pattern works for documents, tickets, files, approvals, records, and other requests where the user says "do this for each item."
So the controller works with generic WorkUnit and WorkOutcome objects. The model and tools provide the domain behavior.
That kept the LangGraph layer small and stopped the workflow code from becoming a pile of "if invoice", "if folder", "if payment terms", and "if email summary" branches.
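The post does not show the fields of `WorkUnit` and `WorkOutcome`, so the following is only a guess at what domain-neutral versions could look like. The field names, the `Status` alias, and the `to_work_units` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Literal

Status = Literal["completed", "skipped", "failed"]


@dataclass(frozen=True)
class WorkUnit:
    unit_id: str
    kind: str  # e.g. "document" or "ticket"; opaque to the controller
    payload: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class WorkOutcome:
    unit_id: str
    status: Status
    detail: str = ""


def to_work_units(
    kind: str, entries: list[dict[str, Any]], id_key: str = "id"
) -> list[WorkUnit]:
    """Convert raw queue entries into work units without domain branching."""
    return [WorkUnit(unit_id=str(e[id_key]), kind=kind, payload=e) for e in entries]
```

The controller only reads `unit_id` and `status`. Everything domain-specific stays in `payload`, where the model and tools can interpret it.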
Checkpoints were useful, but they were not the main reason
Checkpoints helped. They made it possible to persist progress locally and inspect what had already happened.
While developing the app, I used SQLite-backed LangGraph checkpoints for local persistence. The runtime stores thread and workflow progress through LangGraph's checkpoint API instead of inventing a separate persistence model.
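In the app, LangGraph's checkpointer does the persistence. The sketch below is not that API. It is a library-free illustration of what "resume without repeating completed work" means once progress is recorded as data. The snapshot shape (unit ids under `work_units`, outcome records with a `unit_id` key) and the function name `remaining_units` are assumptions.

```python
import json


def remaining_units(snapshot_json: str) -> list[str]:
    """Return the unit ids that still need processing after a restart.

    Assumed snapshot shape (hypothetical): "work_units" is a list of ids, and
    "completed", "skipped", and "failed" are lists of {"unit_id": ...} records.
    """
    snapshot = json.loads(snapshot_json)
    terminal = {
        outcome["unit_id"]
        for key in ("completed", "skipped", "failed")
        for outcome in snapshot.get(key, [])
    }
    # Failed units are terminal too, so a resume does not silently retry them.
    return [unit for unit in snapshot["work_units"] if unit not in terminal]
```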
But checkpoints were not the architectural reason for the move.
The architectural reason was control. The app needed a runtime layer that could say:
Discovery must happen before processing.
Processing handles one work unit at a time.
Each work unit must end as completed, skipped, or failed.
Summary tools are unavailable until every work unit is terminal.
Conversation memory can help answer, "what did the user ask earlier?" It does not, by itself, enforce that lifecycle.
The code split followed the responsibility split
The repo became easier to reason about once the architecture had clearer boundaries:
api/ owns FastAPI routes and request/response wiring.
src/rag_agent/runtime/ owns chat runtime orchestration.
src/rag_agent/workflows/ owns reusable LangGraph workflow controllers.
That split sounds mundane, but it changed how bugs got fixed.
Before the split, route code, runtime code, and workflow logic were too close together. A product bug could easily turn into another condition in the wrong layer.
After the split, the placement question became sharper:
Is this HTTP behavior? Put it in api.
Is this chat-mode orchestration? Put it in runtime.
Is this repeated workflow control? Put it in workflows.
Good architecture often feels boring after you name the boundaries.
What I would tell another team
Do not start with LangGraph just because agents are involved. Start with the simplest shape that matches the job.
If the job is "answer this request with tools," LangChain is usually the right place to begin.
If the job is "move this process through explicit steps with rules about what can happen next," add LangGraph before prompts become your workflow engine.
The clean mental model became:
LangChain for model invocation, tools, middleware, and native tool calling.
LangGraph for orchestration, loops, branching, checkpoints, and completion rules.
Langfuse for inspecting the agentic run.
OpenTelemetry for inspecting the service around it.
FastAPI for keeping the external API contract stable.
The move to LangGraph was not a rejection of LangChain. LangGraph was the next layer after the agent's job changed.
When the agent only needed to answer, LangChain was enough. When the agent needed to finish every item before sending a summary, the workflow needed an owner.
LangGraph became that owner.
FAQ
Does LangGraph replace LangChain?
No. In this architecture, LangChain still handles the model-tool loop. LangGraph wraps workflow control around that loop when a request needs repeated work, branches, explicit transitions, or completion rules.
When should I add LangGraph to a LangChain agent?
Add it when the workflow needs branching, resumability, multi-step control, or repeated processing across a queue. If every request can be handled as one agent turn, LangChain may be enough.
Why not keep everything in the prompt?
Prompts can describe requirements, but they do not enforce state transitions. If a final summary tool is available too early, the model can call it too early. A graph can make that impossible by keeping the workflow out of the finalization step until all work units are terminal.
Is this about LangChain not having memory?
No. LangChain can keep conversation context, and LangChain agents are built on LangGraph under the hood. The reason to use LangGraph directly is control over the workflow: explicit steps, branching, queue progress, and rules that prevent the final summary from running too early.
Is LangGraph worth it for small apps?
Not always. If the app is a simple chatbot or a single-tool assistant, a graph may be extra machinery. The trade-off changes when the agent starts doing work with partial completion, retries, failure handling, or finalization rules.
