← Back to blog

Building a Production SQL Agent with LangGraph

How we built a production-ready SQL agent using LangGraph, covering pipeline architecture, retries, schema caching, and the hard lessons we learned shipping it.

Building a Production SQL Agent with LangGraph

When you stare at a relational database you quickly realize that most people never write SQL. They have data, they have questions, and they expect answers. The friction of learning a query language is the first barrier — it costs time and talent. A SQL agent removes that barrier by exposing a natural-language interface: you type a question and the system translates it into a query, runs it, and returns a plain-English answer.

A naïve approach would be to send the user prompt straight to a large language model and ask it to produce SQL. In practice that single call is brittle. The model can hallucinate table names, forget a required WHERE clause, or generate a dialect your database rejects. That's where LangGraph becomes valuable. Rather than a single monolithic prompt, LangGraph lets you assemble a stateful graph of nodes — each node performs a focused task and passes its result along an edge to the next. Because each step is isolated, you can add validation, retries, and conditional routing without rewriting a massive prompt.

The pipeline

The pipeline we settled on mirrors a classic conversational query flow. First, an intent classifier decides whether the user is asking about data or just making small talk. This matters because running a full pipeline for casual chatter wastes tokens and can inadvertently hit your database.

If the intent is a data question, we move to table selection. Handing the LLM every table in the catalog inflates the prompt and confuses the model. By filtering down to a handful of likely tables we keep the token budget low and improve accuracy.

Next, we retrieve the schema for those tables. Instead of hitting the database on every request we cache the DDL and a few sample rows. The cached schema gives the model the context it needs — column names, types, and example data — without the latency of a live query.

With that context in hand, the SQL generation node crafts a dialect-aware statement. The prompt explicitly tells the model to avoid INSERT, UPDATE, DELETE and to include a sensible LIMIT clause. We also embed a small injection-hardening instruction that forces the model to treat any user-supplied value as a parameter, not raw text.

Static analysis before execution

Before we ever send the statement to the database we run static analysis. A quick parse catches hallucinated tables, disallowed DML commands, and multiple statements that could be a security risk. Only if the query passes these checks do we execute it.

Any runtime errors are caught and routed back into a regeneration node. That node attaches the error message to the top of the prompt and asks the model to produce a corrected query. Retry counters are scoped per node — a validation failure retries up to three times, an execution failure has a tighter budget of two, since those errors tend to require more structural changes.

Answer generation

Finally, the answer generation node receives the raw result set and the original natural-language question. By separating result formatting from the query step we keep the LLM focused on turning rows into a concise English summary. The user sees a friendly paragraph rather than a table of numbers.

Lessons learned in production

Building this in production taught us a few hard lessons.

Schema caching gave us the biggest latency win. Without it, fetching DDL and sample rows on every request added 400–800ms to the first hop. With a warm in-memory cache keyed by (connection_id, table_name), schema retrieval completes in under 10ms. We invalidate on a 24-hour TTL and when the user manually refreshes their schema.

Retry counters must be scoped per node, not per conversation turn. Global retry state means an earlier unrelated failure can exhaust the budget before the real problem occurs.

Prompt injection is real. We now run a cheap heuristic pre-check that strips suspicious characters before the model ever sees the user input. The static analysis node catches multi-statement payloads, but defense in depth matters.

LangChain tools surface errors as plain strings, not exceptions. You have to inspect the result text to decide whether to retry. Without an explicit is_error() check, the graph will silently treat an error message as a valid SQL result and pass garbage to the answer generation node.

The result

All of this machinery lets a developer expose their data behind a simple chat window. Users never touch SQL, they never see error stacks, and the system remains safe and performant. The stateful graph gives you the observability to see exactly where a query failed — which makes debugging in production far less painful than debugging a monolithic prompt.

Ready to try it yourself? Connect your PostgreSQL database, paste your bring your own API key key, and let your team ask the first question. Check our documentation to get started or view pricing options.

Start free trial →