Prompt Engineering for SQL Agents: What Actually Works

When you ask a large language model to write SQL, you quickly discover why it is a hard prompting problem. The model must output syntactically correct statements, obey the exact dialect of your PostgreSQL database, and avoid injection-prone constructions — all from an ambiguous natural-language request. A missing underscore or a stray backtick can cause the whole query to fail, and a single wrong table name leads to an empty result set that leaves the user frustrated.

Start with intent classification

The first guard we put in place is intent classification. Before we even consider building a query we ask a tiny model whether the user is asking for data or simply chatting. A one-word response — "query" or "chat" — lets us short-circuit the expensive pipeline for casual conversation.

In practice the obvious cases are caught with cheap heuristics: the presence of a question mark, common data verbs, or the words "how many", "list", or "average". Those simple checks save tokens and keep the database untouched when a user just wants to say hello.

Table selection before SQL generation

If the intent is a data question, the next step is table selection. Giving the language model the full PostgreSQL catalog is wasteful — the prompt balloons, token costs rise, and the model can become confused by irrelevant schema.

Instead we feed it a succinct list of candidate tables — usually a handful that match the user's keywords — and ask it to return a comma-separated list of the ones it thinks are relevant. The response is easy to parse, and the cost of a missing table is far lower than the cost of passing an entire schema that may be hundreds of tables long.

The SQL generation prompt

With the relevant tables in hand we move to SQL generation. The prompt we have found most reliable starts with a clear role definition: the model is a SQL-only assistant that must never produce DML statements. We follow that with explicit rules:

Use PostgreSQL dialect
Never exceed the row limit
Always wrap user-supplied values in $1, $2 placeholders
Never produce INSERT, UPDATE, DELETE, or DROP

The schema is injected directly after the rules, formatted as a series of column listings plus a few example rows per table. Those sample rows matter — they tell the model that a column called status contains "active" and "churned", not 1 and 0.

When a retry is needed, we prepend the previous error message at the very top of the prompt. This makes the model see the failure context immediately, rather than burying it after the schema where it tends to be ignored.

We also keep the answer-generation prompt separate. We never ask the model to explain its own syntax errors — instead we give it just the result set and let it turn that into a human-readable paragraph.

Common failure modes and fixes

Even a well-crafted prompt will sometimes be tripped up. Here are the failures we hit most often and how we handle them.

Markdown code fences in the output. The model frequently wraps its SQL in triple backticks despite instructions not to. The fix is simple post-processing: strip any leading and trailing fenced blocks before the query reaches the validator.

Hallucinated table names. Static analysis of the generated SQL catches references to tables that don't exist in the schema. We feed that error back into the regeneration node with the hallucinated name highlighted.

The model including the error message in the answer. Occasionally when we ask the model to answer from the result set it will surface the previous SQL error instead. Ensuring the answer node receives only the raw result rows — not the SQL text or the error — keeps the final response focused on the data.

Ignored LIMIT clauses. The trick is to make the limit non-negotiable: state it in uppercase and provide a concrete number (always include LIMIT 100). When the model still omits it, the validator rejects the query and triggers a retry. After a few failed attempts we fall back to a safe default — a generic SELECT * FROM <table> LIMIT 10 — which guarantees a response even if the model cannot parse the original request.

Prompts alone are not enough

All of this reinforces a simple but often overlooked truth: prompts alone are not enough. Robust validation, layered retry logic, and disciplined post-processing do more for reliability than any amount of prompt tuning. The prompt is the first line of defense, but the surrounding plumbing — schema caching, static analysis, error-aware re-prompting — is what keeps broken SQL from reaching your PostgreSQL database.

When these pieces work together, a developer can hand a user a clean chat experience that reliably turns natural language into safe, dialect-correct SQL without the user ever seeing an error message or writing a single line of code.

Ready to see it in action? Connect your PostgreSQL database, paste your bring your own API key key, and let your team ask the first question. Check our documentation to get started or view pricing options.

Start free trial →