04. Text-to-SQL
Converting natural language questions to SQL queries enables non-technical users to access data directly. Local models can generate SQL without data leaving the environment, addressing privacy concerns that prevent using external services for query generation.
Text-to-SQL involves understanding the question, mapping concepts to table structures, and constructing valid SQL syntax. Local models can learn schema details and generate queries that respect table relationships and constraints.
Schema-Aware Query Generation
Effective query generation requires the model to understand table structures, column meanings, and relationships. Providing schema context significantly improves query accuracy.
import ollama
import sqlite3
def generate_sql(question: str, schema: str, examples: list = None) -> str:
"""Generate SQL query from natural language question."""
prompt = f"""Given this database schema:
{schema}
Generate a SQL query for this question:
{question}
Return ONLY the SQL query, no explanation."""
if examples:
prompt = f"Examples of correct queries:\n{examples}\n\n{prompt}"
response = ollama.chat(
model='llama3.2',
messages=[{'role': 'user', 'content': prompt}]
)
return response['message']['content'].strip()
# Example schema
schema = """
CREATE TABLE customers (
id INTEGER PRIMARY KEY,
name TEXT,
signup_date DATE,
region TEXT
);
CREATE TABLE orders (
id INTEGER PRIMARY KEY,
customer_id INTEGER REFERENCES customers(id),
order_date DATE,
total DECIMAL(10,2),
status TEXT
);
"""
# Generate query
question = "How many customers signed up in each region last month?"
sql = generate_sql(question, schema)
print(sql)
The model should return something like:
SELECT c.region, COUNT(*) as customer_count
FROM customers c
WHERE c.signup_date >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month')
AND c.signup_date < DATE_TRUNC('month', CURRENT_DATE)
GROUP BY c.region
Validating Generated Queries
Models sometimes generate syntactically valid but semantically incorrect SQL. Validation catches these errors before they cause problems.
def validate_sql(sql: str, conn: sqlite3.Connection) -> dict:
"""Validate SQL query syntax and execution."""
result = {
'valid': False,
'error': None,
'row_count': 0,
'sample_results': None
}
try:
cursor = conn.cursor()
cursor.execute(sql)
# Verify execution succeeds
columns = [desc[0] for desc in cursor.description]
rows = cursor.fetchmany(5) # Fetch sample results
result['valid'] = True
result['columns'] = columns
result['sample_results'] = rows
except Exception as e:
result['error'] = str(e)
return result
# Test with SQLite
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE test (a INTEGER, b TEXT)")
conn.execute("INSERT INTO test VALUES (1, 'x'), (2, 'y')")
# Validate a generated query
validation = validate_sql("SELECT * FROM test", conn)
print(f"Valid: {validation['valid']}")
print(f"Columns: {validation.get('columns')}")
Handling Query Failures
Common failure modes include incorrect table references, missing JOIN conditions, wrong aggregation logic, and misunderstood column meanings. Debugging involves comparing generated SQL against expected results and providing corrective feedback.
def debug_sql(question: str, sql: str, error: str, schema: str) -> str:
"""Debug failed SQL generation."""
prompt = f"""Question: {question}
Generated SQL: {sql}
Error: {error}
Schema: {schema}
What is wrong with this query? Provide corrected SQL with explanation."""
response = ollama.chat(
model='llama3.2',
messages=[{'role': 'user', 'content': prompt}]
)
return response['message']['content']
Iterative debugging with specific error feedback produces progressively better queries.
Create a small SQLite database with 3-4 related tables. Generate queries for 10 different questions. Identify which queries fail and why. Build a prompt template that improves generation accuracy.