Tips SQL formatting database developer

SQL Formatting Guide: How to Write Clean, Maintainable SQL Queries

· 7 min read · Max P

SQL is probably the most widely written programming language in the world, yet it's also the most commonly written badly. Developers who would never ship a 200-line Python function without structure routinely write massive SQL queries as impenetrable walls of text. The result: queries that are hard to debug, impossible to review, and a nightmare to modify six months later.

This guide covers the formatting conventions that make SQL readable, maintainable, and review-friendly. Whether you're writing analytics queries, backend ORM escapes, or migration scripts, these practices will make your SQL dramatically clearer. Use our SQL Formatter to auto-format any query instantly.

Why SQL Formatting Matters

Consider this query that a data analyst might encounter in a production codebase:

select u.id,u.name,u.email,count(o.id) as order_count,sum(o.total) as lifetime_value from users u join orders o on u.id=o.user_id where u.created_at>='2024-01-01' and o.status='completed' group by u.id,u.name,u.email having sum(o.total)>1000 order by lifetime_value desc limit 50;

Now compare the formatted version:

SELECT u.id, u.name, u.email, COUNT(o.id) AS order_count, SUM(o.total) AS lifetime_value FROM users AS u JOIN orders AS o ON u.id = o.user_id WHERE u.created_at >= '2024-01-01' AND o.status = 'completed' GROUP BY u.id, u.name, u.email HAVING SUM(o.total) > 1000 ORDER BY lifetime_value DESC LIMIT 50;

Same query, same result. But the second version is immediately scannable — you can see the columns, the join, the filters, and the aggregation at a glance. When a bug report comes in about incorrect totals, the formatted version takes seconds to debug; the unformatted version takes minutes.

Keyword Capitalization

The most visible formatting choice is whether SQL keywords are uppercase or lowercase. There are three common conventions:

Style Example Used By
UPPERCASE keywords SELECT id FROM users WHERE active = TRUE Most SQL style guides, SQLFluff default, traditional convention
lowercase keywords select id from users where active = true dbt community, modern data teams
Mixed (Title Case) Select id From users Where active = True Rare — avoid this

Recommendation: Use UPPERCASE for SQL keywords. It provides the clearest visual distinction between keywords and identifiers. The dbt community prefers lowercase, which is also valid — the key is consistency across your team. This guide uses UPPERCASE in all examples.

Indentation and Line Breaks

The core principle: each major clause starts on a new line, and continuation lines are indented.

SELECT Clause

Put each column on its own line. This makes it easy to add, remove, or reorder columns and produces clean diffs in version control:

-- BAD: hard to see all columns, messy diffs SELECT u.id, u.name, u.email, u.created_at, u.plan_type -- GOOD: one column per line SELECT u.id, u.name, u.email, u.created_at, u.plan_type

JOIN Clauses

Each JOIN gets its own line. The ON condition is indented beneath it:

FROM orders AS o JOIN users AS u ON o.user_id = u.id LEFT JOIN products AS p ON o.product_id = p.id AND p.active = TRUE

Always specify the JOIN type explicitly (JOIN, LEFT JOIN, INNER JOIN, CROSS JOIN). Never use implicit joins with comma-separated tables in the FROM clause — they're harder to read and easier to accidentally cross-join.

-- BAD: implicit join (old syntax, error-prone) SELECT * FROM orders o, users u WHERE o.user_id = u.id -- GOOD: explicit join (clear intent) SELECT * FROM orders AS o JOIN users AS u ON o.user_id = u.id

WHERE Clause

Place each condition on its own line with the logical operator at the start:

WHERE u.status = 'active' AND u.created_at >= '2024-01-01' AND u.plan_type IN ('pro', 'enterprise') AND u.email NOT LIKE '%@test.com'

Placing AND at the start of the line (rather than the end) makes it trivial to comment out individual conditions during debugging:

WHERE u.status = 'active' -- AND u.created_at >= '2024-01-01' -- temporarily disabled AND u.plan_type IN ('pro', 'enterprise')

CASE Expressions

CASE statements should be formatted with each WHEN on its own indented line:

SELECT u.name, CASE WHEN u.lifetime_value >= 10000 THEN 'platinum' WHEN u.lifetime_value >= 5000 THEN 'gold' WHEN u.lifetime_value >= 1000 THEN 'silver' ELSE 'bronze' END AS customer_tier

CTEs vs. Subqueries

Common Table Expressions (CTEs, aka WITH clauses) are one of the biggest improvements you can make to SQL readability. They let you break a complex query into named, logical steps.

Subquery Version (Hard to Follow)

SELECT user_segments.segment, COUNT(*) AS user_count, AVG(user_segments.total_spent) AS avg_spent FROM ( SELECT u.id, SUM(o.total) AS total_spent, CASE WHEN SUM(o.total) >= 5000 THEN 'high_value' WHEN SUM(o.total) >= 1000 THEN 'medium_value' ELSE 'low_value' END AS segment FROM users AS u JOIN orders AS o ON u.id = o.user_id WHERE o.status = 'completed' AND o.created_at >= '2024-01-01' GROUP BY u.id ) AS user_segments GROUP BY user_segments.segment ORDER BY avg_spent DESC;

CTE Version (Much Clearer)

WITH user_spending AS ( SELECT u.id, SUM(o.total) AS total_spent FROM users AS u JOIN orders AS o ON u.id = o.user_id WHERE o.status = 'completed' AND o.created_at >= '2024-01-01' GROUP BY u.id ), user_segments AS ( SELECT id, total_spent, CASE WHEN total_spent >= 5000 THEN 'high_value' WHEN total_spent >= 1000 THEN 'medium_value' ELSE 'low_value' END AS segment FROM user_spending ) SELECT segment, COUNT(*) AS user_count, AVG(total_spent) AS avg_spent FROM user_segments GROUP BY segment ORDER BY avg_spent DESC;

The CTE version reads top-to-bottom like a recipe: first calculate spending, then assign segments, then aggregate. Each step is testable independently — you can run just the user_spending CTE to verify intermediate results.

When to use subqueries instead: For simple, one-off inline lookups (WHERE id IN (SELECT ...)) or correlated subqueries that reference the outer query, subqueries are sometimes cleaner. Use CTEs when you have multiple transformation steps or when the same derived result is referenced more than once.

Common Anti-Patterns to Avoid

1. SELECT *

-- BAD: fetches all columns — breaks when schema changes, -- wastes bandwidth, makes joins ambiguous SELECT * FROM users JOIN orders ON users.id = orders.user_id -- GOOD: explicit columns — self-documenting, predictable SELECT u.id, u.name, o.id AS order_id, o.total FROM users AS u JOIN orders AS o ON u.id = o.user_id

SELECT * is fine for quick ad-hoc exploration. In production queries, views, and application code, always list columns explicitly.

2. Missing Table Aliases

-- BAD: ambiguous — which table does "name" come from? SELECT name, total FROM users JOIN orders ON users.id = orders.user_id -- GOOD: aliased and qualified SELECT u.name, o.total FROM users AS u JOIN orders AS o ON u.id = o.user_id

Always use short, meaningful aliases (u for users, o for orders, p for products). Qualify every column with its alias, even when there's no ambiguity — it tells the reader exactly where each column comes from.

3. Implicit Type Coercion

-- BAD: comparing string to integer — implicit cast, potential index bypass WHERE user_id = '123' -- GOOD: use the correct type WHERE user_id = 123

4. NOT IN With NULLs

-- DANGEROUS: if the subquery returns any NULL, the entire -- NOT IN evaluates to UNKNOWN and returns zero rows SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM blocked_users) -- SAFE: NOT EXISTS handles NULLs correctly SELECT * FROM users AS u WHERE NOT EXISTS ( SELECT 1 FROM blocked_users AS b WHERE b.user_id = u.id )

5. Functions on Indexed Columns

-- BAD: wrapping the column in a function prevents index usage WHERE YEAR(created_at) = 2024 -- GOOD: compare against a range (index-friendly) WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01'

Formatting Complex Queries Step by Step

Here's a systematic approach to formatting any complex query:

  1. Start with the goal. Write a one-line comment explaining what the query produces: -- Monthly revenue by product category for active customers
  2. Identify the logical steps. Break the problem into pieces: "First get completed orders, then join with products, then aggregate by month and category."
  3. Write each step as a CTE. Give each CTE a descriptive name (completed_orders, order_products, monthly_revenue).
  4. Format each CTE independently. Each one should be a clean, simple query.
  5. Write the final SELECT. The final query should be short and obvious, simply aggregating or joining the CTEs.

The result reads like a narrative: "here's the data, here's how I filtered it, here's how I transformed it, and here's the output."

-- Monthly revenue by product category for active customers WITH completed_orders AS ( SELECT o.id AS order_id, o.user_id, o.total, o.created_at, DATE_TRUNC('month', o.created_at) AS order_month FROM orders AS o JOIN users AS u ON o.user_id = u.id WHERE o.status = 'completed' AND u.status = 'active' ), order_products AS ( SELECT co.order_month, p.category, oi.quantity * oi.unit_price AS line_total FROM completed_orders AS co JOIN order_items AS oi ON co.order_id = oi.order_id JOIN products AS p ON oi.product_id = p.id ) SELECT order_month, category, SUM(line_total) AS revenue, COUNT(DISTINCT order_month) AS months_active FROM order_products GROUP BY order_month, category ORDER BY order_month DESC, revenue DESC;

SQL Formatting Checklist

For automated enforcement, use SQLFluff as a linter in your CI pipeline, or paste any query into our SQL Formatter for instant reformatting.

Frequently Asked Questions

Should SQL keywords be uppercase or lowercase?

Either works — the key is consistency. UPPERCASE keywords is the traditional convention endorsed by most SQL style guides (SQL Style Guide by Simon Holywell, SQLFluff defaults, Mazur's SQL style guide). It provides a clear visual separation between keywords and identifiers. The dbt community prefers lowercase, arguing it's faster to type and less "shouty." Pick whichever your team agrees on and enforce it with a linter.

Are CTEs slower than subqueries?

In most modern databases (PostgreSQL 12+, MySQL 8.0+, SQL Server, BigQuery, Snowflake), the query optimizer inlines CTEs and produces the same execution plan as an equivalent subquery. In older PostgreSQL versions (before 12), CTEs were treated as optimization fences and could be slower. If you're on a current version of any major database, use CTEs freely — the readability benefit is significant and the performance is identical. Always check with EXPLAIN ANALYZE if you're concerned about a specific query.

Should I use trailing commas or leading commas in column lists?

Trailing commas (column1, at the end) are the standard convention and what most formatters produce. Some analysts prefer leading commas (, column1 at the start) because it makes commenting out the last column easier and produces slightly cleaner git diffs. Both are valid — pick one and be consistent. This guide uses trailing commas, which is the more widely adopted style.

How do I format really long CASE expressions?

If a CASE expression has many branches, consider extracting the logic into a CTE or a lookup table. A CASE with 15+ WHEN clauses is a code smell — it often means the mapping belongs in a dimension table that you JOIN against, not in procedural SQL logic. For moderate-length CASE expressions (3-8 branches), the standard WHEN/THEN/ELSE indentation shown in this guide works well. Align the THEN keywords vertically for extra readability.

What SQL formatter or linter should I use?

SQLFluff is the most popular open-source SQL linter. It supports multiple SQL dialects (PostgreSQL, MySQL, BigQuery, Snowflake, etc.) and is highly configurable. Install it with pip install sqlfluff and add it to your CI pipeline. For quick one-off formatting, use our SQL Formatter in the browser. Other good options include pgFormatter for PostgreSQL-specific formatting and Prettier with the SQL plugin for teams already using Prettier.