SQL style guide

General guidelines

Optimize primarily for readability, maintainability, and robustness rather than for fewer lines of code.

Newlines are cheap; people's time is expensive.

Avoid having enormous select statements.

If a select statement is so large it can't be easily comprehended, it would be better to refactor it into multiple smaller CTEs that are later joined back together.

Lines should ideally not be longer than 120 characters.

Very long lines are harder to read, especially in situations where space may be limited like on smaller screens or in side-by-side version control diffs.

Identifiers such as aliases and CTE names should be in lowercase snake_case.

It's more readable, easier to keep consistent, and avoids having to quote identifiers due to capitalization, spaces, or other special characters.

Never use reserved words as identifiers.

Otherwise the identifier will have to be quoted everywhere it's used.

Never use tab characters.

It's easier to keep things consistent in version control when only space characters are used.

Syntax

Keywords and function names should all be lowercase.

Lowercase is more readable than uppercase, and you won't have to constantly be holding down a shift key.

Use != instead of <>.

!= reads like "not equal" which is closer to how we'd say it out loud.

Use || instead of concat.

|| is a standard SQL operator, and in some databases like Redshift concat only accepts two arguments.

Use coalesce instead of ifnull or nvl.

  • coalesce is universally supported, whereas Redshift doesn't support ifnull and BigQuery doesn't support nvl.

  • coalesce is more flexible and accepts an arbitrary number of arguments.

Use is null instead of isnull, and is not null instead of notnull.

isnull and notnull are specific to Redshift.

Use a case statement instead of iff or if.

case statements are universally supported, whereas Redshift doesn't support iff, and in BigQuery the function is named if instead of iff.

Always use the as keyword when aliasing columns, expressions, and tables.

Always alias grouping aggregates and other column expressions.

Use where instead of having when either would suffice.

Queries filter on the where clause earlier in their processing, so where filters are more performant.

Use union all instead of union unless duplicate rows really do need to be removed.

union all is more performant because it doesn't have to sort and de-duplicate the rows.

Use select distinct instead of grouping by all columns.

This makes the intention clear.

Avoid using an order by clause unless it's necessary to produce the correct result.

There's no need to incur the performance hit. If consumers of the query need the results ordered they can normally do that themselves.

For functions that take date part parameters, specify the date part as a string rather than a keyword.

  • While some advanced SQL editors can helpfully auto-complete and validate date part keywords, if they get it wrong they'll show superfluous errors.

  • Less advanced SQL editors won't syntax highlight date part keywords, so using strings helps them stand out.

  • Using a string makes it unambiguous that it's not a column reference.

Always use /* */ comment syntax.

This allows single-line comments to naturally expand into multi-line comments without having to change their syntax.

When expanding a comment into multiple lines:

  • Keep the opening /* on the same line as the first comment text and the closing */ on the same line as the last comment text.

  • Indent subsequent lines by 4 spaces, and add an extra space before the first comment text to align it with the text on subsequent lines.

Use single quotes for strings.

Some SQL dialects like BigQuery support using double quotes or even triple quotes for strings, but for most dialects:

  • Double quoted strings represent identifiers.

  • Triple quoted strings will be interpreted like the value itself contains leading and trailing single quotes.

Joins

Don't use using in joins.

  • Having all joins use on is more consistent.

  • If additional join conditions need to be added later, on is easier to adapt.

  • using can produce inconsistent results with outer joins in some databases.

Use inner join instead of just join.

It's better to be explicit so that the join type is crystal clear.

In join conditions, put the table that was referenced first immediately after on.

This makes it easier to determine if the join is going to cause the results to fan out.

When joining multiple tables, always prefix the column names with the table name/alias.

You should be able to tell at a glance where a column is coming from.

When inner joining, put filter conditions in the where clause instead of the join clause.

Only join conditions should be put in a join clause. All filter conditions should be put together in the where clause.

CTEs

  • Where performance permits, CTEs should perform a single, logical unit of work.

  • CTE names should be as verbose as needed to convey what they do.

  • CTE names should not be prefixed or suffixed with cte.

  • CTEs with confusing or notable logic should be commented.

Use CTEs rather than subqueries.

CTEs will make your queries more straightforward to read/reason about, can be referenced multiple times, and are easier to adapt/refactor later.

Naming

Name single-column primary keys id.

This allows us to easily tell at a glance whether a column is a primary key, helps us discern whether joins are one-to-many or many-to-onearrow-up-right, and is more succinct than other primary key naming conventions (particularly in join conditions).

Date/time column names:

  • Date columns based on UTC should be named like <event>_date.

  • Date columns based on a specific timezone should be named like <event>_date_<timezone indicator> (e.g. order_date_et).

  • Date+time columns based on UTC should be named like <event>_at.

  • Date+time columns based on a specific timezone should be named like <event>_at_<timezone indicator> (e.g created_at_pt).

  • US timezone indicators:

    • et = Eastern Time.

    • ct = Central Time.

    • mt = Mountain Time.

    • pt = Pacific Time.

Boolean column names:

  • Boolean columns should be prefixed with a present or past tense third-person singular verb, such as:

    • is_ or was_.

    • has_ or had_.

    • does_ or did_.

Columns which represent numeric values with a known unit should be suffixed with that unit.

Some examples:

  • price_usd

  • weight_oz

  • weight_kg

  • weight_grams

Avoid using unnecessary table aliases, especially initialisms.

Suggested guidelines:

  • If the table name consists of 3 words or less don't alias it.

  • Use a subset of the words as the alias if it makes sense (e.g. if partner_shipments_order_line_items is the only line items table being referenced it could be reasonable to alias it as just line_items).

Formatting

An overarching pattern is:

  • If there's only one thing, put it on the same line as the opening keyword.

  • If there are multiple things, put each one on its own line (including the first one), indented one level more than the opening keyword.

Left align everything.

This is easier to keep consistent, and is also easier to write.

Indents should generally be 4 spaces.

Never end a line with an operator like and, or, +, ||, etc.

If code containing such operators needs to be split across multiple lines, put the operators at the beginning of the subsequent lines.

  • You should be able to scan the left side of the query text to see the logic being used without having to read to the end of every line.

  • The operator is only there for/because of what follows it. If nothing followed the operator it wouldn't be needed, so putting the operator on the same line as what follows it makes it clearer why it's there.

Using leading commas.

If code containing commas needs to be split across multiple lines, put the commas at the beginning of the subsequent lines, followed by a space.

  • This makes it easier to spot missing commas.

  • Version control diffs will be cleaner when adding to the end of a list because you don't have to add a trailing comma to the preceding line.

  • The comma is only there for/because of what follows it. If nothing followed the comma it wouldn't be needed, so putting the comma on the same line as what follows it makes it clearer why it's there.

select clause:

  • If there is only one column expression, put it on the same line as select.

  • If there are multiple column expressions, put each one on its own line (including the first one), indented one level more than select.

  • If there is a distinct qualifier, put it on the same line as select.

from clause:

  • Put the initial table being selected from on the same line as from.

  • If there are other tables being joined:

    • Put each join on its own line, at the same indentation level as from.

    • If there is only one join condition, put it on the same line as the join.

    • If there are multiple join conditions, end the join line with on and put each condition on its own line (including the first one), indented one level more than the join.

where clause:

  • If there is only one condition, put it on the same line as where.

  • If there are multiple conditions, put each one on its own line (including the first one), indented one level more than where.

group by and order by clauses:

  • If grouping/ordering by column numbers, put all numbers on the same line as group by/order by.

  • If grouping/ordering by column names/aliases:

    • If there is only one column, put it on the same line as group by/order by.

    • If there are multiple columns, put each on its own line (including the first one), indented one level more than group by/order by.

CTEs:

  • Start each CTE on its own line, indented one level more than with (including the first one, and even if there is only one).

  • Use a single blank line around CTEs to add visual separation.

  • Put any comments about the CTE within the CTE's parentheses, at the same indentation level as the select.

case statements:

  • You can put a case statement all on one line if it only has a single when clause and doesn't cause the line's length to be too long.

  • For multi-line case statements:

    • when:

      • when clauses should start on their own line, indented one level more than the case statement.

      • If a when clause has multiple conditions, keep the first condition on the same line as when and put subsequent conditions on their own lines.

      • If a when clause has multiple lines, all its subsequent lines should be indented at least one level more than when.

    • then:

      • then clauses can go on the same line as a single-line when clause if it doesn't cause the line's length to be too long.

      • Otherwise, then clauses should go on their own line, indented one level more than its associated when.

      • If a then clause has multiple lines, all its subsequent lines should be indented at least one level more than then.

    • else:

      • else clauses should go on their own line, at the same indentation level as the when clauses.

      • If an else clause has multiple lines, all its subsequent lines should be indented at least one level more than else.

    • end:

      • end should go on its own line, at the same indentation level as case.

      • If the case starts after a leading comma and space, align end with case by adding two extra spaces before it.

    • If a multi-line case statement is within a function call, case and end should go on their own lines, indented one level more than the function call.

  • If using case <expression> syntax, keep the expression on the same line as case.

Window functions:

  • You can put a window function all on one line if it doesn't cause the line's length to be too long.

  • If breaking a window function into multiple lines:

    • Put each sub-clause within over () on its own line, indented one level more than the window function:

      • partition by

      • order by

      • rows between / range between

    • Put the closing over () parenthesis on its own line at the same indentation level as the window function.

in lists:

  • Break long lists of in values into multiple indented lines with one value per line.

Don't put extra spaces inside of parentheses.

Last updated