How to generate a series of numbers in SQL
Generating a sequence of numbers in a query is fairly hard and time consuming in SQL, but super useful for things like generating data and avoiding loops. A common solution is to use a prebuilt numbers table.
At Narrator we frequently use a sequence to generate test data, so we have a numbers table of our own:
number |
---|
1 |
2 |
3 |
... |
9999999 |
We generated the data in Python as a dict – we have a backend system that can insert tables.
But what if you wanted to do this in SQL? Here's how to do it for a few different warehouses.
General-purpose SQL
The following approach uses a cross join to generate roughly 60m numbers. It's very standard and should work in nearly all warehouses. I've tested it on Azure SQL, BigQuery, Postgres, Redshift, and Snowflake.
It's easy to tune it to create a desired number of rows – either add a limit or adjust the number of cross joins. Each new select will multiply the total row count by 36.
Warehouse-specific approaches
Some warehouses have their own ways to directly create sequences of numbers. Those queries are a bit easier to understand but otherwise will have the same result.
Snowflake
Snowflake has a generator function to do this automatically. The following code is basically straight from the docs. Here we're selecting the seq4()
function which generates a 4-byte integer sequence.
Postgres
Postgres supports generate_series() so this is fairly straightforward.
SELECT * FROM generate_series(0,10000000);
I haven't tested the performance of this but if you're using numbers this large it's probably best to create a table or materialized view.
Redshift
For Redshift the general cross-join SQL is the best to use.
Some people also use generate_series(), but that's officially unsupported and won't work on queries that insert data.
Azure SQL
I'm not aware of any specific way to generate numbers with Microsoft SQL Server. The general-purpose cross-join approach is the best.
Big Query
StackOverflow has an answer suggesting the use of the generate_array
function but I haven't had a chance to try out this approach. It looks like it's limited to about 1M rows per call – and I have no idea how it performs.
Other
If you have a pointer to a snippet for any other warehouses please feel free to contact me and I'll update the post.