
How We Built a Snowflake-Style ID Generator in PostgreSQL

Milan Ghoghari · SDE-2
Mar 18, 2026 · 6 min read

TL;DR

  • UUID's random structure caused index fragmentation and made time-based sorting expensive on our orders table
  • We built a custom 64-bit ID generator inside PostgreSQL, modeled on Twitter's Snowflake
  • The ID encodes a millisecond timestamp (42 bits) and a sequence counter (12 bits), producing up to 4,096 unique IDs per millisecond
  • IDs are monotonically increasing, so ORDER BY id DESC replaces ORDER BY created_at DESC without an extra index
  • BIGINT is 8 bytes vs UUID's 16 bytes -- half the storage, better cache utilization, faster index operations
  • The entire implementation is 10 lines of SQL with no external service or dependency

Why We Needed Better IDs

We are building an order and product management platform for ecommerce -- the kind of backend that handles orders, KYC records, invoices, and billing events across multiple vendors at scale. At 10,000 orders per day, with the expectation that this grows, the choice of primary key strategy affects insert performance, query latency, and index maintenance in ways that compound over time.

When we started, we defaulted to UUID v4. It is the safe choice: globally unique, no collisions, works everywhere. We used it as most teams do:

id UUID PRIMARY KEY DEFAULT gen_random_uuid()

The problems showed up gradually.

The Problem with UUID

UUID v4 is random by design. Every new row gets an ID that lands somewhere arbitrary in the B-tree index. At low volume this is invisible. As our orders table grew, we started running into two concrete issues.

Index fragmentation. PostgreSQL's B-tree index organizes rows by key value. With sequential keys, each new insert appends near the end of the index -- predictable, cache-friendly, minimal page splits. With random UUID values, each insert can land anywhere, causing page splits and leaving index pages partially full. Over millions of rows, this becomes an index maintenance problem.

Expensive time-based queries. Our most common query is "show the latest orders." With UUID, you cannot ORDER BY id -- the values are random and carry no time information. Every paginated time-based query required a separate index on created_at and a sort that could not use the primary key. Two indexes maintained for what should be one operation.

We wanted IDs that are time-ordered by construction.

What We Considered

SERIAL / BIGSERIAL. Simple and fast, and concurrent inserts from multiple application replicas through a single database are perfectly safe. The problems: SERIAL gives you no way to generate IDs outside the database, and it ties you to one centralized sequence, which becomes a bottleneck -- or breaks uniqueness outright -- if you ever shard.

UUID v7. Time-ordered UUID -- the right modern answer to this problem. We evaluated it. It is still 16 bytes (128 bits), native generation via uuidv7() only arrives in PostgreSQL 18 (earlier versions need an extension), and it does not give you integer ordering semantics natively. We wanted something we owned entirely.

External ID service. A centralized Snowflake-compatible service adds a network hop to every insert and a new infrastructure dependency. For the problem we were solving, that trade-off was not justified.

Custom BIGINT generator in PostgreSQL. Eight bytes. No external dependencies. Lives entirely as a PostgreSQL sequence and function. This is what we built.

How the ID Is Structured

The ID is a 64-bit integer with three parts packed together using bitwise operations:

| timestamp (42 bits) | sequence (12 bits) | reserved (10 bits) |

The timestamp is milliseconds elapsed since a custom epoch -- January 1, 2025, 00:00:00 UTC (1735689600000 in Unix milliseconds). Using a recent epoch keeps the numbers smaller. Because the timestamp occupies the top bits of a signed BIGINT, IDs stay positive for roughly 70 years (2^41 milliseconds); the full 42-bit field does not overflow for about 139 years.

The sequence is a PostgreSQL sequence modulo 4096 (2^12). Within any given millisecond, up to 4,096 unique IDs can be generated without collision.

The reserved bits are currently unused. They are available if we extend this to a full multi-node setup by adding a machine or shard identifier -- the exact pattern used in Twitter's original Snowflake implementation.
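To make the layout concrete, here is a Python sketch of the same packing arithmetic. The helper names (`pack_id`, `unpack_id`) are ours for illustration; the real generator is the `next_id()` SQL function shown in the next section, and the shift amounts mirror its `(64 - 42)` and `(64 - 42 - 12)` expressions.

```python
# Illustrative Python model of the 42/12/10 bit layout (hypothetical helpers,
# mirroring the shifts in the next_id() SQL function).
CUSTOM_EPOCH_MS = 1735689600000  # 2025-01-01T00:00:00 UTC

TIMESTAMP_BITS = 42
SEQUENCE_BITS = 12
RESERVED_BITS = 10

def pack_id(now_millis: int, seq: int) -> int:
    """Pack a Unix-millisecond timestamp and a sequence counter into 64 bits."""
    ts = now_millis - CUSTOM_EPOCH_MS
    # timestamp occupies the top 42 bits, sequence the next 12, low 10 reserved
    return (ts << (SEQUENCE_BITS + RESERVED_BITS)) | ((seq % 4096) << RESERVED_BITS)

def unpack_id(id_: int) -> tuple[int, int]:
    """Recover (unix_millis, sequence) from a packed ID."""
    ts = id_ >> (SEQUENCE_BITS + RESERVED_BITS)
    seq = (id_ >> RESERVED_BITS) & 0xFFF
    return ts + CUSTOM_EPOCH_MS, seq
```

A useful side effect of this structure: the creation time can be recovered from the ID alone, and any ID from a later millisecond compares greater than every ID from an earlier one.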

The SQL

Two database objects: a sequence and a function.

CREATE SEQUENCE public.common_seq;
-- 42 bits for timestamp, 12 bits for sequence, 10 bits reserved
CREATE OR REPLACE FUNCTION public.next_id(OUT result bigint) AS $$
DECLARE
    start_epoch bigint := 1735689600000;
    seq_id      bigint;
    now_millis  bigint;
BEGIN
    SELECT MOD(nextval('public.common_seq'), 4096) INTO seq_id;
    SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;

    result := (now_millis - start_epoch) << (64 - 42);
    result := result | (seq_id << (64 - 42 - 12));
END;
$$ LANGUAGE plpgsql;

One detail worth calling out: clock_timestamp() instead of now(). PostgreSQL's now() returns the transaction start time -- every row in a bulk insert inside a single transaction gets the same timestamp. clock_timestamp() returns the actual wall clock time at the moment of each call. For an ID generator, this distinction matters.

To use it on a table:

CREATE TABLE orders (
    id      BIGINT PRIMARY KEY DEFAULT next_id(),
    -- other columns
);

In Prisma:

model Order {
  id BigInt @id @default(dbgenerated("next_id()"))
}

No application-layer changes needed. The database generates the ID on insert, and the value comes back in the query result exactly as any other default would.

Results

We did not run isolated benchmarks before and after the migration, so what follows is directional rather than a controlled measurement. The improvements are consistent with what PostgreSQL's own documentation and the broader community document for sequential vs random primary keys:

| Metric | UUID v4 | Custom BIGINT |
| --- | --- | --- |
| Storage per ID | 16 bytes | 8 bytes |
| Index locality | Random page access | Sequential append |
| Time-based sort | Requires created_at index | ORDER BY id DESC |
| Max throughput | Unlimited | 4,096 IDs/ms |

For our query pattern -- paginating the latest orders -- ORDER BY id DESC on the primary key outperforms ORDER BY created_at DESC on a secondary index. We removed one index from the orders table. The same next_id() function is now the default for orders, KYC records, invoices, and billing events, giving consistent time-ordering semantics across all tables without any additional indexing strategy.

At 10,000 orders/day (roughly one insert every 8.6 seconds on average), the per-query savings are small. The benefit is that the index stays clean as volume grows, and the created_at index maintenance cost disappears entirely.
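The headroom is easy to check with back-of-the-envelope arithmetic (assuming the 10,000 orders/day figure and the 4,096 IDs/ms cap from this post):

```python
# Back-of-the-envelope capacity check for the figures in this section.
orders_per_day = 10_000
seconds_per_day = 86_400

avg_gap_s = seconds_per_day / orders_per_day          # seconds between inserts
per_ms_capacity = 4_096                               # 12-bit sequence
daily_capacity = per_ms_capacity * 1_000 * seconds_per_day  # IDs per day

print(avg_gap_s)                          # ~8.64 s between inserts on average
print(daily_capacity // orders_per_day)   # headroom factor over current load
```

The generator's theoretical daily ceiling is tens of millions of times the current write volume, which is why the per-millisecond cap only matters for tight batch loops.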

What We Got Wrong

The bit layout comment does not match the code. The original comment in the function said "48 bits for timestamp, 12 bits for sequence, 4 bits reserved." The actual shift expression (64 - 42) produces a 42-bit timestamp region, leaving 10 bits reserved -- not 4. If you copy this function, verify that the shift values match your intended layout before deploying. The uniqueness guarantees hold either way, but the timestamp range and reserved bit count differ from what the comment says.

Clock dependency. If the system clock goes backward -- NTP adjustment, VM live migration, container restart on a host with clock skew -- IDs generated during that window can break monotonic ordering. The sequence prevents collisions, but ordering is not guaranteed for the affected milliseconds. We have not hit this, but it is a real failure mode to plan for in production.
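A toy illustration of the failure mode, using a Python stand-in for the packing arithmetic (the `make_id` helper is hypothetical, mirroring the shifts in `next_id()`): when the clock steps backward, a later call can produce a numerically smaller ID.

```python
# Toy model of next_id()'s packing: clock stepping backward breaks ordering
# while the sequence still keeps the IDs distinct.
def make_id(now_millis: int, seq: int, epoch: int = 1735689600000) -> int:
    return ((now_millis - epoch) << 22) | ((seq % 4096) << 10)

t = 1735689600000 + 5_000                # 5 s after the custom epoch
id_before = make_id(t, seq=1)
id_after = make_id(t - 200, seq=2)       # clock stepped back 200 ms

print(id_after < id_before)   # ordering broken: the later ID sorts earlier
print(id_after != id_before)  # uniqueness preserved by the sequence
```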

4,096 IDs/ms is not infinite. At our current scale we average well under 1 insert per second. But any batch job that inserts 10,000+ rows in a tight loop is pushing toward the per-millisecond capacity. In practice each next_id() call takes long enough that the clock ticks over before the 12-bit sequence wraps, so uniqueness holds -- but that is a property of call latency, not of the design. If the sequence ever did wrap within a single millisecond, the same timestamp-sequence pair would repeat and the insert would fail on the primary key. Strict ordering within a millisecond is not guaranteed either way.
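The same toy model (a hypothetical `make_id` mirroring `next_id()`'s shifts) shows what a wrap within a frozen millisecond would mean: sequence value 4,096 maps back to 0, so the 4,097th ID collides with the first.

```python
# Toy model of the per-millisecond cap: with the timestamp frozen, the 12-bit
# sequence wraps after 4,096 IDs and the next ID collides with an earlier one.
def make_id(now_millis: int, seq: int, epoch: int = 1735689600000) -> int:
    return ((now_millis - epoch) << 22) | ((seq % 4096) << 10)

frozen_ms = 1735689600000 + 1            # pretend the clock never advances
first_4096 = {make_id(frozen_ms, s) for s in range(4096)}
with_wrap = {make_id(frozen_ms, s) for s in range(4097)}

print(len(first_4096))  # 4,096 distinct IDs within one millisecond
print(len(with_wrap))   # still 4,096: the 4,097th is a duplicate
```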

This is a single-database design. One PostgreSQL instance, one common_seq, no coordination required. If you shard writes across multiple databases or move to a distributed setup, the sequence is no longer shared and ID uniqueness breaks. The reserved 10 bits exist to solve exactly this problem -- encode a machine or shard ID there -- but that requires incrementally more setup and is not what we have today.

For our use case -- a single PostgreSQL instance handling 10,000 orders/day with room to grow -- the trade-offs are worth it. If write volume grows into the millions per day or we move to multi-primary replication, we will revisit the machine-ID extension.

Tags: postgresql, id generation, backend, database, ecommerce