OpenAI recently shared how it scaled PostgreSQL to millions of queries per second for ChatGPT and its API platform, which together serve hundreds of millions of users worldwide. The effort illustrates where a single-primary PostgreSQL instance reaches its limits, especially under write-intensive workloads that push teams toward distributed alternatives, and the design trade-offs and operational discipline required to deliver a low-latency, globally available service.
Scaling Challenges and Solutions
As the demand on PostgreSQL surged more than tenfold over the past year, OpenAI collaborated with Azure to enhance its deployment on Azure Database for PostgreSQL. This optimization enabled the system to effectively support 800 million ChatGPT users while maintaining a single-primary instance with ample headroom for growth. Key optimizations were implemented across both application and database layers, including increasing instance size, refining query patterns, and leveraging additional read replicas. By minimizing redundant writes through application-level tuning and directing new write-heavy workloads to sharded systems like Azure Cosmos DB, PostgreSQL was preserved for relational workloads that demand strong consistency.
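OpenAI has not detailed the exact application-level changes, but one common way to eliminate redundant writes is to make updates conditional, so a no-op save never produces a new row version. A minimal sketch in Python with psycopg2, assuming a hypothetical user_settings table:

def save_settings(cur, user_id, settings_json):
    # IS DISTINCT FROM turns an unchanged write into a no-op, so
    # PostgreSQL never creates a dead tuple for a redundant update.
    cur.execute(
        """
        UPDATE user_settings
           SET settings = %(s)s
         WHERE user_id = %(u)s
           AND settings IS DISTINCT FROM %(s)s
        """,
        {"s": settings_json, "u": user_id},
    )
    return cur.rowcount  # 0 means the write was skipped

Skipping unchanged writes matters under PostgreSQL's MVCC model, where every update creates a new row version that vacuum must later reclaim, as discussed below.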
The primary PostgreSQL instance is backed by nearly 50 geo-distributed read replicas on Azure Database for PostgreSQL. This architecture spreads read operations across replicas, keeping p99 latency in the low double-digit millisecond range, while writes remain centralized on the primary with strategies in place to mitigate unnecessary load. Techniques such as lazy writes and application-level optimizations further relieve pressure on the primary instance, ensuring consistent performance even during global traffic surges.
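The routing layer itself is internal to OpenAI, but the pattern of fanning reads out to replicas while reserving the primary for writes can be sketched in a few lines of Python with psycopg2; the hostnames here are placeholders:

import random
import psycopg2

PRIMARY_DSN = "host=pg-primary.example.internal dbname=app"
REPLICA_DSNS = [
    "host=pg-replica-eastus.example.internal dbname=app",
    "host=pg-replica-westeurope.example.internal dbname=app",
]

def get_connection(readonly=True):
    # Reads go to a replica (ideally one near the caller);
    # only writes touch the single primary.
    dsn = random.choice(REPLICA_DSNS) if readonly else PRIMARY_DSN
    conn = psycopg2.connect(dsn)
    conn.set_session(readonly=readonly)  # fail fast on accidental replica writes
    return conn

A production router would pick the replica by client region rather than at random, but the division of labor is the same.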
Operational Challenges and Mitigations
As traffic levels increased, OpenAI encountered several operational challenges, including cache-miss storms and complex multi-table join patterns often triggered by ORMs. These issues, along with service-wide retry loops, were identified as frequent points of failure. In response, OpenAI shifted some computational tasks to the application layer, enforced stricter timeouts on idle and long-running transactions, and refined query structures to minimize disruption to autovacuum processes.
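Both timeout controls mentioned here map to standard PostgreSQL settings. A sketch of enforcing them per role, so that an idle transaction cannot pin old row versions and stall autovacuum (the app_service role and the specific limits are illustrative):

import psycopg2

conn = psycopg2.connect("host=pg-primary.example.internal dbname=app")
conn.autocommit = True
with conn.cursor() as cur:
    # Abort any statement that runs longer than 30 seconds for this role.
    cur.execute("ALTER ROLE app_service SET statement_timeout = '30s'")
    # Kill sessions that sit idle inside an open transaction, which would
    # otherwise hold locks and block vacuum from reclaiming dead tuples.
    cur.execute("ALTER ROLE app_service SET idle_in_transaction_session_timeout = '60s'")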
A pivotal strategy involved reducing write pressure on the PostgreSQL instance. The MVCC model of PostgreSQL can lead to increased CPU and storage overhead under heavy update conditions due to version churn and vacuum costs. To counteract this, OpenAI migrated shardable workloads to distributed systems, implemented rate-limiting for backfills and high-volume updates, and adhered to disciplined operational policies to prevent cascading overloads.
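A rate-limited backfill typically updates small, keyed batches with a pause between them, so each transaction stays short and vacuum can reclaim the resulting dead tuples as it goes. A sketch of the pattern (the events table and normalize() function are hypothetical):

import time
import psycopg2

BATCH_SIZE = 1_000
PAUSE_SECONDS = 0.5  # throttle knob: bounds WAL and vacuum pressure

conn = psycopg2.connect("host=pg-primary.example.internal dbname=app")
conn.autocommit = True

while True:
    with conn.cursor() as cur:
        cur.execute(
            """
            UPDATE events
               SET payload_v2 = normalize(payload)
             WHERE id IN (SELECT id
                            FROM events
                           WHERE payload_v2 IS NULL
                           LIMIT %s
                             FOR UPDATE SKIP LOCKED)
            """,
            (BATCH_SIZE,),
        )
        if cur.rowcount == 0:
            break  # backfill complete
    time.sleep(PAUSE_SECONDS)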
In a recent LinkedIn post, Microsoft Corporate Vice President Shireesh Thota remarked, "Every database is optimized differently and needs the right tuning to get it to work at scale."
Connection pooling and workload isolation emerged as essential components of the scaling strategy. By utilizing PgBouncer in transaction-pooling mode, OpenAI effectively managed PostgreSQL’s connection limits, reducing connection setup latency and preventing spikes in client connections. This approach allowed for the isolation of critical and non-critical workloads, thereby mitigating the impact of noisy neighbor effects during peak demand periods.
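The PgBouncer settings involved are standard; a minimal pgbouncer.ini sketch showing transaction pooling plus separate pools that isolate critical traffic from batch traffic (the pool names, host, and sizes are illustrative):

[databases]
; Two logical pools over the same server keep batch work
; from starving latency-sensitive traffic.
app_critical = host=pg-primary.example.internal dbname=app pool_size=80
app_batch    = host=pg-primary.example.internal dbname=app pool_size=20

[pgbouncer]
pool_mode = transaction   ; a server connection is held only for the transaction
max_client_conn = 10000   ; absorb client-side connection spikes
default_pool_size = 50    ; cap on actual server connections per pool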
Future Directions and Innovations
Read replication introduced its own scalability constraints. As the number of replicas grew, the primary instance incurred additional CPU and network overhead from streaming the WAL to every replica. To alleviate this burden, OpenAI is exploring cascading replication, in which intermediate replicas relay WAL downstream, reducing the load on the primary while accommodating future growth. These strategies let PostgreSQL handle exceptionally large-scale, read-heavy AI workloads across geo-distributed regions, while sharded systems absorb write-intensive operations to preserve stability and performance.
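Cascading replication is built into PostgreSQL: a standby relays WAL whenever a downstream node's primary_conninfo points at it rather than at the primary. A sketch of the relevant postgresql.conf lines on a leaf replica (PostgreSQL 12+; hostnames are placeholders):

# On the leaf replica: stream WAL from an intermediate standby, not the primary.
primary_conninfo = 'host=pg-replica-hub-eastus.example.internal user=replicator'
hot_standby = on   # keep serving read-only queries while replaying WAL

# An empty standby.signal file in the data directory marks the node as a standby;
# the intermediate replica needs max_wal_senders high enough to feed its children.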
OpenAI continues to assess avenues for extending PostgreSQL’s scalability envelope, including the potential for sharded PostgreSQL deployments and alternative distributed systems. This ongoing evaluation aims to strike a balance between strong consistency guarantees and the increasing demands of global traffic and diverse workloads as the platform evolves.
About the Author
Leela Kumili