How OpenAI Scaled to 800 Million Users on PostgreSQL

OpenAI has scaled PostgreSQL to serve over 800 million active ChatGPT users, making it one of the largest PostgreSQL deployments in the world. The strategies behind that achievement are not exclusive to hyperscalers: they apply to teams at every stage, from thousands of users to millions.

The Challenge: 800 Million Users on PostgreSQL

ChatGPT’s rapid growth necessitated a robust database capable of handling millions of concurrent connections and an enormous volume of requests per second. OpenAI opted to stick with PostgreSQL, a decision rooted in its proven reliability and extensive tooling, rather than transitioning to a NoSQL solution.

| Metric | Scale |
| --- | --- |
| Active users | 800+ million |
| Concurrent connections | Millions |
| Requests per second | Very high |
| Data growth | Massive |

Strategy 1: Connection Pooling with PgBouncer

At scale, the primary bottleneck is often not the speed of queries but the number of connections. Each PostgreSQL connection can consume significant memory, making it impractical to maintain thousands of concurrent connections directly from application servers.

The Solution: PgBouncer

PgBouncer serves as a lightweight connection pooler that sits between the application and PostgreSQL. This allows multiple application instances to share a reduced pool of database connections, significantly optimizing resource usage.

```mermaid
flowchart LR
    subgraph apps[Application Servers]
        A1[fa:fa-server App 1]
        A2[fa:fa-server App 2]
        A3[fa:fa-server App 3]
        A4[fa:fa-server App N]
    end

    subgraph pooler[Connection Pooler]
        PG[fa:fa-water PgBouncer]
    end

    subgraph database[PostgreSQL]
        DB[(fa:fa-database Primary)]
    end

    A1 --> PG
    A2 --> PG
    A3 --> PG
    A4 --> PG
    PG --> DB

    style A1 fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
    style A2 fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
    style A3 fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
    style A4 fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
    style PG fill:#fff3e0,stroke:#e65100,stroke-width:2px
    style DB fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
```

By placing PgBouncer in front of PostgreSQL, OpenAI reduced the number of direct database connections from roughly 10,000 to about 200, a 50x reduction.

PgBouncer Configuration

A basic configuration for PgBouncer can be set up as follows:

```ini
[databases]
myapp = host=localhost port=5432 dbname=myapp

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

; Pool mode: transaction pooling suits most web applications
pool_mode = transaction

; Max connections from app servers to PgBouncer
max_client_conn = 10000

; Connections from PgBouncer to PostgreSQL (per database/user pair)
default_pool_size = 100

; Keep at least this many server connections open
min_pool_size = 10

; Extra connections reserved for burst traffic
reserve_pool_size = 5
```
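
On the application side, switching to PgBouncer is transparent apart from the port: clients connect to PgBouncer on 6432 instead of PostgreSQL on 5432. Below is a minimal sketch using Python's psycopg2; the host name and credentials are illustrative assumptions, not details from OpenAI's setup.

```python
import psycopg2

# Connect to PgBouncer (port 6432) instead of PostgreSQL (5432);
# application code is otherwise unchanged.
conn = psycopg2.connect(
    host="pgbouncer.internal",  # hypothetical pooler host
    port=6432,
    dbname="myapp",
    user="app_user",
    password="secret",
)

with conn.cursor() as cur:
    cur.execute("SELECT 1")
    print(cur.fetchone())
```

One caveat of transaction pooling: session state such as prepared statements, advisory locks, and session-level SET values does not survive across transactions, so applications must not depend on it.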

Strategy 2: Read Replicas

Given that most applications experience read-heavy workloads, OpenAI employs read replicas to manage the load effectively. This architecture allows the primary database to handle writes while distributing read requests across multiple replicas.

How Read Replicas Work

```mermaid
flowchart LR
    subgraph App[" "]
        direction TB
        W[✏️ Writes]
        R[πŸ“– Reads]
    end

    P[(πŸ—„οΈ Primary)]

    subgraph Replicas[" "]
        direction TB
        R1[(Replica 1)]
        R2[(Replica 2)]
        R3[(Replica 3)]
    end

    W -->|write| P
    R -->|read| Replicas
    P -.->|sync| R1
    P -.->|sync| R2
    P -.->|sync| R3

    style W fill:#fee2e2,stroke:#dc2626,stroke-width:2px,color:#991b1b
    style R fill:#dcfce7,stroke:#16a34a,stroke-width:2px,color:#166534
    style P fill:#fef3c7,stroke:#d97706,stroke-width:2px,color:#92400e
    style R1 fill:#dbeafe,stroke:#2563eb,stroke-width:2px,color:#1e40af
    style R2 fill:#dbeafe,stroke:#2563eb,stroke-width:2px,color:#1e40af
    style R3 fill:#dbeafe,stroke:#2563eb,stroke-width:2px,color:#1e40af
```

  • Writes are directed to the primary database.
  • Reads are distributed among the replicas.
  • Changes made on the primary are synchronized with all replicas.
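
A minimal routing sketch, assuming psycopg2 and hypothetical host names (real deployments usually put a load balancer or service discovery in front of the replicas): writes go to the primary, reads are spread across the replicas.

```python
import random

import psycopg2

PRIMARY_DSN = "host=db-primary.internal dbname=myapp"  # hypothetical hosts
REPLICA_DSNS = [
    "host=db-replica-1.internal dbname=myapp",
    "host=db-replica-2.internal dbname=myapp",
    "host=db-replica-3.internal dbname=myapp",
]

def get_connection(readonly: bool):
    """Route reads to a random replica and writes to the primary."""
    dsn = random.choice(REPLICA_DSNS) if readonly else PRIMARY_DSN
    return psycopg2.connect(dsn)

# Writes always hit the primary...
with get_connection(readonly=False) as conn, conn.cursor() as cur:
    cur.execute("UPDATE users SET last_seen = now() WHERE id = %s", (42,))

# ...while reads are spread across the replicas.
with get_connection(readonly=True) as conn, conn.cursor() as cur:
    cur.execute("SELECT id, name FROM users WHERE id = %s", (42,))
```

Because replication is usually asynchronous, replicas can lag slightly behind the primary; flows that must read their own writes should either go to the primary or tolerate brief staleness.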

Strategy 3: Horizontal Sharding

When a single PostgreSQL instance reaches its limits, horizontal sharding becomes essential. This technique involves partitioning data across multiple instances based on a shard key, typically user_id or tenant_id.

Choosing a Shard Key

The choice of shard key is critical: it determines whether data is distributed evenly and whether related rows stay together on the same shard. The table below contrasts good and bad choices:

| Good Shard Keys | Bad Shard Keys |
| --- | --- |
| user_id | created_at (hot spots) |
| tenant_id | country (uneven distribution) |
| organization_id | status (low cardinality) |
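
Once the shard key is chosen, routing is straightforward: hash the key and use the result to pick a shard. The sketch below assumes four shards, hypothetical host names, and a simple modulo scheme; production systems typically add an indirection layer (a shard map) so data can be moved without rehashing everything.

```python
import hashlib

import psycopg2

SHARD_DSNS = [  # hypothetical shard hosts
    "host=shard-0.internal dbname=myapp",
    "host=shard-1.internal dbname=myapp",
    "host=shard-2.internal dbname=myapp",
    "host=shard-3.internal dbname=myapp",
]

def shard_for(user_id: int) -> str:
    """Map a shard key to a shard with a stable hash, so routing is deterministic."""
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()
    return SHARD_DSNS[int(digest, 16) % len(SHARD_DSNS)]

# All rows for a given user live on one shard, keeping related data together.
with psycopg2.connect(shard_for(42)) as conn, conn.cursor() as cur:
    cur.execute("SELECT * FROM conversations WHERE user_id = %s", (42,))
```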

Strategy 4: Query Optimization

As the scale increases, poorly optimized queries can lead to significant performance degradation. It’s essential to analyze slow queries using the EXPLAIN ANALYZE command to identify bottlenecks.
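
For example, wrapping a suspect query in EXPLAIN (ANALYZE, BUFFERS) shows the chosen plan along with actual row counts, timings, and buffer usage. The messages table and query below are hypothetical; note that EXPLAIN ANALYZE actually executes the statement, so run expensive ones against a replica or a staging copy.

```python
import psycopg2

conn = psycopg2.connect("dbname=myapp")
with conn.cursor() as cur:
    cur.execute("""
        EXPLAIN (ANALYZE, BUFFERS)
        SELECT *
        FROM messages
        WHERE user_id = %s
        ORDER BY created_at DESC
        LIMIT 50
    """, (42,))
    # EXPLAIN returns one text row per plan line.
    for (line,) in cur.fetchall():
        print(line)
```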

Index Strategies

Creating appropriate indexes based on query patterns is vital for maintaining performance at scale. Regularly review and optimize indexes to ensure they align with evolving access patterns.
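
Continuing the hypothetical example above, a composite index that matches the filter-then-sort pattern lets PostgreSQL answer the query without scanning the table. Building it CONCURRENTLY avoids blocking writes, at the cost of a slower build; note that CONCURRENTLY cannot run inside a transaction, hence the autocommit switch.

```python
import psycopg2

conn = psycopg2.connect("dbname=myapp")
# CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
conn.autocommit = True
with conn.cursor() as cur:
    # Composite index matching the pattern: filter on user_id,
    # then sort by created_at descending.
    cur.execute("""
        CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_messages_user_created
        ON messages (user_id, created_at DESC)
    """)
```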

Strategy 5: Connection Management

Effective connection management is crucial at scale. Implementing aggressive timeouts and setting connection limits can prevent overload and ensure system stability.
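
As a sketch, server-side timeouts can be attached per connection so that a runaway query or an abandoned transaction cannot hold resources indefinitely. The values below are illustrative starting points, not OpenAI's settings.

```python
import psycopg2

conn = psycopg2.connect(
    "dbname=myapp",
    # Fail fast if the server is unreachable.
    connect_timeout=5,
    # Server-side limits for this session: cancel statements running
    # longer than 5 s and kill sessions idle in a transaction for 30 s.
    options="-c statement_timeout=5000 -c idle_in_transaction_session_timeout=30000",
)
```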

Strategy 6: Caching

Implementing caching strategies, such as application-level caching with Redis, can significantly reduce database load by serving frequently accessed data without hitting the database.
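
A minimal cache-aside sketch using redis-py alongside psycopg2 (the users table, key layout, and TTL are illustrative assumptions): check Redis first, fall back to PostgreSQL on a miss, and populate the cache on the way out.

```python
import json

import psycopg2
import redis

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)
conn = psycopg2.connect("dbname=myapp")

def get_user(user_id: int):
    """Cache-aside read: serve from Redis when possible, fall back to PostgreSQL."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)

    with conn.cursor() as cur:
        cur.execute("SELECT id, name, email FROM users WHERE id = %s", (user_id,))
        row = cur.fetchone()
    if row is None:
        return None

    user = {"id": row[0], "name": row[1], "email": row[2]}
    # A short TTL bounds staleness; delete the key on writes for stricter freshness.
    cache.setex(key, 300, json.dumps(user))
    return user
```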

Strategy 7: Monitoring and Observability

Monitoring is essential for identifying issues before they escalate. Key metrics to track include connection counts, query latency, replication lag, and cache hit rates. Utilizing tools like pg_stat_statements can provide insights into query performance.
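
For example, pg_stat_statements (an extension that must be installed and enabled) aggregates execution statistics per normalized query; listing the slowest queries by mean execution time is a common starting point. The column names below apply to PostgreSQL 13 and later.

```python
import psycopg2

conn = psycopg2.connect("dbname=myapp")
with conn.cursor() as cur:
    # Top 10 queries by mean execution time.
    cur.execute("""
        SELECT calls,
               round(mean_exec_time::numeric, 2) AS mean_ms,
               query
        FROM pg_stat_statements
        ORDER BY mean_exec_time DESC
        LIMIT 10
    """)
    for calls, mean_ms, query in cur.fetchall():
        print(f"{calls:>10} calls  {mean_ms:>10} ms  {query[:80]}")
```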

OpenAI’s architecture effectively combines these strategies to create a scalable PostgreSQL environment capable of supporting an immense user base while maintaining performance and reliability.
