‘PostgreSQL Eats the World, But CockroachDB Digests It’

The database market is shifting, driven by escalating demands for scalability and resilience and by the rise of AI agents. In a recent conversation with AIM, CockroachDB CEO Spencer Kimball emphasized the growing importance of distributed SQL databases, particularly those compatible with PostgreSQL, for businesses of all sizes, not just the tech behemoths.

The hallmark of CockroachDB is its horizontal scaling. While it aims to preserve a PostgreSQL-like interface, distributed operation requires a different architecture underneath. “Cockroach didn’t reject Postgres. It re-architected it from the ground up to meet the scale, distribution, and the consistency AI demands,” Kimball said. He added that achieving 100x scale on a monolithic architecture is virtually unattainable, which is where distributed SQL databases like CockroachDB come into play, engineered for “serious scale, like hundreds of terabytes into petabytes” of operational data. “Postgres may be eating the world, but AI needs a database that can digest,” he added.

Kimball clarified that his focus is on operational databases rather than analytical ones. “It’s about the metadata that tracks the product or service, all the activity, and the high level of concurrent operations that demand strong consistency,” he noted. Both humans and AI agents will access this data, with these agents functioning at high speeds and continuously engaged in repetitive tasks throughout the day, thereby generating an ever-increasing volume of traffic.

What’s Next from Cockroach

Looking ahead, Kimball envisions AI playing a pivotal role in observability and support. “AI can move much faster. If you give it the right scenarios and train it, then what could have taken several hours to fix might only take several minutes,” he explained. Another focal point for CockroachDB is vector indexing. “Customers want nearest-neighbour search in high-dimensional spaces at scale. They want it fast and consistent, even as data changes,” he stated.

However, Kimball was quick to clarify that CockroachDB is not positioning itself as a general-purpose vector database. “We’re not trying to compete with OpenSearch, Elastic, or MongoDB on vector search. If you’re already using CockroachDB for mission-critical relational workloads, you want vector support there. Not everyone needs that, but for our users, it’s essential.” He reiterated that while they are not aiming to dominate the vector index market, it remains a significant modality.

Kimball also discussed the importance of cost reduction. “Nobody wants to pay 10x more because their workload scales 10x. CockroachDB can improve utilization with multi-tenancy,” he explained. By averaging out peaks and troughs across a large cluster, customers can enhance utilization from 10% to 50-60%. The company is also exploring cloud cost efficiencies, with Kimball asserting that CockroachDB’s architecture allows for the use of spot instances, disaggregated storage, and storage tiering. “We believe we can reduce costs by 10 to 16x in the next few years,” he added.
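The utilization argument can be made concrete with a small calculation. The sketch below is purely illustrative: the tenant count, the load shape (a baseline of 1 unit with a single one-hour spike to 10 units), and the staggered peak hours are invented assumptions, not figures from Kimball or Cockroach Labs. It compares provisioning each tenant for its own peak against provisioning one shared cluster for the peak of the combined load.

```python
def utilization(load_by_hour, capacity):
    """Average load divided by provisioned capacity over the period."""
    return sum(load_by_hour) / (capacity * len(load_by_hour))


def tenant_load(peak_hour, hours=24, base=1.0, peak=10.0):
    """Hypothetical tenant: flat baseline with a single one-hour spike."""
    return [peak if h == peak_hour else base for h in range(hours)]


def compare(num_tenants=12, hours=24):
    # Assumption: tenants peak at different (staggered) hours of the day.
    tenants = [tenant_load((2 * i) % hours, hours) for i in range(num_tenants)]

    # Dedicated: every tenant is provisioned for its own peak.
    dedicated = sum(utilization(t, max(t)) for t in tenants) / num_tenants

    # Pooled: one cluster provisioned for the peak of the summed load.
    pooled_load = [sum(t[h] for t in tenants) for h in range(hours)]
    pooled = utilization(pooled_load, max(pooled_load))
    return dedicated, pooled


if __name__ == "__main__":
    dedicated, pooled = compare()
    print(f"dedicated: {dedicated:.1%}, pooled: {pooled:.1%}")
```

Under these made-up inputs, dedicated provisioning sits at roughly 14% utilization while the pooled cluster reaches roughly 79%. Real workloads overlap more, so the gain would be smaller, but the mechanism is the same: peaks that do not coincide can share headroom.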

Moat of Cockroach

Kimball pointed out that CockroachDB’s strength lies in its geographic scalability. “We have customers in the EU, the US, and India. If you want to make your service span all of those places, Cockroach has some really interesting capabilities that are different.” He illustrated this with an example from the US sports betting sector, where customers utilize Cockroach nodes across multiple states to comply with data locality laws, ensuring that data is processed where bets are placed.

Furthermore, he highlighted CockroachDB’s cloud-agnostic nature and support for hybrid deployments. “Big banks and tech companies use private data centers and all three major clouds. We let customers run the database wherever their business needs it.” One of the key challenges, he noted, is operating distributed systems, and this is where he sees AI fitting into database operations. “It’s not easy to run distributed systems. When something goes wrong, you want the root cause before a human even looks at it. AI can help.”

Regarding competition with cloud vendors, Kimball remarked, “They’re both competitors and partners. Big clouds don’t want to serve self-hosted enterprise customers, and those customers don’t want to be tied to one cloud. CockroachDB fits well there.” He mentioned that clouds often refer such customers to CockroachDB, stating, “They say, ‘We can’t run this in your data center, but CockroachDB can.’ That’s why the partnership works.”

Why Postgres

Kimball elaborated on how CockroachDB strives to stay close to the Postgres experience while adapting key behaviors for scalability in distributed environments. “So well, it tries to look as much like Postgres as possible,” he said. A notable example is ID generation. Traditional Postgres allows for monotonically increasing sequences, such as auto-incrementing IDs for user records, which work smoothly in monolithic systems but become a bottleneck at massive scale.

“In a monolithic system… that counter, it’s all just in one place… But once you say, I want to do 10 million of these concurrently… you don’t want them all going to one node that holds a counter,” Kimball said. CockroachDB takes a distributed approach to sequence generation, which scales better at the cost of strictly consecutive values. “It will look the same as a sequence. But… we have a more distributed mechanism to assign IDs… they’re not just counting 1,2,3,4,5.”
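The trade-off Kimball describes can be sketched in a few lines. The example below is a hypothetical illustration, not CockroachDB’s actual algorithm: the class name `NodeIdGenerator`, the 10-bit node field, and the use of a local counter are all assumptions made for the sketch. Each node combines its own counter with its node number, so no node ever has to consult a shared counter, and the resulting IDs are unique but not consecutive.

```python
import itertools


class NodeIdGenerator:
    """Hypothetical per-node ID generator with no shared counter."""

    NODE_BITS = 10  # assumption: room for up to 1024 nodes

    def __init__(self, node_id):
        if not 0 <= node_id < (1 << self.NODE_BITS):
            raise ValueError("node_id out of range")
        self.node_id = node_id
        self._local = itertools.count(1)  # counter private to this node

    def next_id(self):
        # High bits: local counter. Low bits: node number.
        # Two nodes can never produce the same value.
        return (next(self._local) << self.NODE_BITS) | self.node_id


def demo():
    nodes = [NodeIdGenerator(n) for n in range(3)]
    # Interleave requests across nodes, as concurrent inserts would.
    return [node.next_id() for _ in range(4) for node in nodes]


if __name__ == "__main__":
    print(demo())
```

The IDs still behave like keys from a sequence in that every one is distinct, but they arrive as values such as 1024, 1025, 1026, 2048 rather than 1, 2, 3, 4. That gap is the “linearity” given up in exchange for removing the single counter.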

He acknowledged the distinctions between Postgres and MySQL users, noting that both databases serve structured data effectively. “There’s room for both,” he stated. Kimball identified the greater challenge as being how databases are operated rather than how they are utilized by applications. He remarked that system administrators and DBAs experienced with one database may face a steeper learning curve when transitioning to another due to differences in tools, management styles, and best practices. “If you’re very good as a system administrator or like a DBA using Postgres, then it’s a lot more new stuff to learn.”

Ultimately, he suggested that the decision often hinges on what teams are already accustomed to operating. “If you’re good at MySQL, moving to distributed MySQL, then TiDB makes sense,” he said, referencing TiDB CTO Ed Huang’s belief that MySQL will power AI agents.

Journey of the Cockroach

Cockroach Labs was established in 2015 by former Google employees Kimball, Peter Mattis, and Ben Darnell, drawing inspiration from Google’s Bigtable and Spanner databases. Kimball recounted that in the early 2000s, systems like Google’s Bigtable opted for a non-SQL approach not out of disdain for SQL, but to simplify the focus on scalability. “It was just easier not to have to do all that stuff and also build something that is elastically scalable and more survivable.”

Over time, however, the industry began reintroducing SQL features. MongoDB incorporated transactions, while Google layered SQL atop Spanner with F1. “They created a whole new distributed architecture, but they left all of the hard stuff and started adding the hard stuff back on top of it,” Kimball noted.

He pointed out that NoSQL systems, such as Cassandra, offer flexibility and scalability but often fall short on consistency and schema management. “If you have 50 people working on a complex, mission-critical product… it just becomes impossible.” By 2015, the CockroachDB team had a clear understanding of their target users, which included major banks, tech firms, and other high-stakes organizations. Instead of creating a new SQL dialect, they opted for PostgreSQL’s, believing it to be the cleanest and most suitable choice with the most upward momentum.

Tech Optimizer