Postgres has long been a favored choice for developers bootstrapping applications, thanks to its reputation for flexibility and reliability. As applications scale, however, they often run into its limitations, particularly with workloads it wasn’t primarily designed for, such as large scans and aggregations over high-volume data. This challenge has become more pronounced in the age of AI: applications can grow dramatically faster, so developers hit the limits of Postgres much sooner than before.
To address these challenges, a growing trend has emerged: pairing Postgres with ClickHouse. In this architecture, Postgres continues to handle transactional workloads while ClickHouse takes on analytics. Both databases are open source, and an ecosystem of tools has grown up around connecting them.
Scaling Beyond PostgreSQL
In today’s AI-driven landscape, applications can reach scale in months rather than years. This faster growth drives demand for analytical capabilities that often exceed what Postgres can provide on its own. User-facing applications now require real-time dashboards, recommendation systems, and search over large datasets, all of which depend on fast analytical queries. As these features become central to the user experience, low-latency access to high-volume data becomes critical, and Postgres alone may not suffice.
How Postgres + ClickHouse Work Together
Integrating Postgres with ClickHouse presents two primary challenges: data integration and application integration.
Data Integration
There are two prevalent methods for integrating ClickHouse with PostgreSQL:
- Split or dual-write: The application writes data directly to the database or databases that need it. The split-write pattern sends each piece of data only to the database that serves its use case, while the dual-write pattern sends all data to both systems. This approach works well when it is clear which data each workload needs.
- Change data capture (CDC): All writes go to PostgreSQL, which serves as the source of truth. A CDC process streams changes into ClickHouse, so analytical queries stay close to the current state without burdening the transactional database. This approach is particularly useful for operational analytics, where results must stay consistent with the transactional data and analytical queries must also be fast.
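To make the difference between the two write patterns concrete, here is a minimal Python sketch. `FakeStore`, `ANALYTICAL_TABLES`, `split_write`, and `dual_write` are hypothetical names; the stores are in-memory stand-ins for real Postgres and ClickHouse clients, and which tables count as analytical is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeStore:
    """Hypothetical stand-in for a database client; it just records inserted rows."""
    name: str
    rows: list[dict[str, Any]] = field(default_factory=list)

    def insert(self, table: str, row: dict[str, Any]) -> None:
        self.rows.append({"table": table, **row})


# Assumed classification: these tables serve analytics, everything else is transactional.
ANALYTICAL_TABLES = {"page_views", "clicks"}


def split_write(pg: FakeStore, ch: FakeStore, table: str, row: dict[str, Any]) -> None:
    """Split-write: send each row only to the database that serves its use case."""
    target = ch if table in ANALYTICAL_TABLES else pg
    target.insert(table, row)


def dual_write(pg: FakeStore, ch: FakeStore, table: str, row: dict[str, Any]) -> None:
    """Dual-write: send every row to both databases (two independent writes)."""
    pg.insert(table, row)
    ch.insert(table, row)


pg, ch = FakeStore("postgres"), FakeStore("clickhouse")
split_write(pg, ch, "orders", {"id": 1, "total": 42.0})         # Postgres only
split_write(pg, ch, "page_views", {"user_id": 7, "path": "/"})  # ClickHouse only
dual_write(pg, ch, "orders", {"id": 2, "total": 10.0})          # both
```

One design consequence is visible even in this toy version: the two inserts in `dual_write` are independent, so if the second one fails the databases drift apart, and a production version would need retries or some form of reconciliation.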
Application Integration
The goal of integrating Postgres and ClickHouse is to play to each database’s strengths. Consequently, some queries will stay on Postgres, while others will move to ClickHouse. Many applications access Postgres through object-relational mappers (ORMs), but ORMs are less common with analytical databases. However, projects like MooseStack offer an ORM-like experience for ClickHouse.
To make this work, developers must first identify which queries to migrate, typically those involving large aggregations. The API routes that serve these queries then need to be updated to send their SQL to ClickHouse. A backward-compatible pattern, as demonstrated by clickhouse.build, lets teams test routes that can switch between Postgres and ClickHouse.
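One possible shape for this kind of backward-compatible routing is a per-route mode switch, including a shadow mode that queries both backends and compares the results. This is a hypothetical sketch, not clickhouse.build’s code: `revenue_by_day`, the `mode` values, and the `orders` table are invented for illustration, and both connections are in-memory SQLite databases standing in for Postgres and ClickHouse so the example runs on its own.

```python
import sqlite3

Row = tuple[str, float]


def make_db(rows: list[Row]) -> sqlite3.Connection:
    """Create an in-memory SQLite database standing in for a real backend."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (day TEXT, total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


SAMPLE = [("2024-01-01", 10.0), ("2024-01-01", 5.0), ("2024-01-02", 7.5)]
postgres = make_db(SAMPLE)    # stand-in for the existing Postgres connection
clickhouse = make_db(SAMPLE)  # stand-in for a ClickHouse client with the same data

REVENUE_SQL = "SELECT day, SUM(total) FROM orders GROUP BY day ORDER BY day"


def revenue_by_day(mode: str = "postgres") -> list[Row]:
    """Serve one API route from either backend.

    "postgres"   -> original behavior (the backward-compatible default)
    "clickhouse" -> the new analytical backend
    "shadow"     -> query both, report mismatches, return the Postgres result
    """
    if mode == "postgres":
        return postgres.execute(REVENUE_SQL).fetchall()
    if mode == "clickhouse":
        return clickhouse.execute(REVENUE_SQL).fetchall()
    if mode == "shadow":
        primary = postgres.execute(REVENUE_SQL).fetchall()
        candidate = clickhouse.execute(REVENUE_SQL).fetchall()
        if candidate != primary:
            print(f"mismatch: postgres={primary} clickhouse={candidate}")
        return primary
    raise ValueError(f"unknown mode: {mode!r}")


print(revenue_by_day("shadow"))  # [('2024-01-01', 15.0), ('2024-01-02', 7.5)]
```

In a real migration the two backends speak different SQL dialects, so the ClickHouse branch may need its own query text; running in shadow mode first is one way to catch such differences before traffic is switched over.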
Another option is to use a foreign data wrapper (FDW) within Postgres, which sends queries to ClickHouse transparently and reduces the changes needed in the application.
An Open Source Ecosystem
The ecosystem surrounding Postgres and ClickHouse has matured into a robust stack. Many teams now default to pairing these databases, supported by a suite of open source and commercial tools designed to facilitate production-scale operations. These tools focus on reliable Postgres replication, efficient ingestion into ClickHouse, and seamless integration with existing Postgres workflows.
PeerDB
PeerDB is an open source project that provides high-throughput PostgreSQL CDC and reliable replication into ClickHouse. It handles large update streams and schema changes while adding minimal load to the transactional database.
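Whatever tool runs it, CDC boils down to turning a stream of row changes into rows in ClickHouse. The sketch below is a simplified model under stated assumptions, not PeerDB’s implementation: the `ChangeEvent` shape and `AnalyticsTable` class are hypothetical, and each insert or update is assumed to carry the full new row. Because in-place updates are comparatively expensive in ClickHouse, it appends every change as a new row tagged with a version and a delete flag, then collapses to the newest version per key at read time, similar in spirit to a ReplacingMergeTree table with a version column. A column that first appears mid-stream is treated as a schema change and added to the table.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical CDC event: one row change read from the Postgres log."""
    lsn: int                       # log sequence number; higher means later
    op: str                        # "insert", "update", or "delete"
    key: int                       # primary key of the changed row
    row: Optional[dict[str, Any]]  # full new row (None for deletes)


class AnalyticsTable:
    """Append-only target: every change becomes a new versioned row."""

    def __init__(self) -> None:
        self.columns: list[str] = []
        self.rows: list[dict[str, Any]] = []

    def apply(self, batch: list[ChangeEvent]) -> None:
        for ev in batch:
            values = ev.row or {}
            # Schema change: add unseen columns; older rows read them as None.
            for col in values:
                if col not in self.columns:
                    self.columns.append(col)
            self.rows.append({"_key": ev.key, "_version": ev.lsn,
                              "_deleted": ev.op == "delete", **values})

    def latest(self) -> dict[int, dict[str, Any]]:
        """Keep the highest version per key and drop keys whose latest change is a delete."""
        newest: dict[int, dict[str, Any]] = {}
        for r in self.rows:
            k = r["_key"]
            if k not in newest or r["_version"] > newest[k]["_version"]:
                newest[k] = r
        return {k: {c: r.get(c) for c in self.columns}
                for k, r in newest.items() if not r["_deleted"]}


table = AnalyticsTable()
table.apply([
    ChangeEvent(1, "insert", 1, {"status": "new"}),
    ChangeEvent(2, "insert", 2, {"status": "new"}),
    ChangeEvent(3, "update", 1, {"status": "paid", "amount": 30}),  # new column
    ChangeEvent(4, "delete", 2, None),
])
print(table.latest())  # {1: {'status': 'paid', 'amount': 30}}
```

Note that `Postgres` itself is never queried here: the analytical side is built purely from the change stream, which is what keeps the read load off the transactional database.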
PostgreSQL Extensibility and FDWs
PostgreSQL’s extensibility allows teams to shift analytical workloads to ClickHouse with minimal changes to application code. FDWs expose external systems as standard PostgreSQL tables, so applications can keep issuing familiar SQL while heavy analytical queries run in ClickHouse.
ORMs and Developer Tooling
Projects like MooseStack illustrate that developer tooling is evolving to meet the needs of modern applications, facilitating the use of ClickHouse in environments where ORMs or schema-first development patterns are prevalent.
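Schema-first tooling generally starts from a typed model and derives the table definition from it. As a rough illustration of that idea (not MooseStack’s API, which is not shown here), the hypothetical `create_table_ddl` helper below turns a Python dataclass into a ClickHouse `CREATE TABLE` statement using the MergeTree engine. The Python-to-ClickHouse type mapping is an assumption and covers only a few basic types, and `PageView` is an invented example model.

```python
from dataclasses import dataclass, fields
from datetime import datetime
from typing import get_type_hints

# Assumed mapping from Python annotations to ClickHouse column types.
CLICKHOUSE_TYPES = {
    str: "String",
    int: "Int64",
    float: "Float64",
    bool: "Bool",
    datetime: "DateTime",
}


def create_table_ddl(model: type, order_by: list[str]) -> str:
    """Build a ClickHouse CREATE TABLE statement from a dataclass (hypothetical helper)."""
    hints = get_type_hints(model)  # resolves annotations to real type objects
    names = [f.name for f in fields(model)]
    missing = [col for col in order_by if col not in names]
    if missing:
        raise ValueError(f"ORDER BY columns not in model: {missing}")
    columns = ",\n".join(f"    {name} {CLICKHOUSE_TYPES[hints[name]]}" for name in names)
    return (
        f"CREATE TABLE {model.__name__.lower()} (\n{columns}\n)\n"
        "ENGINE = MergeTree\n"
        f"ORDER BY ({', '.join(order_by)})"
    )


@dataclass
class PageView:
    user_id: int
    path: str
    duration_ms: float
    viewed_at: datetime


ddl = create_table_ddl(PageView, ["viewed_at", "user_id"])
print(ddl)
```

Keeping the model as the single source of truth means the application types and the ClickHouse schema cannot silently diverge, which is the main appeal of the schema-first pattern the paragraph above describes.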
The ecosystem surrounding Postgres and ClickHouse is not merely a collection of tools; it represents a cohesive stack tailored for teams that have outgrown a single online transaction processing (OLTP) database and require a fast analytical engine without sacrificing the familiar Postgres development workflow.
The Future
As more applications start on Postgres and later adopt ClickHouse, the time between those two steps is shrinking, which makes adopting this architecture early in product development increasingly attractive. The future points toward managed services, hosted replication, and deeper integrations that promise a seamless experience in which transactional and analytical systems work together.
The fundamental principle remains: Postgres and ClickHouse are not adversarial technologies; rather, they complement each other, forming the backbone of a modern open source data architecture that is adaptable, transparent, and production-ready.