Postgres and the Lakehouse Are Becoming One System — Here’s What Comes Next

The architecture of modern data systems is undergoing a fundamental shift. Developers today increasingly rely on a combination of Postgres for application needs and lakehouse technologies for analytics and data science. Postgres, traditionally the go-to for transactional workloads, has matured into a versatile operational database. Its reliability, adaptability, and extensibility make it suitable for a wide array of applications, from customer transactions and CRUD operations to real-time dashboards and AI-driven features. The ecosystem surrounding Postgres has expanded to include tools for real-time analytics, geospatial data, and advanced search.

Simultaneously, the emergence of open lakehouse technologies has reshaped how organizations manage and analyze data at scale. With disaggregated storage, open table formats like Iceberg, data catalogs, and composable query engines, organizations can now analyze petabyte-scale data with fine-grained governance. This architecture not only mitigates vendor lock-in but also gives data teams the flexibility to choose their preferred tools.

What is particularly noteworthy is the increasing synergy between these technologies. Organizations are now tasked with supporting both operational workloads, driven by databases, and non-operational workloads, powered by lakehouses, often drawing from the same data sources. However, these systems frequently operate in silos, managed by different teams, leading to friction in their integration.

We advocate for a seamless integration of these systems. A new architectural paradigm is emerging, one that treats Postgres and the lakehouse as complementary layers within a cohesive, modular framework designed to cover the full spectrum of operational and analytical requirements.

The Limits of the OLTP vs OLAP Dichotomy

Historically, the database landscape was characterized by a clear division: OLTP systems for transactions and OLAP systems for analysis. Postgres was employed to power applications, while nightly ETL jobs transferred data to a data warehouse for reporting. This model served well in a simpler era when applications were less data-intensive and internal reporting could afford a slower pace. However, the landscape has shifted dramatically.

  • A financial application may operate a trading engine requiring millisecond access to customer portfolios while simultaneously generating real-time risk reports and internal dashboards.
  • A SaaS application does more than store clicks; it computes usage metrics, triggers alerts, and feeds personalization models.
  • An industrial monitoring system might process millions of sensor readings per hour, facilitating anomaly detection and alerting while archiving telemetry for long-term analytics and AI model training.

These scenarios are not anomalies; they are rapidly becoming the standard. We are witnessing a more pragmatic division: operational databases that drive products, and lakehouses that power organization-wide analytics and AI. These systems are usually owned by different groups, product-engineering teams running the operational database and data teams running the lakehouse; even so, effective communication and collaboration between them are essential. By working with the same data and sharing underlying schemas, these teams make both systems more resilient and more capable.

An Operational Medallion Architecture

One pattern gaining real traction is what we call an operational medallion architecture. Drawing inspiration from the medallion models prevalent in data engineering, this architecture applies bronze, silver, and gold layers not only to internal analytics but also to real-time, user-facing systems.

  • Bronze Layer: This layer consists of raw data stored in Parquet or Iceberg files on cost-effective storage solutions like AWS S3. The data is typically immutable, append-only, and accessible via various query engines, including AWS Athena, DuckDB, and Trino, or directly from an operational database like Postgres.
  • Operational Silver Layer: Here, cleaned, filtered, validated, and deduplicated data is written into Postgres to support real-time analytics, dashboards, or application logic for user-facing products.
  • Operational Gold Layer: This layer features pre-aggregated data derived from the silver layer (such as materialized views in Postgres) to deliver low-latency, high-concurrency product experiences, typically maintained within the database for consistency. A minimal sketch of the silver and gold layers follows this list.
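
Below is a minimal sketch of what the operational silver and gold layers can look like inside Postgres, using psycopg2. The table, column, and connection names are illustrative assumptions rather than a prescribed schema.

```python
# Sketch: an operational silver table plus a gold rollup inside Postgres.
# All identifiers and the connection string are illustrative assumptions.
import psycopg2

conn = psycopg2.connect("dbname=app user=app host=localhost")
conn.autocommit = True

with conn.cursor() as cur:
    # Silver layer: cleaned, validated, deduplicated events.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events_silver (
            event_id    uuid PRIMARY KEY,      -- deduplication key
            account_id  bigint NOT NULL,
            event_type  text   NOT NULL,
            amount      numeric(12, 2),
            occurred_at timestamptz NOT NULL
        );
    """)

    # Gold layer: a pre-aggregated rollup kept next to the silver table so the
    # application can serve low-latency, high-concurrency reads.
    cur.execute("""
        CREATE MATERIALIZED VIEW IF NOT EXISTS account_daily_gold AS
        SELECT account_id,
               date_trunc('day', occurred_at) AS day,
               count(*)    AS event_count,
               sum(amount) AS total_amount
        FROM events_silver
        GROUP BY account_id, date_trunc('day', occurred_at);
    """)

    # Refresh on whatever cadence the product needs (cron, pg_cron, or app code).
    cur.execute("REFRESH MATERIALIZED VIEW account_daily_gold;")
```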

Each layer remains queryable, facilitating bidirectional data movement. Raw or transformed data can be pulled from S3 directly into Postgres, and aggregates can be rolled up from Iceberg into Postgres tables. This integration eliminates the need to replicate identical pipelines across both systems, reducing complexity and enhancing consistency.
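
As a concrete example of that pull-based movement, the sketch below reads a bronze Parquet file from S3 with pyarrow and bulk-loads it into a Postgres silver table with COPY. The bucket, path, table, and column names are assumptions for illustration.

```python
# Sketch: pull a raw bronze Parquet file from S3 into a Postgres silver table.
# Bucket, path, table, and column names are illustrative assumptions.
import io

import psycopg2
import pyarrow.compute as pc
import pyarrow.csv as pacsv
import pyarrow.parquet as pq
from pyarrow import fs

# Read the raw file straight from object storage (AWS credentials come from
# the usual environment/instance configuration).
s3 = fs.S3FileSystem(region="us-east-1")
table = pq.read_table("example-bucket/bronze/events/2024-06-01.parquet", filesystem=s3)

# Light "silver" cleanup on the way in: drop rows with no primary key and fix
# the column order so it matches the COPY column list below.
table = table.filter(pc.is_valid(table["event_id"]))
table = table.select(["event_id", "account_id", "event_type", "amount", "occurred_at"])

# Serialize to CSV in memory and bulk-load with COPY.
buf = io.BytesIO()
pacsv.write_csv(table, buf)
buf.seek(0)

conn = psycopg2.connect("dbname=app user=app host=localhost")
with conn, conn.cursor() as cur:
    cur.copy_expert(
        "COPY events_silver (event_id, account_id, event_type, amount, occurred_at) "
        "FROM STDIN WITH (FORMAT csv, HEADER true)",
        buf,
    )
```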

A common pattern in applications that need real-time data is to write from upstream streaming systems like Kafka or Kinesis simultaneously to both S3 (for unmodified bronze data) and Postgres (where database schemas enforce validation). Silver tables and gold aggregates in the database can then be exported back to S3, giving data teams access to the “ground truth” data actually served to customers. Each system keeps its distinct responsibilities, unvalidated data stays out of the operational database, and that database stays efficient.
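
A minimal version of that dual-write consumer might look like the sketch below. The topic, bucket, and table names are assumptions, and a production version would batch writes and handle retries, but the shape of the pattern is the same.

```python
# Sketch: land each message twice, raw to S3 (bronze) and validated to
# Postgres (silver). Topic, bucket, and table names are illustrative.
import json
import uuid

import boto3
import psycopg2
from kafka import KafkaConsumer  # kafka-python

consumer = KafkaConsumer("events", bootstrap_servers="localhost:9092")
s3 = boto3.client("s3")
pg = psycopg2.connect("dbname=app user=app host=localhost")

for msg in consumer:
    # Bronze: the unmodified payload goes to object storage, append-only.
    s3.put_object(
        Bucket="example-bucket",
        Key=f"bronze/events/{uuid.uuid4()}.json",
        Body=msg.value,
    )

    # Silver: parse, then let the database schema enforce types and constraints.
    event = json.loads(msg.value)
    with pg, pg.cursor() as cur:
        cur.execute(
            """
            INSERT INTO events_silver
                (event_id, account_id, event_type, amount, occurred_at)
            VALUES
                (%(event_id)s, %(account_id)s, %(event_type)s, %(amount)s, %(occurred_at)s)
            ON CONFLICT (event_id) DO NOTHING  -- deduplicate on replay
            """,
            event,
        )
```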

Why Now? Technical Forces Driving the Shift

Several key developments are facilitating the transition from siloed operational databases and lakehouses to integrated systems. First, Iceberg has matured into a robust and flexible table format that supports schema evolution, ACID transactions, and efficient compaction. It allows multiple compute engines to interact with the same datasets, with catalog layers that manage metadata and enforce governance.
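
To make that concrete, here is roughly what engine-agnostic access to an Iceberg table looks like from Python with pyiceberg. The catalog name, table identifier, columns, and filter are assumptions; the same table could just as well be read by Trino, Spark, or DuckDB.

```python
# Sketch: read an Iceberg table through its catalog with pyiceberg.
# Catalog name, table identifier, columns, and filter are illustrative.
from pyiceberg.catalog import load_catalog
from pyiceberg.expressions import GreaterThanOrEqual

# The catalog (REST, Glue, Hive, ...) owns the metadata: schema history,
# snapshots, and pointers to the underlying Parquet data files.
catalog = load_catalog("default")
table = catalog.load_table("analytics.events")

# Column and predicate pruning happen against Iceberg metadata before any
# data files are opened.
arrow_table = table.scan(
    row_filter=GreaterThanOrEqual("amount", 100.0),
    selected_fields=("event_id", "account_id", "amount"),
).to_arrow()

print(arrow_table.num_rows)
```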

Second, Postgres continues to evolve as a platform. With extensions for columnar storage, time-series data, and hybrid search capabilities, Postgres now supports a variety of products that incorporate real-time analytics and dynamic workflows. Emerging support for querying S3 and Iceberg data directly from Postgres further enhances its role, allowing it to serve both transactional and analytical data.
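
The shape of that in-database access typically follows Postgres's foreign-data-wrapper conventions. The sketch below uses standard FDW DDL with a hypothetical wrapper name and options; real extensions in this space each have their own names and option keys.

```python
# Sketch: query bronze data on S3 from inside Postgres via a foreign table.
# "hypothetical_lake_fdw", its options, and all paths are placeholders, not a
# real extension; only the CREATE EXTENSION/SERVER/FOREIGN TABLE plumbing is
# standard Postgres DDL.
import psycopg2

conn = psycopg2.connect("dbname=app user=app host=localhost")
conn.autocommit = True

with conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS hypothetical_lake_fdw;")
    cur.execute("""
        CREATE SERVER IF NOT EXISTS lake
            FOREIGN DATA WRAPPER hypothetical_lake_fdw
            OPTIONS (region 'us-east-1');
    """)
    cur.execute("""
        CREATE FOREIGN TABLE IF NOT EXISTS events_bronze (
            event_id    uuid,
            account_id  bigint,
            amount      numeric,
            occurred_at timestamptz
        )
        SERVER lake
        OPTIONS (location 's3://example-bucket/bronze/events/');
    """)

    # Lake data joins directly against operational rows (an assumed "accounts"
    # table) without an intermediate copy.
    cur.execute("""
        SELECT b.account_id, count(*) AS raw_events
        FROM events_bronze AS b
        JOIN accounts AS a ON a.id = b.account_id
        GROUP BY b.account_id
        LIMIT 10;
    """)
    print(cur.fetchall())
```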

Third, there is a growing demand for composability among developers. While some organizations may still rely on legacy monolithic data platforms, the majority of developers and data scientists seek the flexibility to create customized stacks that align with their application needs. The shift towards open formats and disaggregated storage resonates with this desire for control, particularly in regulated environments where data sovereignty is paramount.

In essence, the market is gravitating towards modular, open, and developer-friendly architectures.

What Comes Next

The future of data infrastructure will be shaped by systems that integrate the operational and analytical layers more deeply, systems that treat Postgres and the lakehouse as interconnected components of a unified framework. This evolution will not arrive as another monolithic platform but through carefully crafted interfaces, incremental synchronization, shared catalogs, and unified query surfaces. An architectural philosophy that embraces heterogeneity will be essential.

We are actively building in this space, combining the strengths of Postgres and Iceberg to integrate tightly with existing lakehouse systems. Our goal is to make it simpler to build full-stack data systems that maintain both operational and analytical fidelity. This goes beyond traditional ETL: the aim is a coherent modern data architecture that serves operational and non-operational use cases seamlessly.

Stay tuned for further developments in this exciting domain.
