pg_lake brings a native Postgres experience to open data lakehouse formats such as Iceberg and Parquet.
The journey to pg_lake
pg_lake began with an observation by the team at Crunchy Data, now part of Snowflake. Having built foundational Postgres extensions such as PostGIS, Citus, pg_cron, and pg_partman, the team knew its customers' challenges well. A recurring one was data split between Postgres databases and object storage, a persistent barrier to efficient data management.
Recognizing this gap, the team set out to close it with a native, intuitive Postgres experience. The work began with the launch of Crunchy Bridge for Analytics over 18 months ago, at a time when few alternatives existed. That effort helped spark a broader movement to place Postgres at the center of a data platform that unifies transactional and analytical workloads. As the landscape evolved, and after Snowflake's acquisition of Crunchy Data, the value of improving Postgres's interoperability with analytics became even clearer.
Why we are open sourcing pg_lake
We believe Postgres is a cornerstone of the modern data stack, not merely a relational database. With strong JSON handling, geospatial processing through PostGIS, and vector search via pgvector, Postgres is a comprehensive operational data platform. At the same time, the data lakehouse has become the preferred way to manage analytical data at scale.
As organizations increasingly combine operational and analytical workloads to power modern applications and customer experiences, Snowflake provides a data platform well suited to that integration. By extending Postgres's capabilities, we hope to enable more developers to build modern, data-driven, AI-centric applications.
We are open sourcing this technology with several objectives in mind:
- Establish a standard: We aim to foster a robust, open standard for a more cohesive Postgres ecosystem that benefits everyone.
- Empower developers: Modern applications and AI demand a blend of operational and analytical features. With pg_lake, the Postgres community can unlock new use cases and accelerate innovation.
- Commit to Postgres: Snowflake's dedication to the success of Postgres is unwavering. This release reflects our commitment to expanding what can be achieved with the world's most beloved open-source database.
As pg_lake and Snowflake Postgres integrate into the broader Snowflake ecosystem, we look forward to a future where the divide between operational and analytical data is bridged.
Get started today
We are thrilled to share pg_lake with the community. It is available now, and we invite developers to try it out.
1 These blogs are an archive from Crunchy Data, published prior to its acquisition by Snowflake.