Cloud data platform vendor Snowflake has open-sourced its PostgreSQL extensions to enhance integration with its lakehouse system. The new pg_lake extension allows developers to read and write directly to Apache Iceberg tables from PostgreSQL, streamlining data management without the need for data extraction. The PostgreSQL extensions, developed by Crunchy Data, are licensed under the Apache license. Snowflake acquired Crunchy Data in June.
The release aims to provide seamless integration between the widely used open-source database and Snowflake's lakehouse system, extending what developers and data engineers can do from PostgreSQL.
Integration with Apache Iceberg
With the introduction of pg_lake, developers can now read and write directly to Apache Iceberg tables from PostgreSQL. This innovation eliminates the cumbersome process of data extraction and movement, allowing users to leverage their existing PostgreSQL setups more effectively. Apache Iceberg is recognized for its open table format, which enables users to utilize their preferred analytics engines without the need to relocate data. The format enjoys backing from major players in the industry, including Snowflake, Google, and AWS.
Christian Kleinerman, Snowflake's executive vice president of product, shared insights with The Register about the implications of this open-source extension. He emphasized that it empowers developers using PostgreSQL to transform their database into a management interface for an open lakehouse. The lakehouse concept, initially introduced by Databricks five years ago, serves as a unified system for managing both structured and unstructured workloads.
Kleinerman elaborated on the practical applications of this integration: “One of the most common use cases for developers [will be] to build applications against PostgreSQL and then [move] or copy the data for analytics into either a data platform like Snowflake or increasingly, an open data lakehouse like Iceberg tables on S3 Tables in [AWS] or Microsoft Onelake [in Fabric]… that data now becomes available for analytics.”
Development and Licensing
The PostgreSQL extensions are available under the Apache license and were initially developed by Crunchy Data, a PostgreSQL specialist startup. Snowflake acquired Crunchy Data in June of this year, further solidifying its commitment to enhancing PostgreSQL capabilities within its ecosystem.
In a recent blog post, Craig Kerstiens, Snowflake's software engineering director, highlighted that pg_lake enables developers to manage Iceberg tables directly in PostgreSQL. This is achieved by introducing a new Iceberg table type where PostgreSQL serves as the catalog. Additionally, developers can query raw data files in the data lake, external Iceberg tables, Delta tables, and various geospatial file formats directly from PostgreSQL.
Market Insights
Robert Kramer, vice president and principal analyst at Moor Insights & Strategy, commented on the strategic significance of this development. He noted that providing PostgreSQL users with a direct pathway into Snowflake’s lakehouse and AI capabilities without necessitating architectural changes is a wise approach. “Most organizations are not ripping out PostgreSQL — and Snowflake clearly understands that. Pg_lake lowers the barrier for PostgreSQL teams to gradually adopt Snowflake for high-value analytics and automation, rather than treating it as an all-or-nothing platform decision,” he stated. Kramer anticipates that this will lead to incremental adoption and increasing traction over time, particularly as teams integrate operational databases with governed AI execution.
In addition to the pg_lake announcement, Snowflake unveiled the general availability of Snowflake Intelligence, an AI agent designed to empower users to pose complex questions in natural language, thereby making insights readily accessible to every employee. Enhancements have also been made to its Horizon data catalog.
However, Kramer pointed out that Snowflake may still need to address certain aspects such as scale, monitoring, and the real-world costs associated with agent workloads. He remarked, “Buyers might need some help understanding how Snowflake is different from Databricks and other cloud platforms. Snowflake is designed to be a platform where AI can work reliably and responsibly, not just for testing purposes. For customers who want to move from experimenting with AI to using it in real-world operations, this mindset is really important.”
In short, the integration enables developers to manage Iceberg tables directly in PostgreSQL and query various data formats. Analysts suggest this development will facilitate gradual adoption of Snowflake's capabilities by PostgreSQL users. Snowflake also announced the general availability of Snowflake Intelligence, an AI agent for natural language queries, alongside enhancements to its Horizon data catalog.