Microsoft backs open source PostgreSQL analytics extension

A Microsoft-backed open source initiative is set to enhance PostgreSQL’s capabilities as an analytics database by integrating the in-process OLAP system, DuckDB. This project, licensed under the permissive MIT License, aims to address the challenges developers face when leveraging PostgreSQL for analytical purposes, despite its recognition as the most popular database among developers since 2023.

Enhancing PostgreSQL with DuckDB

PostgreSQL has enjoyed a steady rise in popularity over the past 28 years, attributed to its robust and extensible architecture. However, developers have encountered difficulties in constructing analytical systems that perform as efficiently as transactional systems. The forthcoming PostgreSQL extension, which embeds DuckDB’s analytics engine, seeks to remedy this limitation.

A recent blog post from MotherDuck, a provider of back-end services and extensions for DuckDB, highlights that the intellectual property of this extension is owned by the independent non-profit DuckDB Foundation, ensuring its open-source status. The project is hosted in the official DuckDB GitHub repository.

According to the blog, “PostgreSQL is great at a lot of things, but if you try to use it for analytics, you hit a wall pretty quickly.” While PostgreSQL excels at creating and locating individual rows, it can struggle with analyzing larger datasets efficiently. Traditional attempts to enhance PostgreSQL’s analytical performance have often fallen short, as effective analytics requires different query execution techniques, such as batch processing and deferred data decompression.

DuckDB operates as an embedded analytics engine within a host process, eliminating the need for separate DBMS server software installation or maintenance. For instance, the DuckDB Python package allows users to run queries directly on data within the Pandas library without the need for data importation or copying. Written in C++, DuckDB is freely available under the MIT License.

MotherDuck emphasizes that DuckDB’s vectorized query engine processes data in chunks, making it particularly adept at addressing analytical inquiries. The blog notes, “DuckDB’s popularity has been soaring due to its speed, ease of use, and versatility.”

The new pg_duckdb extension is supported by a coalition of organizations, including DuckDB Labs, the commercial support entity founded by DuckDB’s developers; Microsoft, which provides PostgreSQL services and employs PostgreSQL committers; Hydra, the data and software company that initiated pg_duckdb; and Neon, which specializes in serverless PostgreSQL database platforms. MotherDuck also plays a supportive role in this endeavor.

Andy Pavlo, an associate professor of databaseology at Carnegie Mellon University, remarked on the historical context of PostgreSQL’s analytical capabilities. He noted that developers have attempted to create OLAP versions of PostgreSQL since the early 2000s, often resulting in hard forks of the original code, such as Greenplum and Vertica. However, advancements in the PostgreSQL extension API since 2006 have paved the way for more integrated solutions.

Pavlo highlighted the significance of DuckDB’s optimizer, which, while not as battle-tested as PostgreSQL’s, is based on the highly regarded TUM’s HyPer optimizer. This design allows DuckDB to handle complex queries that other systems, including Microsoft SQL Server and Oracle, struggle with. He expressed confidence in the backing of the project by notable companies, stating, “If I was a developer that was already using Postgres + DuckDB in my stack and I wanted to be able to combine the two together, I would choose pg_duckdb over the other DuckDB Postgres extension alternatives.”

Tech Optimizer
Microsoft backs open source PostgreSQL analytics extension