Building a High-Performance Postgres Time Series Stack with Iceberg

Postgres extensions are reshaping how modern data workloads are handled, and time series data is a prime example. Combining pg_lake, pg_partman, and pg_incremental yields a vendor-agnostic, fully open-source stack for high-performance time series workloads that also keeps storage costs in check.

An open source Postgres time series stack

The stack consists of the following components:

  • PostgreSQL: The foundation of the stack, known for its reliability, extensibility, and open-source license.
  • pg_partman: Automates the creation and management of time partitions for large tables, improving performance and easing maintenance.
  • pg_lake: Connects Postgres to data lakes such as S3, so older “cold” time series data can be offloaded to Apache Iceberg while remaining queryable from Postgres.
  • pg_incremental: Processes append-only data in incremental batches, simplifying pipelines that move or aggregate new rows.
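Assuming the extensions are installed on the server, enabling them might look like the following sketch. The exact extension names and any required schemas (pg_partman is commonly installed into its own `partman` schema) depend on your packaging, so treat these statements as illustrative:

```sql
-- Illustrative setup; extension names/schemas may vary by distribution
CREATE SCHEMA IF NOT EXISTS partman;
CREATE EXTENSION IF NOT EXISTS pg_partman SCHEMA partman;
CREATE EXTENSION IF NOT EXISTS pg_lake;
CREATE EXTENSION IF NOT EXISTS pg_incremental;
```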

All of these extensions are actively maintained by the Postgres team at Snowflake.

Hands-on example: Internet of Things sensor data

Let’s build a system for monitoring temperature readings. The architecture keeps recent data in local “warm” storage and moves “cold” data to an Apache Iceberg™ table on S3 for cheap long-term storage. Here is the process we will follow:

  1. Create a partitioned table in Postgres for time series data using pg_partman.
  2. Establish an Iceberg table via pg_lake (bonus: explore hidden partitioning).
  3. Employ pg_incremental to facilitate automatic appending to Iceberg.
  4. Utilize pg_partman to eliminate old partitions.
  5. Query from either local warm tables or cold Iceberg tables, effectively reducing the local Postgres storage footprint and associated infrastructure costs.
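For step 5, one common pattern is a view that unions the warm local table with the cold Iceberg table, so applications query a single relation. A minimal sketch, with hypothetical table and column names:

```sql
-- Unified view over warm (local Postgres) and cold (Iceberg) data.
-- sensor_data, sensor_data_iceberg, and the column list are illustrative names.
CREATE VIEW sensor_data_all AS
SELECT sensor_id, reading, created_at FROM sensor_data
UNION ALL
SELECT sensor_id, reading, created_at FROM sensor_data_iceberg;
```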

Create sample partitioned table

Our first step is to create a local Postgres table for recent transactional data, partitioned by time. A time-partitioned table offers several advantages for time series workloads: old data can be removed efficiently by dropping whole partitions, queries over a specific time range scan only the relevant partitions, and data from different time periods is not interleaved on disk, which reduces fragmentation. Together, these properties improve performance.
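A sketch of what this could look like, assuming pg_partman 5.x installed in a `partman` schema; the table name, columns, and one-day interval are illustrative choices, not requirements:

```sql
-- Warm storage: a time-partitioned table for recent sensor readings
CREATE TABLE sensor_data (
    sensor_id  bigint NOT NULL,
    reading    double precision,
    created_at timestamptz NOT NULL DEFAULT now()
) PARTITION BY RANGE (created_at);

-- Have pg_partman create and manage daily partitions automatically
SELECT partman.create_parent(
    p_parent_table := 'public.sensor_data',
    p_control      := 'created_at',
    p_interval     := '1 day'
);
```

With this in place, pg_partman’s maintenance routine keeps future partitions pre-created, and dropping a day of data later becomes a cheap partition drop rather than a bulk DELETE.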
