Partitioning serves as a robust mechanism within PostgreSQL for managing extensive datasets by dividing large tables into smaller, more manageable segments known as partitions. The primary advantage of this approach lies in its ability to enhance query performance, particularly when conditions on the partitioning key are included. This allows the database management system (DBMS) to execute partition pruning, scanning only the relevant partitions and significantly accelerating query response times. Additionally, partitioning streamlines maintenance tasks, such as the efficient removal of obsolete data by dropping entire partitions.
However, challenges arise when dealing with indexes on partitioned tables. Historically, PostgreSQL’s standard indexing solutions have exhibited certain limitations. Developers often found themselves relying on local indexes, which are created independently for each partition. While local indexes provide simplicity and efficiency for operations confined to a single partition, they fall short in scenarios requiring uniqueness across the entire partitioned table or when querying data that spans multiple partitions.
What’s the problem with local indexes on partitioned tables?
- Uniqueness constraint. In standard PostgreSQL, creating a unique index or enforcing a UNIQUE/PRIMARY KEY constraint on a partitioned table necessitates that the index encompasses all partitioning key columns. This limitation poses a challenge when uniqueness is required on a column not included in the partitioning key, such as a user email in a table partitioned by registration date. Local indexes cannot provide a solution, as they only guarantee uniqueness within their respective partitions.
- Query performance across partitions. Local indexes are inherently limited to their specific partitions. Consequently, when a query spans multiple partitions—such as retrieving all customer orders over a year from a table partitioned by month—PostgreSQL must sequentially consult the local indexes of each relevant partition. This approach can quickly lead to inefficiencies, particularly with a high number of partitions.
In response to these challenges, Postgres Pro Enterprise has introduced global indexes, which operate at the level of the entire partitioned table, ensuring both global uniqueness and improved query performance across multiple partitions.
How do global indexes work?
Global indexes differ from local indexes in that they are created once for the parent partitioned table, encompassing data from all partitions. Physically, they store index entries for the specified columns across every partition. The search process using a global index in Postgres Pro unfolds in two distinct steps:
- Search in the global index. Upon querying a partitioned table, Postgres Pro first examines the global index. This index is constructed based on the specified columns, with the primary key columns of the partitioned table automatically included as non-key columns. These non-key columns are stored in the index but do not participate in the tree search.
- Search data in the partition by primary key. Once a matching entry is identified in the global index, the system utilizes the primary key values stored as non-key columns to swiftly locate the corresponding partition and row, thereby avoiding the need to scan all partitions.
Implementation details
- Extension and access method. This feature is implemented via the
pgpro_gbtreeextension, utilizing a newgbtreeaccess method that adapts the standard B-tree for partitioned tables. - Creating an index. The syntax for creating a global index is familiar but requires the partitioned table to have a primary key and the index to be declared using
USING gbtree:
-- First install the extension
CREATE EXTENSION pgpro_gbtree;
-- Example: unique global index on the email column
-- (assumes my_partitioned_table has a PRIMARY KEY)
CREATE UNIQUE INDEX my_global_unique_index
ON my_partitioned_table USING gbtree (email);
- Planner logic. The planner will utilize a global index only if both conditions are met:
- The query lacks a partitioning-key condition that would enable partition pruning, as pruning is deemed more efficient than using the global index.
- The query includes a condition on columns that the global index covers.
- Virtual access method. A special “virtual” access method has been implemented to facilitate data retrieval through a global index, as partitioned tables typically do not appear as scan nodes in plans.
- Size and structure. The global index itself is not partitioned; it constitutes a single index containing entries from all partitions. Thus, the standard PostgreSQL index size limit (generally 32 TB) applies. While the expectation is that the index will be smaller than the table, this limit is noteworthy.
- Tie-in with the primary key. A global index implicitly relies on the table’s primary key. As long as a global index exists, the primary key cannot be dropped (except through
CASCADE, which also removes the index). Primary key columns cannot be explicitly designated as index keys; they are always included automatically usingINCLUDE.
Performance
Testing has revealed notable trends regarding performance:
- Reads. Global indexes significantly enhance query speed when scanning numerous partitions. Their performance can rival that of a dedicated helper index table and may be several times faster than local indexes, particularly in scenarios with a high number of partitions (one test indicated a speedup of approximately 7× with 100 partitions).
- Writes (INSERT/UPDATE). Conversely, inserting data into a table with a global index tends to be slower (about 1.6–1.7× slower in tests involving 100 partitions) compared to a local B-tree index. This slowdown is attributed to the global index’s internal enforcement of uniqueness, which necessitates checking for duplicates (key + primary key) and locking hash buckets to prevent race conditions during concurrent inserts.
Operations and maintenance
- DELETE / DETACH PARTITION. Similar to regular B-tree indexes, global indexes do not immediately eliminate entries when rows are deleted. Instead, entries for deleted rows or detached partitions remain in the index as “garbage.” This behavior is typical, and global indexes do not automatically rebuild on
SPLIT/MERGE. - VACUUM. To remove obsolete entries from the global index, it is necessary to execute
VACUUMon the partitioned table. - AUTOVACUUM. Currently, AUTOVACUUM is not supported for global indexes. The primary challenge lies in collecting meaningful statistics at the level of the entire partitioned table. However, global indexes do not utilize Multi-Version Concurrency Control (MVCC), which means they bloat significantly less than standard B-trees. Consequently, the need for VACUUM is infrequent and sometimes unnecessary. When cleanup is required, it must be performed manually, though this is generally rare.
Current limitations
Global indexes were introduced in Postgres Pro Enterprise 17.5.1 as an experimental feature, and they come with several limitations:
CONCURRENTLYis not supported for creating or rebuilding (REINDEX), similar to regular indexes.- Expressions or predicates (
WHERE) in CREATE INDEX are not supported. - Explicit specification of
INCLUDEcolumns is not permitted (PRIMARY KEY columns are automatically included). - Duplicate columns in the index key are not allowed.
- Primary key columns cannot be part of the index key.
- Foreign keys (
FOREIGN KEY) referencing a unique global index are not supported. - A table with a global index cannot be attached as a partition to another table.
CLUSTERis not supported.ON CONFLICTis not applicable to unique global indexes.
Many of these limitations may be addressed in future releases as the feature matures and user feedback is gathered.
Use cases
Despite the existing limitations, global indexes present transformative opportunities in various scenarios:
- Enforcing global uniqueness. A compelling use case is the ability to guarantee uniqueness of a value (such as email or contract number) across the entire partitioned table, rather than just within individual partitions. A UNIQUE global index effectively addresses this need, which is crucial for sectors like finance, customer relationship management (CRM), and accounting.
- Faster queries on non-partitioning keys. For queries that frequently filter on a non-partitioning column and span multiple partitions—such as retrieving all customer orders by
client_idfrom a date-partitionedorderstable—a global index onclient_idcan significantly enhance performance. - Simpler development. Developers benefit from reduced complexity when writing queries that filter on globally indexed columns, as the table behaves as a cohesive whole.