Understanding transaction visibility in PostgreSQL clusters with read replicas

On April 29, 2025, Jepsen released a detailed report examining transaction visibility behavior in Amazon Relational Database Service (Amazon RDS) for PostgreSQL Multi-AZ clusters. We commend Jepsen for their thorough analysis and wish to provide additional context on this behavior, which is present in both Amazon RDS and community PostgreSQL. Our internal testing has corroborated these findings, and we have been working with the community on a resolution for this long-standing issue, which has been acknowledged and discussed since at least 2013.

The crux of the reported issue lies in the differing visibility order of transactions between primary and replica nodes in cluster configurations. Notably, this anomaly does not result in data loss or corruption, is absent in Single-AZ PostgreSQL deployments, and does not affect Amazon Aurora PostgreSQL Limitless Database or Amazon Aurora DSQL databases.

In this discussion, we will delve into the specifics of the issue, explore its potential impact on various architectural classes, share possible workarounds, and reaffirm our commitment to enhancing community PostgreSQL across all dimensions, including correctness.

Understanding the Long Fork Behavior

The report highlights what the database literature calls a Long Fork anomaly, a violation of Snapshot Isolation in which two readers may observe the effects of transactions in different orders. For instance, consider two concurrent transactions, T1 and T2, that modify distinct rows in a PostgreSQL database configured as a cluster. A query Q1 issued to the primary might show T1’s effects in the table but not T2’s. Conversely, a query Q2 directed at a replica might show T2’s effects but not T1’s.

This behavior stems from the fact that, on a PostgreSQL primary (in both standalone and replicated setups), the order in which the effects of non-conflicting transactions become visible can diverge from the order in which they become durable. In simpler terms, the visibility order of transactions does not always match their logged commit order. When PostgreSQL takes a snapshot, it records the list of transactions that were in progress at that moment, which are tracked in the ProcArray. The effects of these in-progress transactions are permanently excluded from the snapshot, even after they commit. At commit time, a transaction first makes itself durable by recording its commit in the Write-Ahead Log (WAL) and only afterwards removes itself from the ProcArray; the two steps are not atomic. Therefore, if T1 and T2 commit concurrently, T1 may become durable before T2 (in the WAL), yet T2 may remove itself from the ProcArray before T1. A snapshot taken on the primary at that moment sees T2 but not T1, whereas a replica, which replays commit records in WAL order, can expose T1 without T2. This is the Q1/Q2 situation described above, with the roles of T1 and T2 swapped.
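The two-step commit described above can be illustrated with a toy model. This is a minimal sketch, not PostgreSQL code: the `Primary` class, its method names, and `replica_snapshot` are hypothetical, and the replica is simplified to "whatever prefix of the WAL it has replayed". It follows the ordering in the preceding paragraph, where T1 becomes durable first but T2 leaves the ProcArray first.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Toy primary: a WAL plus a ProcArray-like set of in-progress transactions."""
    wal: list = field(default_factory=list)        # commit records, in durability order
    in_progress: set = field(default_factory=set)  # stands in for the ProcArray
    finished: list = field(default_factory=list)   # order of removal from in_progress

    def begin(self, xid):
        self.in_progress.add(xid)

    def log_commit(self, xid):
        # Step 1 of commit: the commit record becomes durable in the WAL.
        self.wal.append(xid)

    def leave_proc_array(self, xid):
        # Step 2 of commit: the transaction stops being "in progress".
        self.in_progress.discard(xid)
        self.finished.append(xid)

    def snapshot(self):
        # A primary snapshot sees only transactions that completed step 2.
        return set(self.finished)


def replica_snapshot(wal, replayed):
    # Simplification: a replica exposes commits in WAL order as it replays them.
    return set(wal[:replayed])


primary = Primary()
primary.begin("T1")
primary.begin("T2")

primary.log_commit("T1")        # T1 is durable first ...
primary.log_commit("T2")
primary.leave_proc_array("T2")  # ... but T2 becomes visible first

q1 = primary.snapshot()                # query on the primary
q2 = replica_snapshot(primary.wal, 1)  # query on a replica that replayed one record
primary.leave_proc_array("T1")

print("primary sees:", sorted(q1))  # primary sees: ['T2']
print("replica sees:", sorted(q2))  # replica sees: ['T1']
```

Neither reader sees an inconsistent row, but no single ordering of T1 and T2 explains both results, which is the Long Fork.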

This behavior has been recognized within the PostgreSQL community for many years, with extensive discussions taking place on the pgsql-hackers mailing list since 2013. The Long Fork anomaly affects all isolation levels (Read Committed, Repeatable Read, and Serializable) in community PostgreSQL and is not exclusive to Amazon RDS; it can also be reproduced in self-managed PostgreSQL deployments.

Example of Potential Impact

To illustrate the manifestation of the Long Fork, consider a hypothetical disagreement between Alice and Bob regarding whether the Jepsen post ever reached the #1 position on Hacker News. Assume that the number of page views for each post is recorded in separate rows of a relational table stored in a PostgreSQL database. The ranked list of posts is generated using a SQL query that sorts the posts by page view count. Alice accesses the website from one location, while Bob accesses it from another. Alice’s application server, which renders the ranked list, routes queries to the PostgreSQL primary, whereas Bob’s queries are directed to the replica.

As Alice and Bob refresh their browsers, they watch Jepsen’s post rise in popularity. Alice sees the post reach #1 and captures a screenshot as proof. Meanwhile, Bob only ever sees the post at #2. He challenges Alice’s claim and asks the maintainers of Hacker News for the commit log of the transactions that incremented the page view counters. From the log, Bob concludes that the post nearly reached the top rank, but just before it did, its page view count was surpassed by another post because of a concurrent click. Bob’s conclusion matches both the commit log and what the replica showed him, yet Alice’s screenshot is genuine: the primary did briefly show the post at #1. Had there been no replica and no access to the commit log, Alice’s claim would have been valid, and the observed behavior would have been consistent with the semantics of Snapshot Isolation.
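The disagreement can be replayed with concrete numbers. The sketch below is purely illustrative: the post names, the starting counts, and the tie-breaking rule (ties go to the lower post id) are assumptions, and each click is modeled as a single increment that becomes visible either in commit-log order (Bob, on the replica) or in the primary’s visibility order (Alice).

```python
def ranked(views):
    # Sort by view count, highest first; ties go to the lower post id (an assumption).
    return sorted(views, key=lambda post: (-views[post], post))


def apply(views, visible):
    # Counters as seen by a snapshot that includes the given increments.
    seen = dict(views)
    for post in visible:
        seen[post] += 1
    return seen


start = {"a-other-post": 10, "jepsen-post": 10}

commit_order = ["a-other-post", "jepsen-post"]      # order in the commit log (replica)
visibility_order = ["jepsen-post", "a-other-post"]  # order observed on the primary


def top_posts(order):
    # Top-ranked post after each prefix of the given order becomes visible.
    return [ranked(apply(start, order[:n]))[0] for n in range(len(order) + 1)]


alice = top_posts(visibility_order)  # reads from the primary
bob = top_posts(commit_order)        # reads from the replica

print(alice)  # ['a-other-post', 'jepsen-post', 'a-other-post']
print(bob)    # ['a-other-post', 'a-other-post', 'a-other-post']
```

Both readers end in the same final state; only the intermediate state differs, and that is where Alice’s screenshot comes from.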

Aligning Visibility Order with Commit Order

Although this behavior signifies a departure from formal Snapshot Isolation guarantees, it seldom affects application correctness in practice. Most applications inherently serialize their operations through application-level constraints or by working with related data that creates direct conflicts. PostgreSQL committers have explored various solutions that were discussed on mailing lists and presented at PGConf.EU 2024. One proposed solution aims to synchronize the visibility order with the commit order using Commit Sequence Numbers (CSNs). This fix is complex and involves multiple patches.
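The general idea behind a commit-sequence-number snapshot can be sketched in a few lines. This is a toy model under stated assumptions, not the design of the proposed patches: the `CsnStore` class and its methods are hypothetical, and the sketch ignores concurrency, aborts, and durability altogether.

```python
import itertools


class CsnStore:
    """Toy commit log that stamps each commit with an increasing CSN."""

    def __init__(self):
        self._next_csn = itertools.count(1)
        self.commit_csn = {}  # xid -> CSN, filled in when the commit is logged
        self.last_csn = 0

    def commit(self, xid):
        # In this toy model, assigning the CSN is the single event that
        # orders the commit, so there is only one order to observe.
        csn = next(self._next_csn)
        self.commit_csn[xid] = csn
        self.last_csn = csn
        return csn

    def snapshot(self):
        # A snapshot is one number, not a list of in-progress transactions.
        return self.last_csn

    def is_visible(self, xid, snapshot_csn):
        csn = self.commit_csn.get(xid)
        return csn is not None and csn <= snapshot_csn


store = CsnStore()
store.commit("T1")
snap_after_t1 = store.snapshot()
store.commit("T2")
snap_after_t2 = store.snapshot()

print(store.is_visible("T1", snap_after_t1), store.is_visible("T2", snap_after_t1))  # True False
print(store.is_visible("T1", snap_after_t2), store.is_visible("T2", snap_after_t2))  # True True
```

Because visibility is decided by comparing against a single number, any snapshot that includes T2 also includes T1, whichever node evaluates it, provided every node uses the same CSNs.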

While the Long Fork anomaly may seem esoteric from an end-user perspective, addressing it is crucial for enhancing PostgreSQL clusters with advanced enterprise-grade capabilities. For instance:

  • Support in distributed systems – Distributed PostgreSQL systems cannot utilize the visibility order, as obtaining a consistent list of pending transactions across nodes is practically unfeasible. In contrast, Aurora Limitless and Aurora DSQL implement consistent snapshots using time-based Multi-Version Concurrency Control (MVCC), thereby avoiding the Long Fork anomaly.
  • Query routing and read/write splitting – Offloading read-only queries and subqueries to synchronously updated or caught-up read replicas may lead to non-repeatable reads if visibility order diverges from commit order.
  • Data synchronization – Taking a snapshot of the database state on the primary and subsequently rolling the state forward using the transaction log may introduce inconsistencies.
  • Point-in-time restore – Restoring the database to a specific log sequence number (LSN) might yield a state that was never observable on the primary, complicating the analysis of application-induced data corruption.
  • Storage layout optimization – Replacing transaction identifiers in tuples with logical or clock-based commit time during query execution could result in non-repeatable queries.
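  • CPU utilization – Large production PostgreSQL servers can support thousands of connections. In high-throughput, read-heavy workloads, a significant portion of CPU resources is spent taking snapshots, because each snapshot has to collect the list of in-progress transactions; a snapshot defined by commit order could reduce that cost.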

AWS’s Commitment to PostgreSQL

At AWS, our dedication to PostgreSQL’s success is unwavering. In 2022, we established the PostgreSQL Contributors Team, focused on contributing to the core PostgreSQL engine. Our team actively engages in the PostgreSQL community’s development initiatives, employing leading database researchers who are advancing the state of the art in distributed databases. We uphold rigorous systems correctness practices, including formal methods for verification.

We remain committed to collaborating with the PostgreSQL community to address the enduring Snapshot Isolation anomaly in PostgreSQL.

Further Reading

For those interested in exploring the work within the PostgreSQL community, understanding how this issue is resolved in Aurora DSQL and Aurora Limitless, and delving into the research background, we recommend the following resources:


About the Author

Sergey Melnik is a Senior Principal Technologist at AWS, specializing in distributed systems, data management, and cloud computing.
