Many organizations leveraging PostgreSQL are faced with the challenge of replicating data across distinct logical databases while maintaining strict control over what information is shared. For instance, a healthcare platform may need to transfer appointment data to an analytics database without compromising Protected Health Information (PHI). Similarly, a multi-tenant software as a service (SaaS) application might require separate read replicas that contain only the data of individual customers.
With the introduction of PostgreSQL 15, native logical replication now offers both column-level and row-level filtering directly within publications. This advancement allows users to precisely dictate which columns and rows are transferred from a PostgreSQL source database to a target database, eliminating the need for middleware or custom coding. It is important to note that this replication solution is exclusively PostgreSQL-to-PostgreSQL, necessitating that both the source and target databases are PostgreSQL instances.
How logical replication works in PostgreSQL
PostgreSQL logical replication functions on a publish-subscribe model. The source database establishes a publication that outlines the data to be shared, while the target database creates a subscription that connects to this publication to receive updates.
The process unfolds in the following steps:
- The source database logs all changes to the PostgreSQL Write-Ahead Log (WAL) at a logical level, capturing detailed row-level information.
- A publication serves as a filter atop the WAL, specifying which tables, columns, and rows (using WHERE clauses) should be broadcast. Creating a publication does not create a separate data buffer; rather, the source’s WAL sender process reads the WAL, applies the publication’s filters in memory, and transmits only the filtered data to the subscriber.
- The target database’s subscription connects to the source, retrieves the filtered WAL stream, and applies the changes locally.
The critical insight here is that filtering occurs at the source before any data is transmitted. The WAL sender process decodes the WAL, applies the publication filters, and only the relevant data is sent to the subscriber. Consequently, sensitive columns or irrelevant rows do not traverse the network and remain on the source.
Two filtering modes
- Column-level filtering – Users can specify exactly which columns to include in a publication. For instance, one might replicate employee_id, name, and department while excluding salary and ssn. This is defined using a column list in the CREATE PUBLICATION statement:

CREATE PUBLICATION pub_emp_safe
    FOR TABLE employees (employee_id, name, department);

- Row-level filtering – Users can specify a WHERE clause to determine which rows to replicate. For example, one might replicate only orders from a specific region:

CREATE PUBLICATION pub_orders_na
    FOR TABLE orders WHERE (region IN ('US', 'CA', 'MX'));
Both filtering modes can be combined within a single publication for enhanced control. The PostgreSQL documentation on publication column lists provides comprehensive syntax details.
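For example, a single publication can apply a column list and a row filter at once; the table and column names here are illustrative:

```sql
-- Publish only selected columns of North American orders
-- (table and column names are illustrative).
CREATE PUBLICATION pub_orders_na_safe
    FOR TABLE orders (order_id, region, order_total, created_at)
    WHERE (region IN ('US', 'CA', 'MX'));
```

Note that for publications replicating UPDATE and DELETE operations, the column list must include the table’s replica identity columns (by default, the primary key).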
Enabling logical replication across environments
The SQL syntax for creating publications and subscriptions remains consistent across Amazon RDS for PostgreSQL, Amazon Aurora, and self-managed PostgreSQL. The primary distinction lies in how logical replication is enabled. For Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL-Compatible Edition, one must set rds.logical_replication = 1 in the custom parameter group (instance-level for RDS, cluster-level for Aurora) and reboot the instance. For self-managed PostgreSQL, the configuration involves setting wal_level = logical in postgresql.conf followed by a server restart. The subsequent sections will detail deployment steps for each environment.
Use cases
Multi-tenant SaaS data isolation
Row-level filtering facilitates effective data isolation for each tenant within a shared database. By creating a publication per tenant with a WHERE (tenant_id = 'customer_xyz') clause, each customer’s replica receives only their respective rows. Adding a new tenant is as simple as creating a new publication and subscription with a few SQL commands, avoiding any architectural overhaul.
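The per-tenant setup can be sketched as follows; the publication, subscription, and connection details are illustrative:

```sql
-- On the shared source database, one publication per tenant:
CREATE PUBLICATION pub_tenant_customer_xyz
    FOR TABLE orders WHERE (tenant_id = 'customer_xyz');

-- On that tenant's dedicated replica:
CREATE SUBSCRIPTION sub_tenant_customer_xyz
    CONNECTION 'host=<source-endpoint> port=5432 dbname=<db> user=<repl-user> password=<password>'
    PUBLICATION pub_tenant_customer_xyz;
```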
Scalability considerations are essential. Each subscription utilizes a replication slot on the source, which initiates a dedicated WAL sender process (approximately 4 MB of memory each). PostgreSQL defaults to 10 replication slots (max_replication_slots) and 10 WAL sender processes (max_wal_senders), both of which can be configured at server startup. Additionally, each slot retains WAL segments until its subscriber consumes them, which can lead to increased WAL disk usage if a subscriber falls behind. For scenarios involving a large number of tenants, it may be prudent to group smaller tenants into shared publications to manage resource consumption effectively.
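To track this in practice, the amount of WAL each slot is retaining can be checked on the source; a steadily growing value indicates a lagging or offline subscriber:

```sql
-- WAL retained per replication slot on the source.
SELECT slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;
```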
Regional data distribution for ecommerce
Row-level filtering enables regional fulfillment centers to receive only the order data pertinent to their geographical area. For example, a North American center would receive only orders from the US, Canada, and Mexico, while European and Asia-Pacific centers would receive their respective regional data. This targeted approach typically results in a 60-80% reduction in replication volume and storage costs.
Financial services PCI DSS compliance
Payment processors often need to analyze transaction patterns without storing sensitive data such as Primary Account Numbers (PANs) or CVV codes. Column-level filtering allows for the replication of transaction metadata while excluding restricted cardholder data columns, thereby minimizing PCI DSS scope on the target system.
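A publication for this case might look as follows; the table and column names are illustrative, with pan and cvv simply omitted from the column list:

```sql
-- Replicate transaction metadata only; cardholder data never leaves the source
-- (table and column names are illustrative).
CREATE PUBLICATION pub_txn_metadata
    FOR TABLE transactions (txn_id, merchant_id, txn_amount, txn_timestamp, status);
```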
Development and testing environments
Column-level filtering provides developers with a live, continuously updated feed of production data devoid of sensitive columns. This approach allows developers to work with real data volumes, relationships, and distributions while ensuring that customer Personally Identifiable Information (PII) remains secure. Since it utilizes live replication rather than a nightly batch job, the data is always current.
Retail active inventory replication
Retail chains managing extensive catalogs (500,000+ SKUs) can implement row-level filtering with a WHERE (status = 'active' AND (qty_on_hand > 0 OR qty_on_order > 0)) clause, ensuring that each store receives only the 50,000-80,000 active SKUs it requires, resulting in an 80-90% reduction in replication data.
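Assuming an inventory table that contains the columns referenced in the clause above, the corresponding publication can be sketched as:

```sql
-- Publish only active SKUs that are in stock or on order
-- (table name is illustrative).
CREATE PUBLICATION pub_active_inventory
    FOR TABLE inventory
    WHERE (status = 'active' AND (qty_on_hand > 0 OR qty_on_order > 0));
```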
Healthcare data compliance (HIPAA)
Healthcare organizations manage highly sensitive data, including Social Security numbers, diagnosis codes, medication histories, and insurance details. By utilizing column-level filtering, operational data such as appointment dates, doctor IDs, billing amounts, and procedure codes can be replicated while keeping PHI columns (patient names, SSNs, diagnosis codes, insurance IDs) on the source. This ensures that the analytics database does not receive any SSNs, and since sensitive data does not leave the source, audit trail requirements are simplified.
Prerequisites
Before proceeding, ensure the following prerequisites are met:
- PostgreSQL 15 or later on both source and target databases. The features for column-level filtering and row-level WHERE clause filtering were introduced in PostgreSQL 15.0 and are available in all subsequent minor versions. While the source and target do not need to run the same major version, both must be PostgreSQL 15 or later to utilize these features. Refer to the RDS PostgreSQL 15 release notes for the complete feature list.
- Network connectivity between source and target instances. The target must be able to access the source on port 5432 (or your configured PostgreSQL port). For RDS/Aurora, ensure that security groups permit inbound traffic from the target’s IP or security group. Consult RDS security group configuration.
- A PostgreSQL user with replication privileges on the source. This user requires the rds_replication role (for RDS/Aurora) or the REPLICATION attribute (for self-managed). Additionally, the user must have SELECT privileges on the tables being published.
- Logical replication enabled on the source instance:
  - For RDS/Aurora: Set rds.logical_replication = 1 in your custom parameter group and reboot.
  - For self-managed: Set wal_level = logical in postgresql.conf and restart.
- Sufficient WAL retention. Logical replication slots retain WAL segments until all subscribers consume them. Monitor disk usage, especially if a subscriber goes offline for extended periods. For RDS, the max_slot_wal_keep_size parameter can limit WAL retention. Refer to managing replication slots on RDS.
- A PostgreSQL client such as psql to connect to both source and target instances. You can install and run psql from an Amazon EC2 instance within the same VPC as your databases. Alternatively, you can launch AWS CloudShell in VPC mode within the same VPC and install psql there.
Step-by-step deployment: Healthcare compliance example
To illustrate the HIPAA compliance use case, we will configure a source database to hold patient appointment data and replicate only the non-PHI columns to an analytics target.
Step 1: Enable logical replication on the source
For Amazon RDS for PostgreSQL:
- Modify your custom DB parameter group to enable logical replication:

aws rds modify-db-parameter-group \
    --db-parameter-group-name <your-parameter-group> \
    --parameters "ParameterName=rds.logical_replication,ParameterValue=1,ApplyMethod=pending-reboot"

- Reboot the DB instance to apply the change:

aws rds reboot-db-instance --db-instance-identifier <your-instance-id>

- Connect to the source instance and verify that logical replication is active:

SHOW wal_level;

You should see “logical” in the output.
For Amazon Aurora PostgreSQL-Compatible Edition:
- Modify your custom DB cluster parameter group:

aws rds modify-db-cluster-parameter-group \
    --db-cluster-parameter-group-name <your-cluster-parameter-group> \
    --parameters "ParameterName=rds.logical_replication,ParameterValue=1,ApplyMethod=pending-reboot"

- Reboot the writer instance of the cluster:

aws rds reboot-db-instance --db-instance-identifier <your-writer-instance-id>

- Connect to the writer instance and verify:

SHOW wal_level;
For self-managed PostgreSQL:
- Edit postgresql.conf and set:

wal_level = logical

- Restart the PostgreSQL service:

sudo systemctl restart postgresql

- Connect and verify:

SHOW wal_level;
Step 2: Create the source tables
Connect to your source database and create the patient appointments table, which includes both operational data and PHI:
-- Illustrative schema: eight operational columns plus four PHI columns
-- (exact column names are representative of those described above).
CREATE TABLE patient_appointments (
    appointment_id   BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    doctor_id        INTEGER NOT NULL,
    department       TEXT NOT NULL,
    appointment_date DATE NOT NULL,
    procedure_code   TEXT,
    billing_amount   NUMERIC(10,2),
    status           TEXT DEFAULT 'scheduled',
    created_at       TIMESTAMPTZ DEFAULT now(),
    -- PHI columns: these stay on the source and are never published
    patient_name     TEXT,
    ssn              TEXT,
    date_of_birth    DATE,
    diagnosis_code   TEXT
);
Step 3: Insert sample data
Populate the table with test records spanning multiple departments and doctors:
-- Sample test data (all values are fictitious)
INSERT INTO patient_appointments
    (doctor_id, department, appointment_date, procedure_code, billing_amount,
     status, patient_name, ssn, date_of_birth, diagnosis_code)
VALUES
    (101, 'Cardiology', '2025-07-01', 'PROC-2100', 250.00, 'completed',
     'Jane Doe', '123-45-6789', '1980-02-14', 'I10'),
    (102, 'Oncology', '2025-07-02', 'PROC-3300', 480.00, 'scheduled',
     'John Smith', '987-65-4321', '1975-09-30', 'C50'),
    (103, 'Pediatrics', '2025-07-03', 'PROC-1200', 120.00, 'scheduled',
     'Ana Lopez', '555-12-3456', '2015-06-21', 'J45');
Step 4: Create the publication with column filtering
Create a publication that includes only the non-PHI columns. Exclude the patient_name, ssn, date_of_birth, and diagnosis_code columns by listing only the columns intended for replication:
-- Publication name and operational column names are illustrative
CREATE PUBLICATION pub_appointments_analytics
    FOR TABLE patient_appointments
    (appointment_id, doctor_id, department, appointment_date,
     procedure_code, billing_amount, status, created_at);
Verify that the publication was created correctly:
SELECT pubname, tablename, attnames
FROM pg_publication_tables
WHERE pubname = 'pub_appointments_analytics';
The publication is currently metadata only; it defines what to replicate but does not initiate data transfer. No replication slot exists yet, and no WAL decoding is occurring. Data will begin flowing once a subscriber creates a subscription to this publication in the next step.
Step 5: Create the target table
Connect to your PostgreSQL target (analytics) database and create a corresponding table. This table must include at least the columns defined in the publication:
-- Only the eight published (non-PHI) columns; names must match the source
CREATE TABLE patient_appointments (
    appointment_id   BIGINT PRIMARY KEY,
    doctor_id        INTEGER NOT NULL,
    department       TEXT NOT NULL,
    appointment_date DATE NOT NULL,
    procedure_code   TEXT,
    billing_amount   NUMERIC(10,2),
    status           TEXT,
    created_at       TIMESTAMPTZ
);
Notice that this table only contains the non-PHI columns. You may also create the full 12-column table, with unreplicated columns remaining NULL, but defining only the necessary columns clarifies intent and prevents accidental data population later.
Step 6: Create the subscription on the target
Create a subscription on the target database to connect to the source publication:
-- Subscription name and connection details are illustrative
CREATE SUBSCRIPTION sub_appointments_analytics
    CONNECTION 'host=<source-endpoint> port=5432 dbname=<source-db> user=<replication-user> password=<password>'
    PUBLICATION pub_appointments_analytics;
This subscription will immediately initiate an initial synchronization of the table, copying all existing rows (filtered by the publication’s column list) to the target. Following the initial sync, it will switch to streaming mode, applying changes in near real-time. Monitor the subscription status using the pg_stat_subscription view:
SELECT subname, received_lsn, latest_end_lsn, latest_end_time
FROM pg_stat_subscription;
Step 7: Verify the replication
On the target, query the replicated data:
SELECT appointment_id, doctor_id, department, appointment_date,
       procedure_code, billing_amount, status, created_at
FROM patient_appointments
ORDER BY appointment_id;
Expected output should show only the operational columns, with no patient names, SSNs, dates of birth, or diagnosis codes present. The analytics team can now run queries on this data without exposing sensitive information.
Step 8: Test ongoing replication
On the source, insert a new record:
-- Sample test data (all values are fictitious)
INSERT INTO patient_appointments
    (doctor_id, department, appointment_date, procedure_code, billing_amount,
     status, patient_name, ssn, date_of_birth, diagnosis_code)
VALUES
    (104, 'Dermatology', '2025-07-04', 'PROC-4500', 180.00, 'scheduled',
     'Sam Green', '222-33-4444', '1990-11-05', 'L20');
Within seconds, query the target again:
SELECT appointment_id, doctor_id, department, appointment_date,
       procedure_code, billing_amount, status, created_at
FROM patient_appointments
ORDER BY appointment_id;
You should see the new row with only the eight operational columns. The PHI fields remain excluded from the target. Ongoing changes (INSERTs, UPDATEs, DELETEs) on the source are continuously replicated with the same column filtering applied.
A note on limitations
While logical replication with fine-grained filtering is a powerful tool, it is essential to be aware of certain constraints:
- Replica identity: For UPDATE and DELETE operations to function correctly, the published table must have a replica identity. The default is the primary key, which suffices in most cases. However, if your primary key column is not included in the publication, these operations will fail because PostgreSQL cannot identify the corresponding row on the target.
- DDL changes aren’t replicated: Schema modifications (such as ALTER TABLE) do not automatically propagate through logical replication. Apply them manually to the target before making them on the source to avoid breaking replication. Refer to the logical replication restrictions documentation for a complete list of limitations.
- Sequences aren’t replicated: PostgreSQL sequences, commonly used to generate unique IDs for primary key columns, are not synchronized through logical replication. While the generated ID values are copied as data, the state of the sequence itself is not, so sequence counters on the target fall behind. If the target database is promoted to primary (for instance, during disaster recovery), manually advance the sequences to values higher than the maximum replicated IDs to avoid duplicate key errors.
- Performance overhead: The initial synchronization of large tables can be time-consuming and impose a read load on the source. Schedule it during off-peak hours and monitor sync progress using pg_subscription_rel. Beyond the initial sync, ongoing replication also consumes resources on the source: each active subscription runs a dedicated WAL sender process that continuously decodes WAL entries and applies publication filters, adding CPU overhead proportional to the write volume on published tables. For high-throughput workloads, monitor CPU utilization on the source and consider scaling up if WAL sender processes consume a significant share of available CPU.
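As one mitigation for the sequence caveat above, after promoting the target you can advance each sequence past the highest replicated value; the sequence, table, and column names here are illustrative:

```sql
-- Advance the sequence to MAX(id) + 1 so the next nextval() call
-- cannot collide with replicated rows (names are illustrative).
SELECT setval('orders_order_id_seq',
              (SELECT COALESCE(MAX(order_id), 0) + 1 FROM orders),
              false);
```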
Cleanup
After testing, it is prudent to remove replication resources to prevent unnecessary costs and WAL accumulation. Begin by dropping the subscription on the target:
DROP SUBSCRIPTION sub_appointments_analytics;
Then, drop the publication on the source:
DROP PUBLICATION pub_appointments_analytics;
Finally, remove the test tables from both source and target:
-- On the source:
DROP TABLE patient_appointments;
-- On the target:
DROP TABLE patient_appointments;
If dedicated RDS or Aurora instances were created for this walkthrough, they should be deleted through the AWS Management Console or via the CLI to avoid incurring charges. If an existing parameter group was modified, consider reverting rds.logical_replication to 0 if logical replication is no longer required, followed by a reboot of the instance.