Implementing real-time change data capture with Debezium for Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL

June 6, 2026

Amazon Aurora PostgreSQL-Compatible Edition and Amazon Relational Database Service (Amazon RDS) for PostgreSQL both support PostgreSQL’s native logical replication, which lets data changes stream out of the source database. Even so, many organizations struggle to capture these changes and deliver them to downstream systems in real time without degrading database performance or introducing data lag. Traditional batch-based extract, transform, and load (ETL) pipelines often fall short: their delays can span minutes or even hours, which leads to outdated inventory data, delayed notifications, and missed opportunities to act on transactional signals as they arise.

Debezium is an open-source distributed platform for change data capture (CDC). It monitors databases and streams each change to Kafka topics in real time, where applications and data pipelines can consume it to support event-driven architectures. This lets businesses keep data current across multiple systems, minimize synchronization delays, and respond promptly to business events.

Solution Overview

This CDC solution combines PostgreSQL’s native logical replication with Debezium’s change capture framework. Both Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL support logical replication, so either can serve as the source. This post focuses on Amazon Aurora PostgreSQL.

The implementation begins by enabling logical replication on Amazon Aurora PostgreSQL through a DB cluster parameter group. The Debezium connector then reads the database’s write-ahead log (WAL) through a logical replication slot and transforms the log entries into structured change events for downstream consumption.

Key Components of the Solution Architecture

  • Amazon Aurora PostgreSQL as the source database with logical replication enabled
  • A Debezium PostgreSQL connector running on MSK Connect for managed change capture
  • Amazon MSK for reliable, scalable message streaming
  • An Amazon Elastic Compute Cloud (Amazon EC2) instance for testing and consuming change events

Note: Additional downstream integration targets shown in the architecture diagram can be configured based on specific use case requirements but are not part of this core CDC implementation.

Implement the Solution

This solution can be implemented via the AWS Management Console or the latest version of the AWS Command Line Interface (AWS CLI).

Create an Amazon Aurora PostgreSQL DB Cluster and Enable Logical Replication

Create an Aurora PostgreSQL DB cluster, and then follow these steps to enable logical replication:

  1. Create a DB cluster parameter group. Choose a Parameter group family that matches the PostgreSQL major version of your DB cluster. It is advisable to use the latest version available for Aurora PostgreSQL.
  2. Modify the DB cluster parameter group to set the rds.logical_replication parameter to 1.
  3. Associate the DB cluster parameter group with the Aurora PostgreSQL DB cluster, and then reboot the writer instance (or stop and start the DB cluster) so that the new parameter value takes effect.
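
The same three steps can be scripted with the AWS CLI. The following is a minimal sketch rather than the exact commands behind this walkthrough: the parameter group name, cluster identifier, and writer instance identifier are hypothetical placeholders, and the aurora-postgresql17 family is an assumption that must match your engine version. The sketch reboots the writer instance to apply the static parameter.

```shell
# Hypothetical placeholders: replace with values from your environment.
PARAM_GROUP="${PARAM_GROUP:-aurora-pg-cdc-params}"
PARAM_FAMILY="${PARAM_FAMILY:-aurora-postgresql17}"   # assumed family; match your major version
DB_CLUSTER_ID="${DB_CLUSTER_ID:-my-aurora-cluster}"
DB_WRITER_ID="${DB_WRITER_ID:-my-aurora-writer}"

enable_logical_replication() {
  # 1. Create the DB cluster parameter group.
  aws rds create-db-cluster-parameter-group \
    --db-cluster-parameter-group-name "$PARAM_GROUP" \
    --db-parameter-group-family "$PARAM_FAMILY" \
    --description "Logical replication for Debezium CDC"

  # 2. Turn on logical replication (a static parameter, applied on reboot).
  aws rds modify-db-cluster-parameter-group \
    --db-cluster-parameter-group-name "$PARAM_GROUP" \
    --parameters "ParameterName=rds.logical_replication,ParameterValue=1,ApplyMethod=pending-reboot"

  # 3. Associate the group with the cluster, then reboot the writer instance.
  aws rds modify-db-cluster \
    --db-cluster-identifier "$DB_CLUSTER_ID" \
    --db-cluster-parameter-group-name "$PARAM_GROUP"
  aws rds reboot-db-instance --db-instance-identifier "$DB_WRITER_ID"
}

# Run against a real account only after reviewing the values above:
# enable_logical_replication
```

After the reboot, running SHOW wal_level; in psql should return logical.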

Create an Amazon MSK Cluster

With the database configured for replication, create an Amazon MSK serverless cluster by following these instructions:

  1. Sign in to the AWS Management Console and navigate to the Amazon MSK console.
  2. Choose Create cluster.
  3. Choose Custom create to select a virtual private cloud (VPC), subnets, and security groups.
  4. For Cluster name, provide a descriptive name for your cluster.
  5. For Cluster type, select Serverless and choose Next.
  6. On the Networking page, select the VPC where the database was created.
  7. Select at least two subnets (up to four) from the chosen VPC.
  8. Use the same security group attached to the database and choose Next.
  9. Proceed through the Security and Metrics and tags pages by choosing Next.
  10. On the Review and create page, review your selections and then choose Create cluster.
  11. Monitor the cluster Status on the Cluster summary page, waiting for it to change from Creating to Active.
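
For readers who prefer the AWS CLI over the console, the cluster creation above can be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure used in this post: the cluster name, subnet IDs, and security group ID are hypothetical placeholders, and IAM authentication is enabled because the Kafka client later in this post authenticates with IAM.

```shell
# Hypothetical placeholders: replace with values from your environment.
MSK_CLUSTER_NAME="${MSK_CLUSTER_NAME:-cdc-serverless-cluster}"
SUBNET_1="${SUBNET_1:-subnet-0123456789abcdef0}"
SUBNET_2="${SUBNET_2:-subnet-0fedcba9876543210}"
SECURITY_GROUP="${SECURITY_GROUP:-sg-0123456789abcdef0}"
WORK_DIR="${WORK_DIR:-$(mktemp -d)}"

# Serverless settings: the VPC placement and IAM client authentication.
cat > "$WORK_DIR/serverless-config.json" <<EOF
{
  "VpcConfigs": [
    {
      "SubnetIds": ["${SUBNET_1}", "${SUBNET_2}"],
      "SecurityGroupIds": ["${SECURITY_GROUP}"]
    }
  ],
  "ClientAuthentication": {
    "Sasl": { "Iam": { "Enabled": true } }
  }
}
EOF

create_msk_cluster() {
  aws kafka create-cluster-v2 \
    --cluster-name "$MSK_CLUSTER_NAME" \
    --serverless "file://$WORK_DIR/serverless-config.json"
}

# $1: cluster ARN returned by create_msk_cluster; prints CREATING or ACTIVE.
msk_cluster_state() {
  aws kafka describe-cluster-v2 --cluster-arn "$1" \
    --query 'ClusterInfo.State' --output text
}
```

Polling msk_cluster_state until it prints ACTIVE corresponds to waiting for the console status to change from Creating to Active.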

Set Up the EC2 Instance

Create and launch an Amazon EC2 instance to install Kafka, download dependencies, authenticate with the Amazon MSK cluster using IAM, configure a Kafka client, and connect to the database. Ensure that the EC2 instance shares the same VPC and security group as the Amazon MSK cluster and Aurora PostgreSQL database. Additionally, add a security group that allows SSH access to the instance.

  1. To install Kafka on Amazon Linux or Red Hat Enterprise Linux, run the following commands:
    # Install the Java runtime that Kafka depends on
    sudo yum install -y java-17-amazon-corretto
    
    # Download a binary distribution of Apache Kafka
    wget https://archive.apache.org/dist/kafka/4.0.0/kafka_2.13-4.0.0.tgz
    
    # Extract the archive into the home directory
    tar -xzf kafka_2.13-4.0.0.tgz -C "$HOME"
  2. Authenticate with the Amazon MSK cluster using AWS Identity and Access Management (IAM) by following the instructions in the Amazon MSK Developer Guide. Download the Amazon MSK Library for IAM (version 2.3.2 in this example) into the Kafka libs directory:
    wget https://github.com/aws/aws-msk-iam-auth/releases/download/v2.3.2/aws-msk-iam-auth-2.3.2-all.jar -P "$HOME/kafka_2.13-4.0.0/libs/"
  3. In the ~/kafka_2.13-4.0.0/config/ directory, create a client.properties file to configure a Kafka client for IAM authentication:
    # Create a client.properties file and update it with the following
    vi client.properties
    
    # Set up TLS for encryption and SASL for authentication.
    security.protocol=SASL_SSL
    
    # Identify the SASL mechanism to use.
    sasl.mechanism=AWS_MSK_IAM
    
    # Bind SASL client implementation.
    sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
    
    # Encapsulate constructing a SigV4 signature based on extracted credentials.
    sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
  4. Set the environment variables that add the Kafka binaries to the PATH and the Amazon MSK Library for IAM to the CLASSPATH:
    # Set CLASSPATH and PATH
    export CLASSPATH="$CLASSPATH:$HOME/kafka_2.13-4.0.0/libs/aws-msk-iam-auth-2.3.2-all.jar"
    export PATH="$PATH:$HOME/.local/bin:$HOME/bin:$HOME/kafka_2.13-4.0.0/bin"
  5. Install a psql client and all related dependencies to connect to the Aurora PostgreSQL database:
    sudo yum install postgresql17 -y

Create a Custom Plugin

Next, create a custom plugin for MSK Connect. The plugin is installed on the MSK Connect workers where the connector runs to replicate changes from the Aurora PostgreSQL database. Download the PostgreSQL connector plugin from the Debezium website. The plugin is distributed in .tar.gz format and must be converted to ZIP format, because MSK Connect supports custom plugins in ZIP or JAR format only.

  1. Create a directory for Debezium plugins (if it doesn’t exist):
    mkdir -p ~/opt/debezium
  2. Change to the directory:
    cd ~/opt/debezium
  3. Download the Debezium connector using wget:
    wget https://repo1.maven.org/maven2/io/debezium/debezium-connector-postgres/3.1.0.Final/debezium-connector-postgres-3.1.0.Final-plugin.tar.gz
  4. Extract the downloaded file:
    tar -xzvf debezium-connector-postgres-3.1.0.Final-plugin.tar.gz
  5. (Optional) Clean up the tar file:
    rm debezium-connector-postgres-3.1.0.Final-plugin.tar.gz
  6. Zip the extracted plugin files so that they can be uploaded as an MSK Connect custom plugin.
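
The zip and upload steps might look like the following sketch. The extracted directory name debezium-connector-postgres, the archive name, and the bucket name are assumptions for illustration; check the output of the tar command and substitute your own bucket.

```shell
# Assumed names: verify the extracted directory with `ls ~/opt/debezium`.
PLUGIN_DIR="${PLUGIN_DIR:-debezium-connector-postgres}"
PLUGIN_ZIP="${PLUGIN_ZIP:-debezium-connector-postgres-3.1.0.Final-plugin.zip}"  # hypothetical archive name
S3_BUCKET="${S3_BUCKET:-my-plugin-bucket}"                                      # hypothetical bucket

# Package the extracted connector directory as a ZIP archive.
package_plugin() {
  (cd "$HOME/opt/debezium" && zip -r "$PLUGIN_ZIP" "$PLUGIN_DIR")
}

# Copy the archive to Amazon S3 so that MSK Connect can read it.
upload_plugin() {
  aws s3 cp "$HOME/opt/debezium/$PLUGIN_ZIP" "s3://$S3_BUCKET/$PLUGIN_ZIP"
}

# package_plugin && upload_plugin
```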

Upload the custom plugin in ZIP format to an Amazon Simple Storage Service (Amazon S3) bucket located in the same AWS Region where MSK Connect is being created.

MSK Connect does not include Debezium by default. Creating a custom plugin from the S3 path allows MSK Connect to distribute the connector code across workers for CDC capabilities. Using the path of the previously uploaded Amazon S3 object, create a custom plugin in MSK Connect.

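Next, store the database credentials and the IAM credentials that the connector uses as secrets in AWS Secrets Manager, in the same AWS Region as your Amazon MSK cluster and Aurora PostgreSQL database. This walkthrough refers to the two secrets as postgres-config and aws-access.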

Create an IAM Role and Policy

MSK Connect requires an IAM role with specific permissions to access AWS Secrets Manager (to retrieve database and IAM credentials) and interact with Amazon MSK clusters. Create the required IAM role and policy for MSK Connect, and attach the policy to the role.

  1. Create the IAM role (named MSKConnectRole02 in this walkthrough) with a trust policy that allows the MSK Connect service to assume it.
  2. Create the MSKConnect policy (named MSKConnectPolicy02 in this walkthrough). Use values specific to your environment, such as your AWS account ID, Amazon S3 bucket name, AWS Region, Amazon MSK cluster name, secret ARNs, and Amazon CloudWatch Logs log group.
  3. Attach the policy to the role.
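
The three steps above can be sketched with the AWS CLI as follows. The policy documents here are an approximation and not the original ones from this walkthrough: the account ID, Region, cluster name, bucket, and log group are hypothetical placeholders, and the set of allowed actions is an assumption about what a Debezium connector on MSK Connect typically needs. Review and tighten it for your environment.

```shell
# Hypothetical placeholders: replace with values from your environment.
AWS_REGION="${AWS_REGION:-us-east-1}"
ACCOUNT_ID="${ACCOUNT_ID:-111122223333}"
MSK_CLUSTER_NAME="${MSK_CLUSTER_NAME:-cdc-serverless-cluster}"
S3_BUCKET="${S3_BUCKET:-my-plugin-bucket}"
LOG_GROUP="${LOG_GROUP:-/msk-connect/debezium}"
ROLE_NAME="MSKConnectRole02"
POLICY_NAME="MSKConnectPolicy02"
WORK_DIR="${WORK_DIR:-$(mktemp -d)}"
KAFKA_ARN="arn:aws:kafka:${AWS_REGION}:${ACCOUNT_ID}"
SECRETS_ARN="arn:aws:secretsmanager:${AWS_REGION}:${ACCOUNT_ID}:secret"

# Trust policy: lets the MSK Connect service assume the role.
cat > "$WORK_DIR/trust-policy.json" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "kafkaconnect.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Permissions policy (assumed action list): cluster, topics, groups, secrets, plugin, logs.
cat > "$WORK_DIR/permissions-policy.json" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["kafka-cluster:Connect", "kafka-cluster:DescribeCluster"],
      "Resource": "${KAFKA_ARN}:cluster/${MSK_CLUSTER_NAME}/*"
    },
    {
      "Effect": "Allow",
      "Action": ["kafka-cluster:CreateTopic", "kafka-cluster:DescribeTopic",
                 "kafka-cluster:WriteData", "kafka-cluster:ReadData"],
      "Resource": "${KAFKA_ARN}:topic/${MSK_CLUSTER_NAME}/*"
    },
    {
      "Effect": "Allow",
      "Action": ["kafka-cluster:AlterGroup", "kafka-cluster:DescribeGroup"],
      "Resource": "${KAFKA_ARN}:group/${MSK_CLUSTER_NAME}/*"
    },
    {
      "Effect": "Allow",
      "Action": "secretsmanager:GetSecretValue",
      "Resource": ["${SECRETS_ARN}:postgres-config-*", "${SECRETS_ARN}:aws-access-*"]
    },
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::${S3_BUCKET}/*"
    },
    {
      "Effect": "Allow",
      "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
      "Resource": "arn:aws:logs:${AWS_REGION}:${ACCOUNT_ID}:log-group:${LOG_GROUP}:*"
    }
  ]
}
EOF

create_role_and_policy() {
  aws iam create-role --role-name "$ROLE_NAME" \
    --assume-role-policy-document "file://$WORK_DIR/trust-policy.json"
  aws iam create-policy --policy-name "$POLICY_NAME" \
    --policy-document "file://$WORK_DIR/permissions-policy.json"
  aws iam attach-role-policy --role-name "$ROLE_NAME" \
    --policy-arn "arn:aws:iam::${ACCOUNT_ID}:policy/${POLICY_NAME}"
}

# create_role_and_policy
```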

Connect to the Database from the Amazon EC2 Instance

Now that you have created the IAM role and policy and attached the policy to the role, connect to the Aurora PostgreSQL database from the Amazon EC2 instance to prepare it for replicating changes through publications. Connect using the default PostgreSQL superuser postgres and the default database postgres, then create a test table with sample data and a publication that covers it.
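
A minimal sketch of this preparation step follows. The table public.test_table with its id and name columns matches the change events described later in this post; the sample row, the column types, the cluster endpoint, and the publication name dbz_publication (Debezium’s default) are assumptions for illustration.

```shell
# Hypothetical endpoint: replace with your Aurora PostgreSQL cluster endpoint.
DB_HOST="${DB_HOST:-my-aurora-cluster.cluster-example.us-east-1.rds.amazonaws.com}"
SQL_FILE="${SQL_FILE:-$(mktemp)}"

cat > "$SQL_FILE" <<'EOF'
-- Test table with a primary key so that updates and deletes can be identified.
CREATE TABLE IF NOT EXISTS public.test_table (
    id   INTEGER PRIMARY KEY,
    name TEXT
);

-- Sample row (assumed value) for the initial snapshot.
INSERT INTO public.test_table (id, name)
VALUES (1, 'Sample Data')
ON CONFLICT (id) DO NOTHING;

-- Publication that the Debezium connector subscribes to.
CREATE PUBLICATION dbz_publication FOR TABLE public.test_table;
EOF

prepare_database() {
  # psql prompts for the password unless PGPASSWORD or ~/.pgpass is set.
  psql -h "$DB_HOST" -U postgres -d postgres -v ON_ERROR_STOP=1 -f "$SQL_FILE"
}

# prepare_database
```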

Create an Amazon MSK Connector to Stream Database Changes

The Amazon MSK connector enables Debezium to continuously capture change data from your PostgreSQL database and stream it in real time to Amazon MSK topics. This connector serves as the bridge between your database’s logical replication stream and Kafka, transforming database changes (inserts, updates, deletes) into Kafka events that downstream applications can consume. The final step before verifying the replication setup is to create an Amazon MSK connector:

  1. Open the Amazon MSK console.
  2. In the navigation pane, under MSK Connect, choose Connectors.
  3. Choose Create connector.
  4. On the Select plugin page, find and select the Debezium PostgreSQL connector plugin you created earlier, then choose Next.
  5. On the Configure connector page, enter a descriptive name for your connector.
  6. (Optional) Provide a description for the connector.
  7. For Cluster, select the Amazon MSK cluster you created earlier.
  8. For Connector configuration, paste the configuration, replacing placeholder values with your environment-specific information.
  9. Choose Next.
  10. On the Connector capacity page, keep the default values and choose Next.
  11. On the Worker configuration page, select 3.7.x for the Apache Kafka Connect version.
  12. For Worker configuration, select Use the MSK default configuration and choose Next.
  13. On the Access permissions page, select the MSKConnectRole02 IAM role you created earlier and choose Next.
  14. On the Security page, review the settings for authentication and encryption in transit and choose Next.
  15. On the Logs page, select Deliver to Amazon CloudWatch Logs for log management.
  16. For Log group, enter the ARN of the log group specified in the IAM policy and choose Next.
  17. On the Review and create page, review all configurations and choose Create connector.
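
The configuration pasted in step 8 is not reproduced here. As a rough illustration only, a minimal Debezium PostgreSQL configuration could be generated as shown below. The property names are standard Debezium settings, but every value is an assumption: the endpoint, the cdc topic prefix, the slot name, and the publication name are hypothetical, and the password placeholder must be replaced with however you reference the credentials stored in Secrets Manager. The original configuration may contain additional properties.

```shell
# Hypothetical endpoint: replace with your Aurora PostgreSQL cluster endpoint.
DB_HOST="${DB_HOST:-my-aurora-cluster.cluster-example.us-east-1.rds.amazonaws.com}"
CONNECTOR_CONFIG="${CONNECTOR_CONFIG:-$(mktemp)}"

# Write key=value pairs in the form the Connector configuration field accepts.
cat > "$CONNECTOR_CONFIG" <<EOF
connector.class=io.debezium.connector.postgresql.PostgresConnector
tasks.max=1
database.hostname=${DB_HOST}
database.port=5432
database.user=postgres
database.password=<database-password>
database.dbname=postgres
topic.prefix=cdc
plugin.name=pgoutput
publication.name=dbz_publication
slot.name=debezium_slot
table.include.list=public.test_table
EOF

# Print the result so that it can be copied into the console.
cat "$CONNECTOR_CONFIG"
```

With these assumed values, change events for the test table would land on the topic cdc.public.test_table, because Debezium names topics as prefix.schema.table.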

MSK Connect connectors typically take about 10–15 minutes to create and become fully operational. Proceed to the next section after the connector has been created.

Test the Solution

Now, test the architecture you have set up. Connect to the Amazon EC2 instance and create the BOOTSTRAP_SERVERS environment variable to store the bootstrap servers of your Amazon MSK cluster.

  1. Open the Amazon MSK console.
  2. Select your cluster name.
  3. Choose View client information.
  4. Copy the bootstrap server endpoints (private or public, depending on your network configuration).

Then, on your EC2 instance, test the connection to the cluster and verify that replication is successful by consuming the change events for the test table from its Kafka topic.
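
One way to run this check from the EC2 instance is sketched below. It assumes the Kafka tools and client.properties set up earlier, a hypothetical cluster ARN, and a topic prefix of cdc. Adjust the topic name to match the topic.prefix in your connector configuration.

```shell
# Hypothetical placeholders: replace with values from your environment.
MSK_CLUSTER_ARN="${MSK_CLUSTER_ARN:-arn:aws:kafka:us-east-1:111122223333:cluster/example/abcd1234}"
KAFKA_HOME="${KAFKA_HOME:-$HOME/kafka_2.13-4.0.0}"
TOPIC="${TOPIC:-cdc.public.test_table}"   # <topic.prefix>.<schema>.<table>; "cdc" is assumed

# Look up the IAM bootstrap endpoint instead of copying it from the console.
get_bootstrap_servers() {
  aws kafka get-bootstrap-brokers --cluster-arn "$MSK_CLUSTER_ARN" \
    --query 'BootstrapBrokerStringSaslIam' --output text
}

# List topics to confirm that the client can authenticate with IAM.
list_topics() {
  kafka-topics.sh --bootstrap-server "$BOOTSTRAP_SERVERS" \
    --command-config "$KAFKA_HOME/config/client.properties" --list
}

# Read change events for the test table, starting with the initial snapshot.
consume_changes() {
  kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP_SERVERS" \
    --consumer.config "$KAFKA_HOME/config/client.properties" \
    --topic "$TOPIC" --from-beginning
}

# export BOOTSTRAP_SERVERS="$(get_bootstrap_servers)"
# list_topics
# consume_changes
```

Leave consume_changes running in its own terminal window so that later changes appear as they happen.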

Test Real-Time Changes

After verifying initial data replication, test the real-time change data capture by inserting new records into the PostgreSQL database. These changes should automatically stream through Debezium to your Amazon MSK topics, demonstrating live CDC functionality.

  1. Insert new records in PostgreSQL.
  2. Update existing records.
  3. Delete records.
  4. Observe real-time events in the Kafka consumer showing INSERT, UPDATE, and DELETE operations.
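
The first three steps can be exercised with statements such as the following sketch. The update mirrors the "Updated Sample Data" value mentioned in this section; the inserted row, its id, and the cluster endpoint are assumptions for illustration.

```shell
# Hypothetical endpoint: replace with your Aurora PostgreSQL cluster endpoint.
DB_HOST="${DB_HOST:-my-aurora-cluster.cluster-example.us-east-1.rds.amazonaws.com}"

# Run one SQL statement as the postgres user in the postgres database.
run_sql() {
  psql -h "$DB_HOST" -U postgres -d postgres -v ON_ERROR_STOP=1 -c "$1"
}

# 1. Insert a new record (assumed values).
insert_record() {
  run_sql "INSERT INTO public.test_table (id, name) VALUES (2, 'New Record');"
}

# 2. Update an existing record.
update_record() {
  run_sql "UPDATE public.test_table SET name = 'Updated Sample Data' WHERE id = 1;"
}

# 3. Delete the record inserted in step 1.
delete_record() {
  run_sql "DELETE FROM public.test_table WHERE id = 2;"
}

# insert_record && update_record && delete_record
```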

In the Kafka consumer terminal window, you will see new change events appear in real time for each operation performed. Each event is a JSON message that includes:

  • Operation type: "op": "c" for INSERT, "op": "u" for UPDATE, and "op": "d" for DELETE
  • Before and after values: For updates, both the old value ("before": {"id": 1}) and new value ("after": {"id": 1, "name": "Updated Sample Data"}) are visible.
  • Table and schema information: The source table (public.test_table) and database details.
  • Transaction metadata: Timestamps, transaction IDs, and LSN (Log Sequence Number) positions.

Monitoring and Troubleshooting

Monitor and troubleshoot your CDC pipeline using key metrics:

MSK Connect Metrics

  • Available through CloudWatch Logs: Worker-level logs for debugging connector issues and tracking task execution.
  • Available through the MSK Connect console: Connector status (Creating, Running, Failed, Deleting), configuration details, task status, and count.
  • Available through Amazon CloudWatch metrics: Connector-level metrics such as task count and status.

Common Issues and Solutions

Replication Slot Lag

Issue: The replication slot accumulates WAL files faster than Debezium can consume them, causing disk space issues.

Resolution steps:

  • Restart the connector: In the MSK Connect console, select your connector and choose Restart to resume WAL consumption.
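  • Tune connector throughput: The Debezium PostgreSQL connector always runs a single task, so raising tasks.max has no effect. Instead, adjust max.batch.size and max.queue.size in the connector configuration, or give the MSK Connect workers more capacity.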
  • Drop unused replication slots: If you have inactive replication slots from previous connectors, drop them to prevent unnecessary WAL retention.
  • Monitor PostgreSQL parameters: Check your database parameter group settings for max_wal_size and wal_keep_size.
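
To see how much WAL each slot is holding back and to drop a slot that is no longer needed, queries along these lines can help. This is a sketch: the cluster endpoint is a hypothetical placeholder, and the slot name passed to drop_slot is whatever inactive slot you identify in the first query.

```shell
# Hypothetical endpoint: replace with your Aurora PostgreSQL cluster endpoint.
DB_HOST="${DB_HOST:-my-aurora-cluster.cluster-example.us-east-1.rds.amazonaws.com}"

# Show each slot, whether a consumer is attached, and the WAL it retains.
slot_lag() {
  psql -h "$DB_HOST" -U postgres -d postgres -c "
    SELECT slot_name,
           active,
           pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
    FROM pg_replication_slots;"
}

# $1: name of an inactive slot to drop. This cannot be undone.
drop_slot() {
  psql -h "$DB_HOST" -U postgres -d postgres \
    -c "SELECT pg_drop_replication_slot('$1');"
}

# slot_lag
# drop_slot old_connector_slot
```

A slot with active set to f and a steadily growing retained_wal value is the usual sign of an abandoned connector.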

Connector Failures

Issue: The MSK Connect connector fails to start or stops unexpectedly.

Resolution steps:

  • Check the CloudWatch Logs for detailed error messages.
  • Verify that the IAM role attached to your connector has the necessary permissions.

Schema Evolution

Issue: Database schema changes cause connector failures or data inconsistencies.

Resolution steps:

  • Configure AWS Glue Schema Registry for automatic schema management and evolution.
  • In your Debezium connector configuration, add schema registry settings.

Network Connectivity

Issue: The MSK Connect connector cannot reach the Aurora PostgreSQL database or the MSK cluster.

Resolution steps:

  • Verify that the security groups attached to your MSK cluster, Aurora PostgreSQL database, and MSK Connect workers allow the necessary traffic.
  • Ensure all resources are in the same VPC or have proper VPC peering/transit gateway configuration.
  • Test connectivity from an EC2 instance in the same subnets as your MSK Connect workers.
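
A quick TCP reachability check from the EC2 instance can be sketched as follows. It assumes bash and the coreutils timeout command are available. The host names in the usage comments are hypothetical, port 5432 is the PostgreSQL default, and 9098 is the port MSK uses for IAM authentication.

```shell
# $1: host, $2: port. Succeeds if a TCP connection opens within 5 seconds.
check_port() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# check_port my-aurora-cluster.cluster-example.us-east-1.rds.amazonaws.com 5432 && echo "database reachable"
# check_port boot-example.kafka-serverless.us-east-1.amazonaws.com 9098 && echo "MSK reachable"
```

A failure here points to security group rules, routing, or DNS rather than to the connector configuration.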

Clean Up

To avoid incurring future charges, delete the resources you created in the following order:

  1. Delete the MSK Connect connector: Open the Amazon MSK console, under MSK Connect choose Connectors, select your connector, choose Delete, enter delete, and wait 5–10 minutes for completion.
  2. Delete the custom plugin: Under MSK Connect, choose Custom plugins, select your plugin, and choose Delete.
  3. Delete the Amazon MSK cluster: Choose Clusters, select your cluster, choose Delete, enter delete, and wait 10–15 minutes.
  4. Delete the Aurora PostgreSQL database instance: Open the RDS console, choose Databases, select your cluster, choose Actions > Delete, clear Create final snapshot and retain automated backups, and enter delete me.
  5. Terminate the EC2 instance: Open the EC2 console, choose Instances, select your instance, choose Instance state > Terminate instance, and confirm.
  6. Delete the IAM role and policy: Open the IAM console, delete the role MSKConnectRole02 from Roles, then delete the policy MSKConnectPolicy02 from Policies.
  7. Delete Secrets Manager secrets: Open the Secrets Manager console, delete both secrets (aws-access and postgres-config) by choosing Actions > Delete secret with a 7-day waiting period.
  8. Delete the CloudWatch Logs log group: Open the CloudWatch console, choose Log groups, select your log group, and choose Actions > Delete log group(s).
  9. Delete the S3 bucket: Open the S3 console, select your bucket, choose Empty (enter permanently delete), then choose Delete bucket (enter the bucket name).

Important: Deletion is permanent and cannot be undone. Back up any data you want to retain before proceeding.

Tech Optimizer