The process outlined in this post was co-developed and tested with Ionut Bruma, a Solutions Architect in the Capital Markets division at London Stock Exchange Group (LSEG).
In this post, we share how the London Stock Exchange Group (LSEG) Capital Markets Business unit improved their Blue/Green software deployment methodology by using continuous logical database replication.
Previously, deployments required manually backing up and restoring the production database to create the Green environment. This step was time-consuming and involved shutting down the production database to keep both Blue and Green environments consistent during deployment.
Using logical replication, the backup and restore process is replaced by continuous replication. This significantly shortens deployments by reducing the downtime of the Blue environment.
Blue/green deployment is a strategy to implement changes to a website, application, or database by interchanging production and staging environments. The Blue environment represents the primary or active backend, and the Green environment is a replica that is staged and synchronized with the live environment. The process involves making changes or upgrades to the Green environment and then switching over, thereby minimizing downtime and enabling you to roll back to the Blue environment if problems arise.
Without a Blue/green methodology, deployments involve production downtime, and all activities are performed during non-business hours, which incurs additional operational costs and reduces availability. As part of software deployment, you can use Blue/green deployment to achieve more agility in the deployment process by minimizing the downtime of the production system. This allows internal product teams to shorten the time to market for new features as well as reduce deployment efforts.
In addition to supporting database version upgrades and select database schema changes, this type of deployment also helps during internal testing of application changes by mirroring production configurations. Eliminating the need to back up and restore the production database for each iteration of a test cycle provides a significant advantage.
Amazon Aurora PostgreSQL-Compatible Edition and Amazon Relational Database Service (Amazon RDS) for PostgreSQL both support Amazon RDS Blue/Green Deployments. Managed Blue/green deployment vastly simplifies upgrades and schema changes, but it is not supported in conjunction with some Amazon Aurora features, such as Amazon Aurora Global Database.
In this post, we show you the process of implementing a Blue/green deployment architecture using Aurora PostgreSQL Global Database. Specifically, we explore best practices and considerations when configuring the architecture. Blue/green deployment serves as a robust and efficient approach to make sure applications stay resilient and synchronized throughout the process.
Solution overview
Our application serves internal users through Amazon Route 53, resolving to the internal Application Load Balancer that distributes traffic to a Kubernetes cluster deployed across three Availability Zones. The application layer performs CRUD operations against an Aurora global database.
The following diagram illustrates the architecture of the testing phase.
The following diagram illustrates the step of cutting traffic to the Blue cluster.
The following diagram illustrates the step of stopping replication from Blue to Green. This is done after you have verified that there is no user activity, application, or batch job actively accessing the database.
The following diagram illustrates the step of pointing traffic to the Green cluster. From this point forward, the Green environment serves as the primary production environment until the next iteration.
In the following sections, we detail the steps to implement a Blue/green deployment:
- Configure the cluster parameter group associated with the source (Blue) Aurora global database cluster, to enable logical replication.
- Create a publication of the database hosted on the Blue cluster.
- Create a clone of the Blue cluster using the Aurora cloning feature, which creates the Green (target) cluster.
- Configure the subscription from the Green cluster to the Blue publication.
- Verify replication lag.
- Convert the Green cluster into an Aurora global database.
- Perform pre-cutover validation.
- Alter sequences and drop the subscription.
- Cut over to the Green cluster.
We use native PostgreSQL logical replication to synchronize the Green environment to provide the ongoing CDC capability. The logical replication process is asynchronous. For more information, refer to Using logical replication to perform a major version upgrade for Aurora PostgreSQL.
The target database is in a writable state, open to both Data Definition Language (DDL) and Data Manipulation Language (DML) operations outside of replication. It is strongly recommended that you take appropriate measures to ensure that DML and DDL changes are applied to the intended database.
Prerequisites
Complete the following prerequisites:
- Create an Aurora PostgreSQL cluster configured with a global database.
Note that Aurora Global Database doesn’t support managing users via AWS Secrets Manager. We discuss alternative security methods later in this post.
- Review the limitations of logical replication with Amazon Aurora PostgreSQL.
- Confirm that you meet the relevant requirements, for example, a user with the rds_superuser role granted to it.
- Establish connectivity to the DB cluster via a PostgreSQL client (for example, psql or pgAdmin).
- Confirm access to the AWS control plane via the AWS Management Console or AWS Command Line Interface (AWS CLI).
Note that as of this writing, Amazon RDS Proxy doesn’t support streaming replication mode. If you’re using RDS Proxy, you should use the regular cluster endpoint and ignore RDS Proxy for the replication process.
Required permissions
There are two levels of permissions required for the process:
- Control plane (AWS resource layer) – These permissions grant the ability to modify the Aurora clusters involved, as well as to create and modify a new cluster via the console or API
- Data plane (Aurora data layer) – These permissions grant the ability to create the replication configuration using a PostgreSQL client
The following table summarizes the relevant solution steps and whether they are performed via the control plane or data plane.
Step Number | Step Description | Control Plane | Data Plane |
1 | Configure the Blue cluster parameter group for logical replication | X | – |
2 | Create a publication of the database hosted on the Blue cluster | – | X |
3 | Create a clone of the Blue cluster | X | – |
4 | Configure the subscription from the Green cluster to the Blue publication | – | X |
5 | Verify replication lag | – | X |
6 | Convert the Green cluster into an Aurora global database | X | – |
7 | Perform pre-cutover validation | – | X |
8 | Alter sequences and drop the subscription | – | X |
Configure the Blue cluster parameter group
Aurora clusters and Aurora instance parameter groups are Regional resources. Make sure that these resources are configured in the target Region with the desired settings. Typically, the parameter settings in the target Region will mirror your source RDS cluster and RDS instance parameter groups.
The Blue cluster should use a custom DB cluster parameter group with the following settings:
- rds.logical_replication – Configure this parameter to 1. The rds.logical_replication parameter serves the same purpose as a standalone PostgreSQL server’s wal_level parameter and other parameters that control the write-ahead log file management. (This change will require the instances to be rebooted, if the parameter group is created subsequently.)
- max_replication_slots – Configure this parameter to the total number of subscriptions that will be created.
- max_wal_senders – Configure this parameter to the number of concurrent connections, with additional overhead for management tasks and new sessions. If you’re using AWS DMS, the number of max_wal_senders should be the sum of the number of concurrent sessions and the number of AWS DMS tasks that may be operational at any given time.
- max_logical_replication_workers – Configure this parameter to the anticipated number of logical replication workers and table synchronization workers. It is typically advised to set the number of replication workers to the same value used for max_wal_senders. The workers are drawn from the pool of background processes (max_worker_processes) allocated for the server.
- max_worker_processes – Configure this parameter to the number of background processes for the server. The allocation should be sufficient for replication, auto-vacuum processes, and other maintenance processes that might occur concurrently.
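The sizing rules above can be checked mechanically before you continue. The following is a minimal Python sketch, not part of the original process: the function name check_replication_settings and the planned_subscriptions argument are hypothetical, and it assumes the settings have already been read into a dictionary of strings (for example, from pg_settings). It also assumes the server may report rds.logical_replication as either 1 or on.

```python
def check_replication_settings(settings, planned_subscriptions):
    """Return a list of problems found; an empty list means none were detected.

    `settings` maps parameter names to their values as strings.
    `planned_subscriptions` is the number of subscriptions you intend to create.
    """
    problems = []

    # Assumption: the flag may be reported as "1" or "on" depending on the source.
    if str(settings.get("rds.logical_replication", "")).lower() not in ("1", "on"):
        problems.append("rds.logical_replication must be set to 1")

    slots = int(settings.get("max_replication_slots", 0))
    senders = int(settings.get("max_wal_senders", 0))
    workers = int(settings.get("max_logical_replication_workers", 0))
    processes = int(settings.get("max_worker_processes", 0))

    if slots < planned_subscriptions:
        problems.append("max_replication_slots is lower than the number of subscriptions")
    if senders < planned_subscriptions:
        problems.append("max_wal_senders is lower than the number of subscriptions")
    if workers < planned_subscriptions:
        problems.append("max_logical_replication_workers is lower than the number of subscriptions")
    # Replication workers come out of the max_worker_processes pool, so the pool
    # needs headroom for other background processes as well.
    if processes <= workers:
        problems.append("max_worker_processes leaves no headroom beyond replication workers")

    return problems


if __name__ == "__main__":
    example = {
        "rds.logical_replication": "on",
        "max_replication_slots": "10",
        "max_wal_senders": "10",
        "max_logical_replication_workers": "10",
        "max_worker_processes": "20",
    }
    for problem in check_replication_settings(example, planned_subscriptions=1):
        print(problem)
```

The thresholds here are only lower bounds derived from the descriptions above; a workload that also uses AWS DMS or many concurrent sessions would need larger values.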
Create a publication of the database on the Blue cluster
Complete the following steps to create a publication of the database on the Blue cluster:
- Connect to the source database on the Blue cluster
- Run the following query to confirm the settings:
SELECT name, setting FROM pg_settings WHERE name in ('rds.logical_replication', 'max_replication_slots', 'max_wal_senders', 'max_logical_replication_workers', 'max_worker_processes');
- Create a publication:
CREATE PUBLICATION publication_name FOR ALL TABLES;
- Create a replication slot:
SELECT pg_create_logical_replication_slot('replication_slot_name', 'pgoutput');
- Create a non-admin account to serve replication only, and grant the following permissions in the source database:
CREATE USER repl_user WITH PASSWORD 'Sup3r_Secure_P4ssw0rd';
GRANT rds_replication TO repl_user;
GRANT INSERT, UPDATE, SELECT, DELETE ON ALL TABLES IN SCHEMA schema TO repl_user;
Create a clone of the Blue cluster
Create the Green cluster by creating a clone of the Blue cluster. For instructions on creating a clone of the Blue cluster (to serve as the Green cluster), refer to Cloning a volume for an Amazon Aurora DB cluster.
Configure the subscription in the Green database
After the clone is provisioned and in an available state, complete the following steps:
- Connect to the target database on the Green cluster
- Retrieve the log position from the cluster or writer endpoint and store it in a secure location. The subscription uses this position to start at the point the clone was created, so that no data is lost and no duplicate changes are processed. To do this, run the following query:
SELECT aurora_volume_logical_start_lsn();
- Drop the replication slot on the Green instance only:
SELECT pg_drop_replication_slot('replication_slot_name');
- Drop the publication from the Green instance only:
DROP PUBLICATION publication_name;
- Create a subscription to the Blue instance:
CREATE SUBSCRIPTION subscription_name CONNECTION 'postgres://repl_user:repl_user_password@source_instance_URL/database' PUBLICATION publication_name WITH (copy_data = false, create_slot = false, enabled = false, connect = true, slot_name = 'replication_slot_name');
- Retrieve the roname value, which is the identifier of the replication origin:
SELECT * FROM pg_replication_origin;
- Use the roname output from the previous step to set the start position for the subscription. The log_sequence_number is the output of aurora_volume_logical_start_lsn() from Step 2.
SELECT pg_replication_origin_advance('roname', 'log_sequence_number');
- Enable the subscription:
ALTER SUBSCRIPTION subscription_name ENABLE;
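The CONNECTION value in CREATE SUBSCRIPTION is a libpq connection URI, so a password containing characters such as @, / or : has to be percent-encoded before it is placed in the string. The following is a minimal Python sketch of that encoding. The function name build_subscription_uri, the host blue-cluster.example.internal, the database name appdb, and the default port 5432 are assumptions for illustration only.

```python
from urllib.parse import quote


def build_subscription_uri(user, password, host, database, port=5432):
    """Build a libpq connection URI with percent-encoded user, password, and database.

    quote(..., safe="") encodes every reserved character, including the
    single quote, so the result can also sit inside a SQL string literal.
    """
    return "postgres://{user}:{password}@{host}:{port}/{database}".format(
        user=quote(user, safe=""),
        password=quote(password, safe=""),
        host=host,
        port=port,
        database=quote(database, safe=""),
    )


if __name__ == "__main__":
    # Hypothetical values; substitute the Blue cluster endpoint and your own credentials.
    uri = build_subscription_uri(
        "repl_user", "p@ss/w:rd", "blue-cluster.example.internal", "appdb"
    )
    print(uri)
```

Building the string this way keeps the credentials out of hand-edited SQL; in practice you would read the password from your secret store rather than from source code.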
Verify replication lag
Confirm that the replication is configured and replicating data:
- Connect to the source database on the Blue cluster
- Run the following query to confirm:
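SELECT now() AS current_time, slot_name, active, active_pid, confirmed_flush_lsn, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS diff_size, pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS diff_bytes FROM pg_replication_slots WHERE slot_type = 'logical';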
Optionally, you can log the preceding query in a table at a specified frequency using the pg_cron extension to track replication lag over time.
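Replication lag in bytes is the distance between two log sequence numbers (LSNs): the current write-ahead log position on the Blue cluster and the position the subscriber has confirmed. PostgreSQL prints an LSN as two hexadecimal numbers separated by a slash, where the first is the high 32 bits and the second is the low 32 bits. The following is a minimal Python sketch of that arithmetic, useful if you collect LSN values outside the database. The function names lsn_to_int and lag_bytes are hypothetical.

```python
def lsn_to_int(lsn):
    """Convert a pg_lsn string such as '16/B374D848' to its 64-bit integer value."""
    high, low = lsn.strip().split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(current_wal_lsn, confirmed_flush_lsn):
    """Return how many bytes of WAL the subscriber still has to confirm."""
    return lsn_to_int(current_wal_lsn) - lsn_to_int(confirmed_flush_lsn)


if __name__ == "__main__":
    # Hypothetical sample values, as they might be read from the source cluster.
    print(lag_bytes("1/0", "0/FFFFFFF0"))
```

A lag that stays at or near zero while no sessions are writing is one signal that the Green cluster has caught up; what threshold counts as acceptable before cutover is a judgment for your own workload.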
Note that although Amazon CloudWatch publishes various metrics related to