PostgreSQL has emerged as the go-to open-source relational database for a diverse array of enterprise developers and startups. This powerful platform enables rapid prototyping and supports mission-critical applications. The integration of Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL-Compatible Edition simplifies the process of setting up, operating, and scaling PostgreSQL deployments within the AWS Cloud. With these services, users can deploy scalable PostgreSQL instances in mere minutes, benefiting from cost-effective and resizable hardware capacity. Furthermore, RDS for PostgreSQL and Aurora PostgreSQL automate complex administrative tasks, including software installation and upgrades, storage management, backups, and replication for disaster recovery, high availability, and enhanced read throughput.
This article outlines the steps necessary to migrate a PostgreSQL database from Google Cloud SQL to RDS for PostgreSQL and Aurora PostgreSQL using the pglogical extension. We will also cover the essential connection attributes needed to facilitate this migration. The pglogical extension is compatible with community PostgreSQL versions 9.4 and higher, and it is supported on RDS for PostgreSQL and Aurora PostgreSQL starting from version 12.
Solution overview
The pglogical extension employs asynchronous replication, transmitting only the changes in data through logical decoding. This method enhances replication efficiency by ensuring that only differences are replicated, and it is resilient to network faults, allowing it to resume operations after a disruption. The publisher/subscriber model used by the pglogical extension supports logical streaming replication, making it applicable to both RDS for PostgreSQL and Aurora PostgreSQL.
The architecture of this solution is illustrated in the accompanying diagram. In the subsequent sections, we will detail the configuration of the primary database within Cloud SQL for PostgreSQL.
Configure the primary database
To set up the primary database, follow these steps:
- Access the Cloud SQL Instances page on the Google Cloud console.
- Enable access on the primary instance for the IP address of the external replica. For further details, refer to How to Configure authorized networks in the Google Cloud documentation.
- Connect to the Cloud SQL instance using either the PostgreSQL command-line client or Cloud Shell.
- Create a PostgreSQL user with the REPLICATION attribute:
CREATE USER repl WITH REPLICATION IN ROLE cloudsqlsuperuser LOGIN PASSWORD 'XXXXXXX';
Note: The REPLICATION attribute is privileged, and the user creating it must be a database superuser.
- To enable logical replication with the pglogical extension, edit the Cloud SQL instance to add and set the following parameters (a gcloud CLI sketch follows this step):
a. cloudsql.enable_pglogical
b. cloudsql.logical_decoding
c. max_replication_slots
d. max_worker_processes
e. max_wal_senders
For more details, consult Set up logical replication and decoding in the Cloud SQL documentation.
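These parameters can also be set from the command line. The following is a minimal sketch with the gcloud CLI; the instance name and flag values are placeholders, and note that --database-flags replaces the instance's entire flag list, so include any flags you have already set:
gcloud sql instances patch my-cloudsql-instance \
  --database-flags=cloudsql.enable_pglogical=on,cloudsql.logical_decoding=on,max_replication_slots=10,max_worker_processes=10,max_wal_senders=10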
- Restart the database, then log in again using the PostgreSQL command-line client or Cloud Shell. Change to the repl user and create the pglogical extension:
CREATE EXTENSION pglogical;
GRANT USAGE ON SCHEMA pglogical TO repl;
GRANT ALL ON SCHEMA pglogical TO repl;
Next, create a pglogical node, which represents a physical PostgreSQL instance and stores connection details for that instance.
- Create a provider (publisher) node with the following command:
SELECT pglogical.create_node(
    node_name := 'provider',
    dsn := 'host=XXXXXXX port=5432 dbname=test user=repl password=XXXXXXX'
);
Follow appropriate logging practices to minimize the risk of sensitive information, such as the password in the connection string, being written to the server log.
Note: The default port used in this configuration is 5432. You can verify the port number with the following command:
SELECT * FROM pg_settings WHERE name = 'port';
- If starting with a new database, create the same database and tables on both the primary and replica instances. For example:
CREATE DATABASE cloudsqltordsdb;
\c cloudsqltordsdb
CREATE TABLE cloudsqltordsdb_replica_table (emp_id SERIAL PRIMARY KEY, emp_name text);
INSERT INTO cloudsqltordsdb_replica_table VALUES (default, 'tom');
INSERT INTO cloudsqltordsdb_replica_table VALUES (default, 'sam');
INSERT INTO cloudsqltordsdb_replica_table VALUES (default, 'harry');
For larger databases with multiple tables, it is advisable to use pg_dump to export all schemas without data into a script file and then import that script file into the target database.
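For example, a schema-only export and import might look like the following; the host placeholders and database name are illustrative:
pg_dump --host=<cloud-sql-ip> --username=repl --dbname=cloudsqltordsdb --schema-only --file=schema.sql
psql --host=<rds-endpoint> --username=repl --dbname=cloudsqltordsdb --file=schema.sql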
Configure the RDS for PostgreSQL instance
To configure the RDS for PostgreSQL instance, follow these steps:
- Connect to the RDS for PostgreSQL or Aurora PostgreSQL instance using the PostgreSQL command-line client (psql) or another SQL client.
- Create a special user with rds_superuser privileges for replication and grant replication privileges:
CREATE ROLE repl WITH LOGIN NOSUPERUSER;
GRANT rds_replication, rds_superuser TO repl;
- If starting with a new database, use the repl user to create the same database on both the primary and replica instances:
CREATE DATABASE cloudsqltordsdb;
- Configure your RDS instance to enable logical replication in the target database by setting the following parameters in the database parameter group. For more information, refer to the Working with parameter groups documentation. When using Aurora PostgreSQL, these parameters must be modified in the DB cluster parameter group, as detailed in the Amazon Aurora DB cluster and DB instance parameters documentation. (An AWS CLI sketch of this change appears after this procedure.)
rds.logical_replication = 1
shared_preload_libraries = 'pglogical'
- Reboot the instance after configuring the parameters in the parameter group for the changes to take effect, as these parameters are static. Refer to the following links for guidance on rebooting a DB instance and rebooting a DB instance within an Aurora cluster.
- Create the pglogical extension:
CREATE EXTENSION pglogical;
- Create the provider node on the Cloud SQL instance (if not already created in the previous section) by connecting to it and executing the following command, with placeholder connection values:
SELECT pglogical.create_node(
    node_name := 'provider',
    dsn := 'host=XXXXXXX port=5432 dbname=cloudsqltordsdb user=repl password=XXXXXXX'
);
- Add all tables in the schema to be migrated from Cloud SQL to RDS PostgreSQL or Aurora PostgreSQL database to the default replication set by executing the following command:
SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);
- Create the subscriber node on Amazon RDS PostgreSQL or Amazon Aurora PostgreSQL database by running the following command:
SELECT pglogical.create_node(
    node_name := 'subscriber',
    dsn := 'host=<rds-endpoint> port=5432 dbname=cloudsqltordsdb user=repl password=XXXXXXX'
);
- Create a pglogical subscription on the subscriber node (Amazon RDS PostgreSQL or Amazon Aurora PostgreSQL database) by executing the following command:
SELECT pglogical.create_subscription(
    subscription_name := 'subscription',
    provider_dsn := 'host=<cloud-sql-ip> port=5432 dbname=cloudsqltordsdb user=repl password=XXXXXXX'
);
- Check the status of the subscription by running the following command on the subscriber node:
SELECT * FROM pglogical.show_subscription_status('subscription');
If the status indicates “replicating,” the setup is successful.
- To validate the replication, insert data into the Cloud SQL for PostgreSQL:
INSERT INTO cloudsqltordsdb_replica_table VALUES (default, 'bob');
- Finally, check whether the records are replicating to the Amazon RDS PostgreSQL or Amazon Aurora PostgreSQL database:
SELECT * FROM cloudsqltordsdb_replica_table;
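As referenced in the parameter group step earlier, the same static parameters can be set with the AWS CLI. This is a sketch only; the parameter group name my-pg-params is a placeholder, and for Aurora PostgreSQL you would use modify-db-cluster-parameter-group against the DB cluster parameter group instead:
aws rds modify-db-parameter-group \
    --db-parameter-group-name my-pg-params \
    --parameters "ParameterName=rds.logical_replication,ParameterValue=1,ApplyMethod=pending-reboot" \
                 "ParameterName=shared_preload_libraries,ParameterValue=pglogical,ApplyMethod=pending-reboot"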
Known limitations of pglogical and workarounds
Utilizing pglogical presents certain limitations:
Sequences:
To replicate all sequences for the schema public using pglogical, the sequences must be added to the replication set with the following command:
SELECT pglogical.replication_set_add_all_sequences('default', ARRAY['public']);
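Keep in mind that pglogical synchronizes sequence values periodically rather than on every change, so subscriber-side values can briefly trail the provider. A sequence can also be pushed on demand with pglogical.synchronize_sequence; the sequence name below assumes the default serial sequence of the example table created earlier:
SELECT pglogical.synchronize_sequence('public.cloudsqltordsdb_replica_table_emp_id_seq');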
Primary Key:
A notable limitation of pglogical is its inability to directly replicate changes to primary keys. This stems from the unique constraints associated with primary keys and the potential for conflicts in a distributed environment. When a primary key is altered on the source, it may conflict with existing primary key values on the target. A common workaround involves using a surrogate key (such as a UUID or a separate serial column) as the primary key while maintaining a unique constraint on the original natural key. This allows replication based on the surrogate key, thus avoiding conflicts. Here’s a simplified example:
- Create a Surrogate Key: add a surrogate key column with a unique constraint. The table and column names below are illustrative; gen_random_uuid() is built in from PostgreSQL 13, and earlier versions can use the pgcrypto extension:
ALTER TABLE orders ADD COLUMN surrogate_id uuid DEFAULT gen_random_uuid();
ALTER TABLE orders ADD CONSTRAINT orders_surrogate_id_key UNIQUE (surrogate_id);
- Update Data and Replication Set: Change your replication set to include the new surrogate key column. Perform an initial data update to populate the surrogate key column.
- Replication with Surrogate Key: Replicate based on the surrogate key, while maintaining the uniqueness constraint on the natural key column on the target database.
- On the target database, keep a unique constraint on the original natural key column (names again illustrative):
ALTER TABLE orders ADD CONSTRAINT orders_natural_key UNIQUE (order_number);
By employing a surrogate key for replication purposes, conflicts in primary key values can be avoided. However, it is crucial to exercise caution when implementing such changes and to conduct thorough testing in a controlled environment. Additionally, consider the specific requirements and constraints of your application before adopting this workaround.
Extensions:
In PostgreSQL, database extensions are managed on a per-database basis and are not replicated as part of pglogical replication. To transfer extensions, it is advisable to capture the list of extensions utilized in the source (Google Cloud SQL for PostgreSQL instance) and manually create those extensions in the RDS for PostgreSQL database. For a list of extensions supported by Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL, please refer to the respective documentation. To identify all installed extensions in Cloud SQL for PostgreSQL, run the following command:
SELECT * FROM pg_extension;
Materialized Views:
Pglogical only replicates tables and does not support the replication of materialized views or their refreshing. A workaround for this limitation is to replicate the base tables and then create and refresh materialized views separately on the subscriber. Consider employing triggers or event-based scheduling to refresh the materialized views based on changes in the replicated tables. Alternatively, if materialized views are used solely for performance, consider utilizing regular views and applying caching at the application or query level instead.
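As a sketch of this workaround, assuming the example table from earlier in this post is already being replicated, a materialized view can be created and refreshed on the subscriber as follows (REFRESH ... CONCURRENTLY requires a unique index on the view):
CREATE MATERIALIZED VIEW emp_name_counts AS
    SELECT emp_name, count(*) AS cnt
    FROM cloudsqltordsdb_replica_table
    GROUP BY emp_name;
CREATE UNIQUE INDEX ON emp_name_counts (emp_name);
REFRESH MATERIALIZED VIEW CONCURRENTLY emp_name_counts;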
Schema Changes (DDLs):
DDL statements are not automatically replicated. It is recommended to avoid DDLs during migration to minimize the number of variables to consider. However, if DDLs are unavoidable, you can use the pglogical.replicate_ddl_command function to execute DDL on the source database; the statement is queued and replicated to the target database. The user executing pglogical.replicate_ddl_command must have the same username on both the source and target databases, with superuser privileges or ownership of the table being migrated. For example, to replicate a DDL statement:
- Run the following command on the source database to create the table "public.t_test5" (the column definitions here are illustrative):
SELECT pglogical.replicate_ddl_command(
    command := 'CREATE TABLE public.t_test5 (id int PRIMARY KEY, name text);',
    replication_sets := ARRAY['default']
);
- Confirm if the DDL statement has been queued on the source database by running:
SELECT * FROM pglogical.queue;
- Add the table to the replication set for data to be replicated to the target database for the newly created table:
SELECT pglogical.replication_set_add_table('default', 'public.t_test5');
Large Objects:
Currently, PostgreSQL’s logical decoding does not support large objects, meaning pglogical cannot replicate them. It is recommended to convert large objects to bytea columns for replication via pglogical or to store binary data in Amazon S3.
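As a minimal sketch of the bytea conversion, assuming a hypothetical table documents with a large-object OID column content_oid, the built-in lo_get function can copy the data into a regular bytea column that pglogical can replicate:
ALTER TABLE documents ADD COLUMN content bytea;
UPDATE documents SET content = lo_get(content_oid);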
About the authors
Sarabjeet Singh is a Sr. Database Specialist Solutions Architect at AWS, providing guidance and technical assistance on database projects to enhance the value of solutions utilizing AWS.
Kranthi Kiran Burada serves as a Sr. Database Migration Specialist at AWS, focusing on transitioning clients from commercial databases to open-source solutions like PostgreSQL, with expertise in performance optimization and database design.
Jerome Darko is a Solution Architect at AWS within the Database Migration Accelerator team, dedicated to expediting customer migrations of database and analytic workloads to AWS for optimal business value.