Migrate SQL Server databases to Babelfish for Aurora PostgreSQL using change tracking with a linked server | Amazon Web Services

Customers often opt for AWS Database Migration Service (AWS DMS) to facilitate the migration of SQL Server databases to Babelfish for Aurora PostgreSQL. AWS DMS accommodates all editions of SQL Server for full load migrations. However, when it comes to continuous replication that captures ongoing changes, the source SQL Server must have either transactional replication or change data capture (CDC) enabled. This setup allows AWS DMS to read changes from the database’s transaction log file or its transaction log backup, subsequently replicating these changes to the target database.

SQL Server editions such as Enterprise, Standard, and Developer come equipped with transactional replication and CDC capabilities, making them suitable for implementing continuous replication to Babelfish for Aurora PostgreSQL. Conversely, editions like SQL Server Web Edition or SQL Server workloads running on Azure SQL do not support either transactional replication or CDC. In these scenarios, only full load migrations can be performed without the option for continuous replication.

To navigate the limitations of replicating ongoing changes, one viable workaround involves utilizing change tracking alongside a linked server. This method enables the tracking of changes in Azure SQL or SQL Server Web Edition, allowing effective replication to the target database.

Solution overview

Change tracking serves as a mechanism to monitor data alterations within a database. When enabled on a table, SQL Server internally maintains a record of modifications—be it inserts, updates, or deletes—in a separate internal table that includes the primary key column of the table. All changes since the initiation of tracking can be retrieved using the CHANGETABLE function. For further details on change tracking, refer to Work with change tracking (SQL Server).

The following diagram illustrates the architecture of the solution.

To implement this solution, the following high-level setup steps should be completed:

  1. Enable change tracking at both the database and table levels on the source server.
  2. Migrate the source database with an initial full load using AWS DMS to the target.
  3. Create a linked server at the target that connects to the source SQL Server.
  4. Create an anchor table at the target.
  5. Extract the changed data from the source and load it into the target table.

Prerequisites

To test this solution, the following prerequisites are necessary:

  • Source SQL Server or Azure SQL instance
  • Northwind database on SQL Server
  • Babelfish for Aurora PostgreSQL instance version 4.0 or later
  • Familiarity with functions in SQL Server and PostgreSQL
  • SQL Server Management Studio (SSMS) or another client tool for SQL Server connectivity

This solution entails the creation and utilization of new AWS resources, which will incur costs on your account. For detailed pricing information, please refer to AWS Pricing. It is advisable to set this up in a non-production instance and conduct end-to-end validations before deploying the solution in a production environment.

Enable change tracking on the source

To enable change tracking on the source server, follow these steps:

  1. Connect to the SQL Server instance using SSMS.
  2. Select the Northwind database and choose New Query.
  3. Enable change tracking at the database level using the following command:
  4. ALTER DATABASE NORTHWIND SET CHANGE_TRACKING = ON(CHANGE_RETENTION = 5 DAYS, AUTO_CLEANUP = OFF)
  5. Enable change tracking on the tables intended for continuous replication to the target. For instance, enable change tracking on the customers table with the following command:
  6. ALTER TABLE DBO.CUSTOMERS ENABLE CHANGE_TRACKING
  7. Run the following command to verify that change tracking is enabled on the tables:
  8. SELECT schema_name(T.schema_id) as SCH_NAME ,T.NAME,CT.min_valid_version ,ct.*
    FROM northwind.sys.tables T LEFT OUTER JOIN sys.change_tracking_tables CT on T.object_id = CT.object_id
    WHERE min_valid_version is not null
  9. To test how change tracking retrieves updated rows, execute the following commands:
    1. Note the current change tracking version using the function CHANGE_TRACKING_CURRENT_VERSION(). Assume this value is 26.
    2. Update a row in the customers table.
    3. To identify which row in the customer table was updated, use the CHANGETABLE function with the version number (26 in this example) as a parameter.
DECLARE @CTCV INT;
SELECT @CTCV = CHANGE_TRACKING_CURRENT_VERSION () ;

-- In this example, the current value is 26
SELECT @CTCV

-- Update a record in the customer table
UPDATE dbo.CUSTOMERS SET CompanyName ='camilo2' where CUSTOMERID = 'ALFKI'

-- Get all the changes made after version 26 in this example.
SELECT P.*,'--',CT.* FROM customers AS P JOIN CHANGETABLE(CHANGES northwind.dbo.customers,@CTCV ) AS CT ON P.customerid = CT.customerid;

It is crucial to enable change tracking on the tables requiring continuous replication before initiating a full load with AWS DMS. This ensures the system tracks all data modification operations (DML) on the designated tables from the outset of the full load.

Migrate the source database with initial full load using AWS DMS to target

To migrate the source database with a full load using AWS DMS to the target, follow these steps. For detailed guidance, refer to Migrate SQL Server to Babelfish for Aurora PostgreSQL using the Compass tool and AWS DMS, which outlines the process of creating and connecting to a Babelfish for Aurora PostgreSQL cluster, along with the steps to migrate data from SQL Server to Babelfish using AWS DMS.

  1. On the target server, create the schema for the Northwind database. You can download a Northwind sample schema from the GitHub repository.
  2. Migrate the data from the source to the target using the full load setting in AWS DMS. For instructions, refer to Using Babelfish for Aurora PostgreSQL as a target for AWS Database Migration Service.

Create a linked server at the target to the source SQL Server

To configure a linked server in Babelfish for Aurora PostgreSQL, refer to Babelfish supports linked servers. Follow these steps:

  1. Connect to the target database instance using SSMS or SQLCMD and execute the following command. Here, we use SQLCMD to connect to the Babelfish for Aurora PostgreSQL instance. Replace the parameters with appropriate values:
  2. sqlcmd -S your-DB-instance.aws-region.rds.amazonaws.com -U test -P password
  3. Install the tds_fdw extension:
  4. EXEC sp_execute_postgresql N'CREATE EXTENSION tds_fdw';
  5. Create a linked server on the target instance using the following command. Replace the @datasrc, @rmtuser, and @rmtpassword parameters with appropriate values:
  6. Use Northwind; 
    GO 
    EXEC master.dbo.sp_addlinkedserver @server=N'ls_northwind', @srvproduct=N'', @provider=N'SQLNCLI', @datasrc=N'myserver.xxxxx.US-WEST-2.RDS.AMAZONAWS.COM', @catalog='northwind'; 
    GO 
    EXEC master.dbo.sp_addlinkedsrvlogin @rmtsrvname=N' ls_northwind',@useself=N'False',@locallogin=NULL,@rmtuser=N'username',@rmtpassword='password';
  7. Use OPENQUERY() T-SQL as shown in the following code, or a standard four-part naming convention to reference a table, view, or other supported objects on the remote server:
  8. SELECT * FROM OPENQUERY(ls_northwind, 'SELECT * FROM customers');
  9. Run the following command to retrieve modified records in the customers table based on the initial change tracking version (for this example, we use version 0):
  10. SELECT * FROM OPENQUERY(ls_northwind, 'SELECT C.* FROM customers AS C JOIN CHANGETABLE(CHANGES northwind.dbo.customers,0 ) AS CT ON C.customerid = CT.customerid');

    The output of this command will display the modified records.

With successful querying of modified records at the source via the linked server, the next step is to create an anchor table at the target. This table will serve as a repository for change tracking version numbers each time data is replicated.

Create an anchor table at the target

After successfully querying records at the source using the linked server, create an anchor table at the target. The following command creates a table named CT_ANCHOR_NORTHWIND in the Northwind database, which will hold the change tracking version numbers:

CREATE TABLE ct_anchor_northwind
(
sch_name varchar(256),
tbl_name varchar(256),
ct_fetched_ver int, -- 0 during first execution
ct_next_ver int -- current version at the source database, to fetch the next version of the records
)

Once the anchor table is created, insert a row for each table that needs tracking. For instance, insert the schema name, customer name, fetched change tracking version, and next change tracking version values for the customers table in the Northwind database:

ALTER TABLE DBO.CUSTOMERS ENABLE CHANGE_TRACKING

0

During each replication, the column CT_NEXT_VER must be updated manually or through an automation script with the current version ID of the change tracking table. This value will be essential for retrieving the next set of changed records.

Extract the changed data from the source and load it to the target table

To synchronize data between the source SQL Server and the target Babelfish for Aurora PostgreSQL, follow these steps. These steps can also be executed on a scheduled basis:

  1. Connect to the target database and retrieve the value of ct_next_ver (the next version to fetch the records) from the anchor table (ct_northwind_anchor) into a variable @ct_next_ver:
  2. ALTER TABLE DBO.CUSTOMERS ENABLE CHANGE_TRACKING

    1

  3. Obtain the current change tracking version from the source in a variable @ct_src_current_ver. This can be achieved using a linked server query on the target:
  4. ALTER TABLE DBO.CUSTOMERS ENABLE CHANGE_TRACKING

    2

  5. Delete the records at the target that match the primary key values of the source table’s changed rows based on a specific change tracking version (@ct_next_ver):
  6. ALTER TABLE DBO.CUSTOMERS ENABLE CHANGE_TRACKING

    3

  7. Extract the new records from the source based on the specific change tracking version (@ct_next_ver) into their destination:
  8. ALTER TABLE DBO.CUSTOMERS ENABLE CHANGE_TRACKING

    4

  9. Update the column values in the anchor table with the appropriate values. These values will be used to fetch the next set during the next run:
  10. ALTER TABLE DBO.CUSTOMERS ENABLE CHANGE_TRACKING

    5

  11. Verify the values in the target table against the source to ensure synchronization:
  12. ALTER TABLE DBO.CUSTOMERS ENABLE CHANGE_TRACKING

    6

Clean up

To prevent future charges and remove the components created during the testing of this use case, follow these steps:

  1. Connect to the source SQL Server instance using SSMS.
  2. Select the master database and choose New Query.
  3. Execute the following command:
ALTER TABLE DBO.CUSTOMERS ENABLE CHANGE_TRACKING

7

In this discussion, we have explored the process of migrating SQL Server databases to Babelfish for Aurora PostgreSQL using change tracking in conjunction with a linked server. This configuration is particularly beneficial for migrating SQL Server workloads to Babelfish with minimal downtime. Furthermore, the solution can be automated by scheduling scripts to run periodically.


About the Authors

Chandra Pathivada is a Senior Database Specialist Solutions Architect with Amazon Web Services, focusing on open-source database engines like Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL. He enjoys collaborating with customers to design, deploy, and optimize relational database workloads on AWS.

Minesh Chande is a Senior Database Specialist Solutions Architect at Amazon Web Services, assisting customers across various industry verticals in designing, migrating, and optimizing their SQL Server workloads to managed database platforms such as Amazon RDS, Amazon RDS Custom, and Babelfish for Aurora PostgreSQL.

Tech Optimizer
Migrate SQL Server databases to Babelfish for Aurora PostgreSQL using change tracking with a linked server | Amazon Web Services