Build a custom solution to migrate SQL Server HierarchyID to PostgreSQL LTREE with AWS DMS

Data migration from SQL Server to Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL-Compatible Edition often necessitates adjustments to the database schema or the SQL commands utilized for data access. To facilitate this process, AWS offers DMS Schema Conversion within the AWS Management Console, which aids in converting existing database schemas from one engine to another. Additionally, the AWS Database Migration Service (AWS DMS) is available to assist in migrating data to AWS, featuring enhanced security measures and minimized downtime.

While these tools automate much of the migration effort, certain scenarios may require manual intervention to convert SQL from SQL Server to PostgreSQL. This discussion focuses on configuring AWS DMS tasks to efficiently migrate HierarchyID columns from SQL Server to Aurora PostgreSQL-Compatible Edition.

Hierarchical data overview

Hierarchical data structures are integral to contemporary applications, manifesting in various forms such as organizational charts, product categories, and file systems. These structures organize information in a tree-like format, where each item has a single parent and can possess multiple children. This organization is crucial for representing natural relationships among data elements, thereby enabling efficient storage, retrieval, and analysis of complex, interconnected information. The ability to effectively model nested or layered information renders hierarchical data invaluable for applications requiring structured, parent-child relationships.

SQL Server’s approach with HierarchyID

SQL Server employs the native HierarchyID data type to manage tree-structured data. This built-in feature encodes node positions in binary format, providing efficient methods for hierarchy navigation, comparison, and manipulation. HierarchyID includes specialized functions such as GetAncestor and GetDescendant, along with optimized indexing capabilities, making it particularly effective for common tree operations like finding descendants, determining node depths, and reorganizing hierarchical structures within SQL Server databases.

PostgreSQL’s solution with LTREE

In contrast, PostgreSQL utilizes the LTREE extension to manage hierarchical data through a label-based system. LTREE represents hierarchical relationships using dot-separated paths, offering a unique yet equally powerful method for handling tree structures. This extension provides comprehensive functionality for path matching, ancestor and descendant searches, and depth calculations, all supported by efficient indexing mechanisms. Although its implementation diverges from SQL Server’s HierarchyID, LTREE offers comparable capabilities for managing hierarchical data, establishing itself as a robust solution for PostgreSQL environments. Understanding these differing methodologies is essential when planning data migrations between these systems, as they influence how tree relationships must be restructured during the transition.

Solution overview

SQL Server and PostgreSQL handle hierarchical data structures in distinct ways. SQL Server employs HierarchyID’s binary format with forward slash notation (e.g., /1/1/1/), while PostgreSQL’s LTREE extension utilizes a dot-separated format (e.g., 1.1.1). To illustrate this transformation, consider a department store’s product hierarchy featuring two primary departments: Electronics (/1/) and Clothing (/2/).

When utilizing the HierarchyID datatype with AWS DMS, it is important to note that native support is not available. AWS DMS necessitates a workaround solution for successful data migration. For optimal performance and reliability, it is advisable to implement the solution outlined in this post for short-term migrations. The migration of hierarchical structures can be achieved through the following steps:

  1. Prepare the source SQL Server and target PostgreSQL environments.
  2. Create source tables containing the HierarchyID data type in SQL Server.
  3. Install the LTREE extension in the PostgreSQL target database.
  4. Convert the schema using AWS DMS Schema Conversion.
  5. Execute the migration process using AWS DMS:
    • Run an AWS DMS full-load and change data capture (CDC) task with Stop after applying cached changes settings.
    • After the task has paused automatically following the full load, add a new LTREE column to the target table and transform HierarchyID data with the generated transformation logic.
    • Resume the AWS DMS task for ongoing CDC replication.
  6. Complete the migration by dropping the original column and renaming the LTREE column to match the name of the dropped column.

Prerequisites

Before proceeding, ensure you have the following prerequisites in place:

Solution walkthrough

With the prerequisites established, you are prepared to begin the solution walkthrough.

Build sample tables on the source

Start by creating a sample table with a HierarchyID column in SQL Server:

  1. Connect to your SQL Server instance using SQL Server Management Studio (SSMS) or another SQL client.
  2. Create a sample table with a HierarchyID column:
    CREATE TABLE [dbo].[products_hid](
        [id] [int] IDENTITY(1,1) NOT NULL,
        [node] [hierarchyid] NULL,
        [name] varchar(50) NULL,
        PRIMARY KEY CLUSTERED
        (
            [id] ASC
        ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
    ) ON [PRIMARY];
  3. Insert sample data into the table:
    INSERT INTO [dbo].[products_hid] ([node], [name])
    VALUES
    (hierarchyid::Parse('/1/'), 'Electronics'),
    (hierarchyid::Parse('/1/1/'), 'Computers'),
    (hierarchyid::Parse('/1/2/'), 'Smartphones'),
    (hierarchyid::Parse('/1/1/1/'), 'Laptops'),
    (hierarchyid::Parse('/1/1/2/'), 'Desktops'),
    (hierarchyid::Parse('/2/'), 'Clothing'),
    (hierarchyid::Parse('/2/1/'), 'Men''s Wear'),
    (hierarchyid::Parse('/2/2/'), 'Women''s Wear'),
    (hierarchyid::Parse('/2/1/1/'), 'Shirts'),
    (hierarchyid::Parse('/2/2/1/'), 'Dresses');
  4. Verify the data was inserted correctly. You should see columns for id, node_path, and name with 10 entries:
    SELECT id, node.ToString() AS node_path, name FROM [dbo].[products_hid];
  5. To capture changes, ensure that the database is configured for full backups and that MS-Replication or MS-CDC is enabled for your database and tables. In this instance, we enabled MS-CDC for the database DMS_DB. Since the source is an RDS SQL Server instance, we utilize RDS stored procedures. For further details, see Using change data capture for Amazon RDS for SQL Server
    exec msdb.dbo.rds_cdc_enable_db 'DMS_DB'
  6. Check if CDC has been enabled successfully by executing the following command:
    SELECT is_cdc_enabled 
    FROM sys.databases 
    WHERE name = 'DMS_DB';

Set up your target Aurora PostgreSQL DB

  1. Connect to your Aurora PostgreSQL cluster to create the LTREE extension. This will implement a data type LTREE for representing labels of data stored in a hierarchical tree-like structure.
    CREATE EXTENSION IF NOT EXISTS ltree;
  2. Validate the extension was created using the following command:
    SELECT * FROM pg_extension WHERE extname = 'ltree';

Prepare the table for migration using AWS DMS SC

For the example in this post, we utilized AWS DMS SC to prepare the table structure at the target. Alternatively, you can manually create the table structure at the target. Below is the target data definition language (DDL) created by AWS DMS SC by default:

CREATE TABLE dms_db_dbo.products_hid(
    id INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY,
    node VARCHAR(8000),
    name VARCHAR(50)
)
WITH (
    OIDS=FALSE
);

We modify the GENERATED ALWAYS AS IDENTITY to GENERATED BY DEFAULT AS IDENTITY because GENERATED ALWAYS instructs PostgreSQL to always generate a value for the IDENTITY column. Attempting to insert (or update) values into the GENERATED ALWAYS AS IDENTITY column will result in an error when using AWS DMS. For more information, refer to Handle IDENTITY columns in AWS DMS.

The final DDL for the dms_db_dbo.products_hid table appears as follows:

CREATE TABLE dms_db_dbo.products_hid(
    id INTEGER NOT NULL GENERATED BY DEFAULT AS IDENTITY,
    node VARCHAR(8000),
    name VARCHAR(50)
)
WITH (
    OIDS=FALSE
);

Upon successful migration, at cutover, ensure to modify the IDENTITY column back to GENERATED ALWAYS. Steps for this adjustment are outlined in a subsequent section of this post.

Migrate the HierarchyID column data using AWS DMS

Now, you are prepared to create the AWS DMS endpoints and the migration task. Once the endpoints and task are established, you can initiate the migration.

Create the AWS DMS endpoints

Create AWS DMS endpoints for both the source and target databases. AWS DMS endpoints provide connection, data store type, and location information regarding your data store.

Create your source endpoint

  1. On the AWS Management Console for AWS DMS, select Source endpoint for the Endpoint type.
  2. For Endpoint identifier, enter a name.
  3. For Source engine, select Microsoft SQL Server.
  4. For Access to endpoint database, choose AWS Secrets Manager and select the Secret from the dropdown. While you can provide user credentials manually, utilizing Secrets Manager is recommended.
  5. For IAM role, input the Amazon Resource Name (ARN) for the AWS Identity and Access Management (IAM) role that has access to Secrets Manager. For more information on the necessary permissions for data replication using AWS DMS, refer to Using secrets to access AWS Database Migration Service endpoints.
  6. For Secure Socket Layer (SSL) mode, select None. Alternatively, you can encrypt connections for both source and target endpoints using SSL. For details, see Using SSL with AWS Database Migration Service.
  7. Input the appropriate database name.
  8. Test the endpoint, then finalize endpoint creation.

Create the target endpoint

Repeat the previous steps with the following parameters for the target endpoint:

  1. On the AWS DMS console, select Target endpoint for the Endpoint type.
  2. For Endpoint identifier, enter a name.
  3. For Target engine, select Amazon Aurora PostgreSQL.
  4. Test the endpoint, then complete endpoint creation.

Create and run the AWS DMS migration task

An AWS DMS task is where the migration work occurs. This is where you configure the database objects to migrate, logging requirements, error handling, and more. Follow these steps to create your task:

  1. On the AWS DMS console, create a new migration task.
  2. For Task identifier, enter a recognizable name.
  3. For Replication instance, select the replication instance you created.
  4. For Source database endpoint, select the SQL Server endpoint you created.
  5. For Target database endpoint, select the Aurora PostgreSQL endpoint you created.
  6. For Migration type, select Migrate existing data and replicate ongoing changes.
  7. For Target table preparation mode, select Do nothing.
  8. For Stop task after full load completes, select Stop after applying cached changes.
  9. Enable CloudWatch Logs under the task settings for debugging purposes.
  10. Under Table mappings, for Editing mode, select JSON editor. Use the following JSON to create the table mappings:
    {
        "rules": [
            {
                "rule-id": 1606370666,
                "rule-name": "me0gv5az-ag7ry8",
                "rule-type": "selection",
                "rule-action": "include",
                "object-locator": {
                    "schema-name": "dbo",
                    "table-name": "products_hid"
                },
                "filters": []
            },
            {
                "rule-id": 1654891348,
                "rule-name": "me0gvnuw-7ncfid",
                "rule-type": "transformation",
                "rule-target": "schema",
                "rule-action": "rename",
                "object-locator": {
                    "schema-name": "dbo"
                },
                "value": "dms_db_dbo"
            }
        ]
    }
  11. Uncheck the Premigration assessment checkbox. Although this feature is typically recommended, it is disabled as part of this blog post’s solution.
  12. Leave the remaining settings as default and select Create task to initiate the task. The task will commence immediately.
  13. Wait for the full load to complete. The summary section of the Database migration task will display the status of the migration task. The task will automatically stop after the full load is finished and the cached changes have been applied.
  14. Verify the migrated data in Aurora PostgreSQL by executing the following select query:
    INSERT INTO [dbo].[products_hid] ([node], [name])
    VALUES
    (hierarchyid::Parse('/1/'), 'Electronics'),
    (hierarchyid::Parse('/1/1/'), 'Computers'),
    (hierarchyid::Parse('/1/2/'), 'Smartphones'),
    (hierarchyid::Parse('/1/1/1/'), 'Laptops'),
    (hierarchyid::Parse('/1/1/2/'), 'Desktops'),
    (hierarchyid::Parse('/2/'), 'Clothing'),
    (hierarchyid::Parse('/2/1/'), 'Men''s Wear'),
    (hierarchyid::Parse('/2/2/'), 'Women''s Wear'),
    (hierarchyid::Parse('/2/1/1/'), 'Shirts'),
    (hierarchyid::Parse('/2/2/1/'), 'Dresses');

    0

Implement LTREE column transformation on Aurora PostgreSQL

To enable hierarchical queries in PostgreSQL, you will now add logic to the target table with an LTREE column that automatically transforms the migrated hierarchical data into PostgreSQL’s native format by following these steps:

  1. Alter the target table in PostgreSQL to add a generated LTREE column. In this blog post, we define the root as ‘0’.
    INSERT INTO [dbo].[products_hid] ([node], [name])
    VALUES
    (hierarchyid::Parse('/1/'), 'Electronics'),
    (hierarchyid::Parse('/1/1/'), 'Computers'),
    (hierarchyid::Parse('/1/2/'), 'Smartphones'),
    (hierarchyid::Parse('/1/1/1/'), 'Laptops'),
    (hierarchyid::Parse('/1/1/2/'), 'Desktops'),
    (hierarchyid::Parse('/2/'), 'Clothing'),
    (hierarchyid::Parse('/2/1/'), 'Men''s Wear'),
    (hierarchyid::Parse('/2/2/'), 'Women''s Wear'),
    (hierarchyid::Parse('/2/1/1/'), 'Shirts'),
    (hierarchyid::Parse('/2/2/1/'), 'Dresses');

    1

  2. Verify that the new LTREE column, node_temp, is populated by running the following select command:
    INSERT INTO [dbo].[products_hid] ([node], [name])
    VALUES
    (hierarchyid::Parse('/1/'), 'Electronics'),
    (hierarchyid::Parse('/1/1/'), 'Computers'),
    (hierarchyid::Parse('/1/2/'), 'Smartphones'),
    (hierarchyid::Parse('/1/1/1/'), 'Laptops'),
    (hierarchyid::Parse('/1/1/2/'), 'Desktops'),
    (hierarchyid::Parse('/2/'), 'Clothing'),
    (hierarchyid::Parse('/2/1/'), 'Men''s Wear'),
    (hierarchyid::Parse('/2/2/'), 'Women''s Wear'),
    (hierarchyid::Parse('/2/1/1/'), 'Shirts'),
    (hierarchyid::Parse('/2/2/1/'), 'Dresses');

    2

Resume your DMS task to test CDC replication with the new LTREE column

With the LTREE column established, you can now resume the migration task to ensure ongoing changes are captured accurately and that the generated LTREE column populates automatically for new records.

  1. To resume an AWS DMS task, navigate to the Actions menu in the AWS DMS console and select Resume. Once the task is in a running state, proceed with the next step.
  2. Test the CDC replication by inserting new rows in the source database:
    INSERT INTO [dbo].[products_hid] ([node], [name])
    VALUES
    (hierarchyid::Parse('/1/'), 'Electronics'),
    (hierarchyid::Parse('/1/1/'), 'Computers'),
    (hierarchyid::Parse('/1/2/'), 'Smartphones'),
    (hierarchyid::Parse('/1/1/1/'), 'Laptops'),
    (hierarchyid::Parse('/1/1/2/'), 'Desktops'),
    (hierarchyid::Parse('/2/'), 'Clothing'),
    (hierarchyid::Parse('/2/1/'), 'Men''s Wear'),
    (hierarchyid::Parse('/2/2/'), 'Women''s Wear'),
    (hierarchyid::Parse('/2/1/1/'), 'Shirts'),
    (hierarchyid::Parse('/2/2/1/'), 'Dresses');

    3

  3. Verify that the new rows are correctly migrated to the target database, noting how the node_temp column is automatically populated for the new records that were added.

Post migration

After the migration is complete and data integrity has been verified, finalize the process by executing the following steps:

  1. Replace the original column with the transformed one by executing the following commands:
    INSERT INTO [dbo].[products_hid] ([node], [name])
    VALUES
    (hierarchyid::Parse('/1/'), 'Electronics'),
    (hierarchyid::Parse('/1/1/'), 'Computers'),
    (hierarchyid::Parse('/1/2/'), 'Smartphones'),
    (hierarchyid::Parse('/1/1/1/'), 'Laptops'),
    (hierarchyid::Parse('/1/1/2/'), 'Desktops'),
    (hierarchyid::Parse('/2/'), 'Clothing'),
    (hierarchyid::Parse('/2/1/'), 'Men''s Wear'),
    (hierarchyid::Parse('/2/2/'), 'Women''s Wear'),
    (hierarchyid::Parse('/2/1/1/'), 'Shirts'),
    (hierarchyid::Parse('/2/2/1/'), 'Dresses');

    4

  2. After completing the data load, change the IDENTITY column’s behavior back to its original state of auto-generation. Modify the IDENTITY column back to GENERATED ALWAYS and restart the sequence from the last used identity value in your table, which in this example is 13.
    INSERT INTO [dbo].[products_hid] ([node], [name])
    VALUES
    (hierarchyid::Parse('/1/'), 'Electronics'),
    (hierarchyid::Parse('/1/1/'), 'Computers'),
    (hierarchyid::Parse('/1/2/'), 'Smartphones'),
    (hierarchyid::Parse('/1/1/1/'), 'Laptops'),
    (hierarchyid::Parse('/1/1/2/'), 'Desktops'),
    (hierarchyid::Parse('/2/'), 'Clothing'),
    (hierarchyid::Parse('/2/1/'), 'Men''s Wear'),
    (hierarchyid::Parse('/2/2/'), 'Women''s Wear'),
    (hierarchyid::Parse('/2/1/1/'), 'Shirts'),
    (hierarchyid::Parse('/2/2/1/'), 'Dresses');

    5

  3. Insert a row in the PostgreSQL database to verify that data is being entered in the correct LTREE format.
    INSERT INTO [dbo].[products_hid] ([node], [name])
    VALUES
    (hierarchyid::Parse('/1/'), 'Electronics'),
    (hierarchyid::Parse('/1/1/'), 'Computers'),
    (hierarchyid::Parse('/1/2/'), 'Smartphones'),
    (hierarchyid::Parse('/1/1/1/'), 'Laptops'),
    (hierarchyid::Parse('/1/1/2/'), 'Desktops'),
    (hierarchyid::Parse('/2/'), 'Clothing'),
    (hierarchyid::Parse('/2/1/'), 'Men''s Wear'),
    (hierarchyid::Parse('/2/2/'), 'Women''s Wear'),
    (hierarchyid::Parse('/2/1/1/'), 'Shirts'),
    (hierarchyid::Parse('/2/2/1/'), 'Dresses');

    6

With these steps, your HierarchyID columns have been successfully migrated to LTREE format.

Common HierarchyID to LTREE Function Mapping

The table below maps SQL Server HierarchyID operators and functions to their PostgreSQL LTREE equivalents. Note that the specific syntax may vary depending on your LTREE hierarchy structure—whether you represent the root as ‘root’, an empty string, or utilize another convention.

SQL Server (HierarchyID) PostgreSQL (LTREE) Meaning
GetLevel() nlevel(ltree) Get node level
GetAncestor(n) Use combination of subpath(ltree, offset integer, len integer) & nlevel(ltree) Get nth level ancestor
GetRoot() subpath(ltree, 0, nlevel(node) – 1) OR use ltree pattern matching Get root
IsDescendantOf() ltree <@ ‘parent_path’ Get all the descendants
GetReparentedValue() No direct equivalent to GetReparentedValue() in PostgreSQL ltree. You must manually construct the reparenting logic using ltree functions. Moves a node from the old parent to the new parent.
GetDescendant() In PostgreSQL with ltree, there’s no direct equivalent to GetDescendant(). You’d need to create a custom function or use a subquery to find the next available child. See what the next child would be for a parent in the table
Parse(‘/1/2/3/’) ‘root.child.grandchild’::ltree Convert string to path
ToString() path::text Convert path to string

Clean up

To prevent ongoing charges, it is advisable to clean up the resources you created:

  1. Delete the Aurora PostgreSQL DB cluster.
  2. Delete the RDS for SQL Server instance.
  3. Clean up AWS DMS resources.
Tech Optimizer
Build a custom solution to migrate SQL Server HierarchyID to PostgreSQL LTREE with AWS DMS