Transition a pivot query that includes dynamic columns from SQL Server to PostgreSQL

In the realm of database migration, particularly when transitioning workloads from SQL Server to PostgreSQL, the use of the PIVOT function is often a focal point for generating dynamic reports. A well-documented solution exists in the Microsoft SQL Server 2019 to Amazon Aurora PostgreSQL Migration Playbook, which utilizes CASE WHEN statements for each pivoted column. However, this method can become cumbersome as the number of pivoted columns increases, making maintenance a challenge and necessitating code changes for any new columns added.

To address these limitations, we introduce the crosstab function, part of PostgreSQL’s tablefunc extension. This function allows for a more flexible implementation of SQL Server’s PIVOT functionality.

Solution overview

Our approach leverages the crosstab function as the core component to demonstrate how to implement SQL Server-like PIVOT functionality through the PostgreSQL function get_dynamic_pivot_data. The primary advantage of using crosstab lies in its ability to dynamically generate columns based on the results of provided queries, thus accommodating various data sets with ease. This method also allows for the management of both fixed and variable columns in a pivot table, alongside the necessary cursor in the PostgreSQL function. The function can be executed using either psql or C#. For demonstration purposes, we will utilize psql, a terminal-based front-end to PostgreSQL, to illustrate how to invoke it with a cursor parameter. Typically, C# interacts with PostgreSQL via Npgsql. Sample C# code will be provided to showcase how to utilize the PostgreSQL function with Npgsql, including cursor variable management.

The following diagram illustrates the architecture of our solution.

Prerequisites

Before diving into the implementation, ensure that the following configurations are set up in your AWS account and on your local installations, depending on how you intend to test the function.

Test on the PostgreSQL client side

The steps below outline the process for testing the function on the PostgreSQL client side:

  1. Provision either Amazon Aurora PostgreSQL-Compatible Edition or Amazon Relational Database Service (Amazon RDS) for PostgreSQL.
  2. Install a PostgreSQL client tool, such as pgAdmin on Amazon EC2 for Microsoft Windows Server, or psql on Amazon Linux 2023.
    • AWS CloudShell includes the PostgreSQL client tool (psql) version 15 by default. If you prefer not to provision an EC2 instance, you can use CloudShell to access Aurora PostgreSQL directly.

Test with C#:

For those opting to test with C#, the following steps are necessary. If C# is not your choice, these installations are not required.

  1. Download and install .NET SDK 8 on your EC2 instance. You can verify the .NET version on Windows using the command:
    C:UsersAdministrator>dotnet --list-sdks
    8.0.404 [C:Program Filesdotnetsdk]
    C:UsersAdministrator>dotnet --version
    8.0.404
  2. Download and install Visual Studio Code on your EC2 instance for Windows.

Use the PIVOT function on the SQL Server side

To transition a pivot query that includes dynamic columns, you will begin by creating two tables and a stored procedure on the SQL Server side. The corresponding code can be found in our GitHub repository. Follow these steps:

  1. Create two tables with sample data:
    1. The following code creates the table QuarterTbl:
      CREATE TABLE QuarterTbl (
      QuarterID INT NOT NULL IDENTITY PRIMARY KEY,
      QuarterItem varchar(2)
      );
      
      INSERT INTO QuarterTbl([QuarterItem])
      VALUES ('Q1'), ('Q2'), ('Q3'), ('Q4');

      The output displays sample data from the table QuarterTbl:

      QuarterID    QuarterItem
      ------------ -----------
      1            Q1
      2            Q2
      3            Q3
      4            Q4
      
  2. Create the ProductSales table:
    CREATE TABLE ProductSales (
    ProductID INT NOT NULL IDENTITY PRIMARY KEY,
    ProductName varchar(10),
    QuarterID int,
    Year varchar(5),
    Sales int
    FOREIGN KEY (QuarterID) REFERENCES QuarterTbl(QuarterID)
    );
    
    INSERT INTO ProductSales([ProductName],[QuarterID],[Year],[Sales])
    VALUES
    ('ProductA', 1, 'Y2017', 100),
    ('ProductA', 2, 'Y2018', 150),
    ('ProductA', 2, 'Y2018', 200),
    ('ProductA', 1, 'Y2019', 300),
    ('ProductA', 2, 'Y2020', 500),
    ('ProductA', 3, 'Y2021', 450),
    ('ProductA', 1, 'Y2022', 675),
    ('ProductB', 2, 'Y2017', 0),
    ('ProductB', 1, 'Y2018', 900),
    ('ProductB', 3, 'Y2019', 1120),
    ('ProductB', 4, 'Y2020', 750),
    ('ProductB', 3, 'Y2021', 1500),
    ('ProductB', 2, 'Y2022', 1980);

    The total sales figures for each quarter across each year can be queried as follows:

    SELECT
    PS.ProductName,
    Q.QuarterItem,
    PS.Year, SUM(PS.Sales) as QuarterSales
    FROM ProductSales AS PS
    INNER JOIN QuarterTbl AS Q
    ON PS.QuarterID = Q.QuarterID
    GROUP BY PS.ProductName, Q.QuarterItem, PS.Year
    ORDER BY 1, 2, 3;

    The resulting query output for quarterly sales dummy data between 2017-2022 is as follows:

    ProductName QuarterItem Year     QuarterSales
    ----------- ----------- -----    ------------
    ProductA          Q1    Y2017     100
    ProductA          Q1    Y2019     300
    ProductA          Q1    Y2022     675
    ProductA          Q2    Y2018     350
    ProductA          Q2    Y2020     500
    ProductA          Q3    Y2021     450
    ProductB          Q1    Y2018     900
    ProductB          Q2    Y2017     0
    ProductB          Q2    Y2022     1980
    ProductB          Q3    Y2019     1120
    ProductB          Q3    Y2021     1500
    ProductB          Q4    Y2020     750
    (12 rows affected)
  3. Create a stored procedure called GetProductSalesReport that utilizes the PIVOT function and a dynamic query:
    CREATE OR ALTER PROCEDURE GetProductSalesReport
    @columns NVARCHAR(MAX)
    AS
    DECLARE  @sql NVARCHAR(MAX);
    
    SET @sql = 'SELECT * FROM
    (
    SELECT PS.ProductName, Q.QuarterItem, PS.Year, PS.Sales
    FROM ProductSales PS
    INNER JOIN QuarterTbl Q
    ON PS.QuarterID = Q.QuarterID
    ) t
    PIVOT (
    SUM(Sales)
    FOR [Year] IN ( ' + @columns + ')
    ) AS PV
    ORDER BY 1, 2;';
    
    -- execute the dynamic SQL
    EXECUTE sp_executesql @sql;
    GO
  4. Execute the stored procedure with a parameter specifying the list of dynamic columns to obtain the pivoted results:
    EXEC GetProductSalesReport N'[Y2017], [Y2018], [Y2019], [Y2020], [Y2021], [Y2022]'
    GO

    The output from the pivot query will be:

    ProductName QuarterItem Y2017   Y2018   Y2019   Y2020   Y2021   Y2022
    ----------- ----------- ------- ------- ------- ------- ------- -------
    ProductA    Q1          100     NULL    300     NULL    NULL    675
    ProductA    Q2          NULL    350     NULL    500     NULL    NULL
    ProductA    Q3          NULL    NULL    NULL    NULL    450     NULL
    ProductB    Q1          NULL    900     NULL    NULL    NULL    NULL
    ProductB    Q2          0       NULL    NULL    NULL    NULL    1980
    ProductB    Q3          NULL    NULL    1120    NULL    1500    NULL
    ProductB    Q4          NULL    NULL    NULL    750     NULL    NULL
    (7 rows affected)

At this stage, you have observed how to utilize the PIVOT function to create a sales report in SQL Server. By specifying a dynamic column list, SQL Server can return a result set in a standard table format. In practical scenarios, it is common for customers to generate numerous dynamic columns by querying a primary table.

Translating similar logic to PostgreSQL poses a challenge due to its requirement for predefined declarations. The following section outlines a workaround for addressing this issue in PostgreSQL.

Implement pivot-like functionality on the PostgreSQL side:

In PostgreSQL, you will create sample tables that mirror those in SQL Server and populate them with identical data. Subsequently, a function named get_dynamic_pivot_data will be created to replicate the functionality of GetProductSalesReport from SQL Server. The corresponding code is available in our GitHub repository.

  1. Create sample tables in PostgreSQL:
    CREATE TABLE quarter_tbl(
    quarter_id INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY,
    quarter_item VARCHAR(2),
    PRIMARY KEY(quarter_id)
    );
    
    CREATE TABLE product_sales(
    product_id INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY,
    product_name VARCHAR(10),
    quarter_id INTEGER,
    year VARCHAR(5),
    sales INTEGER,
    CONSTRAINT fk_quarter
    FOREIGN KEY(quarter_id)
    REFERENCES quarter_tbl(quarter_id)
    );
  2. Populate the two tables in PostgreSQL with the same data as in SQL Server:
    CREATE TABLE QuarterTbl (
    QuarterID INT NOT NULL IDENTITY PRIMARY KEY,
    QuarterItem varchar(2)
    );
    
    INSERT INTO QuarterTbl([QuarterItem])
    VALUES ('Q1'), ('Q2'), ('Q3'), ('Q4');

    0

  3. To utilize the CROSSTAB function in PostgreSQL, the tablefunc extension must be installed and enabled in your database. This can be accomplished with the following SQL command:
    CREATE TABLE QuarterTbl (
    QuarterID INT NOT NULL IDENTITY PRIMARY KEY,
    QuarterItem varchar(2)
    );
    
    INSERT INTO QuarterTbl([QuarterItem])
    VALUES ('Q1'), ('Q2'), ('Q3'), ('Q4');

    1
    Unlike SQL Server’s PIVOT function, which accommodates multiple row_name columns, PostgreSQL’s crosstab function supports only a single row_name column. Therefore, it is necessary to consolidate all row_name columns into a single composite column, separated by commas. During the creation of the output table, all columns from this composite column must be unpacked.

  4. Utilize the function get_dynamic_pivot_data to create a pivot table that includes multiple row_name columns (fixed columns). The function accepts two parameters: one for a cursor using a refcursor data type and another for the dynamic column list using a text data type. To facilitate processing, the code creates a temporary table to hold the intermediate results of the crosstab. This function returns the cursor containing the pivot-like query result.

The following code illustrates the equivalent of the SQL Server function in PostgreSQL. Two columns are split from the fixed column, while the remaining columns (Y2017 through Y2022) are generated dynamically. It is crucial to specify the cursor name (pivot_cur) when invoking the function, and it should be executed within a transaction since data is fetched from a cursor.

Invoke the PostgreSQL pivot-like function from C#:

To build a sample C# project on Windows, follow the steps below. The Npgsql package is essential for accessing PostgreSQL from C#.NET.

  1. Open a Command Prompt terminal and navigate to your work directory.
  2. Create a new .NET project:
  3. Change into the newly created directory.
  4. Add Npgsql to the .NET project:
  5. Add JSON support:
  6. Add AWS SDK:
  7. Add Secrets Manager Caching:
  8. Use Visual Studio Code to open the project folder and edit the cs file.
  9. Run the .NET code in the terminal of Visual Studio Code.

You can review the Program.cs file in the GitHub repository for further details. The NpgsqlCommand is employed to call the get_dynamic_pivot_data PostgreSQL function, while the NpgsqlDataReader retrieves the contents of the refcursor.

From a security standpoint, both the Aurora PostgreSQL and the Windows EC2 instance hosting the .NET application are situated in private subnets, safeguarded by security groups against external threats. Furthermore, database credentials are securely stored in AWS Secrets Manager, enhancing internal security by managing sensitive information. This blog aims to demonstrate the implementation of SQL Server-like PIVOT queries on Aurora PostgreSQL or Amazon RDS for PostgreSQL. Note that SSL connection implementation is not covered in this sample code, as it falls outside the scope of this demonstration.

The following screenshot showcases the use of the Visual Studio Code editor to edit the C# code and execute the dotnet command in its terminal, producing results consistent with running the stored function through psql on the command line.

Clean up

After completing your testing, it is important to follow these steps to avoid unnecessary charges:

In this discussion, we explored how to utilize the crosstab function of PostgreSQL with a cursor to achieve results akin to SQL Server’s PIVOT function. Additionally, we demonstrated how to invoke the PostgreSQL function that provides pivot functionality using C# code.

Have you experimented with different methods for implementing PIVOT functionality in PostgreSQL? We invite you to share your experiences in the comments section. For further insights into migrating from SQL Server to PostgreSQL, refer to Migrate SQL Server to Amazon Aurora PostgreSQL using best practices and lessons learned from the field.


About the Author

Jian (Ken) Zhang is a Senior Database Migration Specialist at AWS. He collaborates with AWS customers to provide guidance and technical assistance on migrating commercial databases to AWS open-source databases. In his leisure time, he enjoys exploring good restaurants and playing Go, a strategic board game.

Tech Optimizer