Protect sensitive data with dynamic data masking for Amazon Aurora PostgreSQL

Today marks the introduction of a dynamic data masking feature for the Amazon Aurora PostgreSQL-Compatible Edition. This enhancement to the Aurora security toolkit provides column-level protection, complementing PostgreSQL’s native row-level security to ensure comprehensive and granular access control.

Amazon Aurora is a relational database that offers compatibility with both MySQL and PostgreSQL, delivering high performance alongside open-source flexibility. Organizations utilizing Aurora face the challenge of safeguarding sensitive data while allowing varying access levels based on user roles.

This article illustrates how dynamic data masking can assist in meeting data privacy requirements, detailing its implementation and functionality within the PostgreSQL role hierarchy. The feature is available for Aurora PostgreSQL version 16.10 and higher, and version 17.6 and higher.

Data masking: Requirement and use cases

Enterprises operating in regulated sectors such as banking, insurance, FinTech, and healthcare encounter intricate challenges in managing access to sensitive customer data. Consider a bank that must provide different levels of data access across various teams:

  1. Customer support representatives need to verify customer identities during service calls. They require partial access to account information, such as the last few digits of account numbers, to authenticate customers without revealing complete account details.
  2. Data analysts analyze customer financial behavior to develop new products and services. They require anonymized data to identify patterns while safeguarding individual customer privacy.
  3. Applications that support end customer queries, such as internet banking applications, must display unmasked data to authenticated users.

The challenge: Traditional data masking falls short

Organizations have historically struggled to protect sensitive data while ensuring it remains accessible for legitimate business needs. Traditional approaches often come with significant trade-offs.

Creating separate sanitized datasets may seem straightforward—simply generate different copies of production data for various teams. However, this practice quickly becomes a maintenance nightmare. Constantly synchronizing data across multiple copies leads to redundant storage costs, and as organizations expand, managing access across these datasets grows increasingly complex.

Database-level masked views appear to offer a cleaner solution but introduce serious security risks. Sophisticated users may gain unintended access through complex SQL expressions, allowing them to glimpse underlying data protected by the view. Additionally, these views can slow down queries, adversely affecting application performance when it is needed most.

Application-level masking merely shifts the problem rather than resolving it. Each application interacting with the data requires its own masking logic, resulting in inconsistent protection across the environment. Development teams waste valuable time reimplementing the same security controls, and every new application adds to the maintenance burden.

A better approach: Dynamic data masking

The Amazon Aurora PostgreSQL-Compatible Edition now introduces dynamic data masking—a solution that circumvents these traditional limitations entirely.

This innovative approach allows organizations to maintain a single copy of their data, eliminating duplication, synchronization headaches, and storage bloat. The actual data within the database remains unaltered.

Masking is implemented by the pg_columnmask extension, which operates at the database level during query processing. This means there is no need to modify applications. Instead of distributing masking logic across multiple codebases, policy-based rules can be defined once, directly within the database.

These policies dictate how sensitive data is presented to different users based on their roles. For instance, an executive may see full credit card numbers, while a customer service representative only sees the last four digits—all derived from the same underlying data. Unlike masked views, this method does not introduce security vulnerabilities or performance penalties.

Dynamic data masking empowers organizations to:

  • Protect sensitive information in real-time without data duplication
  • Maintain data utility for users who require full access
  • Comply with regulations such as GDPR, HIPAA, and PCI DSS
  • Implement role-based access controls efficiently and consistently
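The role-based presentation described above can be sketched in plain Python. This is conceptual only; the roles and rules here are hypothetical, not the pg_columnmask API:

```python
# Conceptual sketch only: role-based presentation of one stored value.
# The roles and masking rules here are hypothetical illustrations.
def present_card_number(card_number: str, role: str) -> str:
    """Return a full or partially masked card number based on the caller's role."""
    if role == "executive":
        return card_number                                       # full access
    if role == "support_rep":
        return "*" * (len(card_number) - 4) + card_number[-4:]   # last 4 digits
    return "*" * len(card_number)                                # fully masked

card = "4111111111111111"
print(present_card_number(card, "executive"))    # 4111111111111111
print(present_card_number(card, "support_rep"))  # ************1111
```

The key point is that all callers read the same underlying value; only the presentation differs per role.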

pg_columnmask extension

The pg_columnmask extension leverages two core PostgreSQL mechanisms:

  1. Policy-based Masking – The pg_columnmask extension enhances PostgreSQL’s row-level security (RLS) capabilities to establish column-level masking policies. These policies utilize built-in or custom functions to:
    • Conceal sensitive information
    • Replace values with wildcards
    • Employ weights to determine which masking policy should apply when multiple policies pertain to a column.
  2. Runtime Query Rewrite – The pg_columnmask extension integrates with PostgreSQL’s query processing pipeline to mask data during execution. This integration preserves query functionality while safeguarding sensitive information.
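The weight mechanism can be illustrated with a small Python sketch. The policy names follow this post, but the data, weights, and selection logic below are illustrative, not the extension's internals:

```python
# Illustrative sketch: when multiple policies target the same column,
# the highest-weight policy matching one of the user's roles wins.
# The weights and single-role-per-policy shape here are made up.
def pick_policy(policies, user_roles):
    applicable = [p for p in policies if p["role"] in user_roles]
    return max(applicable, key=lambda p: p["weight"], default=None)

policies = [
    {"name": "fully_mask_account_data",     "role": "analyst",        "weight": 50},
    {"name": "partially_mask_account_data", "role": "senior_analyst", "weight": 100},
]

# A user holding only analyst gets the full mask; a user also holding
# senior_analyst gets the higher-weight partial mask.
print(pick_policy(policies, {"analyst"})["name"])                    # fully_mask_account_data
print(pick_policy(policies, {"analyst", "senior_analyst"})["name"])  # partially_mask_account_data
```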

A policy administrator can create masking policies using standard SQL commands, incorporating PostgreSQL’s built-in functions or custom functions. These policies transform data based on user roles and access levels, enabling precise control over sensitive information.

For instance, when a customer logs into their banking application, the application connects to the database using an authorized user role. The pg_columnmask extension recognizes this role and allows the application to read unmasked data. Conversely, support staff assisting customers can verify identities without accessing complete sensitive information. They might see partial details, such as the last four digits of a customer identifier, ensuring just enough information is available for customer verification while protecting sensitive data.

Furthermore, the pg_columnmask extension employs policy weights to determine precedence when multiple masking policies target the same column. This hierarchical structure allows organizations to implement role-specific masking rules while maintaining consistent data protection across varying access levels.

How dynamic data masking works: Under the hood

To grasp how dynamic data masking protects your data, it is essential to understand PostgreSQL’s query processing. When executing a SQL query, PostgreSQL navigates through several stages:

  1. Parsing: The parser reads the SQL query and constructs a basic structure known as a parse tree.
  2. Analysis: The analyzer validates the parse tree, ensuring that the tables, columns, and data types referenced are legitimate.
  3. Rewriting: This stage modifies the query before execution, where dynamic data masking performs its work.
  4. Planning and execution: The query planner generates a cost-based query plan for processing by the executor.

Where masking happens

The pg_columnmask extension integrates directly into the rewrite stage. As a query progresses, the dynamic data masking (DDM) rewriter automatically injects masking logic based on the defined policies. This masking occurs transparently within PostgreSQL’s normal query flow, allowing applications to send standard queries while the rewriter transforms them to mask sensitive columns based on user roles.

For example, if a policy masks email addresses for junior analysts, their queries are automatically rewritten to return masked values, while senior analysts with different permissions see the actual data. This seamless integration ensures that applications remain unaware of the masking process, allowing the database to enforce security policies consistently across all queries and applications.
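As a toy illustration of that rewrite, consider simple string substitution in Python. The real extension operates on the parse tree inside PostgreSQL's rewriter, and the masking expressions below are assumptions:

```python
# Toy rewriter: wrap masked columns in their masking expressions before
# the query reaches the planner. The real rewrite works on the parse
# tree, not on SQL text; this only shows the shape of the transformation.
def rewrite_select(columns, masking_rules, table="accounts"):
    rewritten = [masking_rules.get(col, col) for col in columns]
    return "SELECT " + ", ".join(rewritten) + " FROM " + table

# Hypothetical rules in effect for a junior analyst.
rules = {
    "account_holder_name": "pgcolumnmask.mask_text(account_holder_name)",
    "account_contact_email": "pgcolumnmask.mask_email(account_contact_email)",
}

print(rewrite_select(["account_number", "account_contact_email"], rules))
# SELECT account_number, pgcolumnmask.mask_email(account_contact_email) FROM accounts
```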

The following sections will demonstrate how to create a sample database, users, roles, and configure dynamic data masking and masking policies.

Prerequisites

To begin, create an Amazon Elastic Compute Cloud (Amazon EC2) instance, then create an Aurora PostgreSQL database using version 16.10 or 17.6 (or higher). Once the Aurora PostgreSQL cluster is available, connect to the EC2 instance, install the required client tools (psql), and test connectivity to the database.

Create the users, database, and test data

Connect to the cluster endpoint using a role with rds_superuser permissions:

psql -h test-ddm.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com -U postgres

Create users who will own user tables, serve as policy administrators, and be members of analyst and senior analyst roles:

CREATE USER dbowner LOGIN password 'Secure-Password';
CREATE ROLE policy_admin NOLOGIN;
CREATE ROLE analyst NOLOGIN;
CREATE ROLE senior_analyst NOLOGIN;
GRANT analyst TO senior_analyst;
CREATE ROLE samkumar LOGIN password 'Secure-Password';
GRANT policy_admin TO samkumar;
CREATE ROLE jane LOGIN password 'Secure-Password';
GRANT analyst TO jane;
CREATE ROLE shirley LOGIN password 'Secure-Password';
GRANT senior_analyst TO shirley;

Create a new database called test_ddm and grant necessary privileges to dbowner:

CREATE DATABASE test_ddm;
\c test_ddm
GRANT USAGE, CREATE ON SCHEMA public TO dbowner;
GRANT USAGE ON SCHEMA public TO analyst;

Connect to test_ddm using dbowner:

psql -h test-ddm.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com -U dbowner -d test_ddm

Create test tables named accounts and ledger:

CREATE TABLE accounts
(
    account_number text NOT NULL PRIMARY KEY,
    account_holder_name text NOT NULL,
    account_balance numeric(24,4),
    account_contact_email text NOT NULL,
    customer_id text NOT NULL
);
GRANT SELECT ON TABLE public.accounts TO analyst;

CREATE TABLE ledger
(
    ledger_entry_id integer GENERATED BY DEFAULT AS IDENTITY NOT NULL PRIMARY KEY,
    transaction_id text NOT NULL,
    account_number text NOT NULL,
    transaction_amt numeric(24,4),
    transaction_type text,
    ledger_entry_type text NOT NULL,
    transaction_merchant_code text
);
GRANT SELECT ON TABLE public.ledger TO analyst;

Insert some sample data into both tables:

-- accounts table
INSERT INTO accounts (account_number, account_holder_name, account_balance, account_contact_email, customer_id) VALUES
('CHK-2024-001', 'Jorge Souza', 3247.8900, 'jorge.souza@example.com', 'C-1001'),
('SAV-2024-002', 'John Doe', 12500.3400, 'john.doe@example.com', 'C-1002'),
('CHK-2024-003', 'John Roe', 567.2300, 'john.roe@example.com', 'C-1003'),
('SAV-2024-004', 'Mary Major', 28900.5600, 'mary.major@example.com', 'C-1004');

-- ledger table
INSERT INTO ledger (transaction_id, account_number, transaction_amt, transaction_type, ledger_entry_type, transaction_merchant_code)
VALUES
('TPT-001', 'SAV-2024-002', -200.2300, 'THIRD-PARTY-TRANSFER', 'DEBIT', NULL ),
('TPT-001', 'SAV-2024-004', 200.0000, 'THIRD-PARTY-TRANSFER', 'CREDIT', NULL ),
('POS-002','SAV-2024-002', -1000.0000, 'POS','DEBIT','5722'), -- Spent on household appliance
('POS-003','SAV-2024-004', -100.0000, 'POS','DEBIT','5139'),  -- Spent on footwear
('POS-004','SAV-2024-004', -300.0000, 'POS','DEBIT','5722'); -- Spent on household appliance

Configure policy admin

To configure the policy administrator role, follow these steps:

  1. Create a new DB cluster parameter group using the same engine version as the Amazon Aurora cluster.
  2. In the new DB cluster parameter group, set the parameter pgcolumnmask.policy_admin_rolname to policy_admin.
  3. Modify the Amazon Aurora cluster to use the newly created DB cluster parameter group. This is a dynamic parameter, so the change applies without a restart.

The pgcolumnmask.policy_admin_rolname parameter sets the role authorized to create data masking policies for user tables. Table owners and members of the rds_superuser role can also associate masking policies with a table.

Set up dynamic data masking

To utilize the masking feature, connect to the Aurora cluster endpoint using a user with rds_superuser role:

psql -h test-ddm.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com -U postgres -d test_ddm

Create the pg_columnmask extension:

CREATE EXTENSION pg_columnmask;

After setting up the extension, a new function pgcolumnmask.create_masking_policy will be available to define the masking policy. You can utilize the built-in masking functions provided by the pg_columnmask extension.

test_ddm=> SELECT proname FROM pg_proc
           WHERE pronamespace = 'pgcolumnmask'::regnamespace;
                proname
----------------------------------------
 create_masking_policy
 alter_masking_policy
 rename_masking_policy
 drop_masking_policy
 ddm_internal_masking_policy_identifier
 get_masking_policy_info
 mask_text
 mask_timestamp
 mask_email
(9 rows)

Create a masking policy

Log in to the database using a user with the policy admin role (policy_admin).

psql -h test-ddm.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com -U samkumar -d test_ddm

Create a masking policy using the function create_masking_policy:

-- Illustrative call: verify the exact create_masking_policy signature
-- for your engine version in the Aurora PostgreSQL documentation.
SELECT pgcolumnmask.create_masking_policy(
    'fully_mask_account_data',
    'public.accounts',
    '{"account_holder_name": "pgcolumnmask.mask_text(account_holder_name)",
      "customer_id": "pgcolumnmask.mask_text(customer_id)",
      "account_balance": "round(account_balance, -3)",
      "account_contact_email": "pgcolumnmask.mask_email(account_contact_email)"}'::jsonb,
    ARRAY['analyst'],
    50
);

This command creates a new masking policy named ‘fully_mask_account_data’ applicable to the ‘public.accounts’ table. The JSONB argument defines the following masking rules:

  • account_holder_name will be masked using the mask_text function provided by the pg_columnmask extension, replacing all characters in a string with “X”.
  • customer_id will also be masked using the mask_text function.
  • account_balance will be masked using PostgreSQL’s built-in round function to round off the amount to the nearest thousand.
  • account_contact_email will be masked using the mask_email function provided by the pg_columnmask extension, which masks strings in email address format.

The ARRAY argument lists the roles subject to this policy, and the policy is assigned a weight of 50. Additionally, create a masking policy to mask transaction details from analysts:

-- Illustrative call: verify the exact create_masking_policy signature
-- in the Aurora PostgreSQL documentation.
SELECT pgcolumnmask.create_masking_policy(
    'fully_mask_ledger_data',
    'public.ledger',
    '{"transaction_amt": "round(transaction_amt, -3)",
      "transaction_merchant_code": "pgcolumnmask.mask_text(transaction_merchant_code)"}'::jsonb,
    ARRAY['analyst'],
    50
);

In these policies, built-in functions provided by the pg_columnmask extension are utilized, but you can also employ PostgreSQL’s built-in functions or define custom SQL or PL/pgSQL functions for data masking. Ensure that the return type of the function matches the data type of the column being masked.
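As a rough illustration of matching return types, the masking behaviors used in this post can be approximated in plain Python. The real functions are SQL functions in the pgcolumnmask schema, and mask_email's exact output format is an assumption here:

```python
# Plain-Python approximations of the masking behaviors described above.
def mask_text(value: str) -> str:
    """Replace every character with 'X' (text in, text out)."""
    return "X" * len(value)

def round_to_thousand(amount: float) -> float:
    """Round a balance to the nearest thousand, like round(balance, -3)."""
    return round(amount, -3)

def mask_email(email: str) -> str:
    """Mask the local part and domain label while keeping the email shape.
    The exact format produced by pgcolumnmask.mask_email is an assumption."""
    local, _, domain = email.partition("@")
    name, _, tld = domain.rpartition(".")
    return "X" * len(local) + "@" + "X" * len(name) + "." + tld

print(mask_text("Mary Major"))               # XXXXXXXXXX
print(round_to_thousand(3247.89))            # 3000.0
print(mask_email("mary.major@example.com"))  # XXXXXXXXXX@XXXXXXX.com
```

A custom masking function registered with a policy must, like these sketches, return the same type as the column it masks.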

Note that policy administrators do not have access to the actual data. Users with the policy_admin role (e.g., samkumar) cannot view data in the accounts and ledger tables, but they can define masking policies for sensitive data:

test_ddm=> SELECT * FROM accounts;
ERROR:  permission denied for table accounts
test_ddm=> SELECT * FROM ledger;
ERROR:  permission denied for table ledger

Create a separate policy for a senior_analyst to allow them to view accurate amounts and transaction details:

-- Illustrative calls: the weight of 100 (higher than the analyst
-- policies) gives these precedence for members of senior_analyst.
SELECT pgcolumnmask.create_masking_policy(
    'partially_mask_account_data',
    'public.accounts',
    '{"account_balance": "account_balance"}'::jsonb,
    ARRAY['senior_analyst'],
    100
);
SELECT pgcolumnmask.create_masking_policy(
    'partially_mask_ledger_data',
    'public.ledger',
    '{"transaction_amt": "transaction_amt",
      "transaction_merchant_code": "transaction_merchant_code"}'::jsonb,
    ARRAY['senior_analyst'],
    100
);

To view the policies, describe a table:

test_ddm=> \d accounts

The command output lists the columns of the accounts table along with the masking policies associated with it.

Additionally, you can retrieve details of the masking policy defined on a table using get_masking_policy_info.

Validate data masking

Once the masking policies are defined, users under the analyst role will only see masked data. To test this, connect using the user jane and execute a query:

psql -h test-ddm.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com -U jane -d test_ddm

test_ddm=> SELECT account_number, account_holder_name, account_balance,
                  account_contact_email, customer_id
           FROM accounts;

As an analyst, jane has access to the ledger table and can perform a JOIN between accounts and ledger, but will only see masked data:

test_ddm=> SELECT a.account_holder_name, l.transaction_id, l.transaction_amt,
                  l.transaction_type, l.transaction_merchant_code
           FROM accounts a
           JOIN ledger l ON a.account_number = l.account_number;

When shirley, a member of the senior_analyst role, connects to the database test_ddm, she will be able to JOIN accounts with ledger and see partially masked data:

psql -h test-ddm.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com -U shirley -d test_ddm

test_ddm=> SELECT a.account_holder_name, a.account_balance, l.transaction_amt,
                  l.transaction_type, l.transaction_merchant_code
           FROM accounts a
           JOIN ledger l ON a.account_number = l.account_number;

In this scenario, shirley sees partially masked data because the policies defined for both analyst and senior_analyst apply to her. The higher weight policies (partially_mask_ledger_data and partially_mask_account_data) take precedence for the columns they cover, so she can see transaction_merchant_code, transaction_amt, and account_balance. For the remaining columns, the lower weight policies (fully_mask_ledger_data and fully_mask_account_data) still apply, so she cannot view account_holder_name or account_contact_email.

In contrast, the dbowner will have full visibility of the data:

psql -h test-ddm.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com -U dbowner -d test_ddm

test_ddm=> SELECT * FROM accounts;
test_ddm=> SELECT * FROM ledger;

Clean up

To clean up the resources utilized in this post, follow these steps:

  1. Delete the instances in the Aurora PostgreSQL cluster created during the prerequisite steps.
  2. If necessary, adjust the deletion protection setting for the Aurora PostgreSQL cluster.
  3. Delete the Aurora PostgreSQL cluster.
  4. Terminate the Amazon EC2 instance used as a client for connection to the Aurora PostgreSQL cluster.
  5. If needed, delete the Amazon Elastic Block Store (Amazon EBS) volume attached to the Amazon EC2 instance.