Take Control: Customer-Managed Keys for Lakebase Postgres

Encryption at rest is a baseline requirement for cloud environments, particularly for enterprises in highly regulated sectors. To meet it, Lakebase introduces Customer-Managed Keys (CMK), which let organizations keep control of encryption keys held in their preferred Key Management Service (KMS): AWS KMS, Azure Key Vault, or Google Cloud KMS. The customer's key protects data throughout the Lakebase lifecycle.

Unlike traditional managed databases, which typically encrypt only storage, Lakebase CMK covers both persistent storage and ephemeral compute. Cached and temporary data on compute instances therefore fall under the same customer-controlled key as data at rest.

The Architecture of Lakebase Encryption

The architecture of Lakebase is designed with a clear separation between storage and compute layers, facilitating elastic scaling and serverless operations. The storage layer, comprising Pageserver and Safekeeper, is responsible for maintaining long-lived, persistent data in object storage and local caches. Meanwhile, the compute layer operates independent Postgres instances that dynamically scale based on demand, whether that means scaling up, down, or even to zero.

This separation complicates encryption: both layers, and all associated caches, must be encrypted and remain under the customer’s control. Lakebase CMK addresses this with a hierarchical envelope encryption model.

The Key Hierarchy

Envelope encryption is a model in which data is encrypted with unique data encryption keys (DEKs), which are in turn protected by higher-level keys. In this hierarchy your CMK never leaves your cloud KMS; Databricks receives only wrapped (encrypted) versions of the keys needed to decrypt data. The design reduces the number of KMS calls, which helps performance, and it makes key rotation and revocation straightforward.

The key hierarchy consists of three distinct levels:

  1. Customer-Managed Key (CMK): This is the root of trust, residing in your cloud KMS (AWS KMS, Azure Key Vault, or Google Cloud KMS). Databricks never accesses the plaintext version of this key.
  2. Key Encryption Key (KEK): A transient key used by the Databricks Key Manager Service to wrap data keys.
  3. Data Encryption Keys (DEKs): Unique keys generated for each data segment, stored alongside the data in an encrypted (wrapped) state.

When data must be read, Lakebase components recover the DEK by unwrapping it through this hierarchy, a chain that ultimately depends on an unwrap operation against the CMK in your KMS. If you revoke the key, that unwrap fails and the data becomes cryptographically inaccessible. In addition, all ephemeral compute instances are terminated to eliminate access to cached data.
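The three-level hierarchy and the effect of revocation can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `ToyKms`, `encrypt_segment`, and `decrypt_segment` are hypothetical names, not Lakebase or Databricks APIs, and the `_seal`/`_open` helpers are a toy cipher built from the Python standard library that stands in for a real key-wrap algorithm. It should not be used to protect real data. For brevity the sketch also creates a KEK per segment, which the article does not specify.

```python
import hashlib
import hmac
import secrets


def _seal(key: bytes, plaintext: bytes) -> bytes:
    """Toy authenticated encryption: SHAKE-256 keystream XOR plus an HMAC tag."""
    nonce = secrets.token_bytes(16)
    stream = hashlib.shake_256(b"enc" + key + nonce).digest(len(plaintext))
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def _open(key: bytes, blob: bytes) -> bytes:
    """Inverse of _seal; raises ValueError on a wrong key or a tampered blob."""
    nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    stream = hashlib.shake_256(b"enc" + key + nonce).digest(len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


class ToyKms:
    """Hypothetical stand-in for the customer's cloud KMS; the CMK stays inside."""

    def __init__(self) -> None:
        self._cmk = secrets.token_bytes(32)
        self.enabled = True

    def wrap(self, key: bytes) -> bytes:
        self._check()
        return _seal(self._cmk, key)

    def unwrap(self, wrapped: bytes) -> bytes:
        self._check()
        return _open(self._cmk, wrapped)

    def revoke(self) -> None:
        self.enabled = False

    def _check(self) -> None:
        if not self.enabled:
            raise PermissionError("CMK access revoked")


def encrypt_segment(kms: ToyKms, segment: bytes) -> dict:
    """CMK wraps the KEK, the KEK wraps the DEK, the DEK encrypts the data."""
    kek = secrets.token_bytes(32)
    dek = secrets.token_bytes(32)
    return {
        "wrapped_kek": kms.wrap(kek),
        "wrapped_dek": _seal(kek, dek),
        "ciphertext": _seal(dek, segment),
    }


def decrypt_segment(kms: ToyKms, record: dict) -> bytes:
    """Walk the hierarchy back down; fails at the first step once the CMK is revoked."""
    kek = kms.unwrap(record["wrapped_kek"])
    dek = _open(kek, record["wrapped_dek"])
    return _open(dek, record["ciphertext"])
```

Only the wrapped KEK, the wrapped DEK, and the ciphertext are stored outside the KMS. After `kms.revoke()`, `decrypt_segment` raises `PermissionError` before any key material is recovered, which is the sense in which the data becomes cryptographically inaccessible.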

CMK in Practice: Storage and Compute

The practical application of CMK varies between the storage and compute layers:

1. Persistence Layer (Storage)

All data segments managed by Lakebase, including Write-Ahead Log (WAL) segments and data files, are encrypted using keys safeguarded by your CMK. Data at rest is therefore protected by encryption keys that remain under your control rather than under the control of Databricks.

2. Ephemeral Layer (Compute)

The Postgres compute virtual machine (VM) manages ephemeral data utilized by the operating system and PostgreSQL, including performance caches and temporary files. It is essential that this data is also governed by a CMK. The CMK secures ephemeral compute data through:

  • Per Boot Keys: Each time a Lakebase compute instance starts, it generates a unique ephemeral key.
  • Automatic Shredding: Upon CMK revocation, the Lakebase Manager terminates the instance, destroying ephemeral in-memory keys and making local disk data inaccessible.
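A rough model of per-boot keys and shredding is sketched below. `ComputeInstance` and `on_cmk_revoked` are hypothetical names chosen for this example, and the XOR keystream is a toy stand-in for real disk encryption; the sketch only shows why destroying an in-memory key makes locally cached ciphertext unreadable.

```python
import hashlib
import secrets
from typing import Optional


class ComputeInstance:
    """Hypothetical compute VM whose local cache is encrypted under a per-boot key."""

    def __init__(self) -> None:
        # A fresh key is generated on every boot and held only in memory.
        self._boot_key: Optional[bytes] = secrets.token_bytes(32)
        # Simulated local disk: name -> (nonce, ciphertext).
        self.disk = {}

    def _stream(self, nonce: bytes, length: int) -> bytes:
        if self._boot_key is None:
            raise RuntimeError("instance terminated: boot key destroyed")
        return hashlib.shake_256(self._boot_key + nonce).digest(length)

    def cache_put(self, name: str, data: bytes) -> None:
        nonce = secrets.token_bytes(16)
        stream = self._stream(nonce, len(data))
        self.disk[name] = (nonce, bytes(d ^ s for d, s in zip(data, stream)))

    def cache_get(self, name: str) -> bytes:
        nonce, body = self.disk[name]
        stream = self._stream(nonce, len(body))
        return bytes(c ^ s for c, s in zip(body, stream))

    def terminate(self) -> None:
        # Dropping the only copy of the key leaves the disk contents undecryptable.
        self._boot_key = None


def on_cmk_revoked(instances) -> None:
    """Models the manager's reaction to revocation: terminate every instance."""
    for instance in instances:
        instance.terminate()
```

Because the boot key is never persisted, a restarted instance gets a new key and cannot read ciphertext left behind by a previous boot.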

Implementing CMK in the Lakebase Workflow

The implementation of CMK follows the established Databricks Account to Workspace delegation model, ensuring that Security Admins can manage keys without requiring access to the underlying data. Once a key is configured at the workspace level, all Lakebase projects automatically incorporate the CMK into their encryption workflows.

Step 1: Key Configuration

An Account Admin initiates the process by creating a Key Configuration in the Databricks Account Console. This configuration includes the key identifier (ARN for AWS KMS, Key Vault URL for Azure, or Key ID for Google Cloud KMS) along with the IAM role or service principal that Lakebase will utilize for Wrap and Unwrap operations.
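The shape of such a configuration can be sketched as a small validated record. This is not the Databricks API: `KeyConfiguration`, its field names, and the patterns are assumptions for illustration, and the regular expressions are loose shape checks for common public-cloud identifier formats rather than authoritative validation.

```python
import re
from dataclasses import dataclass

# Loose, illustrative shape checks for each provider's key identifier.
KEY_ID_PATTERNS = {
    "aws": re.compile(r"^arn:aws[\w-]*:kms:[a-z0-9-]+:\d{12}:key/[0-9A-Za-z-]+$"),
    "azure": re.compile(
        r"^https://[A-Za-z0-9-]+\.vault\.azure\.net/keys/[A-Za-z0-9-]+(/[0-9a-f]+)?$"
    ),
    "gcp": re.compile(
        r"^projects/[^/]+/locations/[^/]+/keyRings/[^/]+/cryptoKeys/[^/]+$"
    ),
}


@dataclass(frozen=True)
class KeyConfiguration:
    """Hypothetical record of what an Account Admin supplies in Step 1."""

    cloud: str           # "aws", "azure", or "gcp"
    key_identifier: str  # ARN, Key Vault URL, or Key ID
    credential: str      # IAM role or service principal used for Wrap and Unwrap

    def __post_init__(self) -> None:
        pattern = KEY_ID_PATTERNS.get(self.cloud)
        if pattern is None:
            raise ValueError(f"unsupported cloud: {self.cloud!r}")
        if not pattern.match(self.key_identifier):
            raise ValueError(f"key identifier does not look like a {self.cloud} key")
        if not self.credential:
            raise ValueError("a credential for Wrap and Unwrap is required")
```

For example, `KeyConfiguration("aws", "arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab", "arn:aws:iam::111122223333:role/example-role")` passes the checks; the account ID, key ID, and role name are placeholders.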

Step 2: Workspace Binding

This configuration is subsequently linked to a specific Workspace. For Lakebase, this entails:

  • New Projects: All new Lakebase projects automatically inherit the CMK associated with the workspace.
  • Isolation: Different workspaces can implement distinct CMKs to meet multi-tenant or departmental security requirements.

Step 3: Lifecycle Management and Rotation

Lakebase facilitates seamless key rotation. When you rotate your CMK within your cloud provider’s console:

  • The envelope encryption hierarchy absorbs the change: your CMK can be rotated in your cloud KMS without re-encrypting data or altering DEKs.
  • Rotation requires no downtime and no manual re-encryption.
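Why rotation leaves DEKs and data untouched can be seen in a small sketch. It assumes the cloud KMS keeps earlier key versions available for unwrapping after a rotation, as AWS KMS does with automatic rotation. `VersionedToyKms` and `wrapped_version` are hypothetical names, and the cipher is a toy built from the standard library, not a real key-wrap algorithm.

```python
import hashlib
import hmac
import secrets


class VersionedToyKms:
    """Toy stand-in for a cloud KMS key that retains every rotated version."""

    def __init__(self) -> None:
        self._versions = [secrets.token_bytes(32)]

    def rotate(self) -> None:
        # New wraps use the new version; old versions stay usable for unwrap.
        self._versions.append(secrets.token_bytes(32))

    def wrap(self, key: bytes) -> bytes:
        version = len(self._versions) - 1
        material = self._versions[version]
        nonce = secrets.token_bytes(16)
        stream = hashlib.shake_256(material + nonce).digest(len(key))
        body = bytes(k ^ s for k, s in zip(key, stream))
        tag = hmac.new(material, nonce + body, hashlib.sha256).digest()
        # The version number travels with the wrapped blob.
        return version.to_bytes(4, "big") + nonce + body + tag

    def unwrap(self, blob: bytes) -> bytes:
        material = self._versions[int.from_bytes(blob[:4], "big")]
        nonce, body, tag = blob[4:20], blob[20:-32], blob[-32:]
        expected = hmac.new(material, nonce + body, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("unwrap failed")
        stream = hashlib.shake_256(material + nonce).digest(len(body))
        return bytes(c ^ s for c, s in zip(body, stream))


def wrapped_version(blob: bytes) -> int:
    """Reports which CMK version a wrapped blob was produced under."""
    return int.from_bytes(blob[:4], "big")
```

A KEK wrapped before `rotate()` still unwraps afterwards, so nothing below it in the hierarchy has to change. Re-wrapping the same KEK after rotation simply produces a blob tagged with the newer version.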

Security Auditability

Since the CMK resides within your cloud account, all cryptographic operations involving your key are logged by your provider’s audit service, such as AWS CloudTrail, Azure Monitor, or Google Cloud Audit Logs.
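As one way to use those logs, the sketch below filters audit records for operations against a specific key. It assumes AWS CloudTrail's record layout (`eventSource`, `eventName`, `eventTime`, and a `resources` list with `ARN` entries) and that the records have already been loaded as Python dictionaries; `kms_events_for_key` is a hypothetical helper, and other providers use different field names.

```python
def kms_events_for_key(records, key_arn):
    """Return (eventTime, eventName) pairs for KMS calls that touched key_arn."""
    matches = []
    for record in records:
        if record.get("eventSource") != "kms.amazonaws.com":
            continue
        arns = {resource.get("ARN") for resource in record.get("resources", [])}
        if key_arn in arns:
            matches.append((record.get("eventTime", ""), record.get("eventName", "")))
    # ISO 8601 timestamps sort chronologically as plain strings.
    return sorted(matches)
```

A steady stream of `Decrypt` events is expected during normal operation; calls from an unexpected principal or at an unexpected time are what a reviewer would look for.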

Get Started with Enhanced Data Sovereignty

For organizations that want full control over the encryption keys behind their Postgres workloads, Lakebase CMK is now available to Enterprise tier customers. To get started, reach out to your Databricks account team to enable Customer-Managed Keys for your workspace. Alternatively, consult our technical documentation to review the necessary IAM policies and KMS configurations. If you are not yet a Databricks customer, consider starting with a trial.

Tech Optimizer