Databricks has introduced customer-managed keys (CMK) for its Lakehouse Postgres offering. The feature gives users more granular control over how their data is encrypted, which matters most in regulated environments. Organizations can now use their own Key Management Service (KMS) to protect sensitive information instead of relying on provider-managed keys.
As outlined in Databricks’ blog, users can bring keys from AWS KMS, Azure Key Vault, or Google Cloud KMS. Because key management shifts to the customer, the root of trust stays under the customer’s control, which is often a prerequisite for compliance.
Encryption Across the Lakehouse Stack
Lakehouse Postgres separates storage from compute. That separation improves scalability, but it also complicates encryption: both long-term storage and transient compute caches hold data, so both must be protected to preserve integrity and confidentiality.
To address this, Databricks uses a hierarchical envelope encryption model. Customer-managed keys (CMKs) stay inside the user’s cloud KMS and are never exposed to Databricks. The platform holds only encrypted (wrapped) versions of the keys needed to decrypt data, which keeps a secure boundary around sensitive information.
The hierarchy has three tiers: the customer-managed root key (CMK), a Key Encryption Key (KEK) used by Databricks’ Key Manager Service, and Data Encryption Keys (DEKs) that are unique to each data segment. When data must be read, the system unwraps the DEKs with key material obtained through the customer’s KMS, while the CMK itself stays in the KMS.
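The three tiers above can be sketched in a few lines of Python. This is a minimal illustration, not Databricks’ implementation: the `CustomerKMS` class, the `seal` and `open_sealed` helpers, and the assumption that the CMK wraps the KEK and the KEK wraps each DEK are all hypothetical, and the toy cipher built from SHA-256 and HMAC only stands in for a vetted primitive such as AES-GCM.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256 in counter mode. Illustration only, not vetted crypto.
    blocks = []
    for counter in range((length + 31) // 32):
        blocks.append(hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest())
    return b"".join(blocks)[:length]


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt-then-MAC stand-in for an AEAD such as AES-GCM."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def open_sealed(key: bytes, blob: bytes) -> bytes:
    """Verify the tag, then decrypt; raises ValueError on a wrong key."""
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("unwrap failed: wrong key or tampered blob")
    stream = _keystream(key, nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


class CustomerKMS:
    """Hypothetical model of the customer's cloud KMS. The CMK never leaves it."""

    def __init__(self) -> None:
        self._cmk = secrets.token_bytes(32)

    def wrap(self, key: bytes) -> bytes:
        return seal(self._cmk, key)

    def unwrap(self, wrapped: bytes) -> bytes:
        return open_sealed(self._cmk, wrapped)


# Write path: the DEK encrypts the segment, the KEK wraps the DEK,
# and the CMK (inside the KMS) wraps the KEK.
kms = CustomerKMS()
kek = secrets.token_bytes(32)
dek = secrets.token_bytes(32)
segment = b"row data for one storage segment"

stored = {
    "wrapped_kek": kms.wrap(kek),
    "wrapped_dek": seal(kek, dek),
    "ciphertext": seal(dek, segment),
}
del kek, dek  # only wrapped keys and ciphertext persist

# Read path: unwrap from the top of the hierarchy down.
kek_again = kms.unwrap(stored["wrapped_kek"])
dek_again = open_sealed(kek_again, stored["wrapped_dek"])
recovered = open_sealed(dek_again, stored["ciphertext"])
```

In this sketch nothing in `stored` is usable without a call to `kms.unwrap`, which is the property the envelope model relies on.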
Revocation as a ‘Kill Switch’
A notable advantage of this model is that access to sensitive data can be revoked. If a CMK is revoked, the unwrapping step fails and the data becomes cryptographically inaccessible. Revocation also triggers the termination of active compute instances, which destroys ephemeral keys and shreds any local disk data. For high-compliance Postgres workloads, this turns the KMS into a technical ‘kill switch’ for sensitive information.
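The revocation behavior can be modeled as a small state machine. The sketch below is an assumption-laden illustration: `CustomerKMS`, `ComputeInstance`, `KeyRevokedError`, and the idea that a compute node notices revocation when it next asks the KMS to unwrap a key are all hypothetical, since the article does not describe how revocation is detected. The XOR "wrap" is a toy stand-in for real authenticated key wrapping.

```python
import secrets


class KeyRevokedError(Exception):
    """Raised when the KMS refuses to use a revoked CMK."""


class CustomerKMS:
    """Hypothetical KMS model that tracks whether the CMK is still usable."""

    def __init__(self) -> None:
        self._cmk = secrets.token_bytes(32)
        self.revoked = False

    def wrap(self, key: bytes) -> bytes:
        # Toy wrap (XOR with the CMK); a real KMS uses authenticated encryption.
        return bytes(a ^ b for a, b in zip(key, self._cmk))

    def unwrap(self, wrapped_key: bytes) -> bytes:
        if self.revoked:
            raise KeyRevokedError("CMK revoked: unwrap refused")
        return bytes(a ^ b for a, b in zip(wrapped_key, self._cmk))


class ComputeInstance:
    """Hypothetical compute node holding an ephemeral key and a local cache."""

    def __init__(self, kms: CustomerKMS, wrapped_key: bytes) -> None:
        self.kms = kms
        self.wrapped_key = wrapped_key
        self.ephemeral_key = None
        self.local_cache = {}
        self.running = True

    def refresh_key(self) -> None:
        """Re-validate access by unwrapping again; terminate if the KMS refuses."""
        try:
            self.ephemeral_key = self.kms.unwrap(self.wrapped_key)
        except KeyRevokedError:
            self.terminate()

    def terminate(self) -> None:
        # Drop the in-memory key and the local cache, then stop the node.
        self.ephemeral_key = None
        self.local_cache.clear()
        self.running = False


kms = CustomerKMS()
plain_key = secrets.token_bytes(32)
node = ComputeInstance(kms, kms.wrap(plain_key))

node.refresh_key()
node.local_cache["page-1"] = b"cached rows"
key_before = node.ephemeral_key  # usable while the CMK is active

kms.revoked = True  # the customer pulls the kill switch
node.refresh_key()  # the next unwrap fails, so the node shuts itself down
```

After the second `refresh_key` call the node holds no key and no cached data, and the wrapped key left in storage cannot be unwrapped while the CMK stays revoked.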
Practical Implementation and Workflow
CMK fits into Databricks’ existing account-to-workspace delegation model, which separates the responsibilities of security administrators from those of data access personnel. An administrator configures the key at the account level and binds it to a specific workspace; the key then applies to all new Lakehouse projects in that workspace. Because the binding is per workspace, different workspaces can use distinct CMKs to meet multi-tenant or departmental security requirements.
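The three steps (configure at the account level, bind to a workspace, apply to new projects) can be pictured with a small data model. Everything here is hypothetical: the `Account`, `Workspace`, and `KeyConfig` classes, their method names, and the placeholder key identifiers are illustrative only and do not correspond to any Databricks API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class KeyConfig:
    """Account-level record pointing at a key in the customer's KMS."""
    name: str
    kms_key_uri: str  # placeholder identifier, not a real key


@dataclass
class Workspace:
    name: str
    key_config: Optional[KeyConfig] = None
    projects: dict = field(default_factory=dict)

    def create_project(self, project_name: str) -> None:
        # A new project picks up whatever key is bound at creation time.
        self.projects[project_name] = self.key_config


@dataclass
class Account:
    key_configs: dict = field(default_factory=dict)
    workspaces: dict = field(default_factory=dict)

    def register_key(self, config: KeyConfig) -> None:
        """Step 1: configure the key at the account level."""
        self.key_configs[config.name] = config

    def bind_key(self, workspace_name: str, key_name: str) -> None:
        """Step 2: bind a registered key to one workspace."""
        self.workspaces[workspace_name].key_config = self.key_configs[key_name]


account = Account()
account.workspaces["finance"] = Workspace("finance")
account.workspaces["research"] = Workspace("research")

account.register_key(KeyConfig("finance-cmk", "kms://example/finance"))
account.register_key(KeyConfig("research-cmk", "kms://example/research"))

account.workspaces["research"].create_project("legacy")  # created before binding
account.bind_key("finance", "finance-cmk")
account.bind_key("research", "research-cmk")

# Step 3: projects created after binding use the workspace's CMK.
account.workspaces["finance"].create_project("ledger")
account.workspaces["research"].create_project("trials")
```

In this model the `legacy` project has no CMK because it predates the binding, which mirrors the article’s statement that the key applies to new projects.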
Key rotation is also supported: users can rotate their CMK directly in their cloud provider’s console without re-encrypting data or incurring downtime. All cryptographic operations involving the CMK are logged in the customer’s cloud audit services, which improves auditability. This level of cryptographic control is now available to Databricks Enterprise tier customers with demanding data sovereignty and security requirements.
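The article does not say how rotation avoids re-encryption. One common mechanism in cloud KMS products is key versioning: rotation adds a new key version, older versions remain available for unwrapping, and each wrapped blob records the version that produced it. The sketch below assumes that mechanism; `VersionedKMS` and its methods are hypothetical, and the HMAC-derived XOR pad is a toy stand-in for real key wrapping.

```python
import hashlib
import hmac
import secrets


class VersionedKMS:
    """Hypothetical KMS model where rotation adds a version and keeps old ones."""

    def __init__(self) -> None:
        self._versions = {1: secrets.token_bytes(32)}
        self.current = 1

    def rotate(self) -> None:
        # New wraps use the new version; old versions stay usable for unwrap.
        self.current += 1
        self._versions[self.current] = secrets.token_bytes(32)

    def _pad(self, version: int, nonce: bytes) -> bytes:
        # Toy 32-byte pad derived from the key version; not production crypto.
        return hmac.new(self._versions[version], nonce, hashlib.sha256).digest()

    def wrap(self, key: bytes) -> tuple:
        if len(key) != 32:
            raise ValueError("this sketch only wraps 32-byte keys")
        nonce = secrets.token_bytes(16)
        pad = self._pad(self.current, nonce)
        return (self.current, nonce, bytes(a ^ b for a, b in zip(key, pad)))

    def unwrap(self, wrapped: tuple) -> bytes:
        version, nonce, body = wrapped
        pad = self._pad(version, nonce)
        return bytes(a ^ b for a, b in zip(body, pad))


kms = VersionedKMS()
kek = secrets.token_bytes(32)
wrapped_v1 = kms.wrap(kek)          # wrapped under version 1

kms.rotate()                        # customer rotates the CMK

kek_after = kms.unwrap(wrapped_v1)  # old blob still unwraps, data untouched
wrapped_v2 = kms.wrap(kek_after)    # optional: re-wrap only the small KEK
```

Under this assumption, rotation touches at most a 32-byte wrapped key rather than the stored data, which is why no bulk re-encryption or downtime would be needed.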