Enable Dataproc Cluster Encryption with Customer-Managed Keys
Ensure that your Google Cloud Dataproc clusters on Compute Engine are encrypted with Customer-Managed Keys (CMKs) in order to control the cluster data encryption/decryption process. You can create and manage your own Customer-Managed Keys (CMKs) with Cloud Key Management Service (Cloud KMS). Cloud KMS provides secure and efficient encryption key management, controlled key rotation, and revocation mechanisms.
This rule resolution is part of the Conformity Security & Compliance tool for GCP.
By default, the Dataproc service encrypts all data at rest using Google-managed encryption keys. The Dataproc cluster data is encrypted using a Google-generated Data Encryption Key (DEK) and a Key Encryption Key (KEK). If you need to control and manage your cluster data encryption yourself, you can use your own Customer-Managed Keys (CMKs). Cloud KMS Customer-Managed Keys add a layer of control on top of the existing encryption: the CMK protects the keys that encrypt your cluster data, so you decide when it is rotated, disabled, or destroyed. CMKs are commonly used in enterprise environments, where compliance and security controls are very strict.
Audit
To determine if your Google Cloud Dataproc Clusters on Compute Engine are encrypted with Customer-Managed Keys (CMKs), perform the following operations:
Using GCP Console
01 Sign in to Google Cloud Management Console.
02 Select the Google Cloud Platform (GCP) project that you want to access from the console top navigation bar.
03 Navigate to Dataproc service console at https://console.cloud.google.com/dataproc.
04 In the navigation panel, select Clusters to access the list of the Dataproc clusters deployed in the selected project.
05 Click on the name of the Dataproc cluster that you want to examine.
06 Select the CONFIGURATION tab, and check the Encryption type configuration attribute value. If Encryption type is set to Google-managed key, the data stored on the selected Dataproc cluster is not encrypted with a Customer-Managed Key (CMK).
07 Repeat steps no. 5 and 6 for each Dataproc cluster provisioned for the selected GCP project.
08 Repeat steps no. 2 – 7 for each project deployed within your Google Cloud account.
Using GCP CLI
01 Run projects list command (Windows/macOS/Linux) using custom query filters to list the IDs of all the Google Cloud Platform (GCP) projects available in your cloud account:
gcloud projects list --format="table(projectId)"
02 The command output should return the requested GCP project identifiers:
PROJECT_ID
cc-bigdata-project-123123
cc-web-app-project-112233
03 Run dataproc clusters list command (Windows/macOS/Linux) using custom query filters to list the name of each Dataproc cluster provisioned for the selected Google Cloud project:
gcloud dataproc clusters list --project=cc-bigdata-project-123123 --region=us-central1 --format="table(clusterName:label=NAME)"
04 The command output should return the requested Dataproc cluster name(s):
NAME
cc-dataproc-prod-cluster
cc-dataproc-test-cluster
cc-dataproc-hda1-cluster
05 Run dataproc clusters describe command (Windows/macOS/Linux) using the name of the Google Cloud Dataproc cluster that you want to examine as identifier parameter and custom query filters to describe the resource ID of the Customer-Managed Key used to encrypt the cluster data:
gcloud dataproc clusters describe cc-dataproc-prod-cluster --region=us-central1 --format=json | jq '.config.encryptionConfig.gcePdKmsKeyName'
06 The command output should return the full resource ID of the CMK used to encrypt Dataproc cluster data:
null
If the dataproc clusters describe command output returns null, as shown in the example above, the data on the selected Google Cloud Dataproc cluster is not encrypted with a Customer-Managed Key (CMK).
07 Repeat steps no. 5 and 6 for each Dataproc cluster created within the selected project.
08 Repeat steps no. 3 – 7 for each GCP project deployed in your Google Cloud account.
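The CLI audit steps above can be combined into a single loop. The following is a minimal POSIX shell sketch, not an official tool: it assumes one region per run (us-central1 in this article's examples), assumes gcloud is already authenticated with permission to list the projects, and reads the key with gcloud's own value() projection instead of jq. The function name audit_dataproc_cmek is hypothetical.

```shell
# Hypothetical audit helper (not an official tool). Assumes one region per run
# and that gcloud is authenticated with permission to list the projects.
# Usage: audit_dataproc_cmek REGION
audit_dataproc_cmek() {
  audit_region="$1"
  for audit_project in $(gcloud projects list --format="value(projectId)"); do
    for audit_cluster in $(gcloud dataproc clusters list \
        --project="$audit_project" --region="$audit_region" \
        --format="value(clusterName)"); do
      # value() prints an empty string when encryptionConfig is not set.
      audit_key="$(gcloud dataproc clusters describe "$audit_cluster" \
        --project="$audit_project" --region="$audit_region" \
        --format="value(config.encryptionConfig.gcePdKmsKeyName)")"
      if [ -n "$audit_key" ]; then
        printf '%s\t%s\tCMK\t%s\n' "$audit_project" "$audit_cluster" "$audit_key"
      else
        printf '%s\t%s\tGOOGLE_MANAGED\n' "$audit_project" "$audit_cluster"
      fi
    done
  done
}
```

For example, audit_dataproc_cmek us-central1 prints one tab-separated row per cluster; rows marked GOOGLE_MANAGED identify the clusters that need to be re-created. Projects where the Dataproc API is not enabled will report an error on stderr and produce no rows.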
Remediation / Resolution
To enable encryption with Cloud KMS Customer-Managed Keys (CMKs) for your Google Cloud Dataproc clusters on Compute Engine, you have to re-create the existing Dataproc clusters with the appropriate encryption configuration by performing the following operations:
Using GCP Console
01 Sign in to Google Cloud Management Console.
02 Select the GCP project that you want to access from the console top navigation bar.
03 To create and configure your new Customer-Managed Key (CMK), perform the following:
- Navigate to Cloud Key Management Service (Cloud KMS) dashboard at https://console.cloud.google.com/security/kms.
- Before you can set up and manage any Customer-Managed Keys (CMKs), you must create a key ring. A KMS key ring is a grouping of cryptographic keys made available for organizational purposes in a specific location. In the navigation panel, select Cryptographic Keys, and click on the CREATE KEY RING button to set up the required key ring and the new Customer-Managed Key (CMK).
- A key ring requires a name and location. On the Create key ring page, provide a unique name in the Key ring name box, then choose the appropriate location from the Key ring location dropdown list. The location can be either global or associated with a particular region. If the CMKs created later within the key ring will be used to encrypt/decrypt resources in a given region, select that region as the key ring location. Click CREATE to deploy the new key ring.
- On the Create key page, select Generated key as the type of the CMK that you want to create. Provide a name for your new key in the Key name box, choose the protection level (software or Hardware Security Module) that you want to use, select Symmetric encrypt/decrypt from the Purpose dropdown list to define the types of operations that your cryptographic key can perform, and configure the key rotation parameters. Click CREATE to deploy your new Cloud KMS Customer-Managed Key (CMK).
04 Navigate to Dataproc service console at https://console.cloud.google.com/dataproc.
05 In the navigation panel, select Clusters to access the list of the Dataproc clusters deployed in the selected project.
06 Click on the name of the Dataproc cluster that you want to re-create and collect all the configuration information available for the selected resource.
07 Go back to the Clusters console and click on the CREATE CLUSTER button from the dashboard top menu to initiate the Dataproc cluster setup process. When prompted to select the infrastructure service, choose Cluster on Compute Engine.
08 On the Create a Dataproc cluster on Compute Engine page, perform the following actions:
- Under the Set up cluster panel on the left-hand menu, provide a unique identifier for the new cluster in the Name box.
- Use the Region and Zone settings to deploy the new Dataproc cluster in the same region and zone as the source cluster.
- Select the cluster type from the Cluster type radio buttons to determine how many master and worker nodes will be available (must match the configuration used by the source cluster).
- Select the appropriate Auto-scaling policy and Enhanced flexibility mode (if applicable).
- Select the appropriate Versioning and Components settings to match the source cluster configuration.
- Under the Configure Nodes panel on the left-hand menu, select the appropriate hardware configuration, including the machine family, series, machine type, GPU type (if applicable), primary disk size, and disk type for both the Master Node and the Worker Nodes.
- If applicable, under the Customize cluster panel, configure additional settings including network configurations, labels, metadata, properties, scheduled deletion settings and storage. Ensure that the new cluster has the same network, security and storage configuration as the source cluster.
- Under the Manage security panel on the left-hand menu, perform the following actions:
- In the Encryption section, choose the Customer-managed encryption key option, and select the CMK created at step no. 3 from the Select a customer-managed key dropdown list. If the newly created CMK does not appear in the dropdown list, select Don't see your key? Enter key resource ID and provide the full resource ID of your Customer-Managed Key (CMK).
- Configure the remaining cluster settings based on the configuration information collected from the source cluster at step no. 6.
- Click Create to launch your new Google Cloud Dataproc cluster.
09 If required, migrate the source cluster data to the newly created (target) cluster.
10 Update your application(s) to reference the new Dataproc cluster.
11 Once the new cluster is operating successfully, you can remove the source cluster in order to stop adding charges to your Google Cloud bill. Click on the name of the resource that you want to delete (see Audit section part I to identify the source cluster).
12 Click on the DELETE button from the dashboard top menu to initiate the removal process.
13 Within Confirm deletion dialog box, click DELETE to confirm the cluster deletion.
14 Repeat steps no. 6 – 13 to enable encryption at rest with Customer-Managed Keys (CMKs) for other Google Cloud Dataproc clusters available in the selected project.
15 Repeat steps no. 2 – 14 for each GCP project available in your Google Cloud account.
Using GCP CLI
01 Before you can set up and manage your Customer-Managed Keys (CMKs), you must create a key ring to store the CMKs. Run kms keyrings create command (Windows/macOS/Linux) to create a new Cloud KMS key ring in the specified location. If the CMKs created later within this key ring will be used to encrypt/decrypt resources in a given region, select that region as the key ring location:
gcloud kms keyrings create cc-dataproc-key-ring --location=us-central1 --project=cc-bigdata-project-123123 --format="table(name)"
02 The command output should return the identifier (name) of the newly created key ring:
NAME
projects/cc-bigdata-project-123123/locations/us-central1/keyRings/cc-dataproc-key-ring
03 Run kms keys create command (Windows/macOS/Linux) to create a new Cloud KMS Customer-Managed Key (CMK) within the KMS key ring created at the previous steps (the --location value must be the key ring location):
gcloud kms keys create cc-dataproc-cluster-cmk --location=us-central1 --keyring=cc-dataproc-key-ring --project=cc-bigdata-project-123123 --purpose=encryption --protection-level=software --rotation-period=90d --next-rotation-time=2020-09-15T10:00:00Z --format="table(name)"
04 The command output should return the name of the new Customer-Managed Key (CMK):
NAME
projects/cc-bigdata-project-123123/locations/us-central1/keyRings/cc-dataproc-key-ring/cryptoKeys/cc-dataproc-cluster-cmk
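Because the key location is embedded in the CMK resource ID, you can check it against the cluster region before passing the key to --gce-pd-kms-key. The sketch below is plain string handling with no API calls; the helper names kms_key_location and key_matches_region are hypothetical, and the strict same-region comparison simply follows the key ring location guidance given in this article rather than any documented Dataproc rule.

```shell
# Hypothetical helpers (not part of gcloud): inspect a Cloud KMS key resource ID
# of the form projects/PROJECT/locations/LOCATION/keyRings/RING/cryptoKeys/KEY.

# Print the LOCATION segment, or fail if the ID does not have the expected shape.
kms_key_location() {
  case "$1" in
    projects/*/locations/*/keyRings/*/cryptoKeys/*) ;;
    *) return 1 ;;
  esac
  printf '%s\n' "$1" | cut -d/ -f4
}

# Succeed only if the key lives in exactly the given region. Global or
# multi-region key locations are rejected by this strict comparison.
key_matches_region() {
  key_loc="$(kms_key_location "$1")" || return 1
  [ "$key_loc" = "$2" ]
}
```

For example, key_matches_region "$KEY_ID" us-central1 succeeds for the key created above and fails for a key whose key ring was created in another location.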
05 Run projects add-iam-policy-binding command (Windows/macOS/Linux) to assign the Cloud KMS "CryptoKey Encrypter/Decrypter" role to the Compute Engine service agent of the project that runs your Dataproc clusters. Replace <kms-project-id> with the ID of the Google Cloud project where the Customer-Managed Keys are provisioned, and replace <dataproc-project-number> with the project number of the Google Cloud project that is running your Dataproc clusters:
gcloud projects add-iam-policy-binding <kms-project-id> --member=serviceAccount:service-<dataproc-project-number>@compute-system.iam.gserviceaccount.com --role=roles/cloudkms.cryptoKeyEncrypterDecrypter
06 The command output should return the updated IAM policy (YAML format):
Updated IAM policy for project <kms-project-id>.
bindings:
- members:
  - serviceAccount:service-<dataproc-project-number>@compute-system.iam.gserviceaccount.com
  role: roles/cloudkms.cryptoKeyEncrypterDecrypter
- members:
  - user:admin@cloudconformity.com
  role: roles/owner
etag: abcdabcdabcd
version: 1
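The binding needs the Dataproc project's number, not its ID, which is easy to mix up. The sketch below looks the number up with gcloud projects describe and then issues the same project-level binding as step 05. The function name grant_cmek_role is hypothetical; granting the role on the individual key with gcloud kms keys add-iam-policy-binding is a narrower alternative you may prefer.

```shell
# Hypothetical helper: grant the Compute Engine service agent of the Dataproc
# project the CryptoKey Encrypter/Decrypter role on the KMS project.
# Usage: grant_cmek_role KMS_PROJECT_ID DATAPROC_PROJECT_ID
grant_cmek_role() {
  kms_project="$1"
  dataproc_project="$2"
  # The service agent address uses the project *number*, not the project ID.
  project_number="$(gcloud projects describe "$dataproc_project" \
    --format="value(projectNumber)")" || return 1
  [ -n "$project_number" ] || return 1
  gcloud projects add-iam-policy-binding "$kms_project" \
    --member="serviceAccount:service-${project_number}@compute-system.iam.gserviceaccount.com" \
    --role="roles/cloudkms.cryptoKeyEncrypterDecrypter"
}
```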
07 Run dataproc clusters describe command (Windows/macOS/Linux) using the name of the Google Cloud Dataproc cluster that you want to re-create as identifier parameter, to describe the configuration metadata available for the selected cluster:
gcloud dataproc clusters describe cc-dataproc-prod-cluster --region=us-central1 --format=json
08 The command output should return the requested configuration metadata:
{
  "clusterName": "cc-dataproc-prod-cluster",
  "config": {
    "configBucket": "dataproc-staging-us-central1-123456789012-abcdabcd",
    "masterConfig": {
      "diskConfig": {
        "bootDiskSizeGb": 500,
        "bootDiskType": "pd-standard"
      },
      "machineTypeUri": "https://www.googleapis.com/compute/v1/projects/cc-bigdata-project-123123/zones/us-central1-a/machineTypes/n1-standard-1",
      "minCpuPlatform": "AUTOMATIC"
    },
    ...
    "tempBucket": "dataproc-temp-us-central1-123456789012-abcdabcd"
  },
  "projectId": "cc-bigdata-project-123123",
  "status": {
    "state": "RUNNING",
    "stateStartTime": "2020-07-19T08:20:00.000Z"
  },
  "statusHistory": [
    {
      "state": "CREATING",
      "stateStartTime": "2020-07-19T08:20:00.000Z"
    }
  ]
}
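Rather than copying values out of the JSON by hand, you can project just the fields needed for the master node. The sketch below covers only the master settings shown in the output above (machine type, boot disk size and type) and prints them as dataproc clusters create flags; worker, network, and other settings in the elided part of the output are not handled. The function name master_create_flags is hypothetical, and it relies on gcloud's value() projection separating fields with tabs and on the basename() transform to reduce machineTypeUri to the machine type name.

```shell
# Hypothetical helper: print "clusters create" flags matching the source
# cluster's master node. Uses the default gcloud project.
# Usage: master_create_flags SOURCE_CLUSTER REGION
master_create_flags() {
  src_cluster="$1"
  src_region="$2"
  gcloud dataproc clusters describe "$src_cluster" --region="$src_region" \
    --format="value(config.masterConfig.machineTypeUri.basename(),config.masterConfig.diskConfig.bootDiskSizeGb,config.masterConfig.diskConfig.bootDiskType)" |
  while IFS="$(printf '\t')" read -r m_type m_size m_disk; do
    # Boot disk size is reported in GB; the create flag expects a unit suffix.
    printf '%s\n' "--master-machine-type=${m_type} --master-boot-disk-size=${m_size}GB --master-boot-disk-type=${m_disk}"
  done
}
```

The printed flags can be combined with --gce-pd-kms-key when building the create command in the next step.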
09 Run dataproc clusters create command (Windows/macOS/Linux) using the information returned at the previous step as configuration data for the command parameters, to create a new Google Cloud Dataproc cluster, encrypted with the Customer-Managed Key (CMK) created at step no. 3:
gcloud dataproc clusters create cc-encrypted-dataproc-cluster --region=us-central1 --project=cc-bigdata-project-123123 --single-node --master-machine-type=n1-standard-1 --master-boot-disk-size=500GB --master-boot-disk-type=pd-standard --gce-pd-kms-key=projects/cc-bigdata-project-123123/locations/us-central1/keyRings/cc-dataproc-key-ring/cryptoKeys/cc-dataproc-cluster-cmk
10 The command output should return the metadata (region and URL) for the newly created Dataproc cluster:
Waiting for cluster creation operation...done.
Created [https://dataproc.googleapis.com/v1/projects/cc-bigdata-project-123123/regions/us-central1/clusters/cc-encrypted-dataproc-cluster] Cluster placed in zone [us-central1-c].
11 If required, migrate the source cluster data to the newly created (target) cluster.
12 Update your application(s) to reference the new Google Cloud Dataproc cluster.
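Because deleting the source cluster permanently removes its disks, it can be worth guarding the deletion in step 13 with a check on the replacement cluster. The sketch below deletes the source only if the new cluster reports the RUNNING state and the expected CMK; the function name safe_delete_source is hypothetical, and it passes gcloud's global --quiet flag so the confirmation prompt is skipped once the checks pass.

```shell
# Hypothetical guard: refuse to delete the source cluster unless the
# replacement is RUNNING and reports the expected CMK.
# Usage: safe_delete_source SOURCE TARGET REGION EXPECTED_KEY_ID
safe_delete_source() {
  src="$1"; dst="$2"; region="$3"; expected_key="$4"
  dst_state="$(gcloud dataproc clusters describe "$dst" --region="$region" \
    --format="value(status.state)")"
  dst_key="$(gcloud dataproc clusters describe "$dst" --region="$region" \
    --format="value(config.encryptionConfig.gcePdKmsKeyName)")"
  if [ "$dst_state" != "RUNNING" ] || [ "$dst_key" != "$expected_key" ]; then
    echo "refusing to delete $src: $dst state=$dst_state key=${dst_key:-<none>}" >&2
    return 1
  fi
  gcloud dataproc clusters delete "$src" --region="$region" --quiet
}
```

This does not verify that data migration (step 11) is complete; that check remains manual.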
13 Once the new cluster is operating successfully, you can remove the source cluster in order to stop adding charges to your Google Cloud bill. Run dataproc clusters delete command (Windows/macOS/Linux) using the name of the resource that you want to remove as identifier parameter (see Audit section part II to identify the right cluster), to delete the specified Dataproc cluster:
gcloud dataproc clusters delete cc-dataproc-prod-cluster --region=us-central1
14 Type Y to confirm the resource removal. All the cluster disks will be permanently deleted, so make sure that your data has been successfully migrated to the new cluster before removal:
The cluster 'cc-dataproc-prod-cluster' and all attached disks will be deleted. Do you want to continue (Y/n)? Y
15 The output should return the dataproc clusters delete command request status:
Waiting for cluster deletion operation...done.
Deleted [https://dataproc.googleapis.com/v1/projects/cc-bigdata-project-123123/regions/us-central1/clusters/cc-dataproc-prod-cluster].
16 Repeat steps no. 7 – 15 to enable encryption at rest with Customer-Managed Keys (CMKs) for other Google Cloud Dataproc clusters provisioned in the selected project.
17 Repeat steps no. 1 – 16 for each GCP project deployed in your Google Cloud account.