EMR Cluster Logging

Risk Level: Low (generally tolerable level of risk)
Rule ID: EMR-002

Ensure that all Elastic MapReduce (EMR) cluster log files are periodically archived and uploaded to Amazon S3 in order to keep the logging data for historical purposes or to track and analyze the EMR cluster behavior for a long period of time.

This rule can help you with the following compliance standards:

  • GDPR
  • APRA
  • MAS
  • NIST4

For further details on compliance standards supported by Conformity, see here.

This rule can help you work with the AWS Well-Architected Framework.

This rule resolution is part of the Conformity Security & Compliance tool for AWS.

Operational excellence

By default, EMR cluster log files are deleted automatically once the retention period ends. With the logging feature enabled, Amazon Elastic MapReduce (EMR) uploads the log files from the cluster master instance(s) to Amazon S3, so the logging data (step logs, Hadoop logs, instance state logs, etc.) remains available later for troubleshooting or compliance purposes.
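To make the S3 destination more concrete, the following Python sketch splits a Log URI into its bucket and key prefix and derives the prefix under which a given cluster's files would be expected. It assumes that EMR stores each cluster's logs in a folder named after the cluster ID beneath the Log URI; `parse_log_uri` and `cluster_log_prefix` are hypothetical helper names used only for illustration, not part of any AWS SDK.

```python
from urllib.parse import urlsplit

# Schemes accepted by this sketch (an assumption, adjust as needed).
S3_SCHEMES = ("s3", "s3n", "s3a")


def parse_log_uri(log_uri):
    """Split an EMR Log URI into (bucket, key_prefix).

    Returns None when the value is empty or is not an S3-style URI.
    """
    if not log_uri:
        return None
    parts = urlsplit(log_uri)
    if parts.scheme not in S3_SCHEMES or not parts.netloc:
        return None
    return parts.netloc, parts.path.lstrip("/")


def cluster_log_prefix(log_uri, cluster_id):
    """Return (bucket, key_prefix) for one cluster's log folder, or None.

    Assumes logs are written under <Log URI>/<cluster ID>/.
    """
    parsed = parse_log_uri(log_uri)
    if parsed is None:
        return None
    bucket, prefix = parsed
    key = "/".join(p for p in (prefix.strip("/"), cluster_id) if p) + "/"
    return bucket, key
```

With the Log URI used elsewhere in this article, `cluster_log_prefix("s3n://aws-logs-123456789012-us-east-1/emr-cluster-logs/", "j-ABCDABCDABCD")` yields the bucket `aws-logs-123456789012-us-east-1` and the prefix `emr-cluster-logs/j-ABCDABCDABCD/`.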


Audit

To determine if your Amazon EMR clusters capture log data and send it to S3, perform the following operations:

Using AWS Console

01 Sign in to the AWS Management Console.

02 Navigate to Amazon Elastic MapReduce (EMR) console at https://console.aws.amazon.com/elasticmapreduce/.

03 In the main navigation panel, under EMR on EC2, choose Clusters.

04 Click on the name (link) of the Amazon EMR cluster that you want to examine.

05 Select the Summary tab and check the Log URI configuration attribute value (i.e. the path to the S3 location) listed in the Configuration details section. If the Log URI attribute does not have a value, the logging feature is not enabled for the selected Amazon Elastic MapReduce (EMR) cluster.

06 Repeat steps no. 4 and 5 for each Amazon EMR cluster available within the current AWS region.

07 Change the AWS cloud region from the navigation bar and repeat the Audit process for other regions.

Using AWS CLI

01 Run list-clusters command (OSX/Linux/UNIX) with custom query filters to list the ID of each active Amazon EMR cluster provisioned in the selected AWS region:

aws emr list-clusters \
  --region us-east-1 \
  --active \
  --output table \
  --query 'Clusters[*].Id'

02 The command output should return a table with the requested EMR cluster ID(s):

--------------------
|   ListClusters   |
+------------------+
|  j-ABCDABCDABCD  |
|  j-ABCD1234ABCD  |
+------------------+

03 Run describe-cluster command (OSX/Linux/UNIX) using the ID of the Amazon EMR cluster that you want to examine as the identifier parameter and custom query filters to describe the Amazon S3 location URI used by the selected EMR cluster for the log files storage:

aws emr describe-cluster \
  --region us-east-1 \
  --cluster-id j-ABCDABCDABCD \
  --query 'Cluster.LogUri'

04 The command output should return the S3 location (path) used by the EMR cluster logging system:

null

If the describe-cluster command output returns null, as shown in the output example above, the logging feature is not currently enabled for the selected Amazon Elastic MapReduce (EMR) cluster.

05 Repeat steps no. 3 and 4 for each Amazon EMR cluster available in the selected AWS region.

06 Change the AWS cloud region by updating the --region command parameter value and repeat the Audit process for other regions.
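The CLI audit steps above could be automated along the following lines. This is a minimal Python sketch, assuming the boto3 SDK is installed and AWS credentials are configured; the function names `clusters_without_logging` and `audit_regions` are illustrative, and `ACTIVE_STATES` is intended to mirror the cluster states selected by the `--active` flag. The EMR client is passed in (or created lazily) so the logic can be exercised without calling AWS.

```python
# Cluster states treated as "active" in this sketch.
ACTIVE_STATES = ["STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING", "TERMINATING"]


def clusters_without_logging(emr_client):
    """Return [(cluster_id, name), ...] for active clusters with no Log URI."""
    missing = []
    paginator = emr_client.get_paginator("list_clusters")
    for page in paginator.paginate(ClusterStates=ACTIVE_STATES):
        for summary in page.get("Clusters", []):
            detail = emr_client.describe_cluster(ClusterId=summary["Id"])
            # LogUri is absent (or empty) when logging is not enabled.
            if not detail["Cluster"].get("LogUri"):
                missing.append((summary["Id"], summary.get("Name", "")))
    return missing


def audit_regions(regions, client_factory=None):
    """Run the check per region; returns {region: [(cluster_id, name), ...]}."""
    if client_factory is None:
        import boto3  # imported lazily so the helper above needs no SDK

        def client_factory(region):
            return boto3.client("emr", region_name=region)

    return {
        region: clusters_without_logging(client_factory(region))
        for region in regions
    }
```

For example, `audit_regions(["us-east-1", "eu-west-1"])` returns a dictionary keyed by region; an empty list for a region means that every active cluster found there reports a Log URI.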

Remediation / Resolution

To enable EMR cluster logging to Amazon S3 you must re-create the clusters with the appropriate logging configuration. To re-create (clone) your EMR clusters perform the following operations:

Using AWS CloudFormation

01 CloudFormation template (JSON):

{
	"AWSTemplateFormatVersion": "2010-09-09",
	"Description": "Enable Logging to Amazon S3",
	"Parameters" : {
		"ReleaseLabel" : {
			"Type" : "String"
		},
		"ClusterInstanceType" : {
			"Type" : "String"
		},
		"EbsRootVolumeSize" : {
			"Type" : "String"
		},
		"SubnetId" : {
			"Type" : "String"
		}
	},
	"Resources": {
		"EMRCluster": {
			"Type": "AWS::EMR::Cluster",
			"Properties": {
			"Name": "cc-prod-emr-cluster",
			"ReleaseLabel" : {"Ref" : "ReleaseLabel"},
			"Instances": {
				"MasterInstanceGroup": {
				"InstanceCount": 1,
				"InstanceType": {"Ref" : "ClusterInstanceType"},
				"Market": "ON_DEMAND",
				"Name": "cc-master-instance"
				},
				"CoreInstanceGroup": {
				"InstanceCount": 1,
				"InstanceType": {"Ref" : "ClusterInstanceType"},
				"Market": "ON_DEMAND",
				"Name": "cc-core-instance"
				},
				"TaskInstanceGroups": [
					{
						"InstanceCount": 1,
						"InstanceType": {"Ref" : "ClusterInstanceType"},
						"Market": "ON_DEMAND",
						"Name": "cc-task-instance-1"  
					},
					{
						"InstanceCount": 1,
						"InstanceType": {"Ref" : "ClusterInstanceType"},
						"Market": "ON_DEMAND",
						"Name": "cc-task-instance-2"  
					}
				],
				"Ec2SubnetId" : {"Ref" : "SubnetId"}
			},
			"EbsRootVolumeSize" : {"Ref" : "EbsRootVolumeSize"},
			"ServiceRole" : {"Ref": "EMRRole"},
			"JobFlowRole" : {"Ref": "EMREC2InstanceProfile"},
			"VisibleToAllUsers" : true,
			"LogUri" : "s3n://aws-logs-123456789012-us-east-1/emr-cluster-logs/"
			}
		},
		"EMRRole": {
			"Type": "AWS::IAM::Role",
			"Properties": {
				"AssumeRolePolicyDocument": {
					"Version": "2008-10-17",
					"Statement": [
						{
							"Sid": "",
							"Effect": "Allow",
							"Principal": {
								"Service": "elasticmapreduce.amazonaws.com"
							},
							"Action": "sts:AssumeRole"
						}
					]
				},
				"Path": "/",
				"ManagedPolicyArns": ["arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceRole"]
			}
		},
		"EMREC2Role": {
			"Type": "AWS::IAM::Role",
			"Properties": {
				"AssumeRolePolicyDocument": {
					"Version": "2008-10-17",
					"Statement": [
						{
							"Sid": "",
							"Effect": "Allow",
							"Principal": {
								"Service": "ec2.amazonaws.com"
							},
							"Action": "sts:AssumeRole"
						}
					]
				},
				"Path": "/",
				"ManagedPolicyArns": ["arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role"]
			}
		},
		"EMREC2InstanceProfile": {
			"Type": "AWS::IAM::InstanceProfile",
			"Properties": {
			"Path": "/",
			"Roles": [ {
				"Ref": "EMREC2Role"
			} ]
			}
		}
	}
}

02 CloudFormation template (YAML):

AWSTemplateFormatVersion: '2010-09-09'
Description: Enable Logging to Amazon S3
Parameters:
  ReleaseLabel:
    Type: String
  ClusterInstanceType:
    Type: String
  EbsRootVolumeSize:
    Type: String
  SubnetId:
    Type: String
Resources:
  EMRCluster:
    Type: AWS::EMR::Cluster
    Properties:
      Name: cc-prod-emr-cluster
      ReleaseLabel: !Ref 'ReleaseLabel'
      Instances:
        MasterInstanceGroup:
          InstanceCount: 1
          InstanceType: !Ref 'ClusterInstanceType'
          Market: ON_DEMAND
          Name: cc-master-instance
        CoreInstanceGroup:
          InstanceCount: 1
          InstanceType: !Ref 'ClusterInstanceType'
          Market: ON_DEMAND
          Name: cc-core-instance
        TaskInstanceGroups:
          - InstanceCount: 1
            InstanceType: !Ref 'ClusterInstanceType'
            Market: ON_DEMAND
            Name: cc-task-instance-1
          - InstanceCount: 1
            InstanceType: !Ref 'ClusterInstanceType'
            Market: ON_DEMAND
            Name: cc-task-instance-2
        Ec2SubnetId: !Ref 'SubnetId'
      EbsRootVolumeSize: !Ref 'EbsRootVolumeSize'
      ServiceRole: !Ref 'EMRRole'
      JobFlowRole: !Ref 'EMREC2InstanceProfile'
      VisibleToAllUsers: true
      LogUri: s3n://aws-logs-123456789012-us-east-1/emr-cluster-logs/
  EMRRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2008-10-17'
        Statement:
          - Sid: ''
            Effect: Allow
            Principal:
              Service: elasticmapreduce.amazonaws.com
            Action: sts:AssumeRole
      Path: /
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceRole
  EMREC2Role:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2008-10-17'
        Statement:
          - Sid: ''
            Effect: Allow
            Principal:
              Service: ec2.amazonaws.com
            Action: sts:AssumeRole
      Path: /
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role
  EMREC2InstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Path: /
      Roles:
        - !Ref 'EMREC2Role'
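Before deploying a template like the ones above, it may help to confirm that every EMR cluster resource actually declares a LogUri property. The following Python sketch does this for a template that has already been loaded into a dictionary; `emr_clusters_missing_log_uri` and `check_template_text` are hypothetical helper names. It reads the JSON form directly, whereas the YAML form uses the `!Ref` short tag and would need a CloudFormation-aware YAML loader, which is outside the scope of this sketch.

```python
import json


def emr_clusters_missing_log_uri(template):
    """Return logical IDs of AWS::EMR::Cluster resources without a LogUri."""
    missing = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::EMR::Cluster":
            continue
        # A missing or empty LogUri means logs are not archived to S3.
        if not resource.get("Properties", {}).get("LogUri"):
            missing.append(logical_id)
    return missing


def check_template_text(template_text):
    """Parse a JSON template string and run the check above."""
    return emr_clusters_missing_log_uri(json.loads(template_text))
```

An empty list means that each AWS::EMR::Cluster resource in the template sets LogUri; any logical ID returned points at a cluster definition that would be created without logging.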

Using Terraform (AWS Provider)

01 Terraform configuration file (.tf):

terraform {
	required_providers {
		aws = {
			source  = "hashicorp/aws"
			version = "~> 4.0"
		}
	}

	required_version = ">= 0.14.9"
}

provider "aws" {
	region  = "us-east-1"
}

resource "aws_emr_cluster" "emr-cluster" {

	name          = "cc-prod-emr-cluster"
	release_label = "emr-5.35.0"
	applications  = ["Spark"]

	master_instance_group {
		instance_type = "c4.large"
	}

	core_instance_group {
		instance_type  = "c4.large"
		instance_count = 1

		ebs_config {
			size                 = "50"
			type                 = "gp2"
			volumes_per_instance = 1
		}

	}

	ebs_root_volume_size = 50
	service_role = aws_iam_role.iam_emr_service_role.arn

	ec2_attributes {
		subnet_id                         = "subnet-01234123412341234"
		emr_managed_master_security_group = "sg-01234abcd1234abcd"
		emr_managed_slave_security_group  = "sg-0abcd1234abcd1234"
		instance_profile                  = aws_iam_instance_profile.emr_instance_profile.arn
	}

	# Enable Logging to Amazon S3
	log_uri = "s3n://aws-logs-123456789012-us-east-1/emr-cluster-logs/"

}

resource "aws_iam_role" "iam_emr_service_role" {
	name = "cc-emr-service-role"

	assume_role_policy = <<EOF
{
	"Version": "2008-10-17",
	"Statement": [
		{
			"Sid": "",
			"Effect": "Allow",
			"Principal": {
				"Service": "elasticmapreduce.amazonaws.com"
			},
			"Action": "sts:AssumeRole"
		}
	]
}
EOF
}

resource "aws_iam_role_policy" "iam_emr_service_policy" {
	name = "cc-emr-service-role-policy"
	role = aws_iam_role.iam_emr_service_role.id

	policy = <<EOF
{
	"Version": "2012-10-17",
	"Statement": [{
		"Effect": "Allow",
		"Resource": "*",
		"Action": [
			"ec2:AuthorizeSecurityGroupEgress",
			"ec2:AuthorizeSecurityGroupIngress",
			"ec2:CancelSpotInstanceRequests",
			"ec2:CreateNetworkInterface",
			"ec2:CreateSecurityGroup",
			"ec2:CreateTags",
			"ec2:DeleteNetworkInterface",
			"ec2:DeleteSecurityGroup",
			"ec2:DeleteTags",
			"ec2:DescribeAvailabilityZones",
			"ec2:DescribeAccountAttributes",
			"ec2:DescribeDhcpOptions",
			"ec2:DescribeInstanceStatus",
			"ec2:DescribeInstances",
			"ec2:DescribeKeyPairs",
			"ec2:DescribeNetworkAcls",
			"ec2:DescribeNetworkInterfaces",
			"ec2:DescribePrefixLists",
			"ec2:DescribeRouteTables",
			"ec2:DescribeSecurityGroups",
			"ec2:DescribeSpotInstanceRequests",
			"ec2:DescribeSpotPriceHistory",
			"ec2:DescribeSubnets",
			"ec2:DescribeVpcAttribute",
			"ec2:DescribeVpcEndpoints",
			"ec2:DescribeVpcEndpointServices",
			"ec2:DescribeVpcs",
			"ec2:DetachNetworkInterface",
			"ec2:ModifyImageAttribute",
			"ec2:ModifyInstanceAttribute",
			"ec2:RequestSpotInstances",
			"ec2:RevokeSecurityGroupEgress",
			"ec2:RunInstances",
			"ec2:TerminateInstances",
			"ec2:DeleteVolume",
			"ec2:DescribeVolumeStatus",
			"ec2:DescribeVolumes",
			"ec2:DetachVolume",
			"iam:GetRole",
			"iam:GetRolePolicy",
			"iam:ListInstanceProfiles",
			"iam:ListRolePolicies",
			"iam:PassRole",
			"s3:CreateBucket",
			"s3:Get*",
			"s3:List*",
			"sdb:BatchPutAttributes",
			"sdb:Select",
			"sqs:CreateQueue",
			"sqs:Delete*",
			"sqs:GetQueue*",
			"sqs:PurgeQueue",
			"sqs:ReceiveMessage"
		]
	}]
}
EOF
}

resource "aws_iam_role" "iam_emr_profile_role" {
	name = "emr-instance-profile-role"

	assume_role_policy = <<EOF
{
	"Version": "2008-10-17",
	"Statement": [
		{
			"Sid": "",
			"Effect": "Allow",
			"Principal": {
				"Service": "ec2.amazonaws.com"
			},
			"Action": "sts:AssumeRole"
		}
	]
}
EOF
}

resource "aws_iam_instance_profile" "emr_instance_profile" {
	name = "emr-instance-profile"
	role = aws_iam_role.iam_emr_profile_role.name
}

resource "aws_iam_role_policy" "iam_emr_profile_policy" {
	name = "emr-instance-profile-policy"
	role = aws_iam_role.iam_emr_profile_role.id

	policy = <<EOF
{
	"Version": "2012-10-17",
	"Statement": [{
		"Effect": "Allow",
		"Resource": "*",
		"Action": [
			"cloudwatch:*",
			"dynamodb:*",
			"ec2:Describe*",
			"elasticmapreduce:Describe*",
			"elasticmapreduce:ListBootstrapActions",
			"elasticmapreduce:ListClusters",
			"elasticmapreduce:ListInstanceGroups",
			"elasticmapreduce:ListInstances",
			"elasticmapreduce:ListSteps",
			"kinesis:CreateStream",
			"kinesis:DeleteStream",
			"kinesis:DescribeStream",
			"kinesis:GetRecords",
			"kinesis:GetShardIterator",
			"kinesis:MergeShards",
			"kinesis:PutRecord",
			"kinesis:SplitShard",
			"rds:Describe*",
			"s3:*",
			"sdb:*",
			"sns:*",
			"sqs:*"
		]
	}]
}
EOF
}
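A similar pre-deployment check can be sketched for Terraform. The Python example below inspects the JSON document produced by `terraform show -json` for a saved plan and reports any aws_emr_cluster resource whose planned values contain no log_uri. It assumes the plan layout with a `planned_values.root_module` object holding `resources` and nested `child_modules`, and that log_uri is a literal value known at plan time; the helper name `emr_clusters_without_log_uri` is illustrative.

```python
def _walk_modules(module):
    """Yield every resource in a module and, recursively, its child modules."""
    yield from module.get("resources", [])
    for child in module.get("child_modules", []):
        yield from _walk_modules(child)


def emr_clusters_without_log_uri(plan):
    """Return addresses of aws_emr_cluster resources with no log_uri planned."""
    root = plan.get("planned_values", {}).get("root_module", {})
    return [
        resource.get("address", resource.get("name", ""))
        for resource in _walk_modules(root)
        if resource.get("type") == "aws_emr_cluster"
        and not resource.get("values", {}).get("log_uri")
    ]
```

The input would typically come from loading the plan JSON with `json.load`; an empty result suggests that every planned EMR cluster has logging to Amazon S3 configured.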

Using AWS Console

01 Sign in to the AWS Management Console.

02 Navigate to Amazon Elastic MapReduce (EMR) console at https://console.aws.amazon.com/elasticmapreduce/.

03 In the main navigation panel, under EMR on EC2, choose Clusters.

04 Select the Amazon EMR cluster that you want to re-create and choose Clone from the console top menu.

05 In the Cloning <emr-cluster-id> dialog box, choose Yes to include the steps from the original cluster in the cloned cluster or No to clone the original cluster's configuration without including any of the existing steps. Choose Clone to start the cloning process.

06 On the Create Cluster - Advanced Options page, perform the following operations:

  1. Choose Step 1: Software and Steps from the left navigation panel and configure the software stack that will be installed on the new cluster. Choose Next to continue the setup process.
  2. For Step 2: Hardware, choose the VPC network and subnet where the EMR cluster instances will be deployed from the Networking section, set the EBS volume size for the root device, and configure the cluster nodes (instances) as needed. Choose Next to continue.
  3. For Step 3: General Cluster Settings, select the Logging checkbox to enable the cluster logging feature. Once the checkbox is selected, the console will display the default S3 location (path) where the cluster log files will be saved automatically. You can also choose to encrypt the log files stored in Amazon S3 with a KMS key and/or enable the console debugging functionality. Select Termination protection to enable the Termination Protection safety feature, and create any necessary tag sets. Choose Next to continue.
  4. For Step 4: Security, make sure that the right permissions are applied to the new cluster, select the EC2 key pair, configure the security options, then choose Create cluster to provision your new Amazon EMR cluster.

07 (Optional) You can now terminate the source (original) cluster in order to stop incurring charges for that EMR resource. To terminate the source Amazon EMR cluster, perform the following actions:

  1. Select the EMR cluster that you want to shut down and choose Terminate from the console top menu.
  2. Within the Terminate clusters confirmation box, review the cluster details, set Termination protection to Off if it is enabled, then choose Terminate to remove the source EMR cluster from your AWS account.

08 Repeat steps no. 4 – 7 for each Amazon EMR cluster that you want to redeploy, available within the current AWS region.

09 Change the AWS cloud region from the navigation bar and repeat the Remediation process for other AWS regions.
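Before terminating the source cluster in the optional step above, it can be worth confirming that the replacement cluster really reports a Log URI. The following Python sketch expresses that guard against a boto3-style EMR client passed in by the caller; `terminate_source_if_replacement_logs` is a hypothetical helper name, and the decision to switch off termination protection on the source cluster automatically is an assumption of this sketch rather than a requirement of the procedure.

```python
# States in which the replacement cluster is no longer usable.
ENDED_STATES = ("TERMINATING", "TERMINATED", "TERMINATED_WITH_ERRORS")


def terminate_source_if_replacement_logs(emr_client, source_id, replacement_id):
    """Terminate source_id only if replacement_id is alive and has a Log URI.

    Returns True when a termination request was sent, False otherwise.
    """
    replacement = emr_client.describe_cluster(ClusterId=replacement_id)["Cluster"]
    state = replacement.get("Status", {}).get("State")
    if state in ENDED_STATES or not replacement.get("LogUri"):
        return False
    # Termination protection must be off before the cluster can be terminated.
    emr_client.set_termination_protection(
        JobFlowIds=[source_id], TerminationProtected=False
    )
    emr_client.terminate_job_flows(JobFlowIds=[source_id])
    return True
```

With boto3, the client would be created with `boto3.client("emr", region_name="us-east-1")`; a return value of False leaves the source cluster untouched so the replacement can be inspected first.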

Using AWS CLI

01 Get the configuration details from the source (original) EMR cluster. Run describe-cluster command (OSX/Linux/UNIX) using the ID of the Amazon EMR cluster that you want to re-create as the identifier parameter, to list the configuration information available for the selected cluster:

aws emr describe-cluster \
  --region us-east-1 \
  --cluster-id j-AAAABBBBCCCCD

02 The command output should return the requested cluster configuration information:

{
   "Cluster": {
     "Name": "cc-hadoop-cluster",
     "ServiceRole": "EMR_DefaultRole",
     "Tags": [],
     "TerminationProtected": false,
     "NormalizedInstanceHours": 4,

     ...

     "ScaleDownBehavior": "TERMINATE_AT_INSTANCE_HOUR",
     "VisibleToAllUsers": true,
     "BootstrapActions": [],
     "LogUri": "s3n://aws-logs-123456789012-us-east-1/elasticmapreduce/",
     "AutoTerminate": false,
     "Id": "j-AAAABBBBCCCCD"
   }
}

03 Run create-cluster command (OSX/Linux/UNIX) to re-create the existing Amazon EMR cluster with the logging feature enabled, using the configuration information returned at the previous step. The following command example creates an EMR cluster named "cc-vpc-emr-cluster", with one c4.xlarge master instance and two c4.xlarge core instances, that sends log data to Amazon S3 at "s3n://aws-logs-123456789012-us-east-1/emr-cluster-logs/":

aws emr create-cluster \
  --region us-east-1 \
  --name cc-vpc-emr-cluster \
  --release-label emr-4.0.0 \
  --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=c4.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=c4.xlarge \
  --service-role EMR_DefaultRole \
  --ec2-attributes KeyName=SSHAccessKey,InstanceProfile=EMR_EC2_DefaultRole,EmrManagedMasterSecurityGroup=sg-0abcd1234abcd1234,EmrManagedSlaveSecurityGroup=sg-01234abcd1234abcd,SubnetId=subnet-0abcd1234abcd1234 \
  --log-uri s3n://aws-logs-123456789012-us-east-1/emr-cluster-logs/ \
  --visible-to-all-users \
  --no-auto-terminate \
  --no-termination-protected

04 The command output should return the ID of your new Amazon EMR cluster:

{
  "ClusterId": "j-BBBBCCCCDDDDE"
}

05 (Optional) You can now terminate the source (original) cluster in order to stop incurring charges for it. To terminate the source Amazon EMR cluster, run terminate-clusters command (OSX/Linux/UNIX) using the ID of the cluster that you want to delete as the identifier parameter (the command does not produce an output):

aws emr terminate-clusters \
  --region us-east-1 \
  --cluster-ids j-AAAABBBBCCCCD

06 Repeat steps no. 1 – 5 for each Amazon EMR cluster that you want to redeploy, available in the selected AWS region.

07 Change the AWS cloud region by updating the --region command parameter value and repeat the Remediation process for other regions.
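The re-creation step above can also be scripted with an SDK. The Python sketch below builds a request for the boto3 EMR `run_job_flow` operation that roughly corresponds to the create-cluster example, with the Log URI treated as mandatory. The helper name `build_run_job_flow_request`, its defaults, and the instance group names "master" and "core" are assumptions made for illustration; key pair and security group settings are omitted for brevity.

```python
def build_run_job_flow_request(name, release_label, log_uri, subnet_id,
                               instance_type="c4.xlarge", core_count=2,
                               service_role="EMR_DefaultRole",
                               instance_profile="EMR_EC2_DefaultRole"):
    """Return keyword arguments for an EMR run_job_flow call with logging on."""
    if not log_uri:
        raise ValueError("log_uri is required so the cluster archives logs to S3")
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "LogUri": log_uri,
        "ServiceRole": service_role,
        "JobFlowRole": instance_profile,
        "VisibleToAllUsers": True,
        "Instances": {
            "InstanceGroups": [
                {"Name": "master", "InstanceRole": "MASTER",
                 "InstanceType": instance_type, "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE",
                 "InstanceType": instance_type, "InstanceCount": core_count},
            ],
            "Ec2SubnetId": subnet_id,
            # Keep the cluster running when no steps remain (no auto-terminate).
            "KeepJobFlowAliveWhenNoSteps": True,
            "TerminationProtected": False,
        },
    }
```

Assuming boto3 is installed, the request could be submitted with `boto3.client("emr", region_name="us-east-1").run_job_flow(**request)`, and the new cluster ID read from the `JobFlowId` field of the response.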

Publication date Feb 24, 2017