AWS EMR Instance Type Generation
Ensure that all Amazon EMR cluster instances use the latest generation of instance types to get better performance at a lower cost. If your clusters use instances from a previous generation, Trend Cloud One™ – Conformity strongly recommends that you upgrade them to their latest generation equivalents.
This rule can help you work with the AWS Well-Architected Framework.
This rule resolution is part of the Conformity Security & Compliance tool for AWS.
Using the latest generation of Amazon EMR cluster instances instead of previous generation instances has tangible benefits, such as better hardware performance (more computing capacity, faster CPUs, memory optimizations and higher network throughput) and lower memory and storage costs. For example, when they were launched, the memory-optimized R3 instances were 9% faster than the previous memory-optimized generation, and the compute-optimized C3 and C4 instances were 37% faster than the older C1 instances. R3 and C3 have since become previous generation families themselves, so always compare your instance types against the current AWS list.
Audit
To determine if your Amazon EMR clusters are using instances from the previous generation, perform the following actions:
Using AWS Console
01 Sign in to the AWS Management Console.
02 Navigate to the Amazon EMR console at https://console.aws.amazon.com/elasticmapreduce/.
03 In the main navigation panel, under EMR on EC2, choose Clusters.
04 Click on the name (link) of the Amazon EMR cluster that you want to examine.
05 Select the Hardware tab and check the Instance type column for each instance group or instance fleet provisioned within the selected cluster. If any instance type is from a previous generation, the instance types configured for the Amazon EMR cluster should be upgraded to their latest generation equivalents.
06 Repeat steps no. 4 and 5 for each Amazon EMR cluster available within the current AWS region.
07 Change the AWS cloud region from the navigation bar and repeat the Audit process for other regions.
Using AWS CLI
01 Run the list-clusters command (OSX/Linux/UNIX) with custom query filters to list the ID of each active Amazon EMR cluster provisioned in the selected AWS region:
aws emr list-clusters --region us-east-1 --active --output table --query 'Clusters[*].Id'
02 The command output should return a table with the requested EMR cluster ID(s):
--------------------
|   ListClusters   |
+------------------+
|  j-ABCDABCDABCD  |
|  j-ABCD1234ABCD  |
+------------------+
03 Run the describe-cluster command (OSX/Linux/UNIX) using the ID of the Amazon EMR cluster that you want to examine as the identifier parameter and custom query filters to return the instance type configured for each instance group of the selected cluster (for clusters that use instance fleets, query Cluster.InstanceFleets[*].InstanceTypeSpecifications[*].InstanceType instead):
aws emr describe-cluster --region us-east-1 --cluster-id j-ABCDABCDABCD --query 'Cluster.InstanceGroups[*].InstanceType'
04 The command output should return the EMR cluster instance type(s):
[ "m1.xlarge" ]
Compare each instance type returned by the describe-cluster command with the previous generation instance types listed by AWS. If any of them belongs to a previous generation, the instance types configured for the Amazon EMR cluster should be upgraded to the latest generation.
05 Repeat steps no. 3 and 4 for each Amazon EMR cluster available in the selected AWS region.
06 Change the AWS cloud region by updating the --region command parameter value and repeat the Audit process for other regions.
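The CLI audit steps above can be scripted. The following Python sketch is one way to do it under stated assumptions: it shells out to the AWS CLI with JSON output, reads instance types from both instance groups and instance fleets, and flags any family in PREVIOUS_GENERATION_FAMILIES, a hand-maintained set that is an assumption and should be checked against the current AWS list of previous generation instances. The function names are illustrative, not part of any AWS SDK, and the run parameter is injectable so the logic can be exercised without an AWS account.

```python
import json
import subprocess
from typing import Callable, Dict, List, Sequence

# ASSUMPTION: families AWS has listed as "previous generation" at some point.
# Verify against the current AWS list before relying on this set.
PREVIOUS_GENERATION_FAMILIES = {
    "c1", "c3", "cc2", "cg1", "cr1", "g2", "hi1", "hs1",
    "i2", "m1", "m2", "m3", "r3", "t1",
}

# A runner takes CLI arguments (without the leading "aws") and returns parsed JSON.
Runner = Callable[[List[str]], dict]


def is_previous_generation(instance_type: str) -> bool:
    """True if the family prefix (text before the first dot) is in the assumed set."""
    return instance_type.split(".", 1)[0].lower() in PREVIOUS_GENERATION_FAMILIES


def cluster_instance_types(cluster: dict) -> List[str]:
    """Collect instance types from the 'Cluster' object printed by describe-cluster.

    Instance-group clusters expose 'InstanceGroups'; instance-fleet clusters
    expose 'InstanceFleets', each with a list of 'InstanceTypeSpecifications'.
    """
    types = [group["InstanceType"] for group in cluster.get("InstanceGroups", [])]
    for fleet in cluster.get("InstanceFleets", []):
        types += [spec["InstanceType"] for spec in fleet.get("InstanceTypeSpecifications", [])]
    return types


def aws_cli(args: List[str]) -> dict:
    """Run an AWS CLI command with JSON output and return the parsed result."""
    completed = subprocess.run(
        ["aws", *args, "--output", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(completed.stdout)


def audit(regions: Sequence[str], run: Runner = aws_cli) -> Dict[str, List[str]]:
    """Return {"<region>/<cluster-id>": [previous-generation types]} for active clusters."""
    findings: Dict[str, List[str]] = {}
    for region in regions:
        listing = run(["emr", "list-clusters", "--region", region, "--active"])
        for summary in listing.get("Clusters", []):
            described = run(["emr", "describe-cluster", "--region", region,
                             "--cluster-id", summary["Id"]])
            old = [t for t in cluster_instance_types(described["Cluster"])
                   if is_previous_generation(t)]
            if old:
                findings[f"{region}/{summary['Id']}"] = old
    return findings
```

For example, audit(["us-east-1", "eu-west-1"]) returns a dictionary keyed by "<region>/<cluster-id>"; an empty dictionary means that no active cluster in those regions uses a family from the assumed set.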
Remediation / Resolution
To upgrade your previous generation EMR cluster instances to their latest generation equivalents, perform the following actions:
Using AWS CloudFormation
01 CloudFormation template (JSON):
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Upgrade Cluster Instance Generation by Setting the Latest Generation Instance Type Equivalent for 'ClusterInstanceType' Stack Parameter",
  "Parameters": {
    "ReleaseLabel": { "Type": "String" },
    "ClusterInstanceType": { "Type": "String" },
    "EbsRootVolumeSize": { "Type": "String" },
    "SubnetId": { "Type": "String" }
  },
  "Resources": {
    "EMRCluster": {
      "Type": "AWS::EMR::Cluster",
      "Properties": {
        "Name": "cc-emr-production-cluster",
        "ReleaseLabel": { "Ref": "ReleaseLabel" },
        "Instances": {
          "MasterInstanceGroup": {
            "InstanceCount": 1,
            "InstanceType": { "Ref": "ClusterInstanceType" },
            "Market": "ON_DEMAND",
            "Name": "cc-master-instance"
          },
          "CoreInstanceGroup": {
            "InstanceCount": 1,
            "InstanceType": { "Ref": "ClusterInstanceType" },
            "Market": "ON_DEMAND",
            "Name": "cc-core-instance"
          },
          "TaskInstanceGroups": [
            {
              "InstanceCount": 1,
              "InstanceType": { "Ref": "ClusterInstanceType" },
              "Market": "ON_DEMAND",
              "Name": "cc-task-instance-1"
            },
            {
              "InstanceCount": 1,
              "InstanceType": { "Ref": "ClusterInstanceType" },
              "Market": "ON_DEMAND",
              "Name": "cc-task-instance-2"
            }
          ],
          "Ec2SubnetId": { "Ref": "SubnetId" }
        },
        "EbsRootVolumeSize": { "Ref": "EbsRootVolumeSize" },
        "ServiceRole": { "Ref": "EMRRole" },
        "JobFlowRole": { "Ref": "EMREC2InstanceProfile" },
        "VisibleToAllUsers": true
      }
    },
    "EMRRole": {
      "Type": "AWS::IAM::Role",
      "Properties": {
        "AssumeRolePolicyDocument": {
          "Version": "2008-10-17",
          "Statement": [
            {
              "Sid": "",
              "Effect": "Allow",
              "Principal": { "Service": "elasticmapreduce.amazonaws.com" },
              "Action": "sts:AssumeRole"
            }
          ]
        },
        "Path": "/",
        "ManagedPolicyArns": ["arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceRole"]
      }
    },
    "EMREC2Role": {
      "Type": "AWS::IAM::Role",
      "Properties": {
        "AssumeRolePolicyDocument": {
          "Version": "2008-10-17",
          "Statement": [
            {
              "Sid": "",
              "Effect": "Allow",
              "Principal": { "Service": "ec2.amazonaws.com" },
              "Action": "sts:AssumeRole"
            }
          ]
        },
        "Path": "/",
        "ManagedPolicyArns": ["arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role"]
      }
    },
    "EMREC2InstanceProfile": {
      "Type": "AWS::IAM::InstanceProfile",
      "Properties": {
        "Path": "/",
        "Roles": [{ "Ref": "EMREC2Role" }]
      }
    }
  }
}
02 CloudFormation template (YAML):
AWSTemplateFormatVersion: '2010-09-09'
Description: Upgrade Cluster Instance Generation by Setting the Latest Generation Instance Type Equivalent for 'ClusterInstanceType' Stack Parameter
Parameters:
  ReleaseLabel:
    Type: String
  ClusterInstanceType:
    Type: String
  EbsRootVolumeSize:
    Type: String
  SubnetId:
    Type: String
Resources:
  EMRCluster:
    Type: AWS::EMR::Cluster
    Properties:
      Name: cc-emr-production-cluster
      ReleaseLabel: !Ref 'ReleaseLabel'
      Instances:
        MasterInstanceGroup:
          InstanceCount: 1
          InstanceType: !Ref 'ClusterInstanceType'
          Market: ON_DEMAND
          Name: cc-master-instance
        CoreInstanceGroup:
          InstanceCount: 1
          InstanceType: !Ref 'ClusterInstanceType'
          Market: ON_DEMAND
          Name: cc-core-instance
        TaskInstanceGroups:
          - InstanceCount: 1
            InstanceType: !Ref 'ClusterInstanceType'
            Market: ON_DEMAND
            Name: cc-task-instance-1
          - InstanceCount: 1
            InstanceType: !Ref 'ClusterInstanceType'
            Market: ON_DEMAND
            Name: cc-task-instance-2
        Ec2SubnetId: !Ref 'SubnetId'
      EbsRootVolumeSize: !Ref 'EbsRootVolumeSize'
      ServiceRole: !Ref 'EMRRole'
      JobFlowRole: !Ref 'EMREC2InstanceProfile'
      VisibleToAllUsers: true
  EMRRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2008-10-17'
        Statement:
          - Sid: ''
            Effect: Allow
            Principal:
              Service: elasticmapreduce.amazonaws.com
            Action: sts:AssumeRole
      Path: /
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceRole
  EMREC2Role:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2008-10-17'
        Statement:
          - Sid: ''
            Effect: Allow
            Principal:
              Service: ec2.amazonaws.com
            Action: sts:AssumeRole
      Path: /
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role
  EMREC2InstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Path: /
      Roles:
        - !Ref 'EMREC2Role'
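To confirm which instance types a stack will actually launch before deploying the template, the following Python sketch (a hypothetical helper, not a CloudFormation feature) walks the JSON template, resolves each InstanceType that is either a literal or a Ref to a stack parameter (falling back to the parameter's Default), and reports the result per AWS::EMR::Cluster resource. It only inspects MasterInstanceGroup, CoreInstanceGroup and TaskInstanceGroups; to check the YAML version, convert it to JSON first.

```python
from typing import Dict, List, Optional


def effective_parameters(template: dict, overrides: Dict[str, str]) -> Dict[str, str]:
    """Start from each parameter's Default (if any), then apply the stack overrides."""
    values = {name: spec["Default"]
              for name, spec in template.get("Parameters", {}).items()
              if "Default" in spec}
    values.update(overrides)
    return values


def resolve(value, parameters: Dict[str, str]) -> Optional[str]:
    """Resolve a literal string or a {"Ref": "<ParameterName>"}; None if unresolved."""
    if isinstance(value, dict):
        return parameters.get(value.get("Ref", ""))
    return value


def emr_instance_types(template: dict,
                       overrides: Dict[str, str]) -> Dict[str, List[Optional[str]]]:
    """Map each AWS::EMR::Cluster logical ID to the instance types of its groups."""
    parameters = effective_parameters(template, overrides)
    result: Dict[str, List[Optional[str]]] = {}
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::EMR::Cluster":
            continue
        instances = resource.get("Properties", {}).get("Instances", {})
        groups = [instances.get("MasterInstanceGroup"), instances.get("CoreInstanceGroup")]
        groups += instances.get("TaskInstanceGroups", [])
        # Skip groups that are not defined (None) and resolve the rest.
        result[logical_id] = [resolve(g["InstanceType"], parameters) for g in groups if g]
    return result
```

For example, after loading the JSON template with json.load, emr_instance_types(template, {"ClusterInstanceType": "m5.xlarge"}) should report m5.xlarge for the master, core and both task groups. The stack itself could then be deployed with a command such as aws cloudformation deploy --template-file emr-cluster.json --stack-name cc-emr-stack --capabilities CAPABILITY_IAM --parameter-overrides ClusterInstanceType=m5.xlarge ReleaseLabel=emr-5.35.0 EbsRootVolumeSize=50 SubnetId=subnet-0abcd1234abcd1234, where the file name, stack name and parameter values are placeholders; CAPABILITY_IAM is required because the template creates IAM roles.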
Using Terraform (AWS Provider)
01 Terraform configuration file (.tf):
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }

  required_version = ">= 0.14.9"
}

provider "aws" {
  region = "us-east-1"
}

resource "aws_emr_cluster" "emr-cluster" {
  name          = "cc-prod-emr-cluster"
  release_label = "emr-5.35.0"
  applications  = ["Spark"]

  master_instance_group {
    # Upgrade Master Instance Generation
    instance_type = "m5.xlarge"
  }

  core_instance_group {
    # Upgrade Core Instance Generation
    instance_type  = "m5.xlarge"
    instance_count = 1

    ebs_config {
      size                 = "50"
      type                 = "gp2"
      volumes_per_instance = 1
    }
  }

  ebs_root_volume_size = 50
  service_role         = aws_iam_role.iam_emr_service_role.arn

  ec2_attributes {
    subnet_id                         = "subnet-01234123412341234"
    emr_managed_master_security_group = "sg-01234abcd1234abcd"
    emr_managed_slave_security_group  = "sg-0abcd1234abcd1234"
    instance_profile                  = aws_iam_instance_profile.emr_instance_profile.arn
  }
}

resource "aws_iam_role" "iam_emr_service_role" {
  name = "cc-emr-service-role"

  assume_role_policy = <<EOF
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": { "Service": "elasticmapreduce.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

resource "aws_iam_role_policy" "iam_emr_service_policy" {
  name = "cc-emr-service-role-policy"
  role = aws_iam_role.iam_emr_service_role.id

  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Resource": "*",
    "Action": [
      "ec2:AuthorizeSecurityGroupEgress", "ec2:AuthorizeSecurityGroupIngress",
      "ec2:CancelSpotInstanceRequests", "ec2:CreateNetworkInterface",
      "ec2:CreateSecurityGroup", "ec2:CreateTags", "ec2:DeleteNetworkInterface",
      "ec2:DeleteSecurityGroup", "ec2:DeleteTags", "ec2:DescribeAvailabilityZones",
      "ec2:DescribeAccountAttributes", "ec2:DescribeDhcpOptions",
      "ec2:DescribeInstanceStatus", "ec2:DescribeInstances", "ec2:DescribeKeyPairs",
      "ec2:DescribeNetworkAcls", "ec2:DescribeNetworkInterfaces",
      "ec2:DescribePrefixLists", "ec2:DescribeRouteTables",
      "ec2:DescribeSecurityGroups", "ec2:DescribeSpotInstanceRequests",
      "ec2:DescribeSpotPriceHistory", "ec2:DescribeSubnets",
      "ec2:DescribeVpcAttribute", "ec2:DescribeVpcEndpoints",
      "ec2:DescribeVpcEndpointServices", "ec2:DescribeVpcs",
      "ec2:DetachNetworkInterface", "ec2:ModifyImageAttribute",
      "ec2:ModifyInstanceAttribute", "ec2:RequestSpotInstances",
      "ec2:RevokeSecurityGroupEgress", "ec2:RunInstances", "ec2:TerminateInstances",
      "ec2:DeleteVolume", "ec2:DescribeVolumeStatus", "ec2:DescribeVolumes",
      "ec2:DetachVolume", "iam:GetRole", "iam:GetRolePolicy",
      "iam:ListInstanceProfiles", "iam:ListRolePolicies", "iam:PassRole",
      "s3:CreateBucket", "s3:Get*", "s3:List*", "sdb:BatchPutAttributes",
      "sdb:Select", "sqs:CreateQueue", "sqs:Delete*", "sqs:GetQueue*",
      "sqs:PurgeQueue", "sqs:ReceiveMessage"
    ]
  }]
}
EOF
}

resource "aws_iam_role" "iam_emr_profile_role" {
  name = "emr-instance-profile-role"

  assume_role_policy = <<EOF
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

resource "aws_iam_instance_profile" "emr_instance_profile" {
  name = "emr-instance-profile"
  role = aws_iam_role.iam_emr_profile_role.name
}

resource "aws_iam_role_policy" "iam_emr_profile_policy" {
  name = "emr-instance-profile-policy"
  role = aws_iam_role.iam_emr_profile_role.id

  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Resource": "*",
    "Action": [
      "cloudwatch:*", "dynamodb:*", "ec2:Describe*", "elasticmapreduce:Describe*",
      "elasticmapreduce:ListBootstrapActions", "elasticmapreduce:ListClusters",
      "elasticmapreduce:ListInstanceGroups", "elasticmapreduce:ListInstances",
      "elasticmapreduce:ListSteps", "kinesis:CreateStream", "kinesis:DeleteStream",
      "kinesis:DescribeStream", "kinesis:GetRecords", "kinesis:GetShardIterator",
      "kinesis:MergeShards", "kinesis:PutRecord", "kinesis:SplitShard",
      "rds:Describe*", "s3:*", "sdb:*", "sns:*", "sqs:*"
    ]
  }]
}
EOF
}
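A similar check can run against a Terraform plan before terraform apply. The Python sketch below assumes the plan was exported with terraform plan -out=plan.tfplan followed by terraform show -json plan.tfplan > plan.json, and that the nested master_instance_group and core_instance_group blocks appear as JSON arrays under resource_changes[].change.after, which is how Terraform's JSON output represents nested blocks. The function names and the disallowed family set passed to violations are illustrative; instance fleet blocks are not inspected.

```python
import json
from typing import Dict, List, Set

# Nested aws_emr_cluster blocks that carry instance_type in the configuration above.
GROUP_BLOCKS = ("master_instance_group", "core_instance_group")


def planned_emr_instance_types(plan: dict) -> Dict[str, List[str]]:
    """Map each aws_emr_cluster address in `terraform show -json` output to its types."""
    result: Dict[str, List[str]] = {}
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_emr_cluster":
            continue
        # "after" is null when the resource is being destroyed.
        after = rc.get("change", {}).get("after") or {}
        types: List[str] = []
        for block in GROUP_BLOCKS:
            # Nested blocks are JSON arrays, even when at most one block is allowed.
            for group in after.get(block) or []:
                types.append(group["instance_type"])
        result[rc["address"]] = types
    return result


def violations(plan: dict, disallowed_families: Set[str]) -> Dict[str, List[str]]:
    """Keep only planned instance types whose family is in `disallowed_families`."""
    found: Dict[str, List[str]] = {}
    for address, types in planned_emr_instance_types(plan).items():
        bad = [t for t in types if t.split(".", 1)[0] in disallowed_families]
        if bad:
            found[address] = bad
    return found


def load_plan(path: str) -> dict:
    """Read the JSON file written by `terraform show -json plan.tfplan > plan.json`."""
    with open(path) as f:
        return json.load(f)
```

A CI job could, for example, fail when violations(load_plan("plan.json"), {"m1", "m3", "c3", "r3"}) returns a non-empty dictionary; the family set there is only an example.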
Using AWS Console
01 Sign in to the AWS Management Console.
02 Navigate to the Amazon EMR console at https://console.aws.amazon.com/elasticmapreduce/.
03 In the main navigation panel, under EMR on EC2, choose Clusters.
04 Select the Amazon EMR cluster that you want to re-create and choose Clone from the console top menu.
05 In the Cloning <emr-cluster-id> dialog box, choose Yes to include the steps from the original cluster in the cloned cluster or No to clone the original cluster's configuration without including any of the existing steps. Choose Clone to start the cloning process.
06 On the Create Cluster - Advanced Options page, perform the following operations:
- Choose Step 1: Software and Steps from the left navigation panel and configure the software stack that will be installed on the new cluster. Choose Next to continue the setup process.
- For Step 2: Hardware, select the equivalent latest generation instance type for each provisioned instance listed in the Cluster Nodes and Instances section, regardless of the instance node type (i.e. master, core, or task). Choose the VPC network and subnet where the EMR cluster instances will be deployed from the Networking section, and set the EBS volume size for the root device from the EBS Root Volume section. Choose Next to continue.
- For Step 3: General Cluster Settings, choose whether to enable the Termination Protection safety feature, configure the cluster logging, and create any required tag sets. Choose Next to continue.
- For Step 4: Security, make sure that the right permissions are applied to the new cluster, select the appropriate EC2 key pair, configure the security options, then choose Create cluster to provision your new Amazon EMR cluster.
07 (Optional) You can now terminate the source (original) cluster in order to stop incurring charges for that EMR resource. To terminate the source Amazon EMR cluster, perform the following actions:
- Select the EMR cluster that you want to shut down.
- Choose the Terminate button from the console top menu.
- Within the Terminate clusters confirmation box, review the cluster details, set the Termination protection to Off, then choose Terminate to remove the source EMR cluster from your AWS account.
08 Repeat steps no. 4 – 7 for each Amazon EMR cluster that you want to redeploy, available within the current AWS region.
09 Change the AWS cloud region from the navigation bar and repeat the Remediation process for other AWS regions.
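Step 2: Hardware leaves the choice of an "equivalent" latest generation type to you. One way to make that choice repeatable, sketched below under stated assumptions, is to pick the smallest current generation candidate whose vCPU count and memory are at least those of the old type. The CURRENT_GEN figures are illustrative and should be refreshed with aws ec2 describe-instance-types (the helper reads its VCpuInfo.DefaultVCpus and MemoryInfo.SizeInMiB fields). The sketch ignores instance storage, network performance, price and the instance types supported by your EMR release, all of which you should check separately.

```python
from typing import Dict, NamedTuple, Optional


class Spec(NamedTuple):
    vcpus: int
    memory_gib: float


def specs_from_describe_instance_types(doc: dict) -> Dict[str, Spec]:
    """Convert `aws ec2 describe-instance-types` JSON output into {type: Spec}."""
    return {
        item["InstanceType"]: Spec(item["VCpuInfo"]["DefaultVCpus"],
                                   item["MemoryInfo"]["SizeInMiB"] / 1024)
        for item in doc.get("InstanceTypes", [])
    }


def pick_replacement(old: Spec, candidates: Dict[str, Spec]) -> Optional[str]:
    """Smallest candidate (fewest vCPUs, then least memory) at least as large as `old`.

    Returns None when no candidate is large enough.
    """
    fitting = [(spec.vcpus, spec.memory_gib, name)
               for name, spec in candidates.items()
               if spec.vcpus >= old.vcpus and spec.memory_gib >= old.memory_gib]
    return min(fitting)[2] if fitting else None


# Illustrative subset of current generation types; refresh for your region.
CURRENT_GEN = {
    "m5.xlarge": Spec(4, 16), "m5.2xlarge": Spec(8, 32),
    "c5.xlarge": Spec(4, 8), "c5.2xlarge": Spec(8, 16),
    "r5.xlarge": Spec(4, 32), "r5.2xlarge": Spec(8, 64),
}
```

For example, an old type with 4 vCPUs and 15 GiB of memory (the m1.xlarge profile) maps to m5.xlarge with this candidate list, while one with 4 vCPUs and 30.5 GiB (r3.xlarge) maps to r5.xlarge. Restricting the candidates to a single family is a simple way to keep the memory-to-vCPU ratio of the original workload.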
Using AWS CLI
01 Get the configuration details from the source (original) EMR cluster. Run the describe-cluster command (OSX/Linux/UNIX) using the ID of the Amazon EMR cluster that you want to re-create as the identifier parameter, to list the configuration information available for the selected cluster:
aws emr describe-cluster --region us-east-1 --cluster-id j-AAAABBBBCCCCD
02 The command output should return the requested cluster configuration information:
{
  "Cluster": {
    "Name": "cc-hadoop-cluster",
    "ServiceRole": "EMR_DefaultRole",
    "Tags": [],
    "TerminationProtected": false,
    "NormalizedInstanceHours": 4,
    ...
    "ScaleDownBehavior": "TERMINATE_AT_INSTANCE_HOUR",
    "VisibleToAllUsers": true,
    "BootstrapActions": [],
    "LogUri": "s3n://aws-logs-123456789012-us-east-1/elasticmapreduce/",
    "AutoTerminate": false,
    "Id": "j-AAAABBBBCCCCD"
  }
}
03 Run the create-cluster command (OSX/Linux/UNIX) to re-create your Amazon EMR cluster with instances that use the equivalent instance type(s) from the current generation. The following example creates an EMR cluster named "cc-emr-production-cluster" with one m5.xlarge master instance and two m5.xlarge core instances:
aws emr create-cluster --region us-east-1 --name cc-emr-production-cluster --release-label emr-5.35.0 --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m5.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m5.xlarge --service-role EMR_DefaultRole --ec2-attributes KeyName=SSHAccessKey,InstanceProfile=EMR_EC2_DefaultRole,EmrManagedMasterSecurityGroup=sg-0abcd1234abcd1234,EmrManagedSlaveSecurityGroup=sg-01234abcd1234abcd,SubnetId=subnet-0abcd1234abcd1234 --visible-to-all-users --no-auto-terminate
04 The command output should return the ID of your new Amazon EMR cluster:
{ "ClusterId": "j-BBBBCCCCDDDDE" }
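05 (Optional) You can now terminate the source cluster in order to stop incurring charges for it. If termination protection is enabled for the source cluster, disable it first by running the modify-cluster-attributes command with the --no-termination-protected option. To terminate the source Amazon EMR cluster, run the terminate-clusters command (OSX/Linux/UNIX) using the ID of the cluster that you want to delete as the identifier parameter (the command does not produce an output):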
aws emr terminate-clusters --region us-east-1 --cluster-ids j-AAAABBBBCCCCD
06 Repeat steps no. 1 – 5 for each Amazon EMR cluster that you want to redeploy, available in the selected AWS region.
07 Change the AWS cloud region by updating the --region command parameter value and repeat the Remediation process for other regions.
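The CLI remediation steps above can be partly automated. The following Python sketch turns the describe-cluster output from step 02 into create-cluster arguments with upgraded instance types, and builds the commands that retire the source cluster, disabling termination protection first when TerminationProtected is true. It assumes an instance-group cluster with on-demand groups and copies only the name, release label, service role, instance groups, instance profile, subnet, key pair and applications; the log URI, bootstrap actions, configurations, security groups and tags are not carried over. The upgrades mapping is an input you supply, each returned list is meant to follow "aws" in a subprocess call (add --region as needed), and the function names are hypothetical.

```python
from typing import Dict, List, Optional


def create_cluster_args(cluster: dict, upgrades: Dict[str, str],
                        release_label: Optional[str] = None) -> List[str]:
    """Build `aws emr create-cluster` arguments that reproduce the instance-group
    layout of a describe-cluster 'Cluster' object, swapping types via `upgrades`.
    """
    groups = []
    for g in cluster["InstanceGroups"]:
        instance_type = upgrades.get(g["InstanceType"], g["InstanceType"])
        groups.append(f"InstanceGroupType={g['InstanceGroupType']},"
                      f"InstanceCount={g['RequestedInstanceCount']},"
                      f"InstanceType={instance_type}")
    attrs = cluster.get("Ec2InstanceAttributes", {})
    ec2 = [f"InstanceProfile={attrs['IamInstanceProfile']}"]
    # describe-cluster field names differ from the create-cluster shorthand keys.
    for cli_key, api_key in (("SubnetId", "Ec2SubnetId"), ("KeyName", "Ec2KeyName")):
        if attrs.get(api_key):
            ec2.append(f"{cli_key}={attrs[api_key]}")
    args = ["emr", "create-cluster", "--name", cluster["Name"],
            "--release-label", release_label or cluster["ReleaseLabel"],
            "--service-role", cluster["ServiceRole"],
            "--instance-groups", *groups,
            "--ec2-attributes", ",".join(ec2)]
    apps = [f"Name={app['Name']}" for app in cluster.get("Applications", [])]
    if apps:
        args += ["--applications", *apps]
    return args


def terminate_source_args(cluster: dict) -> List[List[str]]:
    """Commands that retire the source cluster, disabling termination protection first."""
    cluster_id = cluster["Id"]
    steps = []
    if cluster.get("TerminationProtected"):
        steps.append(["emr", "modify-cluster-attributes", "--cluster-id", cluster_id,
                      "--no-termination-protected"])
    steps.append(["emr", "terminate-clusters", "--cluster-ids", cluster_id])
    return steps
```

Run the termination commands only after the new cluster is up, for example after aws emr wait cluster-running --cluster-id <new-cluster-id> returns successfully. Passing release_label lets you move to a newer EMR release when the source release does not support the target instance types.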
References
- AWS Documentation
- Amazon EMR Pricing
- Configure Amazon EC2 instances
- Cloning a cluster using the console
- AWS Command Line Interface (CLI) Documentation
- emr
- list-clusters
- describe-cluster
- create-cluster
- terminate-clusters
- CloudFormation Documentation
- Amazon EMR resource type reference
- Terraform Documentation
- AWS Provider