Enable Inter-Container Traffic Encryption

Risk Level: Medium (should be achieved)

To protect the communication between ML compute instances in a distributed training job, ensure that inter-container traffic encryption is enabled for your Amazon SageMaker training jobs.

Security

Distributed machine learning (ML) frameworks and algorithms typically transmit information that is directly related to the model, such as weights, rather than the training dataset itself. Enabling inter-container traffic encryption protects this data as it moves between container instances during distributed training, which can help you meet regulatory requirements.


Audit

To determine if inter-container traffic encryption is enabled for your SageMaker training jobs, perform the following operations:

Using AWS Console

01 Sign in to the AWS Management Console.

02 Navigate to the Amazon SageMaker console available at https://console.aws.amazon.com/sagemaker/.

03 In the main navigation panel, under Training, select Training jobs.

04 Click on the name (link) of the SageMaker training job that you want to examine, available in the Name column.

05 In the Network section, check the Enable inter-container traffic encryption attribute value to determine the Inter-container Traffic Encryption feature status for your SageMaker training job. If Enable inter-container traffic encryption is set to False, inter-container traffic encryption is not enabled for the selected Amazon SageMaker training job.

06 Repeat steps no. 4 and 5 for each Amazon SageMaker training job available within the current AWS region.

07 Change the AWS cloud region from the navigation bar to repeat the Audit process for other regions.

Using AWS CLI

01 Run the list-training-jobs command (OSX/Linux/UNIX) to list the name of each Amazon SageMaker training job available in the selected AWS cloud region:

aws sagemaker list-training-jobs \
  --region us-east-1 \
  --query 'TrainingJobSummaries[*].TrainingJobName'

02 The command output should return the requested SageMaker training job names:

[
	"cc-ml-sampler-training-job",
	"cc-ml-project5-training-job"
]

03 Run the describe-training-job command (OSX/Linux/UNIX), passing the name of the Amazon SageMaker training job that you want to examine as the identifier parameter and using a custom query filter to return the Inter-container Traffic Encryption feature status for the selected training job:

aws sagemaker describe-training-job \
  --region us-east-1 \
  --training-job-name cc-ml-sampler-training-job \
  --query 'EnableInterContainerTrafficEncryption'

04 The command output should return the requested feature status:

false

If the describe-training-job command output returns false, as shown in the example above, inter-container traffic encryption is not enabled for the selected Amazon SageMaker training job.

05 Repeat steps no. 3 and 4 for each Amazon SageMaker training job available in the selected AWS region.

06 Change the AWS cloud region by updating the --region command parameter value and repeat steps no. 1 – 5 to perform the Audit process for other regions.
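
The CLI audit steps above can be combined into a single loop. The following is a minimal sketch, assuming bash and a configured AWS CLI; the function name audit_region is hypothetical and used here only for illustration. It relies on the same two commands and query filters shown above, adding --output text so that the results are easy to read from a script.

```shell
#!/usr/bin/env bash
# Sketch: list SageMaker training jobs in a region that do not have
# inter-container traffic encryption enabled.
# audit_region is a hypothetical helper name, not an AWS CLI command.

audit_region() {
  local region="$1" job status
  # With --output text the job names are printed separated by whitespace.
  for job in $(aws sagemaker list-training-jobs \
      --region "$region" \
      --query 'TrainingJobSummaries[*].TrainingJobName' \
      --output text); do
    status=$(aws sagemaker describe-training-job \
      --region "$region" \
      --training-job-name "$job" \
      --query 'EnableInterContainerTrafficEncryption' \
      --output text)
    # Text output renders booleans as True/False; anything other than
    # True is reported as non-compliant.
    if [ "$status" != "True" ]; then
      printf '%s\t%s\n' "$region" "$job"
    fi
  done
}

# Example usage (region names are placeholders):
#   for r in us-east-1 eu-west-1; do audit_region "$r"; done
```

Each output line contains the region and the name of a training job that should be re-created with encryption enabled.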

Remediation / Resolution

To enable inter-container traffic encryption for your Amazon SageMaker training jobs, you must re-create the jobs with the appropriate in-transit encryption configuration. To deploy your new SageMaker training jobs, perform the following operations:

Using AWS Console

01 Sign in to the AWS Management Console.

02 Navigate to the Amazon SageMaker console available at https://console.aws.amazon.com/sagemaker/.

03 In the main navigation panel, under Training, select Training jobs.

04 Click on the name (link) of the SageMaker training job that you want to re-create (i.e. source job) and note the training job configuration information such as algorithm, IAM role, network and output settings.

05 Navigate back to the Training jobs page, choose Actions, select Clone, and perform the following operations to create your new SageMaker training job:

  1. For Job settings, provide a unique name for your new training job in the Job name box, choose the IAM role used by the source training job from the IAM role dropdown list, and verify that algorithm and resource configuration options are set up correctly (must match the source training job configuration).
  2. For Network, perform the following actions:
    1. Select Enable network isolation to enable network isolation for the new Amazon SageMaker training job. When you enable network isolation, the containers are restricted from making any outbound network calls.
    2. Select the ID of the Virtual Private Cloud (VPC) where you want to deploy your resources, from the VPC - optional dropdown list. For better security, AWS recommends using a private VPC.
    3. Once the VPC network is selected, choose the ID of the appropriate VPC subnet(s) from the Subnet(s) dropdown list.
    4. Select one or more security groups from the Security group(s) list, based on your access policy requirements.
    5. Select Enable inter-container traffic encryption to protect (encrypt) the data exchanged between containers during training.
  3. For Hyperparameters, ensure that algorithm hyperparameters are set up correctly (must match the hyperparameters used by the source training job).
  4. For Input data configuration, ensure that input data channels are properly configured (must match the input data configuration used by the source job).
  5. (Optional) For Checkpoint configuration - optional, select the appropriate location for algorithm-generated checkpoints.
  6. For Output data configuration, ensure that output data location is properly configured (must match the output data configuration used by the source training job).
  7. (Optional) For Managed spot training, choose whether to enable managed spot training (must match the managed spot training settings used by the source job).
  8. (Optional) For Tags - optional, create any required tag sets, according to the source training job tagging scheme.
  9. Choose Create training job to create your new, compliant Amazon SageMaker training job.

06 Repeat steps no. 4 and 5 for each SageMaker training job that you want to re-create, available within the current AWS region.

07 Change the AWS cloud region from the navigation bar and repeat the Remediation process for other regions.
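
As an alternative to noting the source job configuration by hand (step 04 above), the settings can also be printed from the command line. The following is a minimal sketch, assuming bash and a configured AWS CLI; the function name show_source_config and the output key names are hypothetical, while the queried fields are those returned by describe-training-job.

```shell
#!/usr/bin/env bash
# Sketch: print the settings of a source training job that need to be
# reproduced when the job is re-created.
# show_source_config is a hypothetical helper name.

show_source_config() {
  local region="$1" job="$2"
  # The JMESPath multiselect hash keeps only the fields of interest.
  aws sagemaker describe-training-job \
    --region "$region" \
    --training-job-name "$job" \
    --query '{Algorithm: AlgorithmSpecification, Role: RoleArn, Network: VpcConfig, NetworkIsolation: EnableNetworkIsolation, Resources: ResourceConfig, HyperParameters: HyperParameters, Input: InputDataConfig, Output: OutputDataConfig, Stopping: StoppingCondition, ManagedSpot: EnableManagedSpotTraining}' \
    --output json
}

# Example usage:
#   show_source_config us-east-1 cc-ml-sampler-training-job
```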

Using AWS CLI

01 Run the create-training-job command (OSX/Linux/UNIX) to create a new SageMaker training job. Include the --enable-inter-container-traffic-encryption parameter in the command request to encrypt all traffic between ML compute instances in distributed training:

aws sagemaker create-training-job \
  --region us-east-1 \
  --training-job-name cc-new-sampler-training-job \
  --algorithm-specification TrainingImage="123456789012.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.3-1",TrainingInputMode="File" \
  --role-arn "arn:aws:iam::123456789012:role/service-role/cc-sagemaker-iam-role" \
  --output-data-config S3OutputPath="s3://trendmicro.com",CompressionType="GZIP",KmsKeyId="arn:aws:kms:us-east-1:123456789012:key/1234abcd-1234-abcd-1234-abcd1234abcd" \
  --resource-config InstanceType="ml.m5.large",InstanceCount=1,VolumeSizeInGB=20,VolumeKmsKeyId="arn:aws:kms:us-east-1:123456789012:key/1234abcd-1234-abcd-1234-abcd1234abcd",KeepAlivePeriodInSeconds=300 \
  --stopping-condition MaxRuntimeInSeconds=86400 \
  --vpc-config SecurityGroupIds="sg-0abcd1234abcd1234",Subnets="subnet-01234abcd1234abcd" \
  --enable-network-isolation \
  --enable-inter-container-traffic-encryption

02 The command output should return the Amazon Resource Name (ARN) of the new SageMaker training job:

{
	"TrainingJobArn": "arn:aws:sagemaker:us-east-1:123456789012:training-job/cc-new-sampler-training-job"
}

03 Repeat steps no. 1 and 2 for each SageMaker training job that you want to re-create, available in the selected AWS region.

04 Change the AWS cloud region by updating the --region command parameter value and repeat the Remediation process for other regions.
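
After a new training job is created, its configuration can be checked with the same describe-training-job query used in the Audit section. The following is a minimal sketch, assuming bash and a configured AWS CLI; the function name is_encrypted and its exit codes are hypothetical conventions chosen for this example.

```shell
#!/usr/bin/env bash
# Sketch: confirm that a training job has inter-container traffic
# encryption enabled.
# is_encrypted is a hypothetical helper name.
# Exit status: 0 = enabled, 1 = not enabled, 2 = the AWS CLI call failed.

is_encrypted() {
  local region="$1" job="$2" status
  status=$(aws sagemaker describe-training-job \
    --region "$region" \
    --training-job-name "$job" \
    --query 'EnableInterContainerTrafficEncryption' \
    --output text) || return 2
  # Text output renders the boolean as True or False.
  [ "$status" = "True" ]
}

# Example usage:
#   if is_encrypted us-east-1 cc-new-sampler-training-job; then
#     echo "compliant"
#   fi
```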

Publication date Jun 12, 2024