Enable Network Isolation for SageMaker Training Jobs

Risk Level: Medium (should be achieved)

Ensure that network isolation is enabled for your Amazon SageMaker training jobs to prevent external network access to your training or inference containers. With network isolation enabled, the containers cannot make outbound network calls, even to other AWS cloud services such as Amazon S3; SageMaker itself still downloads input data from and uploads results to Amazon S3 on the container's behalf, using the job's execution role. This enhances security by preventing unauthorized access and potential data leaks. Network isolation is mandatory for AWS Marketplace machine learning products and can be enabled for additional security on your own training jobs.

Security

By default, SageMaker training and inference containers have Internet access, enabling them to interact with external services and resources on the public Internet during training and inference tasks. However, this access could expose your data to unauthorized parties. For instance, malicious users, or malicious code in publicly available libraries that you inadvertently install on the container, could exploit this access to retrieve your data and transfer it to a remote host. Enabling network isolation for SageMaker training jobs removes this outbound path.


Audit

To determine the Network Isolation feature status for your Amazon SageMaker training jobs, perform the following operations:

Using AWS Console

01 Sign in to the AWS Management Console.

02 Navigate to the Amazon SageMaker console available at https://console.aws.amazon.com/sagemaker/.

03 In the main navigation panel, under Training, select Training jobs.

04 Click on the name (link) of the SageMaker training job that you want to examine, available in the Name column.

05 In the Network section, check the Enable network isolation attribute value to determine the Network Isolation feature status for your SageMaker training job. If Enable network isolation is set to False, the Network Isolation feature is not enabled for the selected SageMaker training job.

06 Repeat steps no. 4 and 5 for each Amazon SageMaker training job available within the current AWS region.

07 Change the AWS cloud region from the navigation bar to repeat the Audit process for other regions.

Using AWS CLI

01 Run the list-training-jobs command (OSX/Linux/UNIX) to list the name of each Amazon SageMaker training job available in the selected AWS cloud region:

aws sagemaker list-training-jobs \
  --region us-east-1 \
  --query 'TrainingJobSummaries[*].TrainingJobName'

02 The command output should return the requested SageMaker training job names:

[
	"cc-ml-sampler-training-job",
	"cc-ml-project5-training-job"
]

03 Run the describe-training-job command (OSX/Linux/UNIX) with the name of the Amazon SageMaker training job that you want to examine as the identifier parameter and a custom output filter to return the Network Isolation feature status for the selected training job:

aws sagemaker describe-training-job \
  --region us-east-1 \
  --training-job-name cc-ml-sampler-training-job \
  --query 'EnableNetworkIsolation'

04 The command output should return the requested feature status:

false

If the describe-training-job command output returns false, as shown in the example above, the Network Isolation feature is not enabled for the selected SageMaker training job.

05 Repeat steps no. 3 and 4 for each Amazon SageMaker training job available in the selected AWS region.

06 Change the AWS cloud region by updating the --region command parameter value and repeat steps no. 1 – 5 to perform the Audit process for other regions.
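
The CLI audit steps above can be scripted so that every training job in a region is checked in one pass. The following is a minimal sketch, not an official tool: the function names list_non_isolated_jobs and audit_regions are hypothetical, and the script assumes the AWS CLI is installed and configured with permission to list and describe training jobs. It uses only the two commands shown in the steps above.

```shell
# Hypothetical helpers (not part of the AWS CLI). They rely only on the
# list-training-jobs and describe-training-job commands shown above.

# Print the name of every training job in a region whose
# EnableNetworkIsolation value is anything other than "true".
list_non_isolated_jobs() {
  audit_region="$1"
  aws sagemaker list-training-jobs \
    --region "$audit_region" \
    --query 'TrainingJobSummaries[*].TrainingJobName' \
    --output text |
  tr '\t' '\n' |
  while read -r job_name; do
    # Skip blank lines produced by an empty result.
    [ -z "$job_name" ] && continue
    # Redirect stdin so the inner command cannot consume the job list.
    isolation=$(aws sagemaker describe-training-job \
      --region "$audit_region" \
      --training-job-name "$job_name" \
      --query 'EnableNetworkIsolation' \
      --output json < /dev/null)
    if [ "$isolation" != "true" ]; then
      echo "$job_name"
    fi
  done
}

# Run the check for each region passed as an argument and prefix
# every finding with its region.
audit_regions() {
  for r in "$@"; do
    list_non_isolated_jobs "$r" | while read -r finding; do
      echo "$r $finding"
    done
  done
}
```

For example, audit_regions us-east-1 eu-west-1 would print one "region job-name" line for each training job that does not have network isolation enabled. A job whose status cannot be read is also reported, which errs on the side of flagging it for review.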

Remediation / Resolution

Network isolation is set when a training job is created, so to enable it you must re-create your Amazon SageMaker training jobs with the appropriate network configuration. To deploy your new SageMaker training jobs, perform the following operations:

Using AWS Console

01 Sign in to the AWS Management Console.

02 Navigate to the Amazon SageMaker console available at https://console.aws.amazon.com/sagemaker/.

03 In the main navigation panel, under Training, select Training jobs.

04 Click on the name (link) of the SageMaker training job that you want to re-create (i.e. source job) and note the training job configuration information such as algorithm, IAM role, network and output settings.

05 Navigate back to the Training jobs page, choose Actions, select Clone, and perform the following operations to create your new SageMaker training job:

  1. For Job settings, provide a unique name for your new training job in the Job name box, choose the IAM role used by the source training job from the IAM role dropdown list, and verify that algorithm and resource configuration options are set up correctly (must match the source training job configuration).
  2. For Network, perform the following actions:
    1. Select Enable network isolation to enable network isolation for the new Amazon SageMaker training job. When you enable network isolation, the containers are restricted from making any outbound network calls, including those to other AWS services such as Amazon S3.
    2. Select the ID of the Virtual Private Cloud (VPC) where you want to deploy your resources, from the VPC - optional dropdown list. For better security, AWS recommends using a private VPC.
    3. Once the VPC network is selected, choose the ID of the appropriate VPC subnet(s) from the Subnet(s) dropdown list.
    4. Select one or more security groups from the Security group(s) list, based on your access policy requirements.
    5. Select Enable inter-container traffic encryption to protect the communication between ML compute instances in a distributed training job.
  3. For Hyperparameters, ensure that algorithm hyperparameters are set up correctly (must match the hyperparameters used by the source training job).
  4. For Input data configuration, ensure that input data channels are properly configured (must match the input data configuration used by the source job).
  5. (Optional) For Checkpoint configuration - optional, select the appropriate location for algorithm-generated checkpoints.
  6. For Output data configuration, ensure that output data location is properly configured (must match the output data configuration used by the source training job).
  7. (Optional) For Managed spot training, choose whether to enable managed spot training (must match the managed spot training settings used by the source job).
  8. (Optional) For Tags - optional, create any required tag sets, according to the source training job tagging scheme.
  9. Choose Create training job to create your new, compliant Amazon SageMaker training job.

06 Repeat steps no. 4 and 5 for each SageMaker training job that you want to re-create, available within the current AWS region.

07 Change the AWS cloud region from the navigation bar and repeat the Remediation process for other regions.

Using AWS CLI

01 Run the create-training-job command (OSX/Linux/UNIX) to create a new SageMaker training job. Include the --enable-network-isolation parameter in the command request to enable the Network Isolation feature for your new training job. When Network Isolation is enabled, the containers are restricted from making any outbound network calls, including those to other AWS cloud services:

aws sagemaker create-training-job \
  --region us-east-1 \
  --training-job-name cc-new-sampler-training-job \
  --algorithm-specification TrainingImage="123456789012.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.3-1",TrainingInputMode="File" \
  --role-arn "arn:aws:iam::123456789012:role/service-role/cc-sagemaker-iam-role" \
  --output-data-config S3OutputPath="s3://trendmicro.com",CompressionType="GZIP",KmsKeyId="arn:aws:kms:us-east-1:123456789012:key/1234abcd-1234-abcd-1234-abcd1234abcd" \
  --resource-config InstanceType="ml.m5.large",InstanceCount=1,VolumeSizeInGB=20,VolumeKmsKeyId="arn:aws:kms:us-east-1:123456789012:key/1234abcd-1234-abcd-1234-abcd1234abcd",KeepAlivePeriodInSeconds=300 \
  --stopping-condition MaxRuntimeInSeconds=86400 \
  --vpc-config SecurityGroupIds="sg-0abcd1234abcd1234",Subnets="subnet-01234abcd1234abcd" \
  --enable-network-isolation

02 The command output should return the Amazon Resource Name (ARN) of the new SageMaker training job:

{
	"TrainingJobArn": "arn:aws:sagemaker:us-east-1:123456789012:training-job/cc-new-sampler-training-job"
}

03 Repeat steps no. 1 and 2 for each SageMaker training job that you want to re-create, available in the selected AWS region.

04 Change the AWS cloud region by updating the --region command parameter value and repeat the Remediation process for other regions.
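
When several jobs have to be re-created, the create-and-verify sequence can be wrapped in a small function. The sketch below is illustrative only: recreate_with_isolation is a hypothetical helper, and it assumes that you have already prepared a JSON request file containing a valid create-training-job request for the source job's configuration. It relies on the AWS CLI behavior that parameters given on the command line take precedence over values supplied through --cli-input-json, so the new job name and the network isolation flag are applied regardless of what the file contains.

```shell
# Hypothetical helper (not part of the AWS CLI).
# Arguments: <region> <new-job-name> <path-to-request-json>
# Assumes the request file already holds a valid create-training-job request.
recreate_with_isolation() {
  target_region="$1"
  new_job_name="$2"
  request_file="$3"

  # Command-line parameters override the values in the JSON file, so
  # isolation is forced on even if the file sets it to false.
  aws sagemaker create-training-job \
    --region "$target_region" \
    --cli-input-json "file://$request_file" \
    --training-job-name "$new_job_name" \
    --enable-network-isolation > /dev/null || return 1

  # Confirm the setting on the newly created job.
  enabled=$(aws sagemaker describe-training-job \
    --region "$target_region" \
    --training-job-name "$new_job_name" \
    --query 'EnableNetworkIsolation' \
    --output json)

  # Succeed only if the new job reports network isolation as enabled.
  [ "$enabled" = "true" ]
}
```

A call such as recreate_with_isolation us-east-1 cc-new-sampler-training-job request.json returns a zero exit status only when the job was created and describe-training-job reports true for EnableNetworkIsolation. The file name request.json is a placeholder.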

Publication date Jun 12, 2024