Enable Data Capture for SageMaker Endpoints

Risk Level: Medium (should be achieved)

Ensure that the Data Capture feature is enabled for your SageMaker endpoints in order to allow Amazon SageMaker to store prediction request and response data from your endpoints at a designated location.

Security

In Amazon SageMaker, Data Capture collects the input and output data of ML models deployed in production. Capturing real-time inference data supports monitoring and analysis of model performance, as well as debugging, auditing, and compliance. The captured data lets you detect data drift and track model accuracy over time, and it can be used to retrain models on real-world data, which makes machine learning deployments more robust and reliable.


Audit

To determine the Data Capture feature status for your Amazon SageMaker endpoints, perform the following operations:

Using AWS Console

01 Sign in to the AWS Management Console.

02 Navigate to Amazon SageMaker console available at https://console.aws.amazon.com/sagemaker/.

03 In the main navigation panel, under Inference, select Endpoints.

04 Click on the name (link) of the SageMaker endpoint that you want to examine, available in the Name column.

05 Select the Settings tab to access the configuration settings available for the selected endpoint.

06 In the Data capture settings section, check the Enable data capture attribute value to determine the Data Capture feature status for your SageMaker endpoint. If Enable data capture is set to No, the Data Capture feature is not enabled for the selected SageMaker endpoint.

07 Repeat steps no. 4 - 6 for each Amazon SageMaker endpoint available within the current AWS region.

08 Change the AWS cloud region from the navigation bar to repeat the Audit process for other regions.

Using AWS CLI

01 Run the list-endpoints command (OSX/Linux/UNIX) to list the name of each Amazon SageMaker endpoint available in the selected AWS region:

aws sagemaker list-endpoints \
  --region us-east-1 \
  --query 'Endpoints[*].EndpointName'

02 The command output should return the requested SageMaker endpoint names:

[
	"cc-ml-sagemaker-endpoint",
	"cc-ml-production-endpoint"
]

03 Run the describe-endpoint command (OSX/Linux/UNIX) with the name of the Amazon SageMaker endpoint that you want to examine as the identifier parameter and custom output filters to return the name of the associated endpoint configuration:

aws sagemaker describe-endpoint \
  --region us-east-1 \
  --endpoint-name cc-ml-sagemaker-endpoint \
  --query 'EndpointConfigName'

04 The command output should return the requested endpoint configuration name:

"cc-ml-endpoint-config"

05 Run the describe-endpoint-config command (OSX/Linux/UNIX) with the endpoint configuration name returned at the previous step to describe the Data Capture feature configuration available for the selected Amazon SageMaker endpoint:

aws sagemaker describe-endpoint-config \
  --region us-east-1 \
  --endpoint-config-name cc-ml-endpoint-config \
  --query 'DataCaptureConfig'

06 The command output should return the requested configuration information:

null

If the describe-endpoint-config command output returns null, as shown in the example above, no Data Capture configuration exists, so the Data Capture feature is not enabled for the selected SageMaker endpoint. The feature is also not enabled if the returned configuration has EnableCapture set to false.

07 Repeat steps no. 3 - 6 for each Amazon SageMaker endpoint available in the selected AWS region.

08 Change the AWS cloud region by updating the --region command parameter value and repeat steps no. 1 – 7 to perform the Audit process for other regions.
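
The CLI audit steps above can be combined into a single loop. The following is a minimal sketch, not an official script: the function name audit_data_capture and the ENABLED/DISABLED labels are hypothetical, and it assumes that the AWS CLI text output renders a missing DataCaptureConfig as "None" and a true EnableCapture value as "True".

```shell
# Hypothetical helper: report the Data Capture status of every SageMaker
# endpoint in one region. Prints "<endpoint-name> ENABLED|DISABLED" per line.
# The body runs in a subshell "( ... )" so the loop variables do not leak.
audit_data_capture() (
  region="$1"
  # Step 01: list the endpoint names (whitespace-separated in text output).
  for endpoint in $(aws sagemaker list-endpoints --region "$region" \
      --query 'Endpoints[*].EndpointName' --output text); do
    # Step 03: look up the endpoint configuration attached to the endpoint.
    config=$(aws sagemaker describe-endpoint --region "$region" \
      --endpoint-name "$endpoint" \
      --query 'EndpointConfigName' --output text)
    # Step 05: read the EnableCapture flag from the endpoint configuration.
    enabled=$(aws sagemaker describe-endpoint-config --region "$region" \
      --endpoint-config-name "$config" \
      --query 'DataCaptureConfig.EnableCapture' --output text)
    # Anything other than an explicit true value is treated as disabled.
    case "$enabled" in
      True|true) echo "$endpoint ENABLED" ;;
      *)         echo "$endpoint DISABLED" ;;
    esac
  done
)

# Example (requires AWS credentials): audit_data_capture us-east-1
```

To cover other regions, call the function once per region name, for example inside a for loop over the regions that you use.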

Remediation / Resolution

To ensure that Data Capture is enabled for your SageMaker endpoints, perform the following operations:

Using AWS Console

01 Sign in to the AWS Management Console.

02 Navigate to Amazon SageMaker console available at https://console.aws.amazon.com/sagemaker/.

03 In the main navigation panel, under Inference, select Endpoints.

04 Select the SageMaker endpoint that you want to configure and choose Update endpoint.

05 Choose Create a new endpoint configuration from the Change the Endpoint configuration section to re-create the endpoint configuration with the appropriate settings.

06 In the New endpoint configuration section, perform the following operations:

  1. For Endpoint configuration name, provide a unique name for your new endpoint configuration.
  2. For Type of endpoint, select the correct endpoint type (must match the endpoint type of the source, non-compliant endpoint configuration).
  3. (Optional) For Encryption key - optional, select the name (alias) of the AWS KMS customer managed key (CMK) that you want to use for data encryption.
  4. (Optional) For Async Invocation Config - optional, configure the necessary Async Invocation settings (must match the source endpoint configuration settings).
  5. For Data capture - optional, select Enable data capture to enable the Data Capture feature. Ensure that Prediction request and Prediction response are selected under Data capture options, and provide the S3 location for the collected data in the S3 location to store data collected box. By enabling Data Capture, Amazon SageMaker can save prediction request and prediction response data from your endpoint to a dedicated location.
  6. For Variants, specify the model that you want to host and the resources chosen to deploy for hosting it (must match the source endpoint configuration settings).
  7. Choose Create endpoint configuration to create your new, compliant endpoint configuration.

07 Choose Update endpoint to apply the new endpoint configuration.

08 Repeat steps no. 4 – 7 for each SageMaker endpoint that you want to configure, available within the current AWS region.

09 Change the AWS cloud region from the navigation bar and repeat the Remediation process for other regions.
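
After an endpoint update, the result can also be confirmed from the command line. The sketch below is illustrative only: the function name verify_data_capture and its messages are hypothetical, and it assumes that describe-endpoint reports the active Data Capture settings under DataCaptureConfig and that the text output renders a true value as "True".

```shell
# Hypothetical helper: wait until an endpoint update has finished, then check
# that Data Capture is enabled. $1 = AWS region, $2 = endpoint name.
# Returns 0 when Data Capture is enabled and 1 otherwise.
verify_data_capture() {
  # Block until the endpoint is back in the InService state.
  aws sagemaker wait endpoint-in-service \
    --region "$1" --endpoint-name "$2" || return 1
  # Read the EnableCapture flag reported for the running endpoint.
  case "$(aws sagemaker describe-endpoint --region "$1" \
      --endpoint-name "$2" \
      --query 'DataCaptureConfig.EnableCapture' --output text)" in
    True|true) echo "$2: Data Capture enabled" ;;
    *)         echo "$2: Data Capture NOT enabled" >&2; return 1 ;;
  esac
}

# Example (requires AWS credentials):
#   verify_data_capture us-east-1 cc-ml-sagemaker-endpoint
```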

Using AWS CLI

01 Run the create-endpoint-config command (OSX/Linux/UNIX) to create a new endpoint configuration for your SageMaker endpoint. Use the --data-capture-config command parameter to enable and configure the Data Capture feature for the specified endpoint:

aws sagemaker create-endpoint-config \
  --region us-east-1 \
  --endpoint-config-name cc-ml-new-endpoint-config \
  --production-variants VariantName="cc-prod-variant",ModelName="cc-ml-model",InitialInstanceCount=1,InstanceType="ml.m4.xlarge",InitialVariantWeight=1.0 \
  --data-capture-config EnableCapture=true,InitialSamplingPercentage=30,DestinationS3Uri="s3://cc-data-capture-bucket/",CaptureOptions=[{CaptureMode="InputAndOutput"}]

02 The command output should return the Amazon Resource Name (ARN) of the new endpoint configuration:

{
	"EndpointConfigArn": "arn:aws:sagemaker:us-east-1:123456789012:endpoint-config/cc-ml-new-endpoint-config"
}

03 Run the update-endpoint command (OSX/Linux/UNIX) to apply the SageMaker endpoint configuration created in the previous steps:

aws sagemaker update-endpoint \
  --region us-east-1 \
  --endpoint-name cc-ml-sagemaker-endpoint \
  --endpoint-config-name cc-ml-new-endpoint-config \
  --retain-all-variant-properties

04 The command output should return the ARN of the updated SageMaker endpoint:

{
	"EndpointArn": "arn:aws:sagemaker:us-east-1:123456789012:endpoint/cc-ml-sagemaker-endpoint"
}

05 Repeat steps no. 1 – 4 for each SageMaker endpoint that you want to configure, available in the selected AWS region.

06 Change the AWS cloud region by updating the --region command parameter value and repeat the Remediation process for other regions.
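
When several endpoints need the same Data Capture settings, the --data-capture-config value can be generated instead of typed by hand. The following is a small sketch under stated assumptions: the function name data_capture_json is hypothetical, the default sampling percentage of 30 simply mirrors the example above, and the value is emitted as JSON, which the AWS CLI accepts for structure parameters as an alternative to the shorthand syntax.

```shell
# Hypothetical helper: print a JSON document for --data-capture-config.
# $1 = S3 URI that receives the captured data
# $2 = sampling percentage (optional, defaults to 30 as in the example above)
data_capture_json() {
  # Reject destinations that are not S3 URIs before building the document.
  case "$1" in
    s3://?*) ;;
    *) echo "destination must be an s3:// URI" >&2; return 1 ;;
  esac
  printf '{"EnableCapture":true,"InitialSamplingPercentage":%d,"DestinationS3Uri":"%s","CaptureOptions":[{"CaptureMode":"InputAndOutput"}]}\n' \
    "${2:-30}" "$1"
}

# Example (requires AWS credentials; names taken from the steps above):
#   aws sagemaker create-endpoint-config \
#     --region us-east-1 \
#     --endpoint-config-name cc-ml-new-endpoint-config \
#     --production-variants VariantName="cc-prod-variant",ModelName="cc-ml-model",InitialInstanceCount=1,InstanceType="ml.m4.xlarge",InitialVariantWeight=1.0 \
#     --data-capture-config "$(data_capture_json s3://cc-data-capture-bucket/ 30)"
```

Passing the value as a quoted JSON string also keeps the shell from interpreting the square brackets and braces that the shorthand form contains.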

Publication date Jun 12, 2024