Cluster Status

Risk Level: High (not acceptable risk)

Rule ID: ES-021

Ensure that your Amazon OpenSearch clusters (domains) are healthy, i.e. that the shard allocation status of each one is "Green". When an Amazon OpenSearch domain is unhealthy, the shard allocation status is set to "Red", which means that at least one primary shard and its replicas are not allocated to a node. The most common causes of a "Red" cluster status are failed cluster nodes and processes that crash under a continuous heavy processing load. To get notified when your Amazon OpenSearch clusters become unhealthy, so that you can implement a plan to recover them, Trend Cloud One™ – Conformity recommends creating CloudWatch alarms that are triggered whenever the health status of an OpenSearch cluster remains "Red" for longer than one minute.

The Amazon CloudWatch metric used to detect unhealthy OpenSearch clusters (Red) is:

ClusterStatus.red – which indicates that the primary and replica shards of at least one index are not allocated to nodes within an OpenSearch cluster. Relevant statistic: Maximum. Units: Count.

This rule can help you with the following compliance standards:

NIST4

For further details on compliance standards supported by Conformity, see here.

This rule can help you work with the AWS Well-Architected Framework.

This rule resolution is part of the Conformity Security & Compliance tool for AWS.

Performance efficiency

Detecting unhealthy Amazon OpenSearch clusters with the status set to "Red" is essential for the availability of your OpenSearch applications. In addition, the OpenSearch service stops taking automatic snapshots while the cluster status is "Red". If this status persists for more than 16 days, permanent data loss can occur.

Audit

To identify unhealthy Amazon OpenSearch domains (clusters), perform the following operations:

Using AWS Console

01 Sign in to the AWS Management Console.

02 Navigate to the Amazon OpenSearch console at https://console.aws.amazon.com/esv3/.

03 In the main navigation panel, under Dashboard, select Domains.

04 Click on the name (link) of the OpenSearch domain that you want to examine.

05 In the General information section, check the Cluster health attribute value to determine the cluster (domain) health. If the attribute value is set to Red, the selected Amazon OpenSearch cluster is unhealthy, therefore you must take action in order to recover the selected OpenSearch cluster.

06 Repeat steps no. 4 and 5 for each Amazon OpenSearch cluster available within the current AWS region.

07 Change the AWS cloud region from the navigation bar and repeat the Audit process for other regions.

Using AWS CLI

01 Run the list-domain-names command (OSX/Linux/UNIX) to list the name of each Amazon OpenSearch domain available in the selected AWS region:

aws es list-domain-names \
  --region us-east-1 \
  --query 'DomainNames[*].DomainName'

02 The command output should return the identifier (name) of each OpenSearch domain provisioned in the selected region:

[
	"trendmicro",
	"cloudconformity"
]

03 Run the get-metric-statistics command (OSX/Linux/UNIX) to obtain the statistics recorded by Amazon CloudWatch for the ClusterStatus.red metric, which indicates that the primary and replica shards of at least one index are not allocated to the selected OpenSearch cluster nodes. Change the --start-time (start recording date) and --end-time (stop recording date) parameter values to choose your own time frame for the ClusterStatus.red metric data. Set the --period parameter value to define the granularity (in seconds) of the returned datapoints, based on your requirements. A period can be as short as 1 minute (60 seconds) or as long as 1 day (86400 seconds). Metrics in the AWS/ES namespace are published with two dimensions, DomainName and ClientId (your AWS account ID), so replace 123456789012 with your own account ID. The following command example returns values greater than or equal to 1 if the selected Amazon OpenSearch cluster, identified by the name "trendmicro", has the shard allocation status set to Red:

aws cloudwatch get-metric-statistics \
  --region us-east-1 \
  --metric-name ClusterStatus.red \
  --start-time 2018-12-16T17:03:10Z \
  --end-time 2018-12-17T17:03:10Z \
  --period 3600 \
  --namespace AWS/ES \
  --statistics Maximum \
  --dimensions Name=DomainName,Value=trendmicro Name=ClientId,Value=123456789012

04 The command output should return the requested ClusterStatus.red metric details:

{
	"Datapoints": [
		{
			"Timestamp": "2018-12-16T17:03:10Z",
			"Maximum": 1.333,
			"Unit": "Count"
		},
		{
			"Timestamp": "2018-12-16T18:03:10Z",
			"Maximum": 1.333,
			"Unit": "Count"
		},
		{
			"Timestamp": "2018-12-16T19:03:10Z",
			"Maximum": 1.333,
			"Unit": "Count"
		},

		...

		{
			"Timestamp": "2018-12-17T15:03:10Z",
			"Maximum": 1.333,
			"Unit": "Count"
		},
		{
			"Timestamp": "2018-12-17T16:03:10Z",
			"Maximum": 1.333,
			"Unit": "Count"
		},
		{
			"Timestamp": "2018-12-17T17:03:10Z",
			"Maximum": 1.333,
			"Unit": "Count"
		}
	],
	"Label": "ClusterStatus.red"
}

If the "Maximum" (statistic) property value is greater than or equal to 1, as shown in the output example above, the selected Amazon OpenSearch cluster has the shard allocation status set to Red, therefore the selected cluster (domain) is unhealthy.

05 Repeat steps no. 3 and 4 for each Amazon OpenSearch cluster available in the selected AWS region.

06 Change the AWS cloud region by updating the --region command parameter value and repeat the Audit process for other regions.
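The CLI audit steps above can also be sketched in Python. This is a minimal illustration rather than part of the Conformity tooling: the function names `is_cluster_red` and `find_red_domains` are made up for this example, the two client arguments are assumed to behave like boto3 "es" and "cloudwatch" clients, and the `ClientId` dimension (the AWS account ID) is an assumption about how AWS/ES metrics are keyed. The check itself mirrors the rule stated above: a "Maximum" value greater than or equal to 1 means the cluster was Red.

```python
from datetime import datetime, timedelta, timezone


def is_cluster_red(metric_response):
    """Return True if any datapoint reports a Maximum of 1 or more."""
    return any(
        point.get("Maximum", 0) >= 1
        for point in metric_response.get("Datapoints", [])
    )


def find_red_domains(es_client, cloudwatch_client, account_id,
                     hours=24, period=3600):
    """List the domains in one region that were Red during the time window.

    es_client and cloudwatch_client are assumed to expose the same
    list_domain_names and get_metric_statistics calls as boto3 clients.
    """
    end_time = datetime.now(timezone.utc)
    start_time = end_time - timedelta(hours=hours)
    red_domains = []
    for entry in es_client.list_domain_names()["DomainNames"]:
        name = entry["DomainName"]
        response = cloudwatch_client.get_metric_statistics(
            Namespace="AWS/ES",
            MetricName="ClusterStatus.red",
            Dimensions=[
                {"Name": "DomainName", "Value": name},
                # Assumption: AWS/ES metrics also carry the account ID.
                {"Name": "ClientId", "Value": account_id},
            ],
            StartTime=start_time,
            EndTime=end_time,
            Period=period,
            Statistics=["Maximum"],
        )
        if is_cluster_red(response):
            red_domains.append(name)
    return red_domains
```

With boto3 installed and credentials configured, `boto3.client("es", region_name="us-east-1")` and `boto3.client("cloudwatch", region_name="us-east-1")` could be passed as the two clients, and the call repeated for each region of interest.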

Remediation / Resolution

Step 1: Create and configure the CloudWatch alarm required to send alert notifications whenever the health status of your Amazon OpenSearch cluster becomes Red for more than one minute:

Using AWS Console

01 Sign in to the AWS Management Console.

02 Navigate to the Amazon SNS console at https://console.aws.amazon.com/sns/.

03 In the main navigation panel, under Dashboard, select Topics.

04 Choose Create topic to initiate the setup process for the new SNS topic.

05 On the Create topic setup page, perform the following actions:

For Type, select Standard.
For Name, provide a unique name for the new SNS topic.
(Optional) For Encryption – optional, choose Enable encryption if you want to enable Server Side Encryption for the new topic. Select a Customer Master Key (CMK) or enter the ARN of an existing CMK in the Customer master key (CMK) box.
(Optional) For Tags – optional, create and configure tag sets for the new SNS topic. You can use tags to search and filter your topics and track your costs.
Choose Create topic to create your new Amazon SNS topic.

06 On the newly created SNS topic page, select the Subscriptions tab, and choose Create subscription.

07 On the Create subscription setup page, select Email from the Protocol dropdown list, provide the email address where you want to receive alert notifications in the Endpoint box, then choose Create subscription to apply the new subscription to your Amazon SNS topic.

08 Use your preferred email client to open the subscription message from AWS Notifications, then click on the appropriate link to confirm your SNS subscription.

09 Navigate to the Amazon CloudWatch console at https://console.aws.amazon.com/cloudwatch/.

10 In the main navigation panel, under Alarms, choose All alarms.

11 Choose Create alarm from the console top menu to initiate the CloudWatch alarm setup process.

12 On the Create alarm page, perform the following actions:

For Step 1 Specify metric and conditions, perform the following operations:
- Choose Select metric, select the Browse tab, search for the ClusterStatus.red metric, and select the resulting metric entry.
- In the Metric section, select Maximum from the Statistic dropdown list, and choose 1 minute from the Period dropdown list.
- In the Conditions section, select Static as Threshold type. For Whenever ClusterStatus.red is…, select Greater/Equal (greater than or equal to), and enter 1 as the threshold value in the than… configuration box to trigger the CloudWatch alarm every time the OpenSearch cluster status becomes Red.
- Choose Next to continue the setup process.
For Step 2 Configure actions, define the alarm state that will trigger the CloudWatch alarm action by selecting In alarm under Alarm state trigger, then choose Select an existing SNS topic and select the name of the SNS topic created at step no. 5 from the Send a notification to… list. Choose Next to continue.
For Step 3 Add name and description, provide a unique name and a short description (optional) for your new CloudWatch alarm in the Alarm name and Alarm description boxes. Choose Next to continue.
For Step 4 Preview and create, review the alarm configuration details, then choose Create alarm to create your new Amazon CloudWatch alarm. Once the data is loaded, the State (status) of the newly created alarm will change from Insufficient data to OK.

Using AWS CLI

01 Run the create-topic command (OSX/Linux/UNIX) to create the Amazon SNS topic required to send alert notifications whenever the specified CloudWatch alarm is fired:

aws sns create-topic \
  --region us-east-1 \
  --name cc-cloud-alert-sns-topic

02 The command output should return the Amazon Resource Name (ARN) of the newly created SNS topic:

{
	"TopicArn": "arn:aws:sns:us-east-1:123456789012:cc-cloud-alert-sns-topic"
}

03 Run the subscribe command (OSX/Linux/UNIX) to subscribe an email address, used as the subscription endpoint, to the Amazon SNS topic created at the previous step. Repeat the command for each additional email address:

aws sns subscribe \
  --region us-east-1 \
  --topic-arn arn:aws:sns:us-east-1:123456789012:cc-cloud-alert-sns-topic \
  --protocol email \
  --notification-endpoint alert@trendmicro.com \
  --return-subscription-arn

04 The command output should return the ARN of the new SNS subscription:

{
	"SubscriptionArn": "arn:aws:sns:us-east-1:123456789012:cc-cloud-alert-sns-topic:abcdabcd-1234-abcd-1234-abcd1234abcd"
}

05 Run the confirm-subscription command (OSX/Linux/UNIX) to confirm the new SNS subscription by validating the token sent to the subscription endpoint (i.e. your email address) specified at the previous step. If successful, the command returns the ARN of the confirmed subscription:

aws sns confirm-subscription \
  --region us-east-1 \
  --topic-arn arn:aws:sns:us-east-1:123456789012:cc-cloud-alert-sns-topic \
  --token 3567392f37fb687f5d51e6e241d7700ae02f7124d8268910b858cb4db727ceeb2474bb937929d3bdd7ce5d0cce19325d036bcc58d3c217426bcafa9c501a2cac5646456gf1dd3797627467553dc438a8c974119496fc3eff026eaa5d15578ded6f9a5c43aec62d83ef5f49109da730139

06 Run the put-metric-alarm command (OSX/Linux/UNIX) to create the CloudWatch alarm that will fire every time the health status of the specified Amazon OpenSearch cluster becomes Red, i.e. unhealthy. The --dimensions parameter identifies the cluster to monitor: replace "trendmicro" with the name of your OpenSearch domain and 123456789012 with your AWS account ID. If successful, the put-metric-alarm command does not produce an output:

aws cloudwatch put-metric-alarm \
  --region us-east-1 \
  --alarm-name cc-unhealthy-os-cluster-alarm \
  --alarm-description "Triggered by the OpenSearch cluster Red health status." \
  --metric-name ClusterStatus.red \
  --namespace AWS/ES \
  --dimensions Name=DomainName,Value=trendmicro Name=ClientId,Value=123456789012 \
  --statistic Maximum \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 1 \
  --period 60 \
  --threshold 1 \
  --actions-enabled \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:cc-cloud-alert-sns-topic
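The same alarm setup can be sketched in Python for teams that script their monitoring. This is an illustrative sketch under stated assumptions, not a definitive implementation: `build_red_alarm_params` and `create_red_alarm` are hypothetical helper names, the clients are assumed to behave like boto3 "sns" and "cloudwatch" clients, the per-domain alarm name suffix is an invented convention, and the `ClientId` dimension (the AWS account ID) is an assumption about how AWS/ES metrics are keyed. The statistic, period, threshold and comparison operator are copied from the put-metric-alarm command above.

```python
def build_red_alarm_params(domain_name, account_id, topic_arn):
    """Build keyword arguments for a CloudWatch put_metric_alarm call."""
    return {
        # Hypothetical naming convention: one alarm per domain.
        "AlarmName": f"cc-unhealthy-os-cluster-alarm-{domain_name}",
        "AlarmDescription": "Triggered by the OpenSearch cluster Red health status.",
        "Namespace": "AWS/ES",
        "MetricName": "ClusterStatus.red",
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            # Assumption: AWS/ES metrics also carry the account ID.
            {"Name": "ClientId", "Value": account_id},
        ],
        "Statistic": "Maximum",
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "Threshold": 1.0,
        "Period": 60,             # seconds per evaluation period
        "EvaluationPeriods": 1,   # one 60-second period at or above 1
        "ActionsEnabled": True,
        "AlarmActions": [topic_arn],
    }


def create_red_alarm(sns_client, cloudwatch_client, domain_name, account_id,
                     email, topic_name="cc-cloud-alert-sns-topic"):
    """Create the topic, subscribe an email address, and create the alarm."""
    topic_arn = sns_client.create_topic(Name=topic_name)["TopicArn"]
    # The recipient still has to confirm the subscription from the email.
    sns_client.subscribe(
        TopicArn=topic_arn,
        Protocol="email",
        Endpoint=email,
        ReturnSubscriptionArn=True,
    )
    params = build_red_alarm_params(domain_name, account_id, topic_arn)
    cloudwatch_client.put_metric_alarm(**params)
    return params
```

Keeping the parameter construction in its own function makes it possible to review or unit test the alarm definition without calling AWS.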

Step 2: Recovering unhealthy Amazon OpenSearch clusters can be a complex task, therefore you may need the AWS Support team to assist. To create a support case for recovering unhealthy OpenSearch clusters, perform the following operations:

Creating a support case to request recovering unhealthy OpenSearch resources using the AWS Command Line Interface (AWS CLI) is not currently supported

Using AWS Console

01 Sign in to the AWS Management Console.

02 Navigate to the AWS Support Center console at https://console.aws.amazon.com/support/.

03 In the Open support cases section, choose Create case to initiate the request process.

04 On the Create case page, perform the following actions:

Select the Technical Support option.
Choose OpenSearch from the Service dropdown list.
Select Cluster Issue from the Category dropdown list.
Provide the request subject in the Subject box, e.g. "Recover unhealthy Amazon OpenSearch cluster".
For Description, provide a concise description of the issue and include the identifier of the Amazon OpenSearch cluster that you want to recover. This will help the AWS support team to evaluate your request.
For Contact options, choose your preferred correspondence language from the Preferred contact language dropdown list, then select a preferred contact method that AWS support team can use to respond to your request from the Contact methods section.
Choose Submit to send your request to Amazon Web Services. A customer support representative should contact you shortly.


Publication date Oct 12, 2018
