Guardrails to Avoid Cloud Misconfigurations
The stakes and opportunities are higher than ever to ensure that strong operational excellence strategies are implemented. Explore how to make sure you are holding up your end of the bargain under your cloud service provider’s (CSP’s) Shared Responsibility Model.
Building the Foundation of Great Architecture
There is no shortage of benefits when it comes to the cloud, and your teams are taking notice. Eager to capitalize on those advantages, your organization is racing to make the shift. However, you need to take a step back to ensure operational excellence is a priority.
When it comes to cloud operational excellence, some assume that it doesn’t require the same attention as a traditional on-premises environment. In truth, many aspects, from architecture to automation to team culture, need to be considered to achieve it. If anything, the stakes and opportunities are higher than ever to ensure that strong operational excellence strategies are implemented. This is especially true when partnering with cloud service providers (CSPs) and making sure you hold up your end of the bargain as part of your CSP’s Shared Responsibility Model.
A Cloud Configuration Framework for Simplicity & Breach Protection
The first step in the journey to operational excellence is ensuring cloud builders follow best-practice architectures, such as the Amazon Web Services (AWS) Well-Architected Framework and the Microsoft® Azure™ Well-Architected Framework. These frameworks were developed to help cloud architects and developers build secure, high-performing, resilient, and efficient infrastructure for their applications.
Operational excellence is a key pillar of both frameworks. It focuses on keeping systems running in production and provides a consistent approach for evaluating architectures and implementing designs that scale over time. Aligning your architecture and workloads with engineering best practices and standards is what makes them truly operationally excellent.[1] These frameworks provide a foundation for businesses to build in the cloud more effectively and deliver greater business value. Let’s dive into some of the ways operational excellence can help you build architecture that enables business success.
Rest Easy with Operational Guardrails
When it is time for your company to organize Cloud Centers of Excellence and implement shared services across your cloud environments, you will want best practices to be enforced consistently. Operational guardrails move your organization toward operational excellence by ensuring standard functions occur predictably and consistently. With these controls in place, you can be confident that:
- Critical data stored in the cloud is protected by automatic enforcement
- Network access policies and security groups are always properly configured to minimize unrestricted access
- Identity and access management (IAM) permissions are defined for controlled access
Automatic operational controls enforce the rules for these shared services at scale, in line with best practices, external regulatory requirements, and your organization’s internal governance. Now, take a deep breath and rest easier knowing your organization is far less likely to be in tomorrow’s headline for the latest security breach.
Did Someone Say Automation?
If you want to leverage the agility of the cloud or realize the cost savings typically associated with cloud adoption, automation will reign supreme. Even the most skilled, dedicated, and experienced developers make errors; it’s just human nature.
Treating your operations as code, such as scripting your runbook and playbook activities, reduces the risk of human error but introduces different risks to operational excellence. Developers often find themselves in high-pressure scenarios, forced to meet deadlines and deliver something that works, even if they know they are not following coding best practices. For example, in a rush to meet a deliverable, your IT team may decide not to configure granular IAM permissions for a virtual server. Granular permissions using IAM roles add a layer of protection by scoping exactly which actions the server, and the applications running on it, can perform on which resources, enforcing fine-grained rather than coarse-grained access. Without that configuration, the organization could be exposed to a devastating security breach. The bottom line: it’s important to follow best practices throughout the development process, even on the tightest of timelines.
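To make the difference concrete, here is a minimal Python sketch contrasting a coarse-grained policy with a fine-grained one for a virtual server’s IAM role, plus a simple wildcard check. The bucket name `example-app-assets` and the helper names are hypothetical; the policy structure (`Version`, `Statement`, `Effect`, `Action`, `Resource`) and the `ec2.amazonaws.com` trust principal follow the standard IAM JSON policy format.

```python
import json

# Hypothetical bucket that the application on the EC2 instance needs to read.
BUCKET = "example-app-assets"

# Trust policy: allows the EC2 service to assume the role for the instance.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Coarse-grained shortcut: every S3 action on every resource.
COARSE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}],
}


def least_privilege_policy(bucket: str) -> dict:
    """Fine-grained policy: read-only access to objects in a single bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{bucket}/*"],
        }],
    }


def has_wildcards(policy: dict) -> bool:
    """Flag Allow statements with a wildcard in an action, or a bare "*" resource."""
    for stmt in policy["Statement"]:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        # IAM accepts either a single string or a list; normalize to lists.
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        if any("*" in action for action in actions) or "*" in resources:
            return True
    return False


if __name__ == "__main__":
    print(json.dumps(least_privilege_policy(BUCKET), indent=2))
    print(has_wildcards(COARSE_POLICY))                   # True
    print(has_wildcards(least_privilege_policy(BUCKET)))  # False
```

The fine-grained version narrows both the action and the resource. On EC2, a role like this reaches the instance through an instance profile, so the application never needs long-lived access keys.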
Automation helps you get the most out of your cloud infrastructure through capabilities such as auto-scaling, self-healing, deployment scripts, customized reporting, and more. Operations as code lets architects and DevOps engineers version application infrastructure just as developers version application code. Building and operating architecture that maximizes efficiency and is highly responsive frees your teams to build applications that support business goals.
Infrastructure as Code = Fast Innovation
As discussed, the increasing preference for automation, alongside the accelerated adoption of cloud computing and CI/CD practices, means infrastructure is now designed, deployed, and configured in an entirely new way. Needless to say, the cloud is your oyster and you can achieve almost anything you wish.
In the cloud, you can:
- Apply the same engineering discipline that you use for application code to your entire cloud environment
- Define your entire workload as code and update it with code
- Script your operational procedures and automate their execution by triggering them in response to events
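Triggering scripted procedures in response to events can be as simple as a dispatch table that maps event types to runbook functions. The plain Python sketch below is hypothetical throughout: the event names, the `restart_service` and `rotate_logs` procedures, and the dictionary event shape are illustrative assumptions, not an AWS API. In practice, the trigger might be a monitoring alarm or a message on an event bus.

```python
from typing import Callable, Dict, List


# Hypothetical runbook steps; real procedures would call your cloud or OS tooling.
def restart_service(event: dict) -> str:
    return f"restarted {event['service']} on {event['host']}"


def rotate_logs(event: dict) -> str:
    return f"rotated logs on {event['host']}"


# Dispatch table: which scripted procedures run for which event type, in order.
RUNBOOK: Dict[str, List[Callable[[dict], str]]] = {
    "service_unhealthy": [restart_service],
    "disk_usage_high": [rotate_logs],
}


def handle_event(event: dict) -> List[str]:
    """Run every procedure registered for the event's type and collect the results."""
    steps = RUNBOOK.get(event["type"], [])
    if not steps:
        # Unknown events are escalated to a person instead of being silently ignored.
        return [f"no runbook for {event['type']}; escalating to on-call"]
    return [step(event) for step in steps]


if __name__ == "__main__":
    print(handle_event({"type": "disk_usage_high", "host": "web-1"}))
    # ['rotated logs on web-1']
```

Because the procedures are ordinary code, they can be versioned, reviewed, and tested in the same pipeline as the application itself.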
Another way to increase your use of automation is Infrastructure as Code (IaC): provisioning and managing cloud resources and infrastructure through machine-readable definition files. Automation tools such as AWS CloudFormation and Terraform are a great way to do this. With CloudFormation, you can create and provision cloud infrastructure resources, such as Amazon Elastic Compute Cloud (Amazon EC2) instances, from a template: a simple JSON or YAML text file that describes a collection of AWS resources to be deployed and configured together as a single unit, called a stack.
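For illustration, the Python sketch below builds a minimal CloudFormation template describing a single EC2 instance and serializes it to JSON, the same structure you would otherwise write by hand in a JSON or YAML file. The logical ID `WebServer`, the parameter names, and the `t3.micro` default are assumptions; `AWSTemplateFormatVersion`, `Parameters`, `Resources`, the `AWS::EC2::Instance` type, the `AWS::EC2::Image::Id` parameter type, and the `Ref` intrinsic function are part of the CloudFormation template format.

```python
import json


def ec2_template(instance_type: str = "t3.micro") -> dict:
    """Return a minimal CloudFormation template containing one EC2 instance."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Single EC2 instance (illustrative sketch)",
        "Parameters": {
            # The AMI is supplied at deploy time rather than hard-coded.
            "ImageId": {"Type": "AWS::EC2::Image::Id"},
            "InstanceType": {"Type": "String", "Default": instance_type},
        },
        "Resources": {
            "WebServer": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": {"Ref": "ImageId"},
                    "InstanceType": {"Ref": "InstanceType"},
                },
            },
        },
    }


if __name__ == "__main__":
    # Print the template; save the output as template.json to deploy it as a stack.
    print(json.dumps(ec2_template(), indent=2))
```

The saved file could then be deployed with, for example, `aws cloudformation deploy --template-file template.json --stack-name demo-stack --parameter-overrides ImageId=<your-ami-id>`. Editing the file and redeploying updates the same stack, which is what lets the infrastructure be versioned alongside application code.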
The business benefits of IaC are consistency, speed, and lower costs to create and deploy projects. Because this deployment method is so efficient, critical changes to your cloud environments can be completed more quickly than ever. So, what’s the catch?
Unfortunately, security, compliance, and performance problems can be introduced just as easily. To instill more confidence in IaC, there are solutions that test your CloudFormation templates before deployment, so only the cleanest and most secure templates make it to your environments and potentially damaging changes can be inspected or rolled back easily. Remediation can also happen after deployment: for example, if an Amazon Simple Storage Service (Amazon S3) bucket is created without server access logging enabled, an AWS Lambda function could be triggered to enable it automatically. Being able to check the quality of your CloudFormation templates, and spot improvements, without executing them first is extremely valuable for cloud builders.
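A pre-deployment check can work directly on the template, without creating any resources. The minimal Python sketch below is not the logic of any particular product: it scans a parsed CloudFormation template (a dict, as `json.loads` produces from a JSON template) and flags `AWS::S3::Bucket` resources that lack the `LoggingConfiguration` property, which is how server access logging is declared. The function name and the sample template are hypothetical.

```python
import json
from typing import List


def find_buckets_without_logging(template: dict) -> List[str]:
    """Return logical IDs of S3 buckets that declare no server access logging."""
    findings = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::S3::Bucket":
            continue
        props = resource.get("Properties", {})
        # Server access logging is configured through the LoggingConfiguration property.
        if "LoggingConfiguration" not in props:
            findings.append(logical_id)
    return findings


# Hypothetical template under review.
BUCKET_TEMPLATE = json.loads("""
{
  "Resources": {
    "LogsBucket": {"Type": "AWS::S3::Bucket"},
    "AppBucket": {
      "Type": "AWS::S3::Bucket",
      "Properties": {
        "LoggingConfiguration": {
          "DestinationBucketName": {"Ref": "LogsBucket"},
          "LogFilePrefix": "app/"
        }
      }
    },
    "ReportsBucket": {"Type": "AWS::S3::Bucket", "Properties": {}}
  }
}
""")

if __name__ == "__main__":
    print(find_buckets_without_logging(BUCKET_TEMPLATE))
    # ['LogsBucket', 'ReportsBucket']
```

In a CI/CD pipeline, a non-empty result could fail the build before anything is deployed. A real rule set would likely exempt designated log-destination buckets such as `LogsBucket`, and YAML templates that use short-form intrinsics like `!Ref` need a CloudFormation-aware parser rather than a plain YAML loader.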
A Giant Step to the Left
DevOps has brought a “fail fast, fail often” methodology to the masses, which has helped teams innovate and move faster than ever. As appealing as that sounds, a lack of quality can be hard to explain when a critical failure is discovered, such as a data leak from an unencrypted Amazon S3 bucket.
Ideally, you would have guardrails as far left as possible in the CI/CD pipeline, right in the developers’ hands. Leading cloud builders use these automated, preventative measures before code is deployed to ensure security and compliance. Here are some examples of common and easily missed misconfigurations:
- Allowing public access to Amazon S3 buckets that are storing sensitive data
- Opening too many TCP ports within Amazon EC2 security groups
- Allowing unrestricted access through Azure Network Security Groups (NSG)
- Permitting malicious behavior in Azure SQL Database
- Granting permissions to the wrong IAM users and roles
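The security group item in this list lends itself to a static check. As a hedged sketch covering only Amazon EC2 security groups, the Python below inspects the `SecurityGroupIngress` rules of `AWS::EC2::SecurityGroup` resources in a parsed CloudFormation template and reports rules that open ports to the whole internet (`0.0.0.0/0` or `::/0`) outside an allowlist. The allowlist of ports 80 and 443, the function name, and the finding format are assumptions for illustration.

```python
from typing import List, Tuple

# Sources that mean "anyone on the internet".
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}
# Assumption: only HTTP and HTTPS may be exposed publicly.
ALLOWED_PUBLIC_PORTS = {80, 443}


def open_ingress_findings(template: dict) -> List[Tuple[str, str]]:
    """Return (logical ID, description) for ingress rules open to the internet."""
    findings = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::EC2::SecurityGroup":
            continue
        rules = resource.get("Properties", {}).get("SecurityGroupIngress", [])
        for rule in rules:
            sources = {rule.get("CidrIp"), rule.get("CidrIpv6")}
            if not sources & OPEN_CIDRS:
                continue  # restricted to specific networks
            protocol = str(rule.get("IpProtocol"))
            if protocol == "-1":
                # "-1" means all protocols, and therefore every port.
                findings.append((logical_id, "all traffic open to the internet"))
                continue
            # Assumes literal port numbers rather than Ref or other intrinsic functions.
            low, high = int(rule["FromPort"]), int(rule["ToPort"])
            if not set(range(low, high + 1)) <= ALLOWED_PUBLIC_PORTS:
                findings.append((logical_id, f"{protocol} {low}-{high} open to the internet"))
    return findings


# Hypothetical template under review.
SG_TEMPLATE = {
    "Resources": {
        "WebSG": {
            "Type": "AWS::EC2::SecurityGroup",
            "Properties": {
                "GroupDescription": "Web tier",
                "SecurityGroupIngress": [
                    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443, "CidrIp": "0.0.0.0/0"},
                    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22, "CidrIp": "0.0.0.0/0"},
                    {"IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432, "CidrIp": "10.0.0.0/16"},
                ],
            },
        }
    }
}

if __name__ == "__main__":
    print(open_ingress_findings(SG_TEMPLATE))
    # [('WebSG', 'tcp 22-22 open to the internet')]
```

Public HTTPS is allowed, SSH open to the world is flagged, and the database port is ignored because it is limited to a private range. Equivalent checks for Azure Network Security Groups would need their own logic against Azure templates.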
To enable full confidence that security vulnerabilities, cloud resource leaks, and performance and reliability issues won’t make it into production, you need a solution that can:
- Predict whether an incident is likely and provide remediation early in development, resolving multiple concerns before they even occur
- Check your workloads against rules before deploying them live to your cloud infrastructure. Each resource should be checked against hundreds of industry best practices and standards, including the AWS Well-Architected Framework, CIS Microsoft Azure Foundations Benchmark, ISO 27001, HIPAA, PCI DSS, and GDPR
Shifting operational excellence, security, governance, and compliance checks to the earliest phase of the CI/CD pipeline enables automated, proactive prevention of misconfigurations. What’s more, the same checks, along with self-healing, can also be performed in live cloud environments. Whenever you scan for alignment with best practices, you give your organization peace of mind that it is building great architecture.
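In a live environment, self-healing usually means a function that reacts to a change event and applies the fix. Below is a minimal sketch of an AWS Lambda handler in Python for the S3 server access logging scenario mentioned earlier in this section. Several assumptions apply: the event is a CloudTrail `CreateBucket` API call delivered through Amazon EventBridge (so the bucket name sits at `detail.requestParameters.bucketName`), the destination bucket name `example-access-logs` and the `ACCESS_LOG_BUCKET` environment variable are hypothetical, that bucket already exists and lets S3 deliver logs to it, and the function’s role has `s3:GetBucketLogging` and `s3:PutBucketLogging` permissions.

```python
import os

# Hypothetical destination bucket for access logs; it must already exist and
# allow the S3 logging service to write to it.
LOG_BUCKET = os.environ.get("ACCESS_LOG_BUCKET", "example-access-logs")


def enable_access_logging(s3, bucket: str, log_bucket: str = LOG_BUCKET) -> bool:
    """Enable server access logging on `bucket`; return True if a change was made."""
    if bucket == log_bucket:
        return False  # do not make the log bucket log to itself
    current = s3.get_bucket_logging(Bucket=bucket)
    if "LoggingEnabled" in current:
        return False  # already compliant, nothing to heal
    s3.put_bucket_logging(
        Bucket=bucket,
        BucketLoggingStatus={
            "LoggingEnabled": {"TargetBucket": log_bucket, "TargetPrefix": f"{bucket}/"}
        },
    )
    return True


def lambda_handler(event, context, s3=None):
    """Entry point, assuming a CloudTrail CreateBucket event routed by EventBridge."""
    if s3 is None:
        import boto3  # imported lazily so the logic can be tested with a stand-in client
        s3 = boto3.client("s3")
    bucket = event["detail"]["requestParameters"]["bucketName"]
    return {"bucket": bucket, "remediated": enable_access_logging(s3, bucket)}
```

The handler checks before it writes, so running it twice on the same bucket is harmless. Keeping the remediation logic in a plain function also lets the same rule be unit-tested in the pipeline with a fake client before it ever runs against a live account.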
Too Many Cooks in the Kitchen
One of the biggest challenges in modern software development is that every deployment is dependent on multiple teams. Developers, operations, infrastructure engineers, and business units all have a role to play in ensuring that an application is delivered successfully. Getting alignment from all of these different teams can be tough. Regardless of your team’s structure, working towards operational excellence will help overcome the challenge.
Rather than being a burden, operational excellence can serve as a cultural goal that is shared by all teams and team members during the software development and deployment process. By transforming operational excellence into a culture, your teams can have an overarching goal to strive towards, which is important when working with cross-functional teams. A culture of operational excellence helps to set a standard of best practices, continuous improvement, and collective pride in what the team is building and deploying, ultimately contributing to the success of the business.[2]
Times Are Changing…Are You?
Cloud service providers are constantly coming out with new services and best practices. Even if your accounts were completely optimized, reliable, efficient, and secure a few weeks ago, there’s no guarantee they are today or tomorrow.
How valuable would it be to have comprehensive visibility into your infrastructure and to ensure it automatically adheres to best practices, security, and compliance requirements? With this information, you can continue to evolve your cloud infrastructure while consistently building great architecture, ultimately fostering innovation and laying the foundations for business success in your organization.
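Part of that visibility is noticing when a resource that was once compliant no longer matches what was approved. A minimal way to express this is to compare the desired settings declared in code with the current settings reported by your provider. The Python sketch below does that for flat settings dictionaries; the setting names are hypothetical, and nested values are compared as whole values. For stacks it manages, CloudFormation also offers built-in drift detection.

```python
from typing import Any, Dict, Tuple


def detect_drift(desired: Dict[str, Any], current: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {setting: (desired value, current value)} for every setting that differs.

    A setting missing on one side is reported with None on that side.
    """
    drift = {}
    for key in sorted(desired.keys() | current.keys()):
        if desired.get(key) != current.get(key):
            drift[key] = (desired.get(key), current.get(key))
    return drift


# Hypothetical settings for one S3 bucket: what the code declares vs. what is live.
desired = {"versioning": "Enabled", "block_public_access": True, "access_logging": True}
current = {"versioning": "Enabled", "block_public_access": False, "access_logging": True}

if __name__ == "__main__":
    print(detect_drift(desired, current))
    # {'block_public_access': (True, False)}
```

Run on a schedule or on configuration-change events, a check like this turns “compliant a few weeks ago” into a continuously verified state.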
Operational excellence is a combination of processes and continuous improvement to ensure your infrastructure remains secure, reliable, efficient, and cost-effective. Every operational event and failure should be treated as an opportunity to improve your architecture. For developers and IT teams, this can seem like a daunting task, but with a culture of operational excellence, you may find teams are up for the challenge.
Now What?
Enabling cloud operational excellence to support your business’s innovation goals relies on finding a solution that has:
- Multi-cloud visibility for a real-time view of security, compliance, and governance within your cloud infrastructure
- Hundreds of automated checks with self-healing, based on cloud service providers’ well-architected frameworks, the latest best practices, and industry compliance requirements, to reduce risk
- Reporting features that can run reports on a wide range of filter combinations to audit your infrastructure thoroughly
- Seamless integration into your CI/CD pipeline and existing workflows through APIs, along with deep integration into your live public cloud environments
- Template scanners used during the coding process to ensure your teams build well-architected infrastructure, for automated, proactive prevention of vulnerabilities
Trend Micro Cloud One™ – Conformity provides continuous security, compliance, and governance in a SaaS platform designed to help you manage misconfigurations of cloud resources in a multi-cloud environment. Conformity helps cloud builders be confident that their cloud infrastructure is configured correctly and compliant as they grow and scale their business.
References:
1. Fitzsimons, P., B. C., Steele, J., & King, R. (2018). Amazon Web Services – Operational Excellence AWS Well-Architected Framework. Retrieved from https://d0.awsstatic.com/whitepapers/architecture/AWS-Operational-Excellence-Pillar.pdf?ref=wellarchitected-wp
2. Tozzi, C. (2019, November 19). Operational Excellence and the Success of Software Deployments. Retrieved from https://devops.com/operational-excellence-and-the-success-of-software-deployments/