Guardrails to Avoid Cloud Misconfigurations
The stakes and opportunities are higher than ever to ensure that strong operational excellence strategies are implemented. Explore how to help ensure you are holding up your end of the bargain as part of your CSP’s Shared Responsibility Model.
Building the Foundation of Great Architecture
There is no shortage of benefits when it comes to the cloud, and your teams are taking notice. Your organisation is racing to make the shift and capitalise on those advantages. However, you need to take a step back to ensure operational excellence is a priority.
When it comes to cloud operational excellence, some jump to the assumption that it doesn’t require the same attention as traditional on-premises environments. But the truth is, there are many aspects that need to be considered to achieve this type of excellence. If anything, the stakes and opportunities are higher than ever to ensure that strong operational excellence strategies are implemented. This is especially true when it comes to partnering with cloud service providers (CSP) and ensuring you are holding up your end of the bargain as part of your CSP’s Shared Responsibility Model.
A Cloud Configuration Framework for Simplicity & Breach Protection
The first step in the journey to operational excellence is ensuring cloud builders are following best-practice architectures, like the Amazon Web Services (AWS) Well-Architected Framework and the Microsoft® Azure™ Well-Architected Framework. These frameworks were developed to help cloud architects and developers build secure, high-performing, resilient, and efficient infrastructure for their applications.
Operational excellence is a key theme in both frameworks. It focuses on keeping a system running in production and on providing a consistent approach to evaluating architectures and implementing designs that will scale over time. Aligning your architecture and workloads with engineering best practices and standards is what makes them truly operationally excellent.[1] These frameworks provide a foundation for businesses to build in the cloud more effectively and deliver greater business value. Let’s dive into some of the ways operational excellence can help to build architecture that enables business success.
Rest Easy with Operational Guardrails
When it is time for your company to organise Cloud Centres of Excellence and implement shared services across your cloud environments, you will want to ensure best practices are consistently enforced. These operational guardrails move the organisation towards operational excellence, ensuring standard functions occur predictably and consistently across teams. With these controls in place, you will have confidence that:
- Critical data stored in the cloud is protected by automatic enforcement
- Network access policies and security groups are always properly configured to minimise unrestricted access
- Identity and access management permissions are defined for controlled access
Automatic operational controls ensure rules for these shared services are enforced at scale and follow best practices, external regulatory requirements, and your organisation’s internal governance. Now, take a deep breath and rest easy knowing your organisation is far less likely to be in tomorrow’s headline for the latest security breach.
Did Someone Say Automation?
If you want to leverage the agility of the cloud or realise the cost savings typically associated with cloud adoption, automation must reign supreme. Even the most skilled, dedicated, and experienced developer makes errors; it’s just human nature.
Treating your operations as code, such as scripting your runbook and playbook activities, reduces the risk of human error, but introduces different risks to operational excellence. Developers often find themselves in high-pressure scenarios, forced to meet deadlines and deliver something that works, even if they know that they are not following coding best practices. As an example, in a rush to meet a deliverable, your IT team may decide not to configure granular IAM permissions for a virtual server. Granular permissions using IAM roles provide an additional level of protection by ensuring that your infrastructure is aware of its users, so it enforces fine-grained permissions on what they can do. Without the proper configuration, the organisation could easily suffer a devastating security breach. The bottom line is, it’s important to ensure best practices are followed across the development process, even on the tightest of timelines.
Automation helps you get the most out of your cloud infrastructure through capabilities like auto-scaling, self-healing, deployment scripts, customised reporting, and more. Operations as code allows architects and DevOps engineers to version the application infrastructure just as developers version the application code. Building and operating architecture that maximises efficiency and is highly responsive will free your teams to build applications to support business goals.
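To make the IAM example concrete, here is a minimal sketch of how a rushed, wide-open policy could be told apart from a granular one. The two policy documents, the bucket name, and the helper names are assumptions made up for illustration; only the policy document layout (Version, Statement, Effect, Action, Resource) follows the standard IAM JSON format.

```python
# Hypothetical policy documents for illustration; the bucket name is made up.
BROAD_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}],
}

GRANULAR_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-app-bucket/*",
        }
    ],
}


def _as_list(value):
    """IAM accepts either a single value or a list for these elements."""
    return value if isinstance(value, list) else [value]


def overly_broad_statements(policy):
    """Return the Allow statements that grant every action or every resource."""
    flagged = []
    for statement in _as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action", []))
        resources = _as_list(statement.get("Resource", []))
        if "*" in actions or "*" in resources:
            flagged.append(statement)
    return flagged


if __name__ == "__main__":
    for name, policy in (("broad", BROAD_POLICY), ("granular", GRANULAR_POLICY)):
        print(name, "flagged statements:", len(overly_broad_statements(policy)))
```

A check like this only catches a bare wildcard. Partial wildcards such as `s3:*` would need additional rules, so treat it as a starting point rather than a complete least-privilege audit.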
Infrastructure as Code = Fast Innovation
As discussed, the increasing preference for automation, alongside the accelerated adoption of cloud computing and CI/CD practices, means infrastructure is now designed, deployed, and configured in an entirely new way. Needless to say, the cloud is your oyster and you can achieve almost anything you wish.
In the cloud, you can:
- Apply the same engineering discipline that you use for application code to your entire cloud environment
- Define your entire workload as code and update it with code
- Script your operations’ procedures and automate their execution by triggering them in response to events
Another way to increase your use of automation is with Infrastructure as Code (IaC). This entails provisioning and managing cloud resources and infrastructure through structured, machine-readable files. Automation tools that work from templates, like AWS CloudFormation or Terraform, are a great way to do this. CloudFormation can be used to create and provision cloud infrastructure resources, such as Amazon Elastic Compute Cloud (Amazon EC2) instances, with a simple text file. This text file describes a collection, or stack, of AWS resources to be deployed and configured together.
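As a rough illustration of what such a text file contains, the sketch below builds a minimal CloudFormation-style template in Python and prints it as JSON. The logical ID `WebServer`, the placeholder image ID, and the default instance type are assumptions chosen for the example. A real template would use an image ID valid in your region and would usually declare networking and security resources as well.

```python
import json


def build_template(image_id, instance_type="t3.micro"):
    """Return a minimal template describing a stack with one EC2 instance."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Minimal illustrative stack with a single EC2 instance",
        "Resources": {
            # "WebServer" is a hypothetical logical ID chosen for this sketch.
            "WebServer": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": image_id,
                    "InstanceType": instance_type,
                },
            }
        },
    }


if __name__ == "__main__":
    # "ami-EXAMPLE" is a placeholder, not a real image ID.
    print(json.dumps(build_template("ami-EXAMPLE"), indent=2))
```

Because the template is plain data, it can be stored in version control, reviewed, and tested like any other code before it is ever deployed.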
The business benefit of using IaC is its consistency, speed, and the lower costs for projects to be created and deployed. This advanced and efficient infrastructure deployment method means critical changes on your cloud environments can be completed quicker than ever. So, what’s the catch?
Unfortunately, security, compliance, and performance issues can be introduced just as easily. To instil more confidence in using IaC, there are solutions that test your CloudFormation templates before deployment, so only the cleanest and most secure templates make it to your environments. Thus, potentially damaging changes can be easily inspected or rolled back. For example, if an Amazon Simple Storage Service (Amazon S3) bucket is created without server access logging enabled, an AWS Lambda function could be triggered to automatically implement the best practice. Checking the quality of your CloudFormation stack, and finding improvements, without needing to execute the code first is extremely valuable for cloud builders.
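A pre-deployment check of this kind can be sketched in a few lines. The example below assumes the template has already been parsed into a Python dictionary and looks for S3 buckets that do not declare a `LoggingConfiguration` property. The function name, the sample template, and the bucket names are hypothetical, and this is not how any particular scanning product works internally.

```python
def buckets_without_access_logging(template):
    """Return logical IDs of S3 buckets that declare no LoggingConfiguration."""
    missing = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::S3::Bucket":
            continue
        properties = resource.get("Properties", {})
        if "LoggingConfiguration" not in properties:
            missing.append(logical_id)
    return missing


# Hypothetical template used only to exercise the check.
EXAMPLE_TEMPLATE = {
    "Resources": {
        "AuditedBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {
                "LoggingConfiguration": {
                    "DestinationBucketName": "example-log-archive",
                    "LogFilePrefix": "access-logs/",
                }
            },
        },
        "DataBucket": {"Type": "AWS::S3::Bucket"},
    }
}

if __name__ == "__main__":
    for logical_id in buckets_without_access_logging(EXAMPLE_TEMPLATE):
        print(f"{logical_id}: server access logging is not configured")
```

Nothing is deployed or executed here. The check reads the template as data, which is what makes it cheap to run on every change.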
A Giant Step to the Left
DevOps has brought a methodology of “fail fast, fail often” to the masses, which has helped teams innovate and move faster than ever. While this may seem great, a lack of quality can be hard to explain when a critical failure is discovered, such as an unencrypted Amazon S3 bucket, resulting in a data leak.
Ideally, you would have guardrails as far left as possible in the CI/CD pipeline—right into the developers’ hands. Leading cloud builders are using these automated, preventative measures before code is deployed to ensure security and compliance. Here are some examples of common and easily missed misconfigurations:
- Allowing public access to Amazon S3 buckets that are storing sensitive data
- Opening too many TCP ports within Amazon EC2 security groups
- Allowing unrestricted access through Azure Network Security Groups (NSG)
- Permitting malicious behaviour in Azure SQL Database
- Granting permissions to wrong IAM users and roles
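Two of the AWS items in this list can be detected directly from a template. The sketch below is one possible approach, assuming a CloudFormation template parsed into a Python dictionary with JSON-style booleans. The function names and the sample template are made up for illustration. Note that a bucket flagged by the second check is not necessarily public; it simply does not explicitly turn on all four public access block settings in the template.

```python
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}
PUBLIC_ACCESS_FLAGS = (
    "BlockPublicAcls",
    "BlockPublicPolicy",
    "IgnorePublicAcls",
    "RestrictPublicBuckets",
)


def open_ingress_rules(template):
    """Return (logical ID, from port, to port) for rules open to any address."""
    findings = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::EC2::SecurityGroup":
            continue
        rules = resource.get("Properties", {}).get("SecurityGroupIngress", [])
        for rule in rules:
            if rule.get("CidrIp") in OPEN_CIDRS or rule.get("CidrIpv6") in OPEN_CIDRS:
                findings.append((logical_id, rule.get("FromPort"), rule.get("ToPort")))
    return findings


def buckets_without_full_public_access_block(template):
    """Return logical IDs of buckets that do not enable all four block settings."""
    findings = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::S3::Bucket":
            continue
        block = resource.get("Properties", {}).get("PublicAccessBlockConfiguration", {})
        if not all(block.get(flag) is True for flag in PUBLIC_ACCESS_FLAGS):
            findings.append(logical_id)
    return findings


# Hypothetical template used only to exercise the two checks.
EXAMPLE_TEMPLATE = {
    "Resources": {
        "WebSecurityGroup": {
            "Type": "AWS::EC2::SecurityGroup",
            "Properties": {
                "GroupDescription": "Example group",
                "SecurityGroupIngress": [
                    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22, "CidrIp": "0.0.0.0/0"},
                    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443, "CidrIp": "10.0.0.0/16"},
                ],
            },
        },
        "ReportsBucket": {"Type": "AWS::S3::Bucket"},
    }
}

if __name__ == "__main__":
    print(open_ingress_rules(EXAMPLE_TEMPLATE))
    print(buckets_without_full_public_access_block(EXAMPLE_TEMPLATE))
```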
To enable full confidence that security vulnerabilities, cloud resource leaks, and performance and reliability issues won’t make it into production, you need a solution that can:
- Predict if an incident will happen and then provide remediation early in development—resolving multiple concerns before they even occur
- Check your workloads against rules before deploying them live to your cloud infrastructure. Each resource should be checked against hundreds of industry best practices and standards, including the AWS Well-Architected Framework, CIS Microsoft Azure Foundations Security Benchmark, ISO 27001, HIPAA, PCI DSS, and GDPR
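The idea of checking each resource against a set of rules before deployment can be sketched as a small rule engine whose result becomes a pipeline exit code. Everything named here (the `Rule` class, the rule identifiers, the two sample rules) is an assumption for illustration and does not correspond to any real rule catalogue. A production rule set would be far larger and mapped to the standards listed above.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Rule:
    rule_id: str                    # hypothetical identifier, not a real rule ID
    resource_type: str              # CloudFormation resource type the rule covers
    description: str
    passes: Callable[[dict], bool]  # receives the resource's Properties


RULES = (
    Rule("S3-ENCRYPTION", "AWS::S3::Bucket",
         "bucket should declare default encryption",
         lambda props: "BucketEncryption" in props),
    Rule("EBS-ENCRYPTION", "AWS::EC2::Volume",
         "volume should be encrypted",
         lambda props: props.get("Encrypted") is True),
)


def evaluate(template: dict, rules: Sequence[Rule] = RULES) -> List[str]:
    """Return one message per resource that fails an applicable rule."""
    failures = []
    for logical_id, resource in template.get("Resources", {}).items():
        properties = resource.get("Properties", {})
        for rule in rules:
            if rule.resource_type == resource.get("Type") and not rule.passes(properties):
                failures.append(f"{rule.rule_id}: {logical_id}: {rule.description}")
    return failures


def exit_code(failures: List[str]) -> int:
    """Non-zero exit codes are what typically fail a CI/CD pipeline stage."""
    return 1 if failures else 0


if __name__ == "__main__":
    import sys

    sample = {"Resources": {"DataBucket": {"Type": "AWS::S3::Bucket", "Properties": {}}}}
    problems = evaluate(sample)
    for line in problems:
        print(line)
    sys.exit(exit_code(problems))
```

Keeping rules as data separate from the evaluation loop means new checks can be added without touching the pipeline step that runs them.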
Shifting operational excellence, security, governance, and compliance checking to the earliest phase of the CI/CD pipeline enables automated, proactive prevention of misconfigurations. What’s more, these same checks and self-healing can also be performed in live cloud environments. Regardless of when you scan your code to check for alignment with best practices, give your organisation peace of mind that they are building great architecture.
Too Many Cooks in the Kitchen
One of the biggest challenges in modern software development is that every deployment is dependent on multiple teams. Developers, operations, infrastructure engineers, and business units all have a role to play in ensuring that an application is delivered successfully. Getting alignment from all of these different teams can be tough. Regardless of your team’s structure, working towards operational excellence will help overcome the challenge.
Rather than being a burden, operational excellence can serve as a cultural goal that is shared by all teams and team members during the software development and deployment process. By transforming operational excellence into a culture, your teams can have an overarching goal to strive towards, which is important when working with cross-functional teams. A culture of operational excellence helps to set a standard of best practices, continuous improvement, and collective pride in what the team is building and deploying, ultimately contributing to the success of the business.[2]
Times Are Changing… Are You?
Cloud service providers are constantly coming out with new services and best practices. Even if your accounts were completely optimised, reliable, efficient, and secure a few weeks ago, there’s no guarantee they are today or tomorrow.
How valuable would it be to have comprehensive visibility of your infrastructure and automatically adhere to best practices, security, and compliance? With this information, you can continue to evolve your cloud infrastructure, while continually building great architecture. Ultimately, this helps to foster innovation and lay the foundations for business success in your organisation.
Operational excellence is a combination of processes and continuous improvement to ensure your infrastructure remains secure, reliable, efficient, and cost effective. Every operational event and failure should be treated as an opportunity to improve your architecture. For developers and IT teams, this can seem like a daunting task, but with a culture of operational excellence, you may find teams are up for the challenge.
Now What?
Enabling cloud operational excellence to support your business’s innovation goals relies on finding a solution that has:
- Multi-cloud visibility for a real-time view of security, compliance, and governance within your cloud infrastructure
- Hundreds of automated checks with self-healing, based on your cloud service provider’s well-architected framework, the latest best practices, and industry compliance requirements, to reduce risk
- Reporting features that can run reports on an endless combination of filters to exhaustively audit your infrastructure
- Seamless integration into your CI/CD pipeline and existing workflows through APIs, enabling the ability to have deep and intuitive integration into your live public cloud environments
- Template scanners that are used during the coding process to ensure your teams are building well-architected infrastructure, for automated, proactive prevention of vulnerabilities
Trend Micro Cloud One™ – Conformity provides continuous security, compliance, and governance in a SaaS platform, designed to help you manage misconfigurations of cloud resources in a multi-cloud environment. Conformity helps cloud builders have the confidence their cloud infrastructure is configured and compliant to grow and scale their business.
References:
1. Fitzsimons, P., B. C., Steele, J., & King, R. (2018). Amazon Web Services – Operational Excellence AWS Well-Architected Framework. Retrieved from https://d0.awsstatic.com/whitepapers/architecture/AWS-Operational-Excellence-Pillar.pdf?ref=wellarchitected-wp
2. Tozzi, C. (2019, November 19). Operational Excellence and the Success of Software Deployments. Retrieved from https://devops.com/operational-excellence-and-the-success-of-software-deployments/