There are many security benefits to immutable infrastructure (II). In the previous article, we took a high-level look at some of the potential advantages for software development organizations that implement it. Immutability is an important element in improving scalability, business outcomes, and, most importantly, security posture.
Now in part 2, we’ll explore some practical aspects of II. As a concept, immutable infrastructure is straightforward. As a practice, it’s sometimes not so easy. Engineering leadership can perform high-level planning, but going beyond that usually requires organizational buy-in and a detail-oriented technical implementation.
In this article, we’ll discuss some concrete steps and implementation patterns. We’ll see how to use immutability to significantly improve security by:

- Eliminating persistent remote access to production nodes
- Packaging changes as immutable build artifacts
- Replacing long-lived servers instead of patching them in place
Problem 1: Persistent Remote Access

Remote access mechanisms represent one of the most commonly exploited attack vectors in software infrastructure. Limiting the attack surface and scope of these mechanisms should be an important focus for software and ops teams. Fortunately, teams have a wide variety of tools and architectural patterns available to implement immutability and reduce their reliance on legacy access patterns.
Traditional server fleets often allow direct remote access via Secure Shell (SSH), Remote Desktop (RDP), or older protocols like Telnet. In legacy, on-premises network architectures, these protocols could be operated with relative safety; WAN or internet access to the network was strictly controlled via firewalls and NAT gateways, and no node could be directly accessed without passing through the firewall. In modern cloud-based architecture, however, nearly every node has or is capable of direct internet access, making remote access to that node a much bigger security liability.
In modern distributed systems, interactive (human) users often access nodes directly via remote access protocols to make out-of-band configuration changes or perform maintenance tasks. Legacy configuration management tools like Chef, Puppet, and Ansible make managing large fleets more scalable, but they still depend on broad, persistent access to every node, whether over SSH or through installed agents. Even with these tools, configuration drift is inevitable, and in the context of access management, this is especially dangerous.
How to Solve It with an Immutable Pattern
Instead of depending on the persistent access required by configuration management tools, why not have all of the required configuration completed before the server launches? Tools like HashiCorp’s Packer make that possible.
Packer has an extensive list of features and capabilities, but at a high level, it is used to generate pre-configured machine images for a variety of cloud and compute orchestration platforms. Users can define numerous configuration options for the installation of OS packages, files, and data. Packer will then launch an ephemeral node, perform all the pre-defined configuration tasks against that node as if it were a live environment, and then capture the entire state of the machine in a “golden image” format that the platform utilizes, such as an Amazon Machine Image. This image can then be used to spin up pre-configured nodes en masse.
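To make this concrete, here is a minimal sketch of what such a template might look like in Packer’s HCL2 format. The image name, region, instance type, and provisioning steps are all illustrative placeholders, not a prescription:

```hcl
# Minimal, illustrative Packer (HCL2) template for building a "golden" AMI.
packer {
  required_plugins {
    amazon = {
      source  = "github.com/hashicorp/amazon"
      version = ">= 1.0.0"
    }
  }
}

locals {
  # Timestamp suffix keeps image names unique across builds.
  timestamp = regex_replace(timestamp(), "[- TZ:]", "")
}

source "amazon-ebs" "web" {
  ami_name      = "web-golden-${local.timestamp}" # hypothetical naming scheme
  instance_type = "t3.micro"
  region        = "us-east-1"
  ssh_username  = "ubuntu"

  source_ami_filter {
    filters = {
      name                = "ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"
      root-device-type    = "ebs"
      virtualization-type = "hvm"
    }
    most_recent = true
    owners      = ["099720109477"] # Canonical's AWS account
  }
}

build {
  sources = ["source.amazon-ebs.web"]

  # All configuration happens here, at image-build time, on an ephemeral
  # instance -- never on live production nodes.
  provisioner "shell" {
    inline = [
      "sudo apt-get update",
      "sudo apt-get install -y nginx",
    ]
  }
}
```

Running `packer build` against a template like this launches a temporary instance, applies the provisioning steps, snapshots the result as an AMI, and terminates the instance.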
Packer brings numerous benefits in terms of improved security and reduced complexity, and the pattern it provides is a fundamental component of immutable infrastructure. Engineering teams can now reduce their reliance on configuration management tools that require persistent remote access; instead, they can roll out configuration changes by launching entirely new nodes with new images. Teams can also use Packer in automation, like CI/CD, to further streamline the process and provide additional testing guardrails.
For organizations where direct node access is still a requirement (think Rails or PHP consoles), a different pattern can be used that still preserves immutability. Administrative management functions should be run as one-off processes from a dedicated proxy node. Each node’s configuration then only needs to allow access from that single proxy, and teams can restrict network traffic to and from the proxy using tooling that’s commonly available on nearly all major cloud platforms.
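As a hedged sketch of what that restriction might look like in Terraform on AWS (resource and variable names are hypothetical), application nodes can reference the proxy’s security group as the only permitted SSH source:

```hcl
# Illustrative Terraform: application nodes accept SSH only from the proxy.
variable "vpc_id" {
  type = string
}

resource "aws_security_group" "proxy" {
  name   = "admin-proxy-sg" # hypothetical name
  vpc_id = var.vpc_id
  # Operator-facing ingress rules (VPN, SSM, etc.) omitted for brevity.
}

resource "aws_security_group" "app" {
  name   = "app-node-sg"
  vpc_id = var.vpc_id

  ingress {
    description     = "SSH from the admin proxy only"
    from_port       = 22
    to_port         = 22
    protocol        = "tcp"
    security_groups = [aws_security_group.proxy.id]
  }
}
```

Because the rule references the proxy’s security group rather than an IP range, the restriction survives proxy replacement, which fits the immutable pattern of rebuilding rather than patching.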
Problem 2: Dependency Management and Environment Drift

Modern development environments for even simple web-based applications are surprisingly complex. Maintaining homogeneous environments across local development, testing, staging, and production is one of the key drivers of development and deployment velocity. Using immutable principles can help drive this homogeneity by simplifying dependency management.
Returning to the web-based example, imagine a hypothetical environment where a JavaScript-based application is deployed to a standard compute node like an EC2 or Google Compute Engine instance. The JavaScript ecosystem has an incredibly diverse selection of libraries, build and compile tooling, and middleware. A developer may need to install tens or hundreds of versioned dependencies, as well as a variety of build and middleware tools, just to assemble a functional development environment. Once work on a feature or code change is complete, the developer needs to deploy that change into its destination environment (typically the production stack). The change depends on a fixed, point-in-time snapshot of all the dependencies and tooling used to build it.
What happens if a developer’s carefully crafted changeset doesn’t work in the test environment? They might change a few dependencies or a build tool version. That might fix the tests, but what happens if that same change breaks in production or, worse, causes an outage? Identifying the actual issue becomes much more difficult with multiple changes layered on top of each other, measurably slowing deployment velocity.
How to Solve It with an Immutable Pattern
The ultimate goal is to create an immutable build artifact: a fully encapsulated unit containing the code change along with every dependency and tool needed for a successful deployment. Containers are an ideal medium for this approach.
A container build artifact is tagged with a unique identifier, such as the hash of the version-control commit that produced it. Build and deploy stages are strictly separated; if additional changes need to be made, they start from a fresh build stage and produce a new artifact. If tests fail or the change breaks in any environment, a new build is created. This strict separation prevents changes from leaking back into previous environments, avoiding the layered-change complexity discussed in the previous section.
Using AWS as an example, teams can use ECR to store tagged images, accessible to other teams, which the shared build and deployment automation can then pull. If the artifact passes all tests, it’s deployed to a managed container environment like ECS or EKS. With this build-and-deploy pattern, it’s easy to see at a glance which versions are deployed and which are available.
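A sketch of the build stage might look like the following shell commands; the account ID and repository name are placeholders, and the runner is assumed to be already authenticated to the registry:

```sh
# Illustrative CI build stage: the commit hash uniquely identifies the artifact.
# Assumes prior registry login (e.g., via "aws ecr get-login-password").
GIT_SHA=$(git rev-parse --short HEAD)
REPO="123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp" # placeholder registry

docker build -t "${REPO}:${GIT_SHA}" .
docker push "${REPO}:${GIT_SHA}"

# If tests fail downstream, nothing is patched in place; a new commit
# produces a new hash and a wholly new artifact.
```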
Problem 3: Long-Lived, Hand-Managed Servers

In traditional server fleets, servers are typically long-lived and treated like pets. They have unique and often inconsistent naming, and they accumulate layer upon layer of changes, tweaks, and hacks to accommodate new deployments, tooling, security patches, and other changes. Over time, it becomes nearly impossible to maintain a shared understanding of what the canonically correct configuration of these servers should be.
As long as the production environment is technically “working,” this problem will probably remain unaddressed. However, there may come a time when engineers need to deploy a critical security fix to an application dependency or an OS package. Because package versions and configurations likely differ from server to server, there is no easy or clear path to centrally manage and audit an upgrade. Things can easily break, and an outage is almost guaranteed.
These types of environments tend to breed a fear of deploys, and fewer deploys create a vicious, devolving cycle of DevOps “badness”: updates, including fixes for security flaws, get postponed, resulting in a dangerous degradation of security posture and, quite possibly, a compromise. Fortunately, the principle of immutable infrastructure can again provide a solution.
How to Solve It with an Immutable Pattern
Tools like Packer again prove valuable here. Combined with CI/CD and infrastructure testing, Packer helps improve deployment outcomes and meet security objectives.
Like any infrastructure-as-code tool, Packer lets engineers store its configuration in version control. Combined with CI/CD and automated tagging, this helps teams create a canonical, centrally shared definition of production servers and workloads. If any engineer or stakeholder wants to understand what is currently running in production, they simply correlate the tag of a running server with the matching code change and the resulting build and deploy run.
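For example, a CI pipeline might pass the commit hash into the Packer build so the resulting image, and every server launched from it, carries a traceable tag. The variable name and template path here are hypothetical:

```sh
# Hypothetical CI step: stamp the image with the commit that produced it.
# Assumes "packer init" has already installed the required plugins.
GIT_SHA=$(git rev-parse --short HEAD)
packer build -var "git_sha=${GIT_SHA}" ./web.pkr.hcl
```

Inside the template, a declared `git_sha` variable would feed the image name and tags, giving every AMI a direct link back to version control.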
If a dependency or OS package needs updating, a new server image is built, tested against current application versions, and deployed as a replacement for existing servers. Infrastructure-as-code tooling like Terraform enables a “nuke and pave” approach: tearing down existing infrastructure and replacing it with entirely new servers built from the new configuration. This pattern gives engineering teams more confidence in the safety of their deployments and changes. More confidence leads to more deploys, a virtuous cycle that improves security posture; deployment frequency is also an established metric for gauging the relative success of a software engineering organization.
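As a rough Terraform sketch of that replacement pattern (fleet sizes, names, and the `ami_id` wiring are all assumptions), pointing an autoscaling group’s launch template at a new AMI triggers a rolling replacement of every node:

```hcl
# Illustrative "nuke and pave": a new AMI ID rolls the whole fleet.
variable "ami_id" {
  type        = string
  description = "New golden image produced by the Packer build"
}

variable "subnet_ids" {
  type = list(string)
}

resource "aws_launch_template" "web" {
  name_prefix   = "web-"
  image_id      = var.ami_id
  instance_type = "t3.micro"
}

resource "aws_autoscaling_group" "web" {
  min_size            = 3
  max_size            = 6
  desired_capacity    = 3
  vpc_zone_identifier = var.subnet_ids

  launch_template {
    id      = aws_launch_template.web.id
    version = aws_launch_template.web.latest_version
  }

  # Replace instances with fresh ones instead of mutating them in place.
  instance_refresh {
    strategy = "Rolling"
  }
}
```

Updating `ami_id` and applying the change replaces every server with a node built from the new image, rather than patching anything in place.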
Immutability, in infrastructure and in software development generally, is an important element in improving scalability, business outcomes, and, most critically, security posture.
Engineering teams that adopt immutable infrastructure patterns in their processes and architecture will be more confident in their ability to make frequent changes, deploy critical security fixes, and innovate entirely new application features. Software organizations should look for quick wins that can set them on the longer path to full immutability.
For many organizations, this will also require an updated approach to web security. In today’s threat environment, applications and APIs not only need comprehensive protection, they also require a security solution that’s compatible with II and the other practices discussed above.
Link11 offers an all-in-one cloud web security solution. It includes WAF, DDoS protection, advanced rate limiting, bot management, session flow control, API security, real-time traffic visibility, ATO (account takeover) prevention, and more.