Terraform And The Dangers Of Applying Locally
INFO
This post was originally written on July 13th, 2018
Original post: https://medium.com/runatlantis/terraform-and-the-dangers-of-applying-locally-543563782a73
If you're using Terraform, then at some point you've likely run a terraform apply that reverted someone else's change!
Here's how that tends to happen:
The Setup
Say we have two developers: Alice and Bob. Alice needs to add a new security group rule. She checks out a new branch, adds her rule and creates a pull request:
When she runs terraform plan locally she sees what she expects.
Meanwhile, Bob is working on an emergency fix. He checks out a new branch and adds a different security group rule called emergency:
And, because it's an emergency, he immediately runs apply:
Now back to Alice. Her pull request has just been approved, so she runs terraform apply:
Did you catch what happened? Did you notice that the apply deleted Bob's rule?
In this example, it wasn't too hard to see. However, if the plan is much longer or the change is less obvious, the deletion can be easy to miss.
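To make the mechanics concrete, here is a minimal sketch of why Alice's apply removes Bob's rule. It is a toy model, not Terraform itself: it assumes a plan is just the difference between the rules in a branch's config and the rules that currently exist, and the rule names are hypothetical stand-ins for the ones in the post.

```python
from __future__ import annotations


def plan(config: set[str], state: set[str]) -> dict[str, set[str]]:
    """Toy planner: diff the desired rules (config) against what exists (state)."""
    return {"create": config - state, "destroy": state - config}


# Hypothetical rule names, used only for illustration.
state = {"existing"}
alice_config = {"existing", "alice"}      # Alice's branch
bob_config = {"existing", "emergency"}    # Bob's branch

# Alice plans before Bob applies: only her new rule shows up.
alice_plan_before = plan(alice_config, state)

# Bob applies from his workstation, so the real infrastructure now matches his branch.
state = set(bob_config)

# Alice's branch knows nothing about "emergency", so her next plan destroys it.
alice_plan_after = plan(alice_config, state)

print(alice_plan_before)
print(alice_plan_after)
```

The second plan still creates Alice's rule, which is what she expects to see, and that is exactly why the extra destroy is easy to overlook.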
Possible Solutions
There are some ways to avoid this:
Use terraform plan -out
If Alice had run terraform plan -out plan.tfplan, then when she ran terraform apply plan.tfplan she would see:
The problem with this solution is that few people run terraform plan anymore, much less terraform plan -out!
It's easier to just run terraform apply, and humans will take the easier path most of the time.
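The reason a saved plan protects Alice can be sketched with a small model. This is not Terraform's actual implementation; it assumes only that a saved plan remembers which version of the state it was computed against and refuses to apply if the state has moved on since. The serial field is loosely modelled on the serial number a Terraform state file carries, and all class and function names are hypothetical.

```python
from dataclasses import dataclass, field


class StalePlanError(Exception):
    """Raised when a saved plan was computed against an older state."""


@dataclass
class State:
    serial: int = 0                       # bumped on every successful apply
    rules: set = field(default_factory=set)


@dataclass(frozen=True)
class SavedPlan:
    based_on_serial: int                  # state version the plan was computed against
    desired: frozenset


def plan_out(config, state):
    """Model of 'plan -out': record the desired rules and the state version."""
    return SavedPlan(state.serial, frozenset(config))


def apply_saved(saved, state):
    """Model of applying a saved plan: refuse if the state changed since planning."""
    if saved.based_on_serial != state.serial:
        raise StalePlanError("state changed after this plan was created")
    state.rules = set(saved.desired)
    state.serial += 1


def apply_fresh(config, state):
    """Model of a plain apply: plan against the current state, then apply."""
    apply_saved(plan_out(config, state), state)


state = State(rules={"existing"})
alice_plan = plan_out({"existing", "alice"}, state)   # Alice saves her plan
apply_fresh({"existing", "emergency"}, state)         # Bob applies locally

try:
    apply_saved(alice_plan, state)
    outcome = "applied"
except StalePlanError as err:
    outcome = f"rejected: {err}"

print(outcome)
```

In this model Bob's rule survives because Alice's saved plan is refused, whereas a plain apply from her branch would have replanned against the new state and removed it.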
Wrap terraform apply to ensure the branch is up to date with master
Another possible solution is to write a wrapper script that ensures our branch is up to date with master. But this doesn't solve the problem of Bob running apply locally and not yet merging to master. In this case, Alice's branch would have been up to date with master but not the latest apply'd state.
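A wrapper along those lines might look like the sketch below. It assumes the remote is named origin and the main branch is master, and it uses git merge-base --is-ancestor to check that the current branch already contains origin/master. The function names and the injectable run parameter are illustrative choices, not part of any existing tool.

```python
import subprocess
import sys


def branch_contains_master(run=subprocess.run) -> bool:
    """Return True if HEAD already includes every commit on origin/master."""
    # Refresh our view of the remote branch first.
    run(["git", "fetch", "origin", "master"], check=True)
    # Exit code 0 means origin/master is an ancestor of HEAD.
    # Any other exit code (not an ancestor, or an error) is treated as "not up to date".
    result = run(["git", "merge-base", "--is-ancestor", "origin/master", "HEAD"])
    return result.returncode == 0


def main(run=subprocess.run) -> int:
    """Run terraform apply only when the branch is up to date with master."""
    if not branch_contains_master(run):
        print("Branch is behind origin/master; rebase or merge before applying.",
              file=sys.stderr)
        return 1
    return run(["terraform", "apply"]).returncode


# To use this as a script, finish the file with: sys.exit(main())
```

As the section notes, this check only compares against what has been merged. It passes happily in the Alice and Bob scenario, because Bob's change was applied but never reached master.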
Be more disciplined
What if everyone:
- ALWAYS created a branch, got a pull request review, merged to master and then ran apply
- ALWAYS checked to ensure their branch was rebased from master
- ALWAYS carefully inspected the terraform plan output and made sure it was exactly what they expected

...then we wouldn't have a problem!
Unfortunately this is not a real solution. We're all human and we're all going to make mistakes. Relying on people to follow a complicated process 100% of the time is not a solution because it doesn't work.
Core Problem
The core problem is that everyone is applying from their own workstations and it's up to them to ensure that they're up to date and that they keep master up to date. This is like developers deploying to production from their laptops.
What if, instead of applying locally, a remote system did the applies?
This is why we built Atlantis, an open source project for Terraform automation by pull request. You could also accomplish this with your own CI system or with Terraform Enterprise. Here's how Atlantis solves this issue:
When Alice makes her change, she creates a pull request and Atlantis automatically runs terraform plan and comments on the pull request.
When Bob makes his change, he creates a pull request and Atlantis automatically runs terraform plan and comments on the pull request.
Atlantis also locks the directory to ensure that no one else can run plan or apply until Alice's plan has been intentionally deleted or she merges the pull request.
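The locking idea can be illustrated with a small in-memory model. This is a sketch of the concept only, not Atlantis's implementation: it assumes a lock is keyed by directory and held by a pull request number, and the class name, directory path, and pull request numbers are all hypothetical.

```python
from __future__ import annotations


class DirectoryLockedError(Exception):
    """Raised when another pull request already holds the lock."""


class LockTable:
    """Toy lock table: at most one pull request may hold each directory."""

    def __init__(self) -> None:
        self._holders: dict[str, int] = {}

    def acquire(self, directory: str, pull_request: int) -> None:
        # setdefault stores the new holder only if the directory is currently unlocked.
        holder = self._holders.setdefault(directory, pull_request)
        if holder != pull_request:
            raise DirectoryLockedError(
                f"{directory} is locked by pull request #{holder}"
            )

    def release(self, directory: str, pull_request: int) -> None:
        # Only the current holder can release (e.g. on merge or plan deletion).
        if self._holders.get(directory) == pull_request:
            del self._holders[directory]


locks = LockTable()
locks.acquire("prod/network", 1)          # Alice's pull request plans first

try:
    locks.acquire("prod/network", 2)      # Bob's pull request is refused
    bob_blocked = False
except DirectoryLockedError:
    bob_blocked = True

locks.release("prod/network", 1)          # Alice merges, which releases the lock
locks.acquire("prod/network", 2)          # now Bob can plan
```

Because only one pull request at a time can plan or apply against a directory, whatever that pull request applies is the only change in flight for that state.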
If Bob creates a pull request for his emergency change he'd see this error:
Alice can then comment atlantis apply and Atlantis will run the apply itself:
Finally, she merges the pull request and unlocks Bob's branch:
But what if Bob ran apply locally?
In that case, Alice is still okay because when Atlantis ran terraform plan it used -out. If Alice tries to apply that plan, Terraform will give an error because the plan was generated against an old state.
Why does Atlantis run apply on the branch and not after a merge to master?
We do this because terraform apply fails quite often, despite terraform plan succeeding. Usually it's because of a dependency issue between resources or because the cloud provider requires a certain format or a certain field to be set. Regardless, in practice we've found that apply fails a lot.
By locking the directory, we're essentially ensuring that the branch being apply'd is "master", since no one else can modify that state. We then get the benefit of being able to iterate on the pull request and push small fixes until we're sure that the changeset is apply'd. If apply failed after merging to master, we'd have to open new pull requests over and over again. There is definitely a tradeoff here, but we believe it's the right one.
Conclusion
In conclusion, running terraform apply locally when you're working with a team of operators can be dangerous. Look to solutions like your own CI, Atlantis or Terraform Enterprise to ensure you're always working off the latest code that was apply'd.
If you'd like to try Atlantis, you can get started here: https://www.runatlantis.io/guide/