Performance
Easy Wins
Normally, it’s best practice to start by measuring performance before making any changes. This allows you to understand the impact of your changes, and to identify areas for improvement.
However, given the nature of the problems that Terragrunt solves, there are some obvious wins that you can make without measuring performance, if you’re aware of the tradeoffs.
Provider Cache Dir
One of the most expensive things that OpenTofu/Terraform does, from a bandwidth and disk utilization perspective, is download and install providers. These are large binary files that are downloaded from the internet, and not cached across units by default.
If you’re using OpenTofu >= 1.10 and the latest version of Terragrunt, you’ll use the Automatic Provider Cache Dir feature by default.
This feature automatically configures OpenTofu to use its built-in provider caching mechanism by setting the TF_PLUGIN_CACHE_DIR environment variable to a central location on the filesystem, allowing reuse of downloaded providers across multiple Terragrunt runs.
For most users at sensible scales, this is an automatic performance win that you don’t need to do anything to enable.
Provider Cache Dir - Gotchas
At very large scales, you might find that the filesystem lock contention between OpenTofu processes to synchronize access to the provider cache directory is a bottleneck. You might also find that you can’t use the provider cache directory because you are storing your provider cache in a shared NFS mount or are using Terraform or an older version of OpenTofu.
In these scenarios, you can use the Provider Cache Server feature to improve performance.
Provider Cache Server
You can significantly reduce the amount of time taken by Terragrunt runs by enabling the provider cache server, like this:
terragrunt run --all plan --provider-cache
Provider Cache - Gotchas
The provider cache server is a single server that is used by all Terragrunt runs being performed in a given Terragrunt invocation. You will see the most benefit if you are using it in a command that will perform many OpenTofu/Terraform operations, like with the --all flag and the --graph flag.
When performing individual runs, like terragrunt plan, the provider cache server can be a net negative to performance, because starting and stopping the server might add more overhead than just downloading the providers (or using the Automatic Provider Cache Dir feature). Whether this is the case depends on many factors, including network speed, the number of providers being downloaded, and whether or not the providers are already cached in the Terragrunt provider cache.
When in doubt, measure the performance before and after enabling the provider cache server to see if it’s a net win for your use case.
Fetching Output From State
Under the hood, Terragrunt dependency blocks leverage the OpenTofu/Terraform output -json command to fetch outputs from one unit and leverage them in another.
The OpenTofu/Terraform output -json command does more work than simply fetching output values from state. A significant portion of the extra time is spent loading providers, which it doesn’t need in most cases.
You can significantly improve the performance of dependency blocks by using the dependency-fetch-output-from-state experiment. When the experiment is active, Terragrunt will resolve outputs by fetching the backend state file directly from S3 and parsing it, avoiding the overhead incurred by calling the output -json command of OpenTofu/Terraform.
For example:
terragrunt run --all plan --experiment=dependency-fetch-output-from-state
Fetching Output From State - Gotchas
The first thing you need to be aware of when considering usage of the dependency-fetch-output-from-state experiment is that it only works for S3 backends. If you are using a different backend, this experiment won’t do anything.
Next, you should be aware that there is no guarantee that OpenTofu/Terraform will maintain the existing schema of their state files, so there is also no guarantee that the flag will work as expected in future versions of OpenTofu/Terraform.
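To make the schema dependency concrete, here is a minimal Go sketch of what reading outputs straight from a state file involves. It is not Terragrunt’s implementation: the stateFile and stateOutput types and the outputsFromState function are hypothetical names, and the layout assumed here (a top-level outputs map whose entries carry a value field) is an assumption based on the version 4 state format. If that layout changed, code like this would silently stop finding outputs.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stateFile models only the parts of a state file this sketch reads.
// The field layout is an assumption based on the version 4 state format.
type stateFile struct {
	Version int                    `json:"version"`
	Outputs map[string]stateOutput `json:"outputs"`
}

// stateOutput keeps the output value as raw JSON so any type can pass through.
type stateOutput struct {
	Value json.RawMessage `json:"value"`
}

// outputsFromState (hypothetical helper) extracts output values from raw
// state JSON without invoking OpenTofu/Terraform or loading any providers.
func outputsFromState(raw []byte) (map[string]json.RawMessage, error) {
	var s stateFile
	if err := json.Unmarshal(raw, &s); err != nil {
		return nil, fmt.Errorf("parsing state: %w", err)
	}
	out := make(map[string]json.RawMessage, len(s.Outputs))
	for name, o := range s.Outputs {
		out[name] = o.Value
	}
	return out, nil
}

func main() {
	// Illustrative state content; a real state file has many more fields.
	raw := []byte(`{"version": 4, "outputs": {"vpc_id": {"value": "vpc-123", "type": "string"}}}`)
	outputs, err := outputsFromState(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(outputs["vpc_id"]))
}
```

The sketch reads from an in-memory byte slice; fetching the state object from S3 is left out on purpose, since that part depends on your backend configuration.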
We are coordinating with the OpenTofu team to improve the performance of the output command, and we hope that this flag will be unnecessary for most users in the future.
See #1549 for more details.
Measuring Performance
Before diving into any particular performance optimization, it’s important to first measure performance, and to make sure that you measure performance after any changes so that you understand the impact of your changes.
To measure performance, you can use multiple tools, depending on your role.
End User
As an end user, you’re advised to use the following tools to get a better understanding of the performance of Terragrunt.
OpenTelemetry
Use OpenTelemetry to collect traces from Terragrunt runs so that you can analyze the performance of individual operations when using Terragrunt.
This can be useful both to identify bottlenecks in Terragrunt, and to understand when performance changes can be attributed to integrations with other tools, like OpenTofu or Terraform.
Benchmark Usage
Use benchmarking tools like Hyperfine to run benchmarks of your Terragrunt usage to compare the performance of different versions of Terragrunt, or with different configurations.
You can use options like the --warmup flag to perform some warmup runs before the actual benchmarking. This gives a more accurate measurement of how Terragrunt performs once caches are populated.
Here’s an example of how to use Hyperfine to benchmark the performance of Terragrunt with two different configurations:
hyperfine -w 3 -r 5 'terragrunt run --all plan' 'terragrunt run --all plan --experiment=dependency-fetch-output-from-state'
Terragrunt Developer
As a Terragrunt developer, you’re advised to use the following tools to measure and improve the performance of Terragrunt as you work on the codebase.
Benchmark Tests
Use Benchmark tests to measure the performance of particular subroutines in Terragrunt.
These benchmarks give you a good indication of the performance of a particular part of Terragrunt, and can help you identify areas for improvement. You can run benchmark tests like this:
go test -bench=BenchmarkSomeFunction
You can also run benchmarks with different configurations, like the following for getting memory allocation information as well:
go test -bench=BenchmarkSomeFunction -benchmem
You can learn more about benchmarking in Go by reading the official documentation.
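For readers who have not written one before, the following is a minimal sketch of the shape a Go benchmark takes. The joinSegments function is a hypothetical stand-in for whatever subroutine you want to measure, and BenchmarkJoinSegments is an illustrative name rather than a benchmark that exists in Terragrunt. In a real codebase the benchmark function would live in a _test.go file and be run with go test -bench; the main function here only exists so the sketch can run on its own through testing.Benchmark.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// joinSegments is a hypothetical stand-in for the subroutine being measured.
func joinSegments(segments []string) string {
	return strings.Join(segments, "/")
}

// BenchmarkJoinSegments follows the standard benchmark shape: setup happens
// before the loop, and the measured work runs b.N times.
func BenchmarkJoinSegments(b *testing.B) {
	segments := []string{"live", "prod", "us-east-1", "vpc"}
	b.ReportAllocs() // report allocations for this benchmark, like -benchmem
	b.ResetTimer()   // exclude the setup above from the measurement
	for i := 0; i < b.N; i++ {
		_ = joinSegments(segments)
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	res := testing.Benchmark(BenchmarkJoinSegments)
	fmt.Println(res.String(), res.MemString())
}
```

Keeping setup outside the loop matters: anything inside the loop is counted as part of the operation being timed.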
Profiling
Use profiling tools like pprof to get a more detailed view of the performance of Terragrunt.
For example, you could use the following command to profile a particular test:
go test -run 'SomeTest' -cpuprofile=cpu.prof -memprofile=mem.prof
You can then use the go tool pprof command to analyze the profile data:
go tool pprof cpu.prof
It can be helpful to use the web interface to view the profile data using flame graphs, etc.
go tool pprof -http=:8080 cpu.prof
You can learn more about profiling in Go by reading the official documentation.