Terraform requires a DAG. AWS allows cycles. Here's how I map the difference.
Error: Cycle: aws_security_group.app -> aws_security_group.db -> aws_security_group.app
If you've ever seen this error while importing AWS infrastructure into Terraform, you know the pain.
Terraform's core engine relies on a Directed Acyclic Graph (DAG). It needs to know: "Create A first, then B."
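But AWS is eventually consistent and happily allows cycles: a security group can be created empty, and rules referencing other groups can be attached afterwards.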
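A quick way to see why a cycle is fatal: Python's standard-library graphlib (3.9+) refuses to order a cyclic graph, much like a DAG-based planner must. This is an analogy, not Terraform's code; the resource addresses are illustrative and mirror the security groups in this post plus a made-up VPC chain.

```python
from graphlib import TopologicalSorter, CycleError

def apply_order(deps: dict[str, set[str]]) -> list[str]:
    """Return a create order where every dependency precedes its dependents.

    deps maps each resource address to the addresses it depends on.
    Raises graphlib.CycleError if the graph is not a DAG.
    """
    return list(TopologicalSorter(deps).static_order())

# A healthy chain: the subnet needs the VPC, the instance needs the subnet.
dag = {
    "aws_vpc.main": set(),
    "aws_subnet.a": {"aws_vpc.main"},
    "aws_instance.web": {"aws_subnet.a"},
}
print(apply_order(dag))  # ['aws_vpc.main', 'aws_subnet.a', 'aws_instance.web']

# Inline-rule security groups: each one needs the other's ID first.
cyclic = {
    "aws_security_group.app": {"aws_security_group.db"},
    "aws_security_group.db": {"aws_security_group.app"},
}
try:
    apply_order(cyclic)
except CycleError as err:
    # err.args[1] holds the nodes of the detected cycle
    print("cycle:", err.args[1])
```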
The Deadlock
The most common culprit is Security Groups. Imagine two microservices:
- SG-App allows outbound traffic to SG-DB
- SG-DB allows inbound traffic from SG-App
If you write this with inline rules (which is what terraform import defaults to), you create a cycle:
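# Port 5432 (Postgres) is assumed for illustration; egress/ingress blocks require ports and protocol.
resource "aws_security_group" "app" {
  egress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.db.id]
  }
}

resource "aws_security_group" "db" {
  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id]
  }
}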
Terraform cannot apply this. It can't create app without db's ID, and vice versa.
The Graph Theory View
When building an infrastructure reverse-engineering tool, I realized I couldn't just dump API responses to HCL. We model AWS as a graph: Nodes are Resources, Edges are Dependencies.
In a healthy config, dependencies are a DAG: [VPC] --> [Subnet] --> [EC2]
But Security Groups often form cycles:
    ┌──────────────┐
    ▼              │
[SG-App]        [SG-DB]
    │              ▲
    └──────────────┘
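Turning API responses into those edges can be sketched like this. The input dicts mimic the shape of EC2 DescribeSecurityGroups output (GroupId, IpPermissions, IpPermissionsEgress, UserIdGroupPairs), trimmed to the fields used here, with placeholder IDs. The edge direction (a group depends on every group its rules reference) and dropping self-references are assumptions of this sketch, not necessarily what RepliMap does.

```python
def sg_dependency_graph(groups: list[dict]) -> dict[str, set[str]]:
    """Map each security group ID to the group IDs its rules reference."""
    graph: dict[str, set[str]] = {g["GroupId"]: set() for g in groups}
    for g in groups:
        perms = g.get("IpPermissions", []) + g.get("IpPermissionsEgress", [])
        for perm in perms:
            for pair in perm.get("UserIdGroupPairs", []):
                target = pair.get("GroupId")
                # Assumption: self-references can stay inline (Terraform's
                # self = true), so they are not recorded as edges here.
                if target and target != g["GroupId"]:
                    graph[g["GroupId"]].add(target)
    return graph

groups = [
    {"GroupId": "sg-app", "IpPermissions": [],
     "IpPermissionsEgress": [{"UserIdGroupPairs": [{"GroupId": "sg-db"}]}]},
    {"GroupId": "sg-db", "IpPermissionsEgress": [],
     "IpPermissions": [{"UserIdGroupPairs": [{"GroupId": "sg-app"}]}]},
]
print(sg_dependency_graph(groups))  # {'sg-app': {'sg-db'}, 'sg-db': {'sg-app'}}
```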
Finding the Knots
To solve this for thousands of resources, we use Tarjan's algorithm to find Strongly Connected Components (SCCs). It identifies "knots" — clusters of nodes that are circularly dependent — and flags them for surgery.
In our testing, a typical enterprise AWS account with 500+ SGs contains 3-7 of these clusters.
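For anyone who hasn't met Tarjan's algorithm, here is a compact recursive version over an adjacency dict. RepliMap itself uses NetworkX (per the author's implementation notes), so this is illustrative, not the tool's code. Components with more than one node are the "knots"; a plain recursive DFS can hit Python's recursion limit on very deep graphs, which is one reason to prefer a library implementation in practice.

```python
def tarjan_scc(graph: dict[str, set[str]]) -> list[list[str]]:
    """Return the strongly connected components of graph (Tarjan, O(V+E))."""
    index: dict[str, int] = {}   # DFS discovery order of each node
    low: dict[str, int] = {}     # lowest index reachable from the node's subtree
    stack: list[str] = []        # nodes whose component is not yet closed
    on_stack: set[str] = set()
    components: list[list[str]] = []

    def visit(v: str) -> None:
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component: pop it off
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            components.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return components

def find_knots(graph: dict[str, set[str]]) -> list[list[str]]:
    """Keep only components that contain a real cycle (more than one node)."""
    return [c for c in tarjan_scc(graph) if len(c) > 1]

graph = {
    "vpc": set(),
    "sg-app": {"sg-db", "vpc"},
    "sg-db": {"sg-app", "vpc"},
    "sg-web": {"sg-app"},
}
print(find_knots(graph))
```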
The Fix: "Shell & Fill"
We break the cycle in two steps:
1. Create Empty Shells: Generate SGs with no rules. Since they reference no other group, Terraform can create them immediately, in any order.
2. Fill with Rules: Extract the rules into separate aws_security_group_rule resources that reference the shells.
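The two steps can be sketched as code generation. The Rule record, the resource naming scheme and port 5432 are hypothetical; the emitted HCL uses the AWS provider's aws_security_group (with no inline rules) and aws_security_group_rule with its type, from_port, to_port, protocol, security_group_id and source_security_group_id arguments.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    """Hypothetical normalized rule; owner is the group the rule lives on."""
    owner: str       # Terraform name of the group holding the rule
    direction: str   # "ingress" or "egress"
    peer: str        # Terraform name of the referenced group
    from_port: int
    to_port: int
    protocol: str = "tcp"

def render_shell(name: str) -> str:
    # Step 1: the group itself, with no inline ingress/egress blocks,
    # so it depends on no other security group.
    return (
        f'resource "aws_security_group" "{name}" {{\n'
        f'  name = "{name}"\n'
        f"}}\n"
    )

def render_rule(rule: Rule) -> str:
    # Step 2: a standalone rule referencing both shells by ID.
    label = f"{rule.owner}_{rule.direction}_{rule.peer}"
    return (
        f'resource "aws_security_group_rule" "{label}" {{\n'
        f'  type                     = "{rule.direction}"\n'
        f"  from_port                = {rule.from_port}\n"
        f"  to_port                  = {rule.to_port}\n"
        f'  protocol                 = "{rule.protocol}"\n'
        f"  security_group_id        = aws_security_group.{rule.owner}.id\n"
        f"  source_security_group_id = aws_security_group.{rule.peer}.id\n"
        f"}}\n"
    )

rules = [
    Rule(owner="app", direction="egress", peer="db", from_port=5432, to_port=5432),
    Rule(owner="db", direction="ingress", peer="app", from_port=5432, to_port=5432),
]
blocks = [render_shell("app"), render_shell("db")] + [render_rule(r) for r in rules]
print("\n".join(blocks))
```

Because the shells never mention each other, every edge now points from a rule to a shell, and nothing points back.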
Step 1: Create Shells
[SG-App (empty)] [SG-DB (empty)]
Step 2: Create Rules
        ▲                      ▲
        │                      │
[Rule: egress->DB]    [Rule: ingress<-App]
The graph is now acyclic.
"Why not just always use separate rules?"
Fair question. The problem is:
1. terraform import often generates inline rules.
2. Many existing codebases prefer inline rules for readability.
3. The AWS API presents the "logical" view (rules bundled inside).
The tool needs to detect cycles and surgically convert only the problematic ones.
Why terraform import isn't enough
Standard import copies each resource's live state as-is. It doesn't build a global dependency graph or perform a topological sort before generating code, so the burden of refactoring falls on the human. For brownfield migrations with 2,000+ resources, that's not feasible.
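That missing global step can be sketched as follows, assuming the multi-node SCCs (knots) have already been computed, e.g. with Tarjan's algorithm. Only edges inside a knot are moved onto separate rule nodes; every other reference stays inline. The "rule:" node naming is hypothetical, and graphlib's topological sort stands in for whatever ordering the real tool uses.

```python
from graphlib import TopologicalSorter

def split_knots(graph: dict[str, set[str]],
                knots: list[list[str]]) -> dict[str, set[str]]:
    """Rewrite the SG dependency graph after Shell & Fill.

    graph maps each group to the groups its inline rules reference.
    Edges inside a knot become rule nodes; all other edges stay inline.
    """
    knot_of = {sg: i for i, knot in enumerate(knots) for sg in knot}
    planned: dict[str, set[str]] = {sg: set() for sg in graph}
    for sg, targets in graph.items():
        for target in targets:
            same_knot = sg in knot_of and knot_of.get(target) == knot_of[sg]
            if same_knot:
                # Hypothetical rule node: depends on both shells,
                # and no group depends on it.
                planned[f"rule:{sg}->{target}"] = {sg, target}
            else:
                planned[sg].add(target)  # safe to keep inline
    return planned

graph = {
    "sg-shared": set(),
    "sg-app": {"sg-db", "sg-shared"},
    "sg-db": {"sg-app"},
}
planned = split_knots(graph, knots=[["sg-app", "sg-db"]])
order = list(TopologicalSorter(planned).static_order())  # no CycleError now
print(order)
```

Here sg-app keeps its inline reference to sg-shared, while only the app/db pair is split.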
---
I've implemented this graph engine in a tool called RepliMap. I've open-sourced the documentation and IAM policies needed to run read-only scans safely.
If you're interested in edge cases like this (or the root_block_device trap), the repo is here:
https://github.com/RepliMap/replimap-community
Happy to answer questions.
Please don't do this. Ask HN isn't your blogging platform. Per the guidelines, it's for asking questions of the community.
Appreciate the feedback. To be transparent: I originally submitted this as a standard text post, but after it hit a spam filter, the HN moderators kindly restored it and moved it to /ask themselves to help with visibility.
I'm definitely here for the dialogue, specifically looking to compare notes on graph algorithms with other IaC engineers.
Author here. A few implementation notes:
1. We use NetworkX for the graph operations. Tarjan's SCC detection is O(V+E), so it scales well even for large accounts.
2. The trickiest part isn't the algorithm — it's mapping AWS API responses to graph edges. AWS APIs are... inconsistent. Some resources return IDs, some ARNs, some Names. Security Groups can reference themselves, reference by ID or by name, and have rules scattered across inline blocks and separate resources. Normalizing this soup into a clean adjacency matrix is where 80% of the engineering work lives.
3. For those wondering about the "Shell & Fill" naming: it's essentially forcing Terraform's create_before_destroy lifecycle behavior manually, by decoupling the resource identity from its configuration.
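The normalization in point 2 can be sketched for security groups alone. The three reference shapes (ID, ARN, name) come from the comment; the regexes follow the EC2 security group ARN format (arn:partition:ec2:region:account:security-group/sg-id), and the name-to-ID lookup table is a hypothetical input. Group names are only unique within a VPC, so a real implementation would key names by VPC.

```python
import re
from typing import Optional

SG_ID = re.compile(r"^sg-[0-9a-f]+$")
# arn:<partition>:ec2:<region>:<account>:security-group/<group id>
SG_ARN = re.compile(r"^arn:[\w-]+:ec2:[\w-]*:\d*:security-group/(sg-[0-9a-f]+)$")

def normalize_sg_ref(ref: str, name_to_id: dict[str, str]) -> Optional[str]:
    """Reduce an ID, ARN, or group name to a canonical sg-... ID (None if unknown)."""
    ref = ref.strip()
    if SG_ID.match(ref):
        return ref
    m = SG_ARN.match(ref)
    if m:
        return m.group(1)
    # Assumption: names resolved via a lookup built from the same scan.
    return name_to_id.get(ref)

names = {"db-servers": "sg-0123456789abcdef0"}
for ref in [
    "sg-0123456789abcdef0",
    "arn:aws:ec2:us-east-1:123456789012:security-group/sg-0123456789abcdef0",
    "db-servers",
    "unknown-group",
]:
    print(ref, "->", normalize_sg_ref(ref, names))
```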
Would love to hear if others have hit similar graph problems with other IaC tools (Pulumi, CDK, CloudFormation).
Not IaC, but I've been doing a similar trick to sequence adding type annotations to Python code.
E.g., take the module graph, break the SCCs in a similar manner, then take a reverse topological sort of the imports (now a DAG by construction).
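One way to picture that trick, under assumptions: each import is tagged as annotation-only or not (a hypothetical input), annotation-only edges are dropped because they can become string forward references or TYPE_CHECKING-guarded imports, and the remaining runtime graph is ordered leaves-first. Unlike the description, this sketch drops every annotation-only edge rather than only those inside SCCs.

```python
from graphlib import TopologicalSorter

# Hypothetical input: module -> {imported module: True if the import is
# only needed for type annotations}.
imports = {
    "app.models": {"app.db": False, "app.services": True},
    "app.services": {"app.models": False, "app.db": False},
    "app.db": {},
}

def annotation_order(imports: dict[str, dict[str, bool]]) -> list[str]:
    """Order modules so each one's runtime imports are annotated before it."""
    runtime_deps = {
        mod: {dep for dep, type_only in deps.items() if not type_only}
        for mod, deps in imports.items()
    }
    # graphlib yields predecessors first, i.e. the reverse of the import
    # direction: leaf modules come out before the modules importing them.
    return list(TopologicalSorter(runtime_deps).static_order())

print(annotation_order(imports))  # ['app.db', 'app.models', 'app.services']
```

The models/services cycle disappears once the annotation-only edge is treated like an empty shell.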
That's a spot-on parallel! Python circular imports (especially for type hinting) are basically the software equivalent of this infrastructure deadlock.
Do you use string-based forward references ("ClassName") to break the cycles? That's essentially our "empty shell" trick — decoupling the resource identity from its configuration to satisfy the graph.
Did you stick with Tarjan's for the SCC detection on the module graph?