Modernizing Infrastructure as Code

From Monolithic to Modular: A Terragrunt Journey

Presented by: Adam Holden
Presented to: Carrum Health


About Me

  • Name: Adam Holden
  • Role at Current Company: Engineering Manager, DevOps
  • Professional Background: Several years leading DevOps transformations and cloud infrastructure modernization. Experienced in establishing IaC best practices, CI/CD pipelines, and building scalable cloud architectures. Previously led migration from monolithic to microservices architecture, reducing deployment times by 60% while improving reliability and security.
  • Areas of Expertise: Infrastructure as Code, Cloud Architecture, DevOps Practices


The Challenge

Inherited Infrastructure State:

  • Single, monolithic Terraform configuration
  • Limited environment separation
  • Minimal state management
  • Inconsistent practices across teams
  • Growing technical debt
  • No clear path for scaling infrastructure needs

"We needed to move from a flat, brittle IaC implementation to something that could support our growth trajectory."

My Role & Responsibilities

Primary Responsibilities:

  • Lead IaC modernization initiative
  • Design modular architecture pattern
  • Develop migration strategy with minimal disruption
  • Mentor team on Terragrunt and modular practices
  • Establish new workflows and CI/CD integration
  • Create documentation and governance standards

Team Structure:

  • 1 DevOps Manager (myself)
  • 2 DevOps Engineers
  • Collaboration with 3 development teams

Initial Infrastructure State

# Example of our inherited main.tf (simplified)

provider "aws" {
  region = "us-west-2"
}

# Network resources
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
  # ... dozens of parameters
}

# Compute resources
resource "aws_instance" "app_servers" {
  count         = 10
  ami           = "ami-12345678"
  instance_type = "t3.medium"
  # ... hundreds of lines of configuration
}

# Database resources
resource "aws_db_instance" "database" {
  allocated_storage = 20
  engine            = "postgres"
  # ... more configuration
}

# Dozens more resources for load balancing, security, etc.

Problems:

  • No modularity
  • Hard to maintain
  • Long apply times
  • Risky changes
  • Limited reuse
  • Development bottlenecks

The Vision: Modular Infrastructure

[Figure: architecture diagram]

Key Design Goals:

  • Domain-specific modules
  • Environment parity
  • Configuration inheritance
  • Clear dependency management
  • Simplified state management
  • Improved developer experience

Solution: Terragrunt + Modular Design

Approach:

  1. Create reusable Terraform modules by domain
  2. Implement Terragrunt for configuration management
  3. Establish environment hierarchy
  4. Design clear dependency chains
  5. Migrate resources incrementally
  6. Integrate with CI/CD pipeline

Directory structure:

infrastructure/
├── _envcommon/
│   ├── networking.hcl
│   ├── compute.hcl
│   └── database.hcl
├── prod/
│   ├── terragrunt.hcl
│   ├── us-west-2/
│   │   ├── networking/
│   │   ├── compute/
│   │   └── database/
│   └── us-east-1/
├── staging/
└── dev/
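
To make the environment hierarchy concrete, here is a minimal sketch of the small per-level files that a root configuration could read with read_terragrunt_config. The file names and local names match those referenced by the root terragrunt.hcl in this deck. Their placement in the tree and every value (account ID, region, environment name) are assumptions for illustration.

```hcl
# Three separate files, shown together for brevity.

# prod/account.hcl (assumed location; the account ID is a placeholder)
locals {
  aws_account_id = "111111111111"
}

# prod/env.hcl (assumed location)
locals {
  environment = "prod"
}

# prod/us-west-2/region.hcl (assumed location)
locals {
  aws_region = "us-west-2"
}
```

Keeping each level in its own file means a new region or environment is added by creating a directory and one small file, not by editing shared configuration.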

Terragrunt Implementation

# Root terragrunt.hcl:
locals {
  # Parse account, region, and environment settings from the directory structure
  account_vars = read_terragrunt_config(find_in_parent_folders("account.hcl"))
  region_vars  = read_terragrunt_config(find_in_parent_folders("region.hcl"))
  env_vars     = read_terragrunt_config(find_in_parent_folders("env.hcl"))

  # Extract commonly used variables
  aws_region  = local.region_vars.locals.aws_region
  account_id  = local.account_vars.locals.aws_account_id
  environment = local.env_vars.locals.environment
}

# Generate provider configuration for all child modules
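generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite"
  # Assumed minimal body: pin the provider to the region parsed above
  contents  = <<EOF
provider "aws" {
  region = "${local.aws_region}"
}
EOF
}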

Module Example: Networking

Module structure:

modules/
└── networking/
    ├── main.tf
    ├── variables.tf
    ├── outputs.tf
    └── README.md
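
As a rough idea of what such a module might contain, the following sketch collapses variables.tf, main.tf, and outputs.tf into one listing. The variable names vpc_name, vpc_cidr, and subnet_cidrs are taken from the inputs used elsewhere in this deck. The resource labels (this) and the output names (vpc_id, subnet_ids) are hypothetical, and a real module would also set availability zones, route tables, and flow logs.

```hcl
# variables.tf (the module's interface)
variable "vpc_name" {
  type = string
}

variable "vpc_cidr" {
  type = string
}

variable "subnet_cidrs" {
  description = "Map of subnet name to CIDR block"
  type        = map(string)
}

# main.tf
resource "aws_vpc" "this" {
  cidr_block = var.vpc_cidr

  tags = {
    Name = var.vpc_name
  }
}

# One subnet per map entry; the availability zone is left to the provider here
resource "aws_subnet" "this" {
  for_each   = var.subnet_cidrs
  vpc_id     = aws_vpc.this.id
  cidr_block = each.value

  tags = {
    Name = "${var.vpc_name}-${each.key}"
  }
}

# outputs.tf (values other components can consume)
output "vpc_id" {
  value = aws_vpc.this.id
}

output "subnet_ids" {
  value = { for name, subnet in aws_subnet.this : name => subnet.id }
}
```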

_envcommon/networking.hcl:

# Common settings for the networking module
terraform {
  source = "${get_parent_terragrunt_dir()}/modules//networking"
}

# Dependencies
dependencies {
  paths = []
}

# Common input variables
inputs = {
  vpc_name = "carrum-vpc"
  
  # These can be overridden in environment-specific configs
  enable_vpc_flow_logs         = true
  vpc_flow_logs_retention_days = 30
}

prod/us-west-2/networking/terragrunt.hcl:

# Include the root configuration (every include block needs a label when more than one is used)
include "root" {
  path = find_in_parent_folders()
}

# Include the envcommon configuration
include "envcommon" {
  path = "${dirname(find_in_parent_folders())}/_envcommon/networking.hcl"
}

# Override inputs
inputs = {
  vpc_cidr     = "10.0.0.0/16"
  subnet_cidrs = {
    public_a  = "10.0.0.0/24"
    public_b  = "10.0.1.0/24"
    private_a = "10.0.2.0/24"
    private_b = "10.0.3.0/24"
  }
  enable_nat_gateway = true
}
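
A sibling component can consume the networking outputs through a Terragrunt dependency block, which is one way to keep the dependency chain explicit. This is a sketch under assumptions: the output names vpc_id and subnet_ids, the input names on the compute side, and the mock values are all hypothetical, and the terraform source block is omitted.

```hcl
# prod/us-west-2/compute/terragrunt.hcl (illustrative)
include "root" {
  path = find_in_parent_folders()
}

dependency "networking" {
  config_path = "../networking"

  # Placeholder values so validate and plan can run before networking is applied
  mock_outputs = {
    vpc_id     = "vpc-00000000"
    subnet_ids = { private_a = "subnet-00000000" }
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}

inputs = {
  # Hypothetical input names for the compute module
  vpc_id    = dependency.networking.outputs.vpc_id
  subnet_id = dependency.networking.outputs.subnet_ids["private_a"]
}
```

Because the dependency is declared in configuration, Terragrunt can order runs across components so that networking is applied before compute.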

Project Execution

Migration Strategy:

The timeline below is a general guide (20 weeks in total) and can be adjusted to the needs of Carrum Health.

1. Assessment Phase (2 weeks)

  • Document current state
  • Identify resource dependencies
  • Develop migration plan

2. Foundation Setup (3 weeks)

  • Create modular architecture
  • Implement Terragrunt structure
  • Set up state management and locking
  • Establish CI/CD pipeline integration

3. Incremental Migration (12 weeks)

  • Domain by domain migration
  • Starting with least critical resources
  • Parallel infrastructure during transition
  • Gradual cutover with validation

4. Completion & Optimization (3 weeks)

  • Final state cleanup
  • Performance optimization
  • Documentation finalization
  • Team training
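
For the incremental migration phase, some resources may be adopted in place and not rebuilt in parallel. One option for those, assuming Terraform 1.7 or later, is to pair an import block in the new configuration with a removed block in the monolith. This is an illustrative sketch, not the method used in the project described here. The resource address aws_vpc.this and the VPC ID are hypothetical; aws_vpc.main is the address from the inherited main.tf shown earlier.

```hcl
# In the root module of the new networking configuration
# (with Terragrunt, this could be written by a generate block)
import {
  to = aws_vpc.this
  id = "vpc-0123456789abcdef0" # placeholder ID
}

# In the old monolithic configuration, after deleting the resource block:
# drop the VPC from the old state without destroying it
removed {
  from = aws_vpc.main

  lifecycle {
    destroy = false
  }
}
```

Running a plan on both sides before applying should show an import in the new configuration and a "forget" in the old one, with no create or destroy actions.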

People Leadership

Team Development:

  • Weekly knowledge sharing sessions on Terragrunt concepts
  • Pair programming for module development
  • Regular code reviews to ensure standards
  • Documentation-first approach
  • Created reusable templates and examples

Stakeholder Management:

  • Bi-weekly demos to development teams
  • Regular updates to engineering leadership
  • Transparent timeline and progress tracking
  • Early identification of blockers and dependencies

Key Decisions & Trade-offs

Decision: Module Granularity

Options: Few large modules vs. Many small modules

Choice: Middle ground - domain-focused modules

Rationale: Balance between reusability and management complexity

Decision: State Management

Options: Single state vs. State per environment vs. State per component

Choice: Hybrid approach - state per environment per component

Rationale: Optimal balance of isolation and dependency management
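
A state-per-environment-per-component layout can be expressed once in the root terragrunt.hcl with a remote_state block. The following is a minimal sketch assuming an S3 backend with DynamoDB locking. The bucket and table names are hypothetical, and the locals are hard-coded here only to keep the example self-contained.

```hcl
locals {
  # Hypothetical values; in practice these would be read from account.hcl and region.hcl
  account_id = "111111111111"
  aws_region = "us-west-2"
}

remote_state {
  backend = "s3"

  # Write a backend.tf into each component so Terraform picks up the backend
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }

  config = {
    bucket         = "example-tfstate-${local.account_id}"
    # One state file per component, keyed by its path relative to the root config
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = local.aws_region
    encrypt        = true
    dynamodb_table = "example-tfstate-locks"
  }
}
```

Assuming the root file sits in prod/, a component at prod/us-west-2/networking would get a state key of us-west-2/networking/terraform.tfstate, so each component is isolated while still sharing one bucket per account.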

Decision: Migration Approach

Options: Big bang cutover vs. Incremental migration

Choice: Incremental with parallel infrastructure

Rationale: Minimize risk and business disruption


Risks & Mitigations

Risk                       Mitigation
State file corruption      Implemented state versioning, plus locking with DynamoDB
Knowledge gaps in team     Regular training sessions and pair programming
Disruption to production   Blue/green approach for critical infrastructure
Dependency tracking        Created visualization tools for module dependencies
Cost management            Implemented tagging strategy and regular cost reviews
Resistance to change       Early stakeholder involvement and demos

Compliance & Security Integration

Security-by-Design:

  • Integrated compliance checks into CI/CD pipeline
  • Implemented security scanning of Terraform plans
  • Created secure module templates with best practices
  • Enhanced IAM permissions with least-privilege approach

Compliance Controls:

  • Added automated documentation for audit trails
  • Implemented encryption by default
  • Enforced tagging for governance
  • Created compliance-as-code modules
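
Two of these controls, tagging and encryption by default, can be sketched directly in provider-level configuration. This assumes an AWS provider version that supports default_tags. The tag keys and values are hypothetical, and default_tags applies tags to supported resources but does not reject untagged ones, so real enforcement would still need a policy check in the pipeline.

```hcl
provider "aws" {
  region = "us-west-2"

  # Tags applied to every taggable resource created through this provider
  default_tags {
    tags = {
      Environment = "prod"      # hypothetical values
      ManagedBy   = "terraform"
      Owner       = "devops"
    }
  }
}

# Account-level setting for this region: new EBS volumes are encrypted by default
resource "aws_ebs_encryption_by_default" "this" {
  enabled = true
}
```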

Outcomes

Technical Results:

  • 75% reduction in apply times
  • 90% reduction in deployment errors
  • Increased infrastructure test coverage from 20% to 85%
  • Successfully migrated 100+ resources to modular approach

Business Impact:

  • Reduced time-to-market for new services by 40%
  • Enabled self-service infrastructure for development teams
  • Improved disaster recovery capabilities
  • Enhanced security posture with consistent controls

Team Benefits:

  • Increased team velocity and confidence
  • Reduced context switching and toil
  • Clear ownership and responsibility boundaries
  • Enhanced collaboration with development teams

Learnings & Future Improvements

What Worked Well:

  • Incremental migration approach
  • Heavy investment in documentation
  • Terragrunt for configuration management
  • CI/CD integration early in the process

What I'd Do Differently:

  • Start with stricter module interface definitions
  • Implement automated testing earlier
  • Create more sophisticated dependency visualization
  • More comprehensive cost management from day one
  • More extensive training before implementation
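
As an illustration of what a stricter module interface could look like, the sketch below adds types, descriptions, and validation blocks to three networking inputs named in this deck. The specific rules and error messages are assumptions, not the project's actual definitions.

```hcl
variable "vpc_cidr" {
  description = "CIDR block for the VPC"
  type        = string

  validation {
    # cidrhost fails on malformed input, so can() turns that into true/false
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "The vpc_cidr value must be a valid CIDR block, for example 10.0.0.0/16."
  }
}

variable "subnet_cidrs" {
  description = "Map of subnet name to CIDR block"
  type        = map(string)

  validation {
    condition     = alltrue([for cidr in values(var.subnet_cidrs) : can(cidrhost(cidr, 0))])
    error_message = "Every value in subnet_cidrs must be a valid CIDR block."
  }
}

variable "vpc_flow_logs_retention_days" {
  description = "Retention period for VPC flow logs, in days"
  type        = number
  default     = 30

  validation {
    condition     = var.vpc_flow_logs_retention_days > 0
    error_message = "The vpc_flow_logs_retention_days value must be greater than zero."
  }
}
```

With rules like these, a bad input fails at plan time with a readable message instead of surfacing later as a provider error.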

Applying These Learnings at Carrum Health

Direct Application:

  • Modular infrastructure approach for scalability
  • Clear separation of concerns in IaC
  • Integration with CI/CD pipeline for infrastructure
  • Security and compliance by design

Adaptations for Carrum Health:

  • Focus on industry-specific compliance requirements
  • Enhanced data security controls
  • Integration with existing DevOps practices
  • Alignment with cloud strategy

Leveraging New Tools

Gen AI Applications:

  • AI-generated infrastructure documentation (Confluence)
  • AI-assisted code reviews for Terraform (coderabbit.ai)
  • Automated dependency analysis
  • Cost optimization recommendations

Modern Tooling:

  • Infrastructure drift detection
  • Policy-as-code with Open Policy Agent
  • Infrastructure observability tools
  • Cost and usage monitoring

First 90 Days at Carrum Health

Days 1-30: Assessment & Understanding

  • Understand current infrastructure landscape
  • Review existing IaC practices and pain points
  • Identify key stakeholders and priorities
  • Assess security and compliance requirements
  • Develop initial improvement roadmap

Days 31-60: Foundation Building

  • Establish core infrastructure standards
  • Begin implementing modular approach
  • Set up CI/CD pipeline for infrastructure
  • Create initial documentation framework
  • Deliver quick wins to demonstrate value

Days 61-90: Scaling & Optimization

  • Expand modular infrastructure approach
  • Implement advanced security controls
  • Establish metrics and monitoring
  • Begin knowledge transfer and training
  • Develop long-term infrastructure strategy

Questions?

Thank you for your attention!

I'm happy to answer any questions about:

  • Technical implementation details
  • Migration strategy and challenges
  • Team development and collaboration
  • Future infrastructure roadmap
  • Or anything else related to this project