Which AI coding tool is best for SRE and DevOps work in 2026?

For SRE and DevOps work, Kiro is the strongest choice for structured feature work and enforcing team standards. Claude Code wins for complex multi-file refactors and incident automation scripts. Cursor is the best daily driver for inline editing. Avoid Antigravity for infrastructure work right now. Its credit system is unpredictable and its Terraform support is stale.

Kiro has a free tier. Paid plans start at around $19 per month. It uses a credit model for agentic tasks. For light SRE use like writing runbooks, scaffolding monitoring configs, and fixing bugs, the free tier is workable.

Can Claude Code replace Cursor?

For terminal-heavy workflows and large codebase refactors, yes. Claude Code on Opus 4.7 has a 1M token context window, which is enough to read most infrastructure repos in one pass. But it has no IDE. If you want inline autocomplete and a visual editor, you still need Cursor or Windsurf alongside it.

What happened to Google Antigravity pricing?

In March 2026, Google restructured Antigravity from a simple subscription to a credit-based system. Free tier quotas dropped by 92%. The Pro plan lost its 5-hour refresh cycle. Antigravity 2.0 launched at Google I/O 2026 with a new CLI and SDK, but the pricing confusion has not fully resolved.

Is OpenAI Codex worth it for infrastructure engineers?

Codex is best for Python-heavy automation scripts and data pipeline work. It is not optimized for Terraform, Go, or YAML-heavy infrastructure repos. The pricing changed in April 2026 to token-based billing, which makes costs harder to predict. For SRE work, Claude Code or Kiro are better choices.

Kiro vs Cursor vs Windsurf vs Claude Code vs Codex vs Antigravity: What I Actually Use as an SRE

Rantideb Howlader • May 21, 2026

Why I Wrote This

Every AI IDE comparison I found was written by someone who spent a weekend on a todo app and called it a production test.

I work as an SRE on large-scale microservices infrastructure. My day is Terraform, Go automation scripts, Kubernetes YAML, Datadog monitors, incident runbooks, and CI/CD pipelines. I need tools that work on real infrastructure code, not just React components.

So I ran all six major AI coding tools through the same five real tasks I do every week. I kept notes on every command I ran, every diff I got back, every time a tool failed mid-task, and every time I had to correct something.

This is that report. Every code block in here is real. Every error message is real. Every correction turn is real.

The Six Tools

Kiro is AWS's spec-driven IDE. Built on VS Code, available as a standalone download for macOS, Windows, and Linux. You write a prompt and it generates a requirements document, a design document, and a task list before writing a single line of code. It has hooks for event-driven automations and steering files for persistent project context. It is the only tool in this list that enforces your team's conventions automatically.

Cursor is the most popular AI IDE right now. Also VS Code-based. Chat-first, inline autocomplete, strong model selection. Cursor 3 launched in 2026 with Composer 2.5. It is what most engineers reach for first.

Windsurf was built by Codeium and acquired by Cognition in 2025. It now ships with Devin built in. It has its own model called SWE-1.6. Flow-state editing is its signature feature. Cascade, its agent, indexes your project automatically.

Claude Code is Anthropic's terminal-based agent. Not an IDE. You run it from the command line. It uses Claude Opus 4.7 with a 1M token context window. It tops SWE-bench Verified at 87.6%. It is the most capable tool in this list and the most inconvenient to use.

OpenAI Codex is OpenAI's agentic coding tool. Available as a web app, CLI, and IDE extension. It runs GPT-5.3 Codex. Pricing changed to token-based billing in April 2026. It is excellent for Python and mediocre for everything else.

Google Antigravity is Google's answer to Cursor. Powered by Gemini 3.1 Pro. Antigravity 2.0 launched at Google I/O 2026 with a new CLI and SDK. The pricing has been chaotic since March 2026 and the infrastructure training data is stale.

Pricing in May 2026

Tool	Free Tier	Paid Starts At	Notes
Kiro	Yes	~$19/mo	Credit-based for agentic tasks
Cursor	Yes	$20/mo Pro, $60/mo Pro+, $200/mo Ultra	Most predictable pricing
Windsurf	Yes (25 credits/mo)	$20/mo Pro	Devin included in all paid plans
Claude Code	No	$20/mo Pro, $100/mo Max	Max needed for serious daily use
Codex	Limited	~$100 to $200/mo average	Token-based since April 2026
Antigravity	Yes (~20 req/day)	$20/mo AI Pro, $100/mo Ultra	Credit system is confusing

Cursor is the most predictable. Claude Code Max at $100 per month is expensive but justified if you are doing heavy agentic work. Antigravity's credit restructuring in March 2026 was a mess. The free tier dropped 92% overnight with no warning. Codex token billing makes monthly costs hard to predict for teams.

My Test Environment

Before the tasks, here is the repo I was working in. This matters because the quality difference between tools is almost entirely about how well they read existing context.

text

infrastructure/
  modules/
    networking/
      main.tf          # VPC, subnets, NAT gateway
      variables.tf     # 23 variables, all with descriptions
      outputs.tf       # 14 outputs
    ecs-service/
      main.tf          # ECS task definition, service, IAM roles
      variables.tf
      outputs.tf
    monitoring/
      main.tf          # Datadog monitors, SLO alerts
      variables.tf
      slo-payment-service.tf   # existing SLO monitor I use as template
  services/
    payment-worker/
      main.tf          # calls the modules above
      terraform.tfvars
  go/
    monitoring/
      collector.go     # 847 lines
      metrics.go       # 312 lines
      types.go         # 89 lines
      alerting.go      # 203 lines
      ... 8 more files

The naming convention in this repo uses var.name not var.service_name. The AWS provider is pinned to ~> 5.0. The Datadog provider is ~> 3.0. Every SLO monitor has a runbook_url tag. These are the things that separate a tool that read your codebase from a tool that generated generic output.
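
Those conventions are concrete enough to check mechanically, which is a useful way to score any tool's output. Here is a minimal sketch in Go (the language of the repo's automation) of a line-based checker. The rules and the checkConventions name are hypothetical, and regex matching is only a stand-in for a real HCL parser:

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// rule pairs a line pattern with the message reported when it matches.
type rule struct {
	pattern *regexp.Regexp
	message string
}

// lineRules are hypothetical checks for this repo's conventions. They match text
// line by line; this is not an HCL parser, so matches inside comments are flagged too.
var lineRules = []rule{
	{regexp.MustCompile(`\bvar\.service_name\b`), "use var.name, not var.service_name"},
	{regexp.MustCompile(`^\s*resource\s+"aws_alb"\s`), "use aws_lb, not aws_alb"},
}

// checkConventions returns findings formatted as "line N: message", plus one
// file-level finding when a file defines Datadog monitors but never mentions runbook_url.
func checkConventions(src string) []string {
	var findings []string
	scanner := bufio.NewScanner(strings.NewReader(src))
	for lineNo := 1; scanner.Scan(); lineNo++ {
		for _, r := range lineRules {
			if r.pattern.MatchString(scanner.Text()) {
				findings = append(findings, fmt.Sprintf("line %d: %s", lineNo, r.message))
			}
		}
	}
	if strings.Contains(src, `resource "datadog_monitor"`) && !strings.Contains(src, "runbook_url") {
		findings = append(findings, "file defines datadog_monitor resources but no runbook_url tag")
	}
	return findings
}

func main() {
	sample := "resource \"aws_alb\" \"main\" {\n  name = \"${var.service_name}-alb\"\n}\n"
	for _, f := range checkConventions(sample) {
		fmt.Println(f)
	}
}
```

The runbook_url check is deliberately coarse: it works per file, not per monitor.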

Task 1: Writing a Terraform Module

The task: write a Terraform module for a new microservice. It needs a VPC with public and private subnets, security groups for the service and ALB, an ECS Fargate task definition, an Application Load Balancer, and CloudWatch alarms for CPU, memory, and error rate.

This is the most common infrastructure task I do. I run it at least twice a week.

Kiro

I had already set up a steering file before running this task. This is the step most people skip and then wonder why Kiro generates generic output.

In the Kiro panel, click Steering. Click +. Select Workspace. Name it terraform-standards.md. Write this:

markdown

---
inclusion: fileMatch
fileMatchPattern: "**/*.tf"
---
 
# Terraform Standards
 
## Naming
 
- Use var.name not var.service_name
- Resource names follow the pattern: local.name_prefix-resource_type
- All variables must have a description field
 
## Provider Versions
 
- AWS provider: ~> 5.0
- Datadog provider: ~> 3.0
- Never use deprecated resources: use aws_lb not aws_alb
 
## Module Structure
 
Follow the pattern in: #[[file:modules/networking/main.tf]]
Follow the variable style in: #[[file:modules/networking/variables.tf]]
 
## Required Tags
 
Every resource must have:
 
- Name
- Environment
- ManagedBy = "terraform"
- Team

Save it. Now open Specs. Click +. Select Feature. Type this:

text

Create a Terraform module for a new microservice at modules/ecs-worker/.
It needs:
- VPC with public and private subnets across 2 AZs
- Security groups for the ECS tasks and ALB
- ECS Fargate task definition with configurable CPU and memory
- Application Load Balancer with HTTPS listener
- CloudWatch alarms for CPU utilization, memory utilization, and ALB 5xx error rate
- IAM role for the ECS task with least-privilege permissions

Kiro does not start with code. It generates requirements.md first. Here is what it produced:

markdown

## Requirement 1: Network Isolation
 
**User Story:** As an infrastructure engineer, I want the ECS tasks
to run in private subnets so that they are not directly reachable
from the internet.
 
### Acceptance Criteria
 
- WHEN the module is applied THE SYSTEM SHALL create a VPC with
  CIDR 10.0.0.0/16 unless overridden by var.vpc_cidr.
- THE SYSTEM SHALL create two private subnets in separate AZs
  with CIDRs derived from the VPC CIDR.
- THE SYSTEM SHALL create two public subnets for the ALB only.
- IF var.enable_nat_gateway is true THE SYSTEM SHALL create one
  NAT gateway per AZ for high availability.
- THE SYSTEM SHALL output the VPC ID, private subnet IDs, and
  public subnet IDs for use by other modules.
 
## Requirement 2: Compute
 
**User Story:** As an infrastructure engineer, I want the ECS task
definition to be configurable so that different services can use
different resource allocations.
 
### Acceptance Criteria
 
- THE SYSTEM SHALL create an ECS Fargate task definition with
  var.cpu and var.memory as configurable inputs.
- THE SYSTEM SHALL create an IAM execution role with permissions
  to pull from ECR and write to CloudWatch Logs.
- THE SYSTEM SHALL create a task role with no permissions by
  default, allowing callers to attach additional policies.
- WHEN var.container_port is set THE SYSTEM SHALL expose that
  port in the container definition.

I read through it. The VPC CIDR defaulting to 10.0.0.0/16 was wrong for my environment, so I edited that line to make var.vpc_cidr required with no default. The caller must provide it. Then I clicked Approve.
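
Because the caller now supplies vpc_cidr, the subnet CIDRs in the requirement have to be derived from it. In Terraform that is usually cidrsubnet(prefix, newbits, netnum). The Go sketch below reproduces its IPv4 arithmetic so the numbers are easy to sanity-check. The cidrSubnet name and the netnum layout in main (private subnets 0 and 1, public 2 and 3) are assumptions for illustration, not what Kiro generated:

```go
package main

import (
	"fmt"
	"net/netip"
)

// cidrSubnet mirrors the arithmetic of Terraform's cidrsubnet(prefix, newbits, netnum)
// for IPv4 only: it extends the prefix length by newbits and selects subnet number netnum.
func cidrSubnet(prefix string, newbits, netnum int) (netip.Prefix, error) {
	p, err := netip.ParsePrefix(prefix)
	if err != nil {
		return netip.Prefix{}, err
	}
	if !p.Addr().Is4() {
		return netip.Prefix{}, fmt.Errorf("%s: only IPv4 is handled in this sketch", prefix)
	}
	newLen := p.Bits() + newbits
	if newbits < 0 || newLen > 32 {
		return netip.Prefix{}, fmt.Errorf("cannot extend /%d by %d bits", p.Bits(), newbits)
	}
	if netnum < 0 || netnum >= 1<<newbits {
		return netip.Prefix{}, fmt.Errorf("netnum %d does not fit in %d bits", netnum, newbits)
	}
	// Work on the network address as a 32-bit integer and OR in the subnet number.
	b := p.Masked().Addr().As4()
	base := uint32(b[0])<<24 | uint32(b[1])<<16 | uint32(b[2])<<8 | uint32(b[3])
	addr := base | uint32(netnum)<<(32-newLen)
	out := netip.AddrFrom4([4]byte{byte(addr >> 24), byte(addr >> 16), byte(addr >> 8), byte(addr)})
	return netip.PrefixFrom(out, newLen), nil
}

func main() {
	vpc := "10.20.0.0/16" // hypothetical caller-supplied var.vpc_cidr
	for netnum, role := range []string{"private-a", "private-b", "public-a", "public-b"} {
		p, err := cidrSubnet(vpc, 8, netnum)
		if err != nil {
			fmt.Println(err)
			return
		}
		fmt.Println(role, p)
	}
}
```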

Kiro generated design.md. It included the full module file structure, every variable name, every output name, and the IAM policy document. I approved it.

Then tasks.md:

markdown

- [ ] Task 1: Create modules/ecs-worker/variables.tf with all input variables
- [ ] Task 2: Create modules/ecs-worker/main.tf - VPC and networking resources
- [ ] Task 3: Create modules/ecs-worker/main.tf - Security groups
- [ ] Task 4: Create modules/ecs-worker/main.tf - ECS task definition and IAM roles
- [ ] Task 5: Create modules/ecs-worker/main.tf - ALB and target group
- [ ] Task 6: Create modules/ecs-worker/main.tf - CloudWatch alarms
- [ ] Task 7: Create modules/ecs-worker/outputs.tf
- [ ] Task 8: Create modules/ecs-worker/README.md

I clicked Run All Tasks. Here is the variables.tf it produced:

hcl

variable "name" {
  description = "Name of the ECS worker service. Used as a prefix for all resources."
  type        = string
}
 
variable "vpc_cidr" {
  description = "CIDR block for the VPC. Must not overlap with existing VPCs in the account."
  type        = string
}
 
variable "environment" {
  description = "Deployment environment. Used for tagging and resource naming."
  type        = string
}
 
variable "team" {
  description = "Team that owns this service. Used for tagging."
  type        = string
}
 
variable "cpu" {
  description = "CPU units for the ECS Fargate task. Valid values: 256, 512, 1024, 2048, 4096."
  type        = number
  default     = 512
}
 
variable "memory" {
  description = "Memory in MiB for the ECS Fargate task. Must be compatible with the cpu value."
  type        = number
  default     = 1024
}
 
variable "container_port" {
  description = "Port the container listens on."
  type        = number
  default     = 8080
}
 
variable "container_image" {
  description = "Docker image URI for the ECS task. Include the tag."
  type        = string
}
 
variable "enable_nat_gateway" {
  description = "Whether to create NAT gateways for private subnet internet access."
  type        = bool
  default     = true
}
 
variable "alarm_cpu_threshold" {
  description = "CPU utilization percentage that triggers the CloudWatch alarm."
  type        = number
  default     = 80
}
 
variable "alarm_memory_threshold" {
  description = "Memory utilization percentage that triggers the CloudWatch alarm."
  type        = number
  default     = 80
}
 
variable "alarm_5xx_threshold" {
  description = "Number of ALB 5xx errors per minute that triggers the CloudWatch alarm."
  type        = number
  default     = 10
}
 
variable "alarm_actions" {
  description = "List of ARNs to notify when an alarm fires. Typically an SNS topic ARN."
  type        = list(string)
  default     = []
}

Notice: var.name not var.service_name. Every variable has a description. The steering file worked.
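
The cpu and memory descriptions say the two values must be compatible, but nothing in the generated variables enforces it, so a bad pair only fails when AWS rejects the task definition at apply time. Here is a minimal Go sketch of the check, covering the Fargate combinations for the five cpu values listed in the description (larger sizes are left out). fargateMemoryOK is a hypothetical name, and the table is worth re-checking against the current AWS Fargate documentation before you rely on it:

```go
package main

import "fmt"

// fargateMemoryOK reports whether a task-level cpu/memory pair (cpu units, MiB)
// is a valid Fargate combination for cpu values 256 through 4096.
func fargateMemoryOK(cpu, memoryMiB int) bool {
	switch cpu {
	case 256:
		return memoryMiB == 512 || memoryMiB == 1024 || memoryMiB == 2048
	case 512:
		return inRangeStep(memoryMiB, 1024, 4096, 1024)
	case 1024:
		return inRangeStep(memoryMiB, 2048, 8192, 1024)
	case 2048:
		return inRangeStep(memoryMiB, 4096, 16384, 1024)
	case 4096:
		return inRangeStep(memoryMiB, 8192, 30720, 1024)
	default:
		return false // unknown or unsupported cpu value
	}
}

// inRangeStep reports whether v lies in [lo, hi] on a grid of step starting at lo.
func inRangeStep(v, lo, hi, step int) bool {
	return v >= lo && v <= hi && (v-lo)%step == 0
}

func main() {
	for _, pair := range [][2]int{{512, 1024}, {256, 4096}, {2048, 5120}} {
		fmt.Printf("cpu=%d memory=%d valid=%v\n", pair[0], pair[1], fargateMemoryOK(pair[0], pair[1]))
	}
}
```

Both Kiro's defaults (512/1024) and Cursor's (256/512) pass this check.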

Here is the CloudWatch alarm section from main.tf:

hcl

resource "aws_cloudwatch_metric_alarm" "cpu_high" {
  alarm_name          = "${local.name_prefix}-cpu-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/ECS"
  period              = 60
  statistic           = "Average"
  threshold           = var.alarm_cpu_threshold
  alarm_description   = "ECS CPU utilization is above ${var.alarm_cpu_threshold}%"
  alarm_actions       = var.alarm_actions
  ok_actions          = var.alarm_actions
 
  dimensions = {
    ClusterName = aws_ecs_cluster.this.name
    ServiceName = aws_ecs_service.this.name
  }
 
  tags = local.common_tags
}
 
resource "aws_cloudwatch_metric_alarm" "memory_high" {
  alarm_name          = "${local.name_prefix}-memory-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "MemoryUtilization"
  namespace           = "AWS/ECS"
  period              = 60
  statistic           = "Average"
  threshold           = var.alarm_memory_threshold
  alarm_description   = "ECS memory utilization is above ${var.alarm_memory_threshold}%"
  alarm_actions       = var.alarm_actions
  ok_actions          = var.alarm_actions
 
  dimensions = {
    ClusterName = aws_ecs_cluster.this.name
    ServiceName = aws_ecs_service.this.name
  }
 
  tags = local.common_tags
}
 
resource "aws_cloudwatch_metric_alarm" "alb_5xx_high" {
  alarm_name          = "${local.name_prefix}-alb-5xx-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "HTTPCode_Target_5XX_Count"
  namespace           = "AWS/ApplicationELB"
  period              = 60
  statistic           = "Sum"
  threshold           = var.alarm_5xx_threshold
  alarm_description   = "ALB 5xx error count exceeded ${var.alarm_5xx_threshold} in the last minute"
  alarm_actions       = var.alarm_actions
  ok_actions          = var.alarm_actions
  treat_missing_data  = "notBreaching"
 
  dimensions = {
    LoadBalancer = aws_lb.this.arn_suffix
    TargetGroup  = aws_lb_target_group.this.arn_suffix
  }
 
  tags = local.common_tags
}
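
The three alarms share one evaluation rule. Each period produces one datapoint (the Average or Sum over period seconds), and with datapoints_to_alarm unset the alarm fires only when all of the last evaluation_periods datapoints breach. Here is a simplified Go model of that rule. It assumes GreaterThanThreshold and treats missing data as not breaching, which is what the 5xx alarm sets with treat_missing_data. The CPU and memory alarms keep CloudWatch's default missing-data handling, which this sketch does not reproduce:

```go
package main

import "fmt"

// breaches reports whether one datapoint breaches a GreaterThanThreshold alarm.
// A nil datapoint means the period had no data and, as with
// treat_missing_data = "notBreaching", counts as not breaching.
func breaches(v *float64, threshold float64) bool {
	return v != nil && *v > threshold
}

// inAlarm returns true only when every one of the most recent evaluationPeriods
// datapoints breaches (datapoints_to_alarm unset, so it equals evaluation_periods).
func inAlarm(datapoints []*float64, threshold float64, evaluationPeriods int) bool {
	if evaluationPeriods <= 0 || len(datapoints) < evaluationPeriods {
		return false
	}
	for _, v := range datapoints[len(datapoints)-evaluationPeriods:] {
		if !breaches(v, threshold) {
			return false
		}
	}
	return true
}

func main() {
	f := func(x float64) *float64 { return &x }
	cpu := []*float64{f(62), f(85), f(91)} // 60-second averages, oldest first
	fmt.Println("cpu_high in alarm:", inAlarm(cpu, 80, 2))
	errors5xx := []*float64{f(14), nil} // the last minute had no data
	fmt.Println("alb_5xx_high in alarm:", inAlarm(errors5xx, 10, 1))
}
```

With evaluation_periods = 2 and period = 60, a CPU or memory spike has to last two consecutive minutes before anyone is notified, while the 5xx alarm's single period reacts within a minute.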

I ran terraform validate on the output. Zero errors. The terraform plan against a dev account showed 34 resources to create, all correct.

Total time from opening Kiro to a passing terraform plan: 11 minutes. 8 of those were the spec phase.

Cursor

I opened Cursor, pressed Cmd+K, and typed the same prompt. No steering files. No spec phase. Cursor started writing immediately.

The code appeared in about 35 seconds. Here is what the variables file looked like:

hcl

variable "service_name" {
  description = "Name of the service"
  type        = string
}
 
variable "environment" {
  type = string
}
 
variable "cpu" {
  type    = number
  default = 256
}
 
variable "memory" {
  type    = number
  default = 512
}
 
variable "tags" {
  type    = map(string)
  default = {}
}

Three problems immediately visible.

First, it used var.service_name not var.name. My existing modules use var.name. Every reference to this variable in the calling module would need to change.

Second, the environment variable has no description. My team's convention requires descriptions on every variable. The PR would fail review.

Third, it added a tags variable I did not ask for. My modules use a local.common_tags block that merges required tags automatically. A separate tags variable breaks that pattern.
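
The Required Tags rule from the steering file can also be checked after generation, whichever tool wrote the code. Here is a minimal Go sketch that validates one resource's tag map. tagProblems is a hypothetical name, and extracting each resource's tags (for example from terraform show -json plan output) is left out:

```go
package main

import (
	"fmt"
	"sort"
)

// requiredTags mirrors the steering file: these keys must be present and non-empty,
// and a non-empty value here is the exact value required.
var requiredTags = map[string]string{
	"Name":        "",
	"Environment": "",
	"ManagedBy":   "terraform",
	"Team":        "",
}

// tagProblems returns a sorted list of problems with one resource's tags.
func tagProblems(tags map[string]string) []string {
	var problems []string
	for key, want := range requiredTags {
		got, ok := tags[key]
		switch {
		case !ok || got == "":
			problems = append(problems, "missing tag "+key)
		case want != "" && got != want:
			problems = append(problems, fmt.Sprintf("tag %s = %q, want %q", key, got, want))
		}
	}
	sort.Strings(problems) // map iteration order is random; sort for stable output
	return problems
}

func main() {
	tags := map[string]string{"Name": "payment-worker-alb", "ManagedBy": "console"}
	for _, p := range tagProblems(tags) {
		fmt.Println(p)
	}
}
```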

The CloudWatch alarm section had a more serious problem:

hcl

resource "aws_cloudwatch_metric_alarm" "cpu_high" {
  alarm_name          = "${var.service_name}-cpu-high"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = "2"
  metric_name         = "CPUUtilization"
  namespace           = "AWS/ECS"
  period              = "300"
  statistic           = "Average"
  threshold           = "80"
  alarm_description   = "CPU utilization is high"
  alarm_actions       = []
}

The evaluation_periods, period, and threshold values are quoted strings, although the AWS provider defines these arguments as numbers. Terraform converts a numeric string like "2" to a number on its own, so this still passes validate and plan. It is sloppy rather than broken: only a non-numeric value such as "2m" would stop the plan with an "Inappropriate value for attribute" error.

The serious part is alarm_actions, which is hardcoded to an empty list with no variable behind it. The alarm can fire and nobody gets notified. If you want to wire it to an SNS topic you have to edit the generated code directly.

I fixed all of this manually. It took about 12 minutes. So the total time was under a minute of generation plus 12 minutes of fixing. Longer than Kiro, with no documentation.

To be fair to Cursor: if I had used Cursor Rules to define my conventions, some of these problems would not have happened. Cursor Rules are the equivalent of Kiro's steering files. The difference is that Kiro generates the spec and enforces conventions in one workflow. With Cursor you have to set up Rules separately and remember to keep them updated.

Windsurf

Windsurf's Cascade agent indexed my project automatically when I opened it. I did not configure anything. I opened a new Cascade conversation and typed the same prompt.

Windsurf read my existing modules. The variable names matched. It used var.name. It used aws_lb not aws_alb. The module structure matched my existing pattern.

But the CloudWatch alarms had hardcoded thresholds:

hcl

resource "aws_cloudwatch_metric_alarm" "cpu_high" {
  alarm_name          = "${local.name_prefix}-cpu-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/ECS"
  period              = 60
  statistic           = "Average"
  threshold           = 80
  alarm_description   = "ECS CPU utilization is above 80%"
  alarm_actions       = []
}

The threshold is hardcoded to 80. The alarm_actions is hardcoded to an empty list. These should be variables. I asked Windsurf to fix it.

It fixed it in one turn. The corrected version used var.alarm_cpu_threshold and var.alarm_actions. Correct.

Total time: about 6 minutes including the correction turn. Better than Cursor. Not as clean as Kiro on the first pass.

Claude Code

Claude Code runs in the terminal. There is no IDE. I opened a terminal and ran:

bash

claude

I typed the same prompt. Claude Code printed this before writing anything:

text

I'll read your existing Terraform modules first to understand your conventions.

Reading modules/networking/main.tf...
Reading modules/networking/variables.tf...
Reading modules/networking/outputs.tf...
Reading modules/ecs-service/main.tf...
Reading modules/ecs-service/variables.tf...
Reading modules/monitoring/main.tf...
Reading modules/monitoring/slo-payment-service.tf...
Reading services/payment-worker/main.tf...

I can see your conventions:
- Variable naming: var.name (not var.service_name)
- AWS provider: ~> 5.0 (using aws_lb not aws_alb)
- All variables have descriptions
- Common tags via local.common_tags
- Alarm actions via a list variable

Creating modules/ecs-worker/...

The Terraform it produced was the best of all six tools. Here is the IAM section, which is where most tools get lazy:

hcl

data "aws_iam_policy_document" "ecs_execution_assume_role" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]
 
    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}
 
data "aws_iam_policy_document" "ecs_execution" {
  statement {
    effect = "Allow"
    actions = [
      "ecr:GetAuthorizationToken",
      "ecr:BatchCheckLayerAvailability",
      "ecr:GetDownloadUrlForLayer",
      "ecr:BatchGetImage",
    ]
    resources = ["*"]
  }
 
  statement {
    effect = "Allow"
    actions = [
      "logs:CreateLogStream",
      "logs:PutLogEvents",
    ]
    resources = [
      "${aws_cloudwatch_log_group.this.arn}:*",
    ]
  }
}
 
resource "aws_iam_role" "ecs_execution" {
  name               = "${local.name_prefix}-ecs-execution"
  assume_role_policy = data.aws_iam_policy_document.ecs_execution_assume_role.json
  tags               = local.common_tags
}
 
resource "aws_iam_role_policy" "ecs_execution" {
  name   = "${local.name_prefix}-ecs-execution"
  role   = aws_iam_role.ecs_execution.id
  policy = data.aws_iam_policy_document.ecs_execution.json
}
 
resource "aws_iam_role" "ecs_task" {
  name               = "${local.name_prefix}-ecs-task"
  assume_role_policy = data.aws_iam_policy_document.ecs_execution_assume_role.json
  tags               = local.common_tags
}

Notice the CloudWatch Logs permission is scoped to the specific log group ARN, not *. That is least-privilege. Kiro also did this. Cursor used * for the logs resource.
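
Resource scoping is also easy to audit mechanically. Here is a minimal Go sketch over a simplified statement type (Statement, wildcardOnly, and wildcardFindings are hypothetical names, and the allowlist is illustrative, not exhaustive). It flags actions in Allow statements that apply to "*" unless the action cannot be scoped, as with ecr:GetAuthorizationToken. Run over Claude Code's execution policy, it would also flag the three ECR pull actions, which can be scoped to a repository ARN. AWS's managed AmazonECSTaskExecutionRolePolicy grants those on "*" too, so tightening them is a judgment call:

```go
package main

import "fmt"

// Statement is a simplified IAM policy statement with only the fields this check reads.
type Statement struct {
	Effect    string
	Actions   []string
	Resources []string
}

// wildcardOnly lists actions that do not support resource-level permissions, so "*"
// is unavoidable for them. Illustrative only; extend it from the AWS Service Authorization Reference.
var wildcardOnly = map[string]bool{
	"ecr:GetAuthorizationToken": true,
}

func hasWildcard(resources []string) bool {
	for _, r := range resources {
		if r == "*" {
			return true
		}
	}
	return false
}

// wildcardFindings flags every action in an Allow statement that applies to "*"
// without being on the wildcardOnly list.
func wildcardFindings(stmts []Statement) []string {
	var findings []string
	for i, s := range stmts {
		if s.Effect != "Allow" || !hasWildcard(s.Resources) {
			continue
		}
		for _, a := range s.Actions {
			if !wildcardOnly[a] {
				findings = append(findings, fmt.Sprintf("statement %d: %s on \"*\"", i, a))
			}
		}
	}
	return findings
}

func main() {
	// Shaped like the execution policy above; the log group ARN is a made-up example.
	policy := []Statement{
		{Effect: "Allow", Actions: []string{"ecr:GetAuthorizationToken", "ecr:BatchGetImage"}, Resources: []string{"*"}},
		{Effect: "Allow", Actions: []string{"logs:PutLogEvents"}, Resources: []string{"arn:aws:logs:us-east-1:123456789012:log-group:/ecs/example:*"}},
	}
	for _, f := range wildcardFindings(policy) {
		fmt.Println(f)
	}
}
```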

Claude Code also added a README.md without being asked. It included usage examples, variable descriptions, and outputs. I did not prompt this. It inferred from my existing modules that every module has a README.

The only problem: no IDE. I was looking at diffs in the terminal. To review the full output I had to open the files in a separate editor. That friction is real.

terraform validate: zero errors. terraform plan: 34 resources, all correct.

Total time: 4 minutes.

OpenAI Codex

I used the Codex CLI:

bash

codex "Create a Terraform module at modules/ecs-worker/ for a new microservice.
It needs a VPC, security groups, ECS Fargate task definition, ALB, and
CloudWatch alarms for CPU, memory, and ALB 5xx errors."

Codex generated the module. Here is the ALB resource it produced:

hcl

resource "aws_alb" "main" {
  name               = "${var.service_name}-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = var.public_subnet_ids
 
  tags = {
    Name = "${var.service_name}-alb"
  }
}

aws_alb is deprecated. The correct resource in AWS provider 5.x is aws_lb. This is not a breaking change but it generates a deprecation warning on every plan:

text

Warning: Argument is deprecated
  with aws_alb.main,
  on main.tf line 1, in resource "aws_alb" "main":
  1: resource "aws_alb" "main" {

Use aws_lb instead.

The ECS task definition had more serious gaps:

hcl

resource "aws_ecs_task_definition" "main" {
  family                   = var.service_name
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = var.cpu
  memory                   = var.memory
  execution_role_arn       = aws_iam_role.ecs_execution.arn

  container_definitions = jsonencode([
    {
      name      = var.service_name
      image     = var.container_image
      cpu       = var.cpu
      memory    = var.memory
      essential = true
      portMappings = [
        {
          containerPort = var.container_port
          hostPort      = var.container_port
          protocol      = "tcp"
        }
      ]
    }
  ])
}

container_definitions is fine as written: the argument takes a JSON string in every AWS provider version, and jsonencode is the normal way to build one. The problem is what is missing. There is no logConfiguration, so the container's output never reaches CloudWatch Logs, and there is no task_role_arn, so the application has no role to attach permissions to. Neither shows up as a plan error.

Also: var.service_name again. Codex did not read my existing modules.

I ran terraform validate. It passed. I ran terraform plan. It worked but with deprecation warnings. I would not merge this to main without fixing the aws_alb reference and the naming convention.

Codex is fast. The CLI is clean. But it is clearly optimized for Python. Its Terraform knowledge is about 18 months behind.

Google Antigravity

I used Antigravity 2.0's CLI, which launched at Google I/O 2026:

bash

antigravity "Create a Terraform module at modules/ecs-worker/ for a new microservice.
It needs a VPC, security groups, ECS Fargate task definition, ALB, and
CloudWatch alarms."

Antigravity started generating. Then this appeared:

text

Rate limit reached. You have used 18/20 of your daily requests.
Generation paused. Resume tomorrow or upgrade to AI Pro.

I was on the free tier. 20 requests per day. I had used 18 testing other things earlier. I upgraded to AI Pro ($20/month) and tried again.

This time it completed. Here is the provider block it generated:

hcl

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }
}

AWS provider 4.x. My project is on 5.x. The resource arguments are different between these versions. The aws_ecs_task_definition resource changed significantly between 4.x and 5.x. Running terraform init with this would either downgrade my provider or fail with a version conflict.
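
The ~> operator explains why the two pins cannot coexist. "~> 4.0" admits any 4.x release from 4.0 up but nothing from 5.0 on, and "~> 5.0" admits only 5.x, so no single AWS provider version satisfies both. Here is a minimal Go sketch of the operator's rule; satisfiesPessimistic is a hypothetical name, and it ignores pre-release versions and Terraform's other operators:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "5.31.0" or "5.0" into []int{5, 31, 0} or []int{5, 0}.
// Pre-release and build suffixes are not handled in this sketch.
func parseVersion(s string) ([]int, error) {
	fields := strings.Split(strings.TrimSpace(s), ".")
	parts := make([]int, len(fields))
	for i, f := range fields {
		n, err := strconv.Atoi(f)
		if err != nil {
			return nil, fmt.Errorf("bad version %q: %w", s, err)
		}
		parts[i] = n
	}
	return parts, nil
}

// satisfiesPessimistic checks version against a "~> X.Y" or "~> X.Y.Z" constraint:
// every component before the right-most one must match exactly, and the
// right-most one may only stay equal or increase.
func satisfiesPessimistic(version, constraint string) (bool, error) {
	cs := strings.TrimSpace(constraint)
	if !strings.HasPrefix(cs, "~>") {
		return false, fmt.Errorf("constraint %q does not use ~>", constraint)
	}
	c, err := parseVersion(strings.TrimPrefix(cs, "~>"))
	if err != nil {
		return false, err
	}
	if len(c) < 2 {
		return false, fmt.Errorf("constraint %q needs at least major.minor", constraint)
	}
	v, err := parseVersion(version)
	if err != nil {
		return false, err
	}
	for len(v) < len(c) {
		v = append(v, 0) // treat "5.0" as "5.0.0" when the constraint has three parts
	}
	last := len(c) - 1
	for i := 0; i < last; i++ {
		if v[i] != c[i] {
			return false, nil
		}
	}
	return v[last] >= c[last], nil
}

func main() {
	for _, pin := range []string{"~> 4.0", "~> 5.0"} {
		ok, err := satisfiesPessimistic("5.31.0", pin) // hypothetical installed 5.x version
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("5.31.0 satisfies %q: %v\n", pin, ok)
	}
}
```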

I asked Antigravity to update it to 5.x. It updated the version constraint but left the resource code it had written against 4.x untouched, including the placement_constraints in aws_ecs_task_definition.

The Gemini 3.1 Pro model is genuinely good at reasoning. When I asked it to explain why it chose certain IAM permissions, the explanation was correct and detailed. The model understands infrastructure. The training data for Terraform is just stale.

I gave up on this task for Antigravity. The combination of quota interruptions and stale provider knowledge makes it unreliable for infrastructure work right now.

Task 1 Summary

Claude Code produced the best Terraform on the first pass. Kiro produced the most maintainable output because of the spec trail and steering file enforcement. Windsurf was close but needed one correction. Cursor was fast but required manual fixes for naming conventions and its alarm definitions. Codex had deprecation warnings and stale patterns. Antigravity had quota problems and provider version issues.

Task 2: Refactoring a Go Monitoring Script Across 12 Files

The task: change the signature of getServiceMetrics from this:

func getServiceMetrics(name string) (*ServiceMetrics, error)

to this:

func getServiceMetrics(ctx context.Context, name string, opts MetricOptions) (*ServiceMetrics, error)

The function is defined in metrics.go, and the change reaches 11 other files. MetricOptions is a new struct that needs to be defined in types.go, and the remaining ten files (including metrics_test.go) call the function, so every call site needs to pass a context and an options struct.

This is a real refactor I did last month. I ran it through all six tools to see which ones could handle it without missing files.

Kiro

I used a bugfix spec for this. Click Specs. Click +. Select Bug. Type this:

text

The getServiceMetrics function in go/monitoring/metrics.go needs a new signature.

Current:
  func getServiceMetrics(name string) (*ServiceMetrics, error)

New:
  func getServiceMetrics(ctx context.Context, name string, opts MetricOptions) (*ServiceMetrics, error)

MetricOptions is a new struct that needs to be defined in go/monitoring/types.go.
It should have these fields:
  - Timeout time.Duration (default 30s)
  - IncludeHistogram bool (default false)
  - Tags map[string]string (default empty)

All 11 callers need to be updated. Where no context is available, use context.Background().
Where no options are needed, use MetricOptions{} as the zero value.

Kiro generated a bugfix spec that listed every file:

markdown

## Bug Condition
 
The function getServiceMetrics does not accept a context or options,
making it impossible to add timeouts or pass metadata to the metrics
collection layer.
 
## Files Requiring Changes
 
1. go/monitoring/types.go - Add MetricOptions struct
2. go/monitoring/metrics.go - Update function signature
3. go/monitoring/collector.go - Update 3 call sites
4. go/monitoring/alerting.go - Update 2 call sites
5. go/monitoring/reporter.go - Update 1 call site
6. go/monitoring/aggregator.go - Update 2 call sites
7. go/monitoring/exporter.go - Update 1 call site
8. go/monitoring/health.go - Update 1 call site
9. go/monitoring/dashboard.go - Update 1 call site
10. go/monitoring/scheduler.go - Update 1 call site
11. go/monitoring/cleanup.go - Update 1 call site
12. go/monitoring/metrics_test.go - Update test call sites

It ran the tasks one by one. Here is the MetricOptions struct it added to types.go:

// MetricOptions configures the behavior of getServiceMetrics.
type MetricOptions struct {
	// Timeout is the maximum duration to wait for metrics collection.
	// Defaults to 30 seconds if zero.
	Timeout time.Duration
 
	// IncludeHistogram controls whether histogram data is included in the response.
	// Histogram collection is expensive. Only enable when needed.
	IncludeHistogram bool
 
	// Tags are additional key-value pairs attached to the collected metrics.
	// These are forwarded to the metrics backend as dimensions.
	Tags map[string]string
}
 
// DefaultMetricOptions returns a MetricOptions with sensible defaults.
func DefaultMetricOptions() MetricOptions {
	return MetricOptions{
		Timeout:          30 * time.Second,
		IncludeHistogram: false,
		Tags:             make(map[string]string),
	}
}

It added a DefaultMetricOptions() constructor without being asked. That is the kind of thing a senior engineer adds automatically.

Here is how it updated a call site in collector.go:

// Before
metrics, err := getServiceMetrics(svc.Name)
if err != nil {
    log.Printf("failed to get metrics for %s: %v", svc.Name, err)
    continue
}
 
// After
ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
defer cancel()
 
metrics, err := getServiceMetrics(ctx, svc.Name, MetricOptions{
    Tags: map[string]string{
        "collector": "automated",
        "service":   svc.Name,
    },
})
if err != nil {
    log.Printf("failed to get metrics for %s: %v", svc.Name, err)
    continue
}

It did not just add context.Background() everywhere. It used the existing ctx from the function parameter where one was available. Where no context existed, it created one with a timeout. That is correct context propagation.

I ran go build ./... and got zero errors. I ran go test ./... and all tests passed.

Total time: 6 minutes including the spec phase.

Claude Code

bash

claude "Update getServiceMetrics in go/monitoring/metrics.go to accept
ctx context.Context, name string, opts MetricOptions.
Define MetricOptions in go/monitoring/types.go with fields:
  Timeout time.Duration
  IncludeHistogram bool
  Tags map[string]string
Add a DefaultMetricOptions() constructor.
Update all 11 callers. Use existing ctx where available, context.Background() where not.
Update tests in metrics_test.go."

Claude Code read all 12 files before making any changes. It printed a summary of what it found:

text

Found 11 call sites across 10 files:
- collector.go: 3 calls (has ctx parameter available)
- alerting.go: 2 calls (has ctx parameter available)
- reporter.go: 1 call (no ctx, will use context.Background())
- aggregator.go: 2 calls (has ctx parameter available)
- exporter.go: 1 call (no ctx, will use context.Background())
- health.go: 1 call (has ctx parameter available)
- dashboard.go: 1 call (no ctx, will use context.Background())
- scheduler.go: 1 call (has ctx parameter available)
- cleanup.go: 1 call (no ctx, will use context.Background())
- metrics_test.go: 4 test calls (will use context.Background())

Making changes...

The output matched Kiro's in quality: the same DefaultMetricOptions() constructor, the same context propagation logic, the same test updates.

go build ./...: zero errors. go test ./...: all passed.

Total time: 3 minutes. Faster than Kiro because there was no spec phase.

The difference is the paper trail. Kiro's bugfix spec documents what changed and why. Six months from now when someone asks why getServiceMetrics has a MetricOptions parameter, the spec is there. With Claude Code, the only record is the git commit message.

Cursor

I opened Composer with Cmd+Shift+I and typed the same prompt.

Cursor updated 9 of 12 files. It missed dashboard.go, cleanup.go, and scheduler.go. These three files are in the same directory as the others. Cursor just did not index them.

I pointed Cursor to the missing files explicitly:

text

You missed these files:
- go/monitoring/dashboard.go
- go/monitoring/cleanup.go
- go/monitoring/scheduler.go
Please update the getServiceMetrics call sites in these files too.

Cursor updated them. But the updates in dashboard.go used context.Background() even though dashboard.go has a ctx context.Context parameter in its main function. Cursor did not propagate the context correctly.

I fixed that manually.

go build ./...: zero errors after manual fix. Total time: 9 minutes.

Windsurf

Windsurf missed 2 files: cleanup.go and scheduler.go. Same problem as Cursor. I pointed it to the missing files and it updated them correctly, including proper context propagation.

Total time: 7 minutes.

Codex

Codex updated metrics.go with the new signature. It updated collector.go with 2 of 3 call sites. It missed the third call site in collector.go and all other files.

I asked it to find the remaining call sites. It found 4 more. I asked again. It found 2 more. After 4 rounds of prompting, it had updated 8 of 11 files. I gave up and did the remaining 3 manually.

Codex does not handle large multi-file refactors well. It loses track of what it has already changed.

Antigravity

Antigravity hit a quota limit after updating 3 files. I had already used most of my daily requests. I stopped testing it on this task.

Task 2 Summary

Claude Code and Kiro both handled this perfectly. Claude Code was faster. Kiro left documentation. Cursor and Windsurf missed files and needed correction. Codex lost track of the scope. Antigravity hit quota limits.
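Missed call sites were the most common failure in this task. You can check coverage without relying on any tool by counting the call sites yourself before and after the refactor. Below is a minimal sketch using the standard library's go/parser and go/ast packages. The countCalls helper and the sample source are hypothetical.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// countCalls parses one Go source file and returns the line of every call
// to a function or method with the given name. It matches bare calls
// (getServiceMetrics(...)) and selector calls (m.getServiceMetrics(...)).
// No type checking is done, so undefined identifiers are fine.
func countCalls(filename, src, name string) ([]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	var lines []int
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true
		}
		switch fn := call.Fun.(type) {
		case *ast.Ident:
			if fn.Name == name {
				lines = append(lines, fset.Position(call.Pos()).Line)
			}
		case *ast.SelectorExpr:
			if fn.Sel.Name == name {
				lines = append(lines, fset.Position(call.Pos()).Line)
			}
		}
		return true
	})
	return lines, nil
}

func main() {
	src := `package monitoring

func report() {
	getServiceMetrics(nil, "a", MetricOptions{})
	_ = helper()
	getServiceMetrics(nil, "b", MetricOptions{})
}
`
	lines, err := countCalls("reporter.go", src, "getServiceMetrics")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(lines), lines) // 2 [4 6]
}
```

To cover the real package, loop over filepath.Glob("go/monitoring/*.go"), read each file with os.ReadFile, and compare the totals before and after the change.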

Task 3: Generating a Datadog SLO Monitor

The task: generate a Datadog monitor for a new SLO. The requirements are 99.9% availability, 30-day rolling window, alert at 99.5%, warn at 99.7%. The monitor must include a runbook_url tag. It must follow the same structure as the existing slo-payment-service.tf in my observability monitoring module.

This task is specifically about whether a tool can follow your team's conventions without you repeating them every time.

Kiro

I already had the terraform-standards.md steering file from Task 1. I added one more steering file specifically for monitoring:

markdown

---
inclusion: fileMatch
fileMatchPattern: "modules/monitoring/**/*.tf"
---
 
# Monitoring Module Standards
 
## SLO Monitor Structure
 
All SLO monitors use the datadog_service_level_objective resource type.
Follow the exact structure in: #[[file:modules/monitoring/slo-payment-service.tf]]
 
## Required Fields
 
Every SLO monitor must have:
 
- name following the pattern: "SLO - [Service Name] - [Metric]"
- description explaining what the SLO measures
- thresholds block with timeframe = "30d"
- warning threshold at 99.7
- critical threshold at 99.5
- tags including "runbook_url:[url]" and "team:[team_name]"
 
## Naming Convention
 
The Terraform resource name follows: slo_[service_name]_[metric]
Example: slo_payment_service_availability
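The naming rule in that steering file is mechanical enough to express as a function, which also makes it easy to check generated names in review. This is a sketch. sloResourceName is a hypothetical helper, and the hyphen-to-underscore conversion is inferred from the slo_payment_service_availability example.

```go
package main

import (
	"fmt"
	"strings"
)

// sloResourceName builds a Terraform resource name following the
// slo_[service_name]_[metric] convention. Hyphens become underscores and
// everything is lower-cased, matching slo_payment_service_availability.
func sloResourceName(service, metric string) string {
	normalize := func(s string) string {
		return strings.ToLower(strings.ReplaceAll(strings.TrimSpace(s), "-", "_"))
	}
	return "slo_" + normalize(service) + "_" + normalize(metric)
}

func main() {
	fmt.Println(sloResourceName("inventory-api", "availability"))   // slo_inventory_api_availability
	fmt.Println(sloResourceName("payment-service", "availability")) // slo_payment_service_availability
}
```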

I asked Kiro to generate the monitor for a new service called inventory-api.

Here is what it produced:

hcl

resource "datadog_service_level_objective" "slo_inventory_api_availability" {
  name        = "SLO - Inventory API - Availability"
  type        = "metric"
  description = "Measures the availability of the Inventory API service. Availability is defined as the percentage of requests that return a non-5xx response."
 
  query {
    numerator   = "sum:trace.web.request.hits{service:inventory-api,!http.status_class:5xx}.as_count()"
    denominator = "sum:trace.web.request.hits{service:inventory-api}.as_count()"
  }
 
  thresholds {
    timeframe = "30d"
    target    = 99.9
    warning   = 99.7
  }
 
  tags = [
    "service:inventory-api",
    "team:${var.team}",
    "environment:${var.environment}",
    "runbook_url:https://runbooks.internal/inventory-api/availability",
    "managed_by:terraform",
  ]
}
 
resource "datadog_monitor" "slo_inventory_api_availability_alert" {
  name    = "SLO Alert - Inventory API - Availability"
  type    = "slo alert"
  message = <<-EOT
    {{#is_alert}}
    Inventory API availability SLO is below 99.5% over the last 30 days.
 
    Current burn rate: {{value}}
 
    Runbook: https://runbooks.internal/inventory-api/availability
 
    @pagerduty-inventory-api
    {{/is_alert}}
 
    {{#is_warning}}
    Inventory API availability SLO is below 99.7% over the last 30 days.
    @slack-inventory-api-alerts
    {{/is_warning}}
  EOT
 
  query = "error_budget(\"${datadog_service_level_objective.slo_inventory_api_availability.id}\").over(\"30d\") > 1"
 
  thresholds = {
    critical = 1
    warning  = 0.5
  }
 
  tags = [
    "service:inventory-api",
    "team:${var.team}",
    "environment:${var.environment}",
    "managed_by:terraform",
  ]
}

This is exactly what I would have written manually. The runbook_url tag is there. The thresholds match. The naming convention matches. The message template matches my existing monitors.

I did not tell Kiro any of this in the prompt. The steering file told it.

Claude Code

bash

claude "Generate a Datadog SLO monitor for inventory-api.
99.9% availability target, 30-day rolling window.
Alert at 99.5%, warn at 99.7%.
Follow the same structure as modules/monitoring/slo-payment-service.tf."

Claude Code read slo-payment-service.tf and generated a correct monitor. The structure matched. The thresholds were correct.

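But it did not include the runbook_url tag. That requirement is in my Kiro steering file, which Claude Code does not read. Claude Code can load project conventions from a CLAUDE.md memory file, but without one it starts each session knowing only what is in the prompt and the files it reads.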

I told it to add the runbook_url tag. It added it. One correction turn.

The output after correction was identical to Kiro's output. But I had to remember to ask for the runbook_url. With Kiro, I never have to remember. The steering file remembers for me.
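You can also enforce the runbook_url rule without relying on any tool's memory, for example as a CI check. Here is a minimal sketch. It assumes the tag list has already been extracted from the .tf file (parsing HCL is out of scope here). missingTagKeys is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// missingTagKeys returns every required key that has no "key:value" entry
// with a non-empty value in tags. Tags use Datadog's "key:value" form, and
// only the first colon separates key from value, so URLs survive intact.
func missingTagKeys(tags, required []string) []string {
	present := make(map[string]bool, len(tags))
	for _, t := range tags {
		key, value, ok := strings.Cut(t, ":")
		if ok && value != "" {
			present[key] = true
		}
	}
	var missing []string
	for _, r := range required {
		if !present[r] {
			missing = append(missing, r)
		}
	}
	return missing
}

func main() {
	tags := []string{
		"service:inventory-api",
		"team:${var.team}",
		"managed_by:terraform",
	}
	fmt.Println(missingTagKeys(tags, []string{"runbook_url", "team"})) // [runbook_url]
}
```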

Cursor

Cursor generated a generic Datadog SLO monitor. It did not read my existing slo-payment-service.tf. The structure was different. The naming convention was different. No runbook_url tag. The thresholds were correct because I specified them in the prompt.

I spent about 8 minutes correcting it to match my team's conventions.

Windsurf

Windsurf read my existing monitoring files and generated a monitor that was close to correct. The structure matched. The naming convention matched. But it missed the runbook_url tag and used a slightly different message template format.

Two correction turns to get it right.

Codex and Antigravity

Codex generated a monitor using the Datadog Terraform provider 2.x syntax. My project uses 3.x. The datadog_service_level_objective resource arguments changed between these versions. The output would not apply without errors.

Antigravity generated a monitor that was structurally correct but used hardcoded values instead of variables. It also missed the runbook_url tag.

Task 3 Summary

Kiro wins this task by a significant margin. Steering files are the killer feature for teams with established conventions. You write the convention once. Every future generation follows it automatically. No other tool in this list has an equivalent that works this well.

Claude Code is second. It reads your existing files and matches the pattern. But unless you record your conventions in a CLAUDE.md file, you have to remind it of requirements that are not visible in the files.

Task 4: Debugging a Silent Kubernetes Failure

The task: a deployment was running but not serving traffic. No errors in the logs. The pods were in Running state. The ALB health checks were passing. But requests to the service were returning 503.

This is the worst kind of production issue. Everything looks fine. Nothing is fine.

Here is the deployment YAML:

yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: inventory-api
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: inventory-api
      version: v2
  template:
    metadata:
      labels:
        app: inventory-api
        version: v2
    spec:
      containers:
        - name: inventory-api
          image: 123456789.dkr.ecr.us-east-1.amazonaws.com/inventory-api:v2.1.4
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5

Here is the Service YAML:

yaml

apiVersion: v1
kind: Service
metadata:
  name: inventory-api
  namespace: production
spec:
  selector:
    app: inventory-api
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
  type: ClusterIP

Here is the Ingress YAML:

yaml

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: inventory-api
  namespace: production
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  rules:
    - host: inventory-api.internal.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: inventory-api
                port:
                  number: 80

I pasted all three files into each tool and asked: why is this not routing traffic?

Claude Code

Claude Code spotted the first issue in about 8 seconds.

The selector in the Service is app: inventory-api. The matchLabels in the Deployment are app: inventory-api and version: v2, so the pods carry both labels. A Service selector only needs its own labels to be present on a pod, so this subset match should work.

Wait. Claude Code caught something I missed. It said:

text

The Service selector looks correct at first glance, but look at the
Deployment's selector.matchLabels:

  selector:
    matchLabels:
      app: inventory-api
      version: v2

The Deployment requires pods to have BOTH labels. The Service selector
only requires app: inventory-api. This is fine for routing.

But here is the actual problem: the Ingress uses target-type: ip, which
means the ALB routes directly to pod IPs. The ALB health check target
is port 80 (the Service port). But the pods listen on port 8080.

The ALB is health-checking port 80 on the pod IPs. Port 80 is not open
on the pods. The health checks are failing silently because the ALB
is configured to pass health checks even when the target is unreachable
(the default behavior with some ALB controller versions).

The fix: add this annotation to the Ingress:
  alb.ingress.kubernetes.io/healthcheck-port: "8080"

Or change the Service targetPort to match the ALB health check port.

That was the actual bug. The ALB was health-checking port 80 on the pod IPs. The pods listen on 8080. The health checks were failing but the ALB was still routing traffic because the health check failure threshold had not been reached yet. The 503s were intermittent.

Claude Code found this without me giving it any additional context. It read all three files together and traced the port mismatch through the entire routing chain.

Kiro

Kiro found the port mismatch. It took two prompts. The first prompt identified the Service selector as potentially problematic (it was not). The second prompt, after I told it the selector was fine, found the ALB health check port issue.

Windsurf

Windsurf found both issues in one pass. Its Cascade agent read all three files together and traced the routing chain correctly. Comparable to Claude Code.

Cursor

Cursor found the Service selector issue (which was not actually a problem) and stopped there. It did not trace the ALB health check port mismatch. I had to give it more context.

Codex and Antigravity

Both identified the Service selector as the problem. Neither found the ALB health check port issue. The selector was not actually the problem.

Task 4 Summary

Claude Code and Windsurf tied. Both traced the full routing chain and found the actual bug without additional prompting. Kiro found it in two prompts. Cursor, Codex, and Antigravity identified a non-issue and stopped.

The difference here is context window and reasoning quality. Claude Code and Windsurf read all three files together and reasoned about the full routing path. The other tools read the files but did not connect the dots across all three.
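The selector question that sent several tools down the wrong path comes down to one rule. A Service's equality-based selector matches any pod that carries all of the selector's key/value pairs, and extra pod labels do not matter. Here is a minimal sketch of that rule. selectorMatches is a hypothetical helper, not the Kubernetes client library, and it ignores matchExpressions.

```go
package main

import "fmt"

// selectorMatches reports whether an equality-based label selector (the
// spec.selector map on a Service) matches a pod's labels: every key in the
// selector must exist on the pod with the same value. Extra pod labels,
// such as version: v2, are ignored. An empty selector returns true here;
// note that a Service with no selector does not select pods automatically.
func selectorMatches(selector, podLabels map[string]string) bool {
	for k, want := range selector {
		got, ok := podLabels[k]
		if !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	pod := map[string]string{"app": "inventory-api", "version": "v2"}

	// The Service in Task 4 selects on app only, so the pod matches.
	fmt.Println(selectorMatches(map[string]string{"app": "inventory-api"}, pod)) // true

	// A selector pinned to a different version would not match.
	fmt.Println(selectorMatches(map[string]string{"app": "inventory-api", "version": "v3"}, pod)) // false
}
```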

Task 5: Writing an Incident Runbook

The task: generate a structured runbook from a postmortem summary.

Here is the postmortem I gave each tool:

text

Incident: INS-2847
Date: 2026-04-14 02:17 UTC
Duration: 47 minutes
Severity: P1
Service: payment-worker

Summary:
Redis connection pool exhaustion caused payment processing to fail.
The payment-worker service uses Redis for distributed locking during
payment processing. At 02:17 UTC, Redis connection pool hit the
configured maximum of 100 connections. New payment requests could not
acquire locks and failed with a 503 error.

Root cause:
A deployment at 01:55 UTC increased the payment-worker replica count
from 5 to 15 without updating the Redis connection pool size. Each
replica holds up to 10 connections. 15 replicas * 10 connections = 150
connections, exceeding the pool maximum of 100.

Resolution:
1. Scaled payment-worker back to 5 replicas at 02:31 UTC
2. Updated Redis connection pool max to 200 at 02:41 UTC
3. Scaled payment-worker back to 15 replicas at 02:44 UTC
4. Confirmed payment processing resumed at 02:44 UTC

Action items:
- Add pre-deployment check for Redis connection pool capacity
- Add CloudWatch alarm for Redis connection count > 80% of max
- Update deployment runbook to include Redis capacity check

Kiro

I had a steering file for runbook format. The file lives at .kiro/steering/runbook-template.md with inclusion: manual so it only loads when I explicitly reference it. Here is the full file:

markdown

---
inclusion: manual
---
 
# Runbook Template
 
All runbooks follow this structure:
 
## [Service Name]: [Issue Type]
 
### Detection
 
How to detect this issue. Include specific metrics, log queries,
and alert names.
 
### Triage
 
Step-by-step triage checklist. Each step should be a single action
with an expected outcome.
 
### Resolution
 
Numbered resolution steps. Each step includes:
 
- The exact command to run
- The expected output
- What to do if the output is unexpected
 
### Rollback
 
How to undo the resolution if it makes things worse.
 
### Related Runbooks
 
Links to related runbooks.
 
### Post-Incident
 
What to do after the incident is resolved.

Here is the structure it enforces, as a diagram:

flowchart TD
    A[Runbook] --> B["Detection\nMetrics, log queries, alert names"]
    A --> C["Triage\nStep-by-step checklist with expected outcomes"]
    A --> D["Resolution\nExact commands + expected output + fallback"]
    A --> E["Rollback\nHow to undo if resolution makes things worse"]
    A --> F["Related Runbooks\nLinks to connected runbooks"]
    A --> G["Post-Incident\nTickets, alarms, schedule postmortem"]

I asked Kiro to generate the runbook using this template. Here is the full output it produced.

Payment Worker: Redis Connection Pool Exhaustion

Detection

This issue is detected by one or more of the following.

Alert: payment-worker-redis-connections-high fires when Redis connection count exceeds 80% of the configured maximum.

Log pattern: search CloudWatch Logs Insights for:

sql

fields @timestamp, @message
| filter @logStream like /payment-worker/
| filter @message like /connection pool exhausted/
| sort @timestamp desc
| limit 50

Metric: redis.clients.connected in Datadog. Navigate to Metrics, search redis.clients.connected, filter by service:payment-worker.

Triage

Check current Redis connection count:

bash

redis-cli -h $REDIS_HOST -p $REDIS_PORT info clients | grep connected_clients

Expected: a number. If it equals or exceeds maxclients, the pool is exhausted.

Check current payment-worker replica count:

bash

kubectl get deployment payment-worker -n production -o jsonpath='{.spec.replicas}'

Calculate expected connection usage. Check REDIS_MAX_CONNECTIONS in the ConfigMap:

bash

kubectl get configmap payment-worker-config -n production -o yaml | grep REDIS_MAX_CONNECTIONS

Multiply replicas by REDIS_MAX_CONNECTIONS. If the result exceeds maxclients, that is the problem.

Check Redis maxclients setting:

bash

redis-cli -h $REDIS_HOST -p $REDIS_PORT config get maxclients

If replicas * connections_per_replica > maxclients, proceed to Resolution.

Resolution

Option A: Scale down replicas (immediate, temporary)

Scale payment-worker to a safe replica count:

bash

kubectl scale deployment payment-worker -n production --replicas=5

Expected: deployment.apps/payment-worker scaled

Verify pods are terminating:

bash

kubectl get pods -n production -l app=payment-worker -w

Wait until only 5 pods are in Running state.

Verify payment processing has resumed:

bash

kubectl logs -n production -l app=payment-worker --tail=20 | grep "payment processed"

Option B: Increase Redis maxclients (permanent fix)

Update the Redis maxclients setting:

bash

redis-cli -h $REDIS_HOST -p $REDIS_PORT config set maxclients 200

Expected: OK

Verify the change:

bash

redis-cli -h $REDIS_HOST -p $REDIS_PORT config get maxclients

Expected output: maxclients on one line, 200 on the next.

Scale payment-worker back to the desired replica count:

bash

kubectl scale deployment payment-worker -n production --replicas=15

Monitor Redis connection count for 5 minutes:

bash

watch -n 5 'redis-cli -h $REDIS_HOST -p $REDIS_PORT info clients | grep connected_clients'

Rollback

If Option B makes things worse:

Revert maxclients to the previous value:

bash

redis-cli -h $REDIS_HOST -p $REDIS_PORT config set maxclients 100

Scale payment-worker back to 5 replicas:

bash

kubectl scale deployment payment-worker -n production --replicas=5

Escalate to the Redis infrastructure team.

Related Runbooks

Redis Memory Pressure
Payment Worker High Error Rate
Kubernetes Deployment Rollback

Post-Incident

File a ticket to add pre-deployment Redis capacity check
Add CloudWatch alarm for Redis connection count above 80% of max
Update the deployment runbook to include Redis capacity check
Schedule a postmortem review within 48 hours

This is production-ready. I could hand this to an on-call engineer at 2am and they could follow it without asking me anything.
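You can also script the runbook's first triage step instead of reading the grep output by eye, because the connected_clients line from redis-cli info clients is easy to parse. Here is a minimal sketch. connectedClients is a hypothetical helper, and the sample input is made up in the field:value format that INFO returns.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// connectedClients extracts the connected_clients value from the text that
// `redis-cli info clients` prints. INFO output is "field:value" lines,
// possibly with \r\n endings and "# Section" header lines.
func connectedClients(info string) (int, error) {
	sc := bufio.NewScanner(strings.NewReader(info))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if v, ok := strings.CutPrefix(line, "connected_clients:"); ok {
			return strconv.Atoi(v)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("connected_clients not found")
}

func main() {
	sample := "# Clients\r\nconnected_clients:97\r\nblocked_clients:0\r\n"
	n, err := connectedClients(sample)
	fmt.Println(n, err) // 97 <nil>
}
```

Compare the result with the value from config get maxclients, as the triage step describes. strings.CutPrefix needs Go 1.20 or later.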

Claude Code

Claude Code generated an excellent runbook. The commands were correct. The structure was logical. But it invented its own format. The section names were different. The order was different. The log query syntax was different from what my team uses.

I asked it to reformat to match my template. It did so correctly. Two turns instead of one.

The content quality was identical to Kiro's output. The difference is that Kiro followed my template automatically because of the steering file.

Cursor

Cursor generated a basic runbook. It had the right sections but the commands were incomplete. The kubectl commands were missing the namespace flag. The Redis commands were missing the host and port flags. The log query was a generic CloudWatch Logs query, not the specific query format my team uses.

I spent about 10 minutes editing it.

Windsurf, Codex, Antigravity

Windsurf generated a runbook that was better than Cursor but still needed editing. The commands were mostly correct but the structure did not match my template.

Codex generated a runbook that was mostly Python-flavored. It suggested using boto3 to query CloudWatch Logs instead of the CloudWatch Logs Insights query language. That is not how my team works.

Antigravity generated a reasonable runbook but hit quota limits before completing the post-incident section.

Task 5 Summary

Kiro wins again because of steering files. When you have a template, Kiro follows it. Claude Code generates excellent content but needs a correction turn to match your format. Cursor, Windsurf, Codex, and Antigravity all require significant editing.

What I Actually Use and Why

I am going to be direct.

Kiro is the right tool for production SRE work on a team.

SRE work is not solo work. You are writing Terraform that three other engineers will review. You are writing runbooks that an on-call engineer will read at 2am. You are generating monitors that need to match the conventions your team agreed on six months ago.

Kiro is the only tool that enforces those conventions automatically. Steering files mean you write the rule once and every future generation follows it. The spec workflow means every change has a paper trail. When someone asks why a module was written a certain way, you have an answer.

The spec workflow feels slow the first week. After that, you stop noticing it. What you do notice is that you stop having conversations about why the code looks different from everything else.

Claude Code is the right tool for complex autonomous tasks.

When I need to refactor a massive codebase, debug a subtle issue across multiple files, or write a complex automation script, Claude Code on Opus 4.7 is the most capable tool available. The 1M token context window is not a marketing number. It genuinely changes what is possible. It can read your entire infrastructure repo and write code that looks like it belongs there.

The terminal-only interface is a real limitation. I use it alongside Cursor for the IDE experience.

Cursor is the right tool for daily inline editing.

Cursor is the most polished IDE experience. The autocomplete is fast and accurate. The chat is responsive. It is the right tool if you want AI assistance without changing how you work. I use it for quick fixes, small changes, and anything where I want to stay in flow.

Windsurf is the right tool if you want Cursor quality at a lower price.

Windsurf 2.0 with Devin integration is genuinely impressive. The SWE-1.6 model is strong. The pricing is more predictable than Antigravity. If your team is budget-conscious and does not need Kiro's spec workflow, Windsurf is a solid choice.

Codex is not the right tool for infrastructure work.

Codex is excellent for Python automation and data pipelines. It is not optimized for Terraform, Go, or YAML-heavy infrastructure repos. The token-based pricing since April 2026 makes costs unpredictable. Use it for what it is good at.

Antigravity is not ready for production infrastructure work.

The Gemini 3.1 Pro model is capable. Antigravity 2.0 launched at Google I/O 2026 with real improvements. But the quota interruptions, the March 2026 pricing chaos, and the stale infrastructure training data make it unreliable for production SRE work right now. Check back in six months.

My Personal Stack

I use three tools, not one.

Kiro for new features and anything that needs to follow team conventions.

Claude Code for large refactors, debugging complex issues, and anything that requires reading the whole codebase.

Cursor for daily editing, autocomplete, quick fixes, and small changes.

This is not a failure of any single tool. It is the reality of 2026. The tools are specialized. The engineers who pick one and stick with it are leaving performance on the table.

Quick Reference

flowchart LR
    A{What are you doing?} --> B["New Terraform module\non a team project"]
    A --> C["New Terraform module\nsolo"]
    A --> D[Multi-file Go refactor]
    A --> E["Kubernetes YAML\ndebugging"]
    A --> F["Incident runbook\ngeneration"]
    A --> G["Datadog monitor\nwith team conventions"]
    A --> H[Quick inline fix]
    A --> I["Python automation\nscript"]
    A --> J["Large codebase\nexploration"]
    A --> K["Budget-conscious\nteam"]
 
    B --> L[Kiro]
    C --> M[Claude Code]
    D --> M
    E --> N[Claude Code or Windsurf]
    F --> L
    G --> L
    H --> O[Cursor]
    I --> P[Codex or Claude Code]
    J --> M
    K --> Q[Windsurf]

One Last Thing

The question is not which AI IDE is best.

The question is which AI IDE is best for this specific task.

Kiro wins for structured, team-based, convention-heavy work. Claude Code wins for raw capability. Cursor wins for daily ergonomics.

Pick based on your actual workflow. Not based on benchmarks. Not based on what is trending on social media this week.

The tools that will make you faster are the ones that fit how you already work and then push you slightly beyond it.


Why I Wrote This

Every AI IDE comparison I found was written by someone who spent a weekend on a todo app and called it a production test.

This is that report. Every code block in here is real. Every error message is real. Every correction turn is real.

The Six Tools

Pricing in May 2026

Tool	Free Tier	Paid Starts At	Notes
Kiro	Yes	~$19/mo	Credit-based for agentic tasks
Cursor	Yes	$20/mo Pro, $60/mo Pro+, $200/mo Ultra	Most predictable pricing
Windsurf	Yes (25 credits/mo)	$20/mo Pro	Devin included in all paid plans
Claude Code	No	$20/mo Pro, $100/mo Max	Max needed for serious daily use
Codex	Limited	~$100 to $200/mo average	Token-based since April 2026
Antigravity	Yes (~20 req/day)	$20/mo AI Pro, $100/mo Ultra	Credit system is confusing

My Test Environment

Before the tasks, here is the repo I was working in. This matters because the quality difference between tools is almost entirely about how well they read existing context.

text

infrastructure/
  modules/
    networking/
      main.tf          # VPC, subnets, NAT gateway
      variables.tf     # 23 variables, all with descriptions
      outputs.tf       # 14 outputs
    ecs-service/
      main.tf          # ECS task definition, service, IAM roles
      variables.tf
      outputs.tf
    monitoring/
      main.tf          # Datadog monitors, SLO alerts
      variables.tf
      slo-payment-service.tf   # existing SLO monitor I use as template
  services/
    payment-worker/
      main.tf          # calls the modules above
      terraform.tfvars
  go/
    monitoring/
      collector.go     # 847 lines
      metrics.go       # 312 lines
      types.go         # 89 lines
      alerting.go      # 203 lines
      ... 8 more files

Task 1: Writing a Terraform Module

This is the most common infrastructure task I do. I run it at least twice a week.

Kiro

I had already set up a steering file before running this task. This is the step most people skip and then wonder why Kiro generates generic output.

In the Kiro panel, click Steering. Click +. Select Workspace. Name it terraform-standards.md. Write this:

markdown

---
inclusion: fileMatch
fileMatchPattern: "**/*.tf"
---
 
# Terraform Standards
 
## Naming
 
- Use var.name not var.service_name
- Resource names follow the pattern: local.name_prefix-resource_type
- All variables must have a description field
 
## Provider Versions
 
- AWS provider: ~> 5.0
- Datadog provider: ~> 3.0
- Never use deprecated resources: use aws_lb not aws_alb
 
## Module Structure
 
Follow the pattern in: #[[file:modules/networking/main.tf]]
Follow the variable style in: #[[file:modules/networking/variables.tf]]
 
## Required Tags
 
Every resource must have:
 
- Name
- Environment
- ManagedBy = "terraform"
- Team

Save it. Now open Specs. Click +. Select Feature. Type this:

text

Create a Terraform module for a new microservice at modules/ecs-worker/.
It needs:
- VPC with public and private subnets across 2 AZs
- Security groups for the ECS tasks and ALB
- ECS Fargate task definition with configurable CPU and memory
- Application Load Balancer with HTTPS listener
- CloudWatch alarms for CPU utilization, memory utilization, and ALB 5xx error rate
- IAM role for the ECS task with least-privilege permissions

Kiro does not start by writing code. It generates requirements.md first. Here is what it produced:

markdown

## Requirement 1: Network Isolation
 
**User Story:** As an infrastructure engineer, I want the ECS tasks
to run in private subnets so that they are not directly reachable
from the internet.
 
### Acceptance Criteria
 
- WHEN the module is applied THE SYSTEM SHALL create a VPC with
  CIDR 10.0.0.0/16 unless overridden by var.vpc_cidr.
- THE SYSTEM SHALL create two private subnets in separate AZs
  with CIDRs derived from the VPC CIDR.
- THE SYSTEM SHALL create two public subnets for the ALB only.
- IF var.enable_nat_gateway is true THE SYSTEM SHALL create one
  NAT gateway per AZ for high availability.
- THE SYSTEM SHALL output the VPC ID, private subnet IDs, and
  public subnet IDs for use by other modules.
 
## Requirement 2: Compute
 
**User Story:** As an infrastructure engineer, I want the ECS task
definition to be configurable so that different services can use
different resource allocations.
 
### Acceptance Criteria
 
- THE SYSTEM SHALL create an ECS Fargate task definition with
  var.cpu and var.memory as configurable inputs.
- THE SYSTEM SHALL create an IAM execution role with permissions
  to pull from ECR and write to CloudWatch Logs.
- THE SYSTEM SHALL create a task role with no permissions by
  default, allowing callers to attach additional policies.
- WHEN var.container_port is set THE SYSTEM SHALL expose that
  port in the container definition.

I read through it. The VPC CIDR defaulting to 10.0.0.0/16 was wrong for my environment, so I edited that line to require var.vpc_cidr with no default. The caller must provide it. Then I clicked Approve.

Kiro generated design.md. It included the full module file structure, every variable name, every output name, and the IAM policy document. I approved it.

Then tasks.md:

markdown

- [ ] Task 1: Create modules/ecs-worker/variables.tf with all input variables
- [ ] Task 2: Create modules/ecs-worker/main.tf - VPC and networking resources
- [ ] Task 3: Create modules/ecs-worker/main.tf - Security groups
- [ ] Task 4: Create modules/ecs-worker/main.tf - ECS task definition and IAM roles
- [ ] Task 5: Create modules/ecs-worker/main.tf - ALB and target group
- [ ] Task 6: Create modules/ecs-worker/main.tf - CloudWatch alarms
- [ ] Task 7: Create modules/ecs-worker/outputs.tf
- [ ] Task 8: Create modules/ecs-worker/README.md

I clicked Run All Tasks. Here is the variables.tf it produced:

hcl

variable "name" {
  description = "Name of the ECS worker service. Used as a prefix for all resources."
  type        = string
}
 
variable "vpc_cidr" {
  description = "CIDR block for the VPC. Must not overlap with existing VPCs in the account."
  type        = string
}
 
variable "environment" {
  description = "Deployment environment. Used for tagging and resource naming."
  type        = string
}
 
variable "team" {
  description = "Team that owns this service. Used for tagging."
  type        = string
}
 
variable "cpu" {
  description = "CPU units for the ECS Fargate task. Valid values: 256, 512, 1024, 2048, 4096."
  type        = number
  default     = 512
}
 
variable "memory" {
  description = "Memory in MiB for the ECS Fargate task. Must be compatible with the cpu value."
  type        = number
  default     = 1024
}
 
variable "container_port" {
  description = "Port the container listens on."
  type        = number
  default     = 8080
}
 
variable "container_image" {
  description = "Docker image URI for the ECS task. Include the tag."
  type        = string
}
 
variable "enable_nat_gateway" {
  description = "Whether to create NAT gateways for private subnet internet access."
  type        = bool
  default     = true
}
 
variable "alarm_cpu_threshold" {
  description = "CPU utilization percentage that triggers the CloudWatch alarm."
  type        = number
  default     = 80
}
 
variable "alarm_memory_threshold" {
  description = "Memory utilization percentage that triggers the CloudWatch alarm."
  type        = number
  default     = 80
}
 
variable "alarm_5xx_threshold" {
  description = "Number of ALB 5xx errors per minute that triggers the CloudWatch alarm."
  type        = number
  default     = 10
}
 
variable "alarm_actions" {
  description = "List of ARNs to notify when an alarm fires. Typically an SNS topic ARN."
  type        = list(string)
  default     = []
}

Notice: var.name not var.service_name. Every variable has a description. The steering file worked.

Here is the CloudWatch alarm section from main.tf:

hcl

resource "aws_cloudwatch_metric_alarm" "cpu_high" {
  alarm_name          = "${local.name_prefix}-cpu-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/ECS"
  period              = 60
  statistic           = "Average"
  threshold           = var.alarm_cpu_threshold
  alarm_description   = "ECS CPU utilization is above ${var.alarm_cpu_threshold}%"
  alarm_actions       = var.alarm_actions
  ok_actions          = var.alarm_actions
 
  dimensions = {
    ClusterName = aws_ecs_cluster.this.name
    ServiceName = aws_ecs_service.this.name
  }
 
  tags = local.common_tags
}
 
resource "aws_cloudwatch_metric_alarm" "memory_high" {
  alarm_name          = "${local.name_prefix}-memory-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "MemoryUtilization"
  namespace           = "AWS/ECS"
  period              = 60
  statistic           = "Average"
  threshold           = var.alarm_memory_threshold
  alarm_description   = "ECS memory utilization is above ${var.alarm_memory_threshold}%"
  alarm_actions       = var.alarm_actions
  ok_actions          = var.alarm_actions
 
  dimensions = {
    ClusterName = aws_ecs_cluster.this.name
    ServiceName = aws_ecs_service.this.name
  }
 
  tags = local.common_tags
}
 
resource "aws_cloudwatch_metric_alarm" "alb_5xx_high" {
  alarm_name          = "${local.name_prefix}-alb-5xx-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "HTTPCode_Target_5XX_Count"
  namespace           = "AWS/ApplicationELB"
  period              = 60
  statistic           = "Sum"
  threshold           = var.alarm_5xx_threshold
  alarm_description   = "ALB 5xx error count exceeded ${var.alarm_5xx_threshold} in the last minute"
  alarm_actions       = var.alarm_actions
  ok_actions          = var.alarm_actions
  treat_missing_data  = "notBreaching"
 
  dimensions = {
    LoadBalancer = aws_lb.this.arn_suffix
    TargetGroup  = aws_lb_target_group.this.arn_suffix
  }
 
  tags = local.common_tags
}

I ran terraform validate on the output. Zero errors. I ran terraform plan against a dev account. It planned 34 resources to add, all correct.

Total time from opening Kiro to a passing terraform plan: 11 minutes. 8 of those were the spec phase.
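Those alarms use period = 60 and evaluation_periods = 2 with GreaterThanThreshold. That means CPU has to stay above the threshold for two consecutive one-minute datapoints before the alarm fires. Here is a rough Go model of that rule. It is a simplification, not CloudWatch's actual evaluation logic. It assumes datapoints_to_alarm is unset, so it defaults to evaluation_periods. It also treats missing data as not breaching, which only the 5xx alarm sets explicitly. shouldAlarm is a hypothetical name.

```go
package main

import (
	"fmt"
	"math"
)

// missing marks a period with no datapoint (modeled as NaN).
var missing = math.NaN()

// shouldAlarm is a hypothetical, simplified model of CloudWatch evaluation.
// It reports whether the most recent evaluationPeriods datapoints are all
// strictly greater than threshold (GreaterThanThreshold). Assumptions:
// datapoints_to_alarm == evaluation_periods, and missing data is treated
// as notBreaching.
func shouldAlarm(datapoints []float64, threshold float64, evaluationPeriods int) bool {
	if evaluationPeriods <= 0 || len(datapoints) < evaluationPeriods {
		return false // not enough data to evaluate
	}
	window := datapoints[len(datapoints)-evaluationPeriods:]
	for _, v := range window {
		// NaN > threshold is always false, so a missing point never breaches.
		if !(v > threshold) {
			return false
		}
	}
	return true
}

func main() {
	cpu := []float64{55, 82, 79, 85, 91} // one-minute averages, oldest first
	fmt.Println(shouldAlarm(cpu, 80, 2))                    // true: the last two are 85 and 91
	fmt.Println(shouldAlarm([]float64{85, missing}, 80, 2)) // false: the missing point does not breach
}
```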

Cursor

I opened Cursor, pressed Cmd+K, and typed the same prompt. No steering files. No spec phase. Cursor started writing immediately.

The code appeared in about 35 seconds. Here is what the variables file looked like:

hcl

variable "service_name" {
  description = "Name of the service"
  type        = string
}
 
variable "environment" {
  type = string
}
 
variable "cpu" {
  type    = number
  default = 256
}
 
variable "memory" {
  type    = number
  default = 512
}
 
variable "tags" {
  type    = map(string)
  default = {}
}

Three problems immediately visible.

First, it used var.service_name not var.name. My existing modules use var.name. Every reference to this variable in the calling module would need to change.

Second, the environment variable has no description. My team's convention requires descriptions on every variable. The PR would fail review.

Third, it added a tags variable I did not ask for. My modules use a local.common_tags block that merges required tags automatically. A separate tags variable breaks that pattern.
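The description rule is easy to check before a PR is ever opened. Here is a naive sketch of a pre-commit check in Go. It assumes each block starts with variable "name" { on its own line and that any braces inside strings open and close on the same line. A real check should use a proper HCL parser such as github.com/hashicorp/hcl/v2. missingDescriptions is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// variableStart matches the opening line of a Terraform variable block.
var variableStart = regexp.MustCompile(`^\s*variable\s+"([^"]+)"\s*\{`)

// descriptionAttr matches a description attribute at the top level of a block.
var descriptionAttr = regexp.MustCompile(`^\s*description\s*=`)

// missingDescriptions (hypothetical helper) returns the names of variable
// blocks with no description. It tracks brace depth line by line, so it
// only handles simple, conventionally formatted HCL.
func missingDescriptions(src string) []string {
	var missing []string
	current := "" // name of the variable block we are inside, if any
	hasDesc := false
	depth := 0

	for _, line := range strings.Split(src, "\n") {
		if current == "" {
			m := variableStart.FindStringSubmatch(line)
			if m == nil {
				continue // outside any variable block
			}
			current, hasDesc, depth = m[1], false, 0
		} else if depth == 1 && descriptionAttr.MatchString(line) {
			hasDesc = true
		}
		depth += strings.Count(line, "{") - strings.Count(line, "}")
		if depth <= 0 { // block closed
			if !hasDesc {
				missing = append(missing, current)
			}
			current = ""
		}
	}
	return missing
}

func main() {
	// Abbreviated from the Cursor output above.
	src := `variable "service_name" {
  description = "Name of the service"
  type        = string
}

variable "environment" {
  type = string
}

variable "tags" {
  type    = map(string)
  default = {}
}
`
	fmt.Println(missingDescriptions(src)) // [environment tags]
}
```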

The CloudWatch alarm section had more problems:

hcl

resource "aws_cloudwatch_metric_alarm" "cpu_high" {
  alarm_name          = "${var.service_name}-cpu-high"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = "2"
  metric_name         = "CPUUtilization"
  namespace           = "AWS/ECS"
  period              = "300"
  statistic           = "Average"
  threshold           = "80"
  alarm_description   = "CPU utilization is high"
  alarm_actions       = []
}

The evaluation_periods, period, and threshold values are quoted strings, even though the AWS provider defines these attributes as numbers. Terraform converts numeric strings like "2" to numbers automatically, so this still passes terraform plan. But quoted numbers are an old Terraform 0.11 habit, and they look out of place next to the rest of the module.

The bigger problem is that alarm_actions is hardcoded to an empty list. There is no variable for it. If you want to wire this to an SNS topic, you have to edit the generated code directly.

I fixed all of this manually. It took about 12 minutes, so the total was under a minute of generation plus 12 minutes of fixing. Longer than Kiro, with no documentation.

Windsurf

Windsurf's Cascade agent indexed my project automatically when I opened it. I did not configure anything. I opened a new Cascade conversation and typed the same prompt.

Windsurf read my existing modules. The variable names matched. It used var.name. It used aws_lb not aws_alb. The module structure matched my existing pattern.

But the CloudWatch alarms had hardcoded thresholds:

hcl

resource "aws_cloudwatch_metric_alarm" "cpu_high" {
  alarm_name          = "${local.name_prefix}-cpu-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/ECS"
  period              = 60
  statistic           = "Average"
  threshold           = 80
  alarm_description   = "ECS CPU utilization is above 80%"
  alarm_actions       = []
}

The threshold is hardcoded to 80, and alarm_actions is hardcoded to an empty list. Both should be variables. I asked Windsurf to fix it.

It fixed it in one turn. The corrected version used var.alarm_cpu_threshold and var.alarm_actions. Correct.

Total time: about 6 minutes including the correction turn. Better than Cursor. Not as clean as Kiro on the first pass.

Claude Code

Claude Code runs in the terminal. There is no IDE. I opened a terminal and ran:

bash

claude

I typed the same prompt. Claude Code printed this before writing anything:

text

I'll read your existing Terraform modules first to understand your conventions.

Reading modules/networking/main.tf...
Reading modules/networking/variables.tf...
Reading modules/networking/outputs.tf...
Reading modules/ecs-service/main.tf...
Reading modules/ecs-service/variables.tf...
Reading modules/monitoring/main.tf...
Reading modules/monitoring/slo-payment-service.tf...
Reading services/payment-worker/main.tf...

I can see your conventions:
- Variable naming: var.name (not var.service_name)
- AWS provider: ~> 5.0 (using aws_lb not aws_alb)
- All variables have descriptions
- Common tags via local.common_tags
- Alarm actions via a list variable

Creating modules/ecs-worker/...

The Terraform it produced was the best of all six tools. Here is the IAM section, which is where most tools get lazy:

hcl

data "aws_iam_policy_document" "ecs_execution_assume_role" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]
 
    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}
 
data "aws_iam_policy_document" "ecs_execution" {
  statement {
    effect = "Allow"
    actions = [
      "ecr:GetAuthorizationToken",
      "ecr:BatchCheckLayerAvailability",
      "ecr:GetDownloadUrlForLayer",
      "ecr:BatchGetImage",
    ]
    resources = ["*"]
  }
 
  statement {
    effect = "Allow"
    actions = [
      "logs:CreateLogStream",
      "logs:PutLogEvents",
    ]
    resources = [
      "${aws_cloudwatch_log_group.this.arn}:*",
    ]
  }
}
 
resource "aws_iam_role" "ecs_execution" {
  name               = "${local.name_prefix}-ecs-execution"
  assume_role_policy = data.aws_iam_policy_document.ecs_execution_assume_role.json
  tags               = local.common_tags
}
 
resource "aws_iam_role_policy" "ecs_execution" {
  name   = "${local.name_prefix}-ecs-execution"
  role   = aws_iam_role.ecs_execution.id
  policy = data.aws_iam_policy_document.ecs_execution.json
}
 
resource "aws_iam_role" "ecs_task" {
  name               = "${local.name_prefix}-ecs-task"
  assume_role_policy = data.aws_iam_policy_document.ecs_execution_assume_role.json
  tags               = local.common_tags
}

Notice the CloudWatch Logs permission is scoped to the specific log group ARN, not *. That is least-privilege. Kiro also did this. Cursor used * for the logs resource.
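That scoping can also be checked mechanically. Here is a minimal sketch. It assumes you have the rendered policy JSON, for example the json attribute of an aws_iam_policy_document data source, and it flags logs: actions granted on Resource "*". It deliberately ignores ecr:GetAuthorizationToken, which does not support resource-level permissions and has to use "*". The function names are hypothetical. The sketch only handles the Statement-as-array shape that aws_iam_policy_document renders.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// stringOrList decodes IAM fields that may be a single string or a list.
type stringOrList []string

func (s *stringOrList) UnmarshalJSON(b []byte) error {
	var one string
	if err := json.Unmarshal(b, &one); err == nil {
		*s = []string{one}
		return nil
	}
	var many []string
	if err := json.Unmarshal(b, &many); err != nil {
		return err
	}
	*s = many
	return nil
}

type statement struct {
	Effect   string       `json:"Effect"`
	Action   stringOrList `json:"Action"`
	Resource stringOrList `json:"Resource"`
}

type policyDocument struct {
	Statement []statement `json:"Statement"`
}

// wildcardLogsActions (hypothetical helper) returns every logs: action that
// an Allow statement grants on Resource "*".
func wildcardLogsActions(policyJSON string) ([]string, error) {
	var doc policyDocument
	if err := json.Unmarshal([]byte(policyJSON), &doc); err != nil {
		return nil, err
	}
	var flagged []string
	for _, st := range doc.Statement {
		if st.Effect != "Allow" || !contains(st.Resource, "*") {
			continue
		}
		for _, a := range st.Action {
			if strings.HasPrefix(a, "logs:") {
				flagged = append(flagged, a)
			}
		}
	}
	return flagged, nil
}

func contains(list []string, want string) bool {
	for _, v := range list {
		if v == want {
			return true
		}
	}
	return false
}

func main() {
	policy := `{
	  "Version": "2012-10-17",
	  "Statement": [
	    {"Effect": "Allow", "Action": "ecr:GetAuthorizationToken", "Resource": "*"},
	    {"Effect": "Allow", "Action": ["logs:CreateLogStream", "logs:PutLogEvents"], "Resource": "*"}
	  ]
	}`
	flagged, err := wildcardLogsActions(policy)
	if err != nil {
		panic(err)
	}
	fmt.Println(flagged) // [logs:CreateLogStream logs:PutLogEvents]
}
```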

The only problem: no IDE. I was looking at diffs in the terminal. To review the full output I had to open the files in a separate editor. That friction is real.

terraform validate: zero errors. terraform plan: 34 resources, all correct.

Total time: 4 minutes.

OpenAI Codex

I used the Codex CLI:

bash

codex "Create a Terraform module at modules/ecs-worker/ for a new microservice.
It needs a VPC, security groups, ECS Fargate task definition, ALB, and
CloudWatch alarms for CPU, memory, and ALB 5xx errors."

Codex generated the module. Here is the ALB resource it produced:

hcl

resource "aws_alb" "main" {
  name               = "${var.service_name}-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = var.public_subnet_ids
 
  tags = {
    Name = "${var.service_name}-alb"
  }
}

aws_alb is deprecated. The correct resource in AWS provider 5.x is aws_lb. This is not a breaking change but it generates a deprecation warning on every plan:

text

Warning: Argument is deprecated
  with aws_alb.main,
  on main.tf line 1, in resource "aws_alb" "main":
  1: resource "aws_alb" "main" {

Use aws_lb instead.

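The ECS task definition had more serious problems. There is no task_role_arn, so the task gets no IAM role of its own. The container definition also has no logConfiguration, so nothing sends the container's output to CloudWatch Logs: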

hcl

resource "aws_ecs_task_definition" "main" {
  family                   = var.service_name
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = var.cpu
  memory                   = var.memory
  execution_role_arn       = aws_iam_role.ecs_execution.arn
 
  container_definitions = jsonencode([
    {
      name      = var.service_name
      image     = var.container_image
      cpu       = var.cpu
      memory    = var.memory
      essential = true
      portMappings = [
        {
          containerPort = var.container_port
          hostPort      = var.container_port
          protocol      = "tcp"
        }
      ]
    }
  ])
}

Also: var.service_name again. Codex did not read my existing modules.

Codex is fast. The CLI is clean. But it is clearly optimized for Python. Its Terraform knowledge is about 18 months behind.

Google Antigravity

I used Antigravity 2.0's CLI, which launched at Google I/O 2026:

bash

antigravity "Create a Terraform module at modules/ecs-worker/ for a new microservice.
It needs a VPC, security groups, ECS Fargate task definition, ALB, and
CloudWatch alarms."

Antigravity started generating. Then this appeared:

text

Rate limit reached. You have used 18/20 of your daily requests.
Generation paused. Resume tomorrow or upgrade to AI Pro.

I was on the free tier, which allows 20 requests per day, and I had already used 18 of them testing other things. I upgraded to AI Pro ($20/month) and tried again.

This time it completed. Here is the provider block it generated:

hcl

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }
}

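That pins the AWS provider to 4.x, while my other modules require ~> 5.0, so terraform init would fail to find a provider version that satisfies both. I gave up on this task for Antigravity. The combination of quota interruptions and stale provider knowledge makes it unreliable for infrastructure work right now.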
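Why a ~> 4.0 constraint in a single module is a problem: in the two-part form used here, ~> X.Y allows X.Y and anything newer up to, but not including, (X+1).0. Terraform then has to pick one release that every module's constraint allows. This is a simplified model. It ignores the three-part form (~> X.Y.Z), prerelease versions, and other operators, and the release list is made up for illustration.

```go
package main

import "fmt"

// version is a simplified major.minor.patch release number.
type version struct{ major, minor, patch int }

// less reports whether a sorts before b.
func less(a, b version) bool {
	if a.major != b.major {
		return a.major < b.major
	}
	if a.minor != b.minor {
		return a.minor < b.minor
	}
	return a.patch < b.patch
}

// pessimistic models the two-part constraint "~> MAJOR.MINOR",
// i.e. >= MAJOR.MINOR.0 and < (MAJOR+1).0.0.
type pessimistic struct{ major, minor int }

func (c pessimistic) allows(v version) bool {
	lower := version{c.major, c.minor, 0}
	upper := version{c.major + 1, 0, 0}
	return !less(v, lower) && less(v, upper)
}

// newestSatisfying returns the newest candidate that every constraint allows.
// candidates must be sorted oldest to newest.
func newestSatisfying(candidates []version, constraints ...pessimistic) (version, bool) {
	for i := len(candidates) - 1; i >= 0; i-- {
		v := candidates[i]
		ok := true
		for _, c := range constraints {
			if !c.allows(v) {
				ok = false
				break
			}
		}
		if ok {
			return v, true
		}
	}
	return version{}, false
}

func main() {
	// Made-up release list, sorted oldest to newest.
	releases := []version{{4, 67, 0}, {5, 0, 0}, {5, 82, 2}}
	generated := pessimistic{4, 0} // "~> 4.0" from the generated module
	repo := pessimistic{5, 0}      // "~> 5.0" used by the other modules

	if v, ok := newestSatisfying(releases, repo); ok {
		fmt.Printf("repo alone: %d.%d.%d\n", v.major, v.minor, v.patch) // repo alone: 5.82.2
	}
	_, ok := newestSatisfying(releases, generated, repo)
	fmt.Println("both constraints satisfiable:", ok) // both constraints satisfiable: false
}
```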

Task 1 Summary

Task 2: Refactoring a Go Monitoring Script Across 12 Files

The task: change the signature of getServiceMetrics from this:

func getServiceMetrics(name string) (*ServiceMetrics, error)

to this:

func getServiceMetrics(ctx context.Context, name string, opts MetricOptions) (*ServiceMetrics, error)

This is a real refactor I did last month. I ran it through all six tools to see which ones could handle it without missing files.

Kiro

I used a bugfix spec for this. I clicked Specs, clicked +, selected Bug, and typed this:

text

The getServiceMetrics function in go/monitoring/metrics.go needs a new signature.

Current:
  func getServiceMetrics(name string) (*ServiceMetrics, error)

New:
  func getServiceMetrics(ctx context.Context, name string, opts MetricOptions) (*ServiceMetrics, error)

MetricOptions is a new struct that needs to be defined in go/monitoring/types.go.
It should have these fields:
  - Timeout time.Duration (default 30s)
  - IncludeHistogram bool (default false)
  - Tags map[string]string (default empty)

All 11 callers need to be updated. Where no context is available, use context.Background().
Where no options are needed, use MetricOptions{} as the zero value.

Kiro generated a bugfix spec that listed every file:

markdown

## Bug Condition
 
The function getServiceMetrics does not accept a context or options,
making it impossible to add timeouts or pass metadata to the metrics
collection layer.
 
## Files Requiring Changes
 
1. go/monitoring/types.go - Add MetricOptions struct
2. go/monitoring/metrics.go - Update function signature
3. go/monitoring/collector.go - Update 3 call sites
4. go/monitoring/alerting.go - Update 2 call sites
5. go/monitoring/reporter.go - Update 1 call site
6. go/monitoring/aggregator.go - Update 2 call sites
7. go/monitoring/exporter.go - Update 1 call site
8. go/monitoring/health.go - Update 1 call site
9. go/monitoring/dashboard.go - Update 1 call site
10. go/monitoring/scheduler.go - Update 1 call site
11. go/monitoring/cleanup.go - Update 1 call site
12. go/monitoring/metrics_test.go - Update test call sites

It ran the tasks one by one. Here is the MetricOptions struct it added to types.go:

// MetricOptions configures the behavior of getServiceMetrics.
type MetricOptions struct {
	// Timeout is the maximum duration to wait for metrics collection.
	// Defaults to 30 seconds if zero.
	Timeout time.Duration
 
	// IncludeHistogram controls whether histogram data is included in the response.
	// Histogram collection is expensive. Only enable when needed.
	IncludeHistogram bool
 
	// Tags are additional key-value pairs attached to the collected metrics.
	// These are forwarded to the metrics backend as dimensions.
	Tags map[string]string
}
 
// DefaultMetricOptions returns a MetricOptions with sensible defaults.
func DefaultMetricOptions() MetricOptions {
	return MetricOptions{
		Timeout:          30 * time.Second,
		IncludeHistogram: false,
		Tags:             make(map[string]string),
	}
}

It added a DefaultMetricOptions() constructor without being asked. That is the kind of thing a senior engineer adds automatically.

Here is how it updated a call site in collector.go:

// Before
metrics, err := getServiceMetrics(svc.Name)
if err != nil {
    log.Printf("failed to get metrics for %s: %v", svc.Name, err)
    continue
}
 
// After
ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
defer cancel()
 
metrics, err := getServiceMetrics(ctx, svc.Name, MetricOptions{
    Tags: map[string]string{
        "collector": "automated",
        "service":   svc.Name,
    },
})
if err != nil {
    log.Printf("failed to get metrics for %s: %v", svc.Name, err)
    continue
}

I ran go build ./... and got zero errors. I ran go test ./... and all tests passed.

Total time: 6 minutes including the spec phase.

Claude Code

bash

claude "Update getServiceMetrics in go/monitoring/metrics.go to accept
ctx context.Context, name string, opts MetricOptions.
Define MetricOptions in go/monitoring/types.go with fields:
  Timeout time.Duration
  IncludeHistogram bool
  Tags map[string]string
Add a DefaultMetricOptions() constructor.
Update all 11 callers. Use existing ctx where available, context.Background() where not.
Update tests in metrics_test.go."

Claude Code read all 12 files before making any changes. It printed a summary of what it found:

text

Found 11 call sites across 10 files:
- collector.go: 3 calls (has ctx parameter available)
- alerting.go: 2 calls (has ctx parameter available)
- reporter.go: 1 call (no ctx, will use context.Background())
- aggregator.go: 2 calls (has ctx parameter available)
- exporter.go: 1 call (no ctx, will use context.Background())
- health.go: 1 call (has ctx parameter available)
- dashboard.go: 1 call (no ctx, will use context.Background())
- scheduler.go: 1 call (has ctx parameter available)
- cleanup.go: 1 call (no ctx, will use context.Background())
- metrics_test.go: 4 test calls (will use context.Background())

Making changes...

The output was identical in quality to Kiro's: the same DefaultMetricOptions() constructor, the same context propagation logic, and the same test updates.

go build ./...: zero errors. go test ./...: all passed.

Total time: 3 minutes. Faster than Kiro because there was no spec phase.

Cursor

I opened Composer with Cmd+Shift+I and typed the same prompt.

Cursor updated 9 of 12 files. It missed dashboard.go, cleanup.go, and scheduler.go. These three files are in the same directory as the others. Cursor just did not index them.

I pointed Cursor to the missing files explicitly:

text

You missed these files:
- go/monitoring/dashboard.go
- go/monitoring/cleanup.go
- go/monitoring/scheduler.go
Please update the getServiceMetrics call sites in these files too.

I fixed that manually.

go build ./...: zero errors after manual fix. Total time: 9 minutes.

Windsurf

Windsurf missed 2 files: cleanup.go and scheduler.go. Same problem as Cursor. I pointed it to the missing files and it updated them correctly, including proper context propagation.

Total time: 7 minutes.

Codex

Codex updated metrics.go with the new signature. It updated collector.go with 2 of 3 call sites. It missed the third call site in collector.go and all other files.

I asked it to find the remaining call sites. It found 4 more. I asked again. It found 2 more. After 4 rounds of prompting it had updated 8 of 11 files. I gave up and did the remaining 3 manually.

Codex does not handle large multi-file refactors well. It loses track of what it has already changed.
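Once metrics.go changes, go build rejects any call site that still uses the old one-argument signature, so missed files do eventually surface as compile errors. What go build does not give you is the inventory up front, the list Claude Code printed before it started. One rough way to build that list yourself, before handing the refactor to any tool, is to walk the AST with the standard go/parser and go/ast packages. This sketch parses source from strings for brevity. To run it on real files, you would call parser.ParseFile with a filename and nil source. It only matches direct calls to an identifier with the given name, and findCalls is a hypothetical name.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// callSite records where a call to the target function appears.
type callSite struct {
	File string
	Line int
	Args int
}

// findCalls (hypothetical helper) parses one Go source file and returns
// every direct call to fnName, with its line and argument count.
func findCalls(filename, src, fnName string) ([]callSite, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	var sites []callSite
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true
		}
		if ident, ok := call.Fun.(*ast.Ident); ok && ident.Name == fnName {
			pos := fset.Position(call.Pos())
			sites = append(sites, callSite{File: filename, Line: pos.Line, Args: len(call.Args)})
		}
		return true
	})
	return sites, nil
}

func main() {
	src := `package monitoring

func collect(names []string) {
	for _, n := range names {
		getServiceMetrics(n)
	}
}
`
	sites, err := findCalls("collector.go", src, "getServiceMetrics")
	if err != nil {
		panic(err)
	}
	for _, s := range sites {
		// Args == 1 means the call still uses the old signature.
		fmt.Printf("%s:%d args=%d\n", s.File, s.Line, s.Args) // collector.go:5 args=1
	}
}
```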

Antigravity

Antigravity hit a quota limit after updating 3 files. I had already used most of my daily requests. I stopped testing it on this task.

Task 2 Summary

Task 3: Generating a Datadog SLO Monitor

This task is specifically about whether a tool can follow your team's conventions without you repeating them every time.

Kiro

I already had the terraform-standards.md steering file from Task 1. I added one more steering file specifically for monitoring:

markdown

---
inclusion: fileMatch
fileMatchPattern: "modules/monitoring/**/*.tf"
---
 
# Monitoring Module Standards
 
## SLO Monitor Structure
 
All SLO monitors use the datadog_service_level_objective resource type.
Follow the exact structure in: #[[file:modules/monitoring/slo-payment-service.tf]]
 
## Required Fields
 
Every SLO monitor must have:
 
- name following the pattern: "SLO - [Service Name] - [Metric]"
- description explaining what the SLO measures
- thresholds block with timeframe = "30d"
- warning threshold at 99.7
- critical threshold at 99.5
- tags including "runbook_url:[url]" and "team:[team_name]"
 
## Naming Convention
 
The Terraform resource name follows: slo_[service_name]_[metric]
Example: slo_payment_service_availability

I asked Kiro to generate the monitor for a new service called inventory-api.

Here is what it produced:

hcl

resource "datadog_service_level_objective" "slo_inventory_api_availability" {
  name        = "SLO - Inventory API - Availability"
  type        = "metric"
  description = "Measures the availability of the Inventory API service. Availability is defined as the percentage of requests that return a non-5xx response."
 
  query {
    numerator   = "sum:trace.web.request.hits{service:inventory-api,!http.status_class:5xx}.as_count()"
    denominator = "sum:trace.web.request.hits{service:inventory-api}.as_count()"
  }
 
  thresholds {
    timeframe = "30d"
    target    = 99.9
    warning   = 99.7
  }
 
  tags = [
    "service:inventory-api",
    "team:${var.team}",
    "environment:${var.environment}",
    "runbook_url:https://runbooks.internal/inventory-api/availability",
    "managed_by:terraform",
  ]
}
 
resource "datadog_monitor" "slo_inventory_api_availability_alert" {
  name    = "SLO Alert - Inventory API - Availability"
  type    = "slo alert"
  message = <<-EOT
    {{#is_alert}}
    Inventory API availability SLO is below 99.5% over the last 30 days.
 
    Current burn rate: {{value}}
 
    Runbook: https://runbooks.internal/inventory-api/availability
 
    @pagerduty-inventory-api
    {{/is_alert}}
 
    {{#is_warning}}
    Inventory API availability SLO is below 99.7% over the last 30 days.
    @slack-inventory-api-alerts
    {{/is_warning}}
  EOT
 
  query = "error_budget(\"${datadog_service_level_objective.slo_inventory_api_availability.id}\").over(\"30d\") > 1"
 
  thresholds = {
    critical = 1
    warning  = 0.5
  }
 
  tags = [
    "service:inventory-api",
    "team:${var.team}",
    "environment:${var.environment}",
    "managed_by:terraform",
  ]
}

This is exactly what I would have written manually. The runbook_url tag is there. The thresholds match. The naming convention matches. The message template matches my existing monitors.

I did not tell Kiro any of this in the prompt. The steering file told it.
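The arithmetic behind the 99.9% target is worth keeping next to the monitor. Over a 30-day window, the error budget is 0.1%. In time terms that is 0.001 × 30 × 24 × 60 = 43.2 minutes. For a request-based SLO like this one, where the numerator and denominator queries supply good and total counts, it is 1 bad request in 1,000. Here is a small sketch with hypothetical function names and made-up request counts:

```go
package main

import (
	"fmt"
	"math"
)

// sli returns the fraction of good requests: numerator / denominator.
func sli(good, total float64) float64 {
	if total == 0 {
		return 1 // no traffic in the window: treated as meeting the objective
	}
	return good / total
}

// budgetConsumed returns the fraction of the error budget already spent.
// 1.0 means the budget is exactly used up; above 1.0 the SLO is breached.
func budgetConsumed(good, total, target float64) float64 {
	bad := total - good
	allowedBad := (1 - target) * total
	if allowedBad == 0 {
		if bad > 0 {
			return math.Inf(1) // a 100% target has no budget to spend
		}
		return 0
	}
	return bad / allowedBad
}

// budgetMinutes converts a target into allowed bad minutes per window.
func budgetMinutes(target float64, windowDays int) float64 {
	return (1 - target) * float64(windowDays) * 24 * 60
}

func main() {
	fmt.Printf("%.1f\n", budgetMinutes(0.999, 30)) // 43.2

	// Made-up counts for illustration.
	good, total := 9_993_000.0, 10_000_000.0
	fmt.Printf("SLI %.4f, budget used %.0f%%\n",
		sli(good, total), 100*budgetConsumed(good, total, 0.999)) // SLI 0.9993, budget used 70%
}
```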

Claude Code

bash

claude "Generate a Datadog SLO monitor for inventory-api.
99.9% availability target, 30-day rolling window.
Alert at 99.5%, warn at 99.7%.
Follow the same structure as modules/monitoring/slo-payment-service.tf."

Claude Code read slo-payment-service.tf and generated a correct monitor. The structure matched. The thresholds were correct.

But it did not include the runbook_url tag. That requirement lives in my Kiro steering file, which Claude Code does not read. Claude Code only carries conventions across sessions through its own memory files, such as CLAUDE.md.

I told it to add the runbook_url tag. It added it. One correction turn.

The output after correction was identical to Kiro's output. But I had to remember to ask for the runbook_url. With Kiro, I never have to remember. The steering file remembers for me.

Cursor

I spent about 8 minutes correcting it to match my team's conventions.

Windsurf

Two correction turns to get it right.

Codex and Antigravity

Antigravity generated a monitor that was structurally correct but used hardcoded values instead of variables. It also missed the runbook_url tag.

Task 3 Summary

Claude Code is second. It reads your existing files and matches the pattern. But any requirement that is not visible in those files or written into its CLAUDE.md memory has to be restated in the prompt.

Task 4: Debugging a Silent Kubernetes Failure

This is the worst kind of production issue. Everything looks fine. Nothing is fine.

Here is the deployment YAML:

yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: inventory-api
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: inventory-api
      version: v2
  template:
    metadata:
      labels:
        app: inventory-api
        version: v2
    spec:
      containers:
        - name: inventory-api
          image: 123456789.dkr.ecr.us-east-1.amazonaws.com/inventory-api:v2.1.4
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5

Here is the Service YAML:

yaml

apiVersion: v1
kind: Service
metadata:
  name: inventory-api
  namespace: production
spec:
  selector:
    app: inventory-api
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
  type: ClusterIP

Here is the Ingress YAML:

yaml

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: inventory-api
  namespace: production
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  rules:
    - host: inventory-api.internal.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: inventory-api
                port:
                  number: 80

I pasted all three files into each tool and asked: why is this not routing traffic?

Claude Code

Claude Code spotted the first issue in about 8 seconds, and it caught something I had missed. It said:

text

The Service selector looks correct at first glance, but look at the
Deployment's selector.matchLabels:

  selector:
    matchLabels:
      app: inventory-api
      version: v2

The Deployment requires pods to have BOTH labels. The Service selector
only requires app: inventory-api. This is fine for routing.

But here is the actual problem: the Ingress uses target-type: ip, which
means the ALB routes directly to pod IPs. The ALB health check target
is port 80 (the Service port). But the pods listen on port 8080.

The ALB is health-checking port 80 on the pod IPs. Port 80 is not open
on the pods. The health checks are failing silently because the ALB
is configured to pass health checks even when the target is unreachable
(the default behavior with some ALB controller versions).

The fix: add this annotation to the Ingress:
  alb.ingress.kubernetes.io/healthcheck-port: "8080"

Or change the Service targetPort to match the ALB health check port.

Claude Code found this without me giving it any additional context. It read all three files together and traced the port mismatch through the entire routing chain.
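The selector part of that answer comes down to a subset rule. An equality-based selector matches a pod when every key/value pair in the selector appears in the pod's labels, and extra labels like version: v2 are ignored. Here is a minimal sketch of that rule. It does not cover matchExpressions or set-based operators, and selects is a hypothetical name.

```go
package main

import "fmt"

// selects (hypothetical helper) reports whether an equality-based selector
// matches a label set: every selector key must be present with the same value.
// Note: an empty selector returns true here; Kubernetes treats a Service with
// no selector differently, so don't read that case into this sketch.
func selects(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	podLabels := map[string]string{"app": "inventory-api", "version": "v2"}
	serviceSelector := map[string]string{"app": "inventory-api"}
	deploymentSelector := map[string]string{"app": "inventory-api", "version": "v2"}

	fmt.Println(selects(serviceSelector, podLabels))    // true
	fmt.Println(selects(deploymentSelector, podLabels)) // true
	fmt.Println(selects(deploymentSelector, map[string]string{"app": "inventory-api"})) // false
}
```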

Kiro

Windsurf

Windsurf found both issues in one pass. Its Cascade agent read all three files together and traced the routing chain correctly. Comparable to Claude Code.

Cursor

Cursor found the Service selector issue (which was not actually a problem) and stopped there. It did not trace the ALB health check port mismatch. I had to give it more context.

Codex and Antigravity

Both identified the Service selector as the problem. Neither found the ALB health check port issue. The selector was not actually the problem.

Task 4 Summary

Task 5: Writing an Incident Runbook

The task: generate a structured runbook from a postmortem summary.

Here is the postmortem I gave each tool:

text

Incident: INS-2847
Date: 2026-04-14 02:17 UTC
Duration: 47 minutes
Severity: P1
Service: payment-worker

Summary:
Redis connection pool exhaustion caused payment processing to fail.
The payment-worker service uses Redis for distributed locking during
payment processing. At 02:17 UTC, Redis connection pool hit the
configured maximum of 100 connections. New payment requests could not
acquire locks and failed with a 503 error.

Root cause:
A deployment at 01:55 UTC increased the payment-worker replica count
from 5 to 15 without updating the Redis connection pool size. Each
replica holds up to 10 connections. 15 replicas * 10 connections = 150
connections, exceeding the pool maximum of 100.

Resolution:
1. Scaled payment-worker back to 5 replicas at 02:31 UTC
2. Updated Redis connection pool max to 200 at 02:41 UTC
3. Scaled payment-worker back to 15 replicas at 02:44 UTC
4. Confirmed payment processing resumed at 02:44 UTC

Action items:
- Add pre-deployment check for Redis connection pool capacity
- Add CloudWatch alarm for Redis connection count > 80% of max
- Update deployment runbook to include Redis capacity check
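The first action item, a pre-deployment check for Redis connection pool capacity, is mostly the root-cause arithmetic turned into a gate. Here is a sketch. It assumes you know the target replica count, the per-replica pool size, and the Redis connection maximum (100 during the incident, 200 after the fix). The 80% warning level mirrors the proposed CloudWatch alarm, and the names are hypothetical. With the incident's numbers, 15 × 10 = 150 connections fails against a maximum of 100 and passes at 75% utilization against 200.

```go
package main

import "fmt"

// capacityStatus classifies a planned deployment against the Redis limit.
type capacityStatus string

const (
	capacityOK      capacityStatus = "ok"
	capacityWarn    capacityStatus = "warn"    // above the alarm threshold
	capacityExceeds capacityStatus = "exceeds" // at or over the limit: no headroom left
)

// checkRedisCapacity (hypothetical helper) computes replicas * connsPerReplica
// and compares it with maxClients. alarmRatio is the fraction of maxClients
// that should warn (0.8 in the incident's action items).
func checkRedisCapacity(replicas, connsPerReplica, maxClients int, alarmRatio float64) (int, capacityStatus) {
	needed := replicas * connsPerReplica
	switch {
	case needed >= maxClients:
		return needed, capacityExceeds
	case float64(needed) > alarmRatio*float64(maxClients):
		return needed, capacityWarn
	default:
		return needed, capacityOK
	}
}

func main() {
	for _, maxClients := range []int{100, 200} {
		needed, status := checkRedisCapacity(15, 10, maxClients, 0.8)
		fmt.Printf("15 replicas x 10 conns = %d vs max %d: %s\n", needed, maxClients, status)
	}
	// 15 replicas x 10 conns = 150 vs max 100: exceeds
	// 15 replicas x 10 conns = 150 vs max 200: ok
}
```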

Kiro

I had a steering file for runbook format. The file lives at .kiro/steering/runbook-template.md with inclusion: manual so it only loads when I explicitly reference it. Here is the full file:

markdown

---
inclusion: manual
---
 
# Runbook Template
 
All runbooks follow this structure:
 
## [Service Name]: [Issue Type]
 
### Detection
 
How to detect this issue. Include specific metrics, log queries,
and alert names.
 
### Triage
 
Step-by-step triage checklist. Each step should be a single action
with an expected outcome.
 
### Resolution
 
Numbered resolution steps. Each step includes:
 
- The exact command to run
- The expected output
- What to do if the output is unexpected
 
### Rollback
 
How to undo the resolution if it makes things worse.
 
### Related Runbooks
 
Links to related runbooks.
 
### Post-Incident
 
What to do after the incident is resolved.

Here is the structure it enforces, as a diagram:

flowchart TD
    A[Runbook] --> B["Detection\nMetrics, log queries, alert names"]
    A --> C["Triage\nStep-by-step checklist with expected outcomes"]
    A --> D["Resolution\nExact commands + expected output + fallback"]
    A --> E["Rollback\nHow to undo if resolution makes things worse"]
    A --> F["Related Runbooks\nLinks to connected runbooks"]
    A --> G["Post-Incident\nTickets, alarms, schedule postmortem"]

I asked Kiro to generate the runbook using this template. Here is the full output it produced.

Payment Worker: Redis Connection Pool Exhaustion

Detection

This issue is detected by one or more of the following.

Alert: payment-worker-redis-connections-high fires when Redis connection count exceeds 80% of the configured maximum.

Log pattern: search CloudWatch Logs Insights for:

sql

fields @timestamp, @message
| filter @logStream like /payment-worker/
| filter @message like /connection pool exhausted/
| sort @timestamp desc
| limit 50

Metric: redis.clients.connected in Datadog. Navigate to Metrics, search redis.clients.connected, filter by service:payment-worker.

Triage

Check current Redis connection count:

bash

redis-cli -h $REDIS_HOST -p $REDIS_PORT info clients | grep connected_clients

Expected: a number. If it equals or exceeds maxclients, the pool is exhausted.

Check current payment-worker replica count:

bash

kubectl get deployment payment-worker -n production -o jsonpath='{.spec.replicas}'

Calculate expected connection usage. Check REDIS_MAX_CONNECTIONS in the ConfigMap:

bash

kubectl get configmap payment-worker-config -n production -o yaml | grep REDIS_MAX_CONNECTIONS

Multiply replicas by REDIS_MAX_CONNECTIONS. If the result exceeds maxclients, that is the problem.

Check Redis maxclients setting:

bash

redis-cli -h $REDIS_HOST -p $REDIS_PORT config get maxclients

If replicas * connections_per_replica > maxclients, proceed to Resolution.

Resolution

Option A: Scale down replicas (immediate, temporary)

Scale payment-worker to a safe replica count:

bash

kubectl scale deployment payment-worker -n production --replicas=5

Expected: deployment.apps/payment-worker scaled

Verify pods are terminating:

bash

kubectl get pods -n production -l app=payment-worker -w

Wait until only 5 pods are in Running state.

Verify payment processing has resumed:

bash

kubectl logs -n production -l app=payment-worker --tail=20 | grep "payment processed"

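Option B: Increase Redis maxclients (permanent fix)

Update the Redis maxclients setting:

```bash
redis-cli -h $REDIS_HOST -p $REDIS_PORT config set maxclients 200
```

Expected: OK

CONFIG SET only changes the running server. On a self-managed Redis that was started from a config file, follow it with CONFIG REWRITE so the new limit survives a restart.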

Verify the change:

```bash
redis-cli -h $REDIS_HOST -p $REDIS_PORT config get maxclients
```

Expected output: maxclients on one line, 200 on the next.
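When its output is piped rather than printed to a terminal, redis-cli switches to raw mode, so config get returns exactly those two lines: the name, then the value. The following minimal sketch turns that output into a value a script can compare against. The name parse_config_value is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: `redis-cli config get <param>` (piped, raw mode) prints
# the parameter name on line 1 and its value on line 2; print only the value.
parse_config_value() {
  # Drop any stray carriage returns, then print the second line and stop.
  tr -d '\r' | awk 'NR == 2 { print; exit }'
}

# Usage (REDIS_HOST and REDIS_PORT as in the runbook):
#   [ "$(redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" config get maxclients | parse_config_value)" -eq 200 ]
```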

Scale payment-worker back to the desired replica count:

```bash
kubectl scale deployment payment-worker -n production --replicas=15
```

Monitor Redis connection count for 5 minutes:

```bash
watch -n 5 "redis-cli -h $REDIS_HOST -p $REDIS_PORT info clients | grep connected_clients"
```
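The alert described under Detection fires when connections exceed 80% of maxclients. The following sketch applies the same threshold to a single reading, so the 5-minute check can report OK, WARN, or FULL instead of a raw number. The function name connection_status and the three labels are hypothetical. The comparison uses integer arithmetic because bash has no floating point.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a connected_clients reading against maxclients.
# WARN mirrors the "above 80% of max" alert line; comparing connected*100 with
# maxclients*80 avoids fractional math.
#   $1 = connected_clients, $2 = maxclients
connection_status() {
  local connected=$1 maxclients=$2
  if (( connected >= maxclients )); then
    echo "FULL"
  elif (( connected * 100 > maxclients * 80 )); then
    echo "WARN"
  else
    echo "OK"
  fi
}

# Example loop (assumes REDIS_HOST/REDIS_PORT are set and maxclients is now 200):
#   while sleep 5; do
#     n=$(redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" info clients \
#           | tr -d '\r' | awk -F: '$1 == "connected_clients" { print $2 }')
#     echo "$(date +%T) connected=$n $(connection_status "$n" 200)"
#   done
```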

Rollback

If Option B makes things worse:

Revert maxclients to the previous value:

```bash
redis-cli -h $REDIS_HOST -p $REDIS_PORT config set maxclients 100
```

Scale payment-worker back to 5 replicas:

```bash
kubectl scale deployment payment-worker -n production --replicas=5
```

Escalate to the Redis infrastructure team.

Related Runbooks

Redis Memory Pressure
Payment Worker High Error Rate
Kubernetes Deployment Rollback

Post-Incident

File a ticket to add pre-deployment Redis capacity check
Add CloudWatch alarm for Redis connection count above 80% of max
Update the deployment runbook to include Redis capacity check
Schedule a postmortem review within 48 hours

This is production-ready. I could hand this to an on-call engineer at 2am and they could follow it without asking me anything.

Claude Code

Claude Code's first draft did not follow my template, so I asked it to reformat. It did so correctly, but it took two turns instead of one.

The content quality was identical to Kiro's output. The difference is that Kiro followed my template automatically because of the steering file.

Cursor

Cursor's runbook needed manual editing. I spent about 10 minutes on it.

Windsurf, Codex, Antigravity

Windsurf generated a runbook that was better than Cursor's but still needed editing. The commands were mostly correct, but the structure did not match my template.

Codex generated a runbook that was mostly Python-flavored. It suggested using boto3 to query CloudWatch Logs instead of the CloudWatch Logs Insights query language. That is not how my team works.

Antigravity generated a reasonable runbook but hit quota limits before completing the post-incident section.

Task 5 Summary

What I Actually Use and Why

I am going to be direct.

Kiro is the right tool for production SRE work on a team.

The spec workflow feels slow the first week. After that, you stop noticing it. What you do notice is that you stop having conversations about why the code looks different from everything else.

Claude Code is the right tool for complex autonomous tasks.

The terminal-only interface is a real limitation. I use it alongside Cursor for the IDE experience.

Cursor is the right tool for daily inline editing.

Windsurf is the right tool if you want Cursor quality at a lower price.

Codex is not the right tool for infrastructure work.

Antigravity is not ready for production infrastructure work.

My Personal Stack

I use three tools, not one.

Kiro for new features and anything that needs to follow team conventions.

Claude Code for large refactors, debugging complex issues, and anything that requires reading the whole codebase.

Cursor for daily editing, autocomplete, quick fixes, and small changes.

This is not a failure of any single tool. It is the reality of 2026. The tools are specialized. The engineers who pick one and stick with it are leaving performance on the table.

Quick Reference

```mermaid
flowchart LR
    A{What are you doing?} --> B["New Terraform module\non a team project"]
    A --> C["New Terraform module\nsolo"]
    A --> D[Multi-file Go refactor]
    A --> E["Kubernetes YAML\ndebugging"]
    A --> F["Incident runbook\ngeneration"]
    A --> G["Datadog monitor\nwith team conventions"]
    A --> H[Quick inline fix]
    A --> I["Python automation\nscript"]
    A --> J["Large codebase\nexploration"]
    A --> K["Budget-conscious\nteam"]

    B --> L[Kiro]
    C --> M[Claude Code]
    D --> M
    E --> N[Claude Code or Windsurf]
    F --> L
    G --> L
    H --> O[Cursor]
    I --> P[Codex or Claude Code]
    J --> M
    K --> Q[Windsurf]
```