---
title: "Kiro vs Cursor vs Windsurf vs Claude Code vs Codex vs Antigravity: What I Actually Use as an SRE"
author: "Rantideb Howlader"
date: "2026-05-21T00:00:00.000Z"
canonical_url: "https://www.ranti.dev/blog/kiro-vs-cursor-vs-windsurf-vs-claude-vs-codex-vs-antigravity"
license: "CC-BY-4.0"
---


## Why I Wrote This

Every AI IDE comparison I found was written by someone who spent a weekend on a todo app and called it a production test.

I work as an SRE on large-scale microservices infrastructure. My day is [Terraform](/blog/terraform-state-surgery), Go automation scripts, [Kubernetes](/blog/eks-networking-vpc-cni) YAML, [Datadog](https://www.datadoghq.com?utm_source=ranti.dev) monitors, [incident runbooks](/blog/disaster-recovery-rto-rpo), and CI/CD pipelines. I need tools that work on real infrastructure code, not just React components.

So I ran all six major AI coding tools through the same five real tasks I do every week. I kept notes on every command I ran, every diff I got back, every time a tool failed mid-task, and every time I had to correct something.

This is that report. Every code block in here is real. Every error message is real. Every correction turn is real.

## The Six Tools

**[Kiro](https://kiro.dev?utm_source=ranti.dev)** is AWS's [spec-driven IDE](/blog/kiro-ide-spec-driven-development). Built on VS Code, available as a [standalone download](https://kiro.dev/downloads?utm_source=ranti.dev) for macOS, Windows, and Linux. You write a prompt and it generates a requirements document, a design document, and a task list before writing a single line of code. It has hooks for event-driven automations and steering files for persistent project context. It is the only tool in this list that enforces your team's conventions automatically.

**[Cursor](https://cursor.com?utm_source=ranti.dev)** is the most popular AI IDE right now. Also VS Code-based. Chat-first, inline autocomplete, strong model selection. Cursor 3 launched in 2026 with Composer 2.5. It is what most engineers reach for first.

**[Windsurf](https://windsurf.com?utm_source=ranti.dev)** was built by Codeium and acquired by Cognition in 2025. It now ships with Devin built in. It has its own model called SWE-1.6. Flow-state editing is its signature feature. Cascade, its agent, indexes your project automatically.

**[Claude Code](https://claude.ai/code?utm_source=ranti.dev)** is Anthropic's terminal-based agent. Not an IDE. You run it from the command line. It uses Claude Opus 4.7 with a 1M token context window. It tops [SWE-bench](https://www.swebench.com?ref=ranti.dev) Verified at 87.6%. It is the most capable tool in this list and the most inconvenient to use.

**[OpenAI Codex](https://openai.com/codex?ref=ranti.dev)** is OpenAI's agentic coding tool. Available as a web app, CLI, and IDE extension. It runs GPT-5.3 Codex. Pricing changed to token-based billing in April 2026. It is excellent for Python and mediocre for everything else.

**[Google Antigravity](https://antigravity.codes?utm_source=ranti.dev)** is Google's answer to Cursor. Powered by Gemini 3.1 Pro. [Antigravity 2.0](https://techcrunch.com/2026/05/19/google-launches-antigravity-2-0-with-an-updated-desktop-app-and-cli-tool-at-io-2026/?ref=ranti.dev) launched at Google I/O 2026 with a new CLI and SDK. The pricing has been chaotic since March 2026 and the infrastructure training data is stale.

## Pricing in May 2026

| Tool            | Free Tier           | Paid Starts At                         | Notes                            |
| --------------- | ------------------- | -------------------------------------- | -------------------------------- |
| **Kiro**        | Yes                 | ~$19/mo                                | Credit-based for agentic tasks   |
| **Cursor**      | Yes                 | $20/mo Pro, $60/mo Pro+, $200/mo Ultra | Most predictable pricing         |
| **Windsurf**    | Yes (25 credits/mo) | $20/mo Pro                             | Devin included in all paid plans |
| **Claude Code** | No                  | $20/mo Pro, $100/mo Max                | Max needed for serious daily use |
| **Codex**       | Limited             | ~$100 to $200/mo average               | Token-based since April 2026     |
| **Antigravity** | Yes (~20 req/day)   | $20/mo AI Pro, $100/mo Ultra           | Credit system is confusing       |

Cursor is the most predictable. Claude Code Max at $100 per month is expensive but justified if you are doing heavy agentic work. Antigravity's credit restructuring in March 2026 was a mess. The free tier dropped 92% overnight with no warning. Codex token billing makes monthly costs hard to predict for teams.

## My Test Environment

Before the tasks, here is the repo I was working in. This matters because the quality difference between tools is almost entirely about how well they read existing context.

```
infrastructure/
  modules/
    networking/
      main.tf          # VPC, subnets, NAT gateway
      variables.tf     # 23 variables, all with descriptions
      outputs.tf       # 14 outputs
    ecs-service/
      main.tf          # ECS task definition, service, IAM roles
      variables.tf
      outputs.tf
    monitoring/
      main.tf          # Datadog monitors, SLO alerts
      variables.tf
      slo-payment-service.tf   # existing SLO monitor I use as template
  services/
    payment-worker/
      main.tf          # calls the modules above
      terraform.tfvars
  go/
    monitoring/
      collector.go     # 847 lines
      metrics.go       # 312 lines
      types.go         # 89 lines
      alerting.go      # 203 lines
      ... 8 more files
```

The naming convention in this repo uses `var.name` not `var.service_name`. The AWS provider is pinned to `~> 5.0`. The Datadog provider is `~> 3.0`. Every SLO monitor has a `runbook_url` tag. These are the things that separate a tool that read your codebase from a tool that generated generic output.

## Task 1: Writing a Terraform Module

The task: write a [Terraform](/blog/terraform-state-surgery) module for a new microservice. It needs a VPC with public and private subnets, security groups for the service and ALB, an ECS Fargate task definition, an Application Load Balancer, and CloudWatch alarms for CPU, memory, and error rate.

This is the most common infrastructure task I do. I run it at least twice a week.

### Kiro

I had already set up a steering file before running this task. This is the step most people skip and then wonder why Kiro generates generic output.

In the Kiro panel, click **Steering**. Click **+**. Select **Workspace**. Name it `terraform-standards.md`. Write this:

```markdown
---
inclusion: fileMatch
fileMatchPattern: "**/*.tf"
---

# Terraform Standards

## Naming

- Use var.name not var.service_name
- Resource names follow the pattern: local.name_prefix-resource_type
- All variables must have a description field

## Provider Versions

- AWS provider: ~> 5.0
- Datadog provider: ~> 3.0
- Never use deprecated resources: use aws_lb not aws_alb

## Module Structure

Follow the pattern in: #[[file:modules/networking/main.tf]]
Follow the variable style in: #[[file:modules/networking/variables.tf]]

## Required Tags

Every resource must have:

- Name
- Environment
- ManagedBy = "terraform"
- Team
```

Save it. Now open **Specs**. Click **+**. Select **Feature**. Type this:

```
Create a Terraform module for a new microservice at modules/ecs-worker/.
It needs:
- VPC with public and private subnets across 2 AZs
- Security groups for the ECS tasks and ALB
- ECS Fargate task definition with configurable CPU and memory
- Application Load Balancer with HTTPS listener
- CloudWatch alarms for CPU utilization, memory utilization, and ALB 5xx error rate
- IAM role for the ECS task with least-privilege permissions
```

Kiro does not write code yet. It generates `requirements.md` first. Here is what it produced:

```markdown
## Requirement 1: Network Isolation

**User Story:** As an infrastructure engineer, I want the ECS tasks
to run in private subnets so that they are not directly reachable
from the internet.

### Acceptance Criteria

- WHEN the module is applied THE SYSTEM SHALL create a VPC with
  CIDR 10.0.0.0/16 unless overridden by var.vpc_cidr.
- THE SYSTEM SHALL create two private subnets in separate AZs
  with CIDRs derived from the VPC CIDR.
- THE SYSTEM SHALL create two public subnets for the ALB only.
- IF var.enable_nat_gateway is true THE SYSTEM SHALL create one
  NAT gateway per AZ for high availability.
- THE SYSTEM SHALL output the VPC ID, private subnet IDs, and
  public subnet IDs for use by other modules.

## Requirement 2: Compute

**User Story:** As an infrastructure engineer, I want the ECS task
definition to be configurable so that different services can use
different resource allocations.

### Acceptance Criteria

- THE SYSTEM SHALL create an ECS Fargate task definition with
  var.cpu and var.memory as configurable inputs.
- THE SYSTEM SHALL create an IAM execution role with permissions
  to pull from ECR and write to CloudWatch Logs.
- THE SYSTEM SHALL create a task role with no permissions by
  default, allowing callers to attach additional policies.
- WHEN var.container_port is set THE SYSTEM SHALL expose that
  port in the container definition.
```

I read through it. The VPC CIDR defaulting to 10.0.0.0/16 was wrong for my environment. I edited that line to say `var.vpc_cidr with no default, caller must provide it`. Clicked **Approve**.

Kiro generated `design.md`. It included the full module file structure, every variable name, every output name, and the IAM policy document. I approved it.

Then `tasks.md`:

```markdown
- [ ] Task 1: Create modules/ecs-worker/variables.tf with all input variables
- [ ] Task 2: Create modules/ecs-worker/main.tf - VPC and networking resources
- [ ] Task 3: Create modules/ecs-worker/main.tf - Security groups
- [ ] Task 4: Create modules/ecs-worker/main.tf - ECS task definition and IAM roles
- [ ] Task 5: Create modules/ecs-worker/main.tf - ALB and target group
- [ ] Task 6: Create modules/ecs-worker/main.tf - CloudWatch alarms
- [ ] Task 7: Create modules/ecs-worker/outputs.tf
- [ ] Task 8: Create modules/ecs-worker/README.md
```

I clicked **Run All Tasks**. Here is the `variables.tf` it produced:

```hcl
variable "name" {
  description = "Name of the ECS worker service. Used as a prefix for all resources."
  type        = string
}

variable "vpc_cidr" {
  description = "CIDR block for the VPC. Must not overlap with existing VPCs in the account."
  type        = string
}

variable "environment" {
  description = "Deployment environment. Used for tagging and resource naming."
  type        = string
}

variable "team" {
  description = "Team that owns this service. Used for tagging."
  type        = string
}

variable "cpu" {
  description = "CPU units for the ECS Fargate task. Valid values: 256, 512, 1024, 2048, 4096."
  type        = number
  default     = 512
}

variable "memory" {
  description = "Memory in MiB for the ECS Fargate task. Must be compatible with the cpu value."
  type        = number
  default     = 1024
}

variable "container_port" {
  description = "Port the container listens on."
  type        = number
  default     = 8080
}

variable "container_image" {
  description = "Docker image URI for the ECS task. Include the tag."
  type        = string
}

variable "enable_nat_gateway" {
  description = "Whether to create NAT gateways for private subnet internet access."
  type        = bool
  default     = true
}

variable "alarm_cpu_threshold" {
  description = "CPU utilization percentage that triggers the CloudWatch alarm."
  type        = number
  default     = 80
}

variable "alarm_memory_threshold" {
  description = "Memory utilization percentage that triggers the CloudWatch alarm."
  type        = number
  default     = 80
}

variable "alarm_5xx_threshold" {
  description = "Number of ALB 5xx errors per minute that triggers the CloudWatch alarm."
  type        = number
  default     = 10
}

variable "alarm_actions" {
  description = "List of ARNs to notify when an alarm fires. Typically an SNS topic ARN."
  type        = list(string)
  default     = []
}
```

Notice: `var.name` not `var.service_name`. Every variable has a description. The steering file worked.
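The `memory` description says the value must be compatible with `cpu`, but it does not list the valid pairs. Here is a minimal Go sketch of that check, covering only the five CPU sizes named in the `cpu` description. The function name `validFargateSize` is illustrative and is not part of the generated module.

```go
package main

import "fmt"

// validFargateSize reports whether a cpu/memory pair (CPU units, MiB) is a
// combination Fargate accepts. Only the CPU sizes 256 to 4096 are covered.
// Hypothetical helper, not part of the generated module.
func validFargateSize(cpu, memoryMiB int) bool {
	// 256 CPU units allow three fixed memory sizes.
	if cpu == 256 {
		return memoryMiB == 512 || memoryMiB == 1024 || memoryMiB == 2048
	}
	// The larger sizes allow a min..max range in 1024 MiB steps.
	type memRange struct{ min, max int }
	ranges := map[int]memRange{
		512:  {1024, 4096},
		1024: {2048, 8192},
		2048: {4096, 16384},
		4096: {8192, 30720},
	}
	r, ok := ranges[cpu]
	if !ok {
		return false
	}
	return memoryMiB >= r.min && memoryMiB <= r.max && memoryMiB%1024 == 0
}

func main() {
	fmt.Println(validFargateSize(512, 1024)) // the module defaults shown above
	fmt.Println(validFargateSize(256, 4096)) // too much memory for 256 CPU units
}
```

The same rule could live in a Terraform `validation` block instead. The Go form is only meant to make the pairing explicit.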

Here is the CloudWatch alarm section from `main.tf`:

```hcl
resource "aws_cloudwatch_metric_alarm" "cpu_high" {
  alarm_name          = "${local.name_prefix}-cpu-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/ECS"
  period              = 60
  statistic           = "Average"
  threshold           = var.alarm_cpu_threshold
  alarm_description   = "ECS CPU utilization is above ${var.alarm_cpu_threshold}%"
  alarm_actions       = var.alarm_actions
  ok_actions          = var.alarm_actions

  dimensions = {
    ClusterName = aws_ecs_cluster.this.name
    ServiceName = aws_ecs_service.this.name
  }

  tags = local.common_tags
}

resource "aws_cloudwatch_metric_alarm" "memory_high" {
  alarm_name          = "${local.name_prefix}-memory-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "MemoryUtilization"
  namespace           = "AWS/ECS"
  period              = 60
  statistic           = "Average"
  threshold           = var.alarm_memory_threshold
  alarm_description   = "ECS memory utilization is above ${var.alarm_memory_threshold}%"
  alarm_actions       = var.alarm_actions
  ok_actions          = var.alarm_actions

  dimensions = {
    ClusterName = aws_ecs_cluster.this.name
    ServiceName = aws_ecs_service.this.name
  }

  tags = local.common_tags
}

resource "aws_cloudwatch_metric_alarm" "alb_5xx_high" {
  alarm_name          = "${local.name_prefix}-alb-5xx-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "HTTPCode_Target_5XX_Count"
  namespace           = "AWS/ApplicationELB"
  period              = 60
  statistic           = "Sum"
  threshold           = var.alarm_5xx_threshold
  alarm_description   = "ALB 5xx error count exceeded ${var.alarm_5xx_threshold} in the last minute"
  alarm_actions       = var.alarm_actions
  ok_actions          = var.alarm_actions
  treat_missing_data  = "notBreaching"

  dimensions = {
    LoadBalancer = aws_lb.this.arn_suffix
    TargetGroup  = aws_lb_target_group.this.arn_suffix
  }

  tags = local.common_tags
}
```

I ran `terraform validate` on the output. Zero errors. I ran `terraform plan` against a dev account. The plan showed 34 resources to create. All correct.

Total time from opening Kiro to a passing `terraform plan`: 11 minutes. 8 of those were the spec phase.

### Cursor

I opened Cursor, pressed `Cmd+K`, and typed the same prompt. No steering files. No spec phase. Cursor started writing immediately.

The code appeared in about 35 seconds. Here is what the variables file looked like:

```hcl
variable "service_name" {
  description = "Name of the service"
  type        = string
}

variable "environment" {
  type = string
}

variable "cpu" {
  type    = number
  default = 256
}

variable "memory" {
  type    = number
  default = 512
}

variable "tags" {
  type    = map(string)
  default = {}
}
```

Three problems were immediately visible.

First, it used `var.service_name` not `var.name`. My existing modules use `var.name`. Every reference to this variable in the calling module would need to change.

Second, the `environment` variable has no description. My team's convention requires descriptions on every variable. The PR would fail review.

Third, it added a `tags` variable I did not ask for. My modules use a `local.common_tags` block that merges required tags automatically. A separate `tags` variable breaks that pattern.
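The description rule is easy to check mechanically. What follows is a rough Go sketch, not a real HCL parser: it scans line by line and assumes one attribute per line and no braces inside strings. The name `missingDescriptions` is made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	variableHeader  = regexp.MustCompile(`^\s*variable\s+"([^"]+)"\s*\{`)
	descriptionAttr = regexp.MustCompile(`^\s*description\s*=`)
)

// missingDescriptions returns the names of variable blocks that have no
// top-level description attribute. Line-based scan; hypothetical helper.
func missingDescriptions(tf string) []string {
	var missing []string
	current := ""      // variable block being scanned, "" when outside a block
	described := false // whether the current block has a description
	depth := 0         // brace depth inside the current block
	for _, line := range strings.Split(tf, "\n") {
		if current == "" {
			if m := variableHeader.FindStringSubmatch(line); m != nil {
				current, described, depth = m[1], false, 1
			}
			continue
		}
		if depth == 1 && descriptionAttr.MatchString(line) {
			described = true
		}
		depth += strings.Count(line, "{") - strings.Count(line, "}")
		if depth == 0 {
			if !described {
				missing = append(missing, current)
			}
			current = ""
		}
	}
	return missing
}

func main() {
	// Two of the variables from the Cursor output above.
	tf := `variable "service_name" {
  description = "Name of the service"
  type        = string
}

variable "environment" {
  type = string
}
`
	fmt.Println(missingDescriptions(tf))
}
```

Run against the full Cursor output above, this would flag `environment`, `cpu`, `memory`, and `tags`, so the missing description on `environment` is not the only one.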

The CloudWatch alarm section had a more serious problem:

```hcl
resource "aws_cloudwatch_metric_alarm" "cpu_high" {
  alarm_name          = "${var.service_name}-cpu-high"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = "2"
  metric_name         = "CPUUtilization"
  namespace           = "AWS/ECS"
  period              = "300"
  statistic           = "Average"
  threshold           = "80"
  alarm_description   = "CPU utilization is high"
  alarm_actions       = []
}
```

The `evaluation_periods`, `period`, and `threshold` values are quoted strings, where the AWS provider 5.x schema expects numbers. This would produce a plan-time error:

```
Error: Incorrect attribute value type
  on main.tf line 47, in resource "aws_cloudwatch_metric_alarm" "cpu_high":
  47:   evaluation_periods  = "2"
    |----------------
    | "2" is a string

Inappropriate value for attribute "evaluation_periods": a number is required.
```

Also, `alarm_actions` is hardcoded to an empty list. There is no variable for it. If you want to wire this to an SNS topic you have to edit the generated code directly.

I fixed all of this manually. It took about 12 minutes. So the total was under a minute of generation plus 12 minutes of fixing. Longer than Kiro, with no documentation.

To be fair to Cursor: if I had used Cursor Rules to define my conventions, some of these problems would not have happened. Cursor Rules are the equivalent of Kiro's steering files. The difference is that Kiro generates the spec and enforces conventions in one workflow. With Cursor you have to set up Rules separately and remember to keep them updated.

### Windsurf

Windsurf's Cascade agent indexed my project automatically when I opened it. I did not configure anything. I opened a new Cascade conversation and typed the same prompt.

Windsurf read my existing modules. The variable names matched. It used `var.name`. It used `aws_lb` not `aws_alb`. The module structure matched my existing pattern.

But the CloudWatch alarms had hardcoded thresholds:

```hcl
resource "aws_cloudwatch_metric_alarm" "cpu_high" {
  alarm_name          = "${local.name_prefix}-cpu-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/ECS"
  period              = 60
  statistic           = "Average"
  threshold           = 80
  alarm_description   = "ECS CPU utilization is above 80%"
  alarm_actions       = []
}
```

The threshold is hardcoded to 80. The `alarm_actions` is hardcoded to an empty list. These should be variables. I asked Windsurf to fix it.

It fixed it in one turn. The corrected version used `var.alarm_cpu_threshold` and `var.alarm_actions`. Correct.

Total time: about 6 minutes including the correction turn. Better than Cursor. Not as clean as Kiro on the first pass.

### Claude Code

Claude Code runs in the terminal. There is no IDE. I opened a terminal and ran:

```bash
claude
```

I typed the same prompt. Claude Code printed this before writing anything:

```
I'll read your existing Terraform modules first to understand your conventions.

Reading modules/networking/main.tf...
Reading modules/networking/variables.tf...
Reading modules/networking/outputs.tf...
Reading modules/ecs-service/main.tf...
Reading modules/ecs-service/variables.tf...
Reading modules/monitoring/main.tf...
Reading modules/monitoring/slo-payment-service.tf...
Reading services/payment-worker/main.tf...

I can see your conventions:
- Variable naming: var.name (not var.service_name)
- AWS provider: ~> 5.0 (using aws_lb not aws_alb)
- All variables have descriptions
- Common tags via local.common_tags
- Alarm actions via a list variable

Creating modules/ecs-worker/...
```

The Terraform it produced was the best of all six tools. Here is the IAM section, which is where most tools get lazy:

```hcl
data "aws_iam_policy_document" "ecs_execution_assume_role" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}

data "aws_iam_policy_document" "ecs_execution" {
  statement {
    effect = "Allow"
    actions = [
      "ecr:GetAuthorizationToken",
      "ecr:BatchCheckLayerAvailability",
      "ecr:GetDownloadUrlForLayer",
      "ecr:BatchGetImage",
    ]
    resources = ["*"]
  }

  statement {
    effect = "Allow"
    actions = [
      "logs:CreateLogStream",
      "logs:PutLogEvents",
    ]
    resources = [
      "${aws_cloudwatch_log_group.this.arn}:*",
    ]
  }
}

resource "aws_iam_role" "ecs_execution" {
  name               = "${local.name_prefix}-ecs-execution"
  assume_role_policy = data.aws_iam_policy_document.ecs_execution_assume_role.json
  tags               = local.common_tags
}

resource "aws_iam_role_policy" "ecs_execution" {
  name   = "${local.name_prefix}-ecs-execution"
  role   = aws_iam_role.ecs_execution.id
  policy = data.aws_iam_policy_document.ecs_execution.json
}

resource "aws_iam_role" "ecs_task" {
  name               = "${local.name_prefix}-ecs-task"
  assume_role_policy = data.aws_iam_policy_document.ecs_execution_assume_role.json
  tags               = local.common_tags
}
```

Notice the CloudWatch Logs permission is scoped to the specific log group ARN, not `*`. That is least-privilege. Kiro also did this. Cursor used `*` for the logs resource.

Claude Code also added a `README.md` without being asked. It included usage examples, variable descriptions, and outputs. I did not prompt this. It inferred from my existing modules that every module has a README.

The only problem: no IDE. I was looking at diffs in the terminal. To review the full output I had to open the files in a separate editor. That friction is real.

`terraform validate`: zero errors. `terraform plan`: 34 resources, all correct.

Total time: 4 minutes.

### OpenAI Codex

I used the Codex CLI:

```bash
codex "Create a Terraform module at modules/ecs-worker/ for a new microservice.
It needs a VPC, security groups, ECS Fargate task definition, ALB, and
CloudWatch alarms for CPU, memory, and ALB 5xx errors."
```

Codex generated the module. Here is the ALB resource it produced:

```hcl
resource "aws_alb" "main" {
  name               = "${var.service_name}-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = var.public_subnet_ids

  tags = {
    Name = "${var.service_name}-alb"
  }
}
```

`aws_alb` is deprecated. The correct resource in AWS provider 5.x is `aws_lb`. This is not a breaking change but it generates a deprecation warning on every plan:

```
Warning: Argument is deprecated
  with aws_alb.main,
  on main.tf line 1, in resource "aws_alb" "main":
  1: resource "aws_alb" "main" {

Use aws_lb instead.
```

The ECS task definition had a more serious problem. It used the old JSON string format for container definitions:

```hcl
resource "aws_ecs_task_definition" "main" {
  family                   = var.service_name
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = var.cpu
  memory                   = var.memory
  execution_role_arn       = aws_iam_role.ecs_execution.arn

  container_definitions = jsonencode([
    {
      name      = var.service_name
      image     = var.container_image
      cpu       = var.cpu
      memory    = var.memory
      essential = true
      portMappings = [
        {
          containerPort = var.container_port
          hostPort      = var.container_port
          protocol      = "tcp"
        }
      ]
    }
  ])
}
```

The `jsonencode` approach works, but it is not how my existing modules build their container definitions. Mixing patterns in the same repo is a maintenance problem.

Also: `var.service_name` again. Codex did not read my existing modules.

I ran `terraform validate`. It passed. I ran `terraform plan`. It worked but with deprecation warnings. I would not merge this to main without fixing the `aws_alb` reference and the naming convention.

Codex is fast. The CLI is clean. But it is clearly optimized for Python. Its Terraform knowledge is about 18 months behind.

### Google Antigravity

I used Antigravity 2.0's CLI, which launched at Google I/O 2026:

```bash
antigravity "Create a Terraform module at modules/ecs-worker/ for a new microservice.
It needs a VPC, security groups, ECS Fargate task definition, ALB, and
CloudWatch alarms."
```

Antigravity started generating. Then this appeared:

```
Rate limit reached. You have used 18/20 of your daily requests.
Generation paused. Resume tomorrow or upgrade to AI Pro.
```

I was on the free tier. 20 requests per day. I had used 18 testing other things earlier. I upgraded to AI Pro ($20/month) and tried again.

This time it completed. Here is the provider block it generated:

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }
}
```

AWS provider 4.x. My project is on 5.x. The resource arguments are different between these versions. The `aws_ecs_task_definition` resource changed significantly between 4.x and 5.x. Because the rest of the repo pins `~> 5.0`, running `terraform init` with this module fails with a version conflict: no provider release satisfies both constraints.
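The reason the two pins cannot coexist is just interval arithmetic: `~> 4.0` allows `>= 4.0, < 5.0`, and `~> 5.0` allows `>= 5.0, < 6.0`. A small Go sketch of that reasoning follows. It handles only the two-part `~> MAJOR.MINOR` form and assumes minor versions stay below 1000. It is not how Terraform itself resolves constraints.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// pessimisticRange turns "~> MAJOR.MINOR" into the half-open range
// [MAJOR.MINOR, (MAJOR+1).0), encoded as major*1000+minor.
// Illustrative only; real constraint syntax has more forms.
func pessimisticRange(constraint string) (lo, hi int, err error) {
	v := strings.TrimSpace(strings.TrimPrefix(strings.TrimSpace(constraint), "~>"))
	parts := strings.Split(v, ".")
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("only ~> MAJOR.MINOR is handled, got %q", constraint)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, err
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, err
	}
	return major*1000 + minor, (major + 1) * 1000, nil
}

// compatible reports whether at least one version satisfies both constraints.
func compatible(a, b string) (bool, error) {
	aLo, aHi, err := pessimisticRange(a)
	if err != nil {
		return false, err
	}
	bLo, bHi, err := pessimisticRange(b)
	if err != nil {
		return false, err
	}
	return aLo < bHi && bLo < aHi, nil
}

func main() {
	ok, err := compatible("~> 4.0", "~> 5.0")
	if err != nil {
		panic(err)
	}
	fmt.Println("~> 4.0 and ~> 5.0 overlap:", ok)
}
```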

I asked Antigravity to update it to 5.x. It updated the version constraint but kept the 4.x resource arguments. The `aws_ecs_task_definition` still used the old `container_definitions` JSON string format and the old `placement_constraints` syntax.

The Gemini 3.1 Pro model is genuinely good at reasoning. When I asked it to explain why it chose certain IAM permissions, the explanation was correct and detailed. The model understands infrastructure. The training data for Terraform is just stale.

I gave up on this task for Antigravity. The combination of quota interruptions and stale provider knowledge makes it unreliable for infrastructure work right now.

### Task 1 Summary

Claude Code produced the best Terraform on the first pass. Kiro produced the most maintainable output because of the spec trail and steering file enforcement. Windsurf was close but needed one correction. Cursor was fast but required manual fixes for naming conventions and type errors. Codex had deprecation warnings and stale patterns. Antigravity had quota problems and provider version issues.

## Task 2: Refactoring a Go Monitoring Script Across 12 Files

The task: change the signature of `getServiceMetrics` from this:

```go
func getServiceMetrics(name string) (*ServiceMetrics, error)
```

to this:

```go
func getServiceMetrics(ctx context.Context, name string, opts MetricOptions) (*ServiceMetrics, error)
```

The function is defined in `metrics.go`. The change touches 11 other files: 10 that call it (including `metrics_test.go`) and `types.go`, where the new `MetricOptions` struct needs to be defined. Every call site needs to pass a context and an options struct.

This is a real refactor I did last month. I ran it through all six tools to see which ones could handle it without missing files.
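To judge whether a tool missed anything, it helps to have a ground-truth count of call sites that does not depend on the tool. One way to get it is with the standard library's `go/parser` and `go/ast`. This is a minimal sketch: the helper name `countCalls` and the sample source are invented for illustration, and it only counts plain identifier calls such as `getServiceMetrics(...)`.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// countCalls parses one Go source file and counts calls to the function fn.
// Only plain identifier calls are counted, not method or package-qualified calls.
func countCalls(filename, src, fn string) (int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return 0, err
	}
	n := 0
	ast.Inspect(file, func(node ast.Node) bool {
		call, ok := node.(*ast.CallExpr)
		if !ok {
			return true
		}
		if id, ok := call.Fun.(*ast.Ident); ok && id.Name == fn {
			n++
		}
		return true
	})
	return n, nil
}

func main() {
	// Hypothetical stand-in for one caller file; the real files are not shown here.
	src := `package monitoring

func report(names []string) {
	for _, name := range names {
		m, err := getServiceMetrics(name)
		_, _ = m, err
	}
	_, _ = getServiceMetrics("payments")
}
`
	n, err := countCalls("reporter.go", src, "getServiceMetrics")
	if err != nil {
		panic(err)
	}
	fmt.Printf("reporter.go: %d call(s)\n", n)
}
```

Parsing does not type-check, so each file can be counted on its own. Run over every file in the package, the per-file totals give a checklist to compare against whatever a tool reports.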

### Kiro

I used a bugfix spec for this. Click **Specs**. Click **+**. Select **Bug**. Type this:

```
The getServiceMetrics function in go/monitoring/metrics.go needs a new signature.

Current:
  func getServiceMetrics(name string) (*ServiceMetrics, error)

New:
  func getServiceMetrics(ctx context.Context, name string, opts MetricOptions) (*ServiceMetrics, error)

MetricOptions is a new struct that needs to be defined in go/monitoring/types.go.
It should have these fields:
  - Timeout time.Duration (default 30s)
  - IncludeHistogram bool (default false)
  - Tags map[string]string (default empty)

All 11 callers need to be updated. Where no context is available, use context.Background().
Where no options are needed, use MetricOptions{} as the zero value.
```

Kiro generated a bugfix spec that listed every file:

```markdown
## Bug Condition

The function getServiceMetrics does not accept a context or options,
making it impossible to add timeouts or pass metadata to the metrics
collection layer.

## Files Requiring Changes

1. go/monitoring/types.go - Add MetricOptions struct
2. go/monitoring/metrics.go - Update function signature
3. go/monitoring/collector.go - Update 3 call sites
4. go/monitoring/alerting.go - Update 2 call sites
5. go/monitoring/reporter.go - Update 1 call site
6. go/monitoring/aggregator.go - Update 2 call sites
7. go/monitoring/exporter.go - Update 1 call site
8. go/monitoring/health.go - Update 1 call site
9. go/monitoring/dashboard.go - Update 1 call site
10. go/monitoring/scheduler.go - Update 1 call site
11. go/monitoring/cleanup.go - Update 1 call site
12. go/monitoring/metrics_test.go - Update test call sites
```

It ran the tasks one by one. Here is the `MetricOptions` struct it added to `types.go`:

```go
// MetricOptions configures the behavior of getServiceMetrics.
type MetricOptions struct {
	// Timeout is the maximum duration to wait for metrics collection.
	// Defaults to 30 seconds if zero.
	Timeout time.Duration

	// IncludeHistogram controls whether histogram data is included in the response.
	// Histogram collection is expensive. Only enable when needed.
	IncludeHistogram bool

	// Tags are additional key-value pairs attached to the collected metrics.
	// These are forwarded to the metrics backend as dimensions.
	Tags map[string]string
}

// DefaultMetricOptions returns a MetricOptions with sensible defaults.
func DefaultMetricOptions() MetricOptions {
	return MetricOptions{
		Timeout:          30 * time.Second,
		IncludeHistogram: false,
		Tags:             make(map[string]string),
	}
}
```

It added a `DefaultMetricOptions()` constructor without being asked. That is the kind of thing a senior engineer adds automatically.
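The doc comment promises that a zero `Timeout` falls back to 30 seconds, which matters because callers were told to pass `MetricOptions{}` when they need no options. The post does not show how `getServiceMetrics` honors that promise. One plausible shape is sketched below, with a hypothetical `withDefaults` method that is not part of the generated code.

```go
package main

import (
	"fmt"
	"time"
)

// MetricOptions mirrors the struct shown above.
type MetricOptions struct {
	Timeout          time.Duration
	IncludeHistogram bool
	Tags             map[string]string
}

// withDefaults returns a copy with zero-valued fields filled in, so that
// MetricOptions{} behaves like DefaultMetricOptions(). Hypothetical helper.
func (o MetricOptions) withDefaults() MetricOptions {
	if o.Timeout == 0 {
		o.Timeout = 30 * time.Second
	}
	if o.Tags == nil {
		o.Tags = make(map[string]string)
	}
	return o
}

func main() {
	fmt.Println(MetricOptions{}.withDefaults().Timeout)
	fmt.Println(MetricOptions{Timeout: 5 * time.Second}.withDefaults().Timeout)
}
```

Calling something like this at the top of `getServiceMetrics` would keep the zero value and the constructor equivalent.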

Here is how it updated a call site in `collector.go`:

```go
// Before
metrics, err := getServiceMetrics(svc.Name)
if err != nil {
    log.Printf("failed to get metrics for %s: %v", svc.Name, err)
    continue
}

// After
ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
defer cancel()

metrics, err := getServiceMetrics(ctx, svc.Name, MetricOptions{
    Tags: map[string]string{
        "collector": "automated",
        "service":   svc.Name,
    },
})
if err != nil {
    log.Printf("failed to get metrics for %s: %v", svc.Name, err)
    continue
}
```

It did not just add `context.Background()` everywhere. It used the existing `ctx` from the function parameter where one was available. Where no context existed, it created one with a timeout. That is correct Go.

I ran `go build ./...`. Zero errors. I ran `go test ./...`. All tests passed.

Total time: 6 minutes including the spec phase.

### Claude Code

```bash
claude "Update getServiceMetrics in go/monitoring/metrics.go to accept
ctx context.Context, name string, opts MetricOptions.
Define MetricOptions in go/monitoring/types.go with fields:
  Timeout time.Duration
  IncludeHistogram bool
  Tags map[string]string
Add a DefaultMetricOptions() constructor.
Update all 11 callers. Use existing ctx where available, context.Background() where not.
Update tests in metrics_test.go."
```

Claude Code read all 12 files before making any changes. It printed a summary of what it found:

```
Found 11 call sites across 10 files:
- collector.go: 3 calls (has ctx parameter available)
- alerting.go: 2 calls (has ctx parameter available)
- reporter.go: 1 call (no ctx, will use context.Background())
- aggregator.go: 2 calls (has ctx parameter available)
- exporter.go: 1 call (no ctx, will use context.Background())
- health.go: 1 call (has ctx parameter available)
- dashboard.go: 1 call (no ctx, will use context.Background())
- scheduler.go: 1 call (has ctx parameter available)
- cleanup.go: 1 call (no ctx, will use context.Background())
- metrics_test.go: 4 test calls (will use context.Background())

Making changes...
```

The output matched Kiro's in quality. Same `DefaultMetricOptions()` constructor. Same context propagation logic. Same test updates.

`go build ./...`: zero errors. `go test ./...`: all passed.

Total time: 3 minutes. Faster than Kiro because there was no spec phase.

The difference is the paper trail. Kiro's bugfix spec documents what changed and why. Six months from now when someone asks why `getServiceMetrics` has a `MetricOptions` parameter, the spec is there. With Claude Code, the only record is the git commit message.

### Cursor

I opened Composer with `Cmd+Shift+I` and typed the same prompt.

Cursor updated 9 of 12 files. It missed `dashboard.go`, `cleanup.go`, and `scheduler.go`. These three files are in the same directory as the others. Cursor just did not index them.

I pointed Cursor to the missing files explicitly:

```
You missed these files:
- go/monitoring/dashboard.go
- go/monitoring/cleanup.go
- go/monitoring/scheduler.go
Please update the getServiceMetrics call sites in these files too.
```

Cursor updated them. But the updates in `dashboard.go` used `context.Background()` even though `dashboard.go` has a `ctx context.Context` parameter in its main function. Cursor did not propagate the context correctly.

I fixed that manually.

`go build ./...`: zero errors after the manual fix. Total time: 9 minutes.

### Windsurf

Windsurf missed 2 files: `cleanup.go` and `scheduler.go`. Same problem as Cursor. I pointed it to the missing files and it updated them correctly, including proper context propagation.

Total time: 7 minutes.

### Codex

Codex updated `metrics.go` with the new signature. It updated `collector.go` with 2 of 3 call sites. It missed the third call site in `collector.go` and all other files.

I asked it to find the remaining call sites. It found 4 more. I asked again. It found 2 more. After 4 rounds of prompting it had updated 8 of 11 files. I gave up and did the remaining 3 manually.

Codex does not handle large multi-file refactors well. It loses track of what it has already changed.
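One way to keep the scope honest, whatever tool you use, is to count the call sites yourself before and after. Here is a minimal sketch using Go's standard `go/parser` and `go/ast` packages. The `countCalls` helper and the sample source are made up for illustration, and it matches by name only, so a method with the same name on another type would also be counted.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// countCalls parses one Go source file and counts calls to the function name.
// It matches plain calls (name(...)) and selector calls (pkg.name(...)).
func countCalls(filename, src, name string) (int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return 0, err
	}
	count := 0
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true // keep walking into child nodes
		}
		switch fn := call.Fun.(type) {
		case *ast.Ident:
			if fn.Name == name {
				count++
			}
		case *ast.SelectorExpr:
			if fn.Sel.Name == name {
				count++
			}
		}
		return true
	})
	return count, nil
}

// sample is a made-up file used only to exercise countCalls.
const sample = `package monitoring

func collect() {
	getServiceMetrics("a")
	getServiceMetrics("b")
	other()
}
`

func main() {
	n, err := countCalls("sample.go", sample, "getServiceMetrics")
	if err != nil {
		panic(err)
	}
	fmt.Println("call sites:", n)
}
```

To run it on a real repo, read each `.go` file with `os.ReadFile` and pass the contents as `src`. The parser does not type-check, so it works even when the code is mid-refactor and does not compile.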

### Antigravity

Antigravity hit a quota limit after updating 3 files. I had already used most of my daily requests. I stopped testing it on this task.

### Task 2 Summary

Claude Code and Kiro both handled this perfectly. Claude Code was faster. Kiro left documentation. Cursor and Windsurf missed files and needed correction. Codex lost track of the scope. Antigravity hit quota limits.

## Task 3: Generating a Datadog SLO Monitor

The task: generate a Datadog monitor for a new SLO. The requirements are 99.9% availability, 30-day rolling window, alert at 99.5%, warn at 99.7%. The monitor must include a `runbook_url` tag. It must follow the same structure as the existing `slo-payment-service.tf` in my [observability](/blog/observability-metrics-tracing) monitoring module.

This task is specifically about whether a tool can follow your team's conventions without you repeating them every time.

### Kiro

I already had the `terraform-standards.md` steering file from Task 1. I added one more steering file specifically for monitoring:

```markdown
---
inclusion: fileMatch
fileMatchPattern: "modules/monitoring/**/*.tf"
---

# Monitoring Module Standards

## SLO Monitor Structure

All SLO monitors use the datadog_service_level_objective resource type.
Follow the exact structure in: #[[file:modules/monitoring/slo-payment-service.tf]]

## Required Fields

Every SLO monitor must have:

- name following the pattern: "SLO - [Service Name] - [Metric]"
- description explaining what the SLO measures
- thresholds block with timeframe = "30d"
- warning threshold at 99.7
- critical threshold at 99.5
- tags including "runbook_url:[url]" and "team:[team_name]"

## Naming Convention

The Terraform resource name follows: slo_[service_name]_[metric]
Example: slo_payment_service_availability
```

I asked Kiro to generate the monitor for a new service called `inventory-api`.

Here is what it produced:

```hcl
resource "datadog_service_level_objective" "slo_inventory_api_availability" {
  name        = "SLO - Inventory API - Availability"
  type        = "metric"
  description = "Measures the availability of the Inventory API service. Availability is defined as the percentage of requests that return a non-5xx response."

  query {
    numerator   = "sum:trace.web.request.hits{service:inventory-api,!http.status_class:5xx}.as_count()"
    denominator = "sum:trace.web.request.hits{service:inventory-api}.as_count()"
  }

  thresholds {
    timeframe = "30d"
    target    = 99.9
    warning   = 99.7
  }

  tags = [
    "service:inventory-api",
    "team:${var.team}",
    "environment:${var.environment}",
    "runbook_url:https://runbooks.internal/inventory-api/availability",
    "managed_by:terraform",
  ]
}

resource "datadog_monitor" "slo_inventory_api_availability_alert" {
  name    = "SLO Alert - Inventory API - Availability"
  type    = "slo alert"
  message = <<-EOT
    {{#is_alert}}
    Inventory API availability SLO is below 99.5% over the last 30 days.

    Current burn rate: {{value}}

    Runbook: https://runbooks.internal/inventory-api/availability

    @pagerduty-inventory-api
    {{/is_alert}}

    {{#is_warning}}
    Inventory API availability SLO is below 99.7% over the last 30 days.
    @slack-inventory-api-alerts
    {{/is_warning}}
  EOT

  query = "error_budget(\"${datadog_service_level_objective.slo_inventory_api_availability.id}\").over(\"30d\") > 1"

  thresholds = {
    critical = 1
    warning  = 0.5
  }

  tags = [
    "service:inventory-api",
    "team:${var.team}",
    "environment:${var.environment}",
    "managed_by:terraform",
  ]
}
```

This is exactly what I would have written manually. The `runbook_url` tag is there. The thresholds match. The naming convention matches. The message template matches my existing monitors.

I did not tell Kiro any of this in the prompt. The steering file told it.

### Claude Code

```bash
claude "Generate a Datadog SLO monitor for inventory-api.
99.9% availability target, 30-day rolling window.
Alert at 99.5%, warn at 99.7%.
Follow the same structure as modules/monitoring/slo-payment-service.tf."
```

Claude Code read `slo-payment-service.tf` and generated a correct monitor. The structure matched. The thresholds were correct.

But it did not include the `runbook_url` tag. That requirement is in my steering file, which Claude Code does not have access to. It does not have persistent project memory between sessions.

I told it to add the `runbook_url` tag. It added it. One correction turn.

The output after correction was identical to Kiro's output. But I had to remember to ask for the `runbook_url`. With Kiro, I never have to remember. The steering file remembers for me.
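For tools that do not read steering files, a check that runs outside the tool can catch the same omission. This is a minimal sketch, not something I run in CI today. It assumes the tags have already been extracted as plain `key:value` strings, and `missingTagKeys`, `requiredTagKeys`, and the `team:platform` value are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// requiredTagKeys lists the tag keys the monitoring standards require.
var requiredTagKeys = []string{"runbook_url", "team"}

// missingTagKeys returns the required keys that have no "key:value" tag.
func missingTagKeys(tags []string) []string {
	present := make(map[string]bool)
	for _, tag := range tags {
		// Cut at the first colon so URL values keep their own colons.
		key, value, found := strings.Cut(tag, ":")
		if found && value != "" {
			present[key] = true
		}
	}
	var missing []string
	for _, key := range requiredTagKeys {
		if !present[key] {
			missing = append(missing, key)
		}
	}
	return missing
}

func main() {
	// Tags shaped like the uncorrected output: no runbook_url.
	tags := []string{"service:inventory-api", "team:platform", "managed_by:terraform"}
	fmt.Println("missing:", missingTagKeys(tags))
}
```

It only proves the tag exists. It cannot tell you whether the runbook behind the URL is any good.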

### Cursor

Cursor generated a generic Datadog SLO monitor. It did not read my existing `slo-payment-service.tf`. The structure was different. The naming convention was different. No `runbook_url` tag. The thresholds were correct because I specified them in the prompt.

I spent about 8 minutes correcting it to match my team's conventions.

### Windsurf

Windsurf read my existing monitoring files and generated a monitor that was close to correct. The structure matched. The naming convention matched. But it missed the `runbook_url` tag and used a slightly different message template format.

Two correction turns to get it right.

### Codex and Antigravity

Codex generated a monitor using the Datadog Terraform provider 2.x syntax. My project uses 3.x. The `datadog_service_level_objective` resource arguments changed between these versions. The output would not apply without errors.

Antigravity generated a monitor that was structurally correct but used hardcoded values instead of variables. It also missed the `runbook_url` tag.

### Task 3 Summary

Kiro wins this task by a significant margin. Steering files are the killer feature for teams with established conventions. You write the convention once. Every future generation follows it automatically. No other tool in this list has an equivalent that works this well.

Claude Code is second. It reads your existing files and matches the pattern. But it does not have persistent memory, so you have to remind it of requirements that are not visible in the files.

## Task 4: Debugging a Silent Kubernetes Failure

The task: a deployment was running but not serving traffic. No errors in the logs. The pods were in `Running` state. The ALB health checks appeared to be passing. But requests to the service were returning 503.

This is the worst kind of production issue. Everything looks fine. Nothing is fine.

Here is the deployment YAML:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inventory-api
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: inventory-api
      version: v2
  template:
    metadata:
      labels:
        app: inventory-api
        version: v2
    spec:
      containers:
        - name: inventory-api
          image: 123456789.dkr.ecr.us-east-1.amazonaws.com/inventory-api:v2.1.4
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
```

Here is the Service YAML:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: inventory-api
  namespace: production
spec:
  selector:
    app: inventory-api
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
  type: ClusterIP
```

Here is the Ingress YAML:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: inventory-api
  namespace: production
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  rules:
    - host: inventory-api.internal.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: inventory-api
                port:
                  number: 80
```

I pasted all three files into each tool and asked: why is this not routing traffic?

### Claude Code

Claude Code spotted the first issue in about 8 seconds.

The `selector` in the Service is `app: inventory-api`. The `matchLabels` in the Deployment are `app: inventory-api, version: v2`. The Service selector matches any pod that has `app: inventory-api`. The pods have both `app: inventory-api` and `version: v2`. The Service selector is a subset match, so it should work.

Wait. Claude Code caught something I missed. It said:

```
The Service selector looks correct at first glance, but look at the
Deployment's selector.matchLabels:

  selector:
    matchLabels:
      app: inventory-api
      version: v2

The Deployment requires pods to have BOTH labels. The Service selector
only requires app: inventory-api. This is fine for routing.

But here is the actual problem: the Ingress uses target-type: ip, which
means the ALB routes directly to pod IPs. The ALB health check target
is port 80 (the Service port). But the pods listen on port 8080.

The ALB is health-checking port 80 on the pod IPs. Port 80 is not open
on the pods. The health checks are failing silently because the ALB
is configured to pass health checks even when the target is unreachable
(the default behavior with some ALB controller versions).

The fix: add this annotation to the Ingress:
  alb.ingress.kubernetes.io/healthcheck-port: "8080"

Or change the Service targetPort to match the ALB health check port.
```

That was the actual bug. The ALB was health-checking port 80 on the pod IPs. The pods listen on 8080. The health checks were failing but the ALB was still routing traffic because the health check failure threshold had not been reached yet. The 503s were intermittent.

Claude Code found this without me giving it any additional context. It read all three files together and traced the port mismatch through the entire routing chain.
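The hop-by-hop trace can be written down as a small consistency check. This sketch is a simplified model of the diagnosis quoted above, not of how the AWS Load Balancer Controller actually computes ports. The `route` struct and `check` function are hypothetical, and the `healthCheckPort` value of 80 is taken from that diagnosis rather than from the manifests.

```go
package main

import "fmt"

// route is a flattened view of the three manifests.
type route struct {
	ingressBackendPort int // Ingress backend.service.port.number
	servicePort        int // Service spec.ports[].port
	serviceTargetPort  int // Service spec.ports[].targetPort
	containerPort      int // Deployment containerPort
	healthCheckPort    int // port the load balancer probes on the pod IP
}

// check walks the chain hop by hop and returns every mismatch it finds.
func check(r route) []string {
	var problems []string
	if r.ingressBackendPort != r.servicePort {
		problems = append(problems, fmt.Sprintf(
			"ingress targets service port %d, service exposes %d",
			r.ingressBackendPort, r.servicePort))
	}
	if r.serviceTargetPort != r.containerPort {
		problems = append(problems, fmt.Sprintf(
			"service targetPort is %d, container listens on %d",
			r.serviceTargetPort, r.containerPort))
	}
	if r.healthCheckPort != r.containerPort {
		problems = append(problems, fmt.Sprintf(
			"health check probes port %d, container listens on %d",
			r.healthCheckPort, r.containerPort))
	}
	return problems
}

func main() {
	r := route{
		ingressBackendPort: 80,
		servicePort:        80,
		serviceTargetPort:  8080,
		containerPort:      8080,
		healthCheckPort:    80, // per the diagnosis, an assumption of this model
	}
	for _, p := range check(r) {
		fmt.Println(p)
	}
}
```

Every hop inside the cluster lines up. The only mismatch is the health check port, which is why looking at any single file shows nothing wrong.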

### Kiro

Kiro found the port mismatch. It took two prompts. The first prompt identified the Service selector as potentially problematic (it was not). The second prompt, after I told it the selector was fine, found the ALB health check port issue.

### Windsurf

Windsurf found the port mismatch in one pass. Its Cascade agent read all three files together and traced the routing chain correctly. Comparable to Claude Code.

### Cursor

Cursor found the Service selector issue (which was not actually a problem) and stopped there. It did not trace the ALB health check port mismatch. I had to give it more context.

### Codex and Antigravity

Both identified the Service selector as the problem. Neither found the ALB health check port issue. The selector was not actually the problem.
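The reason the selector is a red herring fits in a few lines. A Kubernetes equality-based selector matches a pod when every key and value in the selector appears in the pod's labels. Extra labels on the pod are ignored. A minimal sketch, with `selects` as a hypothetical helper name:

```go
package main

import "fmt"

// selects reports whether every key/value pair in selector is present in labels.
// Extra labels on the pod do not prevent a match.
func selects(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	podLabels := map[string]string{"app": "inventory-api", "version": "v2"}

	// The Service selector from the manifest above.
	serviceSelector := map[string]string{"app": "inventory-api"}
	fmt.Println("service selects pod:", selects(serviceSelector, podLabels))

	// A selector that asks for a label value the pod does not have.
	wrongVersion := map[string]string{"app": "inventory-api", "version": "v1"}
	fmt.Println("v1 selector selects pod:", selects(wrongVersion, podLabels))
}
```

The first check prints `true` and the second prints `false`. The Service in this task is the first case, so its selector was never the problem.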

### Task 4 Summary

Claude Code and Windsurf tied. Both traced the full routing chain and found the actual bug without additional prompting. Kiro found it in two prompts. Cursor, Codex, and Antigravity identified a non-issue and stopped.

The difference here is context window and reasoning quality. Claude Code and Windsurf read all three files together and reasoned about the full routing path. The other tools read the files but did not connect the dots across all three.

## Task 5: Writing an Incident Runbook

The task: generate a structured runbook from a postmortem summary.

Here is the postmortem I gave each tool:

```
Incident: INS-2847
Date: 2026-04-14 02:17 UTC
Duration: 47 minutes
Severity: P1
Service: payment-worker

Summary:
Redis connection pool exhaustion caused payment processing to fail.
The payment-worker service uses Redis for distributed locking during
payment processing. At 02:17 UTC, Redis connection pool hit the
configured maximum of 100 connections. New payment requests could not
acquire locks and failed with a 503 error.

Root cause:
A deployment at 01:55 UTC increased the payment-worker replica count
from 5 to 15 without updating the Redis connection pool size. Each
replica holds up to 10 connections. 15 replicas * 10 connections = 150
connections, exceeding the pool maximum of 100.

Resolution:
1. Scaled payment-worker back to 5 replicas at 02:31 UTC
2. Updated Redis connection pool max to 200 at 02:41 UTC
3. Scaled payment-worker back to 15 replicas at 02:44 UTC
4. Confirmed payment processing resumed at 02:44 UTC

Action items:
- Add pre-deployment check for Redis connection pool capacity
- Add CloudWatch alarm for Redis connection count > 80% of max
- Update deployment runbook to include Redis capacity check
```
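The root cause is plain arithmetic, which makes the first action item easy to sketch. The numbers below come straight from the postmortem. The `capacity` type and its methods are hypothetical, and a real pre-deployment check would read these values from the cluster and from Redis instead of hardcoding them.

```go
package main

import "fmt"

// capacity describes one deployment's worst-case Redis connection demand.
type capacity struct {
	replicas        int
	connsPerReplica int
	maxConnections  int
}

// demand is the worst case: every replica holding its full share.
func (c capacity) demand() int { return c.replicas * c.connsPerReplica }

// fits reports whether worst-case demand stays within the configured maximum.
func (c capacity) fits() bool { return c.demand() <= c.maxConnections }

// alarming reports whether demand exceeds 80% of the maximum, the alarm
// threshold proposed in the action items.
func (c capacity) alarming() bool { return c.demand()*100 > c.maxConnections*80 }

func main() {
	// State after the 01:55 UTC deployment.
	before := capacity{replicas: 15, connsPerReplica: 10, maxConnections: 100}
	// State after the maximum was raised at 02:41 UTC.
	after := capacity{replicas: 15, connsPerReplica: 10, maxConnections: 200}

	fmt.Println("before:", before.demand(), "fits:", before.fits())
	fmt.Println("after:", after.demand(), "fits:", after.fits(), "alarming:", after.alarming())
}
```

With a maximum of 100, demand of 150 does not fit. With a maximum of 200, it fits and stays under the 80% alarm line of 160 connections.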

### Kiro

I had a steering file for runbook format. The file lives at `.kiro/steering/runbook-template.md` with `inclusion: manual` so it only loads when I explicitly reference it. Here is the full file:

```markdown
---
inclusion: manual
---

# Runbook Template

All runbooks follow this structure:

## [Service Name]: [Issue Type]

### Detection

How to detect this issue. Include specific metrics, log queries,
and alert names.

### Triage

Step-by-step triage checklist. Each step should be a single action
with an expected outcome.

### Resolution

Numbered resolution steps. Each step includes:

- The exact command to run
- The expected output
- What to do if the output is unexpected

### Rollback

How to undo the resolution if it makes things worse.

### Related Runbooks

Links to related runbooks.

### Post-Incident

What to do after the incident is resolved.
```

Here is the structure it enforces, as a diagram:

```mermaid
flowchart TD
    A[Runbook] --> B["Detection\nMetrics, log queries, alert names"]
    A --> C["Triage\nStep-by-step checklist with expected outcomes"]
    A --> D["Resolution\nExact commands + expected output + fallback"]
    A --> E["Rollback\nHow to undo if resolution makes things worse"]
    A --> F["Related Runbooks\nLinks to connected runbooks"]
    A --> G["Post-Incident\nTickets, alarms, schedule postmortem"]
```

I asked Kiro to generate the runbook using this template. Here is the full output it produced.

#### Payment Worker: Redis Connection Pool Exhaustion

##### Detection

This issue is detected by one or more of the following.

Alert: `payment-worker-redis-connections-high` fires when Redis connection count exceeds 80% of the configured maximum.

Log pattern: search CloudWatch Logs Insights for:

```sql
fields @timestamp, @message
| filter @logStream like /payment-worker/
| filter @message like /connection pool exhausted/
| sort @timestamp desc
| limit 50
```

Metric: `redis.clients.connected` in Datadog. Navigate to Metrics, search `redis.clients.connected`, filter by `service:payment-worker`.

##### Triage

1. Check current Redis connection count:

```bash
redis-cli -h $REDIS_HOST -p $REDIS_PORT info clients | grep connected_clients
```

Expected: a number. If it equals or exceeds `maxclients`, the pool is exhausted.

2. Check current payment-worker replica count:

```bash
kubectl get deployment payment-worker -n production -o jsonpath='{.spec.replicas}'
```

3. Calculate expected connection usage. Check `REDIS_MAX_CONNECTIONS` in the ConfigMap:

```bash
kubectl get configmap payment-worker-config -n production -o yaml | grep REDIS_MAX_CONNECTIONS
```

Multiply replicas by `REDIS_MAX_CONNECTIONS`. If the result exceeds `maxclients`, that is the problem.

4. Check Redis `maxclients` setting:

```bash
redis-cli -h $REDIS_HOST -p $REDIS_PORT config get maxclients
```

5. If `replicas * connections_per_replica > maxclients`, proceed to Resolution.

##### Resolution

**Option A: Scale down replicas (immediate, temporary)**

1. Scale payment-worker to a safe replica count:

```bash
kubectl scale deployment payment-worker -n production --replicas=5
```

Expected: `deployment.apps/payment-worker scaled`

2. Verify pods are terminating:

```bash
kubectl get pods -n production -l app=payment-worker -w
```

Wait until only 5 pods are in `Running` state.

3. Verify payment processing has resumed:

```bash
kubectl logs -n production -l app=payment-worker --tail=20 | grep "payment processed"
```

**Option B: Increase Redis maxclients (permanent fix)**

1. Update the Redis `maxclients` setting:

```bash
redis-cli -h $REDIS_HOST -p $REDIS_PORT config set maxclients 200
```

Expected: `OK`

2. Verify the change:

```bash
redis-cli -h $REDIS_HOST -p $REDIS_PORT config get maxclients
```

Expected output: `maxclients` on one line, `200` on the next.

3. Scale payment-worker back to the desired replica count:

```bash
kubectl scale deployment payment-worker -n production --replicas=15
```

4. Monitor Redis connection count for 5 minutes:

```bash
watch -n 5 'redis-cli -h $REDIS_HOST -p $REDIS_PORT info clients | grep connected_clients'
```

##### Rollback

If Option B makes things worse:

1. Revert maxclients to the previous value:

```bash
redis-cli -h $REDIS_HOST -p $REDIS_PORT config set maxclients 100
```

2. Scale payment-worker back to 5 replicas:

```bash
kubectl scale deployment payment-worker -n production --replicas=5
```

3. Escalate to the Redis infrastructure team.

##### Related Runbooks

- Redis Memory Pressure
- Payment Worker High Error Rate
- Kubernetes Deployment Rollback

##### Post-Incident

1. File a ticket to add pre-deployment Redis capacity check
2. Add CloudWatch alarm for Redis connection count above 80% of max
3. Update the deployment runbook to include Redis capacity check
4. Schedule a postmortem review within 48 hours

This is production-ready. I could hand this to an on-call engineer at 2am and they could follow it without asking me anything.

### Claude Code

Claude Code generated an excellent runbook. The commands were correct. The structure was logical. But it invented its own format. The section names were different. The order was different. The log query syntax was different from what my team uses.

I asked it to reformat to match my template. It did so correctly. Two turns instead of one.

The content quality was identical to Kiro's output. The difference is that Kiro followed my template automatically because of the steering file.

### Cursor

Cursor generated a basic runbook. It had the right sections but the commands were incomplete. The `kubectl` commands were missing the namespace flag. The Redis commands were missing the host and port flags. The log query was a generic CloudWatch Logs query, not the specific query format my team uses.

I spent about 10 minutes editing it.

### Windsurf, Codex, and Antigravity

Windsurf generated a runbook that was better than Cursor's but still needed editing. The commands were mostly correct but the structure did not match my template.

Codex generated a runbook that was mostly Python-flavored. It suggested using `boto3` to query CloudWatch Logs instead of the CloudWatch Logs Insights query language. That is not how my team works.

Antigravity generated a reasonable runbook but hit quota limits before completing the post-incident section.

### Task 5 Summary

Kiro wins again because of steering files. When you have a template, Kiro follows it. Claude Code generates excellent content but needs a correction turn to match your format. Cursor, Windsurf, Codex, and Antigravity all require significant editing.

## What I Actually Use and Why

I am going to be direct.

**Kiro is the right tool for production SRE work on a team.**

SRE work is not solo work. You are writing Terraform that three other engineers will review. You are writing runbooks that an on-call engineer will read at 2am. You are generating monitors that need to match the conventions your team agreed on six months ago.

Kiro is the only tool that enforces those conventions automatically. Steering files mean you write the rule once and every future generation follows it. The spec workflow means every change has a paper trail. When someone asks why a module was written a certain way, you have an answer.

The spec workflow feels slow the first week. After that, you stop noticing it. What you do notice is that you stop having conversations about why the code looks different from everything else.

**Claude Code is the right tool for complex autonomous tasks.**

When I need to refactor a massive codebase, debug a subtle issue across multiple files, or write a complex automation script, Claude Code on Opus 4.7 is the most capable tool available. The 1M token context window is not a marketing number. It genuinely changes what is possible. It can read your entire infrastructure repo and write code that looks like it belongs there.

The terminal-only interface is a real limitation. I use it alongside Cursor for the IDE experience.

**Cursor is the right tool for daily inline editing.**

Cursor is the most polished IDE experience. The autocomplete is fast and accurate. The chat is responsive. It is the right tool if you want AI assistance without changing how you work. I use it for quick fixes, small changes, and anything where I want to stay in flow.

**Windsurf is the right tool if you want Cursor quality at a lower price.**

Windsurf 2.0 with Devin integration is genuinely impressive. The SWE-1.6 model is strong. The pricing is more predictable than Antigravity's. If your team is budget-conscious and does not need Kiro's spec workflow, Windsurf is a solid choice.

**Codex is not the right tool for infrastructure work.**

Codex is excellent for Python automation and data pipelines. It is not optimized for Terraform, Go, or YAML-heavy infrastructure repos. The token-based pricing since April 2026 makes costs unpredictable. Use it for what it is good at.

**Antigravity is not ready for production infrastructure work.**

The Gemini 3.1 Pro model is capable. Antigravity 2.0 launched at Google I/O 2026 with real improvements. But the quota interruptions, the March 2026 pricing chaos, and the stale infrastructure training data make it unreliable for production SRE work right now. Check back in six months.

## My Personal Stack

I use three tools, not one.

**[Kiro](https://kiro.dev?utm_source=ranti.dev)** for new features and anything that needs to follow team conventions.

**Claude Code** for large refactors, debugging complex issues, and anything that requires reading the whole codebase.

**Cursor** for daily editing, autocomplete, quick fixes, and small changes.

This is not a failure of any single tool. It is the reality of 2026. The tools are specialized. The engineers who pick one and stick with it are leaving performance on the table.

## Quick Reference

```mermaid
flowchart LR
    A{What are you doing?} --> B["New Terraform module\non a team project"]
    A --> C["New Terraform module\nsolo"]
    A --> D[Multi-file Go refactor]
    A --> E["Kubernetes YAML\ndebugging"]
    A --> F["Incident runbook\ngeneration"]
    A --> G["Datadog monitor\nwith team conventions"]
    A --> H[Quick inline fix]
    A --> I["Python automation\nscript"]
    A --> J["Large codebase\nexploration"]
    A --> K["Budget-conscious\nteam"]

    B --> L[Kiro]
    C --> M[Claude Code]
    D --> M
    E --> N[Claude Code or Windsurf]
    F --> L
    G --> L
    H --> O[Cursor]
    I --> P[Codex or Claude Code]
    J --> M
    K --> Q[Windsurf]
```

## One Last Thing

The question is not which AI IDE is best.

The question is which AI IDE is best for this specific task.

Kiro wins for structured, team-based, convention-heavy work. Claude Code wins for raw capability. Cursor wins for daily ergonomics.

Pick based on your actual workflow. Not based on benchmarks. Not based on what is trending on social media this week.

The tools that will make you faster are the ones that fit how you already work and then push you slightly beyond it.

