
Smarter Cost Optimization with Karpenter: A Practical Migration Guide


By Ananta Cloud Engineering Team | September 17, 2025



While most discussions of Kubernetes cost optimization revolve around dashboards and metrics, this post focuses on actions you can actually take. At Ananta Cloud, we've supported several migrations from Cluster Autoscaler (CA) to Karpenter, and we've consistently observed faster scaling, better bin-packing, and effective use of Spot instances when it is configured correctly.


This is the guide we wish we had when we started.


Why Consider Karpenter?

Cluster Autoscaler (CA) operates by scaling pre-defined node groups. It's reliable and predictable—until your traffic spikes or your workloads demand diverse instance types.

Karpenter, on the other hand, dynamically provisions compute resources based on real-time pod needs. This makes it possible to:

  • Reduce scale-up times significantly

  • Consolidate underused resources

  • Leverage Spot capacity with more flexibility


For teams managing spiky or unpredictable workloads, Karpenter has helped trim compute waste by 15–30% and reduce provisioning latency from minutes to seconds.


Conceptual Differences: CA vs Karpenter

Cluster Autoscaler (CA)

  • Checks for pending pods and picks a matching node group

  • Limited to pre-set instance types, sizes, and availability zones

  • Performs well with stable traffic and fine-tuned capacity models


Karpenter

  • Understands exact pod needs (CPU, memory, GPU, taints, etc.)

  • Selects optimal EC2 instances across families and AZs in real time

  • Offers automated consolidation and Spot integration

  • Honors Pod Disruption Budgets (PDBs) during scale-downs


When Karpenter Might Not Be Right

Not every team or environment is ready for Karpenter. Stick with CA if:

  • You're in regulated environments requiring fixed ASGs and approved instance types

  • Your workloads are entirely static and predictable

  • Your team operates under strict change control with minimal tolerance for node churn


In such cases, focus on maximizing CA first—right-sizing requests, splitting node groups by workload type, and introducing Spot wherever safe.


Pre-Migration Checklist

Before enabling Karpenter in production, ensure these foundational pieces are in place:

  • IAM roles and IRSA for EC2/SSM/ASG access

  • Subnet and security group isolation per environment

  • Consolidation disabled during the first week

  • PDBs on critical workloads (a minimal example follows this checklist)

  • Spot interruption handling and lifecycle hooks

  • IP and ENI limits aligned with pod density

  • Account for DaemonSet overhead (CNI, CSI, logging)
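
For reference, here is a minimal PodDisruptionBudget sketch for a critical service; the name, label, and minAvailable value are illustrative and should match your own workload:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: checkout-pdb               # illustrative name
spec:
  minAvailable: 2                  # keep at least two replicas running during voluntary disruptions
  selector:
    matchLabels:
      app: checkout                # illustrative label; point this at your critical workload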


A Phased 2-Week Rollout Plan

Week 0 — Capture Baseline Metrics

  • Gather p95/p99 latency, error rates, pending pods, time to readiness, and daily cost by namespace using OpenCost, Kubecost, or CloudWatch.


Week 1 — Introduce Karpenter (Consolidation Off)

  • Deploy Karpenter in your EKS cluster

  • Create a NodePool + EC2NodeClass (or Provisioner + AWSNodeTemplate for older Karpenter versions)

  • Route one non-critical workload to the new capacity using nodeSelector

  • Observe instance selection, bin-packing behavior, and pod readiness


Week 2 — Expand Usage

  • Migrate more stateless workloads

  • Use Spot instances for batch/queue jobs with retry logic

  • Enable consolidation in a controlled maintenance window

  • Keep one CA node group active for fallback


Sample Karpenter Manifests

Newer Versions (NodePool + EC2NodeClass)

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: general
spec:
  template:
    metadata:
      labels:
        workload: general          # matches the nodeSelector used to pin workloads below
    spec:
      nodeClassRef:
        name: general-ec2
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["m6i", "c7i", "r7i"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
  disruption:
    # Start conservatively: only reclaim empty nodes during the first week;
    # switch to WhenUnderutilized when you're ready to enable full consolidation.
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s
    expireAfter: 720h
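
The NodePool above references an EC2NodeClass named general-ec2. A minimal sketch of that class might look like the following; the role name and the karpenter.sh/discovery tag values are placeholders that must match your cluster's IAM role and resource tags:

apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: general-ec2
spec:
  amiFamily: AL2
  role: "KarpenterNodeRole-my-cluster"        # placeholder: node IAM role for your cluster
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "my-cluster"  # placeholder: discovery tag on your subnets
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "my-cluster"  # placeholder: discovery tag on your security groups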

Older Versions (Provisioner + AWSNodeTemplate)

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: general
spec:
  labels:
    workload: general              # matches the nodeSelector used to pin workloads below
  requirements:
    - key: karpenter.k8s.aws/instance-family
      operator: In
      values: ["m6i", "c7i", "r7i"]
    - key: kubernetes.io/arch
      operator: In
      values: ["amd64", "arm64"]
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot", "on-demand"]
  consolidation:
    enabled: false
  ttlSecondsUntilExpired: 2592000  # 30 days, equivalent to 720h
  providerRef:
    name: general

Pin a Workload to Karpenter

Example deployment manifest:

spec:
  template:
    spec:
      nodeSelector:
        workload: general

Use this to route low-risk services to Karpenter-managed nodes for early testing; the workload: general label comes from the NodePool and Provisioner definitions above.


What We Noticed After Migration

Positive Outcomes

  • Scale-up latency dropped to 30–60s (down from 2–4 minutes)

  • More efficient instance usage and bin-packing

  • Spot usage became reliable thanks to broader instance selection


Common Pitfalls

  • Consolidation paused due to PDB constraints—schedule it during off-peak hours

  • Sticky sessions interfered with routing—stateless or header-based stickiness is better

  • IP exhaustion occurred with over-dense nodes—match ENI limits carefully

  • DaemonSets used more CPU/memory than expected—include their overhead in requests
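
If you run into IP or ENI exhaustion, one option on the v1beta1 NodePool API shown above is to cap pod density through the kubelet configuration (in newer v1 releases this setting lives on the EC2NodeClass). The value below is illustrative; derive yours from the ENI and IP limits of the instance types you allow:

# Fragment of the NodePool template shown earlier (v1beta1 API)
spec:
  template:
    spec:
      kubelet:
        maxPods: 58                # illustrative cap; derive from your instance types' ENI/IP limits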


Proving Cost Savings

To validate ROI:

  • Tag Karpenter nodes and track $/day per tag

  • Monitor pod pending time, readiness, and utilization hourly

  • Compare metrics for a controlled namespace over 7 days
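
To make the tagging concrete, here is the stanza you would add to the EC2NodeClass shown earlier; the tag keys and values are illustrative and should follow your organization's cost-allocation conventions:

# Addition to the EC2NodeClass shown earlier
spec:
  tags:
    team: platform                 # illustrative cost-allocation tags propagated to launched EC2 instances
    provisioned-by: karpenter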


If results aren't favorable, pause, evaluate, and adjust.


A Spot Strategy That Works

  • Start with batch/stateless workloads (a sample Job follows this list)

  • Ensure retry logic, idempotency, and checkpointing

  • Spread across instance types and AZs

  • Treat Spot as a cost optimization tool, not a reliability guarantee
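
Here is a minimal sketch of a Spot-friendly batch Job along these lines; the name, image, and resource values are illustrative, and the nodeSelector assumes the workload: general label and Spot capacity defined in the NodePool above:

apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-report                       # illustrative name
spec:
  backoffLimit: 4                            # retry if a Spot interruption kills a pod
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        workload: general
        karpenter.sh/capacity-type: spot     # explicitly request Spot capacity
      containers:
        - name: report
          image: public.ecr.aws/docker/library/busybox:1.36   # placeholder image
          command: ["sh", "-c", "echo generating report && sleep 30"]
          resources:
            requests:
              cpu: "500m"
              memory: 512Mi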


Safe First Win: Your First Migration Step

  1. Install Karpenter

  2. Configure a NodePool with consolidation disabled

  3. Migrate one batch job using Spot capacity

  4. Monitor behavior for 72 hours

  5. If stable, move a stateless API

  6. After a week, enable consolidation (see the fragment below) and observe rollout and eviction behavior
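
For step 6 on the v1beta1 API shown earlier, enabling consolidation is a small change to the NodePool's disruption block:

# Updated disruption block for the NodePool, once you're ready for step 6
disruption:
  consolidationPolicy: WhenUnderutilized     # let Karpenter actively repack underutilized nodes
  expireAfter: 720h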


Final Thoughts

Karpenter isn’t a silver bullet. Cluster Autoscaler works well for stable environments with tightly modeled capacity. But if you're running heterogeneous, bursty workloads, or trying to get more from fewer resources, Karpenter can offer real improvements in agility and efficiency.



Looking for Expert Guidance?

At Ananta Cloud, we’ve helped teams optimize EKS environments across industries. If you're evaluating Karpenter or planning a migration, get in touch—we’ll help you identify the right first moves.





