
Smarter Cost Optimization with Karpenter: A Practical Migration Guide


By Ananta Cloud Engineering Team | September 17, 2025



While most discussions of Kubernetes cost optimization revolve around dashboards and metrics, this post focuses on actions you can actually take. At Ananta Cloud, we've supported several migrations from Cluster Autoscaler (CA) to Karpenter, and we've consistently observed faster scaling, better bin-packing, and effective use of Spot instances when it is configured correctly.


This is the guide we wish we had when we started.


Why Consider Karpenter?

Cluster Autoscaler (CA) operates by scaling pre-defined node groups. It's reliable and predictable—until your traffic spikes or your workloads demand diverse instance types.

Karpenter, on the other hand, dynamically provisions compute resources based on real-time pod needs. This makes it possible to:

  • Reduce scale-up times significantly

  • Consolidate underused resources

  • Leverage Spot capacity with more flexibility


For teams managing spiky or unpredictable workloads, Karpenter has helped trim compute waste by 15–30% and reduce provisioning latency from minutes to seconds.


Conceptual Differences: CA vs Karpenter

Cluster Autoscaler (CA)

  • Checks for pending pods and picks a matching node group

  • Limited to pre-set instance types, sizes, and availability zones

  • Performs well with stable traffic and fine-tuned capacity models


Karpenter

  • Understands exact pod needs (CPU, memory, GPU, taints, etc.)

  • Selects optimal EC2 instances across families and AZs in real time

  • Offers automated consolidation and Spot integration

  • Honors Pod Disruption Budgets (PDBs) during scale-downs


When Karpenter Might Not Be Right

Not every team or environment is ready for Karpenter. Stick with CA if:

  • You're in regulated environments requiring fixed ASGs and approved instance types

  • Your workloads are entirely static and predictable

  • Your team operates under strict change control with minimal tolerance for node churn


In such cases, focus on maximizing CA first—right-sizing requests, splitting node groups by workload type, and introducing Spot wherever safe.


Pre-Migration Checklist

Before enabling Karpenter in production, ensure these foundational pieces are in place:

  • IAM roles and IRSA for EC2/SSM/ASG access

  • Subnet and security group isolation per environment

  • Consolidation disabled during the first week

  • PDBs on critical workloads (a minimal example follows this checklist)

  • Spot interruption handling and lifecycle hooks

  • IP and ENI limits aligned with pod density

  • Account for DaemonSet overhead (CNI, CSI, logging)
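
For reference, here is a minimal PodDisruptionBudget sketch for a critical service; the name, label, and minAvailable value are illustrative and should match your own workload:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: checkout-pdb               # illustrative name
spec:
  minAvailable: 2                  # keep at least two replicas running during voluntary disruptions
  selector:
    matchLabels:
      app: checkout                # illustrative label; point this at your critical workload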


A Phased 2-Week Rollout Plan

Week 0 — Capture Baseline Metrics

  • Gather p95/p99 latency, error rates, pending pods, time to readiness, and daily cost by namespace using OpenCost, Kubecost, or CloudWatch.


Week 1 — Introduce Karpenter (Consolidation Off)

  • Deploy Karpenter in your EKS cluster

  • Create a NodePool + EC2NodeClass (or Provisioner + AWSNodeTemplate for older Karpenter versions)

  • Route one non-critical workload to the new capacity using nodeSelector

  • Observe instance selection, bin-packing behavior, and pod readiness


Week 2 — Expand Usage

  • Migrate more stateless workloads

  • Use Spot instances for batch/queue jobs with retry logic

  • Enable consolidation in a controlled maintenance window

  • Keep one CA node group active for fallback


Sample Karpenter Manifests

Newer Versions (NodePool + EC2NodeClass)

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: general
spec:
  template:
    metadata:
      labels:
        workload: general          # matches the nodeSelector used to pin workloads below
    spec:
      nodeClassRef:
        name: general-ec2
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["m6i", "c7i", "r7i"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
  disruption:
    # Start conservatively: only reclaim empty nodes during the first week;
    # switch to WhenUnderutilized when you're ready to enable full consolidation.
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s
    expireAfter: 720h
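
The NodePool above references an EC2NodeClass named general-ec2. A minimal sketch of that class might look like the following; the role name and the karpenter.sh/discovery tag values are placeholders that must match your cluster's IAM role and resource tags:

apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: general-ec2
spec:
  amiFamily: AL2
  role: "KarpenterNodeRole-my-cluster"        # placeholder: node IAM role for your cluster
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "my-cluster"  # placeholder: discovery tag on your subnets
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "my-cluster"  # placeholder: discovery tag on your security groups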

Older Versions (Provisioner + AWSNodeTemplate)

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: general
spec:
  labels:
    workload: general              # matches the nodeSelector used to pin workloads below
  requirements:
    - key: karpenter.k8s.aws/instance-family
      operator: In
      values: ["m6i", "c7i", "r7i"]
    - key: kubernetes.io/arch
      operator: In
      values: ["amd64", "arm64"]
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot", "on-demand"]
  consolidation:
    enabled: false
  ttlSecondsUntilExpired: 2592000  # 30 days, equivalent to 720h
  providerRef:
    name: general

Pin a Workload to Karpenter

Example deployment manifest:

spec:
  template:
    spec:
      nodeSelector:
        workload: general

Use this to route low-risk services to Karpenter-managed nodes for early testing; the workload: general label comes from the NodePool and Provisioner definitions above.


What We Noticed After Migration

Positive Outcomes

  • Scale-up latency dropped to 30–60s (down from 2–4 minutes)

  • More efficient instance usage and bin-packing

  • Spot usage became reliable thanks to broader instance selection


Common Pitfalls

  • Consolidation paused due to PDB constraints—schedule it during off-peak hours

  • Sticky sessions interfered with routing—stateless or header-based stickiness is better

  • IP exhaustion occurred with over-dense nodes—match ENI limits carefully

  • DaemonSets used more CPU/memory than expected—include their overhead in requests
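
If you run into IP or ENI exhaustion, one option on the v1beta1 NodePool API shown above is to cap pod density through the kubelet configuration (in newer v1 releases this setting lives on the EC2NodeClass). The value below is illustrative; derive yours from the ENI and IP limits of the instance types you allow:

# Fragment of the NodePool template shown earlier (v1beta1 API)
spec:
  template:
    spec:
      kubelet:
        maxPods: 58                # illustrative cap; derive from your instance types' ENI/IP limits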


Proving Cost Savings

To validate ROI:

  • Tag Karpenter nodes and track $/day per tag

  • Monitor pod pending time, readiness, and utilization hourly

  • Compare metrics for a controlled namespace over 7 days
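
To make the tagging concrete, here is the stanza you would add to the EC2NodeClass shown earlier; the tag keys and values are illustrative and should follow your organization's cost-allocation conventions:

# Addition to the EC2NodeClass shown earlier
spec:
  tags:
    team: platform                 # illustrative cost-allocation tags propagated to launched EC2 instances
    provisioned-by: karpenter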


If results aren't favorable, pause, evaluate, and adjust.


A Spot Strategy That Works

  • Start with batch/stateless workloads (a sample Job follows this list)

  • Ensure retry logic, idempotency, and checkpointing

  • Spread across instance types and AZs

  • Treat Spot as a cost optimization tool, not a reliability guarantee
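
Here is a minimal sketch of a Spot-friendly batch Job along these lines; the name, image, and resource values are illustrative, and the nodeSelector assumes the workload: general label and Spot capacity defined in the NodePool above:

apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-report                       # illustrative name
spec:
  backoffLimit: 4                            # retry if a Spot interruption kills a pod
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        workload: general
        karpenter.sh/capacity-type: spot     # explicitly request Spot capacity
      containers:
        - name: report
          image: public.ecr.aws/docker/library/busybox:1.36   # placeholder image
          command: ["sh", "-c", "echo generating report && sleep 30"]
          resources:
            requests:
              cpu: "500m"
              memory: 512Mi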


Safe First Win: Your First Migration Step

  1. Install Karpenter

  2. Configure a NodePool with consolidation disabled

  3. Migrate one batch job using Spot capacity

  4. Monitor behavior for 72 hours

  5. If stable, move a stateless API

  6. After a week, enable consolidation (see the fragment below) and observe rollout and eviction behavior
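
For step 6 on the v1beta1 API shown earlier, enabling consolidation is a small change to the NodePool's disruption block:

# Updated disruption block for the NodePool, once you're ready for step 6
disruption:
  consolidationPolicy: WhenUnderutilized     # let Karpenter actively repack underutilized nodes
  expireAfter: 720h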


Final Thoughts

Karpenter isn’t a silver bullet. Cluster Autoscaler works well for stable environments with tightly modeled capacity. But if you're running heterogeneous, bursty workloads, or trying to get more from fewer resources, Karpenter can offer real improvements in agility and efficiency.



Looking for Expert Guidance?

At Ananta Cloud, we’ve helped teams optimize EKS environments across industries. If you're evaluating Karpenter or planning a migration, get in touch—we’ll help you identify the right first moves.





