
Preventing Autoscaler Flapping: Kubernetes HPA Tolerance in Depth

  • Aug 26
  • 3 min read

Background: Why Tolerance Matters in HPA

The Horizontal Pod Autoscaler (HPA) dynamically adjusts pod replicas based on observed metrics such as CPU or memory utilization. To avoid erratic scaling due to minor metric fluctuations (known as “flapping”), Kubernetes applies a tolerance threshold: changes within ±10% of the target metric do not trigger scaling actions.


This built-in tolerance ensures stability, but it’s fixed cluster-wide and can't be customized per workload in versions prior to v1.33.


However, a fixed 10% tolerance can be problematic:

  • In large deployments, a 10% swing may represent tens or hundreds of pods.

  • Latency-sensitive workloads may need much more responsive scaling.

  • Other workloads may need more conservative scaling to avoid resource churn.

The Alpha Feature: Configurable Tolerance in Kubernetes v1.33

In Kubernetes v1.33, a new alpha feature — HPAConfigurableTolerance — was introduced. It allows per-HPA configuration of tolerance, separately for scale-up and scale-down scenarios.

Feature Highlights:

  • Alpha-only status in v1.33: Disabled by default, must be enabled via feature gate.

  • Applies under the autoscaling/v2 API version, specifically under the spec.behavior section:

    • spec.behavior.scaleUp.tolerance

    • spec.behavior.scaleDown.tolerance

  • Tolerance is a fractional value (0.0 to 1.0) representing the allowed deviation from the target metric (for example, 0.05 means 5%).


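For example, a minimal manifest sketch (assuming a Deployment named web-app and the feature gate enabled; the values are illustrative, not recommendations) might look like this:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
  behavior:
    scaleUp:
      tolerance: 0.01   # react once utilization rises ~1% above target
    scaleDown:
      tolerance: 0.1    # ignore dips of up to 10% below target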

With this setup, scaling is responsive on traffic spikes but conservative during drops.



Enabling the Feature

To use configurable tolerances in v1.33:

  1. Enable the feature gate on both kube-apiserver and kube-controller-manager (a quick local-testing sketch follows this list):

--feature-gates=HPAConfigurableTolerance=true
  2. Use the autoscaling/v2 API and include the tolerance fields under behavior.scaleUp and/or behavior.scaleDown.
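For a quick local test, one option (a sketch, assuming a kind release that ships Kubernetes v1.33) is a kind cluster config, since kind's featureGates setting passes the gate to every control-plane component:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  HPAConfigurableTolerance: true   # enables the gate on kube-apiserver and kube-controller-manager
nodes:
  - role: control-plane
  - role: worker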


If not provided, HPA falls back to the existing 10% cluster-wide tolerance, which can only be overridden globally through the --horizontal-pod-autoscaler-tolerance flag on the controller manager—not via the API.

Under the Hood: How Tolerance Affects Scaling Logic

The HPA controller computes the desired replica count based on the ratio of current metric to target metric, then applies tolerance checks:

if math.Abs(1.0 - currentUtilization/targetUtilization) <= tolerance {
   // Within tolerance; skip scaling.
   return currentReplicas
}

When the difference exceeds the tolerance, the scaling decision proceeds. For example, with a target of 60% CPU utilization and the default 0.1 tolerance, measured utilization anywhere between 54% and 66% leaves the replica count unchanged.

Implications of tuning tolerance:

  • Lower scaleUp tolerance → more responsive scaling at the cost of potential overshoot.

  • Higher scaleDown tolerance → reduces unnecessary scale-down events during temporary dips.


Best Practices & Real‑World Tips

Recommended tolerance settings by scenario:

  • Latency-sensitive APIs: scaleUp.tolerance of 0 – 0.01 (respond fast)

  • Cost-sensitive batch jobs: scaleDown.tolerance of 0.1 – 0.2 (avoid flapping)

  • Balanced workloads: fine-tune both directions for steady scaling behavior

Combine tolerance with the following controls (a combined sketch follows this list):

  • stabilizationWindowSeconds to smooth transitions and avoid back-and-forth scaling.

  • policies under behavior to control scaling rate (e.g., Pods or Percent policies).
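For instance, a behavior block sketch (illustrative values, not recommendations) that pairs the alpha tolerance fields with the existing stabilization and rate controls:

behavior:
  scaleUp:
    tolerance: 0.01                  # alpha field: react to small increases
    stabilizationWindowSeconds: 0    # scale up immediately
    policies:
      - type: Percent
        value: 100                   # at most double the replicas per minute
        periodSeconds: 60
  scaleDown:
    tolerance: 0.1                   # alpha field: ignore small dips
    stabilizationWindowSeconds: 300  # wait 5 minutes before scaling down
    policies:
      - type: Pods
        value: 2                     # remove at most 2 pods every 2 minutes
        periodSeconds: 120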


Note: This is an alpha feature. Use it cautiously in production, monitor closely, and test thoroughly.

Roadmap: What’s Next?

According to the KEP-4951 enhancement tracking on GitHub:

  • Alpha in v1.33; graduation to beta is targeted for v1.34, with further promotion beyond that.

  • Future improvements may include:

    • Per-metric tolerance (e.g., CPU vs memory),

    • Dynamic/automated tolerance adjustments based on historical patterns,

    • Deeper integration with vertical autoscaling (VPA), predictive scaling, etc.

Summary

In Kubernetes 1.33 (alpha), the HPAConfigurableTolerance feature enables per-HPA, per-direction tolerance tuning—empowering you to tailor autoscaling sensitivity precisely for each workload. This improvement bridges a long-standing gap by replacing the “one-size-fits-all” 10% tolerance with fine-grained control.


By enabling this feature and tuning tolerance values along with stabilization and scaling policies, you can achieve more responsive, stable, and cost-efficient autoscaling behavior. Keep an eye on its progress toward beta in v1.34 and full stabilization beyond.


If you'd like YAML templates, testing strategies, or performance-tuning examples related to this feature, reach out; we're happy to help.


Ready to take your Kubernetes autoscaling to the next level?

At Ananta Cloud, we help platform teams simplify and optimize infrastructure operations—so you can scale intelligently, reduce costs, and keep your applications performing at their best.


🔹 Need help implementing advanced HPA strategies like configurable tolerances?

🔹 Looking to fine-tune autoscaling across environments with real-time observability and control?


Let’s chat: book a free consultation.


Smarter scaling starts with smarter tooling. Get in touch with Ananta Cloud today.


