Preventing Autoscaler Flapping: Kubernetes HPA Tolerance in Depth
- Aug 26
- 3 min read

Background: Why Tolerance Matters in HPA
The Horizontal Pod Autoscaler (HPA) dynamically adjusts pod replicas based on observed metrics such as CPU or memory utilization. To avoid erratic scaling due to minor metric fluctuations (known as "flapping"), Kubernetes applies a tolerance threshold: if the ratio of the current metric value to the target is within ±10%, no scaling action is triggered.
This built-in tolerance ensures stability, but it’s fixed cluster-wide and can't be customized per workload in versions prior to v1.33.
However, a fixed 10% tolerance can be problematic:
In large deployments, a 10% swing may represent tens or hundreds of pods (for a 1,000-replica deployment, that is up to 100 replicas of headroom before any scaling happens).
Latency-sensitive workloads may need much more responsive scaling.
Other workloads may need more conservative scaling to avoid resource churn.
The Alpha Feature: Configurable Tolerance in Kubernetes v1.33
In Kubernetes v1.33, a new alpha feature — HPAConfigurableTolerance — was introduced. It allows per-HPA configuration of tolerance, separately for scale-up and scale-down scenarios.
Feature Highlights:
Alpha in v1.33: disabled by default; must be enabled via the HPAConfigurableTolerance feature gate.
Applies under the autoscaling/v2 API version, specifically under the spec.behavior section:
spec.behavior.scaleUp.tolerance
spec.behavior.scaleDown.tolerance
Tolerance is a fractional value (for example, 0.05 means 5%), representing the allowed relative deviation of the current metric from its target.
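For example, a per-HPA tolerance configuration could look roughly like this (a minimal sketch; the workload name, metric target, and tolerance values are illustrative, not recommendations):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api              # hypothetical Deployment name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleUp:
      tolerance: 0.01        # react to upward deviations larger than 1%
    scaleDown:
      tolerance: 0.1         # ignore downward deviations of up to 10%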

With a configuration like this, scaling is responsive to traffic spikes but conservative during drops.
Enabling the Feature
To use configurable tolerances in v1.33:
Enable the feature gate on both kube-apiserver and kube-controller-manager:
--feature-gates=HPAConfigurableTolerance=true
Use the autoscaling/v2 API and set the tolerance fields under behavior.scaleUp and/or behavior.scaleDown.
If not provided, the HPA falls back to the existing 10% cluster-wide tolerance, which can only be overridden globally through the --horizontal-pod-autoscaler-tolerance flag on the controller manager, not via the API.
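As a sketch for local experimentation (assuming a kind test cluster; the config file name and node layout are illustrative), the gate can be enabled on all control-plane components at cluster creation time:

# kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  HPAConfigurableTolerance: true
nodes:
  - role: control-plane
  - role: worker

Create the cluster with kind create cluster --config kind-config.yaml. On existing clusters, add the --feature-gates flag shown above to the kube-apiserver and kube-controller-manager manifests instead.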
Under the Hood: How Tolerance Affects Scaling Logic
The HPA controller computes the desired replica count based on the ratio of current metric to target metric, then applies tolerance checks:
if math.Abs(1.0 - currentUtilization/targetUtilization) <= tolerance {
    // Within tolerance; keep the current replica count and skip scaling.
    return currentReplicas
}
When the deviation exceeds the tolerance, the scaling decision proceeds. For example, with the default 0.1 tolerance and a target CPU utilization of 80%, an observed utilization of 85% gives a ratio of about 1.06 (within tolerance, so no scaling), while 95% gives about 1.19 and triggers a scale-up.
Implications of tuning tolerance:
Lower scaleUp tolerance → more responsive scaling at the cost of potential overshoot.
Higher scaleDown tolerance → reduces unnecessary scale-down events during temporary dips.
Best Practices & Real-World Tips
Scenario | Recommended tolerance settings
Latency-sensitive APIs | scaleUp.tolerance: 0 – 0.01 (respond fast)
Cost-sensitive batch jobs | scaleDown.tolerance: 0.1 – 0.2 (avoid flapping)
Balanced workloads | Fine-tune both directions for steady scaling behavior
Combine with:
stabilizationWindowSeconds to smooth transitions and avoid back-and-forth scaling.
policies under behavior to control the scaling rate (e.g., Pods or Percent policies), as in the sketch below.
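As a rough sketch (values are illustrative, not tuned recommendations), a behavior block combining tolerance with stabilization windows and rate policies might look like:

behavior:
  scaleUp:
    tolerance: 0.01
    stabilizationWindowSeconds: 0
    policies:
      - type: Percent
        value: 100               # at most double the replicas per period
        periodSeconds: 60
  scaleDown:
    tolerance: 0.1
    stabilizationWindowSeconds: 300
    policies:
      - type: Pods
        value: 2                 # remove at most 2 pods per period
        periodSeconds: 60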
Stability note: this is an alpha feature, so use it cautiously in production, monitor closely, and test thoroughly.
Roadmap: What’s Next?
According to the KEP-4951 tracking issue on GitHub:
Alpha in v1.33, with graduation to beta targeted for v1.34 and stable in a later release.
Future improvements may include:
Per-metric tolerance (e.g., CPU vs memory),
Dynamic/automated tolerance adjustments based on historical patterns,
Deeper integration with vertical autoscaling (VPA), predictive scaling, etc.
Summary
In Kubernetes 1.33 (alpha), the HPAConfigurableTolerance feature enables per-HPA, per-direction tolerance tuning—empowering you to tailor autoscaling sensitivity precisely for each workload. This improvement bridges a long-standing gap by replacing the “one-size-fits-all” 10% tolerance with fine-grained control.
By enabling this feature and tuning tolerance values along with stabilization and scaling policies, you can achieve more responsive, stable, and cost-efficient autoscaling behavior. Keep an eye on its progress toward beta in v1.34 and full stabilization beyond.
Ready to take your Kubernetes autoscaling to the next level?
At Ananta Cloud, we help platform teams simplify and optimize infrastructure operations—so you can scale intelligently, reduce costs, and keep your applications performing at their best.
🔹 Need help implementing advanced HPA strategies like configurable tolerances?
🔹 Looking to fine-tune autoscaling across environments with real-time observability and control?
Let’s chat → Book a free consultation
Smarter scaling starts with smarter tooling. Get in touch with Ananta Cloud today.