HPA vs Rate-limit

Lorenzo Girardi — Tue, 14 Feb 2023 00:00:00 +0000

INTRO

Strange: we use HPA to increase availability, and then introduce rate limiting to reduce it?

Let's set the context first.

This analysis is based on specific assumptions:

Cloud environment
Dynamic infrastructure
Minimum resources available

HPA

In Kubernetes, a HorizontalPodAutoscaler (HPA) automatically updates a workload resource, such as a Deployment or StatefulSet, scaling the number of pods up or down to match demand.
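To make "match demand" concrete, here is a minimal Python sketch of the scaling rule described in the Kubernetes documentation: the desired replica count is the current count multiplied by the ratio of the observed metric to the target metric, rounded up. The function name and parameter names are hypothetical, and the sketch ignores details of the real controller such as min/max replica bounds, pod readiness, and stabilization windows.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,  # assumed to mirror the documented default tolerance
) -> int:
    """Simplified form of the documented HPA rule (illustrative only)."""
    ratio = current_metric / target_metric
    # When the ratio is close enough to 1.0, the replica count is left alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Otherwise scale proportionally and round up to a whole pod.
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target
    print(desired_replicas(4, 90, 60))
```

With 4 pods running at 90% against a 60% target, the ratio is 1.5, so the sketch asks for 6 pods; at 63% the ratio falls within the tolerance and nothing changes.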

Patterns

Type	Behaviour
Slow and temporary	Daily fluctuations, peaking during the day and troughing at night
Rapid and temporary	Short bursts from poorly behaved downstream services
Slow and persistent	Request volume slowly increases as the product sees adoption
Rapid and persistent	Abrupt shift from low to high volumes — e.g. called by batch jobs

Ideal Practice

Type	Ideal Practice
Slow and temporary	HPA should add and remove pods as necessary
Rapid and temporary	HPA should NOT modify pod count — leave headroom for brief spikes
Slow and persistent	HPA should add and remove pods as necessary
Rapid and persistent	Leave headroom; HPA adds pods quickly to restore target utilization
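The "leave headroom" advice in the table can be read as simple arithmetic. The sketch below assumes a deliberately simplified model that is not from the original post: each pod has a fixed capacity in requests per second, and load spreads evenly across pods. Under that assumption, a lower target utilization means more pods at steady state, but a larger burst can be absorbed before any pod saturates. All names are hypothetical.

```python
import math


def pods_for_load(load_rps: float, pod_capacity_rps: float, target_utilization: float) -> int:
    """Pods needed so that each pod runs at or below the target utilization."""
    usable_capacity_per_pod = pod_capacity_rps * target_utilization
    return math.ceil(load_rps / usable_capacity_per_pod)


def absorbable_burst_factor(target_utilization: float) -> float:
    """How far load can multiply before pods hit 100%, with no scaling at all."""
    return 1.0 / target_utilization


if __name__ == "__main__":
    # Assumed numbers: 1000 req/s of load, 100 req/s per pod, 50% target utilization
    print(pods_for_load(1000, 100, 0.5))
    print(absorbable_burst_factor(0.5))
```

In this model, a 50% target doubles the pod count compared to running pods flat out, and in exchange a rapid, temporary spike of up to twice the normal load fits in the existing pods without the HPA having to react.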

Rate Limit

A rate limit is the maximum number of API calls an app or user can make within a given time period. If that limit is exceeded, or if CPU or time limits are exceeded, the app may be throttled, and throttled requests fail.
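As an illustration of "a number of calls within a given time period", here is a minimal fixed-window limiter in Python. It is one of several possible strategies (token bucket and sliding window are common alternatives) and is not taken from any specific gateway or library. The class name, the per-caller key, and the injectable clock are assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowRateLimiter:
    """Allow at most `limit` calls per caller in each fixed time window."""

    def __init__(
        self,
        limit: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,  # injectable for testing
    ) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        # caller -> (index of the window last seen, calls counted in that window)
        self._windows: Dict[str, Tuple[int, int]] = {}

    def allow(self, caller: str) -> bool:
        """Return True if the call is accepted, False if it is throttled."""
        window_index = int(self.clock() // self.window_seconds)
        index, count = self._windows.get(caller, (window_index, 0))
        if index != window_index:
            # A new window has started, so the counter resets.
            index, count = window_index, 0
        if count >= self.limit:
            self._windows[caller] = (index, count)
            return False
        self._windows[caller] = (index, count + 1)
        return True


if __name__ == "__main__":
    limiter = FixedWindowRateLimiter(limit=2, window_seconds=60)
    print([limiter.allow("client-a") for _ in range(3)])
```

A caller that exceeds the limit is rejected until the next window begins, which is the throttling behaviour described above. Each caller is counted independently, so one noisy client does not consume another client's quota.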
