The auto-scaling part: VPA, HPA, KEDA, Nodes: how do they dance
Watch talk on YouTube
Hypothesis
- In 2024, 27% of cloud spend was wasted
- 100ms delay => decrease in sales
Pod resources
- Requests: Informs scheduler’s decision
- Too low: Pods get scheduled onto already strained nodes
- Too high: Wasted resources
- Limits: Throttles (CPU) or kills (memory) when reached
- QoS: Determines eviction priority during resource pressure
- Guaranteed (requests = limits)
- Burstable (Limits>Requests)
- Best effort (Nothing defined)
- Gotcha: CPU throttling can kick in before scaling triggers fire if requests and limits are very close
| QoS | Guaranteed | Burstable | Best effort |
| --- | --- | --- | --- |
| Requests | 100m, 256Mi | 100m, 256Mi | None |
| Limits | 100m, 256Mi | None or > requests | None |
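A sketch of how the QoS classes above look in a container spec (name and image are placeholders, not from the talk):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-guaranteed     # illustrative name
spec:
  containers:
    - name: app
      image: nginx:1.27     # placeholder image
      resources:
        requests:
          cpu: 100m
          memory: 256Mi
        limits:
          cpu: 100m         # requests = limits => Guaranteed QoS
          memory: 256Mi
# Raising limits above requests would make this Burstable;
# omitting resources entirely would make it Best effort.
```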
Scalers
- VPA: Moar power aka recommend requests
- HPA: Moar moar aka more replicas
- KEDA: Proxy over HPA
VPA
Modes:
- Off: Dry-Run
- Initial: Applies recommendations to new Pods only (useful for finding good starting values)
- Auto/Recreate: Evicts and restarts pods to update resources
Trigger: Usually Memory
Tip: Set maxAllowed so recommendations can't exhaust node capacity
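A minimal VPA manifest sketch covering the points above (the Deployment name and the maxAllowed bounds are assumptions, not from the talk):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: demo-vpa            # illustrative name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app          # hypothetical workload
  updatePolicy:
    updateMode: "Auto"      # or "Off" (dry-run) / "Initial"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        maxAllowed:         # cap recommendations so nodes aren't exhausted
          cpu: "1"
          memory: 2Gi
```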
HPA
- Trigger: Usually cpu (percent of requests)
- Formula: $\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{currentUsage}}{\text{target}} \right\rceil$
- Fun fact: Cannot scale to 0
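Worked example of the formula: 4 replicas running at 90% of their CPU request against a 60% target gives $\lceil 4 \times 90/60 \rceil = 6$ replicas. A minimal autoscaling/v2 manifest sketch (names and numbers are assumptions):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-hpa            # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app          # hypothetical workload
  minReplicas: 1            # plain HPA cannot go to 0
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # percent of the CPU request
```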
KEDA
- Basically automates HPA with flexible metrics (from different sources)
- Can scale Jobs
- Can Scale to 0
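A ScaledObject sketch showing the scale-to-zero point above (the Prometheus address, query, and names are assumptions):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: demo-scaledobject   # illustrative name
spec:
  scaleTargetRef:
    name: demo-app          # hypothetical Deployment
  minReplicaCount: 0        # unlike plain HPA, KEDA can scale to 0
  maxReplicaCount: 10
  triggers:
    - type: prometheus      # one of many external scalers
      metadata:
        serverAddress: http://prometheus.monitoring:9090  # assumed address
        query: sum(rate(http_requests_total[1m]))         # assumed metric
        threshold: "100"
```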
Anti patterns
| Pattern | Bad | Better |
| --- | --- | --- |
| CPU limit = requests | Throttling before scale | Set requests only |
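The anti-pattern as a resources snippet (values are illustrative):

```yaml
# Bad: cpu limit == cpu request throttles the Pod the moment it
# exceeds its request, often before the autoscaler can react.
resources:
  requests:
    cpu: 100m
  limits:
    cpu: 100m   # throttled before scaling kicks in
---
# Better: set the CPU request only, so bursts can use idle node CPU.
resources:
  requests:
    cpu: 100m
```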
Demo
Auto scaling meme generator (see slides/video)