The auto-scaling part: VPA, HPA, KEDA, Nodes, how do they dance

Watch talk on YouTube

Hypothesis

  • In 2024, 27% of cloud spend was wasted
  • 100 ms of added latency => measurable decrease in sales

Pod resources

  • Requests: inform the scheduler's placement decision
    • Too low: pods get scheduled onto strained nodes
    • Too high: wasted resources
  • Limits: throttles (CPU) or kills (memory, OOMKill) when reached
  • QoS: sorts eviction priority during resource pressure
    • Guaranteed (requests = limits)
    • Burstable (requests set, limits > requests or unset)
    • Best effort (nothing defined)
  • Gotcha: CPU throttling can kick in before scaling triggers fire if requests and limits are very close
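The QoS assignment above can be sketched as a small function. This is a simplification (the real kubelet rule works per container and defaults requests to limits); the function name is mine:

```python
def qos_class(requests: dict, limits: dict) -> str:
    """Simplified Kubernetes QoS classification for a single container."""
    if not requests and not limits:
        return "BestEffort"  # nothing defined
    # Guaranteed: CPU and memory limits set, requests equal to limits
    if limits and requests == limits and {"cpu", "memory"} <= limits.keys():
        return "Guaranteed"
    return "Burstable"  # anything in between
```

For example, `qos_class({"cpu": "100m"}, {})` comes out Burstable: requests exist, but limits don't match.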

|          | Guaranteed  | Burstable          | Best effort |
|----------|-------------|--------------------|-------------|
| Requests | 100m, 256Mi | 100m, 256Mi        | None        |
| Limits   | 100m, 256Mi | None or > requests | None        |

Scalers

  • VPA: Moar power, aka recommends/adjusts requests
  • HPA: Moar pods, aka more replicas
  • KEDA: Proxy over the HPA

VPA

Modes:

  • Off: dry-run, only publishes recommendations
  • Initial: applies recommendations to newly created Pods (useful for discovering sane requests)
  • Auto/Recreate: evicts and restarts Pods to update resources

Trigger: usually memory. Tip: set maxAllowed so recommendations can't exhaust node capacity.
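A toy version of the maxAllowed tip: cap whatever the recommender produces. The peak-plus-margin estimator here is made up for illustration (the real VPA uses histogram-based percentile estimators):

```python
def recommend_memory(samples_mib: list, margin: float = 1.15,
                     max_allowed_mib: float = 1024) -> float:
    """Toy VPA-style recommendation: peak observed usage plus a safety
    margin, capped by maxAllowed so one pod can't claim a whole node."""
    peak = max(samples_mib)
    return min(peak * margin, max_allowed_mib)
```

With samples peaking at 400 MiB this recommends about 460 MiB; with a 2000 MiB peak it stops at the 1024 MiB cap instead of recommending 2300 MiB.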

HPA

  • Trigger: usually CPU (as a percentage of requests)
  • Formula: $\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{currentMetric}}{\text{targetMetric}} \right\rceil$
  • Fun fact: cannot scale to 0
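The HPA replica formula in code (this is the documented Kubernetes HPA algorithm; the variable names are mine):

```python
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float) -> int:
    """Kubernetes HPA scaling formula:
    desired = ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_metric / target_metric)
```

For example, 4 replicas averaging 80% CPU against a 50% target give `ceil(4 * 80 / 50) = 7`. Note the formula never reaches 0 for nonzero usage, which is the "cannot scale to 0" fact above.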

KEDA

  • Basically automates the HPA with flexible metrics (from different sources)
  • Can scale Jobs
  • Can scale to 0
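A hedged sketch of what KEDA adds on top of the HPA math: an external metric (a hypothetical queue length here) plus an activation rule that allows 0 replicas. Illustrative only, not KEDA's actual code:

```python
import math

def keda_desired_replicas(queue_length: int, target_per_replica: int = 10) -> int:
    """Scale-to-zero decision in the KEDA style: deactivate the workload
    when the external metric is zero, otherwise size replicas so each
    handles roughly target_per_replica queue items."""
    if queue_length == 0:
        return 0  # this is the part a plain HPA cannot do
    return math.ceil(queue_length / target_per_replica)
```

An empty queue yields 0 replicas; 45 queued items with a target of 10 per replica yield 5.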

Anti patterns

| Pattern              | Bad                      | Better            |
|----------------------|--------------------------|-------------------|
| CPU limit = requests | Throttling before scale  | Set requests only |

Demo

Auto scaling meme generator (see slides/video)