Scaling PDBs: Introducing Multi-Cluster Resilience with x-pdb

Watch talk on YouTube

Baseline Infra

  • Multiple Clusters across cloud providers
  • Cilium with Clustermesh
  • Stretched CockroachDB and NATS

TODO: Steal overview from slides

PDBs and limits

  • PDB: Classic core component that requires a minimum number of pods with successful readiness probes per deployment
  • Eviction: Can be blocked by a PDB whose minimum-available count would no longer be met (see the client-go sketch after this list)
  • Interruptions: Voluntary (new image, updated specs, …) vs. involuntary (eviction, deletion, node pressure, NoExecute, node deletion)
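
As a quick illustration of the single-cluster behaviour: when a PDB would be violated, the Eviction API rejects the request with HTTP 429. A minimal client-go sketch, where pod name, namespace and kubeconfig path are placeholders:

```go
// Minimal sketch: ask the API server to evict a pod and detect when a
// PodDisruptionBudget blocks the request (HTTP 429 / TooManyRequests).
package main

import (
	"context"
	"fmt"

	policyv1 "k8s.io/api/policy/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the local kubeconfig (placeholder path: the default ~/.kube/config).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)

	// Request eviction of a single pod ("cockroachdb-0" is a placeholder name).
	err = clientset.PolicyV1().Evictions("default").Evict(context.TODO(), &policyv1.Eviction{
		ObjectMeta: metav1.ObjectMeta{Name: "cockroachdb-0", Namespace: "default"},
	})
	switch {
	case err == nil:
		fmt.Println("eviction accepted")
	case apierrors.IsTooManyRequests(err):
		// This is what a PDB that has not reached its budget looks like to callers.
		fmt.Println("eviction blocked by a PodDisruptionBudget:", err)
	default:
		fmt.Println("eviction failed:", err)
	}
}
```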

Stateful across multiple clusters

  • Baseline: PDBs only know about one cluster
  • Problem: If the master pod fails (or gets evicted) on 2 of 3 clusters
  • Factors: Movement, maintenance, chaos experiments, secret rotation
  • Workaround: Just manually check all systems before doing anything
  • Idea: Multi-Cluster PDB
  • Solution: A new hook on the eviction API that interacts with a new cluster-aware CRD (a hypothetical sketch of such a CRD follows below)
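
The actual x-pdb CRD fields were not captured in these notes, so the following Go types are only a hypothetical sketch of what a cluster-aware disruption budget could look like; every field name here is an assumption, not the real x-pdb API:

```go
// Hypothetical types for a cluster-aware disruption budget CRD.
// Field names are illustrative only and do not reflect the actual x-pdb API.
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// MultiClusterPodDisruptionBudget describes how many pods matching the
// selector must stay available, counted across every connected cluster
// rather than per cluster.
type MultiClusterPodDisruptionBudget struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec MultiClusterPodDisruptionBudgetSpec `json:"spec"`
}

type MultiClusterPodDisruptionBudgetSpec struct {
	// Selector matches the pods protected by this budget in all clusters.
	Selector *metav1.LabelSelector `json:"selector"`

	// MinAvailable is evaluated against the global (cross-cluster) pod count.
	MinAvailable *intstr.IntOrString `json:"minAvailable,omitempty"`

	// RemoteEndpoints are the peer controllers in the other clusters,
	// e.g. reachable through the TCP load balancers mentioned in the Q&A.
	RemoteEndpoints []string `json:"remoteEndpoints,omitempty"`
}
```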

How it actually works

  1. Drain API gets called
  2. Check replicas across clusters
  3. Answer based on the current state

Actually: There is a lease mechanism to prevent race conditions across clusters
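
The talk did not go into the lease implementation, but the rough idea is that a per-budget lease lets only one cluster decide on a disruption at a time. A sketch using the standard coordination.k8s.io Lease API, where the object name prefix, namespace handling and duration are assumptions:

```go
// Sketch of a per-budget lease used to serialize eviction decisions across
// clusters. The real x-pdb lease objects may look different.
package lease

import (
	"context"
	"time"

	coordinationv1 "k8s.io/api/coordination/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/utils/ptr"
)

// tryAcquireLease creates a short-lived Lease named after the budget. If a
// peer cluster is currently deciding on an eviction for the same budget, the
// create fails with AlreadyExists and the caller should deny (or retry) the
// eviction instead of racing.
func tryAcquireLease(ctx context.Context, cs kubernetes.Interface, ns, budget, holder string) (bool, error) {
	lease := &coordinationv1.Lease{
		ObjectMeta: metav1.ObjectMeta{Name: "xpdb-" + budget, Namespace: ns},
		Spec: coordinationv1.LeaseSpec{
			HolderIdentity:       ptr.To(holder),
			LeaseDurationSeconds: ptr.To(int32(15)),
			AcquireTime:          &metav1.MicroTime{Time: time.Now()},
		},
	}
	_, err := cs.CoordinationV1().Leases(ns).Create(ctx, lease, metav1.CreateOptions{})
	if apierrors.IsAlreadyExists(err) {
		return false, nil // another cluster holds the lease right now
	}
	if err != nil {
		return false, err
	}
	return true, nil
}
```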

TODO: Steal diagram from slides

What works

  • Voluntary: 100% supported
  • Involuntary: Yes, they hooked into most of the deletion API calls (eviction, node pressure, kubectl delete, admissions, node deletion); see the webhook sketch below
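
A rough sketch of what such a hook could look like on the webhook side: a validating admission webhook that intercepts pod deletions/evictions and denies them when the global budget would be violated. globallyAvailable() and the endpoint path are hypothetical placeholders, not the real x-pdb implementation:

```go
// Sketch of a validating webhook that denies pod deletions/evictions when the
// cross-cluster budget would be violated.
package main

import (
	"encoding/json"
	"net/http"

	admissionv1 "k8s.io/api/admission/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// globallyAvailable would sum the ready replicas of the protected workload
// across all connected clusters (hypothetical helper, placeholder values).
func globallyAvailable(namespace, podName string) (ready, minAvailable int) {
	return 3, 3
}

func handleReview(w http.ResponseWriter, r *http.Request) {
	var review admissionv1.AdmissionReview
	if err := json.NewDecoder(r.Body).Decode(&review); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	req := review.Request
	ready, min := globallyAvailable(req.Namespace, req.Name)

	// Allow the disruption only if removing one pod keeps the global count
	// at or above the budget.
	resp := &admissionv1.AdmissionResponse{UID: req.UID, Allowed: ready > min}
	if !resp.Allowed {
		resp.Result = &metav1.Status{
			Code:    429,
			Message: "disruption would drop global availability below the budget",
		}
	}

	review.Response = resp
	json.NewEncoder(w).Encode(&review)
}

func main() {
	http.HandleFunc("/validate", handleReview)
	// TLS cert/key paths are placeholders; admission webhooks must be served over HTTPS.
	http.ListenAndServeTLS(":8443", "/tls/tls.crt", "/tls/tls.key", nil)
}
```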

Demo

Pretty interesting, watch the video to find out

Q&A

  • Do you need a flat network? No, just expose the TCP load balancer
  • Did you think about using etcd to implement the leases instead of objects? They use managed control planes and don't want to run another etcd
  • Have you tried to contribute upstream? Nope, pretty much not an option since the managed control plane can't set the appropriate flags