Scaling PDBs: Introducing Multi-Cluster Resilience with x-pdb
Watch talk on YouTube
Baseline Infra
- Multiple Clusters across cloud providers
- Cilium with Clustermesh
- Stretched CockroachDB and NATS
TODO: Steal overview from slides
PDBs and limits
- PDB: Classic core API object that requires a minimum number of pods with successful readiness probes per workload (see the sketch after this list)
- Eviction: Can be blocked by a PDB when the minimum available count would no longer be met
- Interruptions: Voluntary (new image, updated specs, …) vs. involuntary (eviction, deletion, node pressure, NoExecute, node deletion)
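For reference, a minimal client-go sketch of the classic single-cluster PDB these notes refer to; the namespace, labels, and minAvailable value are made up for illustration:

```go
// Sketch: create a classic policy/v1 PodDisruptionBudget that requires at
// least 2 ready pods of an illustrative "cockroachdb" workload, and note the
// rule the eviction API enforces against it.
package main

import (
	"context"
	"fmt"

	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	minAvailable := intstr.FromInt32(2)
	pdb := &policyv1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{Name: "cockroachdb-pdb", Namespace: "db"},
		Spec: policyv1.PodDisruptionBudgetSpec{
			MinAvailable: &minAvailable,
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"app": "cockroachdb"},
			},
		},
	}
	created, err := client.PolicyV1().PodDisruptionBudgets("db").
		Create(context.TODO(), pdb, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}

	// The eviction API only admits a voluntary disruption while
	// DisruptionsAllowed > 0, i.e. evicting one pod still leaves minAvailable ready pods.
	fmt.Printf("created %s, disruptions allowed: %d\n",
		created.Name, created.Status.DisruptionsAllowed)
}
```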
Stateful across multiple clusters
- Baseline: PDBs only know about one cluster
- Problem: If the master pod fails (or gets evicted) in 2 of 3 clusters, the stretched system loses quorum
- Factors: Movement, maintenance, chaos experiments, secret rotation
- Workaround: Just manually check all systems before doing anything
- Idea: Multi-Cluster PDB
- Solution: A new hook on the eviction API that interacts with a new cluster-aware CRD (hypothetical type sketch below)
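These notes don't reproduce the actual x-pdb CRD schema; purely as an illustration, a hypothetical controller-runtime-style Go type for a cluster-aware budget could look like this (all type and field names are made up, not the project's real API):

```go
// Hypothetical sketch of a cluster-aware disruption budget type, written in
// Kubernetes CRD style. Field names (MinAvailable, Remotes, Endpoint) are
// illustrative only and not the actual x-pdb API.
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// RemoteCluster points at a peer cluster whose replicas count towards the budget.
type RemoteCluster struct {
	Name     string `json:"name"`
	Endpoint string `json:"endpoint"` // e.g. the exposed TCP load balancer of the peer instance
}

// MultiClusterPodDisruptionBudgetSpec mirrors a classic PDB spec, but the
// minimum-available constraint is evaluated against all listed clusters.
type MultiClusterPodDisruptionBudgetSpec struct {
	MinAvailable *intstr.IntOrString   `json:"minAvailable,omitempty"`
	Selector     *metav1.LabelSelector `json:"selector,omitempty"`
	Remotes      []RemoteCluster       `json:"remotes,omitempty"`
}

type MultiClusterPodDisruptionBudget struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              MultiClusterPodDisruptionBudgetSpec `json:"spec,omitempty"`
}
```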
How it actually works
- Drain API gets called
- Check replicas across clusters
- Answer based on the current state
Actually: There is a lease mechanism to prevent race conditions across clusters (sketched below)
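A rough Go sketch of both steps, under stated assumptions: the cross-cluster availability rule, plus a coordination.k8s.io Lease used as a mutual-exclusion guard. This only illustrates the shape of the idea; the actual x-pdb lease protocol is not reproduced here:

```go
// Sketch of the race-prevention idea: before answering an eviction request,
// try to take a coordination.k8s.io Lease named after the budget. If another
// cluster's instance already holds it, back off and let the caller retry.
package main

import (
	"context"
	"fmt"
	"time"

	coordinationv1 "k8s.io/api/coordination/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// tryAcquireLease creates a short-lived Lease; an AlreadyExists error means a
// peer is currently deciding about the same budget, so we report "not acquired".
func tryAcquireLease(ctx context.Context, client kubernetes.Interface, ns, budget, holder string) (bool, error) {
	ttl := int32(10) // seconds; illustrative value
	lease := &coordinationv1.Lease{
		ObjectMeta: metav1.ObjectMeta{Name: "xpdb-" + budget, Namespace: ns},
		Spec: coordinationv1.LeaseSpec{
			HolderIdentity:       &holder,
			LeaseDurationSeconds: &ttl,
			AcquireTime:          &metav1.MicroTime{Time: time.Now()},
		},
	}
	_, err := client.CoordinationV1().Leases(ns).Create(ctx, lease, metav1.CreateOptions{})
	if apierrors.IsAlreadyExists(err) {
		return false, nil
	}
	return err == nil, err
}

// decideEviction applies the cross-cluster rule: summing ready replicas over
// all clusters, allow the eviction only if the total afterwards stays >= minAvailable.
func decideEviction(readyPerCluster []int, minAvailable int) bool {
	total := 0
	for _, r := range readyPerCluster {
		total += r
	}
	return total-1 >= minAvailable
}

func main() {
	// Illustrative numbers: 3 clusters with 1 ready replica each, budget of 2.
	fmt.Println(decideEviction([]int{1, 1, 1}, 2)) // true: 3-1 >= 2
	fmt.Println(decideEviction([]int{1, 0, 1}, 2)) // false: one cluster is already degraded
}
```

Using Lease objects (rather than a dedicated etcd) lines up with the Q&A answer below about managed control planes.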
TODO: Steal diagram from slides
What works
- Voluntary: 100% supported
- Involuntary: Largely; they hooked into most of the deletion API calls (eviction, node pressure, kubectl delete, admission requests, node deletion); see the webhook sketch below
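As an illustration of what hooking into deletion-like API calls can look like, here is a sketch of a validating admission webhook handler for pod DELETE requests; budgetAllowsDisruption is a hypothetical stub, this is not the actual x-pdb handler, and real evictions would additionally arrive as CREATE requests on the pods/eviction subresource:

```go
// Sketch: a validating admission webhook that receives AdmissionReview
// requests and denies pod deletions when the multi-cluster budget would be
// violated. The budget check itself is stubbed out.
package main

import (
	"encoding/json"
	"net/http"

	admissionv1 "k8s.io/api/admission/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// budgetAllowsDisruption is a hypothetical stand-in for the cross-cluster lookup.
func budgetAllowsDisruption(namespace, podName string) bool { return true }

func handleValidate(w http.ResponseWriter, r *http.Request) {
	var review admissionv1.AdmissionReview
	if err := json.NewDecoder(r.Body).Decode(&review); err != nil || review.Request == nil {
		http.Error(w, "invalid AdmissionReview", http.StatusBadRequest)
		return
	}
	req := review.Request

	allowed := true
	msg := ""
	// Only DELETE on pods is handled here; evictions (CREATE on pods/eviction)
	// would need a similar branch.
	if req.Operation == admissionv1.Delete && req.Resource.Resource == "pods" {
		if !budgetAllowsDisruption(req.Namespace, req.Name) {
			allowed = false
			msg = "blocked: multi-cluster disruption budget would be violated"
		}
	}

	review.Response = &admissionv1.AdmissionResponse{
		UID:     req.UID,
		Allowed: allowed,
		Result:  &metav1.Status{Message: msg},
	}
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(review)
}

func main() {
	http.HandleFunc("/validate", handleValidate)
	// Admission webhooks must serve TLS; certificate paths here are placeholders.
	_ = http.ListenAndServeTLS(":8443", "/tls/tls.crt", "/tls/tls.key", nil)
}
```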
Demo
Pretty interesting, watch the video to find out
Q&A
- Do you need a flat network: No, just expose the TCP load balancer
- Did you think about using etcd to implement the leases instead of Kubernetes objects: They use managed control planes and don't want to run another etcd
- Have you tried to contribute this upstream: No; it's pretty much not an option because the managed control planes can't set the appropriate flags