Performance perseverance: Taming 1000 Kubernetes clusters
History
- They started with upstream Kubernetes - the hard way
- Env grew to over 200 prod apps
- Pains: Single Cluster, single point of failure and complexity
- What worked: Dev adoption and autonomy, no vendor
Challenges
Based on stakeholder expectations
- One tenant per cluster -> Over 1000 Clusters
- Release management
- Small team (3 Engineers)
Guiding principles
- Platform as a product
- Stability: trust
- Standardization -> Scalability and inter team collab
- Day 2 support
- Dogfooding
Tenancy
- One cluster per product
- Own CLI, devs like CLIs
- Custom operator and CRDs
Stack
- Keopsctl? Pretty much their own cluster operator
- A simple Cluster CRD
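
The talk did not show the actual schema, so the following is only a rough sketch of what a simple Cluster custom resource could look like when rendered from a small typed spec. The API group `platform.example.com/v1alpha1`, the field names (`product`, `kubernetes_version`, `node_count`, `feature_flags`) and the helper `cluster_manifest` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ClusterSpec:
    """Hypothetical spec fields; the real CRD schema was not shown in the talk."""
    product: str
    kubernetes_version: str
    node_count: int = 3
    feature_flags: Dict[str, bool] = field(default_factory=dict)


def cluster_manifest(name: str, spec: ClusterSpec) -> dict:
    """Render a custom resource of the assumed kind `Cluster` as a plain dict."""
    if spec.node_count < 1:
        raise ValueError("node_count must be at least 1")
    return {
        "apiVersion": "platform.example.com/v1alpha1",  # assumed group/version
        "kind": "Cluster",
        # One cluster per product: the product name doubles as a label.
        "metadata": {"name": name, "labels": {"product": spec.product}},
        "spec": {
            "kubernetesVersion": spec.kubernetes_version,
            "nodeCount": spec.node_count,
            "featureFlags": dict(spec.feature_flags),
        },
    }
```

Keeping the spec this small is one way a three-person team could standardize over 1000 clusters: tenants only set a handful of fields and the operator owns everything else.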
Migration
- Build trust in platform
- Support with docs, onboarding, Q&A
- Co-create with devs while keeping an eye on day 2 -> Feature-flag based rollout
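
The notes do not say how the feature-flag rollout was implemented. One common approach, shown here purely as an assumption, is a deterministic percentage rollout: each cluster is hashed into a stable bucket, and a flag is enabled for every cluster whose bucket falls below the current percentage. The function names and the flag name `new-ingress` are hypothetical.

```python
import hashlib


def rollout_bucket(flag: str, cluster: str) -> int:
    """Map a (flag, cluster) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{cluster}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def is_enabled(flag: str, cluster: str, percent: int) -> bool:
    """Return True if the flag is on for this cluster at the given rollout percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return rollout_bucket(flag, cluster) < percent


# Usage: widen a rollout from 10% to 50% of a fleet of 1000 clusters.
clusters = [f"cluster-{i}" for i in range(1000)]
wave_1 = {c for c in clusters if is_enabled("new-ingress", c, 10)}
wave_2 = {c for c in clusters if is_enabled("new-ingress", c, 50)}
```

Because the bucket is derived from the names alone, raising the percentage only ever adds clusters: every cluster in `wave_1` is also in `wave_2`, so a tenant that already has a feature never loses it mid-rollout.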