Performance perseverance: Taming 1000 Kubernetes clusters

Slides

History

  • They started with upstream Kubernetes, set up "the hard way"
  • Environment grew to over 200 production apps
  • Pains: single cluster, single point of failure, and complexity
  • What worked: Dev adoption and autonomy, no vendor

Challenges

Based on stakeholder expectations

  • One tenant per cluster -> Over 1000 Clusters
  • Release management
  • Small team (3 Engineers)

Guiding principles

  • Platform as a product
  • Stability: trust
  • Standardization -> Scalability and inter-team collaboration
  • Day 2 support
  • Dogfooding

Tenancy

  • One cluster per product
  • Own CLI, because devs like CLIs
  • Custom operator and CRDs

Stack

  • Keopsctl? Pretty much their own cluster operator
  • A simple Cluster CRD
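
The talk did not show the actual schema, so the following is only a minimal sketch of what a simple Cluster custom resource and the operator's diffing step might look like. The API group `platform.example.com/v1alpha1`, all field names, and the helpers `to_manifest` and `plan_reconcile` are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterSpec:
    # All field names are hypothetical; the talk did not show the real schema.
    product: str
    kubernetes_version: str
    node_count: int = 3
    feature_flags: dict = field(default_factory=dict)


def to_manifest(name: str, spec: ClusterSpec) -> dict:
    """Render a spec as a Kubernetes-style custom resource manifest."""
    return {
        "apiVersion": "platform.example.com/v1alpha1",  # hypothetical group/version
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            "product": spec.product,
            "kubernetesVersion": spec.kubernetes_version,
            "nodeCount": spec.node_count,
            "featureFlags": dict(spec.feature_flags),
        },
    }


def plan_reconcile(desired: dict, observed: dict) -> list:
    """Return the spec fields whose observed value differs from the desired one."""
    observed_spec = observed.get("spec", {})
    return sorted(
        key for key, value in desired["spec"].items()
        if observed_spec.get(key) != value
    )
```

An operator built along these lines would compare each Cluster resource against the observed state and act only on the fields that `plan_reconcile` reports, which keeps one small team able to manage many clusters through a single declarative object per product.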

Migration

  1. Build trust in platform
  2. Support with docs, onboarding, Q&A
  3. Co-create with devs while keeping an eye on day 2 -> Feature-flag based rollout
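
How the feature-flag based rollout was implemented was not covered in detail. One common approach, sketched here under that caveat, is to enable a flag for explicitly opted-in clusters first and then for a stable percentage of the remaining fleet. The function names, the flag name in the usage note, and the hashing scheme are all assumptions, not the speakers' implementation.

```python
import hashlib


def rollout_bucket(cluster_name: str, flag: str) -> int:
    """Map a (flag, cluster) pair to a stable bucket in the range [0, 100)."""
    digest = hashlib.sha256(f"{flag}:{cluster_name}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(cluster_name: str, flag: str, percent: int, opt_in=frozenset()) -> bool:
    """Enable for opted-in clusters first, then for `percent` of the rest."""
    if cluster_name in opt_in:
        return True
    # The bucket is deterministic, so raising `percent` only ever adds clusters.
    return rollout_bucket(cluster_name, flag) < percent
```

For example, calling `is_enabled(name, "new-ingress", 10, opt_in={"team-a"})` for every cluster would turn the hypothetical `new-ingress` flag on for `team-a` plus roughly a tenth of the fleet, and raising the percentage later keeps the earlier clusters enabled.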