The Cluster API Migration Retrospective: Live migrating hundreds of clusters to Cluster API

Watch the talk on YouTube

The talk started with a basic introduction to Cluster API and to operations at Giant Swarm.

TODO: Diagram

Product naming used in the notes below:

  • vintage: Legacy system
  • CAPI: The new shit

Goal

Deployment targets: AWS, Azure, vSphere. Live migrations were needed for AWS; the other providers were not used as much, so their clusters were migrated manually.

Migration

  • They set up a new management cluster for CAPI
  • Tooling Options:
    • CLI: Enough for a couple of hundred clusters
    • Operator: The way to go for thousands of clusters
    • Blue/Green:
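The CLI-vs-operator trade-off comes down to who drives the migration. A CLI is run once per cluster by a person. An operator repeatedly compares observed state with desired state and takes the next step, so it can resume after failures without anyone watching. The following is a minimal Python sketch of that level-triggered reconcile idea. The step names, `ClusterState`, `run_step` and `reconcile_all` are all hypothetical illustrations, not Giant Swarm's actual tooling or phases.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical step names for illustration only; the talk did not list
# the real migration phases.
STEPS = ["prepare", "migrate_control_plane", "migrate_workers", "cleanup"]


@dataclass
class ClusterState:
    name: str
    completed: List[str] = field(default_factory=list)  # steps observed as done
    last_error: Optional[str] = None


def next_step(state: ClusterState) -> Optional[str]:
    """Derive the next step purely from observed state (level-triggered)."""
    for step in STEPS:
        if step not in state.completed:
            return step
    return None


def reconcile(state: ClusterState, run_step: Callable[[str, str], None]) -> bool:
    """Attempt at most one step; return True if work still remains."""
    step = next_step(state)
    if step is None:
        return False
    try:
        run_step(state.name, step)
        state.completed.append(step)
        state.last_error = None
    except Exception as exc:  # a real operator would requeue with backoff
        state.last_error = f"{step}: {exc}"
    return next_step(state) is not None


def reconcile_all(states: List[ClusterState], run_step, max_passes: int = 10) -> None:
    """Advance every cluster by one step per pass until none has work left."""
    for _ in range(max_passes):
        pending = [reconcile(s, run_step) for s in states]
        if not any(pending):
            break


# Usage: cluster "b" fails once during the worker step, then succeeds on retry.
flaky = {"b"}


def run_step(name: str, step: str) -> None:
    if name in flaky and step == "migrate_workers":
        flaky.discard(name)
        raise RuntimeError("transient error")


clusters = [ClusterState("a"), ClusterState("b")]
reconcile_all(clusters, run_step)
```

Because each pass recomputes the next step from recorded state, a transient failure only delays that one cluster, and the others keep progressing. This property matters more as the cluster count grows from hundreds to thousands.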

TODO: Sequence diagram

How it went

  • New bugs were discovered every few customers
  • Some cloud regions just love to fuck things up (looking at you, AWS China)
  • Using upstream sometimes prevents you from implementing random hacks, but this is a good thing
  • The mixed vintage+CAPI team was split into a new CAPI team and a new vintage team, because setting priorities was way too hard in the mixed team
  • Implementing new providers (GCP, OpenStack, etc.) is way simpler nowadays
  • There is a timeline from custom tooling, via product, to commodity

Q&A

  • Were there any fears from the customers regarding the migration?
    • There were some, but long-term relationships with customers help