Day 2000 - Migrating from kubeadm + ansible to clusterapi+talos
Background
- They use large, shared clusters
- The oldest cluster is 2099 days (about 5.8 years) old
- Onprem hosted on vSphere with vanilla kubeadm
- Fun fact: They run chaosmonkey on all clusters -> Workloads are automatically prepared for the disruption caused by updates
Legacy provisioning
- Terraform creates a Debian VM
- Deploy base tools with Puppet
- Register nodes in an inventory YAML file
- Run an Ansible playbook -> Renders configs and runs kubeadm
- Configure ArgoCD
Target
- Use Cluster API to manage the workload clusters
- Basic CRDs: Cluster, MachineDeployment, Machine
- Talos: Immutable, minimal, ephemeral, with declarative config via a gRPC API
Migration
- Config matching between kubeadm and talos+capi
- Import PKI/Certs
- Create ClusterAPI CRDs
- Add ClusterAPI Nodes
- Remove kubeadm nodes
1. Config matching
- Serviceaccount Issuer: Talos has its own default
- etcd encryption key names are hardcoded in talos
- Re-Encrypt all secrets (get secrets, replace secrets)
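The config-matching checks above can be sketched as a small preflight script. This is a minimal sketch and not the speakers' tooling: the helper names `token_issuer` and `encryption_key_names` are made up for illustration, and the key names Talos expects are not listed in these notes, so they have to be compared by hand against the Talos-generated config.

```python
import base64
import json


def token_issuer(jwt: str) -> str:
    """Return the `iss` claim of a serviceaccount token (signature is NOT verified)."""
    payload_b64 = jwt.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims["iss"]


def encryption_key_names(config: dict) -> set[str]:
    """Collect every key name from a parsed EncryptionConfiguration document."""
    names = set()
    for resource in config.get("resources", []):
        for provider in resource.get("providers", []):
            # Each provider is a one-entry mapping, e.g. {"aescbc": {"keys": [...]}}.
            for settings in provider.values():
                for key in (settings or {}).get("keys", []):
                    names.add(key["name"])
    return names
```

Comparing `token_issuer(...)` for a token from the kubeadm cluster with the issuer in the Talos config, and `encryption_key_names(...)` with the names Talos hardcodes, should surface a mismatch before any node joins. For the re-encryption step, the Kubernetes documentation uses `kubectl get secrets --all-namespaces -o json | kubectl replace -f -`.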
2. PKI
- Talos includes some logic that can generate a secrets bundle from an existing API
- Import: The etcd, k8s, serviceaccount and os (talos specific, used for the talos api auth) certificates
3. CRDs
- One namespace per workload cluster
- Cluster-CRD: Ref to CP and Infrastructure
- ControlPlane-CRD: Create cp MDs
- Infrastructure: References template for worker-MDs
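A rough sketch of what the Cluster object could look like, built as a Python dict and printed as JSON (which kubectl accepts). The API versions, the `TalosControlPlane` and `VSphereCluster` kinds, and the object names are assumptions based on the upstream providers; the talk did not show the exact manifests in these notes.

```python
import json


def cluster_manifest(name: str) -> dict:
    """Build a Cluster object that points at its control plane and infrastructure."""
    return {
        "apiVersion": "cluster.x-k8s.io/v1beta1",  # assumed API version
        "kind": "Cluster",
        # One namespace per workload cluster, as described in the talk.
        "metadata": {"name": name, "namespace": name},
        "spec": {
            "controlPlaneRef": {
                "apiVersion": "controlplane.cluster.x-k8s.io/v1alpha3",  # assumed
                "kind": "TalosControlPlane",  # assumed Talos control plane provider kind
                "name": f"{name}-cp",  # hypothetical name
            },
            "infrastructureRef": {
                "apiVersion": "infrastructure.cluster.x-k8s.io/v1beta1",  # assumed
                "kind": "VSphereCluster",  # assumed, since the clusters run on vSphere
                "name": name,
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(cluster_manifest("demo"), indent=2))
```

Generating the manifests from one function per cluster keeps the namespace and reference names consistent, which matters when many workload clusters are imported one after another.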
4. Add ClusterAPI Nodes
- Add new CP and Worker Nodes to the cluster that are managed by CAPI (slowly, stuff will break)
- Remove the old nodes one by one over weeks or months
- Potential Problems:
  - Mismatched serviceaccount issuer
  - Missing etcd encryption key
  - Wrong etcd encryption key
  - Loss of quorum: --force-new-cluster can force recovery on one node of the etcd cluster
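Why loss of quorum is a real risk while old and new control-plane nodes coexist: etcd needs a majority of its members to be healthy. A small worked example of that arithmetic follows; the function names are made up for illustration, and the member counts are examples, not numbers from the talk.

```python
def quorum(members: int) -> int:
    """Smallest number of healthy etcd members that still forms a majority."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail before the cluster loses quorum."""
    return members - quorum(members)


if __name__ == "__main__":
    # Example: growing from 3 kubeadm members to 6 mixed members, then shrinking again.
    for size in (3, 4, 5, 6, 5, 4, 3):
        print(f"{size} members: quorum={quorum(size)}, tolerated failures={fault_tolerance(size)}")
```

Even member counts add no extra tolerance over the next lower odd count (4 members tolerate one failure, the same as 3), so it is safest not to linger on an even count during the migration.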
Demo
I recommend watching the demo. Talos seems pretty cool.
Bootstrapping
- Kind cluster in github action or on local device