From µs to ms: Pushing Kubernetes Workloads to the Limit
Watch talk on YouTube

There were more details in the talk than I copied into these notes; most were either too much to write down or too application-specific.
Why?
- We need it (Product requirements)
- Cost efficiency
Cross Provider Networking
- Throughput:
  - Same-zone: 200 GB/s
  - Cross-zone: 5-10% penalty
- Latency:
  - Same-zone P99: 0.95 ms
  - Cross-zone P99: 1.95 ms
- Result: encourage services to always route within the same zone if possible
- How:
  - Topology-Aware Routing (older, a bit buggy)
  - `trafficDistribution: PreferClose`: routes to the same zone if possible (needs CNI support); see the sketch after this list
- Set up the stack once in each zone
- Measurements: Kubezonnet can detect cross-zone traffic
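As a minimal sketch, the newer mechanism is just a field on an ordinary Service (`spec.trafficDistribution` is real Kubernetes API; the service name, selector, and ports here are made up):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: querier            # hypothetical service name
spec:
  selector:
    app: querier           # hypothetical selector
  ports:
    - port: 80
      targetPort: 8080
  # Prefer endpoints in the client's zone; falls back to other
  # zones if no local endpoint is available (needs CNI support).
  trafficDistribution: PreferClose
```

The older Topology-Aware Routing is enabled via the `service.kubernetes.io/topology-mode: Auto` annotation instead.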
Disk latency
- Baseline: ~660 MiB/s per SSD, i.e. roughly one SSD per 5 Gbit/s of network throughput
- Example: serving 100 Gbit/s therefore needs a RAID 0 across roughly 20 SSDs (100 / 5)
```mermaid
graph LR
    Querier -->|125ms| Cache
    Cache -->|200ms| S3
    Cache <--> SSD
```
Memory management
- Garbage collection takes time; it is a throughput-versus-latency trade-off
- Idea: avoid allocations
  - Preallocate (e.g. arenas)
  - Allocation reuse (e.g. in gRPC)
  - "Allocation schemes" (thread per core)
- Avoid memory pressure by:
  - Using GC-friendly types
  - Tuning your GC (see the sketch after this list)
- Idea: Implement your own optimized data structure
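How you tune the GC depends on the runtime. As a sketch for a Go service, the two standard knobs can be set straight in the pod spec (`GOGC` and `GOMEMLIMIT` are real Go runtime variables; the pod name, image, and values are made up):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: go-service                    # hypothetical name
spec:
  containers:
    - name: app
      image: example.com/app:latest   # hypothetical image
      env:
        # Collect half as often: trades a larger heap for less CPU
        # spent in GC (throughput up, memory use up).
        - name: GOGC
          value: "200"
        # Soft heap ceiling: the GC works harder as the heap nears
        # this value instead of running into the cgroup memory limit.
        - name: GOMEMLIMIT
          value: "3GiB"
```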
Optimization in Kubernetes
Defaults
- Best effort (QoS class)
- No protection from consuming all node memory
- Critical services could get scheduled on the same node
Requests and limits
- Requests: needed for the pod to be scheduled
- Limits: the pod gets killed if it exceeds them
- Problem: enforcement is reactive; usage is only checked on a periodic interval (the interval is configurable via a flag, but it has a minimum)
- Downward API: you can reference the limits in your application (to let the app trigger GC before the pod gets killed); see the sketch after this list
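A sketch combining both ideas (the `resources` fields and `resourceFieldRef` are real Kubernetes API; names, image, and sizes are made up):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cache                         # hypothetical name
spec:
  containers:
    - name: cache
      image: example.com/cache:latest # hypothetical image
      resources:
        requests:       # what the scheduler reserves to place the pod
          cpu: "2"
          memory: 4Gi
        limits:         # exceeding the memory limit kills the container
          cpu: "2"
          memory: 4Gi
      env:
        # Downward API: expose the container's own memory limit so the
        # app can, e.g., trigger GC before the limit is enforced.
        - name: MEMORY_LIMIT_BYTES
          valueFrom:
            resourceFieldRef:
              containerName: cache
              resource: limits.memory
              divisor: "1"            # report in plain bytes
```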
Taints and tolerations
- Pin your workload to specific nodes based on labels and taints (see the sketch below)
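A sketch (taints, tolerations, and `nodeSelector` are real Kubernetes API; the taint key/value and names are made up). Taint and label the dedicated nodes first, then let only tolerating pods land on them:

```yaml
# Prepare the node (hypothetical node name and key):
#   kubectl taint nodes node-1 dedicated=latency-critical:NoSchedule
#   kubectl label nodes node-1 dedicated=latency-critical
apiVersion: v1
kind: Pod
metadata:
  name: hot-path                      # hypothetical name
spec:
  # Only land on nodes labeled for this workload...
  nodeSelector:
    dedicated: latency-critical
  # ...and tolerate the taint that keeps everything else off them.
  tolerations:
    - key: dedicated
      operator: Equal
      value: latency-critical
      effect: NoSchedule
  containers:
    - name: app
      image: example.com/app:latest   # hypothetical image
```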
Static CPU manager
- Request a whole number of CPUs (with requests equal to limits) -> those cores are guaranteed exclusively to your container (see the sketch below)
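A sketch (the kubelet flag `--cpu-manager-policy=static` and the Guaranteed-QoS requirement are real; names and sizes are made up). The kubelet must run with the static policy, and the pod needs Guaranteed QoS with an integer CPU count:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pinned                        # hypothetical name
spec:
  containers:
    - name: app
      image: example.com/app:latest   # hypothetical image
      resources:
        # Integer CPU count plus requests == limits (Guaranteed QoS)
        # gives this container exclusive use of 4 cores.
        requests:
          cpu: "4"
          memory: 8Gi
        limits:
          cpu: "4"
          memory: 8Gi
```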