From us to ms: Pushing Kubernetes Workloads to the Limit

Watch talk on YouTube

The talk contained more detail than these notes capture; most of it was either too much to write down or application-specific.

Why?

  • We need it (Product requirements)
  • Cost efficiency

Cross Provider Networking

  • Throughput:
    • Same-zone: 200 GB/s
    • Cross-zone: 5-10% penalty
  • Latency:
    • Same-zone P99: 0.95 ms
    • Cross-zone P99: 1.95 ms
  • Result: Encourage services to always route within the same zone if possible
  • How:
    • Topology-Aware-Routing (older, a bit buggy)
    • trafficDistribution: PreferClose: routes to same-zone endpoints if possible (needs CNI support)
    • Set up the whole stack once in each zone
  • Measurements: kubezonnet can detect cross-zone traffic
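A minimal Service sketch using that field (service name and port are hypothetical; trafficDistribution is available in recent Kubernetes releases):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: querier                    # hypothetical service name
spec:
  selector:
    app: querier
  ports:
    - port: 8080
  trafficDistribution: PreferClose # prefer same-zone endpoints when available
```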

Disk latency

  • Baseline: 660 MiB/s per SSD, i.e. ~1 SSD per 5 Gbit/s of networking
  • Example: 100 Gbps needs a RAID0 across a bunch of SSDs
```mermaid
graph LR
    Querier-->|125ms|Cache
    Cache-->|200ms|S3
    Cache<-->SSD
```

Memory management

  • Garbage collection takes time and trades throughput for latency
  • Idea: Avoid allocations
    • Preallocate (e.g. arenas)
    • Allocation reuse (e.g. in gRPC)
    • “Allocation schemes” (thread per core)
  • Avoid memory pressure by
    • Using GC-friendly types
    • Tuning your GC
  • Idea: Implement your own optimized data structure
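As one concrete flavor of allocation reuse (my illustration, not from the talk), Go's sync.Pool recycles buffers across requests instead of allocating fresh ones:

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool hands out reusable 64 KiB buffers so the hot path
// neither allocates nor creates garbage on every request.
var bufPool = sync.Pool{
	New: func() any { return make([]byte, 64*1024) },
}

// handle copies the payload into a pooled buffer and reports its length.
func handle(payload []byte) int {
	buf := bufPool.Get().([]byte)
	defer bufPool.Put(buf) // return the buffer for reuse
	return copy(buf, payload)
}

func main() {
	fmt.Println(handle([]byte("hello"))) // prints 5
}
```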

Optimization in Kubernetes

Defaults

  • Best effort
  • No protection from consuming all node memory
  • Critical services could get scheduled on the same node

Requests and limits

  • Requests: Needed to be scheduled
  • Limits: Kill if exceeded
  • Problem: Reactive; pods are only checked on an interval (configurable via a kubelet flag, but with a minimum)
  • Downward API: you can reference the limits in your application (e.g. to let the app trigger GC before the pod gets killed)

Taints and tolerations

  • Pin your workload based on labels and annotations (taints repel pods from a node unless they carry a matching toleration)
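A hypothetical taint/toleration pairing (all names made up):

```yaml
# Node tainted with e.g.:
#   kubectl taint nodes node1 workload=latency-critical:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: querier                      # hypothetical
spec:
  tolerations:
    - key: workload
      operator: Equal
      value: latency-critical
      effect: NoSchedule
  nodeSelector:
    workload: latency-critical       # label carried by the tainted node
  containers:
    - name: app
      image: example/app:latest      # hypothetical image
```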

Static CPU manager

  • Request a whole number of CPUs -> you get those cores guaranteed
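A sketch of a pod eligible for exclusive cores under the kubelet's static CPU manager policy (image and names hypothetical); it needs Guaranteed QoS, i.e. integer CPU requests equal to limits:

```yaml
# Requires the kubelet to run with --cpu-manager-policy=static
apiVersion: v1
kind: Pod
metadata:
  name: pinned                   # hypothetical
spec:
  containers:
    - name: app
      image: example/app:latest  # hypothetical image
      resources:
        requests:
          cpu: "2"               # whole number -> eligible for exclusive cores
          memory: 4Gi
        limits:
          cpu: "2"               # must equal requests (Guaranteed QoS)
          memory: 4Gi
```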