From us to ms: Pushing Kubernetes Workloads to the Limit

Watch talk on YouTube

The talk contained more detail than these notes capture; most of it was either too much to write down or application-specific.

Why?

  • We need it (Product requirements)
  • Cost efficiency

Cross Provider Networking

  • Throughput:
    • Same-zone: 200 GB/s
    • Cross-zone: 5-10% penalty
  • Latency:
    • Same-zone P99: 0.95 ms
    • Cross-zone P99: 1.95 ms
  • Result: Encourage services to always route within the same zone if possible
  • How:
    • Topology-Aware-Routing (older, a bit buggy)
    • trafficDistribution: PreferClose: routes to same-zone endpoints if possible (needs CNI support)
    • Set up the whole stack once in each zone
  • Measurements: kubezonnet can detect cross-zone traffic
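A minimal Service sketch using that field (service name and port are hypothetical; trafficDistribution is available in recent Kubernetes releases):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: querier                    # hypothetical service name
spec:
  selector:
    app: querier
  ports:
    - port: 8080
  trafficDistribution: PreferClose # prefer same-zone endpoints when available
```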

Disk latency

  • Baseline: 660 MiB/s per SSD, i.e. ~1 SSD per 5 Gbit/s of networking
  • Example: 100 Gbps needs a RAID0 across a bunch of SSDs
```mermaid
graph LR
    Querier-->|125ms|Cache
    Cache-->|200ms|S3
    Cache<-->SSD
```

Memory management

  • Garbage collection takes time and trades throughput for latency
  • Idea: Avoid allocations
    • Preallocate (e.g. arenas)
    • Allocation reuse (e.g. in gRPC)
    • “Allocation schemes” (thread per core)
  • Avoid memory pressure by
    • Using GC-friendly types
    • Tuning your GC
  • Idea: Implement your own optimized data structure
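As one concrete flavor of allocation reuse (my illustration, not from the talk), Go's sync.Pool recycles buffers across requests instead of allocating fresh ones:

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool hands out reusable 64 KiB buffers so the hot path
// neither allocates nor creates garbage on every request.
var bufPool = sync.Pool{
	New: func() any { return make([]byte, 64*1024) },
}

// handle copies the payload into a pooled buffer and reports its length.
func handle(payload []byte) int {
	buf := bufPool.Get().([]byte)
	defer bufPool.Put(buf) // return the buffer for reuse
	return copy(buf, payload)
}

func main() {
	fmt.Println(handle([]byte("hello"))) // prints 5
}
```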

Optimization in Kubernetes

Defaults

  • Best effort
  • No protection from consuming all node memory
  • Critical services could get scheduled on the same node

Requests and limits

  • Requests: Needed to be scheduled
  • Limits: Kill if exceeded
  • Problem: Reactive; pods are only checked on an interval (configurable via a kubelet flag, but with a minimum)
  • Downward API: you can reference the limits in your application (e.g. to let the app trigger GC before the pod gets killed)

Taints and tolerations

  • Pin your workload based on labels and annotations (taints repel pods from a node unless they carry a matching toleration)
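A hypothetical taint/toleration pairing (all names made up):

```yaml
# Node tainted with e.g.:
#   kubectl taint nodes node1 workload=latency-critical:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: querier                      # hypothetical
spec:
  tolerations:
    - key: workload
      operator: Equal
      value: latency-critical
      effect: NoSchedule
  nodeSelector:
    workload: latency-critical       # label carried by the tainted node
  containers:
    - name: app
      image: example/app:latest      # hypothetical image
```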

Static CPU manager

  • Request a whole number of CPUs -> you get those cores guaranteed
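A sketch of a pod eligible for exclusive cores under the kubelet's static CPU manager policy (image and names hypothetical); it needs Guaranteed QoS, i.e. integer CPU requests equal to limits:

```yaml
# Requires the kubelet to run with --cpu-manager-policy=static
apiVersion: v1
kind: Pod
metadata:
  name: pinned                   # hypothetical
spec:
  containers:
    - name: app
      image: example/app:latest  # hypothetical image
      resources:
        requests:
          cpu: "2"               # whole number -> eligible for exclusive cores
          memory: 4Gi
        limits:
          cpu: "2"               # must equal requests (Guaranteed QoS)
          memory: 4Gi
```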