Reliable k8s Resource Submission & Bookkeeping

Slides

Service offerings

  • Product: HA Container Platform for general utility with a focus on run-to-completion workloads
  • Use-Cases: ML Orchestration, CI/CD, Machine maintenance, Financial analysis, Data Processing pipeline
  • Requirements: Observability, Scheduling Events, Approval process, Bookkeeping, Datacenter Resiliency
  • Focus: Resiliency (HA with datacenter failover)
  • What the user needs: Workflow (e.g. generate report, persist report, notify)
  • What we need for the user: ConfigMaps + Secrets, Workflow templates for the steps

Challenges

  • Read after modify across multiple datacenters
  • Many reads against the Kubernetes API that could overload the apiserver
  • No native approval flows and limited audit
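
The read-after-modify challenge can be illustrated with a small sketch. It assumes each write returns an integer version token and that every datacenter serves reads from a replica that tracks the newest version it has applied. `Replica`, `StaleRead`, and `read_after_modify` are hypothetical names for illustration, not part of Kubernetes or of the platform described here.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical per-datacenter read replica (assumption, not a real API)."""
    version: int = 0
    data: dict = field(default_factory=dict)

    def apply(self, version: int, key: str, value: str) -> None:
        # Replication is assumed to deliver writes in order;
        # the replica records the newest version it has applied.
        self.data[key] = value
        self.version = version


class StaleRead(Exception):
    """Raised when a replica has not caught up to the caller's last write."""


def read_after_modify(replica: Replica, key: str, min_version: int) -> str:
    # Serve from the replica only if it has already seen the caller's write.
    # Otherwise the caller can retry or fall back to the primary.
    if replica.version < min_version:
        raise StaleRead(f"replica at {replica.version}, need {min_version}")
    return replica.data[key]
```

Serving reads from such a replica instead of the apiserver would also address the read-load challenge, at the cost of an explicit staleness check on every read.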

Submission flows from a user's perspective

Submission of runnables

  • User: Submits runnable to submitter with audit
  • Submitter: Handles retry, verification, …
  • Submitter: Configures workload on workload clusters
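
A minimal sketch of the submitter's role, assuming a runnable is a plain dictionary with a `name` and that the cluster call is passed in as a function. `submit`, `TransientError`, and the audit record fields are hypothetical. The real verification and retry policy are not specified in the slides.

```python
import time
from typing import Callable


class TransientError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


def submit(runnable: dict, user: str, apply: Callable[[dict], None],
           audit_log: list, max_attempts: int = 3, base_delay: float = 0.5) -> bool:
    """Verify, submit with retries, and record every outcome in the audit log."""
    name = runnable.get("name")
    # Verification (assumed rule): a runnable must at least carry a name.
    if not name:
        audit_log.append({"user": user, "name": None, "outcome": "rejected"})
        return False
    for attempt in range(1, max_attempts + 1):
        try:
            apply(runnable)  # configures the workload on a workload cluster
            audit_log.append({"user": user, "name": name,
                              "outcome": "submitted", "attempt": attempt})
            return True
        except TransientError:
            audit_log.append({"user": user, "name": name,
                              "outcome": "retry", "attempt": attempt})
            if attempt < max_attempts:
                # Exponential backoff between attempts.
                time.sleep(base_delay * 2 ** (attempt - 1))
    audit_log.append({"user": user, "name": name, "outcome": "failed"})
    return False
```

Writing an audit record for every attempt, not only for the final result, keeps the trail complete even when the submitter crashes mid-retry.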

Submission of deployables

  • User: deploys mutation to audit/source of truth
  • Syncer: Syncs deployables to workload clusters
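
The syncer step can be sketched as a comparison between the source of truth and a cluster's current state. This assumes both sides are available as dictionaries keyed by resource name. `plan_sync` is a hypothetical helper, and the real syncer may work quite differently.

```python
def plan_sync(desired: dict, actual: dict) -> dict:
    """Compare source-of-truth manifests with what a workload cluster runs."""
    shared = desired.keys() & actual.keys()
    return {
        # In the source of truth but missing on the cluster.
        "create": sorted(desired.keys() - actual.keys()),
        # Present on both sides but with a different spec.
        "update": sorted(k for k in shared if desired[k] != actual[k]),
        # On the cluster but no longer in the source of truth.
        "delete": sorted(actual.keys() - desired.keys()),
    }


desired = {"a": {"replicas": 2}, "b": {"replicas": 1}}
actual = {"b": {"replicas": 3}, "c": {"replicas": 1}}
plan = plan_sync(desired, actual)
```

Here `plan` lists `a` for creation, `b` for update, and `c` for deletion.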

Reporting

  • User wants: UI with latest status for all jobs
  • Compliance wants: Transactions on given resource for auditing
  • Implementation: Highly available inventory as single source of truth
graph
    WorkflowAPI-->|reads|inventory
    Consumer-->|updates|inventory
    Producer-->|publishes events to|Consumer
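
The data flow in the graph above can be sketched as a small in-memory inventory. It assumes every event carries a resource name, a status, and a per-resource sequence number `seq`. The `Inventory` class and the event fields are hypothetical, and a real deployment would use a highly available store rather than Python dictionaries.

```python
class Inventory:
    """Hypothetical single source of truth for job status and history."""

    def __init__(self) -> None:
        self.latest = {}    # resource -> newest event (serves the UI)
        self.history = {}   # resource -> all events (serves compliance)

    def apply_event(self, event: dict) -> None:
        # Consumer side: record every event for auditing.
        key = event["resource"]
        self.history.setdefault(key, []).append(event)
        current = self.latest.get(key)
        # Only advance the latest status; late, out-of-order events
        # are kept in the history but do not overwrite newer state.
        if current is None or event["seq"] > current["seq"]:
            self.latest[key] = event

    def status(self, resource: str) -> str:
        # WorkflowAPI side: latest status for one job.
        return self.latest[resource]["status"]

    def transactions(self, resource: str) -> list:
        # Compliance side: all recorded transactions for one resource.
        return list(self.history.get(resource, []))


inv = Inventory()
for seq, status in [(1, "Pending"), (3, "Succeeded"), (2, "Running")]:
    inv.apply_event({"resource": "job-a", "seq": seq, "status": status})
```

After these three events, `inv.status("job-a")` stays at `Succeeded` even though the `Running` event arrived late, while `inv.transactions("job-a")` still holds all three entries.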

Potential Problems

  • Problem: Delete event does not get propagated from syncer to producer, leading to zombie resources
  • Fix: Periodic Cleanup
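
A periodic cleanup could look roughly like the sketch below. It assumes the inventory maps resource names to a last-update timestamp and that a listing of live resources can be obtained from the workload clusters. `find_zombies` and the grace period are hypothetical. The grace period is there so that recently created resources that have not yet appeared in the listing are not removed by mistake.

```python
def find_zombies(inventory: dict, live: set, now: float,
                 grace_seconds: float = 300.0) -> list:
    """Return inventory entries that no longer exist on any cluster."""
    return sorted(
        name for name, last_update in inventory.items()
        # Gone from the clusters and not updated within the grace period.
        if name not in live and now - last_update > grace_seconds
    )


inventory = {"job-a": 0.0, "job-b": 0.0, "job-c": 950.0}
live = {"job-a"}
zombies = find_zombies(inventory, live, now=1000.0)
```

In this example only `job-b` is reported: `job-a` is still live, and `job-c` was updated too recently to be judged safely.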

Overview

Complete diagram