Geographically Distributed Clusters: Resilient Distributed Compute on the Edge


Background: The state of cloud in Mauritius

  • Cloud native is more cloud naive
  • Government treated cloud as bad for a while
  • People know AWS but not the cloud native ecosystem
  • Bad uplinks due to undersea cables that break from time to time
  • Only one local cloud service provider and the big providers are “an ocean away”

The Solution

  • Idea: Use multiple homelabs across the island as availability zones or multi-cloud
  • Goal: Orchestrate everything through Kubernetes
  • Plan: 3 Homelabs with at least 3 Nodes each that join one big cluster
  • Tech: Longhorn Storage, Tailscale for connectivity
  • Concerns: Latency, power cuts, bandwidth, IP rotation
  • Prod use: A startup wanted to use this for their workload and needed:
    • Tailscale exit nodes for external services
    • GPU nodes for AI-Workload
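
A minimal sketch of the "homelabs as availability zones" idea, assuming the planned 3 homelabs with 3 nodes each. The zone and node names are hypothetical, and the placement logic is a simplified stand-in for what Kubernetes does when pods are spread across the `topology.kubernetes.io/zone` node label; the notes do not say how the speaker actually configured scheduling.

```python
# Hypothetical inventory: 3 homelabs ("zones") with 3 nodes each, as in the plan.
ZONES = ["homelab-a", "homelab-b", "homelab-c"]
NODES = {f"{zone}-node{i}": zone for zone in ZONES for i in range(1, 4)}


def spread_replicas(replicas: int, nodes: dict[str, str]) -> list[str]:
    """Place replicas round-robin across zones, then across nodes in a zone."""
    by_zone: dict[str, list[str]] = {}
    for node, zone in sorted(nodes.items()):
        by_zone.setdefault(zone, []).append(node)
    zones = sorted(by_zone)
    placement = []
    for i in range(replicas):
        candidates = by_zone[zones[i % len(zones)]]
        placement.append(candidates[(i // len(zones)) % len(candidates)])
    return placement


def survivors(placement: list[str], nodes: dict[str, str], failed_zone: str) -> int:
    """Count replicas still running if one homelab loses power or its uplink."""
    return sum(1 for node in placement if nodes[node] != failed_zone)


placement = spread_replicas(3, NODES)
print(placement)
print(survivors(placement, NODES, "homelab-b"))
```

With three replicas, each homelab hosts exactly one, so a power cut at any single location still leaves two replicas running.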

Q&A

  • How is ingress handled (by me)?
    • Migrate the control plane to the cloud provider and use their static IPs.
    • Ingress always starts at the cloud and is routed over to the homelabs
  • Why Tailscale?
    • Fairly reliable
    • Pretty simple
    • Handles routing
  • How are you planning on scaling this setup?
    • More friends aka more homelab locations
    • Utilize Tailscale
  • How are you handling image distribution?
    • Bandwidth is not that limited (200 Mbit/s down)
    • They just host their own registry for stuff
  • What about the neighboring islands? -> Cool ideas
  • How big is your local cloud community? -> 15 People at smaller meetups and 1600 at the yearly dev meetup
  • How do you handle security in your setup?
    • This is not the primary concern for the government
    • Most local banks/insurers have in-house servers or stuff on AWS
    • Most of the time security is an afterthought
  • What kind of hardware are you running on and how do you acquire it?
    • The second hand market is not really a thing (or rather expensive).
    • They usually just import stuff themselves
    • Most nodes are Dell OptiPlex or Lenovo ThinkCentre machines
  • How does Longhorn perform over the 200 Mbit/s connection? -> Surprisingly good
  • Is Starlink available? -> No, and the government does things like “let’s shut down social media before the election”
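
For the image distribution and Longhorn questions above, a back-of-the-envelope calculation shows what a 200 Mbit/s link means in practice. This is only a sketch: the 1 GB image and 50 GB volume sizes are hypothetical, protocol overhead and latency are ignored, and the notes only mention the download rate, so upload between homelabs may well be slower.

```python
def transfer_seconds(size_gb: float, link_mbit_per_s: float) -> float:
    """Ideal transfer time for size_gb gigabytes (10^9 bytes) over a link in Mbit/s."""
    bits = size_gb * 8 * 1000**3
    return bits / (link_mbit_per_s * 1000**2)


# Hypothetical sizes: a 1 GB container image and a 50 GB volume replica.
print(transfer_seconds(1, 200))   # 40.0 seconds
print(transfer_seconds(50, 200))  # 2000.0 seconds, roughly 33 minutes
```

Under these assumptions an image pull is a matter of seconds, while rebuilding a large storage replica in another homelab takes on the order of half an hour.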