Self-hosting our developer toolchain.

Why we run our own git forge, CI, and container registry

By Podesta, Applied AI team

Our developer toolchain runs on open source, self-hosted in our own Kubernetes cluster. Forgejo for git, issues, and PRs; Forgejo Actions for CI; Forgejo's container registry for the images we ship; and CSS of our own on top so the UI reads as a continuation of the rest of the product. All four pieces share a namespace with the product itself, all synced from a single GitOps repository. This post walks through the pieces — and through the things that went wrong getting there.

Why self-host.

Three reasons.

Open source. Forgejo is an open-source fork of Gitea that we can extend, theme, and pair with our own runner image without asking anyone's permission. That is not abstract — it is the reason a custom theme repository and a custom runner Containerfile sit in our GitOps tree at all, and the reason a Renovate PR can bump every pinned tool inside the CI image on its own schedule.

Cost. Our entire CI workload runs on the same dedicated Hetzner servers as the rest of the product, scheduled into capacity that would otherwise sit idle. The hosted-GitHub equivalent for a team our size on a Rust-heavy CI profile — Team seats, Actions minutes at the runner sizes our compiles actually need, and Packages storage for the images we push — comes out to several hundred euros a month before usage spikes. Self-hosting puts us at low double-digit euros per server per month.

Customization. Two surfaces. The git UI is themed to match the rest of the product — Forgejo lets us drop CSS into a known location and pick a default theme via config, so a PR review page reads as a continuation of the product rather than a vendor screen inside it. The runner image is ours — pre-warmed with the toolchain baked in — which means CI startup time is bounded by image pull rather than by setup-bun hitting api.github.com's rate limit at the wrong hour. Neither surface is available on hosted CI at any price.

What's in the stack.

Git, issues, PRs. Forgejo runs as a Deployment in the cluster, backed by a CloudNativePG Postgres cluster declared next to it in the same chart. Upgrade story is the cluster's: bump the image tag, ArgoCD rolls it.

CI. Forgejo Actions, with two runner StatefulSets — one per dedicated server — pinned by nodeSelector so per-pod CPU limits add up to exactly the host's thread count. No oversubscription. KEDA scales replicas inside that envelope.
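The no-oversubscription rule is plain arithmetic, so it can be checked mechanically. The following is a minimal sketch, not the cluster's actual tooling: the class name, the helper, and every number in it are illustrative assumptions. The idea is only that the per-pod CPU limit multiplied by the autoscaler's replica ceiling should equal the host's thread count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunnerSet:
    """One runner StatefulSet pinned to one dedicated server (hypothetical model)."""

    host_threads: int   # hardware threads on the pinned host (assumed value)
    cpu_limit: int      # CPU limit per runner pod, in whole cores (assumed value)
    max_replicas: int   # highest replica count the autoscaler may reach (assumed value)

    def committed_cpus(self) -> int:
        # Worst case: every replica is running and using its full limit.
        return self.cpu_limit * self.max_replicas

    def is_exact_fit(self) -> bool:
        # True only when the limits add up to exactly the host's thread count.
        return self.committed_cpus() == self.host_threads


def max_replicas_for(host_threads: int, cpu_limit: int) -> int:
    """Replica ceiling that fills the host exactly, or raise if none exists."""
    if cpu_limit <= 0:
        raise ValueError("cpu_limit must be positive")
    if host_threads % cpu_limit != 0:
        raise ValueError("cpu_limit does not divide host_threads evenly")
    return host_threads // cpu_limit
```

With these made-up numbers, a 32-thread host and an 8-core limit give a ceiling of four replicas; a fifth replica would oversubscribe the host, and a 6-core limit could never fill it exactly.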

Container registry. The one that ships with Forgejo. The runner pushes its own image there on every change; the apps in the cluster pull from there. One TLS cert, one auth surface, one set of credentials.

A pre-warmed runner image. Our own Containerfile — rustup, sccache, uv, bun, Playwright with the browsers baked in, on a catthehacker/ubuntu:act-24.04 base. Every pinned tool has a # renovate: marker so Renovate keeps it current. Workflows opt in with a custom runs-on label and setup-bun / setup-uv / dtolnay/rust-toolchain become no-ops — no GitHub API calls, no downloads, no rate-limit issues during "Set up environment".
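To make the marker idea concrete, here is a small sketch of how such markers can be read back out of a Containerfile. It assumes the comment-then-ENV layout used in Renovate's regex manager examples (a `# renovate: datasource=... depName=...` comment directly above the pinned variable). The exact layout, the variable names, and the version numbers below are assumptions for illustration, not a copy of the real file.

```python
import re

# Assumed layout: a marker comment immediately followed by an ENV or ARG pin.
MARKER = re.compile(
    r"^#\s*renovate:\s*datasource=(?P<datasource>\S+)\s+depName=(?P<dep>\S+).*\n"
    r"(?:ENV|ARG)\s+(?P<var>\w+)=(?P<version>\S+)",
    re.MULTILINE,
)

# Hypothetical Containerfile fragment; versions are placeholders.
EXAMPLE = """\
FROM catthehacker/ubuntu:act-24.04

# renovate: datasource=github-releases depName=mozilla/sccache
ENV SCCACHE_VERSION=1.2.3

# renovate: datasource=npm depName=playwright
ARG PLAYWRIGHT_VERSION=4.5.6
"""


def pinned_tools(containerfile: str) -> dict[str, str]:
    """Map each marked dependency name to the version pinned below its marker."""
    return {m["dep"]: m["version"] for m in MARKER.finditer(containerfile)}
```

Calling `pinned_tools(EXAMPLE)` returns one entry per marked pin. A pin without a marker is invisible to this scan, which mirrors the reason for marking every pinned tool in the first place.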

Shared compile cache. sccache, configured against an in-cluster MinIO bucket. The runner config sets the endpoint and bucket as default env; Rust workflows opt in by exporting RUSTC_WRAPPER=sccache and pick up cache from every other runner that has touched the same crate graph.
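The split between runner defaults and workflow opt-in can be sketched as a merge of two environments. This is a hedged illustration rather than the real runner config: the endpoint URL, bucket name, and region are hypothetical, while the variable names (SCCACHE_ENDPOINT, SCCACHE_BUCKET, SCCACHE_REGION, SCCACHE_S3_USE_SSL, RUSTC_WRAPPER) are the ones sccache and cargo read.

```python
# Defaults every job inherits from the runner (values are hypothetical).
RUNNER_DEFAULTS = {
    "SCCACHE_ENDPOINT": "http://minio.example.svc:9000",  # assumed in-cluster URL
    "SCCACHE_BUCKET": "sccache",                          # assumed bucket name
    "SCCACHE_REGION": "us-east-1",                        # assumed placeholder region
    "SCCACHE_S3_USE_SSL": "false",                        # assumed plain HTTP in-cluster
}


def job_env(runner_defaults: dict[str, str], use_sccache: bool) -> dict[str, str]:
    """Build the environment one job sees.

    The cache location is always present, but rustc is only routed
    through sccache when the workflow opts in.
    """
    env = dict(runner_defaults)  # copy so the shared defaults stay untouched
    if use_sccache:
        env["RUSTC_WRAPPER"] = "sccache"
    return env
```

A job that does not opt in still carries the endpoint and bucket, but nothing consults them because cargo never invokes the wrapper. That is what keeps the defaults harmless for non-Rust workflows.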

A custom git UI. A separate repository holds three CSS files — light, dark, auto. The Forgejo Deployment mounts them via a Kustomize ConfigMap that pulls the theme repository in as a git submodule, with a strategic-merge patch flipping the default theme via Forgejo's FORGEJO__ui__DEFAULT_THEME env var. Bumping the submodule pointer is the deploy.

Each runner pod also carries a docker-in-docker sidecar on a raw-block ext4 zvol, plus a separate raw-block volume for the act_runner cache server. We tried virtiofs first; bolt's mmap(MAP_SHARED|PROT_WRITE) returned ENODEV and the cache server quietly died. Raw block, ext4 inside the VM, mmap works.

The theme.

Forgejo ships with a terracotta primary (#c2410c). We retuned --color-primary-* to ink-500 (#1A2540) and remapped the entire --zinc-* ramp to a warm stone equivalent, so every downstream semantic token — buttons, cards, nav background, repo file rows — inherits the warmer surface without per-selector overrides. The terracotta is gone on purpose; the brand calls for stone plus ink plus a single oxblood mark, not a third heat color.

The semantic signal palette (red / green / yellow / blue) was retuned to AA-on-stone equivalents. Diff hunks are washed — closer to a redlined research paper than a SaaS pull-request UI. Headings get Source Serif 4, body Inter, code JetBrains Mono.

Three CSS files, one strategic-merge patch, one submodule. The ConfigMap is under 100 KB and Forgejo rolls in about ten seconds on strategy: Recreate.

The unglamorous bits.

Three things bit us hard enough to be worth naming.

The Forgejo registry truncates large blob uploads: anything in the hundreds-of-MB range hits a 499 or 504 from the ingress and never finishes. We hit it pushing the runner image (Playwright browsers ship as one fat layer per browser). The fix isn't on the image side; it lies in the ingress proxy timeouts and the registry's own upload limits, and that ticket is still open. Until it lands, big images get built and pushed from a workstation with a non-home storage root.

Kata sandboxes occasionally drop a SandboxChanged event from cloud-hypervisor on builds north of ten minutes. The Rust toolchain layer is the long pole, so we keep RUN steps small — every Dockerfile layer is a checkpoint — and CI builds that hit this get retried from the latest cached layer.

Rootless podman build of the catthehacker base fails to commit with the error "history lists N non-empty layers, but we have M layers on disk". The base has ~18 layers with empty-layer history markers that podman's overlay driver mishandles under home-directory storage. CI builds work because the Kata VM has a clean overlay store. Local builds, when needed, fall back to --storage-driver=vfs: slow, disk-hungry, fine.

One more, cheap but it cost us an afternoon: the slim :act-24.04 catthehacker variant doesn't ship node in PATH, and Playwright's CLI is a #!/usr/bin/env node shebang. playwright install failed with "env: 'node': No such file or directory". apt install nodejs in the same RUN, move on.

A nod to depth.

CI jobs run inside microVMs: every workflow gets a fresh Kata-backed sandbox with its own kernel, its own docker daemon, and a graceful drain window so SIGTERM doesn't kill jobs mid-run. The mechanics of that (and why the per-pod CPU limits add up to exactly the host's thread count, no oversubscription) are a piece of their own: Kata-VM runners for self-hosted CI.

Where we drew the seams.

Self-hosting isn't "build your own everything." We pull Forgejo from upstream, the runner from code.forgejo.org, the CI base from catthehacker. We pay for OS images, kernels, and the bits of the registry we haven't yet fixed. What we own is the parts where ownership changes the answer: the runner image that decides what "set up environment" costs, the sccache backend that decides whether a Rust rebuild is two minutes or twenty, the CSS that decides what our git looks like, and the kustomization that wires all of it into one ArgoCD-managed Application. The bill came out smaller than the equivalent on hosted GitHub for our usage profile, the feedback loop into the cluster is shorter, and the UI looks like the rest of the product. Together, those gains have paid back the ongoing maintenance it takes to keep that surface intact.
