Hands-On Review: Kubernetes for Small Teams (2026)

TL;DR: I ran a small B2B SaaS on GKE Autopilot from November 1, 2025 through April 30, 2026 (6 months). Kubernetes did what people promised: deployments stayed stable, rollbacks worked, and we ran 8 services without cluster-management toil. Monthly bill after right-sizing: $186 ($122 GKE Autopilot, $32 Cloud SQL, $18 Cloud Storage, $14 misc). The month 2 bill hit $312 because we over-sized pod resource requests; lowering them brought the Autopilot charge down to $122 by month 4. Operations work did not go away; it shifted from server management to YAML management. Worth using if you have 5 or more services and growth pressure. Skip Kubernetes if you have 1 to 3 services and 1 to 3 engineers; a $40 Hetzner box plus systemd is simpler.

How We Tested

Workload: a small B2B SaaS, 8 microservices (Auth, Billing, API gateway, Notifications, Search, Workers x2, Webhook receiver), 1 Cloud SQL Postgres, 1 Cloud Memorystore Redis. Traffic: about 60 requests per second at peak, 4 million requests per month. Pre-Kubernetes: the same workload ran on a single GCE VM (n2-standard-4) with systemd-managed services. Migration to GKE Autopilot: October 18, 2025 (a week-long migration, one service at a time). Observation: November 1, 2025 to April 30, 2026 (6 months on Kubernetes). Tracked: monthly bill, deployment frequency, mean time to recovery on incidents, p95 latency, time spent on infrastructure work per week. Team: 3 engineers including me. Tools: GCP Cost Explorer, kubectl plus k9s for cluster management, Grafana plus Loki for observability, GitHub Actions for CI and deploys. Sample period: 4 incidents managed, 240 deployments completed, 8 services in steady state.

The Cluster Setup

We picked GKE Autopilot over a self-managed control plane. Autopilot means Google manages the control plane and the worker nodes; you pay per pod resource request, not per VM. That is simpler and safer for a small team. We created the cluster via Terraform with these specs: regional cluster in us-central1, default node service account with minimum IAM, deletion protection enabled, network and subnetwork pre-created in a shared VPC, workload identity enabled so pods can authenticate to Cloud SQL via service account impersonation. Cluster creation took 8 minutes. We then configured Cloud SQL (Postgres 16 small, 1 vCPU and 3.75 GB memory) plus Cloud Memorystore (Redis 1 GB). Both connect over private IP through the VPC.

Deployment patterns. Each service has a Helm chart in a private Helm repo. Helm gave us templating for common patterns (HPA, PDB, ServiceMonitor, ConfigMap, Secret references) without writing each Kubernetes resource by hand. The CI pipeline (GitHub Actions) builds a Docker image on push to main, pushes it to Artifact Registry, and runs helm upgrade with the new image tag. Deploy time: 90 seconds for a typical service. Rollback: helm rollback in 30 seconds.

The month 2 bill surprise. Our default pod resource requests were 500m CPU and 1 GB memory per service. With 8 services and an HPA scaling each to 2 replicas under load, that is 16 pods requesting 8 CPU and 16 GB of memory in total. Autopilot bills per request, even if actual usage is lower. The bill spiked to $312 in month 2. An audit showed our actual usage was around 100m CPU and 200 MB memory per service on average. We reduced requests to 150m and 256 MB; the Autopilot charge dropped to $122 by month 4. Lesson: right-size requests carefully in Autopilot.
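The request arithmetic behind the bill surprise is easy to reproduce. The sketch below uses only the figures quoted above; the class and function names are illustrative, 1 GB is treated as 1000 MB for simplicity, and no Autopilot prices are assumed, so it shows requested capacity rather than dollars.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodRequest:
    cpu_millicores: int
    memory_mb: int


SERVICES = 8
REPLICAS_PER_SERVICE = 2  # HPA replica count under load, as described in the text


def total_requested(req: PodRequest) -> tuple[float, int]:
    """Return (CPU cores, memory in MB) requested across all pods."""
    pods = SERVICES * REPLICAS_PER_SERVICE
    return pods * req.cpu_millicores / 1000, pods * req.memory_mb


before = PodRequest(cpu_millicores=500, memory_mb=1000)  # original defaults
after = PodRequest(cpu_millicores=150, memory_mb=256)    # right-sized requests
observed = PodRequest(cpu_millicores=100, memory_mb=200)  # average actual usage

before_total = total_requested(before)
after_total = total_requested(after)

# How far each request sits above observed usage (1.0 would mean no headroom).
cpu_headroom_before = before.cpu_millicores / observed.cpu_millicores
cpu_headroom_after = after.cpu_millicores / observed.cpu_millicores

print(before_total)         # (8.0, 16000)
print(after_total)          # (2.4, 4096)
print(cpu_headroom_before)  # 5.0
print(cpu_headroom_after)   # 1.5
```

The original defaults requested five times the observed CPU; the right-sized values keep a 1.5x margin while cutting total requested CPU from 8 cores to 2.4.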

Daily Use

Three things define daily Kubernetes operations.

First, deployments. A push to main triggers CI, the image builds, helm upgrade rolls it out, and the service goes through a rolling update with zero downtime. Confidence in deploying went up over the period. We deployed an average of 12 times a day across the team in the latter months.

Second, incident response. We had 4 incidents in the 6 months. One was a memory leak in our notification service that killed pods in an OOM loop; helm rollback resolved it in 4 minutes. One was a Cloud SQL primary outage during a maintenance window we had not noticed; failover to the read replica took 90 seconds. One was a misconfigured Ingress that broke external traffic for 11 minutes; we rolled it back via Git revert. One was a regional GKE incident on Google's side that took 6 minutes to resolve; there was nothing to do on our end. Mean time to recovery: about 5.6 minutes. Compare that to our pre-Kubernetes era, where a misbehaving systemd service took 15 to 30 minutes to debug and restart. Kubernetes is more complex, but the operational ergonomics around rollback are real.
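The mean time to recovery follows directly from the four incident durations listed above. A quick check in Python, using only those durations (the dictionary labels are shorthand for the incidents described):

```python
from statistics import mean

# Recovery time per incident, in seconds, as quoted in the text.
recovery_seconds = {
    "notification service OOM loop (helm rollback)": 4 * 60,
    "Cloud SQL primary outage (failover)": 90,
    "misconfigured Ingress (Git revert)": 11 * 60,
    "regional GKE incident": 6 * 60,
}

mttr_minutes = mean(list(recovery_seconds.values())) / 60
print(f"MTTR: {mttr_minutes:.1f} minutes")  # MTTR: 5.6 minutes
```

The Ingress incident dominates the average; without it the mean of the other three is under 4 minutes.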

Third, the YAML tax. Kubernetes operations is mostly YAML editing. Every service has a Helm chart (around 12 YAML files), every cluster-wide resource has a YAML file, and every config change is a YAML diff. We accumulated about 1,400 lines of YAML across the 8 services by month 6. Helm helps but does not eliminate the volume. Engineers spent roughly 4 hours per week per person on YAML-adjacent work (writing manifests, debugging mis-applied resources, updating Helm values). Pre-Kubernetes we spent 2 hours a week per person on systemd-unit and Nginx config files. So the YAML tax is real and roughly 2x the previous operations work, traded for the safety and ergonomics of Kubernetes deploys.

Observability stack. We run Loki for logs, Prometheus and Grafana for metrics (via the GKE add-on), and Tempo for traces. All run inside the same Autopilot cluster, costing about $35 a month in pod resources. Worth every dollar; without good observability on Kubernetes you are flying blind.
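To keep an eye on the YAML volume discussed above, a short script can total YAML lines per chart. This is a sketch that assumes a hypothetical layout with one directory per chart under a common root (here called charts); adjust the root to wherever the charts actually live.

```python
from pathlib import Path

YAML_SUFFIXES = {".yaml", ".yml"}


def count_yaml_lines(root: Path) -> dict[str, int]:
    """Map each top-level entry under root to its total YAML line count."""
    totals: dict[str, int] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in YAML_SUFFIXES:
            # First path component below root, e.g. the chart directory name.
            chart = path.relative_to(root).parts[0]
            with path.open(encoding="utf-8") as fh:
                totals[chart] = totals.get(chart, 0) + sum(1 for _ in fh)
    return totals


if __name__ == "__main__":
    # "charts" is a placeholder directory name, not taken from the review.
    per_chart = count_yaml_lines(Path("charts"))
    for chart, lines in per_chart.items():
        print(f"{chart}: {lines}")
    print(f"total: {sum(per_chart.values())}")
```

Run monthly, a count like this makes it visible whether the YAML tax is still growing or has flattened out.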

  • Win: deployment safety improved meaningfully versus systemd-managed services
  • Win: rollback is fast and reliable across all services
  • Win: Autopilot removed cluster-management toil for our small team
  • Win: standardised observability is straightforward to add
  • Gripe: YAML volume doubled the time spent on infrastructure work
  • Gripe: Autopilot per-pod billing surprised us at month 2; right-size from day one
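The deploy and rollback path described in this section comes down to two Helm commands. Below is a minimal sketch of how a CI step might assemble them in Python. The release name, chart path, namespace, timeout, and the image.tag values key are hypothetical placeholders (image.tag is a common chart convention, not something the review confirms); only helm upgrade and helm rollback themselves come from the text.

```python
import shlex
import subprocess
from typing import Optional


def helm_upgrade_cmd(release: str, chart: str, image_tag: str,
                     namespace: str = "default") -> list[str]:
    """Build a `helm upgrade` command that rolls out a new image tag."""
    return [
        "helm", "upgrade", release, chart,
        "--namespace", namespace,
        "--set", f"image.tag={image_tag}",  # values key is an assumption
        "--wait",                           # block until the rollout is ready
        "--timeout", "5m",
    ]


def helm_rollback_cmd(release: str, revision: Optional[int] = None,
                      namespace: str = "default") -> list[str]:
    """Build a `helm rollback` command; with no revision Helm uses the previous one."""
    cmd = ["helm", "rollback", release]
    if revision is not None:
        cmd.append(str(revision))
    cmd += ["--namespace", namespace, "--wait"]
    return cmd


def run(cmd: list[str], dry_run: bool = True) -> None:
    """Print the command in dry-run mode, otherwise execute it and raise on failure."""
    if dry_run:
        print(shlex.join(cmd))
        return
    subprocess.run(cmd, check=True)


# Hypothetical release, chart path, and tag, shown in dry-run mode only.
run(helm_upgrade_cmd("billing", "charts/billing", "abc123"))
run(helm_rollback_cmd("billing"))
```

Keeping command construction separate from execution makes the CI step easy to unit-test without a cluster.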

Performance and Cost

Cost comparison. Pre-Kubernetes: 1 GCE n2-standard-4 VM at $97 a month, plus Cloud SQL small at $32, plus storage at $14, for a total of $143 a month. Kubernetes (GKE Autopilot) after right-sizing: $122 a month for compute, $32 Cloud SQL, $18 storage, $14 misc, for a total of $186 a month. So Kubernetes costs about $43 a month more (roughly 30%). The cost is justified by deployment ergonomics, automatic scaling under load (a viral mention in February 2026 doubled traffic for 4 hours and the autoscaler handled it without intervention), and operational safety.

p95 latency: pre-Kubernetes 480 ms, Kubernetes 410 ms (a small improvement, mostly from running services as distinct pods instead of resource-sharing systemd processes).

Alternatives for the same workload. Cloud Run: roughly $80 a month at our request volume (Cloud Run scales to zero between requests, so it is cheaper for spiky workloads). ECS Fargate: similar pricing to GKE Autopilot. A $40 Hetzner box plus systemd: the cheapest option, but you give up the deployment safety and scaling automation.

Option                  Monthly cost          Auto-scaling            Operational safety
Hetzner VPS + systemd   $45                   No                      Low
GCE VM + systemd        $143                  Limited                 Medium
GKE Autopilot           $186                  Yes                     High
Cloud Run               $80 (at our scale)    Yes (scales to zero)    High
ECS Fargate             $180                  Yes                     High
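The before-and-after totals can be checked from the line items quoted in this section. The sketch below uses only those figures; the dictionary keys are descriptive labels.

```python
# Monthly line items in USD, as quoted in the cost comparison.
pre_kubernetes = {"GCE n2-standard-4 VM": 97, "Cloud SQL": 32, "storage": 14}
gke_autopilot = {"Autopilot compute": 122, "Cloud SQL": 32, "storage": 18, "misc": 14}

pre_total = sum(pre_kubernetes.values())
k8s_total = sum(gke_autopilot.values())
delta = k8s_total - pre_total
delta_pct = 100 * delta / pre_total

# p95 latency in milliseconds, before and after the migration.
latency_gain_pct = 100 * (480 - 410) / 480

print(pre_total, k8s_total, delta)                  # 143 186 43
print(f"cost increase: {delta_pct:.0f}%")           # cost increase: 30%
print(f"p95 improvement: {latency_gain_pct:.0f}%")  # p95 improvement: 15%
```

Put side by side, the move costs about 30% more per month and buys roughly a 15% p95 latency improvement, plus the rollback and autoscaling benefits that do not show up in these numbers.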

Pros and Cons

  • Pro: deployment safety and rollback are best-in-class
  • Pro: GKE Autopilot removes cluster-management work for small teams
  • Pro: standardised observability stack (Prometheus, Grafana, Loki, Tempo) is straightforward
  • Pro: portable across clouds if you avoid GCP-specific resources
  • Con: YAML volume roughly doubles ops work versus simpler systemd or Cloud Run
  • Con: Autopilot per-pod billing surprises you if requests are over-sized
  • Con: complexity rewards investment; do not adopt with 1 to 3 services
  • Con: GCP-specific GKE features create lock-in if you use them

Who This Is For

  • Pick Kubernetes (GKE Autopilot or comparable on AWS or Azure) if you have 5 or more services, 2 or more engineers, and growth pressure that means your service count will keep climbing
  • Pick Kubernetes if your workload is spiky and you need auto-scaling that does not require you to be paged at 3 a.m.
  • Pick Kubernetes if you want a standardised observability and deployment model that the next engineer you hire will already know
  • Skip Kubernetes if you have 1 to 3 services and 1 to 3 engineers; the YAML tax is not justified
  • Skip Kubernetes if your workload is stateful single-machine work; Postgres on a VM is fine
  • Skip Kubernetes if your team has no one comfortable reading kubectl get pods output; the learning curve is real
  • Skip GKE Autopilot if you need very specific node configurations (GPU types, special storage); use GKE Standard or self-managed

Kubernetes does not eliminate operations work; it shifts it from servers to YAML. Decide if that is the work you want to spend time on.

Bottom Line

Six months on Kubernetes (GKE Autopilot) for a small B2B SaaS: the move was the right call for the deployment safety and the scaling automation. The YAML tax is real, and the month 2 bill surprise taught me to right-size requests carefully. If we were running 2 services we would still be on a GCE VM. Running 8 services with a 3-person engineering team, Kubernetes earns the extra complexity. Cloud Run is the credible alternative for a similarly shaped workload at lower cost; we evaluated it and stayed on Kubernetes because some of our services need always-on workers, which Cloud Run does not handle as well. Will we still run Kubernetes at 30 services? Yes, and at 30 engineers I would expect at least one full-time platform engineer. Got a similar small-team decision in front of you? Drop me a note. I will share the Terraform module and the Helm chart template that became our standard.