Phil Hills · Systems Architect

200 Agents on Google Cloud Run: Lessons from Production

By Phil Hills · March 20, 2026

I run 200+ AI agents on Google Cloud Run. Not as experiments, not as demos. In production. Here's what I've learned about running autonomous systems at scale.

Why Cloud Run

Cloud Run scales to zero. That matters when you have 200 agents and most of them are idle most of the time. You pay per request, not per hour. An agent that runs once a day costs nearly nothing. An agent that runs 10,000 times a day auto-scales.

Every agent is a Docker container. Same deployment pipeline, same monitoring, same IAM permissions. No Kubernetes cluster to manage. No nodes to provision.
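To make the container model concrete, here is a minimal sketch of what one agent's entrypoint can look like, using only the Python standard library. Cloud Run injects the PORT environment variable per its container contract; the handler name, agent name, and JSON shape are illustrative, not taken from the post.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

class AgentHandler(BaseHTTPRequestHandler):
    """Accepts a JSON task via POST and returns a JSON result."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        task = json.loads(self.rfile.read(length) or b"{}")
        # Placeholder "work": a real agent would act on the task here.
        result = {"agent": "summarizer", "echo": task}
        body = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Keep container stdout quiet; Cloud Run captures logs separately.
        pass

def serve():
    # Cloud Run sets PORT; default to 8080 for local runs.
    port = int(os.environ.get("PORT", "8080"))
    HTTPServer(("0.0.0.0", port), AgentHandler).serve_forever()

if __name__ == "__main__":
    serve()
```

Because every agent exposes the same HTTP surface, the deploy pipeline, health checks, and IAM bindings stay identical across the whole fleet.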

The Cold Start Problem

Cold starts are real. A Python agent with heavy dependencies can take 3-5 seconds to start. For orchestration workflows where Agent A calls Agent B calls Agent C, those seconds stack up.

Solutions that worked for me: keeping a minimum instance count on the latency-critical agents, slimming container images, and deferring heavy imports until the first request instead of paying for them at startup.
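Deferred imports are the cheapest of these to adopt. A sketch of the pattern, assuming nothing about the post's actual dependencies (`json` below is a stand-in for a genuinely slow import like a large ML library):

```python
# Defer a heavy dependency until first use so the container starts fast
# and Cloud Run can route traffic to it sooner.
_pipeline = None

def get_pipeline():
    """Load the heavy dependency on the first call, then reuse it."""
    global _pipeline
    if _pipeline is None:
        import json as _heavy  # stand-in for a multi-second import
        _pipeline = _heavy
    return _pipeline
```

The first request still pays the load cost once, but the container itself boots in milliseconds, which is what Cloud Run's startup path measures.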

Agent Discovery

With 200 agents, how does Agent A find Agent B? Not through a service registry. Through A2AC, the Agent-to-Agent Communication standard.

Each agent publishes an agent.json discovery card listing its capabilities, DID, and endpoint. The orchestrator doesn't need to know every agent. It discovers them through topology, not configuration.
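A minimal sketch of what a discovery card and a capability lookup could look like, assuming only the fields the post names (capabilities, DID, endpoint); the exact A2AC schema and the agent names here are illustrative.

```python
# A hypothetical agent.json discovery card as a Python dict,
# plus a capability-based lookup an orchestrator might run.
card = {
    "name": "invoice-parser",                           # illustrative
    "did": "did:example:agent123",                      # placeholder DID
    "endpoint": "https://invoice-parser-xyz.a.run.app", # placeholder URL
    "capabilities": ["parse_invoice", "extract_totals"],
}

def find_agents(cards, capability):
    """Return endpoints of every agent advertising a capability."""
    return [c["endpoint"] for c in cards if capability in c["capabilities"]]
```

The orchestrator only ever asks "who can do X?", so adding agent number 201 means publishing one more card, not editing any central config.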

Cost

200 agents on Cloud Run costs me less than most people pay for a single always-on VM. The scale-to-zero model is ideal for agent fleets. Most agents spend 99% of their time idle. You only pay when they think.
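The economics can be sketched as simple arithmetic. The dollar rates below are illustrative placeholders, not real GCP prices; the point is the shape of the comparison, not the figures.

```python
# Back-of-envelope comparison: request-based billing vs an always-on VM.
def request_billed_cost(invocations_per_month, seconds_per_call,
                        rate_per_cpu_second):
    """Cost when you pay only for time spent handling requests."""
    return invocations_per_month * seconds_per_call * rate_per_cpu_second

def always_on_cost(hours_per_month, rate_per_hour):
    """Cost of a VM billed for every hour, idle or not."""
    return hours_per_month * rate_per_hour

# An agent that runs once a day for 2 seconds, at a placeholder
# $0.00002 per CPU-second, versus a placeholder $0.03/hour VM:
idle_agent = request_billed_cost(30, 2.0, 0.00002)  # fractions of a cent
vm = always_on_cost(730, 0.03)                      # tens of dollars
```

Even multiplied by 200 mostly-idle agents, the request-billed column stays far below a single always-on machine.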

What I'd Do Differently

The full orchestration framework is part of the Q Protocol. The SDK has 19 modules covering identity, trust, memory, relay, discovery, and task lifecycle.