What I Learned Running 16 AI Agents on a Single VPS
For about 10 weeks, I ran 16 AI agents on a single VPS. They coordinated across Discord and Telegram, managed projects, wrote code, posted briefings, and — eventually — taught me that I’d built the wrong thing.
This is the story of what worked, what didn’t, and why I replaced all 16 with just 2.
The Setup
One Hetzner VPS. 8 vCPUs, 32GB RAM, 240GB NVMe. Running an AI gateway that managed:
- 16 agents with different roles (coordinators, specialists, infrastructure)
- 5 Discord bots serving channels across a private server
- 2 Telegram bots for direct messaging
- A peer-routing system where one bot could serve multiple agents
- A shared knowledge base indexed with semantic search
- 23 cron jobs for briefings, reflections, and maintenance
- Git-tracked memory that synced between VPS and GitHub every few hours
The architecture was ambitious. A coordinator agent would see all messages in its channels. When a task required coding, it would delegate to a specialist agent via Discord @mention. Specialists were ephemeral — they’d respond once, clear their context, and wait for the next mention.
Project-specific agents monitored their own channels, posted morning briefings at 8:30 AM, ran end-of-day reflections, and maintained project memory in git-tracked markdown files.
It all worked. And that was part of the problem.
What Worked Well
Before I get into the failures, some things genuinely impressed me.
Git-Tracked Memory
Every agent’s memory was stored in markdown files tracked by git. A cron job committed and pushed changes every few hours. This meant:
- Agent memory survived crashes, restarts, and resets
- I could review what agents were “thinking” via git history
- Rolling back a bad memory state was just git revert
- Memory was human-readable, not locked in a database
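The commit-and-push cron job can be sketched in a few lines. This is a minimal illustration, not the author's actual script; the repo path and commit-message format are assumptions.

```python
# Hypothetical memory-sync job: stage everything, commit only when something
# actually changed, then push. Paths and message format are assumptions.
import datetime
import pathlib
import subprocess

def sync_memory(repo_dir: pathlib.Path) -> None:
    """Commit and push agent-memory changes; a no-op when nothing changed."""
    def git(*args):
        return subprocess.run(["git", "-C", str(repo_dir), *args], check=True)

    git("add", "-A")
    # `git diff --cached --quiet` exits non-zero only when the staged tree differs.
    staged = subprocess.run(["git", "-C", str(repo_dir), "diff", "--cached", "--quiet"])
    if staged.returncode != 0:
        stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%d %H:%M")
        git("commit", "-m", f"memory sync {stamp}")
        git("push", "origin", "HEAD")
```

Run from cron every few hours, this gives exactly the properties above: every memory state is a commit you can read, diff, and revert.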
This was the single best architectural decision. When everything else got complicated, memory-as-code stayed simple and reliable.
Direct Clone Deployment
The VPS directory was the git repository. Deployment was literally git pull followed by a service restart. No Docker builds, no artifact shipping, no container registries. Config files, agent workspaces, and scripts all lived in the repo.
This eliminated an entire class of deployment bugs. What I tested locally was exactly what ran in production.
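The whole deploy path fits in two commands. A sketch under stated assumptions: the repo path, the service name, and the use of systemd are all hypothetical, and the injectable run hook exists only so the function can be exercised without a real server.

```python
# Deploy = fast-forward pull, then restart. Nothing else ships.
# repo_dir, service, and systemd usage are assumptions for illustration.
import subprocess

def deploy(repo_dir: str = "/srv/gateway", service: str = "gateway",
           run=subprocess.run) -> None:
    """Pull the repo the VPS runs from, then bounce the service."""
    run(["git", "-C", repo_dir, "pull", "--ff-only"], check=True)
    run(["systemctl", "restart", service], check=True)
```

The --ff-only flag is the one safeguard worth keeping: if the live directory has drifted from the remote, the deploy fails loudly instead of silently merging.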
Ephemeral Specialists
Instead of giving every agent persistent state, specialists were designed to be stateless. Each invocation would:
- Reset context (clean slate)
- Complete the task
- Report back
- Stop
This avoided the biggest problem with persistent agents: context pollution. A coding agent that had just discussed marketing would sometimes produce marketing-flavored code suggestions. Ephemeral agents didn’t have that problem.
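The lifecycle above can be sketched as a wrapper that builds a fresh context per invocation and discards it on return. All names here are hypothetical; the point is that no state survives between calls.

```python
# A stateless specialist: fresh context per mention, nothing persists.
# EphemeralSpecialist and its fields are illustrative names, not a real API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EphemeralSpecialist:
    respond: Callable[[list[str]], str]   # the underlying model call (assumed)

    def handle(self, mention: str) -> str:
        context = [mention]               # reset: context starts from the mention alone
        reply = self.respond(context)     # complete the task
        return reply                      # report back; local context is discarded

# Two calls share no state, so an earlier topic can't flavor a later answer.
echo = EphemeralSpecialist(respond=lambda ctx: f"done: {ctx[-1]}")
assert echo.handle("write tests") == "done: write tests"
assert echo.handle("fix the bug") == "done: fix the bug"
```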
Default-Deny Security
Every agent’s shell access was controlled by an approvals file:
- deny → no shell access at all
- allowlist → only approved binaries
- full → unrestricted
Most agents had deny. Only the infrastructure agent had full. This was added from day one, not bolted on later. It prevented several accidental rm -rf situations when agents got creative with shell commands.
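The check itself is small. A minimal sketch, assuming a dict-shaped approvals file; the field names and agent names are hypothetical, and the key property is that anything unrecognized falls through to deny.

```python
# Default-deny approvals check. Field names are assumptions for illustration.
import shlex

def allowed(agent: str, command: str, approvals: dict) -> bool:
    """Unknown agents, unknown modes, unlisted binaries: all denied."""
    policy = approvals.get(agent, {"mode": "deny"})
    mode = policy.get("mode", "deny")
    if mode == "full":
        return True
    if mode == "allowlist":
        binary = shlex.split(command)[0]
        return binary in policy.get("binaries", [])
    return False  # "deny" and anything unrecognized

approvals = {
    "infra": {"mode": "full"},
    "coder": {"mode": "allowlist", "binaries": ["git", "ls"]},
    # every other agent falls through to deny
}
assert allowed("coder", "git status", approvals)
assert not allowed("coder", "rm -rf /", approvals)
assert not allowed("writer", "ls", approvals)
```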
What Broke
Session Conflicts
Multiple agents sharing the same CLI backend caused session conflicts. Agent A would start a conversation, Agent B would try to resume it, and both would hang. The typing indicator would appear in Discord, spin for two minutes, then nothing.
The fix was straightforward (different session modes per agent), but diagnosing it took hours because the symptoms were subtle — agents just… stopped responding.
Identity Confusion
This was the most surreal bug. Agents reading Discord chat history would sometimes adopt the identity of another agent whose messages appeared in the thread.
A coding specialist would see messages from the coordinator in the chat history and start responding as the coordinator, using the coordinator’s name and mention format. Adding explicit identity sections to each agent’s instructions helped, but it was a constant battle.
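One version of that mitigation can be enforced in code rather than left to the model: pin the agent's identity above the history when assembling the prompt. A hedged sketch with hypothetical wording and function names:

```python
# Hypothetical prompt assembly: the identity preamble sits above the chat
# history so other agents' messages read as third-party transcript.
def build_prompt(agent_name: str, history: list[str]) -> str:
    identity = (
        f"You are {agent_name}. Messages from other agents appear below as "
        f"a third-party transcript. Never speak as anyone but {agent_name}."
    )
    return identity + "\n\n" + "\n".join(history)

prompt = build_prompt("coding-specialist", [
    "coordinator: please review PR #12",
    "coding-specialist: on it",
])
assert prompt.startswith("You are coding-specialist.")
```

It helps because the identity line is the first and most recent instruction relative to the history, but as the author notes, it remains a battle: instructions constrain the model only probabilistically.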
Backlog Death Spirals
The coordinator agent had requireMention: false, meaning it saw every message in its channels. It also had allowBots: true, so it could see specialist responses. Combined with a slow model, this created a positive feedback loop:
- Message arrives
- Coordinator processes it (5-10 seconds)
- During processing, more messages arrive (including bot responses)
- Each new message queues behind the current one
- Backlog grows faster than it drains
At peak, messages were taking 200+ seconds to get responses. The gateway logs showed “Slow listener detected” warnings every few seconds.
The fix was switching the coordinator to a faster model. But the fundamental problem was architectural: a single agent trying to process every message in real-time doesn’t scale when the model is slow.
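The spiral is just queueing arithmetic: with one serial worker, the backlog grows whenever the arrival rate exceeds the service rate. A back-of-envelope model with illustrative (not measured) numbers:

```python
# One serial worker: backlog grows when arrivals outpace service.
# All rates below are illustrative, not measurements from the real system.
def backlog_after(seconds: int, arrivals_per_min: float, service_secs: float) -> float:
    """Messages still waiting after `seconds` of sustained load."""
    arrival_rate = arrivals_per_min / 60.0          # msgs/sec in
    service_rate = 1.0 / service_secs               # msgs/sec out
    growth = max(arrival_rate - service_rate, 0.0)  # net queue growth per second
    return growth * seconds

# Slow model (7.5 s/response) under 12 msgs/min: the queue never drains.
assert backlog_after(600, arrivals_per_min=12, service_secs=7.5) > 0
# Faster model (4 s/response), same traffic: service outpaces arrivals.
assert backlog_after(600, arrivals_per_min=12, service_secs=4) == 0
```

Switching models moves service_secs below the arrival interval, which is why it worked; but any traffic spike that crosses the threshold restarts the spiral, which is why the fix was architectural in the end.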
Infinite Loops
Two coordinators in the same server, both with requireMention: false, both with allowBots: true. Agent A responds to a message. Agent B sees Agent A’s response and responds to it. Agent A sees Agent B’s response. Repeat forever.
The fix was channel-based routing in each agent’s instructions: “If you’re in #channel-x, respond. If not, stay silent.” But it was fragile — instructions are suggestions, not constraints. Agents would occasionally ignore their routing rules and respond in the wrong channel.
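A more robust version puts the routing rule in code, where it is a constraint rather than a suggestion: drop messages outside an agent's owned channels (and all bot-authored messages) before the model ever sees them. Channel and agent names here are hypothetical.

```python
# Hypothetical pre-model gate: enforce routing and break bot-to-bot loops
# in code instead of in instructions. Names are illustrative.
OWNED = {"coordinator-a": {"channel-x"}, "coordinator-b": {"channel-y"}}

def should_respond(agent: str, channel: str, author_is_bot: bool) -> bool:
    if channel not in OWNED.get(agent, set()):
        return False    # not our channel: stay silent, no exceptions
    if author_is_bot:
        return False    # never reply to another bot: breaks the A/B loop
    return True

assert should_respond("coordinator-a", "channel-x", author_is_bot=False)
assert not should_respond("coordinator-a", "channel-y", author_is_bot=False)
assert not should_respond("coordinator-a", "channel-x", author_is_bot=True)
```

An agent cannot "occasionally ignore" a filter that runs before inference.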
Config Drift
The VPS had a “live” config and the repo had a “template” config. Over time, these diverged. I’d fix something on the VPS via SSH, forget to update the repo, and the next deployment would revert my fix.
The gateway also had a doctor --fix feature that would restore config from backups on startup. This was helpful when config got corrupted, but it also silently reverted intentional changes.
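One cheap mitigation is a drift check that fails loudly instead of reverting silently: compare the live config against the git-tracked template on every deploy. A minimal sketch; the paths are assumptions.

```python
# Hypothetical drift check: byte-compare live config against the repo
# template and surface divergence instead of silently restoring a backup.
import hashlib
import pathlib

def same_config(live: pathlib.Path, template: pathlib.Path) -> bool:
    """True when the live config and the repo template are byte-identical."""
    digest = lambda p: hashlib.sha256(p.read_bytes()).hexdigest()
    return digest(live) == digest(template)
```

Wired into the deploy script, this turns "the next deployment reverted my fix" into "the next deployment refused to run until I reconciled the two copies".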
The Deeper Lessons
Complexity is a Liability, Not a Feature
16 agents felt productive. Each one had a clear role, a well-defined workspace, and specific responsibilities. On paper, the system was beautifully organized.
In practice, every agent was a surface area for bugs. Each one needed:
- Instructions that didn’t conflict with other agents
- Model configuration that balanced speed and quality
- Channel routing rules that didn’t overlap
- Security permissions that were tight enough but not too tight
- Memory that was useful but didn’t pollute context
- Cron jobs that ran on schedule without stepping on each other
Managing 16 agents wasn’t 16x the work of managing 1. It was closer to 16-squared, because every agent interacted with every other agent through shared channels, shared memory, and shared infrastructure.
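The "closer to 16-squared" intuition is just the pairwise-interaction count, which grows quadratically with the number of agents:

```python
# Potential agent-to-agent relationships: n choose 2.
def interaction_pairs(n: int) -> int:
    return n * (n - 1) // 2

assert interaction_pairs(2) == 1     # two agents: one relationship to manage
assert interaction_pairs(16) == 120  # sixteen agents: 120 relationships
```

Every one of those 120 pairs is a place where instructions can conflict, channels can overlap, or a feedback loop can form.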
Two Focused Agents Beat Sixteen Scattered Ones
After simplifying to 2 agents, the system became dramatically more reliable. Not because 2 agents are inherently better, but because:
- No coordination overhead between agents
- No identity confusion (only 2 identities to maintain)
- No routing rules (each agent owns its channels)
- No backlog spirals (fewer messages to process)
- Simpler config (2 entries instead of 16 in every config file)
The surprising part: I didn’t lose any meaningful capability. The 14 removed agents were doing work that either wasn’t needed or could be done by the remaining 2 on demand.
Memory-as-Code Beats Databases
Storing agent memory in git-tracked markdown files was better than any database approach I considered:
- Transparent: I could read and edit memory directly
- Versioned: Full history of every memory change
- Durable: Survived every crash, reset, and migration
- Portable: Moved between servers by cloning a repo
- Debuggable: git diff showed exactly what changed
- Collaborative: I could commit memory changes alongside code changes
The tradeoff was speed (file I/O vs database queries), but for agent memory that’s updated every few hours, it didn’t matter.
Default-Deny Security from Day One
Adding security after the fact is painful. Adding it from day one is trivial. Every new agent started with deny access, and I explicitly granted permissions as needed.
This prevented:
- A creative writing agent from running shell commands
- A research agent from accessing the file system
- A project coordinator from modifying infrastructure
The cost was a few lines of config per agent. The benefit was never having to explain to a client why their data was accessed by the wrong AI agent.
The Documentation Test
I had a document explaining how agents talked to each other. It included a routing diagram, a peer-binding table, a channel-ownership matrix, and a delegation protocol.
When I realized that document existed, I should have simplified the system immediately. If you need a diagram to explain how your agents communicate, you have too many agents.
Where It Ended
16 agents became 2. The VPS is the same. The deployment is the same. The memory system is the same. The capability is approximately the same.
What’s different:
- Config files: 16 agent entries → 2
- Discord bots: 5 → 2
- Cron jobs: 23 → 2
- GitHub secrets: 15+ → 7
- Time spent on agent issues: Hours per week → almost zero
- Mean time to diagnose problems: 30+ minutes → under 5
The previous setup is archived in a private repository. I’m not embarrassed by it — it was a valuable experiment. But the lesson is clear: start with the simplest thing that works, and only add complexity when you’ve proven you need it.
Building multi-agent systems is like building microservices. Everyone wants to start with the distributed architecture. Almost everyone should start with the monolith.
A Note on AI Agent Infrastructure
If you’re building your own multi-agent setup, here’s the practical advice:
- Start with one agent. Add a second only when you have a specific problem the first can’t solve.
- Track memory in git. You’ll thank yourself when things go wrong.
- Default-deny everything. Grant permissions explicitly.
- Use fast models for coordinators. Slow models + high message volume = death spiral.
- Make agents ephemeral when possible. Persistent context accumulates garbage.
- Deploy from git, not from SSH. Config drift will eat you alive.
The most important lesson: the number of agents is not a measure of sophistication. It’s a measure of complexity. And complexity is the enemy of reliability.
Overall: a lot of lessons packed into ten weeks, and I had a lot of fun learning them.