What I Learned Running 16 AI Agents on a Single VPS
For about 10 weeks, I ran 16 AI agents on a single VPS. They coordinated across Discord and Telegram, managed projects, wrote code, posted briefings, and — eventually — taught me that I’d built the wrong thing.
This is the story of what worked, what didn’t, and why I replaced all 16 with just 2.
The Setup
One Hetzner VPS. 8 vCPUs, 32GB RAM, 240GB NVMe. Running an AI gateway that managed:
- 16 agents with different roles (coordinators, specialists, infrastructure)
- 5 Discord bots serving channels across a private server
- 2 Telegram bots for direct messaging
- A peer-routing system where one bot could serve multiple agents
- A shared knowledge base indexed with semantic search
- 23 cron jobs for briefings, reflections, and maintenance
- Git-tracked memory that synced between VPS and GitHub every few hours
The architecture was ambitious. A coordinator agent would see all messages in its channels. When a task required coding, it would delegate to a specialist agent via Discord @mention. Specialists were ephemeral — they’d respond once, clear their context, and wait for the next mention.
Project-specific agents monitored their own channels, posted morning briefings at 8:30 AM, ran end-of-day reflections, and maintained project memory in git-tracked markdown files.
It all worked. And that was part of the problem.
What Worked Well
Before I get into the failures, some things genuinely impressed me.
Git-Tracked Memory
Every agent’s memory was stored in markdown files tracked by git. A cron job committed and pushed changes every few hours. This meant:
- Agent memory survived crashes, restarts, and resets
- I could review what agents were “thinking” via git history
- Rolling back a bad memory state was just git revert
- Memory was human-readable, not locked in a database
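The commit-and-push cron job can be sketched in a few lines. This is a minimal illustration, not the author's actual script; the repo path and commit-message format are assumptions.

```python
# Hypothetical memory-sync job: stage everything, commit only when something
# actually changed, then push. Paths and message format are assumptions.
import datetime
import pathlib
import subprocess

def sync_memory(repo_dir: pathlib.Path) -> None:
    """Commit and push agent-memory changes; a no-op when nothing changed."""
    def git(*args):
        return subprocess.run(["git", "-C", str(repo_dir), *args], check=True)

    git("add", "-A")
    # `git diff --cached --quiet` exits non-zero only when the staged tree differs.
    staged = subprocess.run(["git", "-C", str(repo_dir), "diff", "--cached", "--quiet"])
    if staged.returncode != 0:
        stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%d %H:%M")
        git("commit", "-m", f"memory sync {stamp}")
        git("push", "origin", "HEAD")
```

Run from cron every few hours, this gives exactly the properties above: every memory state is a commit you can read, diff, and revert.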
This was the single best architectural decision. When everything else got complicated, memory-as-code stayed simple and reliable.
Direct Clone Deployment
The VPS directory was the git repository. Deployment was literally git pull followed by a service restart. No Docker builds, no artifact shipping, no container registries. Config files, agent workspaces, and scripts all lived in the repo.
This eliminated an entire class of deployment bugs. What I tested locally was exactly what ran in production.
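The whole deploy path fits in two commands. A sketch under stated assumptions: the repo path, the service name, and the use of systemd are all hypothetical, and the injectable run hook exists only so the function can be exercised without a real server.

```python
# Deploy = fast-forward pull, then restart. Nothing else ships.
# repo_dir, service, and systemd usage are assumptions for illustration.
import subprocess

def deploy(repo_dir: str = "/srv/gateway", service: str = "gateway",
           run=subprocess.run) -> None:
    """Pull the repo the VPS runs from, then bounce the service."""
    run(["git", "-C", repo_dir, "pull", "--ff-only"], check=True)
    run(["systemctl", "restart", service], check=True)
```

The --ff-only flag is the one safeguard worth keeping: if the live directory has drifted from the remote, the deploy fails loudly instead of silently merging.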
Ephemeral Specialists
Instead of giving every agent persistent state, specialists were designed to be stateless. Each invocation would:
- Reset context (clean slate)
- Complete the task
- Report back
- Stop
This avoided the biggest problem with persistent agents: context pollution. A coding agent that had just discussed marketing would sometimes produce marketing-flavored code suggestions. Ephemeral agents didn’t have that problem.
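The lifecycle above can be sketched as a wrapper that builds a fresh context per invocation and discards it on return. All names here are hypothetical; the point is that no state survives between calls.

```python
# A stateless specialist: fresh context per mention, nothing persists.
# EphemeralSpecialist and its fields are illustrative names, not a real API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EphemeralSpecialist:
    respond: Callable[[list[str]], str]   # the underlying model call (assumed)

    def handle(self, mention: str) -> str:
        context = [mention]               # reset: context starts from the mention alone
        reply = self.respond(context)     # complete the task
        return reply                      # report back; local context is discarded

# Two calls share no state, so an earlier topic can't flavor a later answer.
echo = EphemeralSpecialist(respond=lambda ctx: f"done: {ctx[-1]}")
assert echo.handle("write tests") == "done: write tests"
assert echo.handle("fix the bug") == "done: fix the bug"
```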
Default-Deny Security
Every agent’s shell access was controlled by an approvals file:
- deny → no shell access at all
- allowlist → only approved binaries
- full → unrestricted
Most agents had deny. Only the infrastructure agent had full. This was added from day one, not bolted on later. It prevented several accidental rm -rf situations when agents got creative with shell commands.
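The check itself is small. A minimal sketch, assuming a dict-shaped approvals file; the field names and agent names are hypothetical, and the key property is that anything unrecognized falls through to deny.

```python
# Default-deny approvals check. Field names are assumptions for illustration.
import shlex

def allowed(agent: str, command: str, approvals: dict) -> bool:
    """Unknown agents, unknown modes, unlisted binaries: all denied."""
    policy = approvals.get(agent, {"mode": "deny"})
    mode = policy.get("mode", "deny")
    if mode == "full":
        return True
    if mode == "allowlist":
        binary = shlex.split(command)[0]
        return binary in policy.get("binaries", [])
    return False  # "deny" and anything unrecognized

approvals = {
    "infra": {"mode": "full"},
    "coder": {"mode": "allowlist", "binaries": ["git", "ls"]},
    # every other agent falls through to deny
}
assert allowed("coder", "git status", approvals)
assert not allowed("coder", "rm -rf /", approvals)
assert not allowed("writer", "ls", approvals)
```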
What Broke
Session Conflicts
Multiple agents sharing the same CLI backend caused session conflicts. Agent A would start a conversation, Agent B would try to resume it, and both would hang. The typing indicator would appear in Discord, spin for two minutes, then nothing.
The fix was straightforward (different session modes per agent), but diagnosing it took hours because the symptoms were subtle — agents just… stopped responding.
Identity Confusion
This was the most surreal bug. Agents reading Discord chat history would sometimes adopt the identity of another agent whose messages appeared in the thread.
A coding specialist would see messages from the coordinator in the chat history and start responding as the coordinator, using the coordinator’s name and mention format. Adding explicit identity sections to each agent’s instructions helped, but it was a constant battle.
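One version of that mitigation can be enforced in code rather than left to the model: pin the agent's identity above the history when assembling the prompt. A hedged sketch with hypothetical wording and function names:

```python
# Hypothetical prompt assembly: the identity preamble sits above the chat
# history so other agents' messages read as third-party transcript.
def build_prompt(agent_name: str, history: list[str]) -> str:
    identity = (
        f"You are {agent_name}. Messages from other agents appear below as "
        f"a third-party transcript. Never speak as anyone but {agent_name}."
    )
    return identity + "\n\n" + "\n".join(history)

prompt = build_prompt("coding-specialist", [
    "coordinator: please review PR #12",
    "coding-specialist: on it",
])
assert prompt.startswith("You are coding-specialist.")
```

It helps because the identity line is the first and most recent instruction relative to the history, but as the author notes, it remains a battle: instructions constrain the model only probabilistically.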
Backlog Death Spirals
The coordinator agent had requireMention: false, meaning it saw every message in its channels. It also had allowBots: true, so it could see specialist responses. Combined with a slow model, this created a positive feedback loop:
- Message arrives
- Coordinator processes it (5-10 seconds)
- During processing, more messages arrive (including bot responses)
- Each new message queues behind the current one
- Backlog grows faster than it drains
At peak, messages were taking 200+ seconds to get responses. The gateway logs showed “Slow listener detected” warnings every few seconds.
The fix was switching the coordinator to a faster model. But the fundamental problem was architectural: a single agent trying to process every message in real-time doesn’t scale when the model is slow.
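The spiral is just queueing arithmetic: with one serial worker, the backlog grows whenever the arrival rate exceeds the service rate. A back-of-envelope model with illustrative (not measured) numbers:

```python
# One serial worker: backlog grows when arrivals outpace service.
# All rates below are illustrative, not measurements from the real system.
def backlog_after(seconds: int, arrivals_per_min: float, service_secs: float) -> float:
    """Messages still waiting after `seconds` of sustained load."""
    arrival_rate = arrivals_per_min / 60.0          # msgs/sec in
    service_rate = 1.0 / service_secs               # msgs/sec out
    growth = max(arrival_rate - service_rate, 0.0)  # net queue growth per second
    return growth * seconds

# Slow model (7.5 s/response) under 12 msgs/min: the queue never drains.
assert backlog_after(600, arrivals_per_min=12, service_secs=7.5) > 0
# Faster model (4 s/response), same traffic: service outpaces arrivals.
assert backlog_after(600, arrivals_per_min=12, service_secs=4) == 0
```

Switching models moves service_secs below the arrival interval, which is why it worked; but any traffic spike that crosses the threshold restarts the spiral, which is why the fix was architectural in the end.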
Infinite Loops
Two coordinators in the same server, both with requireMention: false, both with allowBots: true. Agent A responds to a message. Agent B sees Agent A’s response and responds to it. Agent A sees Agent B’s response. Repeat forever.
The fix was channel-based routing in each agent’s instructions: “If you’re in #channel-x, respond. If not, stay silent.” But it was fragile — instructions are suggestions, not constraints. Agents would occasionally ignore their routing rules and respond in the wrong channel.
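A more robust version puts the routing rule in code, where it is a constraint rather than a suggestion: drop messages outside an agent's owned channels (and all bot-authored messages) before the model ever sees them. Channel and agent names here are hypothetical.

```python
# Hypothetical pre-model gate: enforce routing and break bot-to-bot loops
# in code instead of in instructions. Names are illustrative.
OWNED = {"coordinator-a": {"channel-x"}, "coordinator-b": {"channel-y"}}

def should_respond(agent: str, channel: str, author_is_bot: bool) -> bool:
    if channel not in OWNED.get(agent, set()):
        return False    # not our channel: stay silent, no exceptions
    if author_is_bot:
        return False    # never reply to another bot: breaks the A/B loop
    return True

assert should_respond("coordinator-a", "channel-x", author_is_bot=False)
assert not should_respond("coordinator-a", "channel-y", author_is_bot=False)
assert not should_respond("coordinator-a", "channel-x", author_is_bot=True)
```

An agent cannot "occasionally ignore" a filter that runs before inference.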
Config Drift
The VPS had a “live” config and the repo had a “template” config. Over time, these diverged. I’d fix something on the VPS via SSH, forget to update the repo, and the next deployment would revert my fix.
The gateway also had a doctor --fix feature that would restore config from backups on startup. This was helpful when config got corrupted, but it also silently reverted intentional changes.
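One cheap mitigation is a drift check that fails loudly instead of reverting silently: compare the live config against the git-tracked template on every deploy. A minimal sketch; the paths are assumptions.

```python
# Hypothetical drift check: byte-compare live config against the repo
# template and surface divergence instead of silently restoring a backup.
import hashlib
import pathlib

def same_config(live: pathlib.Path, template: pathlib.Path) -> bool:
    """True when the live config and the repo template are byte-identical."""
    digest = lambda p: hashlib.sha256(p.read_bytes()).hexdigest()
    return digest(live) == digest(template)
```

Wired into the deploy script, this turns "the next deployment reverted my fix" into "the next deployment refused to run until I reconciled the two copies".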
The Deeper Lessons
Complexity is a Liability, Not a Feature
16 agents felt productive. Each one had a clear role, a well-defined workspace, and specific responsibilities. On paper, the system was beautifully organized.
In practice, every agent was a surface area for bugs. Each one needed:
- Instructions that didn’t conflict with other agents
- Model configuration that balanced speed and quality
- Channel routing rules that didn’t overlap
- Security permissions that were tight enough but not too tight
- Memory that was useful but didn’t pollute context
- Cron jobs that ran on schedule without stepping on each other
Managing 16 agents wasn’t 16x the work of managing 1. It was closer to 16-squared, because every agent interacted with every other agent through shared channels, shared memory, and shared infrastructure.
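The "closer to 16-squared" intuition is just the pairwise-interaction count, which grows quadratically with the number of agents:

```python
# Potential agent-to-agent relationships: n choose 2.
def interaction_pairs(n: int) -> int:
    return n * (n - 1) // 2

assert interaction_pairs(2) == 1     # two agents: one relationship to manage
assert interaction_pairs(16) == 120  # sixteen agents: 120 relationships
```

Every one of those 120 pairs is a place where instructions can conflict, channels can overlap, or a feedback loop can form.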
Two Focused Agents Beat Sixteen Scattered Ones
After simplifying to 2 agents, the system became dramatically more reliable. Not because 2 agents are inherently better, but because:
- No coordination overhead between agents
- No identity confusion (only 2 identities to maintain)
- No routing rules (each agent owns its channels)
- No backlog spirals (fewer messages to process)
- Simpler config (2 entries instead of 16 in every config file)
The surprising part: I didn’t lose any meaningful capability. The 14 removed agents were doing work that either wasn’t needed or could be done by the remaining 2 on demand.
Memory-as-Code Beats Databases
Storing agent memory in git-tracked markdown files was better than any database approach I considered:
- Transparent: I could read and edit memory directly
- Versioned: Full history of every memory change
- Durable: Survived every crash, reset, and migration
- Portable: Moved between servers by cloning a repo
- Debuggable: git diff showed exactly what changed
- Collaborative: I could commit memory changes alongside code changes
The tradeoff was speed (file I/O vs database queries), but for agent memory that’s updated every few hours, it didn’t matter.
Default-Deny Security from Day One
Adding security after the fact is painful. Adding it from day one is trivial. Every new agent started with deny access, and I explicitly granted permissions as needed.
This prevented:
- A creative writing agent from running shell commands
- A research agent from accessing the file system
- A project coordinator from modifying infrastructure
The cost was a few lines of config per agent. The benefit was never having to explain to a client why their data was accessed by the wrong AI agent.
The Documentation Test
I had a document explaining how agents talked to each other. It included a routing diagram, a peer-binding table, a channel-ownership matrix, and a delegation protocol.
When I realized that document existed, I should have simplified the system immediately. If you need a diagram to explain how your agents communicate, you have too many agents.
Where It Ended
16 agents became 2. The VPS is the same. The deployment is the same. The memory system is the same. The capability is approximately the same.
What’s different:
- Config files: 16 agent entries → 2
- Discord bots: 5 → 2
- Cron jobs: 23 → 2
- GitHub secrets: 15+ → 7
- Time spent on agent issues: Hours per week → almost zero
- Mean time to diagnose problems: 30+ minutes → under 5
The previous setup is archived in a private repository. I’m not embarrassed by it — it was a valuable experiment. But the lesson is clear: start with the simplest thing that works, and only add complexity when you’ve proven you need it.
Building multi-agent systems is like building microservices. Everyone wants to start with the distributed architecture. Almost everyone should start with the monolith.
A Note on AI Agent Infrastructure
If you’re building your own multi-agent setup, here’s the practical advice:
- Start with one agent. Add a second only when you have a specific problem the first can’t solve.
- Track memory in git. You’ll thank yourself when things go wrong.
- Default-deny everything. Grant permissions explicitly.
- Use fast models for coordinators. Slow models + high message volume = death spiral.
- Make agents ephemeral when possible. Persistent context accumulates garbage.
- Deploy from git, not from SSH. Config drift will eat you alive.
The most important lesson: the number of agents is not a measure of sophistication. It’s a measure of complexity. And complexity is the enemy of reliability.
Overall: a lot of lessons packed into ten weeks, and I had a lot of fun learning them.