[OK] loading post: running-ai-agents-on-a-5-vps.html
[OK] language: en · read: ~6 min
~/blog/running-ai-agents-on-a-5-vps.md ● EN
~ / blog / running-ai-agents-on-a-5-vps

Running AI agents on a $5 VPS (without melting it)

Most 'AI agent' tutorials assume a 16-core box with an H100. Here's what I actually do on 1 vCPU and 1 GB of RAM.

12 JUN 2026 by hendra 6 min read tags: ai ops tutorial

Every "AI agent" tutorial I read assumes you have a 16-core machine with an H100 in it and a $200/month OpenAI bill. Most of what I actually do on a daily basis fits in a $5 VPS with 1 vCPU and 1 GB of RAM. Here is the setup that survives.

The hardware, for real

I'm on a 1 vCPU, 1 GB RAM, 25 GB SSD box in Singapore. Latency to most of my tools is fine. The constraint isn't CPU — it's RAM and outbound bandwidth. RAM gets eaten by long contexts; bandwidth gets eaten by big tool results.

Rule #1: never run a model locally

I tried. Llama-3-8B on a 1 GB box is a slideshow, and the swap-thrash will make your whole server unresponsive. Use a hosted endpoint. The $5 tier on most providers gives you more than enough tokens for a single-agent personal use case.

Rule #2: keep context small

Long contexts are billed per token on input. They are also slow. I trim aggressively. The system prompt is under 800 tokens. Tool results get summarized. Old turns get rolled up.

A common mistake: passing the entire previous tool output into the next turn when only one field is needed. Be ruthless.

Rule #3: rate-limit yourself

I add a 1.5-second sleep between agent turns. This costs me nothing in wall time but saves me from a runaway loop burning $20 in five minutes because I forgot to cap max_iterations.

import time
for step in agent.run():
    handle(step)
    time.sleep(1.5)

Rule #4: pick the right model for the right task

A cheap 8B model is fine for "summarize this text" or "extract these fields." A bigger model is needed for "plan a multi-step migration." Don't use a 70B model to write a commit message.

I keep a routing table:

TASKS = {
  "summarize":  "minimax-m3",
  "extract":    "minimax-m3",
  "plan":       "claude-sonnet",
  "code":       "claude-sonnet",
  "brainstorm": "gpt-5.5",
}

Rule #5: persist everything

SQLite for session state. JSONL for the agent's full trajectory. If the box dies at 3am, I want to be able to reattach a session and continue the conversation from message 47, not from scratch.

What I run on this $5

  • A personal Telegram bot that talks to me, with the agent framework Hermes
  • A cron job that summarizes my inbox every morning
  • A scraper that watches a few RSS feeds and pings me when something interesting lands
  • A RAG index over my own notes (small, fits in 200 MB)

None of these need a GPU. None of these need a huge context window. All of them get by on discipline.


— Hendra, 2026-06-12

EOF // EOF // signal lost, but words remain