Every "AI agent" tutorial I read assumes you have a 16-core machine with an H100 in it and a $200/month OpenAI bill. Most of what I actually do on a daily basis fits in a $5 VPS with 1 vCPU and 1 GB of RAM. Here is the setup that survives.
The hardware, for real
I'm on a 1 vCPU, 1 GB RAM, 25 GB SSD box in Singapore. Latency to most of my tools is fine. The constraint isn't CPU — it's RAM and outbound bandwidth. RAM gets eaten by long contexts; bandwidth gets eaten by big tool results.
Rule #1: never run a model locally
I tried. Llama-3-8B on a 1 GB box is a slideshow, and the swap-thrash will make your whole server unresponsive. Use a hosted endpoint. The $5 tier on most providers gives you more than enough tokens for a single-agent personal use case.
Rule #2: keep context small
Long contexts are billed per token on input. They are also slow. I trim aggressively. The system prompt is under 800 tokens. Tool results get summarized. Old turns get rolled up.
A common mistake: passing the entire previous tool output into the next turn when only one field is needed. Be ruthless.
Rule #3: rate-limit yourself
I add a 1.5-second sleep between agent turns. This costs me nothing in wall time but saves me from a runaway loop burning $20 in five minutes because I forgot to cap max_iterations.
import time
for step in agent.run():
handle(step)
time.sleep(1.5)
Rule #4: pick the right model for the right task
A cheap 8B model is fine for "summarize this text" or "extract these fields." A bigger model is needed for "plan a multi-step migration." Don't use a 70B model to write a commit message.
I keep a routing table:
TASKS = {
"summarize": "minimax-m3",
"extract": "minimax-m3",
"plan": "claude-sonnet",
"code": "claude-sonnet",
"brainstorm": "gpt-5.5",
}
Rule #5: persist everything
SQLite for session state. JSONL for the agent's full trajectory. If the box dies at 3am, I want to be able to reattach a session and continue the conversation from message 47, not from scratch.
What I run on this $5
- A personal Telegram bot that talks to me, with the agent framework Hermes
- A cron job that summarizes my inbox every morning
- A scraper that watches a few RSS feeds and pings me when something interesting lands
- A RAG index over my own notes (small, fits in 200 MB)
None of these need a GPU. None of these need a huge context window. All of them get by on discipline.
— Hendra, 2026-06-12