Low-Cost Autoresearch withHPC-AI.COM
Based on experiments by Tobias Lee — see original post: https://x.com/_TobiasLee/status/203846732751615264
Project: Andrej Karpathy's Autoresearch https://github.com/karpathy/autoresearch
Goal: Reproduce an autonomous AI research loop with minimal cost.
Author: Li Lei
Overview
Autoresearch is a novel experiment where an AI agent autonomously modifies and retrains a GPT model in a real training environment:
- Train for 5 minutes per round → evaluate model using val_bpb (bits per byte, lower is better) → keep or discard changes → repeat.
- The agent can change almost anything: model architecture, hyperparameters, optimizer, batch size.
- Core components: GPU machine, smart agent, and a robust training harness.
The original setup could be resource-intensive, but using HPC-AI.com's Model APIs, we reproduced the full pipeline for under $2.
Implementation
Hardware & API Setup
- GPU instance: H200 x 1.5 hours → ~$1.60 (20–25 rounds).
- API calls: agent queries LLM for code analysis & decisions → ~$0.30.
- Training harness: OpenCode.
- Total cost: <$2.
HPC-AI Advantages
- Pay-per-use, no subscription required, K2.5 spot instance is cost-effective.
- Pre-installed CUDA + PyTorch, ready out-of-the-box.
- Stable network ensures uninterrupted dataset download.
Step-by-Step Reproduction
- Register on HPC-AI and create an H200 instance, obtain API key.
- Connect to instance and clone the autoresearch repo.
- Install dependencies and prepare data (~2 minutes).
- Run one manual round to verify the environment.
- Install OpenCode and directly connect to HPC-AI's Model API, no local deployment is needed. Use your API key to access K2.5 / M2.5, then import and configure the custom provider documentation.
- Input agent prompt and start autoresearch loop.
- Let it run (overnight recommended) and review results next day.
Results

- 2 hours of autonomous runs lowered val_bpb from 0.994 → 0.990.
- Agent mainly tuned learning rate and experimented with attention patterns.
- 28 attempts in total, 7 successful (~25% success rate).
- Demonstrates the potential of AI-driven research loops at very low cost.
Key Takeaways
- Autoresearch represents a new AI research paradigm:
- Human-defined boundaries (via program.md).
- AI-driven exploration (code modification, experiment design, evaluation).
- Future PhD-style workflow could involve supervising agents overnight and refining research prompts daily.
Tip for Beginners: Run at least 2 hours (~20 rounds) for initial experience; overnight (~100 rounds) for full potential.
Ready to Power your AI Workloads?
Join thousands of developers and researchers who are building the future with our HPC-AI platform.

