Self-Hosted LLMs in Production: What It Actually Takes to Cut API Costs
A real case study on running 80B-parameter models locally: hardware, costs, tradeoffs, and the numbers from a production self-hosted AI stack serving four concurrent users.
How we ported six patterns from NousResearch's Hermes Agent, including FTS5 recall, a dialectic user model, auto-skill drafts, and trajectory logging, onto our four-worker Claude Code system in one session.
Building an algorithmic paper trading system, a massive app sprint, RxLog and SiteSnap updates, and getting the Play Store submission pipeline fully automated.
A massive app verification blitz, Nova drops Telegram for good, a cache drive emergency, and a game side project hits 27 dev sessions.
A packed week of infrastructure hardening, app releases, ESP32 hardware design, and building verification systems that codify hard-won lessons.
GriswoldLabs launches with a homelab buildout, dual V100 GPUs, local AI inference, and the foundation for everything to come.