AI Integration
Nova — Personal AI Assistant
A self-hosted AI assistant with 56 integrated tools, mobile + voice access, and a routing layer that hands 90% of queries to a local model — keeping cloud spend at lunch-money levels.
Duration
8 months in production
Cloud spend
~$50/mo cloud spend (was $500+/mo on equivalent SaaS)
Stack
Node.js · React Native · Ollama (local LLM) · Claude API · WebSocket · Docker
The Problem
Off-the-shelf AI assistants (ChatGPT, Claude, Gemini, etc.) all share three frustrations: they don't know your stack, your data leaves your network, and the bill scales with usage. For someone running a home automation system, a media server, a trading bot, four niche websites, and 48 mobile apps, the per-token costs alone would have been $500+/month. The privacy story would have been worse — every config file, every email draft, every scratchpad would touch a cloud provider.
The Solution
Nova is a custom AI assistant that runs as a containerized service on a Dell R7525 with 3× NVIDIA Tesla V100 GPUs (96 GB VRAM total). It exposes a WebSocket API consumed by a React Native mobile app and a web dashboard. The architectural bet is three-tier model routing:
- Tier 1 — Regex & deterministic dispatch. Commands like "lights off" or "what's the cache fill?" hit a router that maps directly to a tool call. Latency: <50ms. Cost: zero.
- Tier 2 — Local LLM (Ollama, qwen3-coder-next, 80B MoE). Anything ambiguous goes to the local model with the full toolset attached. Latency: 1-3s. Cost: zero (already paid for the GPUs).
- Tier 3 — Cloud API (Claude / GPT). Only when the local model self-reports low confidence or the task requires the very best reasoning. Latency: 2-5s. Cost: ~$0.02-0.20 per call.
The router measures which tier handled the query and writes a daily breakdown. In practice, ~90% of traffic stops at Tier 1 or 2 — leaving Tier 3 to handle the truly hard requests where paying for Claude actually moves the needle.
What Got Integrated
56 tools across categories:
- Home & infrastructure — Home Assistant control, Docker container management, Unraid status, server health.
- Personal data — Calendar, email triage, password vault read, contact lookup, document search across Nextcloud.
- Build & dev — App build status, deploy triggers, Gitea repo browsing, log tailing.
- Finance & research — Stock quotes, portfolio diffing, SEC filing pulls, scheduled cron monitors with push alerts.
- Communication — Push notifications to phone, email composition with attachments, SMS via Twilio.
Results
- Cloud cost: ~$50/month instead of an estimated $500+/month with equivalent cloud-only setup.
- Latency: Sub-second response for ~70% of queries (Tier 1 + warm Tier 2).
- Privacy: Personal data never leaves the LAN unless Tier 3 is invoked, and Tier 3 calls are logged and reviewable.
- Reliability: 99.4% uptime over the last 6 months, including two server reboots and one GPU driver upgrade.
- Replaced: ChatGPT Plus, Google Assistant, three separate dashboard apps, a $20/mo notification service, manual SSH-into-server workflows.
What Surprised Us
Tool definitions matter more than prompts. The single biggest accuracy gain came from rewriting tool descriptions — not refining the system prompt. A tool description that says "use this when the user mentions X" pulls correct invocations dramatically better than vague capability lists.
WebSocket beat polling and beat raw HTTP for mobile. A persistent WebSocket connection lets the assistant push progress updates ("checking your inbox… found 3 new") instead of going silent for 5 seconds while the model thinks. UX delta is large; engineering effort was small.
Local models are good enough for 90% of personal-assistant work. The hard part isn't model capability — it's tool routing, prompt engineering, and keeping the conversation context tight. Throwing GPT-4o at a problem that qwen3-coder handles in 800ms locally is bad engineering.
What We'd Build For You
Most companies don't need a 96GB GPU — but they almost certainly have:
- A bunch of internal SaaS subscriptions whose data could be queried by an LLM if it had API access.
- A handful of repetitive workflows (status reports, ticket triage, customer FAQ) that an assistant could front-line.
- A privacy or compliance constraint that makes "send everything to OpenAI" a non-starter.
For ~$5K-$15K we can build a stripped-down Nova for your stack: 5-15 tools, your authentication, your model preference (cloud, local, or hybrid), deployed to your infrastructure, with documentation for your team to extend.
Interested?
See the services page for engagement options, or email [email protected] with a one-paragraph description of your use case.