← All case studies

AI Integration

Nova — Personal AI Assistant

A self-hosted AI assistant with 56 integrated tools, mobile + voice access, and a routing layer that hands 90% of queries to a local model — keeping cloud spend at lunch-money levels.

Duration

8 months in production

Cloud spend

~$50/mo cloud spend (was $500+/mo on equivalent SaaS)

Stack

Node.js · React Native · Ollama (local LLM) · Claude API · WebSocket · Docker

The Problem

Off-the-shelf AI assistants (ChatGPT, Claude, Gemini, etc.) all share three frustrations: they don't know your stack, your data leaves your network, and the bill scales with usage. For someone running a home automation system, a media server, a trading bot, four niche websites, and 48 mobile apps, the per-token costs alone would have been $500+/month. The privacy story would have been worse — every config file, every email draft, every scratchpad would touch a cloud provider.

The Solution

Nova is a custom AI assistant that runs as a containerized service on a Dell R7525 with 3× NVIDIA Tesla V100 GPUs (96 GB VRAM total). It exposes a WebSocket API consumed by a React Native mobile app and a web dashboard. The architectural bet is three-tier model routing:

  1. Tier 1 — Regex & deterministic dispatch. Commands like "lights off" or "what's the cache fill?" hit a router that maps directly to a tool call. Latency: <50ms. Cost: zero.
  2. Tier 2 — Local LLM (Ollama, qwen3-coder-next, 80B MoE). Anything ambiguous goes to the local model with the full toolset attached. Latency: 1-3s. Cost: zero (already paid for the GPUs).
  3. Tier 3 — Cloud API (Claude / GPT). Only when the local model self-reports low confidence or the task requires the very best reasoning. Latency: 2-5s. Cost: ~$0.02-0.20 per call.

The router measures which tier handled the query and writes a daily breakdown. In practice, ~90% of traffic stops at Tier 1 or 2 — leaving Tier 3 to handle the truly hard requests where paying for Claude actually moves the needle.

What Got Integrated

56 tools across categories:

  • Home & infrastructure — Home Assistant control, Docker container management, Unraid status, server health.
  • Personal data — Calendar, email triage, password vault read, contact lookup, document search across Nextcloud.
  • Build & dev — App build status, deploy triggers, Gitea repo browsing, log tailing.
  • Finance & research — Stock quotes, portfolio diffing, SEC filing pulls, scheduled cron monitors with push alerts.
  • Communication — Push notifications to phone, email composition with attachments, SMS via Twilio.

Results

  • Cloud cost: ~$50/month instead of an estimated $500+/month with equivalent cloud-only setup.
  • Latency: Sub-second response for ~70% of queries (Tier 1 + warm Tier 2).
  • Privacy: Personal data never leaves the LAN unless Tier 3 is invoked, and Tier 3 calls are logged and reviewable.
  • Reliability: 99.4% uptime over the last 6 months, including two server reboots and one GPU driver upgrade.
  • Replaced: ChatGPT Plus, Google Assistant, three separate dashboard apps, a $20/mo notification service, manual SSH-into-server workflows.

What Surprised Us

Tool definitions matter more than prompts. The single biggest accuracy gain came from rewriting tool descriptions — not refining the system prompt. A tool description that says "use this when the user mentions X" pulls correct invocations dramatically better than vague capability lists.

WebSocket beat polling and beat raw HTTP for mobile. A persistent WebSocket connection lets the assistant push progress updates ("checking your inbox… found 3 new") instead of going silent for 5 seconds while the model thinks. UX delta is large; engineering effort was small.

Local models are good enough for 90% of personal-assistant work. The hard part isn't model capability — it's tool routing, prompt engineering, and keeping the conversation context tight. Throwing GPT-4o at a problem that qwen3-coder handles in 800ms locally is bad engineering.

What We'd Build For You

Most companies don't need a 96GB GPU — but they almost certainly have:

  • A bunch of internal SaaS subscriptions whose data could be queried by an LLM if it had API access.
  • A handful of repetitive workflows (status reports, ticket triage, customer FAQ) that an assistant could front-line.
  • A privacy or compliance constraint that makes "send everything to OpenAI" a non-starter.

For ~$5K-$15K we can build a stripped-down Nova for your stack: 5-15 tools, your authentication, your model preference (cloud, local, or hybrid), deployed to your infrastructure, with documentation for your team to extend.

Interested?

See the services page for engagement options, or email [email protected] with a one-paragraph description of your use case.