
Local LLM

Active

Self-hosted AI inference with Ollama. Running Qwen 2.5 models for general tasks and code generation without cloud dependencies.

Ollama AI Self-hosted

Configuration

Container: ollama/ollama
RAM Limit: 16GB
API: REST (local only)
Access: LAN only (not exposed externally)
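The local-only REST API can be exercised with a few lines of standard-library Python. This is a minimal sketch assuming Ollama's default port (11434) and its standard `/api/generate` endpoint; the model name is just an example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local port


def build_generate_request(model: str, prompt: str) -> dict:
    """Payload for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, prompt: str) -> str:
    """POST a prompt to the local Ollama server and return the completion text."""
    payload = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Non-streaming responses carry the full completion in "response"
        return json.loads(resp.read())["response"]
```

Because the server is LAN-only, no API key or TLS termination is involved; the request never leaves the local network.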

Case Study: Cost-Effective AI Development Workflow

The Challenge

Reduce dependency on expensive cloud AI APIs while maintaining access to capable language models for development tasks, code generation, and document summarization — all while keeping sensitive code local.

The Solution

  • Deployed Ollama in Docker container with dedicated RAM allocation
  • Configured Qwen 2.5 7B models for balance of speed and capability
  • Integrated with Claude Code workers for tiered inference (local first, API fallback)
  • Set up MCP server for standardized model access across tools
  • Implemented prompt routing to use local models for simple tasks
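The routing idea in the last two bullets can be sketched as a small dispatcher. The task categories and backend names here are hypothetical illustrations, not the project's actual implementation:

```python
# Task types considered cheap enough for the local model (illustrative set)
LOCAL_TASKS = {"summarize", "classify", "extract", "status-check"}


def route(task_type: str) -> str:
    """Tiered inference: local model first, paid cloud API as fallback."""
    if task_type in LOCAL_TASKS:
        return "local"   # free Ollama inference on the LAN
    return "cloud"       # complex work falls back to a paid API
```

A real router would likely also consider prompt length and current model availability before deciding.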

The Results

Monthly API savings: ~$50
Data kept local: 100%
Response latency: <1s
External dependencies: 0

Available Models

Model             | Parameters | Purpose                                         | VRAM
qwen2.5:72b       | 72B        | Complex reasoning, primary model for most tasks | ~28GB
qwen2.5-coder:32b | 32B        | Code generation and analysis                    | ~20GB
llama3.1:8b       | 8B         | Fast tasks, classification, simple extraction   | ~6GB
codellama:34b     | 34B        | Code review and refactoring                     | ~22GB
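Given the table above, model choice reduces to picking the most capable model that fits the available memory. This helper is a hypothetical sketch using the table's approximate VRAM figures:

```python
# (model name, approx VRAM in GB), ordered most to least capable
MODELS = [
    ("qwen2.5:72b", 28),
    ("codellama:34b", 22),
    ("qwen2.5-coder:32b", 20),
    ("llama3.1:8b", 6),
]


def largest_fit(vram_budget_gb: int) -> str:
    """Return the most capable model that fits within the VRAM budget."""
    for name, vram in MODELS:
        if vram <= vram_budget_gb:
            return name
    raise ValueError("no model fits the given VRAM budget")
```

For example, a 24GB budget rules out the 72B model and lands on codellama:34b.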

Benefits

Cost Reduction

Free inference for simple tasks that would otherwise consume paid API calls: summarization, status checks, and quick questions all run locally at zero marginal cost.

Privacy

Sensitive code and data never leaves the local network. No cloud provider sees your prompts or responses.

Speed

Local inference with no network latency. Responses start immediately without waiting for API round-trips.

Availability

Works offline and during API outages. Not dependent on external service availability.

Integrations

Claude Code Workers: use Ollama for simple tasks before falling back to paid APIs
Ollama MCP Server: Model Context Protocol server for standardized LLM access
Home Assistant: potential integration for natural-language automation control
