🤖 Local LLM

Status: Active

Self-hosted AI inference with Ollama. Running Qwen 2.5 models for general tasks and code generation without cloud dependencies.

Tags: Ollama · AI · Self-hosted

Configuration

Container:   ollama/ollama
RAM Limit:   16GB
API:         REST (local only)
Access:      LAN only (not exposed externally)
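
Because the API is plain REST on the LAN, any machine on the network can call it directly. A minimal sketch, assuming Ollama's default port (11434) and a hypothetical hostname ollama.lan:

```python
import json
import urllib.request

# Assumptions: the Ollama container listens on its default port 11434
# and is reachable on the LAN as "ollama.lan" (hypothetical hostname).
OLLAMA_URL = "http://ollama.lan:11434/api/generate"

def generate(prompt: str, model: str = "qwen2.5:7b") -> str:
    """Send a single non-streaming generation request to the local Ollama API."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a stream of chunks
    }).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("Summarize: Ollama serves local models over a REST API."))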

Available Models

Model              Parameters   Purpose                                        VRAM
qwen2.5:7b         7B           General tasks, summaries, simple questions     ~6GB
qwen2.5-coder:7b   7B           Code generation and analysis                   ~6GB
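
Since the two models serve different purposes, callers need some way to pick between them. The keyword heuristic below is purely illustrative, not part of the actual setup:

```python
# Illustrative sketch: route a prompt to the appropriate local model.
GENERAL_MODEL = "qwen2.5:7b"      # general tasks, summaries, simple questions
CODE_MODEL = "qwen2.5-coder:7b"   # code generation and analysis

def pick_model(prompt: str) -> str:
    """Choose a model name based on a naive keyword check (hypothetical heuristic)."""
    code_markers = ("def ", "class ", "refactor", "traceback", "error:", "stack trace")
    if any(marker in prompt.lower() for marker in code_markers):
        return CODE_MODEL
    return GENERAL_MODEL

# Example: pick_model("Refactor this function") -> "qwen2.5-coder:7b"
```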

Benefits

Cost Reduction

Free inference for simple tasks that would otherwise use paid API calls. Saves money on summarization, status checks, and simple questions.

Privacy

Sensitive code and data never leaves the local network. No cloud provider sees your prompts or responses.

Speed

Inference runs on the local network, so there are no internet round-trips; responses start as soon as the model begins generating, without waiting on an external API.

Availability

Works offline and during API outages. Not dependent on external service availability.
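
A cheap way to take advantage of this is to check whether the local instance is reachable before routing work to it. A minimal sketch, assuming the same hypothetical LAN hostname as above and using Ollama's version endpoint:

```python
import json
import urllib.error
import urllib.request

def ollama_available(base_url: str = "http://ollama.lan:11434") -> bool:
    """Return True if the local Ollama instance answers its version endpoint."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version", timeout=2) as resp:
            return "version" in json.loads(resp.read())
    except (urllib.error.URLError, TimeoutError, ValueError):
        return False
```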

Integrations

Claude Code Workers: workers use Ollama for simple tasks before falling back to paid APIs (see the sketch below)
Ollama MCP Server: Model Context Protocol server for standardized LLM access
Home Assistant: potential integration for natural language automation control
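
The fallback behaviour in a worker might look like the sketch below; call_paid_api is a hypothetical stand-in for whatever paid provider the worker would otherwise use, and the error handling is an assumption about when it decides to fall back:

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://ollama.lan:11434/api/generate"  # assumed LAN hostname

def call_paid_api(prompt: str) -> str:
    """Hypothetical stand-in for a paid cloud LLM call."""
    raise NotImplementedError("replace with the worker's real provider client")

def run_simple_task(prompt: str) -> str:
    """Try the local Ollama instance first; fall back to the paid API on failure."""
    payload = json.dumps({"model": "qwen2.5:7b", "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=60) as resp:
            return json.loads(resp.read())["response"]
    except (urllib.error.URLError, TimeoutError):
        # Ollama unreachable or too slow: hand the task to the paid API instead.
        return call_paid_api(prompt)
```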

Resources