Local LLM
Active

Self-hosted AI inference with Ollama, running Qwen 2.5 models for general tasks and code generation without cloud dependencies.
Configuration

| Setting | Value |
|---|---|
| Container | ollama/ollama |
| RAM Limit | 16GB |
| API | REST (local only) |
| Access | LAN — not exposed |
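A container setup matching the table above could look roughly like this docker-compose fragment; the service name, volume path, and port binding are illustrative assumptions, not the actual deployment config:

```yaml
# Sketch of a docker-compose service for the configuration above (assumed).
services:
  ollama:
    image: ollama/ollama
    deploy:
      resources:
        limits:
          memory: 16g            # matches the 16GB RAM limit
    volumes:
      - ./ollama-data:/root/.ollama   # persists pulled models across restarts
    ports:
      - "11434:11434"            # Ollama's default API port; firewall keeps it LAN-only
```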
Case Study: Cost-Effective AI Development Workflow
The Challenge
Reduce dependency on expensive cloud AI APIs while maintaining access to capable language models for development tasks, code generation, and document summarization — all while keeping sensitive code local.
The Solution
- ✓ Deployed Ollama in Docker container with dedicated RAM allocation
- ✓ Configured Qwen 2.5 7B models for balance of speed and capability
- ✓ Integrated with Claude Code workers for tiered inference (local first, API fallback)
- ✓ Set up MCP server for standardized model access across tools
- ✓ Implemented prompt routing to use local models for simple tasks
The Results
Available Models
| Model | Parameters | Purpose | VRAM |
|---|---|---|---|
| qwen2.5:72b | 72B | Complex reasoning, primary model for most tasks | ~28GB |
| qwen2.5-coder:32b | 32B | Code generation and analysis | ~20GB |
| llama3.1:8b | 8B | Fast tasks, classification, simple extraction | ~6GB |
| codellama:34b | 34B | Code review and refactoring | ~22GB |
Benefits
Cost Reduction
Free inference for simple tasks that would otherwise use paid API calls. Saves money on summarization, status checks, and simple questions.
Privacy
Sensitive code and data never leaves the local network. No cloud provider sees your prompts or responses.
Speed
Local inference with no network latency. Responses start immediately without waiting for API round-trips.
Availability
Works offline and during API outages. Not dependent on external service availability.
Integrations
| Integration | Description |
|---|---|
| Claude Code Workers | Workers use Ollama for simple tasks before falling back to paid APIs |
| Ollama MCP Server | Model Context Protocol server for standardized LLM access |
| Home Assistant | Potential integration for natural language automation control |