🤖 Local LLM
Status: Active
Self-hosted AI inference with Ollama. Running Qwen 2.5 models for general tasks and code generation without cloud dependencies.
Tags: Ollama · AI · Self-hosted
Configuration
| Setting | Value |
|---|---|
| Container | ollama/ollama |
| RAM Limit | 16 GB |
| API | REST (local only) |
| Access | LAN only, not exposed externally |
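A minimal sketch of reaching that local REST API from another machine on the LAN, assuming Ollama is listening on its default port 11434; the hostname `ollama.lan` is a placeholder, not something taken from this setup.

```python
# Sketch: confirm the local Ollama instance is reachable and list pulled models.
# Assumes the default port 11434; "ollama.lan" is a hypothetical LAN hostname.
import requests

OLLAMA_URL = "http://ollama.lan:11434"  # placeholder hostname


def list_models() -> list[str]:
    """Return the names of models currently available in Ollama."""
    resp = requests.get(f"{OLLAMA_URL}/api/tags", timeout=5)
    resp.raise_for_status()
    return [m["name"] for m in resp.json().get("models", [])]


if __name__ == "__main__":
    print(list_models())  # e.g. ['qwen2.5:7b', 'qwen2.5-coder:7b']
```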
Available Models
| Model | Parameters | Purpose | VRAM |
|---|---|---|---|
| qwen2.5:7b | 7B | General tasks, summaries, simple questions | ~6GB |
| qwen2.5-coder:7b | 7B | Code generation and analysis | ~6GB |
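As a usage sketch, both models above can be called through Ollama's non-streaming `/api/generate` endpoint; the host and prompts below are placeholders.

```python
# Sketch: one-shot completions against the two models listed above via
# Ollama's /api/generate endpoint. Host and prompts are placeholders.
import requests

OLLAMA_URL = "http://ollama.lan:11434"  # placeholder hostname


def generate(model: str, prompt: str) -> str:
    """Send a single prompt to Ollama and return the full response text."""
    resp = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]


summary = generate("qwen2.5:7b", "Summarize this status report in two sentences: ...")
snippet = generate("qwen2.5-coder:7b", "Write a Python function that reverses a string.")
```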
Benefits
Cost Reduction
Simple tasks that would otherwise consume paid API calls run locally for free: summarization, status checks, and quick questions.
Privacy
Sensitive code and data never leaves the local network. No cloud provider sees your prompts or responses.
Speed
Local inference with no network latency. Responses start immediately without waiting for API round-trips.
Availability
Works offline and during API outages. Not dependent on external service availability.
Integrations
| Integration | Description |
|---|---|
| Claude Code Workers | Workers use Ollama for simple tasks before falling back to paid APIs (see the routing sketch below) |
| Ollama MCP Server | Model Context Protocol server for standardized LLM access |
| Home Assistant | Potential integration for natural language automation control |
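A hedged sketch of the local-first routing described in the Claude Code Workers row: try the local model for simple tasks and fall back to a paid API when Ollama is unreachable. `call_paid_api` is a hypothetical stand-in for whatever cloud client the workers actually use.

```python
# Sketch of "local first, paid API as fallback" routing. The Ollama call
# mirrors the earlier examples; call_paid_api is a hypothetical placeholder.
import requests

OLLAMA_URL = "http://ollama.lan:11434"  # placeholder hostname


def call_ollama(model: str, prompt: str) -> str:
    resp = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["response"]


def call_paid_api(prompt: str) -> str:
    raise NotImplementedError("placeholder for the cloud provider client")


def answer(prompt: str, simple: bool) -> str:
    """Route simple tasks to the local model; use the paid API otherwise,
    or when local inference is unavailable."""
    if simple:
        try:
            return call_ollama("qwen2.5:7b", prompt)
        except requests.RequestException:
            pass  # Ollama unreachable; fall through to the paid API
    return call_paid_api(prompt)
```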