Local Voice Control for Home Assistant: No Cloud Required
Voice assistants are convenient but come with privacy trade-offs. Every command goes to cloud servers for processing. Alexa, Google Home, and Siri all record and analyze your speech. For a smart home that values privacy, local voice processing is the answer.
Home Assistant’s Wyoming protocol enables fully local voice control. Speech recognition, intent parsing, and text-to-speech all run on your own hardware. Responses are faster than cloud alternatives, and nothing leaves your network.
Architecture Overview
Local voice control has four components:
┌────────────────┐     ┌────────────────┐     ┌────────────────┐     ┌────────────────┐
│   Wake Word    │────▶│     Speech     │────▶│     Intent     │────▶│    Text to     │
│   Detection    │     │    to Text     │     │    Handler     │     │     Speech     │
│ (openWakeWord) │     │   (Whisper)    │     │  (HA Assist)   │     │    (Piper)     │
└────────────────┘     └────────────────┘     └────────────────┘     └────────────────┘
- Wake word detection — Listens for “Hey Jarvis” or custom phrase
- Speech to text — Converts audio to text using Whisper
- Intent handler — Home Assistant’s Assist processes the command
- Text to speech — Piper generates spoken response
All components run locally via Docker or Home Assistant add-ons.
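The hand-off between the four stages can be sketched in a few lines of Python. This is only an illustration of the data flow: the `VoicePipeline` class and the stub lambdas are hypothetical stand-ins, not part of Home Assistant or Wyoming.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VoicePipeline:
    """Chains the four stages; each field stands in for a real service."""

    detect_wake_word: Callable[[bytes], bool]  # openWakeWord's role
    speech_to_text: Callable[[bytes], str]     # Whisper's role
    handle_intent: Callable[[str], str]        # Assist's role
    text_to_speech: Callable[[str], bytes]     # Piper's role

    def run(self, audio: bytes) -> Optional[bytes]:
        # Nothing downstream runs until the wake word is heard.
        if not self.detect_wake_word(audio):
            return None
        text = self.speech_to_text(audio)
        reply = self.handle_intent(text)
        return self.text_to_speech(reply)


# Stub stages so the sketch runs without audio hardware (all hypothetical).
pipeline = VoicePipeline(
    detect_wake_word=lambda audio: audio.startswith(b"WAKE"),
    speech_to_text=lambda audio: "turn on the kitchen lights",
    handle_intent=lambda text: f"OK: {text}",
    text_to_speech=lambda reply: reply.encode("utf-8"),
)
```

In the real system each stage is a separate service, which is why a slow Whisper model delays everything after it.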
Hardware Requirements
Voice processing is CPU-intensive. Minimum specs for acceptable performance:
| Component | Minimum | Recommended |
|---|---|---|
| CPU | 4 cores | 8+ cores |
| RAM | 4GB | 8GB+ |
| Storage | 10GB free | SSD preferred |
On our Dell R7525 with dual EPYC processors, Whisper transcription takes under 1 second. On a Raspberry Pi 4, expect 3-5 seconds per command.
GPU acceleration is optional but significantly improves Whisper performance. An NVIDIA GPU with CUDA support can transcribe speech in real time.
Installation
Option 1: Home Assistant Add-ons
The simplest path for Home Assistant OS users:
- Install Whisper add-on
  - Settings → Add-ons → Add-on Store
  - Search “Whisper” → Install
  - Configuration: Choose model size (base or small for faster response)
- Install Piper add-on
  - Search “Piper” → Install
  - Configuration: Select voice (en_US-lessac-medium recommended)
- Install openWakeWord add-on
  - Search “openWakeWord” → Install
  - Configuration: Choose wake word (default: “ok nabu”)
- Configure Assist pipeline
  - Settings → Voice assistants → Add assistant
  - Set conversation agent, STT engine (Whisper), TTS engine (Piper)
Option 2: Docker Compose
For more control or non-HAOS installations:
# docker-compose.yml
services:
  whisper:
    image: rhasspy/wyoming-whisper
    container_name: whisper
    command: --model base.en --language en
    volumes:
      - ./whisper-data:/data
    ports:
      - "10300:10300"
    restart: unless-stopped

  piper:
    image: rhasspy/wyoming-piper
    container_name: piper
    command: --voice en_US-lessac-medium
    volumes:
      - ./piper-data:/data
    ports:
      - "10200:10200"
    restart: unless-stopped

  openwakeword:
    image: rhasspy/wyoming-openwakeword
    container_name: openwakeword
    command: --preload-model 'ok_nabu'
    ports:
      - "10400:10400"
    restart: unless-stopped
Then add Wyoming integrations in Home Assistant:
- Settings → Devices & Services → Add Integration → Wyoming
- Add each service by hostname:port
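Before adding the integrations, it can save time to confirm that each port accepts connections from the machine running Home Assistant. The sketch below uses only the Python standard library; the port numbers come from the Compose file, while the helper names are my own.

```python
import socket

# Ports published by the Compose file in this post.
WYOMING_SERVICES = {"whisper": 10300, "piper": 10200, "openwakeword": 10400}


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name resolution failed
        return False


def check_services(host: str) -> dict:
    """Map each service name to whether its port answered."""
    return {
        name: is_reachable(host, port)
        for name, port in WYOMING_SERVICES.items()
    }
```

Calling `check_services("docker-host.local")` (a placeholder hostname) returns something like `{"whisper": True, ...}`. A `False` entry usually points to a stopped container or a firewall rule, not a Home Assistant problem.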
Satellite Devices
A voice satellite is a microphone/speaker in another room that connects to your central Home Assistant. Options:
ESP32-S3 Box
Espressif's ESP32-S3-BOX is a development kit with ESPHome voice assistant support. ~$45, includes display, speaker, and microphones.
# esphome configuration for ESP32-S3-BOX
# Excerpt: the microphone, speaker, and light referenced by id below
# (echo_microphone, echo_speaker, led_ring) must be defined elsewhere in this file.
esphome:
  name: voice-satellite-office
  friendly_name: Office Voice Satellite

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf

voice_assistant:
  microphone: echo_microphone
  speaker: echo_speaker
  on_wake_word_detected:
    - light.turn_on:
        id: led_ring
        effect: "Listening"
  on_stt_end:
    - light.turn_on:
        id: led_ring
        effect: "Processing"
  on_tts_start:
    - light.turn_on:
        id: led_ring
        effect: "Speaking"
  on_end:
    - light.turn_off: led_ring
Raspberry Pi with ReSpeaker
A Pi Zero 2 W with ReSpeaker 2-mic HAT works well:
# Wyoming satellite service
satellite:
  image: rhasspy/wyoming-satellite
  container_name: satellite
  command: >
    --name 'Kitchen Satellite'
    --uri 'tcp://0.0.0.0:10700'
    --mic-command 'arecord -D plughw:CARD=seeed2micvoicec,DEV=0 -r 16000 -c 1 -f S16_LE -t raw'
    --snd-command 'aplay -D plughw:CARD=seeed2micvoicec,DEV=0 -r 22050 -c 1 -f S16_LE -t raw'
    --wake-uri 'tcp://homeassistant.local:10400'
    --wake-word-name 'ok_nabu'
  ports:
    - "10700:10700"  # publish the satellite port so Home Assistant can connect
  devices:
    - /dev/snd:/dev/snd
  group_add:
    - audio
  restart: unless-stopped
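The `-r`, `-c`, and `-f` flags in the mic and speaker commands fix the raw audio format, which in turn fixes how many bytes flow per second. A small sketch of that arithmetic follows; the function names are illustrative, and the 30 ms chunk length is an arbitrary example, not a Wyoming requirement.

```python
def bytes_per_second(rate_hz: int, channels: int, sample_width_bytes: int) -> int:
    """Raw PCM data rate for the given format."""
    return rate_hz * channels * sample_width_bytes


def chunk_bytes(rate_hz: int, channels: int, sample_width_bytes: int, chunk_ms: int) -> int:
    """Size in bytes of one chunk of chunk_ms milliseconds."""
    return bytes_per_second(rate_hz, channels, sample_width_bytes) * chunk_ms // 1000


# S16_LE means signed 16-bit little-endian samples, i.e. 2 bytes each.
MIC = dict(rate_hz=16000, channels=1, sample_width_bytes=2)      # -r 16000 -c 1 -f S16_LE
SPEAKER = dict(rate_hz=22050, channels=1, sample_width_bytes=2)  # -r 22050 -c 1 -f S16_LE

mic_rate = bytes_per_second(**MIC)          # 32000 bytes per second
speaker_rate = bytes_per_second(**SPEAKER)  # 44100 bytes per second
mic_chunk = chunk_bytes(**MIC, chunk_ms=30)  # 960 bytes per 30 ms chunk
```

At roughly 32 kB/s upstream, a satellite's audio stream is tiny by network standards, so dropouts are more likely caused by WiFi latency spikes than by bandwidth.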
Old Android Phone
Repurpose an old phone as a voice satellite using the Home Assistant Companion app:
- Install Companion app
- Settings → Companion App → Manage Assist → Enable Assist
- Configure pipeline to use local Whisper/Piper
- Phone microphone and speaker become a satellite
Custom Wake Words
The default “ok nabu” is functional but impersonal. Train a custom wake word:
- Record 10-20 samples of your wake phrase
- Use openWakeWord training notebook (Google Colab works)
- Export .tflite model
- Copy to a models directory and point openWakeWord at it with --custom-model-dir
- Configure:
--preload-model 'custom_wake_word'
Popular custom options: “Hey Jarvis”, “Computer”, “Hey Home”, or your home’s name.
Assist Sentences
Home Assistant Assist parses commands using sentence templates. Default templates handle common requests:
- “Turn on the living room lights”
- “What’s the temperature in the bedroom?”
- “Lock the front door”
- “Set the thermostat to 72”
Add custom sentences for your specific setup:
# config/custom_sentences/en/lights.yaml
language: "en"
intents:
  HassLightSet:
    data:
      - sentences:
          - "movie mode"
          - "movie time"
        slots:
          name: "Living Room"
          brightness: 20
  HassTurnOff:
    data:
      - sentences:
          - "goodnight"
          - "bedtime"
        requires_context:
          domain: light
Now “movie mode” dims living room lights to 20%, and “goodnight” turns off all lights.
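To make the idea of sentence matching concrete, here is a deliberately simplified sketch: it normalizes the spoken text and looks it up in a table of fixed sentences. Assist's real matcher supports template syntax (lists, optional words, alternatives) that this toy version ignores, and the `CUSTOM_SENTENCES` table and function names are hypothetical.

```python
import string
from typing import Optional, Tuple

# Mirrors the custom sentences above in a plain dictionary.
CUSTOM_SENTENCES = {
    "HassLightSet": {
        "sentences": ["movie mode", "movie time"],
        "slots": {"name": "Living Room", "brightness": 20},
    },
    "HassTurnOff": {
        "sentences": ["goodnight", "bedtime"],
        "slots": {},
    },
}


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split())


def match_intent(text: str) -> Optional[Tuple[str, dict]]:
    """Return (intent name, slots) for an exact sentence match, else None."""
    spoken = normalize(text)
    for intent, block in CUSTOM_SENTENCES.items():
        if spoken in (normalize(s) for s in block["sentences"]):
            return intent, dict(block["slots"])
    return None
```

Because matching is exact after normalization, a transcription such as "movie moat" misses entirely, which is why adding variants of a phrase tends to help more than tuning anything else.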
Response Templates
Customize Assist responses for more natural interactions:
# config/custom_sentences/en/responses.yaml
language: "en"
responses:
  intents:
    HassLightSet:
      default: "{{ slots.name }} set to {{ slots.brightness }} percent"
    HassTurnOn:
      default: "Turning on {{ slots.name }}"
    HassTurnOff:
      default: "{{ slots.name }} is off"
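The `{{ slots.name }}` placeholders are filled in from the matched intent's slots. Home Assistant renders these with its own template engine; the sketch below swaps in a small regular-expression substitute so the behaviour can be tried with only the standard library. `render_response` is a hypothetical helper and handles nothing beyond `slots.<key>` lookups.

```python
import re

# Matches placeholders such as {{ slots.name }} or {{slots.brightness}}.
_PLACEHOLDER = re.compile(r"\{\{\s*slots\.(\w+)\s*\}\}")


def render_response(template: str, slots: dict) -> str:
    """Replace each slots.<key> placeholder; unknown keys become empty strings."""

    def substitute(match: re.Match) -> str:
        return str(slots.get(match.group(1), ""))

    return _PLACEHOLDER.sub(substitute, template)


message = render_response(
    "{{ slots.name }} set to {{ slots.brightness }} percent",
    {"name": "Living Room", "brightness": 20},
)
```

Here `message` is "Living Room set to 20 percent". Reading a response aloud this way before deploying it is a quick check that it sounds natural when spoken by Piper.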
Automations with Voice Triggers
Voice commands can trigger complex automations:
- id: 'voice_movie_mode'
  alias: Voice Activated Movie Mode
  triggers:
    - trigger: conversation
      command:
        - "movie mode"
        - "movie time"
        - "start the movie"
  actions:
    - action: scene.turn_on
      target:
        entity_id: scene.movie_mode
    - action: media_player.turn_on
      target:
        entity_id: media_player.living_room_tv
The conversation trigger fires whenever Assist receives one of the listed sentences, whether typed or spoken.
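A sentence can also be sent to Assist as text, which is handy for testing without speaking. The sketch below builds a request for Home Assistant's `/api/conversation/process` REST endpoint using only the standard library. The base URL and token are placeholders you would replace with your own, and the helper names are mine. I am assuming the sentence is handled by the default conversation agent, which is where sentence triggers are evaluated.

```python
import json
import urllib.request


def build_conversation_request(
    base_url: str, token: str, text: str, language: str = "en"
) -> urllib.request.Request:
    """Build (but do not send) a POST to the conversation endpoint."""
    body = json.dumps({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/conversation/process",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # long-lived access token
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_sentence(base_url: str, token: str, text: str) -> dict:
    """Send the sentence and return the decoded JSON response."""
    request = build_conversation_request(base_url, token, text)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

For example, `send_sentence("http://homeassistant.local:8123", "YOUR_TOKEN", "movie mode")` should behave like saying the phrase to a satellite, minus the wake word and transcription steps.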
Performance Tuning
Whisper Model Selection
| Model | Parameters | Speed | Accuracy |
|---|---|---|---|
| tiny.en | 39M | Fastest | Basic |
| base.en | 74M | Fast | Good |
| small.en | 244M | Medium | Better |
| medium.en | 769M | Slow | Best |
For real-time voice control, base.en balances speed and accuracy. Use small.en if you have CPU headroom and want better recognition of unusual words.
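One way to make this choice less subjective is to time each model on your own hardware and pick the largest one that fits a latency budget. The sketch below shows only the selection step. The timings in `example_timings` are invented for illustration, not benchmarks, and `pick_model` is a hypothetical helper.

```python
from typing import Dict, Optional

# Ordered from smallest/fastest to largest/most accurate.
MODELS_SMALL_TO_LARGE = ["tiny.en", "base.en", "small.en", "medium.en"]


def pick_model(measured_seconds: Dict[str, float], budget_seconds: float) -> Optional[str]:
    """Return the largest model whose measured latency fits the budget."""
    choice = None
    for model in MODELS_SMALL_TO_LARGE:
        latency = measured_seconds.get(model)
        if latency is not None and latency <= budget_seconds:
            choice = model  # keep upgrading while the budget holds
    return choice


# Made-up per-command transcription times in seconds.
example_timings = {"tiny.en": 0.4, "base.en": 0.8, "small.en": 2.1, "medium.en": 6.0}
selected = pick_model(example_timings, budget_seconds=1.0)
```

With these example numbers and a one-second budget, `selected` is "base.en". A return value of `None` means even the smallest model is too slow for the budget.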
Piper Voice Quality
Voice options range from fast/robotic to slow/natural:
- x_low: Fastest, most robotic
- low: Fast, slightly mechanical
- medium: Balanced (recommended)
- high: Slower, more natural
The en_US-lessac-medium voice sounds natural without excessive latency.
Wake Word Sensitivity
Adjust sensitivity to balance false positives vs missed wake words:
# Higher threshold = less sensitive (fewer false positives, may miss wake words)
# Lower threshold = more sensitive (more false positives)
command: --preload-model 'ok_nabu' --threshold 0.5
Start at 0.5 and adjust based on your environment. Noisy rooms usually need a higher threshold to cut false activations.
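The effect of the threshold is easiest to see with numbers. The detector produces a score between 0 and 1 for each audio window, and a window counts as a detection when its score reaches the threshold. The scores below are made up, and treating "reaches" as greater-than-or-equal is my assumption about the comparison.

```python
from typing import Iterable


def count_activations(scores: Iterable[float], threshold: float) -> int:
    """Count windows whose score meets or exceeds the threshold."""
    return sum(1 for score in scores if score >= threshold)


# Hypothetical detector scores for six audio windows.
scores = [0.12, 0.35, 0.48, 0.55, 0.62, 0.91]

at_low = count_activations(scores, 0.3)      # 5 activations
at_default = count_activations(scores, 0.5)  # 3 activations
at_high = count_activations(scores, 0.7)     # 1 activation
```

Raising the threshold from 0.3 to 0.7 cuts activations from five to one, trading false positives for a higher chance of missing a quietly spoken wake word.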
Privacy Comparison
| Feature | Local (Wyoming) | Cloud (Alexa/Google) |
|---|---|---|
| Audio leaves home | Never | Always |
| Internet required | No | Yes |
| Response time | 0.5-2s (on capable hardware) | 1-3s |
| Third-party access | None | Amazon/Google |
| Works during outage | Yes | No |
| Monthly cost | $0 | $0-15 (subscriptions) |
The main trade-offs are initial setup complexity and the hardware needed for fast transcription. Once running on capable hardware, local voice control is faster, more private, and more reliable than cloud alternatives.
Troubleshooting
Wake word not detected: Check microphone gain levels. If gain is too low, the wake word won’t register; if it is too high, the audio clips. Record a test with arecord -d 5 test.wav and play it back with aplay test.wav.
Slow transcription: Switch to a smaller Whisper model. Check CPU usage during transcription — if maxed out, consider GPU acceleration or better hardware.
Wrong interpretations: Add custom sentences that match how you naturally speak. Assist improves as you add more sentence patterns.
Satellite disconnects: Ensure stable network connection. WiFi satellites may drop on congested networks — consider wired Ethernet for reliability.
No audio response: Verify Piper is running and accessible. Check Home Assistant logs for TTS errors. Ensure speaker volume is up on satellite device.
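For the microphone gain check, listening to a recording is subjective, so it can help to measure the levels as well. The sketch below reads a 16-bit PCM WAV with the standard library and reports peak and RMS as fractions of full scale. It assumes a 16-bit recording, for example one made with `arecord -d 5 -f S16_LE -r 16000 test.wav`, and `wav_levels` is a hypothetical helper.

```python
import array
import math
import sys
import wave


def wav_levels(source) -> tuple:
    """Return (peak, rms) as fractions of full scale for a 16-bit PCM WAV.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return 0.0, 0.0

    peak = max(abs(s) for s in samples) / 32768
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / 32768
    return peak, rms
```

As a rough guide, a peak at or very near 1.0 suggests clipping, while a peak that stays tiny even when you speak directly at the microphone suggests the gain is too low. Where exactly to draw those lines depends on your microphone and room.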
Summary
Local voice control with Wyoming provides:
- Sub-second transcription on capable hardware
- Zero cloud dependency
- Complete privacy
- Works during internet outages
- No subscriptions or recurring costs
The setup takes a few hours, but the result is a voice assistant that respects your privacy while controlling your entire smart home.