Local Voice Control for Home Assistant: No Cloud Required

Voice assistants are convenient but come with privacy trade-offs. With Alexa and Google Home, every command goes to cloud servers for processing, and Siri also sends many requests to Apple’s servers; all three can record and analyze your speech. For a smart home that values privacy, local voice processing is the answer.

The Wyoming protocol, which Home Assistant uses to talk to voice services, enables fully local voice control. Speech recognition, intent parsing, and text-to-speech all run on your own hardware. On capable hardware, responses can be faster than cloud alternatives, and nothing leaves your network.

Architecture Overview

Local voice control has four components:

┌────────────────┐     ┌────────────────┐     ┌────────────────┐     ┌────────────────┐
│   Wake Word    │────▶│   Speech to    │────▶│     Intent     │────▶│    Text to     │
│   Detection    │     │      Text      │     │    Handler     │     │     Speech     │
│ (openWakeWord) │     │   (Whisper)    │     │  (HA Assist)   │     │    (Piper)     │
└────────────────┘     └────────────────┘     └────────────────┘     └────────────────┘

Wake word detection: openWakeWord listens for a wake phrase such as “ok nabu”, “hey jarvis”, or a custom phrase
Speech to text: Whisper converts the spoken command to text
Intent handler: Home Assistant’s Assist matches the text to an action and runs it
Text to speech: Piper generates the spoken response

All components run locally via Docker or Home Assistant add-ons.
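Under the hood, each of these services (and each satellite) talks to Home Assistant over TCP using Wyoming. Every message is one line of JSON naming the event type. It can optionally be followed by extra JSON data and a binary payload such as raw audio, and the header gives their byte lengths. The stdlib-only Python sketch below illustrates that framing with an audio-chunk event. write_event and read_event are hypothetical helpers written for this example, not the API of the wyoming Python package. Real headers may also carry additional fields, such as a protocol version.

```python
import io
import json
from typing import BinaryIO, Optional, Tuple


def write_event(stream: BinaryIO, event_type: str,
                data: Optional[dict] = None, payload: bytes = b"") -> None:
    """Write one event: a JSON header line, then optional JSON data, then payload."""
    header: dict = {"type": event_type}
    data_bytes = json.dumps(data).encode("utf-8") if data else b""
    if data_bytes:
        header["data_length"] = len(data_bytes)
    if payload:
        header["payload_length"] = len(payload)
    stream.write(json.dumps(header).encode("utf-8") + b"\n")
    stream.write(data_bytes)
    stream.write(payload)


def read_event(stream: BinaryIO) -> Optional[Tuple[str, dict, bytes]]:
    """Read one event as (type, data, payload); return None at end of stream."""
    line = stream.readline()
    if not line:
        return None
    header = json.loads(line)
    data = dict(header.get("data") or {})  # data may be inline in the header...
    data_length = header.get("data_length") or 0
    if data_length:
        data.update(json.loads(stream.read(data_length)))  # ...or sent separately
    payload = stream.read(header.get("payload_length") or 0)
    return header["type"], data, payload


if __name__ == "__main__":
    buf = io.BytesIO()
    # 10 ms of 16 kHz, 16-bit mono silence: 160 samples x 2 bytes = 320 bytes
    write_event(buf, "audio-chunk", {"rate": 16000, "width": 2, "channels": 1},
                b"\x00" * 320)
    buf.seek(0)
    event_type, data, payload = read_event(buf)
    print(event_type, data, len(payload))
    # audio-chunk {'rate': 16000, 'width': 2, 'channels': 1} 320
```

Audio always travels as a sequence of such chunks, with the rate, width, and channels describing the raw PCM bytes in each payload.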

Hardware Requirements

Voice processing is CPU-intensive. Minimum specs for acceptable performance on the machine that runs Whisper and Piper (satellites need far less):

Component	Minimum	Recommended
CPU	4 cores	8+ cores
RAM	4GB	8GB+
Storage	10GB free	SSD preferred

On our Dell R7525 with dual EPYC processors, Whisper transcription takes under 1 second. On a Raspberry Pi 4, expect 3-5 seconds per command.

GPU acceleration is optional but significantly improves Whisper performance. An NVIDIA GPU with CUDA support can process speech in real time, though the Whisper container then typically needs GPU access and the NVIDIA CUDA libraries.
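A useful way to compare these timings is the real-time factor (RTF): processing time divided by the length of the audio. Whisper works on 16 kHz audio. Assuming 16-bit mono PCM, which is the format the satellite examples in this post record, each second of speech is 32,000 bytes. The sketch below uses a hypothetical 3-second command and round figures taken from the Dell and Pi timings above: 1 s and 4 s. The helper names are illustrative only.

```python
SAMPLE_RATE = 16_000  # Hz; Whisper models operate on 16 kHz audio
SAMPLE_WIDTH = 2      # bytes per sample (16-bit PCM, an assumption)
CHANNELS = 1          # mono


def audio_seconds(num_bytes: int) -> float:
    """Duration of a raw PCM buffer in seconds."""
    return num_bytes / (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)


def real_time_factor(processing_seconds: float, audio_bytes: int) -> float:
    """Below 1.0 means transcription is faster than real time."""
    return processing_seconds / audio_seconds(audio_bytes)


if __name__ == "__main__":
    command_bytes = 3 * SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS  # hypothetical 3 s command
    for label, seconds in [("server, ~1 s", 1.0), ("Raspberry Pi 4, ~4 s", 4.0)]:
        print(f"{label}: RTF {real_time_factor(seconds, command_bytes):.2f}")
```

In this example, an RTF of about 1.33 means the Pi takes longer to transcribe the command than it took to say it. A smaller model or GPU acceleration pushes that number down.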

Installation

Option 1: Home Assistant Add-ons

The simplest path for Home Assistant OS users:

Install Whisper add-on
- Settings → Add-ons → Add-on Store
- Search “Whisper” → Install
- Configuration: Choose model size (tiny or base for the fastest response, small for better accuracy)
Install Piper add-on
- Search “Piper” → Install
- Configuration: Select voice (en_US-lessac-medium recommended)
Install openWakeWord add-on
- Search “openWakeWord” → Install
- Configuration: Adjust the detection threshold if needed; the wake word itself (default: “ok nabu”) is chosen in the Assist pipeline
Configure Assist pipeline
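- Settings → Devices & Services: confirm the discovered Wyoming integration for each add-on
- Settings → Voice assistants → Add assistant
- Set conversation agent (Home Assistant), STT engine (Whisper), TTS engine (Piper), and wake word engine (openWakeWord)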

Option 2: Docker Compose

For more control or non-HAOS installations:

# docker-compose.yml
services:
  whisper:
    image: rhasspy/wyoming-whisper
    container_name: whisper
    command: --model base.en --language en
    volumes:
      - ./whisper-data:/data
    ports:
      - "10300:10300"
    restart: unless-stopped

  piper:
    image: rhasspy/wyoming-piper
    container_name: piper
    command: --voice en_US-lessac-medium
    volumes:
      - ./piper-data:/data
    ports:
      - "10200:10200"
    restart: unless-stopped

  openwakeword:
    image: rhasspy/wyoming-openwakeword
    container_name: openwakeword
    command: --preload-model 'ok_nabu'
    ports:
      - "10400:10400"
    restart: unless-stopped

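Then add a Wyoming integration in Home Assistant for each service:

Settings → Devices & Services → Add Integration → Wyoming Protocol
Enter the Docker host’s address and the service’s port (10300 for Whisper, 10200 for Piper, 10400 for openWakeWord)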
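To confirm that each container answers, you can send the Wyoming describe event by hand. A service replies with an info event listing what it offers. This is a minimal stdlib sketch. It assumes the ports from the compose file above and the protocol's framing: a JSON header line, optionally followed by data_length bytes of JSON and payload_length bytes of payload. describe and read_info are hypothetical helper names, and wyoming_check.py is an illustrative file name.

```python
import json
import socket
import sys
from typing import BinaryIO


def read_info(stream: BinaryIO) -> dict:
    """Read events until an 'info' event arrives and return its data."""
    while True:
        line = stream.readline()
        if not line:
            raise ConnectionError("connection closed before an 'info' event")
        header = json.loads(line)
        data = dict(header.get("data") or {})
        if header.get("data_length"):
            data.update(json.loads(stream.read(header["data_length"])))
        if header.get("payload_length"):
            stream.read(header["payload_length"])  # skip any binary payload
        if header["type"] == "info":
            return data


def describe(host: str, port: int, timeout: float = 5.0) -> dict:
    """Send a describe event and return the service's info data."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b'{"type": "describe"}\n')
        with sock.makefile("rb") as stream:
            return read_info(stream)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python3 wyoming_check.py <docker-host>")
    else:
        host = sys.argv[1]
        for name, port in [("whisper", 10300), ("piper", 10200), ("openwakeword", 10400)]:
            try:
                offered = sorted(k for k, v in describe(host, port).items() if v)
                print(f"{name} at {host}:{port}: OK, offers {offered}")
            except (OSError, ValueError) as exc:
                print(f"{name} at {host}:{port}: FAILED ({exc})")
```

A FAILED line usually points to a port mapping or firewall problem rather than a Home Assistant issue.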

Satellite Devices

A voice satellite is a microphone/speaker in another room that connects to your central Home Assistant. Options:

ESP32-S3 Box

Espressif’s ESP32-S3-BOX, a popular Home Assistant voice device with ready-made ESPHome firmware. ~$45, includes display, speaker, and microphones.

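# ESPHome configuration excerpt for an ESP32-S3-BOX
# The microphone, speaker, and light referenced below (echo_microphone,
# echo_speaker, led_ring) must be defined elsewhere in this file, and the
# light needs "Listening", "Processing", and "Speaking" effects.
# The S3-BOX has a display rather than an LED ring, so treat led_ring as a
# placeholder for whatever status indicator your hardware provides.
esphome:
  name: voice-satellite-office
  friendly_name: Office Voice Satellite

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf

# Native API connection to Home Assistant; voice_assistant depends on it
api:

voice_assistant:
  microphone: echo_microphone
  speaker: echo_speaker
  on_wake_word_detected:
    - light.turn_on:
        id: led_ring
        effect: "Listening"
  on_stt_end:
    - light.turn_on:
        id: led_ring
        effect: "Processing"
  on_tts_start:
    - light.turn_on:
        id: led_ring
        effect: "Speaking"
  on_end:
    - light.turn_off: led_ring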

Raspberry Pi with ReSpeaker

A Pi Zero 2 W with a ReSpeaker 2-mic HAT works well:

# Wyoming satellite service (docker-compose.yml on the Pi)
services:
  satellite:
    image: rhasspy/wyoming-satellite
    container_name: satellite
    # --wake-uri must point at an openWakeWord service the Pi can reach
    command: >
      --name 'Kitchen Satellite'
      --uri 'tcp://0.0.0.0:10700'
      --mic-command 'arecord -D plughw:CARD=seeed2micvoicec,DEV=0 -r 16000 -c 1 -f S16_LE -t raw'
      --snd-command 'aplay -D plughw:CARD=seeed2micvoicec,DEV=0 -r 22050 -c 1 -f S16_LE -t raw'
      --wake-uri 'tcp://homeassistant.local:10400'
      --wake-word-name 'ok_nabu'
    # Home Assistant connects to the satellite on this port
    ports:
      - "10700:10700"
    devices:
      - /dev/snd:/dev/snd
    group_add:
      - audio
    restart: unless-stopped
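If the satellite connects but stays silent, it helps to test the speaker path on its own. This sketch writes a raw sine tone in the format the --snd-command above expects: mono, signed 16-bit little-endian, 22050 Hz. The script name tone.py, the tone_s16le helper, and its defaults (440 Hz at 30% volume) are hypothetical choices for illustration.

```python
import math
import struct
import sys


def tone_s16le(freq_hz: float = 440.0, seconds: float = 1.0,
               rate: int = 22050, volume: float = 0.3) -> bytes:
    """Return a mono sine wave as raw signed 16-bit little-endian samples."""
    n = int(rate * seconds)
    amplitude = int(32767 * volume)  # stay well below full scale
    samples = (int(amplitude * math.sin(2 * math.pi * freq_hz * i / rate))
               for i in range(n))
    return struct.pack(f"<{n}h", *samples)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python3 tone.py <seconds> | aplay -r 22050 -c 1 -f S16_LE -t raw",
              file=sys.stderr)
    else:
        sys.stdout.buffer.write(tone_s16le(seconds=float(sys.argv[1])))
```

For example, python3 tone.py 2 | aplay -D plughw:CARD=seeed2micvoicec,DEV=0 -r 22050 -c 1 -f S16_LE -t raw should play two seconds of tone. If ALSA reports the device as busy, stop the satellite container first.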

Old Android Phone

Repurpose an old phone as a voice satellite using the Home Assistant Companion app:

Install Companion app
Settings → Companion App → Manage Assist → Enable Assist
Configure pipeline to use local Whisper/Piper
The phone’s microphone and speaker then act as a satellite

Custom Wake Words

The default “ok nabu” is functional but impersonal. Train a custom wake word:

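Choose your wake phrase (the training notebook generates synthetic samples with text-to-speech, so recording your own is optional)
Run the openWakeWord training notebook (Google Colab works)
Export the .tflite model
Copy it to openWakeWord’s custom model directory (/share/openwakeword for the Home Assistant add-on)
Configure: --preload-model 'custom_wake_word', where custom_wake_word is the model’s file name without the extension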

Popular custom options: “Computer”, “Hey Home”, or your home’s name. “Hey Jarvis” needs no training, since it ships as a pretrained openWakeWord model.

Assist Sentences

Home Assistant Assist parses commands using sentence templates. Default templates handle common requests:

“Turn on the living room lights”
“What’s the temperature in the bedroom?”
“Lock the front door”
“Set the thermostat to 72”

Add custom sentences for your specific setup:

# config/custom_sentences/en/lights.yaml
language: "en"
intents:
  HassLightSet:
    data:
      - sentences:
          - "movie mode"
          - "movie time"
        slots:
          name: "Living Room"
          brightness: 20

  HassTurnOff:
    data:
      - sentences:
          - "goodnight"
          - "bedtime"
        # Fill the domain slot so the intent targets lights
        slots:
          domain: light

Now “movie mode” dims the living room lights to 20%, and “goodnight” turns off every light exposed to Assist.
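The custom sentences above are plain phrases, but templates can also use special syntax. Square brackets mark optional words, (a|b) marks alternatives, and braces mark slot values, as in “turn on [the] {area} lights”. The toy matcher below illustrates the idea under simplifying assumptions: no nesting, and slots match any text. It is not Home Assistant’s matcher. The real one lives in the hassil library and also checks slot values against known areas and entities. expand and match are hypothetical names.

```python
import itertools
import re
from typing import Dict, List, Optional


def expand(template: str) -> List[str]:
    """Expand [optional] and (a|b) groups (no nesting) into plain sentences."""
    template = re.sub(r"\[([^\]]*)\]", r"(\1|)", template)  # [x] means (x|nothing)
    pieces = re.split(r"\(([^)]*)\)", template)  # even: literal text, odd: group body
    choices = [[p] if i % 2 == 0 else p.split("|") for i, p in enumerate(pieces)]
    return [" ".join("".join(combo).split()) for combo in itertools.product(*choices)]


def match(template: str, text: str) -> Optional[Dict[str, str]]:
    """Return slot values if the text fits the template, else None."""
    said = " ".join(text.lower().strip(" .!?").split())
    for sentence in expand(template.lower()):
        pattern = re.escape(sentence)
        # re.escape turns {area} into \{area\}; make it a named capture group
        pattern = re.sub(r"\\\{(\w+)\\\}", r"(?P<\1>.+)", pattern)
        found = re.fullmatch(pattern, said)
        if found:
            return found.groupdict()
    return None


if __name__ == "__main__":
    print(match("turn on [the] {area} lights", "Turn on the living room lights"))
    # {'area': 'living room'}
    print(match("(movie mode|movie time)", "Movie time!"))  # {}
    print(match("(movie mode|movie time)", "start the movie"))  # None
```

An empty dict means the sentence matched but filled no slots, which is exactly what “movie mode” does. Its values come from the fixed slots in the YAML instead.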

Response Templates

Customize Assist responses for more natural interactions:

# config/custom_sentences/en/responses.yaml
language: "en"
responses:
  intents:
    HassLightSet:
      default: "{{ slots.name }} set to {{ slots.brightness }} percent"
    HassTurnOn:
      default: "Turning on {{ slots.name }}"
    HassTurnOff:
      default: "{{ slots.name }} is off"
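Home Assistant renders these responses as templates, with the matched slot values available under slots. The stdlib sketch below imitates only the {{ slots.<name> }} substitution; real responses support full Jinja-style templating. The render_response helper is hypothetical. The sketch also shows a pitfall worth checking. A sentence like “goodnight” never fills a name slot, so a response built on slots.name may come out with a blank where the name should be.

```python
import re
from typing import Dict


def render_response(template: str, slots: Dict[str, object]) -> str:
    """Fill {{ slots.<name> }} placeholders; missing slots become empty strings."""
    def fill(found: "re.Match[str]") -> str:
        return str(slots.get(found.group(1), ""))

    return re.sub(r"\{\{\s*slots\.(\w+)\s*\}\}", fill, template)


if __name__ == "__main__":
    print(render_response("{{ slots.name }} set to {{ slots.brightness }} percent",
                          {"name": "Living Room", "brightness": 20}))
    # Living Room set to 20 percent
    print(repr(render_response("{{ slots.name }} is off", {"domain": "light"})))
    # ' is off'
```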

Automations with Voice Triggers

Voice commands can trigger complex automations:

- id: 'voice_movie_mode'
  alias: Voice Activated Movie Mode
  triggers:
    - trigger: conversation
      command:
        - "movie mode"
        - "movie time"
        - "start the movie"
  actions:
    - action: scene.turn_on
      target:
        entity_id: scene.movie_mode
    - action: media_player.turn_on
      target:
        entity_id: media_player.living_room_tv

The conversation trigger fires whenever Assist receives one of the listed sentences, whether it was typed or spoken.
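One thing to watch: “movie mode” and “movie time” also appear as custom HassLightSet sentences in the earlier example. As we understand Home Assistant’s built-in conversation agent, it checks sentence triggers before intents. Those two phrases would therefore run this automation and never reach HassLightSet. The sketch below models only that ordering. normalize and dispatch are hypothetical helpers, and exact-phrase matching is a simplification.

```python
from typing import Dict, List, Optional, Tuple


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def dispatch(text: str,
             triggers: Dict[str, List[str]],
             intents: Dict[str, List[str]]) -> Optional[Tuple[str, str]]:
    """Return ("automation", id) or ("intent", name) for the first match."""
    said = normalize(text)
    # Sentence triggers are checked first...
    for automation_id, sentences in triggers.items():
        if said in {normalize(s) for s in sentences}:
            return ("automation", automation_id)
    # ...and only unmatched text falls through to intents.
    for intent, sentences in intents.items():
        if said in {normalize(s) for s in sentences}:
            return ("intent", intent)
    return None


if __name__ == "__main__":
    triggers = {"voice_movie_mode": ["movie mode", "movie time", "start the movie"]}
    intents = {"HassLightSet": ["movie mode", "movie time"],
               "HassTurnOff": ["goodnight", "bedtime"]}
    print(dispatch("Movie mode!", triggers, intents))  # ('automation', 'voice_movie_mode')
    print(dispatch("Goodnight.", triggers, intents))   # ('intent', 'HassTurnOff')
```

To get both the scene and the 20% dim, put the light change inside the automation or give the two features different phrases.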

Performance Tuning

Whisper Model Selection

Model	Parameters	Speed	Accuracy
tiny.en	39M	Fastest	Basic
base.en	74M	Fast	Good
small.en	244M	Medium	Better
medium.en	769M	Slow	Best

For real-time voice control, base.en balances speed and accuracy. Use small.en if you have CPU headroom and want better recognition of unusual words.
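Whisper’s published model sizes are parameter counts: 39M for tiny up to 769M for medium. A rough lower bound on the memory the weights need is therefore parameters times bytes per parameter. This is back-of-the-envelope arithmetic, not a measurement. It ignores runtime buffers, and the precision actually used depends on how your Whisper service is configured. For example, faster-whisper, which the Wyoming Whisper service is built on, supports int8 quantization. weight_megabytes is a hypothetical helper.

```python
# Parameter counts (millions) for the English-only Whisper models
PARAMS_MILLIONS = {"tiny.en": 39, "base.en": 74, "small.en": 244, "medium.en": 769}
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_megabytes(model: str, precision: str) -> int:
    """Rough weight size in MB (10**6 bytes); excludes runtime overhead."""
    return PARAMS_MILLIONS[model] * BYTES_PER_PARAM[precision]


if __name__ == "__main__":
    for model in PARAMS_MILLIONS:
        sizes = ", ".join(f"{p}: ~{weight_megabytes(model, p)} MB" for p in BYTES_PER_PARAM)
        print(f"{model:10} {sizes}")
```

Even medium.en at float32 (~3 GB of weights) fits within the 8GB recommendation, but it leaves far less headroom than base.en.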

Piper Voice Quality

Voice options range from fast/robotic to slow/natural:

x_low: Fastest, most robotic
low: Fast, slightly mechanical
medium: Balanced (recommended)
high: Slower, more natural

The en_US-lessac-medium voice sounds natural without excessive latency.
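Piper voice names follow a language-speaker-quality pattern, and the quality level also fixes the output sample rate. Piper’s voice documentation lists 16 kHz for x_low and low voices and 22.05 kHz for medium and high. That matters if you pipe Piper audio to a player yourself, because the rate you pass (for example, aplay -r) must match. The sketch below splits a voice name into those parts using a hypothetical parse_voice helper. The voice’s .onnx.json config file remains the authoritative source for its rate.

```python
from typing import NamedTuple

# Output sample rate by Piper quality level
SAMPLE_RATES = {"x_low": 16000, "low": 16000, "medium": 22050, "high": 22050}


class PiperVoice(NamedTuple):
    language: str
    speaker: str
    quality: str
    sample_rate: int


def parse_voice(name: str) -> PiperVoice:
    """Split e.g. 'en_US-lessac-medium' into language, speaker, and quality."""
    language, rest = name.split("-", 1)
    speaker, quality = rest.rsplit("-", 1)
    if quality not in SAMPLE_RATES:
        raise ValueError(f"unknown quality level: {quality!r}")
    return PiperVoice(language, speaker, quality, SAMPLE_RATES[quality])


if __name__ == "__main__":
    print(parse_voice("en_US-lessac-medium"))
    # PiperVoice(language='en_US', speaker='lessac', quality='medium', sample_rate=22050)
```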

Wake Word Sensitivity

Adjust sensitivity to balance false positives vs missed wake words:

# Higher threshold = less sensitive (fewer false positives, may miss commands)
# Lower threshold = more sensitive (more false positives)
command: --preload-model 'ok_nabu' --threshold 0.5

Start at 0.5 and adjust based on your environment. Noisy rooms usually need lower sensitivity, which means a higher threshold.
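To see why the threshold trades false positives against missed wake words, here is a simplified model. The detector fires when a frame’s score reaches the threshold, then re-arms once the score drops back below it. The per-frame scores are made up for illustration: a noise blip at 0.45, then a real wake word peaking at 0.9. count_detections is a hypothetical helper and does not reproduce openWakeWord’s actual logic.

```python
from typing import List


def count_detections(scores: List[float], threshold: float) -> int:
    """Fire when a score reaches the threshold, then stay quiet until the
    score falls back below it (so one utterance fires only once)."""
    detections = 0
    armed = True
    for score in scores:
        if score >= threshold:
            if armed:
                detections += 1
                armed = False
        else:
            armed = True
    return detections


if __name__ == "__main__":
    # Made-up per-frame scores: a noise blip (0.45), then a real wake word
    scores = [0.1, 0.45, 0.3, 0.2, 0.7, 0.9, 0.8, 0.1]
    for threshold in (0.4, 0.5, 0.95):
        print(f"threshold {threshold}: {count_detections(scores, threshold)} detection(s)")
```

At 0.4 the blip counts as a false positive (2 detections), at 0.5 only the real wake word fires, and at 0.95 the wake word is missed entirely.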

Privacy Comparison

Feature	Local (Wyoming)	Cloud (Alexa/Google)
Command audio leaves home	Never	Yes, for every command
Internet required	No	Yes
Response time	0.5-2s (capable server)	1-3s
Third-party access	None	Amazon/Google
Works during internet outage	Yes	No
Monthly cost	$0	$0-15 (subscriptions)

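The main trade-offs are initial setup complexity, the need for capable hardware, and a narrower vocabulary: Assist matches sentence templates rather than understanding open-ended requests. Once running on suitable hardware, local voice control is more private and more reliable than cloud alternatives, and often faster.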

Troubleshooting

Wake word not detected: Check microphone gain levels. If the gain is too low, the wake word won’t register; if it is too high, the audio clips. Test with arecord -f S16_LE -r 16000 -c 1 -d 5 test.wav, then play the recording back with aplay test.wav.
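To judge that recording objectively rather than by ear, a short script can report its peak level and the share of samples that hit full scale. This stdlib sketch assumes a 16-bit or 8-bit PCM WAV. Note that arecord records 8-bit unsigned audio by default unless you pass -f S16_LE. wav_levels and levels.py are hypothetical names.

```python
import array
import math
import sys
import wave
from typing import Tuple


def wav_levels(path: str) -> Tuple[float, float]:
    """Return (peak level in dBFS, fraction of samples at full scale)."""
    with wave.open(path, "rb") as wav:
        width = wav.getsampwidth()
        frames = wav.readframes(wav.getnframes())
    if width == 2:
        samples = array.array("h", frames)  # signed 16-bit, native byte order
        if sys.byteorder == "big":
            samples.byteswap()  # WAV sample data is little-endian
        full_scale = 32768
    elif width == 1:
        samples = array.array("h", (b - 128 for b in frames))  # unsigned 8-bit to signed
        full_scale = 128
    else:
        raise ValueError(f"unsupported sample width: {width} bytes")
    if not samples:
        raise ValueError("no audio in file")
    peak = max(abs(s) for s in samples)
    clipped = sum(1 for s in samples if abs(s) >= full_scale - 1) / len(samples)
    peak_dbfs = 20 * math.log10(peak / full_scale) if peak else float("-inf")
    return peak_dbfs, clipped


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python3 levels.py test.wav")
    else:
        peak_dbfs, clipped = wav_levels(sys.argv[1])
        print(f"peak {peak_dbfs:.1f} dBFS, {clipped:.2%} of samples clipped")
```

As a rough rule of thumb (not an official figure): a peak at or near 0 dBFS with a noticeable clipped share means the gain is too high. Normal speech peaking around -30 dBFS or lower suggests the gain is too low.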

Slow transcription: Switch to a smaller Whisper model. Check CPU usage during transcription — if maxed out, consider GPU acceleration or better hardware.

Wrong interpretations: Add custom sentences that match how you naturally speak. Assist only recognizes phrasings that fit its sentence templates, so coverage grows as you add patterns.

Satellite disconnects: Ensure a stable network connection. Wi-Fi satellites may drop on congested networks, so consider wired Ethernet for reliability.

No audio response: Verify Piper is running and accessible. Check Home Assistant logs for TTS errors. Ensure speaker volume is up on satellite device.

Summary

Local voice control with Wyoming provides:

Fast responses (sub-second transcription on capable hardware)
Zero cloud dependency
Complete privacy
Works during internet outages
No subscriptions or recurring costs

The setup takes a few hours, but the result is a voice assistant that respects your privacy while controlling your entire smart home.