Local Voice Control for Home Assistant: No Cloud Required

7 min read By Charles

Voice assistants are convenient, but they come with privacy trade-offs. With Alexa, Google Home, and Siri, spoken commands are typically sent to cloud servers for processing, where they can be recorded and analyzed. If privacy matters in your smart home, local voice processing is the answer.

Home Assistant’s Wyoming protocol enables fully local voice control. Wake word detection, speech recognition, intent parsing, and text-to-speech all run on your own hardware. On capable hardware, responses can be faster than cloud alternatives, and no audio leaves your network.

Architecture Overview

Local voice control has four components:

┌────────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Wake Word    │────▶│   Speech    │────▶│   Intent    │────▶│   Text to   │
│   Detection    │     │   to Text   │     │   Handler   │     │   Speech    │
│ (openWakeWord) │     │  (Whisper)  │     │ (HA Assist) │     │   (Piper)   │
└────────────────┘     └─────────────┘     └─────────────┘     └─────────────┘

  • Wake word detection: Listens for “Hey Jarvis” or a custom phrase
  • Speech to text: Converts audio to text using Whisper
  • Intent handler: Home Assistant’s Assist processes the command
  • Text to speech: Piper generates the spoken response

All components run locally via Docker or Home Assistant add-ons.

Hardware Requirements

Voice processing is CPU-intensive. Minimum specs for acceptable performance:

Component   Minimum      Recommended
CPU         4 cores      8+ cores
RAM         4 GB         8 GB+
Storage     10 GB free   SSD preferred

On our Dell R7525 with dual EPYC processors, Whisper transcription takes under 1 second. On a Raspberry Pi 4, expect 3-5 seconds per command.

GPU acceleration is optional but significantly improves Whisper performance. An NVIDIA GPU with CUDA support can process speech in real time.

Installation

Option 1: Home Assistant Add-ons

The simplest path for Home Assistant OS users:

  1. Install Whisper add-on

    • Settings → Add-ons → Add-on Store
    • Search “Whisper” → Install
    • Configuration: Choose model size (tiny or base for faster response, small for better accuracy)
  2. Install Piper add-on

    • Search “Piper” → Install
    • Configuration: Select voice (en_US-lessac-medium recommended)
  3. Install openWakeWord add-on

    • Search “openWakeWord” → Install
    • Configuration: Choose wake word (default: “ok nabu”)
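  4. Configure Assist pipeline

    • Start each add-on; Home Assistant should then discover it through the Wyoming integration (Settings → Devices & Services)
    • Settings → Voice assistants → Add assistant
    • Set conversation agent (Home Assistant), STT engine (Whisper), TTS engine (Piper), and wake word engine (openWakeWord)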

Option 2: Docker Compose

For more control or non-HAOS installations:

# docker-compose.yml
services:
  whisper:
    image: rhasspy/wyoming-whisper
    container_name: whisper
    command: --model base.en --language en
    volumes:
      - ./whisper-data:/data
    ports:
      - "10300:10300"
    restart: unless-stopped

  piper:
    image: rhasspy/wyoming-piper
    container_name: piper
    command: --voice en_US-lessac-medium
    volumes:
      - ./piper-data:/data
    ports:
      - "10200:10200"
    restart: unless-stopped

  openwakeword:
    image: rhasspy/wyoming-openwakeword
    container_name: openwakeword
    command: --preload-model 'ok_nabu'
    ports:
      - "10400:10400"
    restart: unless-stopped

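Then add the Wyoming integration in Home Assistant, once per service:

  • Settings → Devices & Services → Add Integration → Wyoming
  • Enter the Docker host’s address and the service port: 10300 for Whisper, 10200 for Piper, 10400 for openWakeWord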

Satellite Devices

A voice satellite is a microphone/speaker in another room that connects to your central Home Assistant. Options:

ESP32-S3 Box

Espressif’s ESP32-S3-BOX is a popular ESPHome-based voice satellite for Home Assistant. ~$45, includes a display, speaker, and microphones.

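# ESPHome configuration excerpt for the ESP32-S3-BOX
# Assumes wifi/api are configured and that microphone, speaker, and light
# components with the ids used below are defined elsewhere in the same file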
esphome:
  name: voice-satellite-office
  friendly_name: Office Voice Satellite

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf

voice_assistant:
  microphone: echo_microphone
  speaker: echo_speaker
  on_wake_word_detected:
    - light.turn_on:
        id: led_ring
        effect: "Listening"
  on_stt_end:
    - light.turn_on:
        id: led_ring
        effect: "Processing"
  on_tts_start:
    - light.turn_on:
        id: led_ring
        effect: "Speaking"
  on_end:
    - light.turn_off: led_ring

Raspberry Pi with ReSpeaker

A Pi Zero 2 W with ReSpeaker 2-mic HAT works well:

# Wyoming satellite service (add under services: in docker-compose.yml)
satellite:
  image: rhasspy/wyoming-satellite
  container_name: satellite
  command: >
    --name 'Kitchen Satellite'
    --uri 'tcp://0.0.0.0:10700'
    --mic-command 'arecord -D plughw:CARD=seeed2micvoicec,DEV=0 -r 16000 -c 1 -f S16_LE -t raw'
    --snd-command 'aplay -D plughw:CARD=seeed2micvoicec,DEV=0 -r 22050 -c 1 -f S16_LE -t raw'
    --wake-uri 'tcp://homeassistant.local:10400'
    --wake-word-name 'ok_nabu'
  devices:
    - /dev/snd:/dev/snd
  group_add:
    - audio
  restart: unless-stopped

Old Android Phone

Repurpose an old phone as a voice satellite using the Home Assistant Companion app:

  1. Install Companion app
  2. Settings → Companion App → Manage Assist → Enable Assist
  3. Configure pipeline to use local Whisper/Piper
  4. Phone microphone and speaker become a satellite

Custom Wake Words

The default “ok nabu” is functional but impersonal. Train a custom wake word:

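  1. Open the openWakeWord training notebook (Google Colab works)
  2. Enter your wake phrase; the standard notebook trains on synthetic speech it generates from the phrase, so no recordings are required
  3. Train and export the .tflite model
  4. Copy it to the openWakeWord custom models directory
  5. Configure: --preload-model 'custom_wake_word'

Popular options: “Hey Jarvis” (which already ships as a pre-trained openWakeWord model), “Computer”, “Hey Home”, or your home’s name.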
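Steps 4 and 5 can be wired together in Docker Compose roughly as follows. This is a sketch, assuming the wyoming-openwakeword --custom-model-dir flag and a hypothetical host folder ./custom-wake-words that holds the exported model; custom_wake_word stands in for your model's file name without the .tflite extension.

```yaml
# docker-compose.yml (excerpt)
services:
  openwakeword:
    image: rhasspy/wyoming-openwakeword
    container_name: openwakeword
    # /custom is an arbitrary container path; it only has to match the volume below
    command: --preload-model 'custom_wake_word' --custom-model-dir /custom
    volumes:
      # Hypothetical host folder containing custom_wake_word.tflite
      - ./custom-wake-words:/custom
    ports:
      - "10400:10400"
    restart: unless-stopped
```

After recreating the container, the new wake word should appear as a choice in the Assist pipeline settings.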

Assist Sentences

Home Assistant Assist parses commands using sentence templates. Default templates handle common requests:

  • “Turn on the living room lights”
  • “What’s the temperature in the bedroom?”
  • “Lock the front door”
  • “Set the thermostat to 72”

Add custom sentences for your specific setup:

# config/custom_sentences/en/lights.yaml
language: "en"
intents:
  HassLightSet:
    data:
      - sentences:
          - "movie mode"
          - "movie time"
        slots:
          # name is matched against exposed entity names and aliases
          name: "Living Room"
          brightness: 20

  HassTurnOff:
    data:
      - sentences:
          - "goodnight"
          - "bedtime"
        # Fixed slot: target every exposed entity in the light domain
        slots:
          domain: light

Now “movie mode” dims living room lights to 20%, and “goodnight” turns off all lights.
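When a phrase should run several actions rather than fill slots on a built-in intent, a custom intent is another option. The sketch below assumes a hypothetical intent name (MovieNight), a hypothetical file name, and a scene.movie_mode entity; the sentence file declares the phrases, and intent_script in configuration.yaml defines what happens and what Assist says back.

```yaml
# config/custom_sentences/en/movie_night.yaml (hypothetical file name)
language: "en"
intents:
  MovieNight:                 # hypothetical custom intent name
    data:
      - sentences:
          - "movie night"
          - "start movie night"
---
# configuration.yaml
intent_script:
  MovieNight:                 # must match the intent name above
    action:
      - action: scene.turn_on
        target:
          entity_id: scene.movie_mode   # assumed scene
    speech:
      text: "Movie night is ready"
```

The two parts live in separate files; they are shown together here only for brevity. Restart Home Assistant after adding them so both are picked up.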

Response Templates

Customize Assist responses for more natural interactions:

# config/custom_sentences/en/responses.yaml
language: "en"
responses:
  intents:
    HassLightSet:
      default: "{{ slots.name }} set to {{ slots.brightness }} percent"
    HassTurnOn:
      default: "Turning on {{ slots.name }}"
    HassTurnOff:
      default: "{{ slots.name }} is off"
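A response can also be tied to one specific sentence instead of the whole intent. A minimal sketch, assuming a hypothetical response key (lights_out), a hypothetical phrase, and a hypothetical file name: the sentence entry names the key, and the responses section supplies the text for it.

```yaml
# config/custom_sentences/en/lights_out.yaml (hypothetical file name)
language: "en"
intents:
  HassTurnOff:
    data:
      - sentences:
          - "lights out"
        slots:
          domain: light
        # Use the lights_out response below instead of the intent's default
        response: lights_out
responses:
  intents:
    HassTurnOff:
      lights_out: "All lights are off. Sleep well."
```

Sentences without a response key keep using the default entry for their intent.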

Automations with Voice Triggers

Voice commands can trigger complex automations:

- id: 'voice_movie_mode'
  alias: Voice Activated Movie Mode
  triggers:
    - trigger: conversation
      command:
        - "movie mode"
        - "movie time"
        - "start the movie"
  actions:
    - action: scene.turn_on
      target:
        entity_id: scene.movie_mode
    - action: media_player.turn_on
      target:
        entity_id: media_player.living_room_tv

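The conversation trigger fires when Assist receives one of the listed sentences, whether typed or spoken. Because “movie mode” and “movie time” also appear in the custom sentences earlier, keep each phrase in only one place so it is clear which handler runs.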
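Sentence triggers can also capture part of what was said. A sketch, assuming a hypothetical reminder use case: {task} is a wildcard whose text is available as trigger.slots.task, and set_conversation_response controls what Assist says back. The todo.reminders entity is an assumption; substitute a to-do list that exists in your setup.

```yaml
- id: 'voice_add_reminder'
  alias: Voice Add Reminder
  triggers:
    - trigger: conversation
      command:
        - "remind me to {task}"
  actions:
    - action: todo.add_item
      target:
        entity_id: todo.reminders        # assumed entity id
      data:
        item: "{{ trigger.slots.task }}"
    # Spoken (or displayed) reply for this command
    - set_conversation_response: "Okay, I will remind you to {{ trigger.slots.task }}"
```

Saying “remind me to water the plants” would then add “water the plants” to the list and confirm it aloud.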

Performance Tuning

Whisper Model Selection

Model       Parameters   Speed     Accuracy
tiny.en     39M          Fastest   Basic
base.en     74M          Fast      Good
small.en    244M         Medium    Better
medium.en   769M         Slow      Best

For real-time voice control, base.en balances speed and accuracy. Use small.en if you have CPU headroom and want better recognition of unusual words.

Piper Voice Quality

Voice options range from fast/robotic to slow/natural:

  • x_low: Fastest, most robotic
  • low: Fast, slightly mechanical
  • medium: Balanced (recommended)
  • high: Slower, more natural

The en_US-lessac-medium voice sounds natural without excessive latency.

Wake Word Sensitivity

Adjust the detection threshold to balance false positives against missed wake words:

# Higher threshold = stricter matching (fewer false positives, may miss wake words)
# Lower threshold = more sensitive (more false positives)
command: --preload-model 'ok_nabu' --threshold 0.5

Start at 0.5 and adjust based on your environment. Rooms with a TV or frequent background speech usually need a higher threshold to avoid false triggers.
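Beyond the threshold, wyoming-openwakeword also exposes a trigger level. A sketch, assuming the --trigger-level flag behaves like the add-on's trigger_level option (the number of activations required before a detection is reported); the values shown are starting points to experiment with, not recommendations.

```yaml
# docker-compose.yml (excerpt)
services:
  openwakeword:
    image: rhasspy/wyoming-openwakeword
    container_name: openwakeword
    # --threshold: score (0-1) a detection must reach; raise it to cut false triggers
    # --trigger-level: activations required before a detection is reported
    command: --preload-model 'ok_nabu' --threshold 0.6 --trigger-level 2
    ports:
      - "10400:10400"
    restart: unless-stopped
```

Change one value at a time and test from the spot where you normally speak, so you can tell which setting made the difference.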

Privacy Comparison

Feature               Local (Wyoming)   Cloud (Alexa/Google)
Audio leaves home     Never             Always
Internet required     No                Yes
Response time         0.5-2 s           1-3 s
Third-party access    None              Amazon/Google
Works during outage   Yes               No
Monthly cost          $0                $0-15 (subscriptions)

The main trade-offs are initial setup complexity and the need for capable hardware. Once running, local voice control is more private, keeps working without an internet connection, and can respond faster than cloud alternatives.

Troubleshooting

Wake word not detected: Check microphone gain levels. If the gain is too low, the wake word won’t register; if it is too high, the audio clips. Record a sample with arecord -d 5 test.wav and play it back with aplay test.wav.

Slow transcription: Switch to a smaller Whisper model. Check CPU usage during transcription — if maxed out, consider GPU acceleration or better hardware.

Wrong interpretations: Add custom sentences that match how you naturally speak. Assist improves as you add more sentence patterns.

Satellite disconnects: Ensure stable network connection. WiFi satellites may drop on congested networks — consider wired Ethernet for reliability.

No audio response: Verify Piper is running and accessible. Check Home Assistant logs for TTS errors. Ensure speaker volume is up on satellite device.


Summary

Local voice control with Wyoming provides:

  • Response times from under a second to about two seconds on capable hardware
  • Zero cloud dependency
  • Complete privacy
  • Works during internet outages
  • No subscriptions or recurring costs

The setup takes a few hours, but the result is a voice assistant that respects your privacy while controlling your entire smart home.
