[BETA RELEASE] Sonos Edge-TTS server app for text to speech

This project provides a local, secure, Sonos-compatible text-to-speech (TTS) system for Hubitat Elevation using Microsoft’s Edge-TTS neural voices. It is designed as a modern alternative to Echo Speaks, especially after recent Amazon / Alexa changes that broke cookie-based authentication and device discovery for many users.

If you have Sonos speakers and want fast, high-quality TTS without relying on Alexa logins, cookies, or cloud hosting such as Heroku, this project is for you. You can of course use any of the built-in voices, but they all sound very robotic compared to the natural-sounding neural voices used here, which come from Microsoft Edge-TTS via the edge-tts Python module.

:sparkles: Features

  • Server runs locally on any Linux machine (DGX, Pi, NUC, VM, etc.); the edge-tts module contacts Microsoft's online service to synthesize the speech
  • Uses Microsoft Edge-TTS neural voices — free and high-quality
  • Works with any Sonos speaker connected to Hubitat
  • Per-Sonos virtual TTS devices (one virtual device per physical speaker)
  • Simple speak() support:
    • speak(text)
    • speak(text, volume)
    • speak(text, volume, voiceId)
  • Optional per-call volume override with automatic volume restore
  • Optional per-call voice override while still supporting a default voice
  • Secure shared-secret token for /generate endpoint
  • Tiny footprint — just Flask + edge-tts + ffmpeg
  • No cookies, no Amazon login, no Heroku deployment
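
The per-call volume override with automatic restore can be illustrated with a small sketch. The real logic lives in the Hubitat Groovy driver; the Python below only shows the idea, and `FakeSpeaker`, `volume_override`, and this `speak` helper are hypothetical names invented for the illustration.

```python
from contextlib import contextmanager
from typing import Iterator, Optional


class FakeSpeaker:
    """Hypothetical stand-in for a Sonos device (not part of the project)."""

    def __init__(self, volume: int) -> None:
        self.volume = volume
        self.played = []  # list of (url, volume_at_play_time)

    def play(self, url: str) -> None:
        # Record what was played and at which volume.
        self.played.append((url, self.volume))


@contextmanager
def volume_override(speaker: FakeSpeaker, volume: Optional[int]) -> Iterator[None]:
    """Temporarily apply `volume` (if given), then restore the prior level."""
    previous = speaker.volume
    if volume is not None:
        speaker.volume = max(0, min(100, volume))  # clamp to 0-100
    try:
        yield
    finally:
        speaker.volume = previous  # restore even if playback raises


def speak(speaker: FakeSpeaker, url: str, volume: Optional[int] = None) -> None:
    """Play `url`, optionally at a one-off volume."""
    with volume_override(speaker, volume):
        speaker.play(url)
```

On a real Sonos, playback is asynchronous, so the restore step would presumably have to wait until the clip has finished rather than run immediately as it does here.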

:triangular_ruler: Architecture Overview

High-level flow:

  1. Hubitat (Edge TTS App) sends POST to /generate on your Linux box:
    {"text": "...", "voiceId": "...", "token": "..."}

  2. The Python server:

    • Calls Edge-TTS to generate speech
    • Uses ffmpeg to normalize the audio
    • Writes a small MP3 file named tts.mp3 on disk
  3. Hubitat then calls playTrack("http://<server>:5005/stream-tts.mp3") on your chosen Sonos device(s)

  4. Sonos connects directly to the Python server and streams tts.mp3 via /stream-tts.mp3
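
Steps 1 and 2 above can be sketched roughly as follows. This is not the code from server_dgx.py: the helper names, the `raw.mp3` intermediate file, the `DEFAULT_VOICE` value, and the choice of ffmpeg's `loudnorm` filter for normalization are all assumptions made for illustration. Only the JSON field names and the tts.mp3 output name come from the description above.

```python
import hmac
import subprocess
from pathlib import Path
from typing import List, Tuple

DEFAULT_VOICE = "en-US-AriaNeural"  # assumed default; pick any valid voice


def check_request(payload: dict, expected_token: str) -> Tuple[str, str]:
    """Validate the /generate JSON body and return (text, voice_id)."""
    token = str(payload.get("token", ""))
    # compare_digest avoids leaking the token through timing differences.
    if not hmac.compare_digest(token.encode(), expected_token.encode()):
        raise PermissionError("invalid token")
    text = str(payload.get("text", "")).strip()
    if not text:
        raise ValueError("text is required")
    voice_id = payload.get("voiceId") or DEFAULT_VOICE
    return text, voice_id


def ffmpeg_normalize_cmd(src: Path, dst: Path) -> List[str]:
    """Build an ffmpeg command that loudness-normalizes src into an MP3."""
    return ["ffmpeg", "-y", "-i", str(src), "-af", "loudnorm",
            "-codec:a", "libmp3lame", str(dst)]


async def generate(payload: dict, expected_token: str, out_dir: Path) -> Path:
    """Validate, synthesize with edge-tts, normalize, and return tts.mp3."""
    import edge_tts  # third-party: pip install edge-tts

    text, voice_id = check_request(payload, expected_token)
    raw = out_dir / "raw.mp3"    # hypothetical intermediate file
    final = out_dir / "tts.mp3"
    await edge_tts.Communicate(text, voice_id).save(str(raw))
    # Blocking call; acceptable for a sketch, not ideal inside an event loop.
    subprocess.run(ffmpeg_normalize_cmd(raw, final), check=True)
    return final
```

A Flask route for /generate would call something like `asyncio.run(generate(request.get_json(), TOKEN, out_dir))` and map `PermissionError` to a 401 or 403 response; how the actual server reports errors is not stated here.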

The /stream-tts.mp3 endpoint responds like a proper MP3 file service:

  • Content-Type: audio/mpeg
  • Content-Length: ...
  • Accept-Ranges: bytes
  • Optional Range support / 206 responses
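
To make the header list concrete, here is a minimal sketch of how a single-range `Range` request could be turned into a 200, 206, or 416 response. It is an illustration under assumptions, not the server's actual code; `plan_response` is a hypothetical helper, and multi-range requests are simply answered with the full file.

```python
import re
from typing import Dict, Optional, Tuple

_RANGE_RE = re.compile(r"bytes=(\d*)-(\d*)$")


def plan_response(size: int, range_header: Optional[str]
                  ) -> Tuple[int, Dict[str, str], int, int]:
    """Return (status, headers, start, end) for a file of `size` bytes.

    `end` is inclusive; for a 416 response start/end are (0, -1).
    """
    headers = {"Content-Type": "audio/mpeg", "Accept-Ranges": "bytes"}
    match = _RANGE_RE.match(range_header.strip()) if range_header else None
    first, last = match.groups() if match else ("", "")

    # No usable Range header (absent, malformed, or last < first):
    # serve the whole file.
    if (not first and not last) or (first and last and int(last) < int(first)):
        headers["Content-Length"] = str(size)
        return 200, headers, 0, size - 1

    if first:
        start = int(first)
        end = min(int(last), size - 1) if last else size - 1
    else:
        # Suffix form "bytes=-N": the last N bytes of the file.
        start = max(size - int(last), 0)
        end = size - 1

    if start > end:  # range lies entirely outside the file
        headers["Content-Range"] = f"bytes */{size}"
        headers["Content-Length"] = "0"
        return 416, headers, 0, -1

    headers["Content-Range"] = f"bytes {start}-{end}/{size}"
    headers["Content-Length"] = str(end - start + 1)
    return 206, headers, start, end
```

The handler would then read bytes `start` through `end` of tts.mp3 and send them with the returned status and headers.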

:open_file_folder: Project Files

Place these in your project/repository:

| File | Description |
|---|---|
| server_dgx.py | Main Python TTS server (Flask + edge-tts + ffmpeg + streaming) |
| edge-tts.service | systemd service unit to run the TTS server on boot |
| edgetts.groovy | Hubitat App: manages Sonos devices and sends TTS requests |
| edgespeaker.groovy | Hubitat Driver: virtual TTS speaker devices for each Sonos |
| README.md | This documentation |

:fire: Quick Start Guide

  • Install the server_dgx.py application on a suitable Linux/Unix-based machine on your LAN
  • Run the app in a Python virtual environment, or optionally set it up as a service
  • Load the .groovy files onto your Hubitat hub:
    • edgetts.groovy goes into the Apps code area
    • edgespeaker.groovy goes into the Drivers code area
  • Add a new User App of type Edge TTS and fill out the required fields
  • A text-to-speech child device will be created for each Sonos you select
  • Use one of those devices instead of Echo Speaks devices anywhere you need speech

Detailed Instructions

Available in the README.md file in the repository


:clipboard: Examples

The parent App looks like this:

The created devices will look like this. Only the first field is required. If volume is specified, it will be used, and the prior volume restored after the text is played.


[EDIT] Update posted.

The en-US-CoraNeural voice is no longer available, so the app will fail if you select it. I posted an update today that removes that voice from the list and checks for valid voices in the main function.
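
As a rough illustration of that kind of voice check, the sketch below validates a requested voice against the currently published list and falls back to a default. Whether the update falls back or rejects the request is not stated, so the fallback behaviour is an assumption, as are the helper names. `edge_tts.list_voices()` is the module's coroutine for fetching voice metadata.

```python
from typing import Iterable, List, Optional


async def fetch_voice_names() -> List[str]:
    """Fetch the short names of currently available Edge-TTS voices."""
    import edge_tts  # third-party: pip install edge-tts

    voices = await edge_tts.list_voices()
    return [v["ShortName"] for v in voices]


def resolve_voice(requested: Optional[str],
                  available: Iterable[str],
                  default: str) -> str:
    """Return `requested` if it is still available, else fall back to `default`."""
    names = set(available)
    if requested and requested in names:
        return requested
    if default in names:
        return default
    raise ValueError("neither the requested nor the default voice is available")
```

For example, with an available list that no longer contains en-US-CoraNeural, `resolve_voice("en-US-CoraNeural", names, "en-US-AriaNeural")` returns the default instead of failing at synthesis time.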