You can do all of those things with node-sonos-http-api. You need a separate machine that can run Node.js, but otherwise it's all local, fast, and reliable.
All of the grouping and playback control is straightforward. I wrote a driver (additional discussion here) that does those things. The nice thing about that API is that you can always address a speaker as if it were standalone, and if it's in a group the command controls the whole group. It doesn't matter which speaker in the group you address, which simplifies things quite a bit.
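To give a feel for how speakers are addressed, here is a rough Python sketch that only builds the request URLs. It assumes the API is listening on its default port 5005 on the same machine and that requests follow the `/{room}/{action}/{parameter}` pattern from the project's README. The helper name `sonos_url` and the room names are made up for illustration.

```python
from urllib.parse import quote

# Assumption: node-sonos-http-api runs locally on its default port.
BASE_URL = "http://localhost:5005"


def sonos_url(room, action, *params, base_url=BASE_URL):
    """Build a URL of the form {base}/{room}/{action}[/{param}...].

    Hypothetical helper: each path segment is percent-encoded so room
    names containing spaces stay valid.
    """
    parts = [room, action, *map(str, params)]
    return base_url + "/" + "/".join(quote(p, safe="") for p in parts)


# Pausing via any one speaker should pause whatever group it belongs to.
print(sonos_url("Kitchen", "pause"))
# Ask "Living Room" to join the group that "Kitchen" is in.
print(sonos_url("Living Room", "join", "Kitchen"))
```

A plain HTTP GET to one of these URLs is all the API needs, so the same calls can be made from a Hubitat driver or rule.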
The crossfade thing may take some more research. You could brute force it:
- pause group
- ungroup the speaker where you want TTS
- play the TTS (probably easiest to use the built-in Hubitat integration for this step)
- play the group
- re-join the TTS speaker to the group
...or there may be a way built-in to do it with that API that I'm not aware of.
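The brute-force steps above could be sketched roughly like this in Python. Treat it as an untested outline: it assumes the API's `pause`, `play`, `leave`, and `join` actions behave as described in its README, that the API is on `localhost:5005`, and that `group_room` is a different speaker that stays in the group. The function `announce` and the `speak` callback are hypothetical names; `speak` stands in for whatever actually plays the TTS (for example the built-in Hubitat integration) and should not return until the announcement has finished.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: node-sonos-http-api runs locally on its default port.
BASE_URL = "http://localhost:5005"


def http_get(url):
    """Send a GET request to the API and return the HTTP status code."""
    with urlopen(url, timeout=10) as resp:
        return resp.status


def announce(tts_room, group_room, speak, send=http_get, base_url=BASE_URL):
    """Pause the group, pull one speaker out for TTS, then restore the group.

    `send` is injectable so the sequence can be dry-run without a Sonos system.
    Returns the list of URLs that were requested, in order.
    """
    def url(room, action, *params):
        parts = [room, action, *params]
        return base_url + "/" + "/".join(quote(p, safe="") for p in parts)

    calls = []

    def step(u):
        calls.append(u)
        send(u)

    step(url(group_room, "pause"))           # pause group
    step(url(tts_room, "leave"))             # ungroup the TTS speaker
    speak()                                  # play the TTS (must block until done)
    step(url(group_room, "play"))            # play the group
    step(url(tts_room, "join", group_room))  # re-join the TTS speaker
    return calls


# Dry run: print the requests instead of sending them.
announce("Kitchen", "Living Room",
         speak=lambda: print("(TTS plays here)"),
         send=print)
```

Dropping the `send=print` argument would send the requests for real. Timing is the part most likely to need tuning, since the group should not resume until the announcement is over.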