Possible Local Voice Control Option?

I probably misspoke. I just assumed that since the connection from the iPhone to HomeBridge is local, and HomeBridge to Hubitat is local, the voice control was local too. I forgot that Siri might be using Apple's cloud service to interpret the command.

2 Likes

I started with just my iPhone, then set up an old cracked-screen iPad as a home hub so I could use Home remotely.

I seem to remember, back in the '90s, talking to my Windows 3.1/95 computer with a headset on and telling it to launch Lotus 1-2-3 for Windows and WordPerfect for Windows...

Dragon NaturallySpeaking handled dictation not long after that.

Here we are 25 years later, and no one has a local voice system that could, in theory, run something like a modern Dragon NaturallySpeaking locally (hopefully?) and respond to "Good night" without needing multi-million-dollar back-end datacenters at Amazon's AWS hosting facilities?

I would train it on my family's voices, then lock it to reject any other voice without an SMS confirmation to my phone or something.

I can't believe such a product does not exist.

Local voice control does not need to do internet searches, GPS vehicle navigation, or cloud streaming of any kind.

Local voice control should not even require full natural-language speech recognition, though it could work with something like what Dragon NaturallySpeaking was back in the stone age.

1 Like

Dragon is still around, and another alternative is Braina - Artificial Intelligence Software for Windows.

Google definitely does a lot of this locally, for example the offline transcription you can enable in the keyboard. My Pixel 4 XL came with all of this from the factory, and it does an astonishing amount even in airplane mode.

Perhaps there's a way to use old Android phones on Wi-Fi, but block their Internet access to accomplish this?

Otherwise I'd vote for something like Dragon. I definitely used their products back in the day, and it was surprisingly accurate.

I'd pay a lot for local voice control, even if it was just to a local PC. I think HomeSeer has something like it, but that might just be announcements.

As others have said, you are not going to easily find the hardware and computing power to do local voice control, and if you do, it is going to be costly.

2 Likes

I don't think it would be hard to do voice recognition. What would be hard is having the voice control understand human speech. It would be easy to have specific phrases that trigger something, but then you need to know exactly what to say to get it to work.

You could build a local repo of MP3/WAV files with various keywords (door, window, open, close, intrusion...) and play them in sequence, triggered by an HTTP POST based on status, through a local speaker.
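A minimal sketch of that idea, assuming a Linux box (e.g. a Raspberry Pi) with `aplay` available; the endpoint name, clips directory, and JSON shape are all invented for illustration:

```python
# Play a sequence of pre-recorded keyword WAVs in response to an HTTP POST.
import subprocess
from flask import Flask, request

app = Flask(__name__)
CLIPS = "/opt/voice/clips"  # hypothetical dir: door.wav, open.wav, ...

@app.route("/announce", methods=["POST"])
def announce():
    # e.g. curl -X POST -H 'Content-Type: application/json' \
    #        -d '{"words": ["front", "door", "open"]}' http://pi:5000/announce
    for word in request.get_json().get("words", []):
        subprocess.run(["aplay", f"{CLIPS}/{word}.wav"], check=False)
    return "OK"

app.run(host="0.0.0.0", port=5000)
```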

Has anyone taken a crack at this Open Assist?

1 Like

Oh, please, somebody figure this out for us! I was also rather hoping that Apple's home thingy could be made local, but clearly Siri has to link to the cloud to operate. I'd be happy to press a button first and then speak commands, rather than having to use a device with a screen, to access lights, music, etc. I feel like it would be a lot easier for kids and guests.

If I figured it out, it wouldn't be free; it would be very expensive. If I found a way to recreate AI that uses neural nets and machine learning and can be powered by something as simple as a Raspberry Pi, I'd be a billionaire! The reality is, anything local is going to be terrible. Yeah, I do remember Dragon NaturallySpeaking... I also remember trying it in middle school and ending up with a paper mostly of expletives, not because I was some kid who thought it was funny to make my computer curse, but because the thing was so frustrating and poor at transcribing my dictation that I would often curse at it until it ended up in the trash can. It was a cool gimmick, and certainly "better than nothing" as an assistive technology for those unable to type, but in practice it was pretty terrible and of limited utility.

Worse still, speech recognition is a challenging problem, but a solvable one. The hard part is context-sensitive speech understanding. Recognizing that you said a series of words and translating them to characters is one thing; understanding those words is another. Babies can repeat the sounds their parents make at a very young age; actually understanding the grammar, syntax, and meaning of those words is another story. Same for computers. Recognizing that "Turn on the bedroom light", "Switch on the bedroom lamp", "Turn the bedroom light on", "Open the bedroom lights", "Activate the lamp in the bedroom", "I'd like the bedroom lamp to be on", etc., etc. all mean the same thing is a very complex problem that currently requires computers more powerful than what you and I have in our homes. Maybe one day, but not today.

4 Likes

You can use TTS instead, directly from RM. But the OP is looking for a solution that will do speech or phrase recognition locally.

3 Likes

This is probably not quite what the OP was looking for, but I have been messing around building a local AI that just responds to commands it hears. It's nothing special, as my programming skills are at a novice level, just a little program that runs on Windows. For example, it will respond to greetings and pre-coded phrases. If you ask it what time it is, it will tell you; if you ask it to stop listening, it will stop until given the wake-up command. The interesting part is that I have been playing around with the APIs, and now it will turn on all my Zigbee lights and plugs. I haven't looked into different light settings yet; that is a little further down the road. This is all local with no internet requirements; it just needs to run on the same IP range as the HE. Of course, you could always code against the cloud API if you wanted to do this remotely. Unfortunately, this is all hard-coded into the source program, as I haven't learnt that part of programming yet, so it is not going to be any use to anyone else.
The point I am probably getting at is: if I can achieve this with my limited programming skills, why hasn't someone who can program already done it?
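In outline, the pre-coded-phrase idea is just a lookup, something like this Python sketch (my real program is C#, and these phrases and responses are only examples):

```python
import datetime

listening = True  # flipped by the stop/wake phrases

def respond(phrase: str) -> str:
    global listening
    if phrase == "wake up":
        listening = True
        return "I'm listening."
    if not listening:
        return ""                     # ignore everything until woken
    if phrase == "stop listening":
        listening = False
        return "Going quiet."
    if phrase == "what time is it":
        return datetime.datetime.now().strftime("It's %H:%M.")
    if phrase == "turn on the lounge light":
        return "Turning on the lounge light."  # would call the hub's local API here
    return ""                         # unknown phrase: stay silent
```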

I had a program like that back in the day; it did the same thing. It was never accurate and never did anything right.

But if I could get that to work, with a simple BAT file to execute something, I guess it's possible. Hahah... I might try that for fun. I don't expect anything to really come of it, but it might be fun.

I'm in the process of developing something, but it's very much tailored to my environment at the moment. I've tried all the usual local STT options and they're all pretty much too imprecise for home automation for me. Except for one that I just found:

I have it running, using cloud services as a backup if it cannot guess correctly. If the local one fails, it tries IBM Watson next (because you get 500 min/month free) and then Google Cloud, which is almost 100% accurate (but only 60 min/month free). This means it can take a few seconds if the local one doesn't work. I also store the failures along with the correct STT from Watson/gcloud so it "learns", i.e., when it makes the same mistake again it doesn't have to go to the cloud for the correction to the transcription.
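In outline it looks like this (a hedged sketch; `local_stt`, `watson_stt`, `gcloud_stt`, and the confidence threshold are stand-ins for whatever engines you wire in):

```python
import json, os

CACHE = "corrections.json"  # learned fixes for the local engine's repeat mistakes
corrections = json.load(open(CACHE)) if os.path.exists(CACHE) else {}

def transcribe(wav_path):
    text, confidence = local_stt(wav_path)      # try the local engine first
    if confidence >= 0.85:
        return corrections.get(text, text)
    if text in corrections:                     # known mistake: skip the cloud round-trip
        return corrections[text]
    fixed = watson_stt(wav_path) or gcloud_stt(wav_path)  # cloud fallbacks, cheapest first
    if fixed and fixed != text:
        corrections[text] = fixed               # remember the correction
        with open(CACHE, "w") as f:
            json.dump(corrections, f)
    return fixed or text
```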

I use Porcupine for wake word detection, as it just works.
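The wake-word loop is only a few lines with the Porcupine Python SDK; a sketch (newer SDK versions need a free Picovoice AccessKey, the placeholder below):

```python
import pvporcupine
from pvrecorder import PvRecorder

porcupine = pvporcupine.create(access_key="YOUR_PICOVOICE_KEY",
                               keywords=["porcupine"])
recorder = PvRecorder(frame_length=porcupine.frame_length)
recorder.start()
try:
    while True:
        if porcupine.process(recorder.read()) >= 0:
            print("wake word detected")  # start recording the command here
finally:
    recorder.delete()
    porcupine.delete()
```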

It's held together by string, though, with Python and Perl scripts (I'm a Perl coder, otherwise it would all be in Python). So it's nowhere near possible to release anything.

I use TTS through the Hubitat (which obviously uses the cloud), but not for anything serious (time/date/outdoor temperature). Everything else uses pre-recorded responses (generated with the Hubitat TTS and then saved as MP3 files so it all uses the same voice) or sounds.

Each room now has a Pi (4 x RPi 3B+, plus 1 Pi Zero) that fires off its recorded speech WAV to a virtual server with more oomph, which does the STT and coordinates with Hubitat via the Maker API.
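The satellite side is trivial, roughly this (the server URL and form field are mine and only illustrative):

```python
import requests

def send_for_stt(wav_path):
    # Ship the recorded command to the central box; it replies with the transcription.
    with open(wav_path, "rb") as f:
        resp = requests.post("http://stt-server.local:8080/stt",
                             files={"audio": f}, timeout=10)
    return resp.text
```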

So, it can be done after a fashion, but my attempt is certainly not plug and play :smiley: or ready to share.

I did try Python but decided to go down the C# route. No BAT files are run, just an EXE. You simply tell it to turn on the lounge light and it will, using the local API coded into the program; no internet required. I am now looking at how a user can add commands while the program is running, since these are currently only written to a text file, i.e., if you add a new command and a response to that command, it will store it permanently. I may have to add a pre-word to the commands, as it randomly talks to me when it hears a sound, but apart from that, if you speak clearly, it's very accurate at translating spoken words. I haven't used Hubitat TTS, so I will look at that later.

I'm thinking the voice recognition should be easy and accurate given C# tools, because the vocabulary would be quite limited: device names or labels, commands or attributes, and numbers up to 100 or so. That's it! A few hundred words, even with a user-defined library, and the vocabulary can be built uniquely from each specific hub's database. I'm not familiar with the Raspberry Pi, but that would probably handle it easily. As a new owner, what I don't understand is all the other parts: the connection to the hub, and whatever you use to get the audio data to a computer?
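One way to test the limited-vocabulary idea outside of C# is Vosk, whose Python recognizer accepts a grammar (a list of allowed phrases). A rough sketch, assuming a downloaded Vosk model and a 16 kHz mono WAV; the phrases are just examples:

```python
import json
import wave
from vosk import Model, KaldiRecognizer

model = Model("vosk-model-small-en-us-0.15")  # path to any downloaded Vosk model
grammar = ["turn on the bedroom light", "turn off the bedroom light",
           "set the bedroom light to forty", "[unk]"]

wf = wave.open("command.wav", "rb")
rec = KaldiRecognizer(model, wf.getframerate(), json.dumps(grammar))
while True:
    data = wf.readframes(4000)
    if not data:
        break
    rec.AcceptWaveform(data)
print(json.loads(rec.FinalResult())["text"])
```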

As an update, I did create a local Windows program that uses C# and the Maker API to grab all the hub's devices (IDs, labels, and names) and then build a very limited grammar (which means it's fairly accurate) from those, along with the numbers from 1 to 100. I can turn on, turn off, and set levels of all my devices by voice. I doubt this fills much of a need, but yes, local recognition and control is possible; you just need a Hubitat, a mic, and a PC. Mine also talks back to me using TTS :blush:
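The Maker API half looks roughly like this in Python (the hub IP, app instance ID, and token below are placeholders; my program makes the same calls from C#):

```python
import requests

HUB = "http://192.168.1.10"   # placeholder hub IP
APP = "12"                    # placeholder Maker API app instance id
TOKEN = "your-access-token"   # placeholder access token
BASE = f"{HUB}/apps/api/{APP}"

# Grab every device exposed through Maker API and map label -> id.
devices = requests.get(f"{BASE}/devices", params={"access_token": TOKEN}).json()
by_label = {d["label"].lower(): d["id"] for d in devices}

def command(label, cmd, value=None):
    url = f"{BASE}/devices/{by_label[label.lower()]}/{cmd}"
    if value is not None:
        url += f"/{value}"    # secondary value, e.g. setLevel/40
    requests.get(url, params={"access_token": TOKEN})

command("bedroom lamp", "on")
command("bedroom lamp", "setLevel", 40)
```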

1 Like

https://rhasspy.readthedocs.io/en/latest/

Hmm, did it not embed?

1 Like