AI with LLMs via Ollama for Local Voice Assistants

So based on a few other threads, I have started to look at how to use Ollama with Hubitat. The hoped-for outcome is a fully local voice assistant.

I have a working environment of Ollama with a few LLMs that I am working with. The goal was mostly to learn about them, to see how useful they can be and how they differ. I have experimented with LLMs ranging from as few as 1 billion parameters up to a 67-billion-parameter model. In the other threads discussing future-proof AI and local voice assistants, I had stated that this shouldn't be too difficult to do, so I wanted to give it a try and am starting this thread to share the outcome of that testing.

That said, I have made some interesting progress already. I have created an Ollama driver that I can use to have a conversation with Ollama. I have also added some functions to the calls, so now I get a response that includes the details Hubitat needs to act on. That means I can use the device in Hubitat to chat with Ollama, say something like "Turn off the Lyra Lamp in the study," and Hubitat gets a response that indicates something like state = off, device = Lyra Lamp, room = Study. This is pretty cool, because I can now write the code to take that information and act on it with any device.
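As a rough illustration of what this looks like on the wire, here is a sketch against Ollama's /api/chat endpoint: the request carries a tool schema, and the reply carries a structured tool call the hub can parse. The tool name `set_switch` and its fields are my invented examples, not the actual names the driver uses.

```python
import json

# Hypothetical tool schema in Ollama's /api/chat "tools" format.
SET_SWITCH_TOOL = {
    "type": "function",
    "function": {
        "name": "set_switch",
        "description": "Turn a named device in a room on or off",
        "parameters": {
            "type": "object",
            "properties": {
                "state":  {"type": "string", "enum": ["on", "off"]},
                "device": {"type": "string"},
                "room":   {"type": "string"},
            },
            "required": ["state", "device"],
        },
    },
}

def chat_request(model, user_text):
    """Build the request body sent to POST /api/chat."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "tools": [SET_SWITCH_TOOL],
        "stream": False,
    }

def extract_tool_calls(chat_response):
    """Pull (function name, arguments) pairs out of an /api/chat reply."""
    calls = chat_response.get("message", {}).get("tool_calls") or []
    return [(c["function"]["name"], c["function"]["arguments"]) for c in calls]

# A reply shaped like what Ollama returns for a tool-capable model:
mock_response = {
    "message": {
        "role": "assistant",
        "tool_calls": [
            {"function": {"name": "set_switch",
                          "arguments": {"state": "off",
                                        "device": "Lyra Lamp",
                                        "room": "Study"}}}
        ],
    }
}

for name, args in extract_tool_calls(mock_response):
    print(name, json.dumps(args, sort_keys=True))
    # → set_switch {"device": "Lyra Lamp", "room": "Study", "state": "off"}
```

A model without tool support simply omits `tool_calls`, which is why `extract_tool_calls` returns an empty list rather than raising.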

One thing I have found I was wrong about: I was hoping to use a fairly small LLM like llama3.2:latest, with 1 or 3 billion parameters, that can easily run on small hardware. At least for now, I have found I needed to jump up to an 8-billion-parameter model called qwen3:latest, with some reasoning ability, to improve the reliability of the function identification. Since making that change it has worked flawlessly. More processing is involved, but the larger model makes a huge difference. I expect this model will fit in my 8GB GPU, but anything larger will likely be an issue.

I need to work on a frontend app for the integration to manage configuration and to handle some of the additional logic that will take the returned data and actually act on it.

I can think of a few functions, but for more niche functions I may need help identifying them. I am really curious about everyone's thoughts. Is this even something that we would want for Hubitat?

10 Likes

Absolutely! It just comes down to cost, reliability and speed. Right now I use all Google devices and use them for all voice control when necessary. Cost isn't bad, reliability is spotty at best and speed can be slow at times.

Would love to see some data on how this would compare.

2 Likes

I only understood a fraction of what you talked about, but I’ve done local voice recognition using a PC and tying that to Maker API with some success. What hardware is needed to run yours? What hardware is picking up the voice commands and maybe delivering voice responses?

I would love to be able to use a local AI assistant on Hubitat. Thanks for sharing your thoughts and starting this thread.

What I have been working on is far from a complete solution. At the moment it only involves the Hubitat hub and a VM/computer running Ollama. My focus has been all about enabling the Hubitat hub to talk to an Ollama instance and start a chat, then, within the chat, include tools to return to the hub what needs to be done. The driver actually works now, and I have started on an app for setup and management. There is still a lot to be done before I can release any code, and who knows how fast it will be.

This is far from a complete solution if you are trying to replicate an actual voice assistant. That said, if you have a way to do TTS and STT through Maker API, maybe these two things could work together to get really close. I was thinking about adding an endpoint to the app for direct calls, but Maker API is a viable option as well. I haven't even started to dig into the TTS and STT components.

My Ollama instance is running on my server in a Windows desktop VM. The VM has access to an RTX 3050 GPU to accelerate the LLM processing. I would like to be able to run it on a Jetson Orin Nano, though I have concerns about whether it will be accelerated the same way, since the memory there is more limited. The LLM I have settled on for the moment, qwen3:latest, is around 5GB in size, so I need to be mindful of GPU memory requirements.

The current test setup generates tokens in the mid-thirties per second. I would love to test it on a Jetson, though.

1 Like

This sent me looking for details on the Home Assistant Voice Preview to see what they were using for the AI bit. But it looks like they're currently offloading it to the cloud or a local Ollama instance.

Right. I think two of those just won't be comparable with a local solution. I don't see a way to make it more cost-effective than, say, Google or Alexa. They have sheer scale, plus the fact that they can be loss leaders for Amazon and Google. Both of those companies can also just throw hardware at the problem, while we don't have unlimited compute at home. We will likely need to spend a few hundred dollars just for a machine to run the LLM, and then each actual speaker/mic device will likely cost twice as much, if not more, than the comparable device from Apple and Google.

Speed could be a challenge as well, since running an LLM decently will require a dedicated GPU.

This is precisely why I started looking into it.

4 Likes

Absolutely — this definitely piques my interest. I’m currently building a Python-based system that uses the OpenAI API to help manage aspects of my business. It’s still a work in progress, but it’s already handling basic email-related tasks such as identifying support requests and deciding whether to escalate them, hold them, or route them appropriately.

I’ve also paired the system with a spare Hubitat C7 hub—not for traditional home automation, but to support the AI system. For example:

Presence awareness: Hubitat helps the AI determine if I'm available, away, or taking a break, which affects how certain alerts are handled.

Heartbeat/status monitoring: Using Rule Machine, Hubitat monitors the AI system's health and can notify me or take action if it stops responding.

Alert routing: If a critical issue arises and I'm unavailable, Hubitat can trigger fallback alerts via IFTTT or other smart notifications I already have in place.

Using Hubitat as a kind of intelligent local sidekick has added a layer of real-world context and reliability to my setup that purely cloud-based tools can’t offer. I’m definitely interested to see where your personal LLM project goes. Something like that could be a great complement to this kind of hybrid system—especially for on-device processing, cost control, or use in disconnected environments.

3 Likes

I just figured I would post an update for those interested.

First and foremost, this has been a lot of one step forward, two steps back. I think some of my assumptions about how to use the function methodology in Ollama were flawed until last night. Simply put, my hope of using a driver to facilitate the communication directly with the Ollama server instance was doomed from the beginning. You need something in between the driver and Ollama. I suspect this would have been obvious to someone who does development all the time, but it is what it is.

The good thing is I was already working on the frontend app to manage certain things. I shifted the management functions of the chat to that app, and that seems to have given me the needed space to work from for all of the chat functions. Because of that shift and the changes in how the chat is managed, I believe I now have chat history properly implemented, so context will be retained during a conversation. I have also now successfully looked up device states on my dev hub using functions configured in chats with Ollama.
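For anyone curious how retained context works with a stateless endpoint like /api/chat: the whole message history has to be resent with every request. A minimal sketch of that bookkeeping (the class and method names here are illustrative, not the app's actual code):

```python
# Sketch of conversation-context handling for Ollama's /api/chat,
# which is stateless: context lives entirely in the resent history.
class Chat:
    def __init__(self, model, system_prompt):
        self.model = model
        # The system prompt anchors the conversation and is always kept.
        self.messages = [{"role": "system", "content": system_prompt}]

    def next_request(self, user_text):
        """Append the user's turn and build the full /api/chat body."""
        self.messages.append({"role": "user", "content": user_text})
        return {"model": self.model, "messages": self.messages, "stream": False}

    def record_reply(self, assistant_text):
        """Store the model's reply so the next turn sees it as context."""
        self.messages.append({"role": "assistant", "content": assistant_text})

chat = Chat("qwen3:latest", "You control a Hubitat home.")
req = chat.next_request("Turn on the lamp")
chat.record_reply("Done, the lamp is on.")
chat.next_request("Now the fan")
print(len(chat.messages))  # → 4 (system, user, assistant, user)
```

Starting a new conversation is then just discarding `messages` and rebuilding it from the system prompt.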

At this point I think I need to start working on a few more functions before I give any of this out. I want to enable some functions to actually control devices, like turning them on and off and setting certain states. If anyone has any requests for functions, let me know.

@tohm I will see about adding a presence detection function as well since that is in your use case.

5 Likes

I just put what I have out on GitHub if anyone has an Ollama instance and wants to try it out. It is just a driver that acts as the chat interface, plus the main app. You can download and install the app and driver files individually, or use the bundle to install them both at the same time.

There are two parts to this setup. First there is the main app, which does all of the heavy lifting. When you set it up you will need to give it your Ollama server's IP address and its port. By default the port is 11434, so in my case it is "192.168.1.12:11434". Click Next to save this setting and go back to the main screen.

Go back into the "Ollama Setup Menu", because we need to select the LLM model to be used. I am using "llama3.2:latest" or "qwen3:latest". If you already have some models downloaded to Ollama, they will be available in the drop-down menu.

If you don't have any models pulled down, you can pull them from the "Ollama Setup Menu" page in the app. You can either use the ones I mention above or go to the Ollama model search page to get the needed name. This is where your individual hardware can make a difference: if you have a more powerful GPU with more memory, you may want a model with more parameters. Simply put the name on the line in the "Manage Models in Ollama" section and click the Pull Model button. It will take a while to process; you will see the Hubitat spinning circle in the upper right corner of the app while it works. If the model does not appear in the model drop-down list when it is done, click Next and come back into this menu.

Once the model is downloaded and selected, click Next or Done until you are returned to the main Hubitat menu, and you can now communicate with your LLM.

During the install process an Ollama device is created on the hub. You can go into that device and start a chat. This is purely text based right now, as I haven't tried to do anything with STT or TTS yet.

One thing I will point out is that this is very chatty, with somewhat large chunks of data being pushed around, so be aware of that. Context is stored within the chat: the longer you chat, the larger the chunk of data being retained and pushed back and forth with each part of the conversation, so keep that in mind. Using the option in the Ollama driver to start a new conversation is therefore a good idea; it will keep the context from getting too big and will reload the context-related info from Hubitat. I am considering putting in some safeguards to limit the size of the context, but the models will do that naturally in a way, since they have their own context limits.
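One possible safeguard, purely a sketch of the idea rather than anything in the released code, is to cap the resent history at the last N messages while always preserving the system prompt:

```python
# Illustrative history-trimming safeguard: keep the system prompt
# plus only the most recent messages so the payload stops growing.
def trim_history(messages, max_messages=20):
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_messages:]

history = [{"role": "system", "content": "You control a Hubitat home."}]
for i in range(30):
    history.append({"role": "user", "content": f"message {i}"})

trimmed = trim_history(history, max_messages=10)
print(len(trimmed))  # → 11: the system prompt plus the 10 newest turns
```

A fancier version might summarize the dropped turns instead of discarding them, but even a hard cap keeps the per-request payload bounded.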

I am also not storing any of the chat info in the Hubitat DB. There are a few reasons for this, but minimizing performance impact is the biggest one. These large strings could be hell on the hub's DB.

So have fun with it, and let me know how it works. If you have any issues, please let me know. I believe that with the last update I made to help with additional context from the hub, it should be fairly reliable at processing the updates it can do.

As far as what it can do. Well, I have been able to

  1. Turn devices on/off
  2. Adjust brightness levels
  3. Look up a device's temperature
  4. Look up a device's humidity
  5. Look up a device's presence

I know there is a lot more to do, and this is just the beginning. If there is something specific you are interested in, control-wise or lookup-wise, let me know. At this point it is just a matter of identifying how the LLM would pass it along and then completing the function for it. I also suspect I will need to adjust or expand the context being passed to the LLM when a new conversation is started.

6 Likes

I have found an issue with processing multiple commands at once, like turning off two devices. I will look at that soon to see if there is a way I can make it more reliable. It looks like it is only executing once for some reason.

**Update**
:man_facepalming: Well, chalk this one up to the model being used. I had switched back to llama3.2:latest for a bit since it is faster. Apparently it didn't have enough reasoning ability to understand there were two devices to turn off. I switched back to qwen3:latest and it turned both of the lights off. :man_shrugging:

I should also mention you need to use a model that supports tools, and it seems that even though deepseek-r1 is listed as such, it doesn't seem to work.

The more I have been dabbling with this, the more I recommend qwen3:latest as the LLM of choice. The others just don't seem to be as reliable for some reason.

1 Like

I have posted some more updates to the integration app.

It will now be able to look up the power state of a device, set color, and set color temperature.

After testing a few different LLMs, I want to recommend using qwen3 if you are having any issues. I have had a much better experience with it than with any other model I have tried.

1 Like

Does anyone have any recommendations on a hopefully simple guide on installing and configuring Ollama on a spare computer?

Windows or Linux?

The easiest method is to go to the Ollama download site and select the OS you want to install on.

If it is Windows, download the latest version and, once downloaded, run the installer. Once installed, open up the settings and make sure the option "Expose Ollama to the Network" is turned on. You may get prompted to allow access through Windows Firewall; allow it. At this point you can open a Windows terminal and use ollama commands to perform various tasks. You don't need to, though, as you can use the pull option from the integration app.

If it is Linux, you will get a command you can run from a terminal. Copy and paste it into the terminal of the Linux system you want to use and press Enter. Once it completes, Ollama is installed.

Now it needs to be configured so you can access it on the network. Run this command in the terminal to edit the service file:

sudo vi /etc/systemd/system/ollama.service

Make sure this line is in the [Service] section of the file:

Environment="OLLAMA_HOST=0.0.0.0:11434"

Once that is done, run these commands to restart the service:

sudo systemctl daemon-reload
sudo systemctl restart ollama

Check the Ollama status by running this command:

sudo systemctl status ollama.service

If the Linux system has the UFW firewall installed, run this command to open up the port as well:

sudo ufw allow 11434/tcp

All of that should get Linux ready.

At this point, whether it is Linux or Windows, you can use the ollama command in a terminal window.

Once all of that is done, you should be able to go to http://<host IP>:11434 and get a message saying "Ollama is running".
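If you'd rather script that check than use a browser, here is a small Python sketch that hits the same root URL and looks for the same banner (the host IP is an example; swap in your own):

```python
import urllib.request

OLLAMA_BANNER = "Ollama is running"

def base_url(host, port=11434):
    """Build the root URL Ollama answers on."""
    return f"http://{host}:{port}/"

def check_banner(body):
    """True if the response body is Ollama's readiness banner."""
    return body.strip() == OLLAMA_BANNER

def ollama_alive(host, port=11434, timeout=5):
    """Fetch the root URL; False on any connection/socket error."""
    try:
        with urllib.request.urlopen(base_url(host, port), timeout=timeout) as resp:
            return check_banner(resp.read().decode())
    except OSError:
        return False

print(base_url("192.168.1.12"))            # → http://192.168.1.12:11434/
print(check_banner("Ollama is running\n"))  # → True
```

If `ollama_alive` returns False from another machine but the banner works locally, the likely culprits are the OLLAMA_HOST setting or the firewall rule above.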

To test performance, submit "ollama run <model> --verbose" in the terminal window, for example "ollama run qwen3:latest --verbose". You will get a prompt where you can submit a chat. Ask a simple question like "tell me a story" and press Enter. It will process the request, and when it is done with the response it will print performance info.

2 Likes

I have now picked up a ReSpeaker HAT for my Raspberry Pi, plus a speaker, to see how feasible it will be to get that talking to the hub. The idea is to make this something that can truly use a wake word, talk to the LLM, and then process the response. I will likely need to create a driver for this as well.

I am also thinking about switching to some other methods of processing requests that may work better with smaller LLMs. The qwen3:latest model has actually worked pretty well for me, but it is very conversation-heavy and adds a lot with its scrolling thinking responses. I am thinking that instead of depending on the LLM to extract the meaning and call the right function, I could have the LLM extract key words as structured output. This basically moves some of the logic out of the LLM processing and into Hubitat.

The idea is to have the LLM search for certain key types of words that indicate a meaning, action, or task. Think of having the LLM look at a sentence and pull out the words that indicate you want something done. For example, look for words like set, change, or adjust to determine that an action needs to take place, then look for key attributes to adjust, like temperature, switch, or speed. Once those values are extracted, you can act on them. But this can get very complicated when addressing all the possible combinations, and it can also create limitations.
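A sketch of how that might look using Ollama's structured-output support, where /api/chat accepts a JSON schema in the "format" field and the model's reply content is JSON matching it. The schema fields (action, attribute, device, value) are my invented example, not the app's actual ones:

```python
import json

# Hypothetical command schema for Ollama's /api/chat "format" field.
COMMAND_SCHEMA = {
    "type": "object",
    "properties": {
        "action":    {"type": "string", "enum": ["set", "lookup", "none"]},
        "attribute": {"type": "string",
                      "enum": ["switch", "level", "temperature", "speed"]},
        "device":    {"type": "string"},
        "value":     {"type": "string"},
    },
    "required": ["action", "attribute", "device"],
}

def structured_request(model, user_text):
    """Ask the model to answer as JSON matching COMMAND_SCHEMA."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "format": COMMAND_SCHEMA,
        "stream": False,
    }

def parse_command(message_content):
    """The reply's message content is a JSON string matching the schema."""
    return json.loads(message_content)

# What a compliant reply content might look like:
cmd = parse_command(
    '{"action": "set", "attribute": "switch", '
    '"device": "Lyra Lamp", "value": "off"}')
print(cmd["action"], cmd["device"], cmd["value"])  # → set Lyra Lamp off
```

The appeal is that Hubitat only needs to branch on a few fixed fields, so even a model without tool support could in principle drive it, at the cost of handling the combinations yourself.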

This change could yield a large performance improvement, as I believe it would allow much simpler, smaller models to be used. I will need to find a way to test this out.

2 Likes

Thanks for taking on this project! I’m really interested in seeing how this pans out. I’d love to help in any way I can. I’ve been playing with LLMs for a few years now locally, specifically using Ollama to serve them up in my environment. I have a couple of Mac Minis that I use to run ollama on my network. The 64GB M4 pro is very capable when running 32GB models, and even my M2 16GB works great with 8GB models.

I like your approach at breaking your efforts into small chunks, getting them working, and moving forward. Being able to do a simple chat to start is a big win in my book.

That said, I’ve been trying to get your integration working on my hub (old school C5, latest firmware), and I’ve been having a few issues. Maybe you could point me in the right direction to get chatting with ollama working.

Here’s the config:

Device status:

Ollama running on Mac Mini default port:

Logs:

The successful message in the logs happened when I hit DONE in the app. The groovy error happens when trying to chat or start a new conversation.

Let me know what you think, and thanks!

The first thing that stands out to me is that you are using gemma3n:e4b for your model. I don't think that model supports tools, and that may be what is causing the error.

Try downloading qwen3:4b if you want to stay small, or qwen3:latest. There are some others that support tools, but I haven't had as good an experience with them.

If my thoughts in my previous post about structured output work out, models that don't support tools may become usable. But in my time working on it today it wasn't super promising; not because it didn't work, but because it wasn't consistent with the smaller models I was testing with. It may not be an issue with a larger model.

Tools enable function calls with params, and that is very important at the moment. Basically, the LLM passes the function with params back to Hubitat, which extracts it from the response message and calls the appropriate routine with those params to act on it.
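Conceptually, that extract-and-dispatch step looks something like the sketch below. The handler names (`set_switch`, `set_level`) are hypothetical stand-ins for the app's real routines:

```python
# Sketch of routing a returned tool call to a local handler.
HANDLERS = {}

def handler(name):
    """Decorator registering a function under the tool name the LLM emits."""
    def register(fn):
        HANDLERS[name] = fn
        return fn
    return register

@handler("set_switch")
def set_switch(state, device, room=None):
    return f"{device}: switch -> {state}"

@handler("set_level")
def set_level(level, device, room=None):
    return f"{device}: level -> {level}"

def dispatch(tool_call):
    """Look up the named handler and call it with the LLM's arguments."""
    fn = HANDLERS.get(tool_call["function"]["name"])
    if fn is None:
        return "unsupported function"
    return fn(**tool_call["function"]["arguments"])

print(dispatch({"function": {"name": "set_switch",
                             "arguments": {"state": "on",
                                           "device": "Toilet Light"}}}))
# → Toilet Light: switch -> on
```

A registry like this also makes the "unsupported function" case explicit, which matters when a model hallucinates a tool name that was never offered.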

I have made a few tweaks to the code as I was testing, plus some enhancements to start cleaning up a few things that were rough. I will try to publish sometime in the morning.

It will include some enhancements for the other method I was thinking about, with structured output fields instead of functions. That will potentially allow the use of the model you were looking at as well. It may not be as functional for making changes right now, as I haven't put the time in on that yet.

Tomorrow my ReSpeaker Pi HAT will arrive, so I should be able to start testing with it soon too.

3 Likes

I’m looking forward to checking out your updated version.

As you suggested, the groovy errors were in fact due to the model I was using. I pulled and used the qwen3:8b model, and it appears to work as expected, at least looking at the logs and status messages. The problem is that the device isn’t actually controlled. I told it to “turn on toilet light” and it said that it did, but the toilet light doesn’t actually turn on. Not sure if there’s something I’m doing wrong, but it doesn’t appear to be erroring out now.

Device chat:

Toilet light status:

Logs: