Reliability issues and firmware updates

The change was to a specific element in the web UI and how its data was being updated in the background when a given browser session was left running; nothing app- or driver-related.

Like many, I have experienced slowness in the past that I'm still working through. I was able to identify quite a few different contributors over time. Now, some of these may have been collateral damage from other root causes, but changing or eliminating them helped in my situation. Most recently, I've reconfigured power reporting on all of my supporting devices. I have a few RM rules that depend on power levels for various automations. Unfortunately, some of these devices were very chatty, causing rules to fire needlessly at an excessive rate.

Items that, for me, seemed to have the biggest impact on performance were:

  • Device Polling - Have a few older GE switches in key locations that required polling to ensure current state. Moved these switches into less-critical locations and changed to utilize an RM rule with a much lower frequency of polling.
  • InfluxDB - Disabling this, for me, improved the length of time between necessary reboots. Although now I believe that this was just as likely tied to either performance issues on the DB host and/or my chatty devices hammering the process. I may end up adding this one again to try to mitigate some NodeRed concerns.
  • NodeRed - High frequency of calls for list of devices with capabilities. Due to the number of devices that I'm currently using, this single call was increasing App screen response times by 3-4 seconds. Have reduced the frequency significantly and will likely either pick specific devices to query or try utilizing the InfluxDB integration again for these devices.
  • Chromecast Integration - Honestly, this one was yanked at the same time as Device Polling. I just didn't need the integration.
  • Chatty Devices - As mentioned above, focus on chatty devices has helped significantly. I don't believe that the chatter was significant enough to cause issues with my meshes, just what I was doing with the information. Chatty devices and DB/Rule integration, for me, appeared to negatively impact performance.
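To illustrate the chatty-device point, here is a minimal sketch of the kind of filtering that taming power reports amounts to: only pass a report along when the value has changed meaningfully or when enough time has passed. This is not Hubitat or Rule Machine code; the class name, the 5 W threshold, and the 30 s interval are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PowerReportFilter:
    """Hypothetical filter that drops power reports which are neither
    a meaningful change nor overdue as a periodic heartbeat."""
    min_delta_watts: float = 5.0   # assumed threshold, tune per device
    min_interval_s: float = 30.0   # assumed heartbeat spacing
    _last_value: Optional[float] = None
    _last_time: Optional[float] = None

    def should_forward(self, watts: float, now: float) -> bool:
        # Always forward the very first report.
        if self._last_value is None or self._last_time is None:
            self._last_value, self._last_time = watts, now
            return True
        changed = abs(watts - self._last_value) >= self.min_delta_watts
        overdue = (now - self._last_time) >= self.min_interval_s
        if changed or overdue:
            self._last_value, self._last_time = watts, now
            return True
        return False  # small fluctuation, too soon: do not fire the rule
```

With a filter like this in front of a rule, a device that reports 100 W, 101 W, 100.5 W every second would trigger the rule once up front and then only on the heartbeat, instead of on every report.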

I have also had issues with horribly written rules, but those tend to manifest themselves quite quickly when they occur. I'm currently at a point where performance is good and will see how long I will run without requiring any reboot of the system. Doing far more now with HE than I had ever done with my previous system.

I totally get the frustration. And I will say that you are correct on the presence detection of the Hubitat phone app. It's bad. It simply doesn't work for many users. That's why I spent a lot of time developing alternatives with my iPhone WiFi Presence driver and Combine Presence app.

In my case it may not reach 30'' because I reboot the hub before then, but it gets near that for sure.

Have to disagree here :wink: For me it's the best GPS presence detection app I have ever used, by a country mile (after teething issues when it first came out).

Different platforms have different ways of dealing with faulty devices. I cannot speak for Wink, as I never used it. But I can say with absolute certainty, based on months of work I did with the SmartThings team on Schlage locks, that the way the ST hub deals with bad devices is to reboot the Z-Wave radio. Yep. When a message isn't received and ack'd within a period of time, the ST hub reboots the Z-Wave module. Of course, that has benefits, as it brings up a refreshed Z-Wave stack, but it also causes any messages/commands sent around that time to be lost as collateral damage.

I don't think the HE hub reboots the Z-Wave radio when a device fails to respond. This may explain why the Schlage locks, especially older firmware models, do not work as well on Hubitat. But that's purely speculation on my part.

I am curious as to how you confirmed this, especially for Z-Wave. I don't think anyone on this forum, except for HE staff, has access to those kinds of tools. I wish I did, but cannot justify the cost.

If devices are dropping off the mesh, then one cannot realistically expect rules to work 100% of the time. It's my opinion, but I suggest stepping back and objectively trying to determine exactly where the mesh issues are. For example, I see that you have some Z-Wave repeaters. I created a driver that has some built-in diagnostics that can help you see if the repeaters are working. It says it's for Iris, but it'll pretty much work with any Z-Wave Plus repeater. It has a frame test that you can run repeatedly that will report any packet loss between the hub and the device.

That might be a place to start.
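For anyone unsure how to read frame-test numbers, here is a small sketch of the arithmetic involved. It does not reproduce the driver's actual output or API; the function names and the (sent, acked) tuple format are assumptions made up for this example.

```python
from typing import Iterable, Tuple


def packet_loss_percent(frames_sent: int, frames_acked: int) -> float:
    """Percentage of frames that were sent but never acknowledged."""
    if frames_sent <= 0:
        raise ValueError("frames_sent must be positive")
    if not 0 <= frames_acked <= frames_sent:
        raise ValueError("frames_acked must be between 0 and frames_sent")
    return 100.0 * (frames_sent - frames_acked) / frames_sent


def overall_loss_percent(runs: Iterable[Tuple[int, int]]) -> float:
    """Combine several test runs, each given as (frames_sent, frames_acked)."""
    runs = list(runs)
    total_sent = sum(sent for sent, _ in runs)
    total_acked = sum(acked for _, acked in runs)
    return packet_loss_percent(total_sent, total_acked)
```

Running the test several times and combining the runs is more telling than a single pass, since a repeater that only drops frames intermittently can look perfect on any one run.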

See that's what's so crazy! :wink: For me, it is only 20% reliable. It regularly stops reporting arrival/departure for weeks at a time. And I'm actually an app developer. I've created several iPhone apps with GPS tracking, so I know 100% that it's not the permissions/settings on my phone.

Would leaving browsers open while not using the system cause slowdowns? I appear to have fewer problems when I close any tab related to my HE web UI.

@brianwilson: Strangely enough, I have noticed this also.
However, it's just an anecdotal story, unless I can reproduce it, which I can't.

Considering while the browser is open it's constantly refreshing data from the hub AND there's a constant data flow of checking for updates... I would say yes closing browser windows/tabs would reduce the resource use of the hub.

I am currently running a scheduled hub reboot daily, which seemed to help a little with hub unresponsiveness, but I'm not sure if it's done anything for my locks. I don't have any of the expensive Z-Wave networking tools, but I have blanketed my house with repeaters and can confirm with the Aeotec Z-Wave Plus repeaters that they are getting a good signal, based on the signal indicator light. I don't have locks dropping off the mesh, so that's a separate issue, but I do have a Hue outdoor motion sensor that has been my main problem with disconnecting.

I'm having great success combining Hubitat presence, Geofency and Apple HomeKit (via Maker API) using your Combined Presence. It hasn't goofed on me or my wife since running Combined Presence.

The Z-Wave toolbox goes on sale every now and again, and it's not that much for what it provides if you have a lot of Z-Wave. A really cheap solution, though, is to use the SiliconLabs PC Controller tool with a USB stick, which can then show HUGE amounts of information and signal info about your network. The same can be done with Z-Way in an easier-to-use format. The SiLabs PC Controller is part of their Z-Wave Embedded SDK so.

If it works well for another platform and not HE, it's clearly not the device.

While hard to measure, some due diligence goes a long way in discovering the proper mix of repeaters. There are ways to test the mesh using a Z-Wave stick. It's painful to unpair all the devices and re-pair them to run the tests, but it is possible.

But by and large, if it works well for another platform and not HE, it's clearly not the mesh.

Yup, I'm sure. Per my previous posts, I can see exactly where the delay is when it occurs, thanks to prodigious logging and lots and lots of time spent troubleshooting. Not to mention the sluggish UI and the Zigbee radio dropping offline as the final symptom of an overworked hub before a crash.

If it works well for another platform and not HE, it's clearly not the device.

Well, as @srwhite said, ST's way of dealing with buggy z-wave locks was to reboot their radio and lose other messages. So "works well" might be a subjective call. I had ST before Hubitat, and while I didn't know why it was happening, I did notice that every time I triggered my z-wave locks, all my other devices would stop responding for a bit. Now I know why. It didn't work well for me.

These things are why I don't discount your experience. That is definitely not right, and it can't be explained by mesh or smart device wonkiness. That said, it could still be something that isn't fundamental to the HE firmware. It could be a Rule Machine rule that calls itself recursively, for example. I have no idea what rules you have set up, and I'm not saying you made a recursive rule. But this is the kind of thing the Hubitat support team is trying to filter through. They've given us all a platform with enough extensibility, with enough rope for us to hang ourselves.

(Though that does give me an idea for a feature request for @bravenel: Can Rule Machine do some checks to prevent calling yourself recursively? Or at least limit the number of times you can call yourself recursively? Might be a good set of guardrails for users who aren't software professionals and don't regularly have to think about something like "recursive" and stack overflows.)
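To make the guardrail idea concrete, here is a toy sketch of a depth limit on rules triggering other rules. This is purely hypothetical and says nothing about how Rule Machine is actually implemented; `RuleEngine`, `RuleDepthExceeded`, and the default limit of 3 are invented for the example.

```python
from typing import Callable, Dict


class RuleDepthExceeded(RuntimeError):
    """Raised when rules trigger each other more deeply than allowed."""


class RuleEngine:
    """Hypothetical engine that caps nested rule-to-rule triggers."""

    def __init__(self, max_depth: int = 3) -> None:
        self.max_depth = max_depth
        self._depth = 0
        self._rules: Dict[str, Callable[["RuleEngine"], None]] = {}

    def register(self, name: str, action: Callable[["RuleEngine"], None]) -> None:
        self._rules[name] = action

    def trigger(self, name: str) -> None:
        # Refuse to go deeper once the nesting limit is reached.
        if self._depth >= self.max_depth:
            raise RuleDepthExceeded(
                f"rule '{name}' not run: nesting limit {self.max_depth} reached"
            )
        self._depth += 1
        try:
            self._rules[name](self)
        finally:
            # Always unwind the counter, even if the rule raised.
            self._depth -= 1
```

If rule A's action triggers rule B and rule B's action triggers rule A, the chain stops with a clear error after `max_depth` nested runs instead of looping until something falls over.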

How does a rule call itself recursively?

It's very simple to find problems of a related nature: turn on logging for the rule. A run-away rule will have run-away logs.
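As a rough illustration of spotting run-away logs, here is a sketch that counts log entries per rule in fixed time buckets and flags any rule that exceeds a limit. The `(timestamp_seconds, rule_name)` input format is an assumption for the example and is not Hubitat's log format; the window size and limit are arbitrary.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def find_runaway_rules(
    log_entries: Iterable[Tuple[float, str]],
    window_s: float = 60.0,
    max_per_window: int = 20,
) -> List[str]:
    """Return rule names that logged more than max_per_window entries
    within any single fixed window of window_s seconds."""
    buckets: Counter = Counter()
    for timestamp_s, rule_name in log_entries:
        # Fixed buckets (not a sliding window) keep the sketch simple.
        buckets[(rule_name, int(timestamp_s // window_s))] += 1
    flagged = {rule for (rule, _), count in buckets.items() if count > max_per_window}
    return sorted(flagged)
```

A rule that normally logs a handful of lines per minute and suddenly logs hundreds stands out immediately with a check like this, which is the same signal you would see by eye in the logs page.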

Isn't there an action to trigger another rule? I've never tried it, but I assume a rule could trigger itself?

No, it can't run itself. But there are plenty of ways one could do crazy stuff -- by definition. Can't prevent stupidity.

"Science finally proves you can't fix stupid"

If you look around the room at the gaggle of relatives and strangers occupying the space and cannot immediately identify who the stupid one is, it's a good bet the stupid one is you.

This is not only experientially accurate as a joke, it's literally true as scientific fact.

Meet Cornell University psychologist David Dunning, our new most totally favorite scientist. He's spent 15 years studying stupid people.
