Reliability issues and firmware updates

Who knows, maybe that secret society exists and you and I are just not part of it :wink:

If we can't have SSH, perhaps allow us to offload rule processing to a virtual image, a Pi, or something else that scales better than requiring us to buy more hubs. I'd gladly pay for a virtual license that I could run on my own hardware, upgrading resources as my automation requirements grow. Or give some power users temporary SSH access to help work out the reboot issues, since Hubitat is struggling to do it alone.

I know that this was brought up before and it was a clear no, as this is not part of their strategy. I would do it too, but I can also understand it from Hubitat's perspective. And I also don’t want to end up in another Homeseer licensing world.....

Hi there, sorry you are having such a hard time. I think that your hub performing better after going back to Rule Machine 3.0 is just a coincidence. Since you are the first to report this, we would like to investigate the root cause of your slowdowns further. While we are aware of some hubs slowing down over time, many of the reports we received have been resolved by addressing other issues, such as mesh problems. If you haven't done so already, please send us an email at support@hubitat.com.

I no longer have slowdowns after switching back to Rule Machine 3.0. I'm not planning on rewriting all the rules back to 4.0 so that it can be investigated. Maybe it's a coincidence, but the issue was resolved by this change. Appreciate the offer though.

Glad to hear that you no longer have problems. If you experience any issues, please don't hesitate to reach out.

And rebooting regularly, no?

I'm no longer rebooting at all (previously daily). Not on any of the 3 of my 4 hubs that are running. The fourth hub isn't operational at the moment.

My hubs are on 2.1.7.127

There was one change made in 2.1.8.117 that may resolve one cause of hub slowdowns. We don't have enough information to be more definitive. One hub that we have monitored closely was slowing down after a couple of days, evidently from a resource leak. That hub has now run without slowdowns since that update, without reboots. We are still exploring this issue. The problems that have been reported cover a wide range of possible causes.

@bravenel,

What can we do to assist you? I am sure you need more information for these cases, and just hearing "it is still happening" won't help much. It is good to hear that something was done in 2.1.8.117, but I can say there is probably more. Here is my C5 hub "Loading Apps page" latency, and that hub has 2.1.8.117 installed.


I am more than happy to assist with anything you need. I'd even give you unrestricted access to my hub if needed. There is really nothing running on there that is confidential.

Can you tell us what app/driver/etc the 2.1.8.117 change was in? Some people might try disabling/removing whatever it is and hopefully give you more data points. Thanks!

My hubs have been better since the update I think.. The last root'n toot'n reboot on my main hub was 4 days ago or so. :crossed_fingers:

For optimal processing @Ryan780's idea of customized apps vs the traditional apps sounds compelling... I may have to play around with that.

As I said, there is no conclusive answer yet as to the cause of hub slowdowns. We continue to investigate.

I get that, and I didn't mean to imply impatience. I was more wondering if there is anything you want us to do, as in data points, etc.

I agree 100% about support. I started a dedicated thread to discuss other people's support experience and I think there's a big percentage of Hubitat users that would agree support is lacking and that one person handling support isn't sufficient for a company trying to be a legit competitor in the smart hub market.

I am so frustrated with my hub experience and I'm about ready to pull the plug and try something different. I came from Wink, which was 100x more reliable, and I moved on because of what's going on with the company and all the uncertainty about its future. At this point I wonder if I made the right decision, and I definitely feel like I should've gone with another solution.

I haven't changed anything on my hub in over a month except for the recent firmware updates, and now I've got rules that flat out don't work. I can't seem to figure out why, and I can't get any assistance from support.

I’ll put down 1 vote for my hub being truly reliable. I believe the people who report reliability problems, and it sounds like there are multiple causes, but I don’t think it’s affecting the majority of hubs.

My recommendation for anyone trying to debug their own system is to figure out where the reliability issue actually is:

  • device reliability. There are some truly bad devices out there. Some straight up have buggy firmware.

  • mesh reliability. Z-Wave is hard to get right, especially with battery-powered devices.

  • code / automation reliability. Groovy and Rule Machine are powerful enough to let you crash your own hub. The vast majority of issues I’ve had have been in my own code and rules.

  • hub reliability. The hub itself. When I hear that “the rule didn’t run”, my next question is are you sure? Did you have logging on, see that the trigger occurred, and it didn’t run at all? Or did the rule run, but the action never got through your mesh to your device?

Again, not discounting anyone’s issues. Hopefully this gives you some places to start debugging.
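To make the "are you sure?" question from the last bullet concrete, here is a rough Python sketch of that line of reasoning. The `LogEntry` record and the `diagnose` helper are made up for illustration; this is not the hub's actual log format, so you would need to copy the relevant entries in by hand or adapt the fields.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    # Hypothetical log record; field names are assumptions, not a real export format.
    timestamp: float  # seconds
    source: str       # "device" or "rule"
    name: str         # device or rule name
    message: str


def diagnose(entries, trigger_device, rule_name, target_device, window=10.0):
    """Guess which layer failed, based on what did and did not get logged."""
    # 1. Did the triggering device report anything at all?
    trigger = next(
        (e for e in entries if e.source == "device" and e.name == trigger_device),
        None,
    )
    if trigger is None:
        return "trigger never reported: suspect the device or the mesh"

    def within(entry):
        # Only count entries logged shortly after the trigger.
        return trigger.timestamp <= entry.timestamp <= trigger.timestamp + window

    # 2. Did the rule log anything after the trigger?
    if not any(e.source == "rule" and e.name == rule_name and within(e) for e in entries):
        return "trigger seen but rule did not run: suspect the rule or the hub"

    # 3. Did the target device report a state change afterwards?
    if not any(e.source == "device" and e.name == target_device and within(e) for e in entries):
        return "rule ran but target never responded: suspect the mesh or the target device"

    return "trigger, rule and target all logged: automation worked"
```

For example, if the motion sensor and the rule both show up in the logs but the lamp never reports a change, the function points at the mesh or the lamp rather than the hub. It only works if logging was enabled for the rule and both devices at the time.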

Thanks for the recommendations. Unfortunately I have tried debugging on my own and still have issues.

  • device reliability. There are some truly bad devices out there. Some straight up have buggy firmware. The Schlage lock issue is constantly blamed on the lock firmware, but I never had a single issue using the same locks with Wink, so I find it hard to believe that it's a firmware issue with the locks.
  • mesh reliability. Z-Wave is hard to get right, especially with battery-powered devices. I initially thought this was the root of all the issues I had when I first migrated to HE, but I've since spent a lot of money on Z-Wave Plus and Zigbee repeaters and can confirm I am getting good mesh signals throughout my network.
  • code / automation reliability. Groovy and Rule Machine are powerful enough to let you crash your own hub. The vast majority of issues I’ve had have been in my own code and rules. Honestly, I would love to add more complex rules, but until I can get the basics working I'm not going to complicate things. I've actually pared down my automation in an attempt to pinpoint the culprit(s).
  • hub reliability. The hub itself. When I hear that “the rule didn’t run”, my next question is are you sure? Did you have logging on, see that the trigger occurred, and it didn’t run at all? Or did the rule run, but the action never got through your mesh to your device? I have dug through logs and had HE Support do the same. My primary issues have been with the Schlage locks, proximity rules, devices dropping offline when they should be in range, and rules for simple lighting flat out not working. I would love to see support play a more active role in determining the root cause vs. blaming bad firmware, mesh, etc. I really wish I could post something more positive. I'm just frustrated from spending so much time trying to figure things out, and as soon as I think things are stable, something stops working when no changes were made.

How? There aren't many ways to prove this, and it also doesn't rule out the possibility that you have been unlucky and have a dodgy device playing havoc.

Per mesh, how many battery-powered and mains-powered devices do you have, and what are they?

If you show the community your rules, maybe there is something where you have just gone down the wrong path; we are all here to help with that. As you say, there are only a few HE staff right now, so all of us pulling together as a community helps with that.

Z-Wave? Z-Wave locks are just plain bad unless you have a very strong, reliable network. Z-Wave was just not built for this type of stuff; it has been added as it's gone on. The difference is that Zigbee was built with security in mind from the ground up, so no "extra" message load is needed. Don't just take my word for it: there are loads of people who have come to HE from other platforms that slowed things down to get around stuff like this, and realised that their setup wasn't as robust as the previous system made it out to be.

Our houses are smaller in the UK, so maybe that's why we see fewer issues; I am also able to mains-power most of my devices. Some people mention Z-Wave and Z-Wave Plus mixes as a problem, but mine are about a 50/50 split. That may be because I have so many that it's not an issue and they just choose to use the Z-Wave Plus ones. The issue with setups like mesh and wireless is that there is no one good answer that will fix everything; you just need to find yours. So be patient, ask for help in all areas, and I'm sure by the end of it you'll be smiling :smile:

The change was to a specific element in the web UI and how its data was being updated in the background when a given browser session was left running. It was not related to any app or driver.

Like many, I have experienced slowness in the past that I'm still working through. I was able to identify quite a few different contributors over time. Now, some of these may have been collateral damage from other root causes, but changing or eliminating them helped in my situation. Most recently, I've reconfigured power reporting on all of my devices that support it. I have a few RM rules that depend on power levels for various automations. Unfortunately, some of these devices were very chatty, causing rules to fire needlessly at an excessive rate.
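The idea behind taming chatty power reports is roughly a deadband plus a minimum interval: ignore a reading unless it moved enough and enough time has passed. On real devices this is usually set through the driver's reporting preferences rather than in code, so the Python below is only an illustration of the logic. The class name and the threshold values are my own assumptions, not anything from a particular driver.

```python
class PowerReportFilter:
    """Forward a power reading only if it changed enough AND enough time has passed."""

    def __init__(self, min_delta_watts=5.0, min_interval_s=30.0):
        # Both thresholds are illustrative defaults, not recommended values.
        self.min_delta_watts = min_delta_watts
        self.min_interval_s = min_interval_s
        self._last_value = None
        self._last_time = None

    def should_forward(self, watts, now):
        """Return True if this reading should be allowed to trigger rules."""
        if self._last_value is None:
            accept = True  # always pass the very first reading
        else:
            changed = abs(watts - self._last_value) >= self.min_delta_watts
            waited = (now - self._last_time) >= self.min_interval_s
            accept = changed and waited
        if accept:
            # Remember only forwarded readings, so slow drift is still caught.
            self._last_value = watts
            self._last_time = now
        return accept
```

With the defaults above, a plug that reports 100 W, then 101 W ten seconds later, then 110 W at forty seconds would forward only the first and last readings, so a rule keyed on power would fire twice instead of three times.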

Items that, for me, seemed to have the biggest impact on performance were:

  • Device Polling - Have a few older GE switches in key locations that required polling to ensure current state. Moved these switches into less-critical locations and changed to utilize an RM rule with a much lower frequency of polling.
  • InfluxDB - Disabling this, for me, improved the length of time between necessary reboots. Although now I believe that this was just as likely tied to either performance issues on the DB host and/or my chatty devices hammering the process. I may end up adding this one again to try to mitigate some NodeRed concerns.
  • NodeRed - High frequency of calls for list of devices with capabilities. Due to the number of devices that I'm currently using, this single call was increasing App screen response times by 3-4 seconds. Have reduced the frequency significantly and will likely either pick specific devices to query or try utilizing the InfluxDB integration again for these devices.
  • Chromecast Integration - Honestly, this one was yanked at the same time as Device Polling. I just didn't need the integration.
  • Chatty Devices - As mentioned above, focus on chatty devices has helped significantly. I don't believe that the chatter was significant enough to cause issues with my meshes, just what I was doing with the information. Chatty devices and DB/Rule integration, for me, appeared to negatively impact performance.
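If you want to find your own chatty devices, one simple approach is to count events per device over a known time span and rank them. Here is a minimal Python sketch, assuming you have somehow collected events as `(device_name, event_name)` pairs; that input format and the `chattiest` function are hypothetical, not something the hub produces directly.

```python
from collections import Counter


def chattiest(events, hours, top=5):
    """Rank devices by events per hour.

    events: iterable of (device_name, event_name) tuples (assumed format).
    hours:  length of the period the events were collected over.
    """
    counts = Counter(device for device, _event in events)
    # most_common() returns devices sorted by descending event count.
    return [(device, count / hours) for device, count in counts.most_common(top)]
```

Anything near the top of that list with a rate far above the rest is a good candidate for a lower reporting frequency, or for being excluded from database and rule integrations.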

I have also had issues with horribly written rules, but those tend to manifest themselves quite quickly when they occur. I'm currently at a point where performance is good and will see how long I will run without requiring any reboot of the system. Doing far more now with HE than I had ever done with my previous system.
