Another hub lockup


#41

Mine is sitting vertically next to a 120 mm fan blowing on it so I am not sure if heat is the issue here.


#42

Just to add, my hub started acting really strange yesterday as well. One of my water sensors tripped in HSM and HSM started sending TONS of Pushover messages about it. My thought is that it was queuing up messages in the background until the hub finally caved in. Even after clearing the HSM alert, I was still getting Pushover messages about it. What's really odd is that in Devices, the water sensor was showing as dry, yet HSM was still sending messages about it. That's when my response times for automations also started nose-diving. It took 2 reboots of the hub to finally clear out all the messages and get my automations to start working correctly.

So, I wouldn't say the hub was locked up, but there is definitely something strange occurring in the software.


#43

What you describe is very similar to what my hub did yesterday. It didn't hard lock: it still had the blue light, and I was eventually able to log in. But automations/rules were not running, and it took ~5 minutes and many attempts to log in to the hub. Like something was really bogging it down.

One reboot cleared it up for me.


#44

Finally got an answer from support.
It was a generic answer that said my hub problem has been identified as being caused by custom code. Nothing about which code is causing it, which I would think they should be able to tell me if they have identified it!
This has only just started happening so the offending code, whatever it is, is only now causing issues! Really!
Also, I had a couple of questions in my ticket, none of which have been answered. Sorry, HE team, but not good enough, I'm afraid.
I have been a great advocate of HE but my opinion is slowly changing.


#45

Yup. While I don't expect them to SUPPORT custom code, I definitely expect them to provide enough tools/diagnostics to figure out what code is causing the issue.

How do they even know it was custom code if they can't even tell you what specifically caused the hub to lock up? They can't - they are guessing that because the stock hub has been working, it MUST be custom code. That is kind of BS in my opinion. Or, if they "know" it is custom code causing the issue - how do they "know" this, and why can't they provide some breadcrumbs to help us narrow things down?

This is just reinforcing to me that Hubitat isn't really quite ready for general use. At least not my general use, I guess. I do think it could be someday, but it is simply too fragile right now. If the entire hub can be taken down at random times for random/unknown reasons, with no way to figure it out... Well, that isn't a product I can rely on.

I'm not bailing out yet, though. I am still going to try moving as much of my custom code over to my development hub as I can first. There are some drivers that I simply have to use user code for - so those will have to stay in place.


#46

Did you remove all custom code and prove this problem still exists? If not, Hubitat cannot be responsible for, nor allocate its limited resources to, troubleshooting ANY possible combination of "custom" code that they didn't create and that may or may not be causing issues.

If you have not, then in actuality you are guessing that they are guessing, except that they apparently haven't seen any issues with hubs without custom code, only with hubs that have it, and that is what their process-of-elimination diagnosis is based on.

Could there be better diagnostic tools included? Sure, and maybe there will be someday, and by all means advocate for them. But complaints to Hubitat about issues that they state are not related to something Hubitat created (custom code causing problems) WILL RESULT in Hubitat eventually eliminating the "feature" of being able to add custom code at all, to eliminate the complaints, as limited resources allow. A platform CANNOT provide diagnostics for every conceivable user-created function; that will never exist.

A better avenue (IMO) would be to do as they directed and remove the code... If problems still exist with everything being stock/native (AKA something THEY designed/created), then by all means complain away. ws


#47

I do not write my own code; I use what people publish.
If I cannot use non-generic code I may as well go back to ST.
For example, I am using non-generic code for my Fibaro FGS 222. In my ticket I asked what generic code I should use. No answer.
I asked if code is being developed.
No answer.
Someone even suggested connecting these devices to ST and using the non-generic hub link code. HE have not said don't use it, but it IS custom code.
So if I use it and my problems persist, do I delete it?
To me the whole concept of not using custom code is a non-starter, as HE have not developed all the possible instances of generic code yet.


#48

I had similar issues and it was custom code. I fixed the driver (I'm not a developer by any means and it took me a while) and that improved things. I also removed the Chromecast integration (I believe this was one of my issues on the random side of the lockups; no proof, just a hunch. I may add it back now :thinking: to see). I also side with @waynespringer79: it's not their issue, and with the amount of different code out there it's just not possible for them to fix. Also, don't delete the app or driver; they added a great feature that lets us disable the potentially rogue code! What more do you need? For now that's a good start, and I'm sure in time there will be other things they will give us.
By the way I can't remember the last lockup I had :thinking: .


#49

That is always an option, but you misunderstand the "feature" part of being able to add custom code. It is in no way a "guarantee" of compatibility but rather a "possibility" of compatibility, and that "possibility" does not exist on many other platforms.

Is this item listed as a "compatible device" on Hubitat's stated compatible devices page?
If the answer is Yes, then use the stated recommended driver, and if it doesn't work, then Hubitat has devoted their resources to saying this device IS supported and should troubleshoot accordingly.

If the answer is No (which I assume is the case, since you are using custom code), then you use anything "at your own risk" to "possibly" see if it works. Hubitat in no way (that I've seen) guarantees that devices NOT listed on the compatible devices page work.

Correct, but they have said on multiple occasions (webcore) that "IF" you have problems, remove the custom code to troubleshoot, as they cannot guarantee compatibility.

This should be your premise from the beginning, as you will never find any platform that guarantees compatibility with endless possibilities of custom user code.


#50

I had a different issue over the weekend. Hub has been stable and fast for a long time. The web UI remained quick to load pages, but when I opened any device or app it would take up to about 20 sec to open the page. I did a shutdown and power cycle and it went back to normal.

I'm running the Node-RED performance monitor and it showed no issue, because it's based on how long it takes to load the device list, which was probably still working normally and fast. Here's my chart; it seems there are a couple of hit events during the maintenance window.

C4 hub mounted vertical.
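Since a monitor that only times the device list missed this slowdown, one option would be to time several kinds of pages and flag any that are slow. Here is a rough Python sketch of that idea. The page paths, the `probe` and `slow_pages` names, and the 5 second threshold are all my own assumptions for illustration, not anything confirmed for the hub or the Node-RED monitor.

```python
import time
import urllib.request

# Hypothetical page paths, for illustration only; adjust to real hub URLs.
PAGES = ["/device/list", "/device/edit/1", "/installedapp/list"]


def http_fetch(url, timeout=30):
    """Fetch a URL and discard the body; raises on network errors."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()


def probe(base_url, pages, fetch=http_fetch, clock=time.monotonic):
    """Return {page: seconds taken, or None if the fetch raised}."""
    timings = {}
    for page in pages:
        start = clock()
        try:
            fetch(base_url + page)
            timings[page] = clock() - start
        except Exception:
            # A failed or timed-out page is recorded as None.
            timings[page] = None
    return timings


def slow_pages(timings, threshold_s=5.0):
    """Pages that failed or took longer than threshold_s seconds."""
    return sorted(p for p, t in timings.items() if t is None or t > threshold_s)
```

Something like `slow_pages(probe("http://<hub-ip>", PAGES))` run on a schedule would have caught a 20 second device page even while the list page stayed fast.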


#51

If you buy a new vehicle, any modification to the engine voids the factory warranty, and a dealership will NOT work on engine problems with modifications "under warranty"... Everyone should treat custom code apps, and the "support" Hubitat provides when using them, the same way.


#52

HE should silo the user code to prevent it from taking down the hub.


#53

Sorry to hear about your hub lockup issue. I too have experienced a few lockups, but only with my original hub (which is also the busiest of the three, and runs plenty of my custom apps and drivers.) I have been trying to isolate the cause of these events for some time now, although it's been a while since my last lockup. With the recent firmware releases, my hubs have been getting restarted which I do believe helps to prevent the system from reaching the critical level where things go really wrong. I am not advocating for periodic restarts of your hub, as I agree that these devices should just run for weeks/months/years without ever requiring a reboot.

I have been trying to approach this problem as scientifically as possible. One of my hypotheses is that LAN based network traffic somehow exhausts a resource over time. My two big network traffic producers are my custom HubDuino systems and InfluxDB logger. I have moved one of my two HubDuino devices to a second HE hub, and along with that went the corresponding InfluxDB logging. So now, both hubs are still running one HubDuino LAN-connected device as well as a copy of InfluxDB Logger.

So far, it 'feels' like I have reduced the occurrence of hub lockups, but new platform firmware has been released as well over the past month or two. So, right now I have at least 2-3 variables that are changing while I am trying to troubleshoot this issue. Firmware changes, hub reboots, my system changes, etc... This makes it difficult to run an experiment like this.

I agree that it is frustrating. I do believe the Hubitat Team is aware of these issues and would love to get to the bottom of them as much as we do. I have faith that eventually a solution will be found. I would still rather restart my Hubitat hub about once every 1-2 weeks versus go back to SmartThings. But that is just me and my use-case for home automation.

So, what I am wondering is if you can share what type of custom Apps and Drivers you're currently using? It would be interesting to have others help to prove or disprove my network data transfer hypothesis. If you're not using the LAN for anything custom, perhaps I should be looking elsewhere? I know a lot of people are now using the custom HubConnect solution to tie multiple hubs together. That is a significant amount of network traffic. If those hubs are still healthy and not locking up, perhaps I am barking up the wrong tree?

Thanks for reading through my ramblings...:thinking:


#54

Periodic hub reboots shouldn’t be used as the solution to hanging software. Watchdog timers the same. They do alleviate the problem and get the system back to a more stable and accessible state but the root cause is still there and should be identified and fixed so that these precautions are not required. If watchdogs were obsessively used then many manufacturers and their customers might never know there is a problem. Having said that even Google Home Hub has one in the early hours ... but they would require remote access to fix.

Blaming custom apps is an easy but frequently correct diagnosis. We really need some mechanism to see memory usage and leaks for apps and their processor utilisation, over time. That’s not the whole story though: some apps use ‘os’ features that are new or untested and that might be the culprit. For example, I’m using the alpha MQTT client currently. No problems at all so far, but this is a bit more entwined than just blaming custom code.
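To show what I mean by seeing leaks over time, here is a minimal Python sketch. It assumes we could somehow get periodic (seconds, bytes) memory samples per app, which as far as I know the hub does not expose today. The function names and the threshold are hypothetical. It fits a straight line to each app's samples and flags the ones whose usage keeps climbing.

```python
def growth_rate(samples):
    """Least-squares slope of (seconds, bytes) samples, in bytes per second."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        # All samples share one timestamp, so no trend can be fitted.
        return 0.0
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t


def suspects(usage_by_app, min_rate=1.0):
    """App names growing faster than min_rate bytes/second, worst first."""
    rates = {app: growth_rate(s) for app, s in usage_by_app.items()}
    flagged = [app for app, rate in rates.items() if rate > min_rate]
    return sorted(flagged, key=lambda app: -rates[app])
```

A steady upward slope is only a hint of a leak, not proof, but it would give people a breadcrumb to start from.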

I have always worried about network usage, logging, garbage collection and disk space, but I’m not able to back this up. However, I have never experienced lockups on any of my three hubs since launch, but I only use a tiny number of radio-based devices. I have seen slowdowns. Maybe the sparse ZigBee and ZWave utilisation has spared me?

I’m increasingly loving Hubitat but I am anxious that some path exists to identify and eliminate these issues, in particular if it is a defect somewhere. HA hubs have to strive for 100% uptime, and not by reducing access to a few homegrown apps. Although I do understand why this is the easiest path.

K


#55

I think I remember someone mentioning that if you could access the top or ps command on the Linux OS running on the HE hub, all you would see is a big Java app running. I believe there is an nginx web server also, along with standard OS things like dhcpd, ntp, etc.

With all these hub lockups and slowdowns, unless I'm mistaken, the OS and SBC hardware itself are not hanging up per se, nor is the web server, as you can still reboot by hitting the URL. It's the Java app or, in my case earlier, just a particular function of the Java app; maybe the DB accesses were slowing it down.

To me it seems HE just need to implement more or better resource limits on user code execution so a bad behaving app or driver is not capable of taking down the box. If they continue to allow users to run their own custom code on their system (and I hope they do), then they really shouldn't pass the buck for instability caused by it.
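As a rough illustration of the kind of limit I mean, here is a Python sketch that gives each handler call a time budget and disables an app after repeated overruns or errors. This is only my guess at one possible approach, not how the hub works internally. The `Sandbox` class and its parameters are made up for the example. Note that a thread cannot be forcibly stopped from outside, so an overrunning handler keeps running in the background here; real isolation would need separate processes or cooperation from the runtime.

```python
import concurrent.futures


class Sandbox:
    """Runs handlers with a per-call time budget; disables repeat offenders."""

    def __init__(self, budget_s=1.0, max_strikes=3):
        self.budget_s = budget_s
        self.max_strikes = max_strikes
        self.strikes = {}
        self.disabled = set()
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

    def run(self, app_name, handler, *args):
        """Return the handler result, or None if skipped, timed out, or failed."""
        if app_name in self.disabled:
            return None
        future = self._pool.submit(handler, *args)
        try:
            return future.result(timeout=self.budget_s)
        except concurrent.futures.TimeoutError:
            # The worker thread is not killed; we only record the overrun.
            self._strike(app_name)
        except Exception:
            # An exception inside the handler also counts as a strike.
            self._strike(app_name)
        return None

    def _strike(self, app_name):
        self.strikes[app_name] = self.strikes.get(app_name, 0) + 1
        if self.strikes[app_name] >= self.max_strikes:
            self.disabled.add(app_name)
```

The useful side effect is that `strikes` tells you which app misbehaved, which is exactly the breadcrumb people in this thread are asking for.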


#56

While off-topic, you know that is 100% not true, right? Look up Magnuson-Moss Warranty Act.

I have had this discussion with dealers multiple times. Quite successfully, I might add.


#57

"Consumers may seek redress in the courts for alleged violations of the Magnuson–Moss Act. A consumer who has been injured by a supplier's noncompliance may bring an action in federal court if the amount in controversy is over $50,000 or a class action if the number of class plaintiffs is greater than 100"

How exactly is this relevant in any way? But it is telling where your beliefs are... You "voluntarily" accepted the terms and conditions, and HE did not claim to "warranty/support" a non-standard hub (one with user code). Actually, they expressly declined to support it by recommending removing the code.


#58

As a vocal user of HubConnect, and at least one of the pioneers of a multiple-hub approach, I chose that path for a number of reasons, and I can't entirely eliminate "cool" as a factor. :sunglasses:

But a giant element at the time was WebCoRE (which I never used on Hubitat) and Homebridge. Both were getting tarred with the "bad app" brush. I wanted / needed to move Homebridge off the hub I wanted Support for and I'd take my knocks on the "bad apps" I'd be picking. I got lucky I guess in that all I had been having was occasional 5-27 second 'zoned out' events (where the Hub would seem to be hung and then 'dump' everything queued up). I never ran into a locked hub. So far, never, check with me again tomorrow since I just jinxed myself. :slight_smile:

I do feel that for me, I made a good choice to split my hub into 'slices of processing' -- two for Z-radios and one for "bad apps." The recent ability to replace Homebridge (the App) with a HubConnect Remote -Client version is wonderful. No more "bad app" tar on my Homebridge need.

The only bad app I have now, is Chromecast :smiley:

My experience does make me 'upvote'


#59

That is not how that act is used / applied. Feel free to Google more if you are actually interested in the topic. It comes up ALL THE TIME for car modders and performance enthusiasts.

In any case, since you seem to just want to argue, and I think you have said everything you have to add that is relevant to the topic, feel free to not post more in this thread that I started.

Thanks.


#60

Just out of curiosity, what is the longest time you have gone without rebooting (or updating) any of your hubs?