I’ve been having an ongoing issue with two of my four hubs becoming unresponsive simultaneously that has been a continual thorn in my side for the past year or more.
I have 3 C-7s and 1 C-8. The two devices in question are the C-8 and one of the C-7s.
This issue has persisted across many platform updates. All four hubs are currently running 2.3.8.125, are connected via Ethernet to the same network, and have hub mesh enabled.
The last time this happened was about a week ago and then again today.
All hubs are running the automated reboot app, which I installed about a year ago because hubs were randomly becoming unresponsive. I first tried the option where the hub doesn’t actually call for a full restart, but had to move to the full restart option. That has made two of the hubs more reliable and perhaps made the issue less frequent, but it still persists.
I am not really sure where to begin troubleshooting. I have looked through the logs and can’t say that anything jumps out at me, but I am not sure exactly what to look for. All I know is that when one hub becomes unresponsive, the other will too, so when I access the interface via the app, neither will load until I physically pull the power cable to force a reboot (they are on USB power, not PoE). The other two hubs are invariably working when this happens.
I would appreciate any guidance to help make my hubs more reliable. One runs my heating logic and the other runs my well pump to keep our cistern full, so if I don’t notice soon after they freeze up, we get cold or run out of water, both of which it would be nice to avoid.
Thanks very much for any help you can provide.
Kindly,
Jeremy
Yeah - when their normal UI isn't responsive can you get to the hub's diagnostic tool at hubIPaddress:8081? If so, at least until you resolve this issue, use the Reboot option there to restart your hub.
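For anyone who wants to script that check, here is a minimal Python sketch. It only tests whether a TCP connection can be opened to the hub's main UI and to the diagnostic tool on port 8081; it does not log in or reboot anything. Port 80 for the main UI is my assumption, and the function names are made up for illustration.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def describe(ui_ok: bool, diag_ok: bool) -> str:
    """Turn the two probe results into a rough hint about what to do next."""
    if ui_ok and diag_ok:
        return "hub responding"
    if diag_ok:
        return "main UI down, diagnostic tool up: reboot from :8081"
    if ui_ok:
        return "main UI up, diagnostic tool not answering"
    return "nothing answering: check network path or power-cycle"
```

Usage would be something like `describe(port_open(hub_ip, 80), port_open(hub_ip, 8081))`, where `hub_ip` is your hub's address. If neither port answers, the problem may be on the network side rather than in the hub's UI process.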
Are you using the original power supply that came w/the C8?
You should probably do a reboot and use the newish reboot option to rebuild your database on the two hubs that are having problems:
Yesterday prior to writing my original post I upgraded all 4 hubs to the latest UI (2.3.8.125) and then performed a reboot on each one with the new database restore option checked. All the hubs successfully completed this procedure.
Yesterday evening a third hub that has been generally reliable became unresponsive while the house was empty (so no human interaction or network traffic was being generated). I was able to log in via 8081 and reboot, so I will use that procedure going forward until, hopefully, a resolution can be found. Thanks for the tip!
This hub is set as the main hub for notifications and so the other two hubs interact with it over hub mesh regularly.
That C-7 hub and the C-8 are running on separate USB power supplies with incorporated UPS batteries that each provide 2.1A. However, this problem predates their installation, and the second C-7 that normally cuts out along with the C-8 is running on its original power supply. So is the third C-7, which never gives trouble but is also not currently connected to any devices or running any logic. It is just reserved for future use but connected to the network so I can keep it updated with the rest.
I do use one computer with jumbo frames (MTU 9000) to do video editing on a NAS box over 10GbE. This computer (a 2019 MacBook Pro) was on but presumably sleeping yesterday when the hub became unresponsive. Can you elaborate a little more on how this use of jumbo frames could potentially be causing this issue with the hubs?
Since this seems to be now affecting all three of my active hubs it does seem likely that it is related to either a network or hub mesh issue.
I did have to change the network configuration on some of the hubs a while back because updates were downloading very slowly and would often fail with a timeout. I think it was a change to the auto-negotiated network speed, based on a forum post about a known issue there. I mention this only in case it helps identify a potentially larger network issue affecting the hubs. Since that change all updates perform quickly and flawlessly on all hubs.
I appreciate any further thoughts on how to troubleshoot this issue and hopefully find a solution.
Jumbo frames are known to crash the hub UI. Since you need them, it is recommended that you segment your LAN so that the hubs never receive them. The simplest way is probably to place the hubs behind a dumb switch or hub that doesn’t support jumbo frames and let it filter them out for you.
Are your failing hubs on WiFi? (I know you said Ethernet, but....) I had this same issue with my C-7 development hub (very low load). I have converted it to the cable interface and the problem seems to have gone away.
Root cause appears to be my 2021 ASUS router (I have other devices disappearing from the router too). It turns out that not all routers can smoothly handle the number of devices in our homes, especially older routers. My temporary fix was to add a mesh to my router system (it had that capability), and it has been working OK.
If you are on WiFi, the next time this occurs, check your router interface and verify that the devices are in the active device list. If so, there are web-articles on this issue (search: "too many devices on router").
(PS: this does not say that there is something in Hubitat that is causing these problems as the system has upgraded to the wifi-based Matter systems - but I do not have the technical insight there and my C-7 has no matter devices.)
I have temporarily turned off jumbo frames (MTU 1500 on all devices now) and also moved my C-8 hub from my 10GbE switch to my 1GbE switch, and will see if the problem continues to recur.
I have never input my wifi details into any of the hubs and have confirmed they are all using wired Ethernet connections.
My router is an Asus RT-AX86U running Asuswrt Merlin firmware, and overall it has been rock solid for me, so it wouldn’t be the first place I’d look when troubleshooting unless there is a known issue with this setup. I have WiFi turned off on this router and instead use a Netgear Orbi mesh system, which has also been very reliable in my experience.
Is there any way to figure out using system logs, etc.. what has been happening? Or should I just wait and see if it continues with these changes?
Should I disable the automatic reboot apps on each device or leave them running?
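On the question of pinning down what has been happening: one hedged option, separate from the hub's own logs, is to probe each hub from another machine every few minutes, record whether it answered, and then look for moments when several hubs failed together. The sketch below covers only the last step. The record format `(timestamp, hub name, responded)` and the function name are my own invention for illustration, not anything Hubitat produces.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# One probe result: (timestamp string, hub name, did the hub respond?)
Sample = Tuple[str, str, bool]


def simultaneous_outages(samples: Iterable[Sample], min_hubs: int = 2) -> Dict[str, List[str]]:
    """Map each timestamp to the hubs that were down at that moment,
    keeping only timestamps where at least min_hubs hubs failed together."""
    down: Dict[str, Set[str]] = defaultdict(set)
    for timestamp, hub, ok in samples:
        if not ok:
            down[timestamp].add(hub)
    return {
        ts: sorted(hubs)
        for ts, hubs in sorted(down.items())
        if len(hubs) >= min_hubs
    }
```

If the same pair of hubs keeps showing up together, that points toward something they share (a switch, hub mesh traffic, a particular kind of network traffic) rather than two unrelated faults.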
Nightly reboots shouldn't be necessary and may mask issues so as a general rule for the C-7 up through the C-8 Pro I don't recommend them. Best bet is make sure you're on the latest firmware (2.3.8.128 I believe) and to let things run for a few days to see if your latest changes stabilize things for you.
Again, not knowing your level, I am not intending to talk-down. One last check item is the use of Static IP from the Settings menu on Hubitat. If you have done that, consider undoing it. See below from help page (especially the notes).
@djgutheinz no worries! All suggestions are appreciated. I don’t really know my level either. I do what I can, but I could definitely use some help figuring out how to make everything more reliable, and I appreciate any suggestions starting with the basics in case I missed something. I did read about the static IP issue a while back and switched everything over to reserved IPs in the router instead. Thanks very much for your time and suggestions!
@thebearmay I have updated all the hubs again now to 2.3.8.128 and disabled the rebooter app on all as well (should I fully delete instead or is there no difference)?
Now I guess it’s wait and see if there’s nothing else I can do to try to pinpoint the cause of the past issues.
I guess the last thing it occurs to me to mention (that so far I have just chalked up to gremlins but maybe shouldn’t), is I have some lighting automations (using basic rules mostly) that sometimes fail to trigger and so lights either don’t turn on or shut off at the designated times. This has happened a few times in the past week for example.
I also have had increasing issues with the Alexa integration where it will have random periods where it will not respond to turning on/off lights and then start working again without any interaction from me.
I also have the HomeKit integration installed on one hub and a few weeks back I kept having to login to restart the service and finally turned on the option to have it restart every hour.
I mention these in case they might be indicative of a larger issue..
This is likely what was causing your crashes. This is a known issue through all versions of Hubitat Elevations...
Yeah Jumbo frames only affect ethernet not wifi.
Logs don't record the jumbo frame crashes. You could use Wireshark to monitor the Hubitat ports on the switch(es), then start sending jumbo frames to them and watch the Hubitat(s) for the crash.
One solution is to get a 100 Mbps hub (not switch) that will not pass jumbo frames. You can plug all your Hubitats into that and allow jumbo frames to everything else.
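A rough way to check whether a given network path passes jumbo frames is a single ping with the Don't Fragment bit set and a payload sized to the MTU. The Python sketch below only builds the command; it does not run it. The helper names are hypothetical, and it assumes Linux (iputils `ping -M do`) or macOS (`ping -D`). Given what was said above about jumbo frames crashing hubs, it is probably safer to aim it at some other device behind the same switch rather than at a hub itself.

```python
import sys
from typing import List

IP_HEADER = 20    # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IP_HEADER - ICMP_HEADER


def ping_command(host: str, mtu: int, platform: str = sys.platform) -> List[str]:
    """Build a one-shot ping with the Don't Fragment bit set.

    Linux (iputils) uses '-M do'; macOS uses '-D'.
    Other platforms are not handled in this sketch.
    """
    size = str(icmp_payload_for_mtu(mtu))
    if platform.startswith("linux"):
        return ["ping", "-c", "1", "-M", "do", "-s", size, host]
    if platform == "darwin":
        return ["ping", "-c", "1", "-D", "-s", size, host]
    raise ValueError(f"unsupported platform: {platform}")
```

For MTU 9000 the payload works out to 8972 bytes, and for MTU 1500 it is 1472. If the 8972-byte ping fails while the 1472-byte one succeeds, something on the path is not passing jumbo frames, which is the behavior you want in front of the hubs.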
I will try to live without jumbo frames for now and if it becomes a problem I will look into that solution.
I don’t know if it’s related, but sometime between sundown and midnight two evenings ago a Z-Wave node went ghost on me on the C-7 hub that originally was crashing simultaneously with the C-8. I have tried refreshing it, and it gives me the option to remove or replace it. I have looked through the other hubs and on the C-8 also found a Z-Wave device that does not have a route listed and gives me the same options if I hit refresh, although I can’t tell how long that one has been out.
What is the preferred method for fixing these issues now? Do I hit Replace and then put the device into inclusion mode?
I have a Z-Wave USB stick but have never managed to get it to work, perhaps because my only Windows laptop is very old and slow. I once spent over a week trying to get it to work and finally ended up resetting the hub and recreating everything from scratch. Hopefully there is a simpler way to fix these two nodes.
Here is the info for the C-8 that has two missing devices, one of which was due to a short circuit and so the device was without power for awhile and the other is the “bodega sensor” which went missing without any known reason:
I can make a third post with the topology and mesh graph of the C-8 if necessary, but am only sending these now since I am limited to 5 uploads per post.
@user6562 are you by chance running a UniFi network? After a year or so of perfectly stable C8, mine started crashing a few days ago, right when I updated to 2.3.8.133. After a few crashes, I noticed that 2.3.8.134 was released, so I upgraded to that thinking maybe that would help. Nope.
Restored a backup with 2.3.7.146, which definitely worked perfectly before, and no dice. Though I did not perform a soft reset; I just did the restore in the main UI, which claims it destroys the DB. I thought this finally fixed my issues, but it just delayed the crashing by an hour or two. It just dawned on me reading this thread that it could have something to do with the network, since when mine crashes, :8081 is also offline. I do not have jumbo frames enabled, however. Trying a reboot of all my UniFi devices.
Just figured I'd ask, just in case you were also using Unifi. Maybe there's a common thread there somehow. I just can't remember if I did an OS or device update in the last few days. I turned off auto updates on almost everything due to problems like these.
I really cannot explain this. No other devices had any issues whatsoever, yet rebooting the network seems to have fixed the issue. I re-updated to the latest Hubitat version and it has not crashed for the last 18 hours or so. Fingers crossed.
May be completely unrelated... However, I know that running UniFi WAP Firmware version 6.6.55 on my access points caused issues with my ESP8266 devices. They struggled to stay connected to the network. I reverted the APs back to version 6.5.64, and they worked fine again. Later, I updated to the Early Access version 6.6.65, which has also worked without any perceived issues. The rest of my UniFi network equipment is fully up to date with the current GA releases.