While troubleshooting and reading the forums, I recently discovered the Hubitat Diagnostic Tool (on my LAN: http://hubitat5:8081/ ). It has always stayed available when the rest of the hub isn't functioning. The hub also always requests and gets DHCP leases and responds to ping. Once the problem has occurred, it doesn't do any normal Hubitat hub functions (Rule Machine rules don't run, can't access the Web UI, can't connect in the Android app, can't ask Google to do things, etc.). After rebooting, it works as expected for up to a few days and then stops again. It might be an issue (memory leak?) that builds over time until it stops working: I noticed one of the routines was half executed about a day after the last reboot, but the UI was still accessible. A day later (today) it didn't run at all and the UI was not reachable.
This is my C-5 (I bought a C-7 but haven't really moved anything over yet; I was planning to leave 1st gen Z-Wave devices on the C-5 and run Plus devices on the C-7, with the two hubs linked).
I have no custom drivers. I have no custom apps.
There are some "error" level entries in the log but they're from over a week ago (which was at least 2 hang-and-reboots ago) so I'm guessing they're not related.
The main thing in the logs that looks odd to me is these warnings with long run times (but I don't know what "method parse... ran for" means). I have a bunch of Google/Nest Minis (that I have it announce things on) and several switches of the exact same brand/model as "Kitchen Light" that don't show up as warnings in the log (I checked a few and they're using the same generic built-in driver):
I've updated it multiple times (whatever is offered in the UI) since the problem began but it hasn't changed (it's currently on 188.8.131.52)
Is there anything else I should watch/capture or is that enough to go on to figure out what's going on?
(I tried "firstname.lastname@example.org" after reading that suggested for similar issues but I guess that no longer exists).
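On the question of what else to watch or capture: a small probe run from another machine could record when the hub enters the state described above, where the Diagnostic Tool on port 8081 still answers but the normal Web UI does not. This is only a minimal sketch. The hostname `hubitat5` and port 8081 come from the post; port 80 for the main Web UI and the names `port_open`, `classify` and `check_hub` are assumptions made for illustration.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False


def classify(ui_up, diag_up):
    """Map the two reachability checks to a short state label."""
    if ui_up and diag_up:
        return "ok"
    if diag_up:
        # The symptom from the post: diagnostic tool answers, main UI does not.
        return "hung"
    if ui_up:
        return "diag-down"
    return "offline"


def check_hub(host="hubitat5", ui_port=80, diag_port=8081):
    """Probe both ports once. ui_port=80 is an assumption, not from the post."""
    return classify(port_open(host, ui_port), port_open(host, diag_port))
```

Calling `check_hub()` every few minutes from a scheduler and logging the result with a timestamp would give a rough time for when the hub goes from "ok" to "hung", which can then be compared against the last log entry after the next reboot.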
Have you done a soft reset?
Yes, they changed the way support works. That should have been in the return message you got. You can always tag @support_team when things get dire. They can look at your engineering logs (which are different from the regular logs).
I haven't yet. I didn't want to risk the config (in case anything didn't restore as expected) if there was some other thing to try or check first, but that's next up if what I'm trying now doesn't do it (is the idea in that case to see if the hub can run stably with just the z-wave network and no automations/customizations/processes going on?)
I did get a return message (which is why I posted here). The return message didn't mention the tagging (so thanks for that info), just that support was through pre-existing information/documentation at support.hubitat.com and these forums at community.hubitat.com (so I went with the latter since I've already read what's available at the former).
I think I may be on to something though... After disabling ("stopping") all rule machine rules that speak information on Google/Nest speakers, I haven't seen a warning log entry (though I need to wait a few days to see if that continues). Would I be correct in assuming since that app ("Chromecast Integration (beta)") still carries a "beta" tag, any issues caused by using it are in the same category as custom apps/drivers as far as support goes, or do they want to improve it by getting feedback on problems with it (assuming the problem and warnings are gone with those rules stopped)?
Simply go to settings>>backup and restore and hit the download button. That will download a clean database. Go to yourhubip:8081 and do the soft restore and when prompted use the file you downloaded to your pc. It's a very safe procedure. Hubs don't get bricked.
Sure enough a rule didn't run correctly and the UI was inaccessible again (so it seems it stays up for a little less than 48 hours). After reboot the last log entry suggests it died about 2 hours ago (and I got more "method parse of X ran for Y ms" warnings, but only from 3 things: that kitchen switch, a multisensor and the app Google Home).
I assumed you meant "soft reset" so I did that.
It didn't directly prompt, but there were small links on the normal Web UI where it offered to start over and had those links for restoring a cloud, local or onboard backup so I selected the locally downloaded backup and let it reboot... I also assumed you meant to do it right away so correct me if I was wrong and the intention was to let it run with only the devices without my config for a while to see if the warnings showed up without apps/settings? If I did guess right, is the idea that there could have been garbage in the config that doesn't carry over into a backup (and therefore not into a restore)?
I figured that was the case. I wasn't worried at all about bricking, only about maybe losing rules, etc that I didn't have anywhere else if the restore were to fail.
Thanks for the continued responses!
At this point see how it runs... Are you running jumbo frames? If so turn them off.
This is possibly the cause of the problem, or may be a result of the problem. Your screenshot did not show up. That info about the logs would be very helpful.
Also, have you done a full power down, pull power for 30 seconds then boot back up at all? This is different from a reboot as all hardware is powered off and restarted vs just the firmware restarting with a reboot.
The switch the C-5 and C-7 are plugged into is unmanaged and allows jumbo frames with no way to disallow them. That's the case with all of the unmanaged switches in the house (and the PC I'm accessing from does not have jumbo enabled). I just reconfigured the smart switch that uplinks to that switch to disallow jumbo frames on that port, so if anything was trying to reach it or broadcast/multicast with greater than standard size frames, those should just be dropped (or fragmented) now. Is that a known issue with the C-5 but not the C-7?
I see what went wrong with that screenshot: I copy/pasted what I had originally sent to email@example.com and realized it was still linking to Gmail for the pictures. I thought I replaced them all but I guess I missed that one (it showed up for me because I'm logged in to my own account).
The entries all look like this (but from a few different devices with different high ms values- kitchen light is always one of them but the rest switch up):
[dev:175] 2023-02-23 06:34:10.744 AM [warn] method parse of device Kitchen Light ran for 197,943ms
[app:1] 2023-02-23 05:47:24.056 AM [warn] method tokenResponse of app Google Home ran for 156,729ms
[dev:392] 2023-02-23 06:15:12.435 AM [warn] method off of device Common Long Hallway Outlet ran for 123,308ms
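The run times in these warnings are in milliseconds, so 197,943ms is over three minutes for a single call. To see which devices and apps recur across a longer log, the lines could be pulled apart with a short script. This is a sketch that assumes the log has been saved as text in exactly the format shown above; the names `WARN_RE` and `parse_slow_methods` are made up for illustration.

```python
import re

# Matches lines such as:
# [dev:175] 2023-02-23 06:34:10.744 AM [warn] method parse of device Kitchen Light ran for 197,943ms
WARN_RE = re.compile(
    r"\[(?P<kind>dev|app):(?P<id>\d+)\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [AP]M)\s+"
    r"\[warn\]\s+method (?P<method>\w+) of (?:device|app) (?P<name>.+?) "
    r"ran for (?P<ms>[\d,]+)ms"
)


def parse_slow_methods(lines):
    """Return one dict per matching warning line, slowest first."""
    records = []
    for line in lines:
        match = WARN_RE.search(line)
        if match:
            record = match.groupdict()
            # "197,943" -> 197943
            record["ms"] = int(record["ms"].replace(",", ""))
            records.append(record)
    return sorted(records, key=lambda r: r["ms"], reverse=True)


SAMPLE = [
    "[dev:175] 2023-02-23 06:34:10.744 AM [warn] method parse of device Kitchen Light ran for 197,943ms",
    "[app:1] 2023-02-23 05:47:24.056 AM [warn] method tokenResponse of app Google Home ran for 156,729ms",
    "[dev:392] 2023-02-23 06:15:12.435 AM [warn] method off of device Common Long Hallway Outlet ran for 123,308ms",
]

for rec in parse_slow_methods(SAMPLE):
    print(f'{rec["ms"] / 1000:.1f}s  {rec["kind"]}:{rec["id"]}  {rec["name"]}.{rec["method"]}')
```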
I have pulled power but not for a full 30 seconds. I can try that the next time there's an issue (if the current attempts haven't resolved it). As has been usual, there are no warnings today (but if it's going to have a problem again, they will start showing up tomorrow).
Well, I'm not sure if it was the backup + reset + restore or if it was blocking jumbo frames before they hit the switch it's connected to but it has been stable since doing those two things.
Do you have any additional details on either cause?
What is it about jumbo frames that would have caused an issue? Even if Hubitat doesn't support them, does its network stack see them and just freak out? It seems like it should just ignore them, or at worst I'd not be able to send/receive data reliably to/from it, rather than it causing a slow lockup of the whole system (minus the diag interface).
What would a backup, reset, restore have done? Like what would have been cleared out that fixed the issue if it were that?
Not critical to get either answer, just curious for the future.
Thanks (and thanks for the suggestions)
If a jumbo frame packet hits the hub's interface, it crashes the networking. Whether it should or not, no idea, but those are the facts of what happens.
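Given that, it may be worth checking which machines on the LAN are configured to send jumbo frames in the first place. Standard Ethernet uses an MTU of 1500 bytes, and anything larger is generally treated as jumbo. The sketch below reads interface MTUs on a Linux machine from `/sys/class/net/<iface>/mtu`; it does not apply to Windows or to the hub itself, and the function names are made up for illustration.

```python
import os

STANDARD_MTU = 1500  # standard Ethernet MTU in bytes


def is_jumbo(mtu):
    """True if the MTU is larger than the standard Ethernet MTU."""
    return mtu > STANDARD_MTU


def linux_interface_mtus(root="/sys/class/net"):
    """Return {interface_name: mtu} read from Linux sysfs, or {} if unavailable."""
    mtus = {}
    if not os.path.isdir(root):
        return mtus
    for iface in sorted(os.listdir(root)):
        try:
            with open(os.path.join(root, iface, "mtu")) as f:
                mtus[iface] = int(f.read().strip())
        except (OSError, ValueError):
            # Skip entries without a readable, numeric mtu file.
            continue
    return mtus


def jumbo_interfaces(root="/sys/class/net"):
    """Return only the interfaces whose MTU exceeds the standard size."""
    return {name: mtu for name, mtu in linux_interface_mtus(root).items() if is_jumbo(mtu)}
```

Note that the loopback interface `lo` usually reports a very large MTU and will show up in `jumbo_interfaces()`; that is expected and never reaches the network, so only physical or Wi-Fi interfaces matter here.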