So if you disable all devices, does it still fail in 4 days without doing anything?
I could disable all devices, but that would be around 175 devices, so there would be no easy way to narrow down which one is responsible; waiting 4 days between re-enabling devices just would not be worth the effort/time at that point versus just auto rebooting the hub every 3 days.
If I just had something, anything in the logs to at least point me in the right direction, it would help.
I do not even see a warning about the hub memory running low beforehand, which I found odd. It literally says memory is critically low and just reboots itself out of nowhere. The reboot also does not happen at any particular time of day; it's usually within a 12-16 hour window around 4 days after the last reboot.
Well, I should have said apps, not devices.
There has to be something you are hitting that is triggering the bad condition. I would first be curious to get your hub into a good state so we know it isn't something with the hub itself for some reason, which is why I suggested disabling everything. Once we know the hub will stay up on its own, turn on a few things and see how long it lasts. Gradually expand that until an issue occurs that triggers the reboot. Then back out those items and see if it happens again. Hopefully that will help you pin it down. There is no doubt that the bigger your environment is, the harder it is to find these kinds of issues. By chance, do you keep backups for an extended period of time, like months, and have a hub backup from prior to this issue starting? Do you still have the old hub that the C8 Pro replaced?
What version of firmware are you running?
Yeah, the only thing I have not done is try disabling the built-in apps, since I figured it was a custom app that was causing the issue. It's possible there could also be a Room Lighting app causing an issue, but I have a ton of those as well.
I am on the latest FW: 2.4.0.151
I just keep the internal local backups which go back a week. I also download a monthly backup to a local server.
I did also have cloud backups, although I just checked and there are none. For some reason Hub Protect is not enabled on this C8 Pro hub, even though I know I transferred it from the old C8 hub after I migrated to the new C8 Pro. I will need to contact support regarding that issue.
I do still have the old C8; I could use it for testing since I have not wiped/reset it yet. I would need to disable the Zigbee/Z-Wave radios though so there are no issues.
That shouldn't be necessary, since the built-in apps have enough logging that reviewing the Logs/Past Logs would alert you if something goes wrong.
I checked your hub's engineering log and it shows nothing unusual, other than Chromecast devices not being able to communicate with the hub at times, which seems to coincide with at least the last 5 reboots. I'm still reviewing the logs, and if something else jumps out I will reach out via PM.
Since you are commenting here, what would actually cause the "Low Memory" error event called out above and trigger an automatic reboot?
I have run my C7 down very low, under 100MB, before, so I am surprised to see that it rebooted itself.
Thanks Bobby. It's interesting you mentioned the Chromecast devices, as I was considering uninstalling the Chromecast Integration to see if it had any effect on the issue.
You are correct on this one. For some reason the transfer didn't activate the service on your hub, although the transfer was successful. I fixed that, so please go to Backup and Restore and select refresh entitlements to activate the service locally.
The problem, as I see it, with using the C8 is that it is likely the combination of what you have set up on the hub and having activity that triggers some kind of action. You may be able to do something with Hub Mesh and moving integrations/stuff between the hubs to isolate them, but that is a lot of work on its own.
That's the part we are still looking into
Thank you, the cloud backup is working now.
Should I leave the hub as is now for your testing, or should I try removing the Chromecast Integration and see if anything changes?
I certainly wouldn't hesitate to do it after bobbyD's responses.
Now take the next bit as some IT guy's ramblings, as it is speculation, but it seems plausible.
This is a bit of a wild guess, but the main stuff we see, as I understand it, is from the JVM that runs the Hubitat layer we interface with. That JVM runs on top of another OS, likely a Linux variant. After thinking about it, my best guess is that the Low Memory warning is being triggered by that underlying OS and not from within the JVM. When a JVM throws memory issues, it is generally fairly obvious. Also, the memory stats we see are, I believe, really from that JVM and not from the underlying OS. I could be wrong there, but it is what I suspect. That would mean this memory error is about something at a low level that you are hitting hard. My initial guess is something in the network stack. You have a good amount of TCP-based connectivity happening, and as such that would put a huge load on the network stack.
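To make that distinction concrete, here is a minimal Java sketch (illustrative only, not Hubitat's actual internals; the class name and output format are made up) showing that the memory figures a JVM reports about itself cover only its own heap, which is why an OS-level low-memory condition can exist while those numbers still look healthy:

```java
// Minimal sketch: a JVM's own memory stats describe just its heap, not the RAM
// consumed by the underlying OS (native sockets, kernel buffers, file cache, etc.).
public class JvmHeapStats {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long maxMb   = rt.maxMemory()   / (1024 * 1024); // heap ceiling (-Xmx)
        long totalMb = rt.totalMemory() / (1024 * 1024); // heap currently reserved
        long freeMb  = rt.freeMemory()  / (1024 * 1024); // unused part of that reservation
        System.out.printf("JVM heap: max=%dMB total=%dMB free=%dMB%n", maxMb, totalMb, freeMb);
        // Anything allocated outside this heap is invisible here, so the OS can be
        // critically low on memory while these numbers still look fine.
    }
}
```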
Could you try disabling all of that stuff and see what happens? That would be Lutron, the Maker API instances, Google Home, Chromecast, and Home Assistant Device Bridge. I know that may make the home automation just stop, though, so it is easier said than done.
Almost every time I have killed my hubs when doing development, it was by hitting the LAN connectivity hard.
The idea here is that network stack load isn't really shown in the performance data except as wait states. It also sits outside the JVM, so it could impact memory allocated to the underlying OS.
I agree; overwhelming the network stack seems to be the number one way to crash a hub. Constant failed TCP/IP connections in particular seem to hit it really hard.
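As an illustration of that failure pattern (a generic sketch, not any particular integration's code; the IP address, port, and timeout are placeholders), a reconnect loop with no backoff against an unreachable LAN device churns through TCP connections nonstop:

```java
import java.net.InetSocketAddress;
import java.net.Socket;

// Illustration of constant failed TCP connections: retry an unreachable LAN
// device in a tight loop with no backoff, opening and tearing down a socket
// on every pass.
public class RetryStorm {
    public static void main(String[] args) {
        InetSocketAddress dead = new InetSocketAddress("192.168.1.200", 8080); // placeholder target
        while (true) {
            try (Socket s = new Socket()) {
                s.connect(dead, 1_000); // wait up to 1s, then fail...
            } catch (Exception e) {
                // ...and immediately try again -- no backoff, no giving up,
                // so connection attempts churn through the network stack nonstop.
            }
        }
    }
}
```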
Yeah, this was my first thought, although I do try to minimize the activity between LAN devices and the hub. Everything is hardwired as well, and I have yet to see any communication errors related to LAN apps while trying to diagnose this issue.
I am going to try removing the Chromecast integration, as I only have a few rules tied to these devices.
Errors won't necessarily show up, though. The first time I killed one of my hubs was with Node-RED sending commands via Maker API, and it was a pretty simple process too. I had Node-RED use apcupsd to pull battery data from a Raspberry Pi connected to a battery backup. The returned data was then parsed and sent to a virtual device via Maker API, which updated states based on the input. The UPS returned maybe a dozen values, and they changed fairly frequently. The apcupsd integration in Node-RED also had to be triggered at least every 12 seconds due to some limitation, or it would disconnect.
The result was Node-RED sending a fair number of commands down every 12 seconds. It would run for about 18 hours, then the hub would start to choke: CPU would jump up and eventually it would just lock. No errors were ever generated.
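To give a feel for the kind of load that flow produced, here is a rough Java approximation of the polling loop (not the actual Node-RED flow; the hub address, app ID, device ID, command name, and access token are placeholder values): every 12 seconds it pushes a value to a virtual device through Maker API's device-command endpoint.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Rough approximation of the load pattern described above: push a fresh UPS
// reading to a virtual device via Maker API every 12 seconds, around the clock.
public class MakerApiPoller {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String base = "http://192.168.1.50/apps/api/123/devices/45"; // hub IP, app ID, device ID: placeholders
        String token = "YOUR-ACCESS-TOKEN";                          // placeholder
        while (true) {
            String value = "87"; // in the real flow this came from apcupsd on the Raspberry Pi
            HttpRequest req = HttpRequest.newBuilder()
                    .uri(URI.create(base + "/setBattery/" + value + "?access_token=" + token))
                    .GET()
                    .build();
            HttpResponse<String> resp = client.send(req, HttpResponse.BodyHandlers.ofString());
            System.out.println("Hub responded: " + resp.statusCode());
            Thread.sleep(12_000); // repeat every 12 seconds
        }
    }
}
```

Multiply that by a dozen changing values and it is a steady stream of LAN requests hitting the hub all day.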
Well I removed the Chromecast Integration so we will see if there is any change. Tomorrow will be 4 days since the last reboot, so I should know at some point in the next 48 hours.
No luck, the hub rebooted itself today again.
Have you looked through your list of Scheduled Jobs immediately after a reboot and tried to find any job on the hub that is scheduled to run 4 days into the future? Since it happens so regularly, I wonder if there could be something scheduled to run in the future that causes the issue.
Any chance you have some other device on the network that is performing some sort of network scanning every 4 days?
Since you still have the old C8 hub, and IIRC it had the same issue, maybe you could fire it back up with its radios disabled to see if it still crashes every 4 days? If yes, then perform a soft reset on it, and see if it still crashes every 4 days. Just thinking out loud...
I will take a look at the scheduled events and see if I can find anything.
Yeah, I was thinking about using the old C8 for testing. I have performed a soft reset many times in the past while trying to diagnose this issue, with both the old C8 and the C8 Pro.
The 4-day timeframe is odd, because I don't remember ever setting anything up to run on that schedule, either on my network or in the hub. The other thing is that the time of day differs each time, whereas I would expect it to be consistent.
Sent you a private message to get more details.