@gopher.ny
I have 2 C-7 hubs, dev and production. Both of them were on 2.2.4.156, and both of them dropped off the network last night around 1am. Whether they locked up, or just stopped talking on the network, I don't know.
They are on different UPS, so I doubt it is power related.
They are on different switches and I could still contact other C-4 and C-5 hubs on the same switches, so I don't think it was network related. (See EDIT below)
They are each sitting on an open air shelf, not on top of anything that generates heat, so I don't think it is heat related.
On both of them this morning I saw GREEN light, no port 80/8080/8081 access, no ping.
On both pulling the power cable brought them back online.
I've since rolled my production C-7 back to 2.2.4.148, but the dev C-7 is still on 156 if you want to look at any logs.
EDIT: It looks like my UniFi switches upgraded firmware at that time, too. So that might be what triggered it (it's the only other thing I see that was happening around the same time). Why that would cause two C-7 hubs to lock up hard, though, is a mystery. None of my C-4 or C-5 hubs locked up....
Well, luckily 2.2.4.148 works really well for me, so I can just park there on my production hubs as long as needed. Which is good as I have work travel next week so won't be here to watch/reboot my hubs for my family.
Mine is super easy! I made a virtual switch device that I continually turn on/off every 15s. If I don't see status update from the virtual switch within 10 min, I send the hub a reboot command, and send myself a pushover message.
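The logic of that watchdog can be sketched roughly as below. This is only an illustration of the idea, not the poster's actual rule: the `Watchdog` class and its method names are hypothetical, and actually sending the reboot command and the Pushover message is left to the caller.

```python
from dataclasses import dataclass
from typing import Optional

TOGGLE_INTERVAL_S = 15      # how often the virtual switch is flipped on the hub
STALE_AFTER_S = 10 * 60     # no status update for this long => assume hub is stuck


@dataclass
class Watchdog:
    """Tracks the last heartbeat seen from the hub's virtual switch (hypothetical helper)."""
    stale_after_s: float = STALE_AFTER_S
    last_seen: Optional[float] = None
    tripped: bool = False

    def heartbeat(self, now: float) -> None:
        """Call whenever a status update from the virtual switch arrives."""
        self.last_seen = now
        self.tripped = False

    def should_reboot(self, now: float) -> bool:
        """Return True once per outage, when the heartbeat has gone stale."""
        if self.last_seen is None or self.tripped:
            return False
        if now - self.last_seen >= self.stale_after_s:
            self.tripped = True   # avoid sending reboot commands in a loop
            return True
        return False


wd = Watchdog()
wd.heartbeat(now=0.0)
print(wd.should_reboot(now=300.0))   # 5 minutes of silence: still fine
print(wd.should_reboot(now=601.0))   # past 10 minutes: reboot and notify
print(wd.should_reboot(now=700.0))   # already tripped: do not repeat
```

Tripping only once per outage is a design choice in this sketch, so a hub that is slow to come back does not get hit with repeated reboot commands.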
Interestingly, beginning with 2.2.4.153 on my C-7, I noticed changes in background repair/re-routing as I let the mesh sit, and my device response became instantaneous, the fastest I have ever seen. This morning, several devices have a second or two of delay. The only change, other than updating from .153 to .156 when it was released, was the addition of another GE/Jasco 26931 Smart Motion Switch, which is slowly becoming a repeater for several devices as the mesh routing changes. It is showing a speed of 100 kbps and is operating fine and quickly using your Component driver, Jason, so it doesn’t look like it’s involved. I’m going back to 2.2.4.153, which was fast.
Could you please set up the dev hub to reboot daily or every other day @ 3am? I suspect a memory leak - one other person reported exactly the same thing. Rebooting means it will not lock up, but I can still monitor it for memory and other things.
This doesn't appear widespread - or maybe 153/156 just hasn't been out for long enough. Memory settings in the two are identical, but they are different from 148.
No problem. I already have logic in Node-RED to reboot hubs (which I disabled some time ago, as it wasn't really needed anymore), so it is really easy to do so.
Here is what the memory stats have looked like on my dev hub. Red line is where it dropped off the network. I only update the stats every 5 minutes, though, so if an event were faster than that I wouldn't see it:
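For anyone wanting to check their own samples for the suspected leak, one rough approach is to fit a straight line to free memory over time and look at the slope. The sketch below assumes samples are (minutes, free KB) pairs collected every 5 minutes; the function name and the numbers in the example are made up for illustration and are not readings from my hub.

```python
from typing import Sequence, Tuple


def free_memory_slope(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of free memory (KB) versus time (minutes).

    A steadily negative slope over many hours would be consistent with a leak;
    a slope near zero would not.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t


# Illustrative values only: four samples, 5 minutes apart, losing 500 KB each.
example = [(0, 400_000), (5, 399_500), (10, 399_000), (15, 398_500)]
print(free_memory_slope(example))   # KB per minute
```

With a 5 minute sample interval this can only show slow trends; a sudden drop between two samples would look like a single step rather than a slope.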
I have the dev hub on a dedicated spare UniFi switch now, so I am going to try a few firmware upgrades/downgrades and see if I can replicate the same thing or not.
Tests this morning disabling/re-enabling the switch port, and unplugging/replugging the network cable, didn't replicate last night's event.
Just after reversion to 2.2.4.153, I looked at Z-Wave Details, and two Aeotec Recessed Door Sensor 7 devices were marked as NOT RESPONDING, and my Aeotec Door/Window Sensor 7 was marked as FAILED. Sorry I didn’t look before reversion. The DWS7 had never had any issues before, but it is the most remote device I have (garage door sensor). Haven’t seen any of the RDS7 devices indicate NOT RESPONDING in quite a few releases, and 2.2.4.153 was the best release I had seen ever, everything solid.
Seemed to be a progressive degradation.
The devices were still fine after 2.2.4.153 reversion, and operating them (opening doors, without any repair) caused them to return to OK. The DWS7, which had indicated FAILED, now shows blank in the Route box in Z-Wave Details after returning to OK, but I believe that update takes a while.
I’m going to let the mesh sit at 2.2.4.153 and self-repair. This may or may not be related to Jason’s issue.
Well, I've upgraded/downgraded firmware on my UniFi switches 6 times now to try to replicate last night's issue.
Each time after the switch reboots the hub is back on the network as expected. So if it was the switch reboot that triggered it, I can't reproduce it now.
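The "is the hub back" check after each switch reboot can be scripted instead of done by hand. A minimal sketch, assuming the hub is considered reachable when TCP ports 80, 8080 and 8081 (the ones mentioned at the top of the thread) accept a connection; the `open_ports` helper is hypothetical and only uses Python's standard `socket` module:

```python
import socket
from typing import Dict, Iterable

HUB_PORTS = (80, 8080, 8081)   # ports that stopped answering during the lockup


def open_ports(host: str, ports: Iterable[int] = HUB_PORTS,
               timeout: float = 2.0) -> Dict[int, bool]:
    """Try a TCP connection to each port and report which ones accept it."""
    results: Dict[int, bool] = {}
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout or no route
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results
```

Calling `open_ports("192.168.1.50")` (address made up) after each switch firmware change would show whether all three ports came back; a result with every port `False` would match what I saw this morning.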
I'm on UniFi as well, and have automatic updates on my switches and APs, although not on the controller or the EdgeRouter. You certainly made me a bit nervous though!