[2.2.4.158] C7 hub drops off the network (crash?)

JasonJoel · December 27, 2020, 2:33pm

I've seen this before, but it hasn't happened in a few weeks. Now it has happened 2x in 14 hours though.

Hard to tell if it is actually crashing, or just the network interface is dying.

What I see is a green light, but no response on port 80 or 8081, and no reply to ping. Works fine after a hard power-pull reboot (hate to do that, but have no choice when it does this). Nothing in the logs. No logic, app, or driver changes on the hub for quite some time.
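For anyone who wants to catch the moment the hub drops off, here is a minimal sketch of a reachability probe for the two ports mentioned above. The function names and the hub address in the usage note are hypothetical, and ping is left out because it needs raw sockets or an external command; this only checks whether a TCP connection can be opened.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def probe_hub(host, ports=(80, 8081), timeout=2.0):
    """Check each port and return a {port: reachable} mapping."""
    return {port: port_open(host, port, timeout) for port in ports}
```

Calling `probe_hub("192.168.1.50")` (an assumed address) from a scheduled job and logging the result with a timestamp would show when both ports stop answering. It cannot tell a dead network interface from a locked-up platform on its own.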

C4 hub plugged into the same switch is working fine. Changed network cables just for grins yesterday, dropped off the network again this morning.

Maybe the hardware is dying? Who knows.

Memory looked OK (to me) before it became unresponsive (the change at the end was after I rebooted it):

EDIT 1: Just crashed again. Will do a soft reset and see if it helps.

EDIT 2: Nope, died again. It is only lasting 5-10 minutes after a reboot before dying now.

rlithgow1 · December 27, 2020, 3:08pm

I would certainly put in a support ticket. When you did the soft reset, did you restore your config or did you just let it sit? Is it possible that you have an IP conflict? One other, more remote, possibility: try changing the power supply.

JasonJoel · December 27, 2020, 3:39pm

Yup, I did (ticket 19796). I haven't tried a new power supply - I'll do that next time it crashes.

rlithgow1 · December 27, 2020, 4:27pm

Also, have you tried restoring the previous platform version and then reinstalling the latest with nothing on it? (I ask because, based on your description, it sounds like some sort of corruption.)

JasonJoel · December 27, 2020, 4:31pm

Nope. I'll wait for support to chime in before doing that as there have been no changes app/driver/etc for a week+. Don't want to just keep trying things at random.

I did leave it unplugged for a few minutes before powering back up this time. Seemed to help (or it is a coincidence) as it has been running for the last hour, so we'll see.

JohnRob · December 27, 2020, 5:35pm

I originally believed the hubs could tolerate fairly warm locations, but that has since been proven incorrect. Is the hub in an area that can get too warm?

JasonJoel · December 27, 2020, 5:43pm

Nope. I've been through that before. I've often told my story of how 3 of my dev hubs would ALL lock up randomly when they were sitting on a barely lukewarm network switch.

In this case the hub is on an open air shelf in my air conditioned living room (yes, we run air conditioners even in December in Texas), resting on nothing but the shelf, with no other heat sources anywhere near it.

After leaving it unplugged for a minute or two before powering back up, it has been running for 2 hours now with no lockups. Maybe the Z-Wave radio went bonkers, causing the hub to crash (that's about the only thing that gets rebooted on a longer power pull that doesn't on a shorter one).

Don't know, though, as that info is in the logs we can't see.

erktrek · December 27, 2020, 6:19pm

Keep us informed! I am very interested in your experiences because you appear to have had a lot fewer issues than others here on these forums. I'd be very curious to hear what the staff can figure out, if anything.

JasonJoel · December 28, 2020, 1:41pm

Well, after doing the full power pull for a minute or two, it has now been running for over a day with no crashes.

No temperature, network, or hub configuration changes were made.

Weird, huh?

erktrek · December 28, 2020, 2:33pm

Yeah that's really strange. Why would the Radio chipset affect the networking?

With the older RPi's the ethernet ran through the USB 2 controller which had the effect of halving throughput - I wonder if there is something like that with the C-7's internally but with the radio or somesuch. Useless speculation on my part of course.

JasonJoel · December 28, 2020, 2:45pm

I don't think it has anything to do with the network. I think the hub/platform was locking up. But I could be wrong.

The radio could potentially cause the platform to lock up if it is really out of whack. Hubitat has confirmed that before, when they were having initialization issues with the radio after reboot.

Or maybe if it gets in a race condition it generates enough heat to cause thermal lockups?

Who knows?

I've made a simple RM rule to toggle a virtual switch on/off every 30s. So if/when it happens again I'll be able to look at the device history and see if the logic was still running (i.e. not locked up, and just a network issue) or if the whole thing is dying.
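The check against the device history could be sketched roughly like this. It assumes the virtual switch's event timestamps have already been copied out of the hub as Python `datetime` values (how they are exported is not covered here), and `find_gaps` and its `slack` parameter are hypothetical names, not anything Hubitat provides.

```python
from datetime import datetime, timedelta


def find_gaps(event_times, expected=timedelta(seconds=30), slack=2.0):
    """Return (before, after) pairs where consecutive events are too far apart.

    A pair is reported when the spacing exceeds expected * slack, so with the
    defaults anything longer than 60 seconds between toggles counts as a gap.
    """
    times = sorted(event_times)
    limit = expected * slack
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > limit]
```

If the hub was unreachable but `find_gaps` reports nothing for that period, the rule kept firing and the problem looks like networking only. A gap that lines up with the outage points at the whole platform locking up.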

erktrek · December 28, 2020, 2:49pm

Okay, that makes sense - so the radio going crazy locks up the HE system with no way to resolve it except for a long reboot. Isn't it interesting that the diagnostic tools page was affected as well? That seems like a very hard crash.

JasonJoel · December 28, 2020, 2:52pm

It is interesting for sure. Has happened 6 or 7 times now since 2.2.4 came out. (at least I think every time it was on 2.2.4 - it is possible I'm remembering incorrectly and the 1st ones were on 2.2.3).

Never happened before then... And I've had this hub, and have been using it, since the C-7 was 1st sold.

erktrek · December 28, 2020, 3:01pm

I have been running 2 C-7s, each in a different location, since August and have not experienced that problem yet. I have been "long rebooting" them fairly often though for device updates and other issues, so that may have temporarily bypassed the issue.

Will def keep an eye out and report back if something happens - thanks for the update!

erktrek · December 28, 2020, 3:49pm

Are you using HubMesh in any capacity?

JasonJoel · December 28, 2020, 4:03pm

It is on for 3 devices, but I'm not actively using it for anything.

alex1 · January 7, 2021, 10:07am

I’m seeing lockups with my C-5 as well after updating to 2.2.4.158, every couple of days.

Seems to be related to Z-Wave. I lose connection to a couple of Jasco outdoor outlets when this occurs, and need to power reset these outlets for them to work again.

Logs indicate the CPU is still running, however.
Here is about when the Ethernet stops responding:

D48E 00 00 0000 00 00 00000A01095758009970997038997041D6383758FF19383758C44D383758AC2B00AC2B3758383758C33B3837580000030000, profileId:0000, clusterId:8032, clusterInt:32818, sourceEndpoint:00, destinationEndpoint:00, options:0040, messageType:00, dni:D48E, isClusterSpecific:false, isManufacturerSpecific:false, manufacturerId:0000, command:00, direction:00, data:[00, 00, 0A, 01, 09, 57, 58, 00, 99, 70, 99, 70, 38, 99, 70, 41, D6, 38, 37, 58, FF, 19, 38, 37, 58, C4, 4D, 38, 37, 58, AC, 2B, 00, AC, 2B, 37, 58, 38, 37, 58, C3, 3B, 38, 37, 58, 00, 00, 03, 00, 00]]

dev:172 2021-01-07 02:03:20.490 am warn DID NOT PARSE MESSAGE for description : catchall: 0000 8032 00 00 0040 00 7099 00 00 0000 00 00 00000A01090000108ED43758388ED48ED4388ED4C33B388ED4C44D388ED441D6388ED4000003000000000300000000030000

dev:133 2021-01-07 02:03:20.591 am warn DID NOT PARSE MESSAGE for description : catchall: 0000 8032 00 00 0040 00 4DC4 00 00 0000 00 00 00000A010941D63841D6000003000000001037583758383758FF1938FF19C33B38C33B8ED438375800000300000000030000

dev:173 2021-01-07 02:03:20.483 am warn DID NOT PARSE MESSAGE for description : catchall: 0000 8032 00 00 0040 00 D48E 00 00 0000 00 00 00000A01095758009970997038997041D6383758FF19383758C44D383758AC2B00AC
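To make these warnings easier to compare, here is a rough sketch that splits a `catchall:` description into named fields. The field order is an assumption, inferred only by lining up the dev:173 description with the parsed map in the fragment above it (same `dni` of D48E, same leading data bytes); it is not taken from Hubitat documentation, and `parse_catchall` is a hypothetical helper.

```python
# Assumed positional field order, inferred from the log excerpt in this thread.
FIELDS = [
    "profileId", "clusterId", "sourceEndpoint", "destinationEndpoint",
    "options", "messageType", "dni", "isClusterSpecific",
    "isManufacturerSpecific", "manufacturerId", "command", "direction",
    "data",
]


def parse_catchall(description):
    """Split a 'catchall: ...' description string into a dict of fields."""
    prefix = "catchall:"
    text = description.strip()
    if not text.startswith(prefix):
        raise ValueError("not a catchall description")
    tokens = text[len(prefix):].split()
    if len(tokens) < len(FIELDS) - 1:
        raise ValueError("too few fields in catchall description")
    parsed = dict(zip(FIELDS, tokens))
    # Break the hex payload into two-character bytes, as the hub's map does.
    payload = parsed.get("data", "")
    parsed["data"] = [payload[i:i + 2] for i in range(0, len(payload), 2)]
    # Decimal form of the cluster ID, matching the clusterInt field in the log.
    parsed["clusterInt"] = int(parsed["clusterId"], 16)
    return parsed
```

Run on the dev:173 description, this gives a `dni` of `D48E` and a `clusterInt` of 32818, which agrees with the parsed fragment above. As far as I know, profile 0000 with cluster 8032 is a Zigbee ZDO routing-table response (Mgmt_Rtg_rsp), so all three warnings appear to be the same kind of message arriving from different devices.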

thebearmay · January 7, 2021, 11:39am

Is it just those 3 devices issuing warnings, or are there others? If only those 3, what devices are they?

alex1 · January 7, 2021, 11:51am

Just the three.

thebearmay · January 7, 2021, 11:55am

What devices are they? (Clicking on the device number should highlight the device name at the top of the log.) Wondering if they have a common driver or application...