Crashing Hub

csteele · September 11, 2018, 8:14pm

I feel like some clarification is needed...

"'ghost z-wave devices', devices that you have forced removed that the hub still thinks exist, could cause issues." <-- to be a fraction clearer: the Hub isn't the issue in this specific case. The Hub has a DB of the devices it knows about. The Radio stick has a DB of the devices it knows about. For the majority of us, those two lists match. It's as if there's one DB split into two physical pieces. However, when one part of the DB doesn't match the other, there's going to be a problem: some tiny, some large.

"Ghost", as vjv was using it, referred to devices known to the Radio stick but not known to the Hub. If those devices are alive, the Hub could be getting messages that it doesn't know what to do with. However, in vjv's case, the suspicion was that multiple alive devices were trying to route through devices that no longer exist. That is all occurring out on the Radio side of the equation, not the Hub. My interpretation of what was written is that he had alive devices trying to talk through missing devices and he had a very unstable mesh as a result. Repairs wouldn't work in that environment either. His cure was to remove the "ghost" devices from the Radio stick so that those node IDs were never propagated again. The Radio stick is the routing master. What it thinks is the correct routing table is what it distributes. When its table is wrong, it just propagates wrong all over the place.

Reboot. If the LED is red, cycle the power. It's safe and harmless. If the LED is blue, you take a risk. The Hub's portion of the DB contains "everything" we think of as the hub. I believe all the automations (rules), sunrise and sunset calculations, and device names are in the Hub's DB. Corrupt the DB and the Hub will attempt to rewind to a previous copy. We lose what got added/changed/deleted in the corrupted DB, but the Hub recovers. Here's where I'm going to go WAY OUT in the land of guesses: a backup of the DB is done every reboot, keeping the most recent 4. However, since the DB doesn't match the real world, the DB might still be corrupted to us humans. That means 4 overeager reboots and we've lost any decent DB. We have 4 excellent copies of basically the same DB.
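
To picture that guess, here is a minimal Python sketch of a "keep the most recent 4" rotation. It only models the idea described above. It is not how the Hub actually stores backups, and the `reboot` function and the "good"/"bad" labels are made up for illustration.

```python
from collections import deque

MAX_BACKUPS = 4  # the "most recent 4" figure is the guess from the post above


def reboot(backups, current_db):
    """Model one reboot: snapshot the current DB; the oldest copy falls off when full."""
    backups.append(current_db)
    return backups


# Start with four decent copies, then reboot four times while the DB is bad.
backups = deque(["good-1", "good-2", "good-3", "good-4"], maxlen=MAX_BACKUPS)
for n in range(1, 5):
    reboot(backups, f"bad-{n}")
    good_left = [b for b in backups if b.startswith("good")]
    print(f"after reboot {n}: {list(backups)} good copies left: {len(good_left)}")
```

Each reboot pushes out one older copy, so after the fourth one there is nothing decent left to rewind to.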

bobbles · September 11, 2018, 8:19pm

Thanks for the very detailed reply. It explains concisely what happened to vjv.

How was this done is the next question.
Any ideas?

vjv · September 11, 2018, 8:27pm

I will add: I never said I force-removed the devices. Yes, some of them, but not all. I think I added a device and then did a database restore. I believe the device disappeared from HE (my HEM), but I had to remove it from the stick. I did so many tests that I can't remember them all, but after removing the ghost nodes my hub is fine, and that is what matters right now.

Royski · September 11, 2018, 8:28pm

To add, how do you even know you have this “ghost”?

csteele · September 11, 2018, 8:29pm

He got a copy of Zensys's Tools working. Zensys is the originator of ZWave, and they had very, very low-level tools available for ZWave device developers. The tools have leaked to the Internet, and people can get a copy and do significant damage to their ZWave network with them, or improve it. Skill is the differentiator.

There's a 2nd tool available called OZWCP (Open ZWave Control Panel) that is similar. It's based on the Open ZWave project and calls Open ZWave code to do those low-level actions.

Once the tool is functioning, you tell it to "find" a ZWave USB stick and then click your way through the process. For "ghost" devices, you ask the Controller (the USB radio stick attached to the tool) to verify the health of a device. It will almost always tell you it's fine, even when it's not. Ask repeatedly and eventually it discards the cached status and confirms the node is dead. Then you can run the low-level command to delete the failed node. Your "ghost" is gone. Rinse, repeat.
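
The "ask repeatedly, then delete" loop can be sketched in Python roughly as follows. `FakeController`, `is_failed` and `remove_failed` are invented stand-ins so the example runs on its own. They are not the real Zensys Tools or OZWCP interface, and the caching behavior is only a rough model of what is described above.

```python
class FakeController:
    """Stand-in for the radio stick; NOT a real Zensys Tools or OZWCP API."""

    def __init__(self, nodes, stale_checks):
        self.nodes = set(nodes)
        # node ID -> how many more health checks will still answer "fine" from cache
        self.stale_checks = dict(stale_checks)

    def is_failed(self, node):
        remaining = self.stale_checks.get(node)
        if remaining is None:
            return False  # node is genuinely alive
        if remaining > 0:
            self.stale_checks[node] = remaining - 1
            return False  # stale cached "fine" answer
        return True  # cache discarded, node confirmed dead

    def remove_failed(self, node):
        self.nodes.discard(node)


def purge_ghost(controller, node, max_checks=10):
    """Re-check a node until it is reported dead, then delete it.

    Returns the number of checks it took, or None if the node never failed.
    """
    for attempt in range(1, max_checks + 1):
        if controller.is_failed(node):
            controller.remove_failed(node)
            return attempt
    return None


stick = FakeController(nodes=[2, 3, 4], stale_checks={4: 3})
attempts = purge_ghost(stick, 4)  # ghost node: dead, but cached as fine 3 times
alive = purge_ghost(stick, 2)     # healthy node: never reported failed, left alone
print(attempts, alive, sorted(stick.nodes))
```

The point of the cap on checks is that a node which never reports as failed is left alone rather than deleted.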

bobbles · September 11, 2018, 8:29pm

And that is the question I keep asking. How?
Thanks.

vjv · September 11, 2018, 8:30pm

Using Zensys software on a PC. The guys here probably don't want us messing with this.

bobbles · September 11, 2018, 8:31pm

Well that has answered that then.

csteele · September 11, 2018, 8:31pm

You know you have a "ghost" when you compare the list of devices known to the Hub to the list of devices shown in Zensys Tools or OZWCP.
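
As a rough Python sketch of that comparison, assuming you have copied the node IDs out of each tool by hand (the numbers below are made up, and `find_mismatches` is just an illustrative helper, not part of any tool):

```python
def find_mismatches(hub_nodes, stick_nodes):
    """Compare the Hub's device list against the radio stick's device list."""
    hub = set(hub_nodes)
    stick = set(stick_nodes)
    return {
        # known to the stick but not the Hub: the "ghosts" discussed here
        "ghosts": sorted(stick - hub),
        # the reverse mismatch, listed only for completeness
        "hub_only": sorted(hub - stick),
    }


# Hypothetical node IDs, typed in from each tool's device list.
hub_nodes = [2, 3, 5, 8]
stick_nodes = [2, 3, 4, 5, 8, 9]

result = find_mismatches(hub_nodes, stick_nodes)
print(result)
```

Anything under "ghosts" is a candidate for the failed-node removal, once the stick confirms the node is really dead.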

csteele · September 11, 2018, 8:32pm

There's a recipe for OZWCP in that thread. It's an OLD thread and I'm only going to say the recipe is the part of it I'd keep TODAY. Back in February, that Thread was useful. Today, barely.

Royski · September 11, 2018, 8:32pm

Many thanks !!

bobbyD · September 11, 2018, 8:37pm

A frozen hub is a sign of an unhealthy hub. A power cycle may be required to bring it back to life, on an exceptional basis. But without identifying the root cause, the exception becomes the rule and the experience quickly turns into a nightmare. Usually there are signs preceding this event, like diminished responsiveness. Screening the Logs for errors may prevent a frozen hub. But if it happens, I strongly suggest reaching out for support. Plugging the Hubitat Elevation hub into a smart outlet is not a solution.

ritchierich · September 11, 2018, 8:39pm

Completely agree. I should have clarified, because above there was conversation about rebooting periodically/proactively to prevent lock-ups. That is why it would be nice if a scheduled reboot could be set up within Hubitat itself, like I can do in other computer systems.

bravenel · September 11, 2018, 8:44pm

This is an anti-solution, and not a good idea. The better idea is to find and correct the problem. If you are having to reboot for any reason to make your hub work, you have a problem that needs diagnosis. Contact support; don't just keep on rebooting as a fix. It's not a fix at all.

We have customer hubs that have run continuously without updates or reboots for over a year. This is to be expected.

ritchierich · September 11, 2018, 8:44pm

Years ago I invested in a Digital Loggers Web Power Switch. It's expensive but worth it, in my opinion. I have all my network closet gear plugged into it, and I can power cycle everything with the click of a button: it will shut them all off and turn each back on at a defined interval. It can even ping a site and automatically restart certain outlets, such as the modem and router.

There was even a DTH for it in ST land that I haven't ported over yet. You can buy them on Amazon and it appears like they may have a new "pro" version too.

vjv · September 11, 2018, 8:45pm

Haha, you said it: without updates! I just want to be funny, Bruce, nothing personal.

Thanks

ritchierich · September 11, 2018, 8:46pm

I am close to one of these and fortunately not experiencing issues. But as others have said here in the forum, reboots have been necessary, so it's just a suggestion. Take it or leave it.

csteele · September 11, 2018, 8:47pm

I want to believe that the Hub doesn't need a periodic reboot. I've been wrong before... I can remember a time 3 years ago... ...

The ONLY time I've rebooted the Hubitat Hub (and I also have 2 of them) is during Upgrades. My Hub Events log is nothing but code upgrade events. By that I only mean, there's no inherent ... oh Darn!! I see Bruce has replied and said it better! That guy!!

bravenel · September 11, 2018, 8:48pm

OK, those "others" aren't following the suggested best practice. Follow their advice or ours, up to you.

doug · September 11, 2018, 9:07pm

I have to say that's not a fair statement. I'm following Hubitat's advice to the letter, to the point where my hub is no longer as capable as I need it to be. But thus far no solution has been provided to the frozen hub. I have no choice, daily, but to pull the plug and reboot. I'd love that to not be the case, which is why I've been in contact with support, long before posting here.

Without a solution coming out of my contact with support, I need to look at what other options there may be so that I can find the problem myself. Absent access to the logs you guys have through the backdoor, I have no other recourse. I'm admittedly no expert in this area, but things can't continue as they have.