HE behaving strangely, not "working"

Interesting. I rejoined the PC Controller to the network, let it simmer for a while, and checked: now the "Hub not talking to things" problem is even worse. But that's not really concerning; it's still working (and indeed the network seems fine!).

First thing I did: I air-gapped all the switches in the house.

I pulled the airgap switch, left it off for ~5 seconds, pushed it back in, waited the second or two for it to turn on, resume whatever state it was in, and let the light settle, then waited another 5 seconds or more before doing the next one. Probably too quickly, but everything seemed OK, and I really didn't want to spend an hour doing this slowly. :slight_smile:

Mostly I went from the inside out, starting near the hub and working toward the farthest devices. I wasn't super worried about precision, but that was the general order.

Now I'm running a repair, and it's going FAR faster than it did previously. 5 minutes after starting, it has already done 15 of the 60, whereas before it was taking several hours (3+) to complete. Of course, it may hang up on a few later, so I'm not sure the whole run will finish faster, just that it's moving far more quickly than I've ever seen it move before. And individual nodes seem to be completing faster.

Is that some sort of routine maintenance? Should you air-gap everything one at a time and let it come back up once a year, or a couple of times a year? Or maybe just once, after you've got everything mostly settled into place, having sporadically and in no particular order added 60 devices willy-nilly?

I mean, it doesn't feel like an illegitimate practice, if you know what I mean.

Anyway, I will report later today on how that worked. (As it stands, 20 minutes after starting the repair we're already halfway done. If it keeps that up, it'll finish in about 40 minutes, at least 4.5x (call it 5x) faster than the 3+ hours it used to take. So I think that's a very good sign, and an indicator that a device or three were confused.)
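A quick sanity check of that projection, as a minimal sketch assuming every node takes about the same time to repair (the caveat about hanging on a few later nodes means it may not). The function names are hypothetical; the inputs are the numbers from this post, with "3+ hours" taken as 180 minutes, so the speedup is a lower bound.

```python
def projected_total_minutes(elapsed_min: float, nodes_done: int, nodes_total: int) -> float:
    """Linear projection: assumes remaining nodes take the same average time as those done so far."""
    return elapsed_min * nodes_total / nodes_done


def speedup(old_total_min: float, new_total_min: float) -> float:
    """How many times faster the new run is than the old one."""
    return old_total_min / new_total_min


# 15 of 60 nodes after 5 minutes, then 30 of 60 after 20 minutes.
early = projected_total_minutes(5, 15, 60)    # 20.0 minutes
midway = projected_total_minutes(20, 30, 60)  # 40.0 minutes
print(f"early projection: {early:.0f} min")
# "3+ hours" taken as 180 minutes, so this is a floor on the speedup.
print(f"midway projection: {midway:.0f} min, speedup >= {speedup(180, midway):.1f}x")
```

The early 5-minute rate alone would have projected 20 minutes, a hint that per-node time isn't constant; the 25-minute runs reported later in the thread work out to about 7x against 180 minutes.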

Unfortunately, the lack of a "Repair is done" entry means "Repair failed". I didn't understand this until very recently. :frowning:

A successful repair of a node looks like this:

sys:1 2020-07-23 21:22:11.399 trace Z-Wave Node 202:  Repair is done.
sys:1 2020-07-23 21:22:11.381 trace Z-Wave Node 202:  Repair is requesting node neighbor info
sys:1 2020-07-23 21:22:11.377 trace Z-Wave Node 202:  Repair is adding return route
sys:1 2020-07-23 21:22:11.373 trace Z-Wave Node 202:  Repair is deleting routes
sys:1 2020-07-23 21:22:09.661 trace Z-Wave Node 202: Repair is requesting device associations
sys:1 2020-07-23 21:22:09.641 trace Z-Wave Node 202: Repair is updating neighbors
sys:1 2020-07-23 21:22:09.622 trace Z-Wave Node 202: Repair is updating neighbors
sys:1 2020-07-23 21:22:09.602 trace Z-Wave Node 202: Repair is updating neighbors
sys:1 2020-07-23 21:22:04.552 trace Z-Wave Node 202: Repair is updating neighbors
sys:1 2020-07-23 21:22:02.469 trace Z-Wave Node 202: Repair setting SUC route
sys:1 2020-07-23 21:22:02.426 trace Z-Wave Node 202: Repair pinging
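Since the only success signal is the presence of "Repair is done", checking a long log by hand gets tedious with many devices. Here is a minimal sketch, assuming you've copied the repair log lines into a list of strings and that each line contains "Z-Wave Node <number>:" followed by the message, as in the sample above. The function name is hypothetical, not part of any Hubitat API.

```python
import re
from collections import defaultdict

# Finds "Z-Wave Node <n>: <message>" anywhere in a line (format assumed from the sample log).
NODE_LINE = re.compile(r"Z-Wave Node (\d+):\s*(.*?)\s*$")


def unfinished_nodes(log_lines):
    """Return node numbers that logged repair steps but never 'Repair is done'."""
    messages = defaultdict(list)
    for line in log_lines:
        m = NODE_LINE.search(line)
        if m and m.group(2).startswith("Repair"):
            messages[int(m.group(1))].append(m.group(2))
    return sorted(
        node for node, msgs in messages.items()
        if not any(msg.startswith("Repair is done") for msg in msgs)
    )


if __name__ == "__main__":
    sample = [
        "sys:1 2020-07-23 21:22:11.399 trace Z-Wave Node 202:  Repair is done.",
        "sys:1 2020-07-23 21:22:02.426 trace Z-Wave Node 202: Repair pinging",
        # Node 17 is a made-up example of a node that started but never finished.
        "sys:1 2020-07-23 21:23:40.100 trace Z-Wave Node 17: Repair pinging",
    ]
    print(unfinished_nodes(sample))  # [17]
```

Note that a node which never logged anything at all won't appear here; comparing the result against your full device list would catch those.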

Yes, it would be nice to have a "failed" entry in the log, especially if you have a large number of devices.

This should be much better with the C-7, which offers a richer Z-Wave properties page. Really looking forward to that.

Did you update the firmware on any of your devices? I was thinking it was just your hub. Some devices may have to be excluded and then re-included after an update.

Again, I'm afraid I don't have a solution, but I have another observation:

ZWave repair is initiated by the Hub sending a message to all devices, asking that they determine their neighbors and then return that info to the hub.

Let's think about what can go wrong with that simple idea. :smiley:

First, the hub can send all it wants; it's the receiving that counts. Any ZWave device that didn't hear the request will sit there and do nothing. Each device will still respond to its neighbors, so Device X may not have heard the Hub, and therefore will NOT try to get a neighbor list itself, but it will answer its neighbors. In practice, that means any device hidden behind Device X never heard the initial request and isn't in range of any neighbor that did.

The ZWave Repair completes and Device X is now reachable, but nothing hidden behind it is... which gets solved by yet another ZWave Repair.

I bumbled onto this years back when I grabbed a copy of the ZWave Repair logs and sorted them, looking only at the final message. Lo and behold, there were devices missing. I did another Repair and started scratching my head, because the second time more devices were in the list.
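One way to picture why repeated repairs keep picking up more devices is a deliberately simplified toy model: on each pass, every device in radio range of a device that was already reachable before the pass becomes reachable, so a device N hops out needs N passes. This is an illustration of the description above, not how the Z-Wave protocol actually runs a repair; the layout and function name are hypothetical.

```python
def repair_pass_reached(links, hub="hub"):
    """Toy model: each repair pass extends reach by one radio hop.

    links maps each node to the nodes it can hear. Returns {node: pass number
    on which it first became reachable}; unreachable nodes are absent.
    """
    reached = {hub: 0}
    frontier = [hub]
    pass_no = 0
    while frontier:
        pass_no += 1
        next_frontier = []
        for node in frontier:
            for neighbor in links.get(node, []):
                if neighbor not in reached:
                    reached[neighbor] = pass_no
                    next_frontier.append(neighbor)
        frontier = next_frontier
    del reached[hub]
    return reached


# Hypothetical layout: A and B hear the hub, X only hears A, Y is hidden behind X.
links = {
    "hub": ["A", "B"],
    "A": ["hub", "X"],
    "B": ["hub"],
    "X": ["A", "Y"],
    "Y": ["X"],
}
print(repair_pass_reached(links))  # {'A': 1, 'B': 1, 'X': 2, 'Y': 3}
```

In this picture, looking only at the final message of each run (as described above) would show Y missing after the first and second passes and present after the third.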

Firmware: @erktrek
I did update some firmware.

Every time I've had to exclude/include, it's broken all the automations associated with #thing. I haven't even found a way to automatically (or semi-manually) give #re-included-thing the same name it had before. Is there a way around that?

What would your process be if you wanted to update the firmware on 50 devices? Would you really exclude them all, update the firmware, re-include them, and rebuild the entire set of automations?

Down the road, I'll have enough set up that I won't want to mess with recreating and reassigning various things. Hence trying to sort out this process now. I hope not to need to update device firmware often, but when I do, I want to be able to do it without too much disruption. E.g. I don't mind having to exclude/include one in some cases, but I'd like to keep that to a minimum, or zero if possible.

To that end, I'm hoping the newly jiggered and fiddled-with network, which apparently works 5x faster (for no reason I can actually figure out), will let updates work via the Firmware Updater app.

Repairs: @csteele and @dennypage
I did check: there were no skipped devices in this Repair, and none gave anything other than what appears to be normal repair information in the logs. I wish these logs could be copied automatically to another system (syslog, FTP on a schedule, or even an easy way to pull them regularly via API or webhook).

Anyway, I can totally run several more repairs, spaced a few hours apart, and will kick another off right now. Since they only take 25 minutes now, this is fine (better than 3 hours!).

See this post. Same concept, just with one device.

That it says "Repair setting SUC route" multiple times for a given node, and does not say "Repair is done" for that node, is an indication that the node was not successfully repaired.
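That rule is easy to apply mechanically. A minimal sketch, assuming log lines shaped like the sample earlier in the thread ("Z-Wave Node <number>: <message>"); the function name and labels are hypothetical. It marks a node as done if "Repair is done" appears, as likely failed if "Repair setting SUC route" repeats without a "done", and otherwise just notes the missing "done" entry.

```python
import re
from collections import Counter

# Finds "Z-Wave Node <n>: <message>" anywhere in a line (format assumed from the sample log).
NODE_LINE = re.compile(r"Z-Wave Node (\d+):\s*(.*?)\s*$")


def classify_repairs(log_lines):
    """Label each node seen in the log by how its repair ended."""
    seen, done = set(), set()
    suc_routes = Counter()  # times each node logged "Repair setting SUC route"
    for line in log_lines:
        m = NODE_LINE.search(line)
        if not m:
            continue
        node, msg = int(m.group(1)), m.group(2)
        seen.add(node)
        if msg.startswith("Repair is done"):
            done.add(node)
        elif msg.startswith("Repair setting SUC route"):
            suc_routes[node] += 1
    labels = {}
    for node in sorted(seen):
        if node in done:
            labels[node] = "done"
        elif suc_routes[node] > 1:
            labels[node] = "failed (SUC route retried)"
        else:
            labels[node] = "no 'done' entry"
    return labels
```

Feeding it a copied log and printing only the entries not labeled "done" gives the short list of nodes worth checking individually.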


Who thought up that bizarre state of affairs? :slight_smile:
In any case, only 5 didn't complete then, and none of them are ones I'd consider critical (i.e. none would be a choke point for reaching farther devices). I'll check on those specifically.

Wow.

I mean, thanks. And sure, that's better than breaking all your automations.

But still, wow.

I'm in the process of doing that for 115 devices...

I feel for you.