I have a rule that notifies me if Z-Wave crashes. It happened last night on my C8-Pro. Unfortunately, the rule was re-triggered every 50 seconds, so by the time I woke up, I had received a lot of notifications. I've worked around this using private variables, so in the future I won't get a notification more than once per hour.
Is this a bug, or does it indicate that the system is caught in a loop of attempting to restart the Z-Wave radio?
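For what it's worth, the once-per-hour throttle can be sketched in plain Python. This is only an illustration of the logic, not Rule Machine itself; the rule's private variable plays the role of `_last_notified` here, and all the names are mine:

```python
import time

THROTTLE_SECONDS = 3600  # send at most one notification per hour
_last_notified = 0.0     # stand-in for the rule's private variable

def maybe_notify(send):
    """Call send() only if the throttle window has elapsed; return whether it fired."""
    global _last_notified
    now = time.time()
    if now - _last_notified >= THROTTLE_SECONDS:
        _last_notified = now
        send()
        return True
    return False
```

In Rule Machine the equivalent is comparing the current time against a private datetime variable at the top of the actions, and updating the variable right before the notify action.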
It might help to see a screenshot of the rule, specifically the trigger, though it is always better to show the whole rule. Depending on how you are determining that Z-Wave has crashed, and on the logic in the actions, either the rule itself could be looping or the hub could be in a loop trying to restart. (Z-Wave logs might be useful, as well as the hub event logs.)
I have already updated the rule to prevent the looping. The screenshot below shows the rule, but with the new statements that prevent looping highlighted in yellow.
I've also attached an extract from the hub logs that shows the crash rule being triggered every 50 seconds.
My experience is that z-wave logs can only be captured if the z-wave log window is open at the time the events occur. Let me know if there is a way to go back and access z-wave logs from "past" events.
I am using the legacy z-wave stack, as the ZwaveJS stack is laggy on my system.
They are both instances of a device that sends an email message every time a z-wave crash occurs. It's the first action in the rule ("Notify Marc Phone..."). The device driver is called "LGK Sendmail V3". I doubt that is the cause since it does not involve any z-wave activity. It is talking to a local email relay over the LAN.
I do not think that the email driver's telnet socket closing is the cause of the problem. In the list below, the oldest event comes first:
At 06:28:56 AM the "z-wave crashed" event arrives, and the email is sent (2 log entries)
At 06:29:04 AM the telnet socket stream is closed for the email message that was sent.
At 06:29:46 AM the next "z-wave crashed" event arrives.
From here that pattern repeats.
The patterns I see are:
The trigger events ("Z-wave crashed") are all about 50 seconds apart.
The timeout of the telnet input stream for talking to the email relay happens about 8 seconds after the email device is invoked.
A bit more important background info:
I use that driver to send emails from other rules and it does not result in a z-wave crash.
I had issues with Z-Wave crashes happening about every 7-14 days before I added the rule to notify me about them. At that time there was no use of the email driver, but the crashes kept coming until I shut down and power-cycled the hub.
My conclusion is that the socket timeout is not the cause of the z-wave crash.
It seems your real issue is Z-Wave crashing, not how many times the hub reports the crash. The repeated notifications are a separate issue in themselves, but they wouldn't exist if Z-Wave weren't crashing in the first place.
There are lots of posts here about getting to the bottom of the crashes, but it is usually a device in your mesh that is causing them. I would focus on finding the source of the crashes instead of tweaking the rule that reports them.
We can't see the logs from before the crash, but they probably wouldn't show much. You really need to monitor your Z-Wave logs to see if a device is spamming the network, or possibly freezing up and causing repeated issues until the mesh crashes. It is hard to say what the cause is until you start looking. Again, look through existing posts about finding the cause of Z-Wave crashes.
I had a similar issue: it was a device that just stopped repeating at times, but apparently other devices still saw it as a valid path to repeat through. Whatever the reason it was causing the crash, replacing the problem device fixed my crash issues. Mine was actually two devices, both Eva Logix in-wall dimmers I had installed at the same time, and strangely both seemed to cause the problem. It started with the switches going unresponsive without crashing the network, and I could get them working again with an air gap. Then the crashes started, so I went straight to the known problem switches (the ones I had to reset every couple of weeks) as the likely cause, and it turned out that was it.
Edit: I failed to mention that when I was looking at Z-Wave logs for those two devices, they went totally silent (no Z-Wave events logged) before the crashes. That confirmed to me that they were the cause, and I assume it was because they stopped repeating while otherwise appearing to be online and just fine.
Agreed, and I am trying to figure out the root cause of the crashes. My reason for creating this thread was an attempt to bring attention to the rapid-fire repeating of the Z-Wave crash events.
Unfortunately, the lack of z-wave log retention makes diagnosing the problem difficult. I have an always-on Raspberry Pi (RPI) -- do you know if there is a way to have the RPI continuously save the z-wave logs?
There is an endpoint listed for Zwave logs in this list, but http://IP/hub/zwaveLogs in a browser just brings you to the web interface page.
You should be able to leave a log page open in a browser until a crash, unless the logs all go away when your Z-Wave crashes? As long as you don't reboot, the logs should still be there, and maybe even after a reboot if you keep tabs open for each device log.
I would look generally for strange activity from any device, such as logs that scroll constantly and never pause even while things are working, to see if something is spamming the mesh before the mesh gets overwhelmed.
I know that I saw no activity from my problem dimmers, where everything else had logs. I can't remember now if I noticed that before a crash or after, because I was leaving logging on in a browser to try and catch something at the time.
Update: The z-wave logs are not saved. Once you close the window, they are gone. When you open the window, you will only see things in the log when something new happens. There isn't anything equivalent to the "Past logs" tab that you get on the system logs.
I just remembered that I do have an always-on windows machine. I'm going to open a browser window and leave it on the z-wave logs page...
Yes - there is a websocket endpoint, and you can write a bash or python script to suck everything down into a file. I have one I could post later if interested; however, that's great grunt work for your favourite LLM. ws://your.hub.ip/zwaveLogsocket
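For the always-on Pi, a minimal Python sketch along these lines should work. It assumes the third-party `websockets` package (`pip install websockets`) and the `ws://your.hub.ip/zwaveLogsocket` endpoint mentioned above; the hub IP and output filename are placeholders:

```python
import asyncio
import datetime

# Assumptions: hub IP and output path are placeholders; the websocket
# endpoint name is the one given in this thread.
HUB_WS_URL = "ws://192.168.1.10/zwaveLogsocket"
LOG_FILE = "zwave-capture.log"

def stamp(line: str) -> str:
    """Prefix a raw log line with a local ISO-8601 timestamp."""
    return f"{datetime.datetime.now().isoformat(timespec='seconds')} {line}"

async def capture() -> None:
    import websockets  # third-party: pip install websockets
    async with websockets.connect(HUB_WS_URL) as ws:
        with open(LOG_FILE, "a") as f:
            async for message in ws:
                f.write(stamp(message) + "\n")
                f.flush()  # flush each line so nothing is lost if the hub goes down

# To start capturing on the Pi: asyncio.run(capture())
```

Run it under systemd or in a tmux session so it survives logouts; since each line is flushed as it arrives, the file stays intact even if the hub or the script dies mid-stream.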
I don’t remember seeing this before. I get an occasional zwaveCrashed event after cloud backups on my "old" C7, but I never saw rapid sequences like that, which implies the Z-Wave stack recovers quickly and then crashes again? What does your Location Events tab in the Logs section look like?
If the hub code loses contact with Zipgateway or ZWave-JS, the backend software gets restarted.. It could be a bug in Zipgateway or the chip itself locking up..
Laggy? .. Did you give time for all the re-interviews to complete?
ZWave-JS has much better diagnostics.. And I find it tends to be more stable than Zipgateway..
Yes, I gave it 48 hours. After another 2-3 days, some of the switches would experience a 2-3 second lag, so I switched back and everything was nearly instantaneous.
This Z-Wave crash is actually nothing new; it happens about once every 10-20 days. I set up a rule to reboot the hub once a week and the problem went away. I recently disabled the weekly reboot, and it looks like the problem is still there.
@bcopeland, I just ran a simple test to demonstrate the lag I am seeing while using ZwaveJS, captured the Z-Wave logs, and had ChatGPT analyze them. I ran this test 18 hours after making the switch to ZwaveJS. Here is the scenario:
I push the top paddle of my ZEN30 (node 41).
This triggers a small rule that turns on 2 light switches (nodes 41 and 97).
An analysis of the Z-Wave logs appears to point to a 580 ms gap between the hub receiving the top-paddle "button" push and the hub sending the command to turn on the lights connected to node 97, and 710 ms for the lights connected to node 41. I believe this is why I am experiencing lag when using ZwaveJS.
There is virtually no perceptible lag when using the legacy z-wave stack. My hub is a C-8 pro running platform version 2.4.3.158.
Can you help me figure out what is causing the lag to increase on ZwaveJS?
Note: I ran the test again 36 hours after making the switch to ZwaveJS and the lag remains.
Thanks for any help you can provide
Marc
Here is the chatgpt analysis of the z-wave logs:
Event timestamps (from the log):
Button (Node 041) CentralScene received by hub: 2025-12-04 01:59:19.094 PM