Significant delays between physical events and recorded log events

Approximately 2-3 month ago a noticeable delays (up to 10sec) between PHYSICAL event such as Button Press or Motion Sensor Trigger started to pop up more and more often. Frankly this was happening before but not to very often and barely noticeable. Let me make this very clear - nothing is wrong with the rules themself because as soon as Trigger Event is logged the rest of the rule executes very fast and exactly as expected. The delay is between PHYSICAL Event and a recorded Trigger Log Event. The only way to prove this - is to record video with ability to measure time between frames. I am not going to go that far. This problem may happen in two cases:

  • Most probable place is a problem(s) with a Zigbee and/or ZWave Mesh
    (to my eyes both meshes seems to be in a good condition, but of course I cannot be 100% sure);
  • Another probable place - a delays in the Event Processor
    (I have no idea if any type of logs can show/reveal this);

And to make things even more exited - if say, Button is pressed more then once the correspondent rule also will be executed multiple times. So, it looks like PHISICAL events are somewhat queued. Unfortunately this behavior brings WAF down (otherwise WAF is very high). My wife (and myself as well) is slowly learning not to press buttons many times if nothing happens on a first attempt. Now she knows the command will/may go through. Just be more tolerable.

Any ideas how to attack this problem are very welcome.

If it's a device with a driver that has open code then adding a log.info ("Data received") to the first line of the parse() function will generate a log entry and you will be able to tell which side has the latency, the network or the hub.

Thank you for the idea but I have only few out of too many devices with custom drivers.
The majority (if not all) of the observed delays are coming from the Buttons and Motion Sensors.
All these devices are using a built-in drivers.

It for sure sounds like a mesh issue. You could use this tool to get a sortable view: [BETA] A Z-Wave Mesh Tool [C7 and 2.2.4+ Only]

Then look at the RTT values. If the delay is in the mesh I would expect to see a high RTT on the problem devices.

2 Likes

You would only have to do one sample device to test with. Its only a troubleshooting technique.

1 Like

I am using a bit different app which collects the same data:

What exactly RTT is and what number is considered problematic?

Is this 0xB9 device is a problem?

This is one is GE/Jusco Dimmer/MS combo.
I have 3 of these installed 4+ years ago and they been problematic from day # 1.
The MS reported "active" reliably and usually fails to report "inactive". I had to create a special
"refresh rule" for all these 3 devices. Unfortunately there is no (at least I did not find) a replacement for these toys (wall mounted Dimmer/MS combo). Replacing them with 2 devices is not a desirable option unless this is absolutely a must.

And from the other side, few Zigbee MS sensors (also installed 4+ years ago) exhibit the same problems which is relatively new. Of course both meshes could be problematic. Because I am in a big apartment complex 2.4GHz band is very crowded and absolutely uncontrolled (to...o many WiFi routers and Bluetooth Devices around). But surprisingly my Bluetooth Devices and Zigbee mesh works very well (I think).

I 100%+ agree but devices with custom drivers are pretty much stable.

Yes it looks like B9 is struggling. 24 route changes also.

How often are you refreshing them? The higher RTT might just be from all the refreshing.

1 Like

Once per minute and ony when ms is active. But these devices and associated rules are 4 years old. My dream is to replace them but there is no one.

If you think it is the hub platform you could roll back to 2.3.6 or 2.3.5?

I don't think it is related to the hub platform and/or hub hw (upgraded to C8Pro). Occasionally this was happening in a past but somehow less pronounced. I was blaming on a potential Zigbee interference because mainly Zigbee MSs exhibit this problem in a past. Now problem mainly related to the ZWave Buttons and became more annoying. Also I cannot imagine how problems with whatever mesh can explain a command queing effect.

I have seen it before but not as long as 10 minutes. Could be a couple of things, the message bounces around for a while before getting to the hub, the message fails and the device tries it again multiple times. Or lastly, maybe if the z-wave radio on the hub is busy it will cache incoming messages before sending it over to the hub (this is just a guess).

You could try disabling the polling you have setup temporarily to see if that makes any difference. Honestly the 1 minute interval should not really be a problem, but maybe the mesh is already stressed and that is pushing it over the edge.

If you want to post the full z-wave mesh details I could look to see if there is anything else that catches my eye. You can use the system page or that custom app to get the screenshot.

I have several of the Jasco motion dimmers. I agree that their signal strength is not the best. Over time they've become better performers. I've done several things to my setup over time, so I can't tell what step has made the largest impact. Only that they can be made to perform well.

A few things to note/try:

  • The community drivers are IMO better than the built in ones. Not likely a solution, but you could add logging code.
  • The devices have a firmware update available on Jasco's GitHub site.
  • I've moved from C-7 to C-8 and now C-8 Pro. The 8 series are definitely better for mesh performance
  • I've added a few other Z-Wave devices, including 700 and 800 series chipsets over time. Zooz's dedicated range extender (ZAC38) is 800 series. It might be worth a try.

Interesting, mesh is actually capable to some sort of queuing messages.

1min polling for the GE Dimmer/MS devices active only when device reported "active" state.
These devices almost never faild to report "active" but usually stuck in this state until refreshed.

OK, here is an entire Z-Wave Mesh details (3 screenshots):

I did not like these devices from the very beginning. I am planning to replace them with Zooz devices (ZEN12 or so) as soon as they become available.
Thank you for the hints. I will try to update firmware and use different drivers. Who knows, mabe this will improve these devices performance.

How long had it been since hub was rebooted there? I see the two combo devices seem to be the worst with route changes which suggests they are either struggling a little or very chatty. Sometimes a very chatty devices fires out so much stuff that route changes are inevitable. It also greatly depends on the hub uptime as to how bad those stats are, it may be perfectly fine if hub has been up for 24+ hours already.

I really like how configurable the Jasco devices are. The community driver lets you easily change between Occupancy/Vacancy/Manual modes.

Last reboot was about 2 days ago when I updated both hubs (C7 and C8Pro) to the latest platform.

Could you please point to the device ids?
Are they this GE Dimmer/MS devices?

0x62 and 0xB9
The number of route changes is not horrible for 2 days of uptime. Those looks like the most active devices as well. Someone else said they have a weak antenna so that might be causing the route changes.

What are the node numbers or names of the button devices you have been having issues with?

Oh yes, these are two GE combos. I will try to update firmware and use a different driver as suggested by @bill.d But frankly I really like to replace them with something else.

These are 2 multi-button remotes from different manufacture. Device IDs are 0x54 and 0xAF
Both have a button programmed to open/close a balcony doors and used by my wife relatively often. Of course for her it is very annoying when button does nothing right away but after few seconds doors are moving. Now she waits for the action but before she was trying to push button multiple times. Finally actions was activated and door were moving few times back and force. I heard many "good words" about this automation but she really likes it a lot.

Also I am planning to replace first 3 devices (0x04, 0x16 and 0x1b). These are nice looking Aeotech dimmers with touch panels but usually the communication speed is down to 9.6kbps. Surprisingly they are a repeaters for multiple devices.