Aeotec Smart Switch 7 with 2.2.4 driver question

I've played a bit with an in-driver command queue. First impressions are pretty promising. First of all, I was able to remove most of the command delays, which makes device-driver interaction more responsive, and the command loss rate is still lower than before. I'll test it some more and apply the changes to the SS7 driver as well.

The idea is pretty simple, but the hub's language makes the implementation harder. It's kind of weird that in late 2021 I can't declare custom classes inside driver code to make it more readable and better organized.

So the queue takes all commands and dispatches them one by one. If the device returns an Application Status "device is busy" reply, the command is resent after either a fixed delay or the wait time returned by the device, if available. There is still a timeout that gives the device some time to respond in such cases, but it is smaller than before: 100 ms. Actuator-style commands are wrapped in Supervision encapsulation. They are awaited for up to 60 s with 500 ms retries. Both values are configurable. It is also possible to go beyond that and re-encapsulate important commands with a new session ID if there is no response after 60 s of waiting. That might help with some Z-Wave locks that cause trouble for some people.
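To make that flow concrete, here is a minimal sketch in Python (the hub itself runs Groovy, so this only models the logic). It runs on a simulated millisecond clock; the status names, `run_queue`, `QueueConfig`, and the `send` callback are hypothetical stand-ins, not hub APIs. It dispatches one command at a time, retries "busy" replies after the device-reported wait time or a fixed 100 ms, and resends supervised commands every 500 ms until a confirmation arrives or 60 s have passed. Re-encapsulation with a new session ID is not modeled.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical outcomes of one send attempt, as seen by the driver.
BUSY = "busy"          # Application Status: "device is busy"
SUCCESS = "success"    # Supervision Report confirming the command
NO_REPLY = "no_reply"  # nothing came back before the next retry


@dataclass
class Command:
    name: str
    supervised: bool = False  # actuator-style commands get Supervision


@dataclass
class QueueConfig:
    busy_retry_ms: int = 100         # used when the device reports no wait time
    supervision_retry_ms: int = 500  # resend interval for supervised commands
    timeout_ms: int = 60_000         # give up on a command after this long


SendFn = Callable[[Command, int], Tuple[str, Optional[int]]]


def run_queue(commands: List[Command], send: SendFn,
              cfg: Optional[QueueConfig] = None) -> List[Tuple[int, str, str]]:
    """Dispatch commands strictly one at a time on a simulated clock.

    send(cmd, attempt) returns (status, wait_ms), where wait_ms is the
    device-reported busy wait time already converted to ms, or None.
    Returns a log of (time_ms, command_name, event) tuples.
    """
    cfg = cfg or QueueConfig()
    pending = deque(commands)
    now = 0
    log: List[Tuple[int, str, str]] = []
    while pending:
        cmd = pending.popleft()
        started = now
        attempt = 0
        while True:
            status, wait_ms = send(cmd, attempt)
            attempt += 1
            # Confirmed, or a plain command the device did not reject as busy.
            if status == SUCCESS or (status != BUSY and not cmd.supervised):
                log.append((now, cmd.name, "done"))
                break
            if status == BUSY:
                delay = wait_ms if wait_ms is not None else cfg.busy_retry_ms
            else:
                delay = cfg.supervision_retry_ms
            if now + delay - started > cfg.timeout_ms:
                log.append((now, cmd.name, "timeout"))
                break
            now += delay
            log.append((now, cmd.name,
                        "retry-busy" if status == BUSY else "retry-supervision"))
    return log
```

In this sketch, unsupervised commands count as done as soon as the device does not report being busy; only supervised ones wait for a confirmation.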

In general, such logic could actually be implemented one or more layers closer to the hardware, as it is a pretty generic thing. A scripted implementation doesn't look like the best way to provide it. I bet the hub has some internal Z-Wave scheduling, but it doesn't seem to make use of such CCs when they are available (in contrast to the S2 encapsulation function).

Not being a programmer at all, I take it that this is all a matter of implementation? :thinking:
But thanks for the insights.

Would SiLabs allow this? :wink:

Another question regarding the Fibaro drivers: I am still using the one without the polling option.
Is there something like a "chatty" component within these drivers? I feel that, since I started using them, some "lights on when motion detected" rules are occasionally executed with some delay.
But maybe it's just another temporary Z-Wave indisposition.

I don't know myself :stuck_out_tongue: I just recently started with Groovy/Java and devices. I have a C++ programming background and an electronics engineering diploma )). I'm just gluing it all together while playing with the devices I have.

From what I can see, Hubitat drivers are (mostly) just proxy scripts that encode/decode device messages. So being 'chatty' is more about the device itself: whether it initiates messages to the hub when events happen.

For example, the SS7 sends a message on its own, without any polling, when you physically press its button. The same goes for its timed/threshold reports.
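A minimal sketch of that "proxy" role, assuming a device pushes an unsolicited Switch Binary Report (command class 0x25, Report command 0x03) when its button is pressed. The `decode` function and the event dict shape are hypothetical illustrations, not Hubitat's API; the point is only that the driver translates whatever the device decides to send.

```python
from typing import Dict, Optional

# Z-Wave Switch Binary command class and its Report command.
SWITCH_BINARY = 0x25
SWITCH_BINARY_REPORT = 0x03


def decode(frame: bytes) -> Optional[Dict[str, str]]:
    """Turn a raw [cc, command, value] report into a hub-style event dict."""
    if len(frame) < 3:
        return None
    cc, cmd, value = frame[0], frame[1], frame[2]
    if cc == SWITCH_BINARY and cmd == SWITCH_BINARY_REPORT:
        if value == 0x00:
            return {"name": "switch", "value": "off"}
        if value == 0xFF:
            return {"name": "switch", "value": "on"}
    return None  # unknown frame or value: nothing to report
```

A quiet device simply never produces such frames, so a driver like this has nothing to forward unless something polls it.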

In contrast, the Fibaro TRV is pretty silent. From the docs, I know it checks its sensor every 10 minutes, but it does not send any updates about it. The only messages it initiates are about a lack of heating medium and an open window. Battery low/empty events should also be sent, but I haven't been able to check/confirm that yet. So I added a polling option. It is not enabled all the time; it's a configurable option (on/off and polling interval). In fact, polling can be organized with RM, but I prefer to use RM for high-level logic, and to me polling a device is closer to micromanagement (just a matter of taste).
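A small sketch of what an on/off plus interval polling preference boils down to. `PollingPrefs` and `poll_times` are hypothetical names, and the default interval is made up. The clamp to 10 minutes is my own assumption: if the TRV only re-reads its sensor every 10 minutes, polling faster would likely just return the same value.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

SENSOR_CHECK_MINUTES = 10  # sensor check interval quoted from the Fibaro doc


@dataclass
class PollingPrefs:
    enabled: bool = False       # hypothetical "on/off" preference
    interval_minutes: int = 30  # hypothetical "polling interval" preference


def poll_times(start: datetime, prefs: PollingPrefs, count: int) -> List[datetime]:
    """Return the next `count` poll times, or an empty list when polling is off."""
    if not prefs.enabled:
        return []
    # Assumption: polling faster than the device re-reads its sensor gains nothing.
    minutes = max(prefs.interval_minutes, SENSOR_CHECK_MINUTES)
    step = timedelta(minutes=minutes)
    return [start + step * (i + 1) for i in range(count)]
```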

And I'm about to add an 'experimental' SS7 driver with an internal command queue.

I see. But it sometimes feels like the network got hammered, just like when all TRVs are triggered at once.
Which reminds me: you wrote about queuing commands if a device fails to confirm. Without meaning to interfere with your work, are you aware that each time you trigger a FLiRS device, a broadcast is sent and all FLiRS devices wake up within 1 s?
This is what used to hammer my network for about 30 s. I worked around the problem by adding a delay of at least 7 s between commands.
Sorry if you already knew; I just thought it could be worth mentioning.
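The workaround of spacing commands at least 7 s apart can be sketched as a simple scheduler that pushes each request back until the gap since the previous send is respected. `schedule_with_min_gap` is a hypothetical helper working in seconds on a simulated timeline, not something from the drivers.

```python
from typing import Iterable, List, Tuple


def schedule_with_min_gap(requests: Iterable[Tuple[float, str]],
                          min_gap_s: float = 7.0) -> List[Tuple[float, str]]:
    """Given (requested_time_s, device) pairs, return actual send times so that
    consecutive sends are at least min_gap_s apart and never earlier than requested."""
    out: List[Tuple[float, str]] = []
    last = None
    for t, dev in sorted(requests):
        send_at = t if last is None else max(t, last + min_gap_s)
        out.append((send_at, dev))
        last = send_at
    return out
```

For example, three TRVs triggered at the same moment would go out at 0 s, 7 s, and 14 s, while a request made later than the backlog is not delayed at all.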

I'll stand by for testing if you want.. :wink:

No, I didn't actually know about the wake-up broadcast behavior of FLiRS devices, and I can't really test that at the moment, as I have only one in use. But it seems a bit odd to me that it hammers the network. Once again, I don't know all the details yet, so my next thoughts are mostly assumptions based on general principles.

There are two types of RF communicators (devices): those that work in half-duplex mode and those that work in full-duplex mode. It simply means whether each device can either receive or transmit, or do both at the same time. In both cases, a device communicates with only one other device at a time. (Wi-Fi MIMO is a somewhat different thing, using multiple antennas to carry several spatial streams at once.)
The lowest-level protocol usually enforces the rules for communication channel acquisition and device negotiation. Higher-level protocols work on already established logical connections (which are actually packets sent within negotiated device-to-device sessions).
From this perspective, multiple concurrent transmissions from awakened devices sound more like some sort of issue in the channel negotiation logic. But I might be wrong.
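As a generic illustration of what "channel acquisition rules" can look like, here is a slotted random-backoff simulation: every device that wakes up picks a random slot, a slot with exactly one talker succeeds, and colliding devices double their backoff window. This is a textbook binary exponential backoff sketch under my own assumptions, not Z-Wave's actual MAC layer; `simulate_backoff` and its parameters are hypothetical.

```python
import random
from typing import Dict


def simulate_backoff(n_devices: int, initial_window: int = 4,
                     max_window: int = 64, seed: int = 1) -> Dict[int, int]:
    """Return device -> slot in which its transmission got through."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    window = {d: initial_window for d in range(n_devices)}
    pending = {d: rng.randrange(initial_window) for d in range(n_devices)}
    delivered: Dict[int, int] = {}
    slot = 0
    while pending:
        talkers = [d for d, s in pending.items() if s == slot]
        if len(talkers) == 1:
            # Exactly one device on air: its frame is received.
            delivered[talkers[0]] = slot
            del pending[talkers[0]]
        elif len(talkers) > 1:
            # Collision: each talker backs off into a doubled window.
            for d in talkers:
                window[d] = min(window[d] * 2, max_window)
                pending[d] = slot + 1 + rng.randrange(window[d])
        slot += 1
    return delivered
```

If such rules work, devices that wake simultaneously end up serialized (each delivery lands in its own slot) rather than colliding indefinitely, at the cost of extra delay.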

All in all I have 40+ Z-Wave devices, so my mesh needs to be handled with care. :wink:
One FLiRS won't hurt, but I can reproduce the effect when triggering only the 3 TRVs in my living room.
Triggering all 7 TRVs I have will seriously upset the hub.
You'll find some evidence of this in the forums, e.g.


Eurotronic thermostats hang my hub - #55 by Arek

Interesting. I can only tell from my observations that sometimes, when I trigger my living-room devices at once, all indicators light up almost simultaneously, but that's only on a good day; mostly, one of them gets stuck.

If there's one thing I learned from this forum, it's that there's a lot that's special about this protocol. :wink:

Hmm... I've seen some of the mentioned lags even with my single device, even in manual control mode. The experimental driver tries to ensure actuator command delivery with a 60 s timeout. I've seen cases where the device took up to 40 s to wake up. I didn't apply supervision to status requests, only to commands (on/off, setHeatingPoint).

The device log will show whether the driver had to retry some supervised commands, or whether a command timed out without confirmation.

The device state will show the contents of the driver command queue (but you need to refresh the driver page to see updates).

It would be interesting to see whether it helps with multiple devices.

This is a lot. What are your other Z-Wave devices?
Sounds like a mesh issue to me, or maybe you have some ghost devices?
If you find a mains-powered device on the Z-Wave details page that shows no routes, it needs to be excluded, if possible, and paired again.

You mean I should test the experimental Fibaro driver?

"Should" is way too strong wording)))) But you can try)

I'm keeping an eye on my Z-Wave network, and it looks fine. The long wake-up case is rather rare, but I've seen that it can happen. I see my devices change routes from time to time (with no obvious need or reason), and that made me suspect that something happening while a route is changing could be the reason for such a delay.

Just gathering statistics and looking for possible causes)


I can tell you that this behaviour is normal.
If you're interested in going deeper into Z-Wave, I recommend having a look at the SiLabs docs.

Now for my testing:
I created a rule that is supposed to trigger my living-room TRVs at once.
Installed the drivers, rebooted the hub.
The result was always the same: 2 out of 3 TRVs worked, and one failed without writing any logs.
Also, this very device could not be controlled by any means; neither entering the setpoint manually nor refreshing it worked.
Tried all that several times.

Reverting to your "standard" drivers fixed the problem. Weird.

Edit: I just remembered that I also had an issue with long response times, and all the affected devices were SS7s included with S2 security. Look here.

Did you have a chance to check whether the device queue was empty or had some commands queued?
Like this:

I have an issue where commands sometimes get stuck in the queue, and I can't figure out why yet. While I suspect a possible multithreading issue, I'm still looking for an error in my code.

Both seemed to work; this is what the logs said for each of them:


These logs say that two control commands were accepted by the target device and removed from the queue as acknowledged.

Ah, sorry, I didn't look at that. This is the faulty device; the queues for the other 2 are empty:


Arhhh... The queue got stuck.

You can press "Flush all queued commands" to give it a kick, and the queued commands will start executing.

I need to somehow track down what goes wrong...

I have a feeling that UI interactions are handled in single-threaded mode, while schedules fire action events from other threads.

And they did, which also caused the device to go nuts. :grin:


After some testing and logging, I'm pretty sure the 'experimental' command-queue-based drivers are facing thread conflict issues. With a small amount of scheduled data, one thread might not see that another thread is pushing something to the queue. With a large amount of scheduled data, threads don't see that the command queue is being consumed, which leads to an infinite command send/execution loop.

The conflict is between the cron event callback and runIn/runInMillis calls (the latter can also conflict with itself, because the scheduled event runs on a different thread).

I found mentions on the forum of a few thread-safe collection types. But exposing multithreaded behavior in a driver instance without providing a full synchronization API is (to put it mildly) a weird solution. I would expect either a full MT API stack or no multithreading at all (in the programming model for drivers and apps).

By my last point, I mean that driver instances may run in different threads, but I don't see a need to run a single driver instance's methods from different threads at the same time. Each driver instance could have its own event queue under the hood, filled from different threads and executed by a single worker thread. This way, all drivers would be protected from MT issues.
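The model proposed here (many producers, one worker per driver instance) can be sketched in Python with a thread-safe `queue.Queue` drained by a single thread. `SerialExecutor` and `DriverInstance` are hypothetical names; this only illustrates the idea, it is not how the hub is built.

```python
import queue
import threading
from typing import Callable, List, Optional


class SerialExecutor:
    """Accepts work from any thread; runs it one item at a time on one worker."""

    def __init__(self) -> None:
        self._tasks: "queue.Queue[Optional[Callable[[], None]]]" = queue.Queue()
        self.errors: List[Exception] = []
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, fn: Callable[[], None]) -> None:
        self._tasks.put(fn)  # queue.Queue is itself thread-safe

    def shutdown(self) -> None:
        self._tasks.put(None)  # sentinel: stop after everything already queued
        self._worker.join()

    def _run(self) -> None:
        while True:
            fn = self._tasks.get()
            if fn is None:
                return
            try:
                fn()
            except Exception as exc:  # keep the worker alive on handler errors
                self.errors.append(exc)


class DriverInstance:
    """Hypothetical driver whose state is only ever touched by its worker."""

    def __init__(self) -> None:
        self.executor = SerialExecutor()
        self.handled = 0        # no lock needed: only the worker thread writes it
        self.last_event = ""

    def on_event(self, name: str) -> None:
        # Safe to call from UI handlers, schedule callbacks, etc.
        self.executor.submit(lambda: self._handle(name))

    def _handle(self, name: str) -> None:
        self.handled = self.handled + 1  # read-modify-write, serialized by design
        self.last_event = name
```

The design choice is that callers never touch driver state directly; they only enqueue work, so ordinary fields behave as if the driver were single-threaded.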

It's not about whether a particular use case will be hit or not. It's about a loaded rifle that will fire sooner or later. Basically, it poses a serious risk if you try to use the hub in a critical system/environment.

To fix my experimental implementation, I see 2 ways. The simpler way is to use 'pauseExecution' instead of postponing. But the documentation doesn't mention whether the call is blocking, i.e. whether it blocks only the driver script and allows a context switch so other scripts can run, or blocks the execution thread completely. The more complex way is to use the provided concurrent collections.

I made fixes for multithreaded command queue access. It's a pity that I have to use a global variable; I would prefer to have some shared-access driver field for this purpose, similar to 'state'.
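A Python analogue of that kind of fix, under my own assumptions: a module-level ("global") registry of per-device command queues, with one lock making every check-and-take step atomic so that a producer and a consumer on different threads can't miss each other's changes. `enqueue`, `pop_next`, and `snapshot` are hypothetical names, not the hub's API or the actual driver code.

```python
import threading
from collections import deque
from typing import Deque, Dict, List, Optional

# Module-level ("global") state shared by every thread running driver code.
_lock = threading.Lock()
_queues: Dict[str, Deque[str]] = {}


def enqueue(device_id: str, command: str) -> None:
    with _lock:
        _queues.setdefault(device_id, deque()).append(command)


def pop_next(device_id: str) -> Optional[str]:
    """Atomically take the head command, or None if the queue is empty."""
    with _lock:
        q = _queues.get(device_id)
        return q.popleft() if q else None


def snapshot(device_id: str) -> List[str]:
    """Copy of the queue contents, e.g. for showing it in the device state."""
    with _lock:
        return list(_queues.get(device_id, ()))
```

Because emptiness check and removal happen under the same lock, two threads can never both take (and send) the same command.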

Testing experimental implementations for stability

I started investigating the Schedule CC to implement temporary manual override functionality. It seems the hub has an issue parsing Schedule CC replies.

Requesting:
ScheduleSupported

HUB parsing result:
ScheduleSupportedReport(fallbackSupport:null, numberOfSupportedCc:null, numberOfSupportedScheduleId:null, overrideSupport:null, startTimeSupport:null, supportEnabledisable:null, supportedOverrideTypes:null)

The device replies with:
FD 0A 02 43 01 40 01 80
FD -> numberOfSupportedScheduleId: 253, same as in the manual
0A -> supportEnabledisable: false, fallbackSupport: false, startTimeSupport: [hour/minute, weekdays]
02 -> numberOfSupportedCc: 2
43 01 -> thermostat setpoint: set
40 01 -> thermostat mode: set
80 -> overrideSupport: true, supportedOverrideTypes:[]
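For comparison with the hub's all-null result, here is a small decoder for that payload. The byte layout is taken from the breakdown above (number of schedule IDs; a flags byte with enable/disable in bit 7, fallback in bit 6 and start-time support in bits 5..0; the CC count; CC/command pairs; an override byte with support in bit 7 and types in bits 6..0) and should be checked against the Z-Wave Schedule CC specification. `parse_schedule_supported_report` is a hypothetical helper; start-time support and override types are returned as raw bit positions rather than names.

```python
from typing import Dict, List, Tuple


def parse_schedule_supported_report(payload: bytes) -> Dict[str, object]:
    """Decode the report payload (bytes after the CC and command IDs)."""
    if len(payload) < 4:
        raise ValueError("payload too short")
    num_ids, flags, num_cc = payload[0], payload[1], payload[2]
    expected = 3 + 2 * num_cc + 1  # header, CC/command pairs, override byte
    if len(payload) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(payload)}")
    ccs: List[Tuple[int, int]] = []
    for i in range(num_cc):
        cc = payload[3 + 2 * i]
        cmd = payload[4 + 2 * i] & 0x03  # low 2 bits: supported command
        ccs.append((cc, cmd))
    start_bits = flags & 0x3F
    override = payload[-1]
    return {
        "numberOfSupportedScheduleId": num_ids,
        "supportEnabledisable": bool(flags & 0x80),
        "fallbackSupport": bool(flags & 0x40),
        "startTimeSupportBits": [b for b in range(6) if start_bits & (1 << b)],
        "numberOfSupportedCc": num_cc,
        "supportedCcs": ccs,
        "overrideSupport": bool(override & 0x80),
        "supportedOverrideTypes": [b for b in range(7) if override & (1 << b)],
    }
```

Fed `bytes.fromhex("FD0A024301400180")`, it yields 253 schedule IDs, start-time bits 1 and 3 (which the breakdown reads as hour/minute and weekdays), the CC pairs (0x43, 1) and (0x40, 1), and override support with no override types, matching the manual reading rather than the nulls.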

I see you had quite a busy time. :slight_smile:

You mean for security or stability-reasons?

Hm, maybe a dumb question, but one of my TRVs, the one with firmware 4.6, occasionally drives its valve fully open.
While in this state, the device won't accept any commands, either locally or via the driver; I always have to re-calibrate it to make it work again.
With the prior version of your drivers, it did this every night, which was annoying. But it looks like I've now found a pattern: it always seems to go nuts at about 03:20 in the morning.
The hub's database cleanup runs at 03:15 every day, do you think there could be a connection between these 2 events?