Thermostat scheduler "missing out" thermostats

Hi,
I'm setting up a new Hubitat system with 19 z-wave TRV thermostats. I've noticed a recurring theme where Thermostat Scheduler will update some, but not all of the thermostats on schedule, i.e. some seem to be missed out. If I go into the app and click "Set Scheduled Temperatures" it will then usually update the thermostats correctly.

Looking at the logs, I can see a lot of entries "skipped cmd:ApplicationBusy" which I believe are related to this. I get the impression that when there's a lot of updates such as when there's a hub mode change, it simply skips certain devices if it's busy.

Is the Thermostat Scheduler not closed loop, i.e. if a thermostat is meant to have a certain set point, will it not be monitoring that and resending the z-wave command? Or checking the z-wave status report after a command has been sent? Am I missing something here?

This isn't a one-off, I've experienced this multiple times whilst setting up and testing the system over the last few days.

Thermostat Scheduler screen:

Thermostat device before I click "Set Scheduled Temperatures":

Thermostat device after I click "Set Scheduled Temperatures":

What platform and what update are you using? I have had problems with 2.3.4.x versions with my thermostat, so I rolled back to 2.3.3.140. You may need to roll back and try that.

I found the bit in the logs where the Scheduler app sent the set point command:

Thanks, I'm on 2.3.4.134

Have you encountered the same issues with "skipped cmd:ApplicationBusy" in the logs?

I did not have debug logging enabled, so I did not see those.

So I think I've found a bug here... The background is that I switched from Fibaro HC3 to Hubitat on Friday, after many issues with Fibaro that I won't go into here. I've got 19 Fibaro z-wave thermostats on radiators, so I guess that's quite a busy network. I'm getting intermittent issues with radiators not switching from OFF to AUTO or not getting the correct set point.

I came down to a cold kitchen this morning, the radiator should have switched from OFF to AUTO and had a set point of 17.0 applied at 07.00. I've captured the relevant logs and it's clear that Thermostat Scheduler did not issue the Auto() command. The first Auto() is at 07.29 when I issued it manually. The other radiator in the same room did get the Auto() command and correct set point.

@bravenel hope you don't mind me tagging you here as you've answered questions in my other threads. Please let me know if there's a better way to get support.

The log screenshot:

The Thermostat Scheduler settings:

To diagnose this it is necessary to separate what the app is doing from what the thermostats themselves are doing. So if it's an app failure, we can deal with that issue. But, if it's a Z-Wave mesh issue, that's a different set of issues. Please post a screenshot of the logs just from the app itself, not the devices. This is how we can see if the app is sending the commands that it should or not.

Thanks for the reply. It happened again this morning, attached logs from Thermostat Scheduler. It issued the Thermostat Mode off at 22.00 correctly, then at 07.00 it's issued the Heating set point but not put the thermostats back to Auto. At 8.00 it's gone into Away mode, at 8.46 I changed modes and it's correctly issued the mode change at that point.

So it does seem to be not sending the mode changes out reliably.

OK, I will look into this issue.

The way it works is that the app sets the thermostat mode only if the thermostat is not already in the desired mode. So the question becomes just what mode it was in at 7:00. The app logs show that it was set to off at 22:00, but what we don't know is what the app saw in the thermostat at 7:00. I'm thinking it shouldn't bother with checking the mode, and should just send the command irrespective.
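To make the difference concrete, here is a minimal Python sketch of a guarded versus an unguarded mode command. It is not the Thermostat Scheduler code (which isn't shown in this thread); `FakeThermostat`, `set_mode_guarded` and `set_mode_unguarded` are hypothetical names, and the stale cached state is an assumed scenario, not something the logs here have confirmed.

```python
class FakeThermostat:
    """Hypothetical stand-in for a TRV; not a Hubitat API."""

    def __init__(self, mode):
        self.actual_mode = mode    # what the device is really doing
        self.reported_mode = mode  # what the hub last recorded for it
        self.commands_sent = []

    def send_mode(self, mode):
        # Assume the command is delivered and the report comes back.
        self.commands_sent.append(mode)
        self.actual_mode = mode
        self.reported_mode = mode


def set_mode_guarded(trv, desired):
    """Send only when the hub's recorded mode differs from the desired one."""
    if trv.reported_mode != desired:
        trv.send_mode(desired)


def set_mode_unguarded(trv, desired):
    """Always send, regardless of what the hub has recorded."""
    trv.send_mode(desired)


# Assumed scenario: the device is off, but the hub's record says auto.
trv = FakeThermostat("off")
trv.reported_mode = "auto"

set_mode_guarded(trv, "auto")
print(trv.actual_mode)    # off: the guard skipped the command

set_mode_unguarded(trv, "auto")
print(trv.actual_mode)    # auto
```

If the hub's record and the device ever disagree, the guarded version never sends the command, while the unguarded version costs one extra message per schedule change.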

None of our apps are closed loop; they are all fire-and-forget. As a general principle, rechecking that a device responded to a command is a losing proposition. It's not possible to discriminate between a completely non-responsive device and one that dropped a single command. Pounding more commands out could damage the mesh further, without solving the problem. So, generally, the hub has to assume that the mesh is functioning properly. Examining logs can reveal when this is not the case.

Let's try one more logging thing if you will. First of all, turn off debug logging in the driver, as this is just noise for our purpose that makes it more difficult to see what's going on. Then show the logs for both the app and the device, so we can see cause and effect together.

Meanwhile, for the next release we will remove the test of current state, and send the setThermostatMode irrespective. This might solve your problem.

Thanks, I'll update the logging as you request and try to catch this again.

I understand what you're saying about closed-loop, my background (years ago) was in control engineering so to me closed-loop is a given, but I suppose you're right that most domestic products won't work that way.

I think your point about sending commands irrespective of current state is a good one - if it's not closed-loop then I suppose the app's record of current state could be incorrect for many reasons. I think you can have closed-loop with state, or open-loop with no state, but open-loop with state doesn't make sense to me particularly in a world full of RFI.

Obviously there can be problems with this. But, mesh networks are somewhat fragile, especially Z-Wave. There is an objective of not putting more traffic on a mesh than absolutely necessary, so often this means looking at state. Consider the case of motion activated lighting. A motion sensor may bang away with active/inactive for the duration that someone is in the area, but the lights are on until there is no motion for a couple of minutes. If every active event results in an on command being sent to devices that are already on, that's a lot of traffic hitting the mesh unnecessarily. So we try to use state tests to control that. The implication is a resulting need to be able to override that, and force the commands irrespective.
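The motion lighting case can be put in numbers with a small Python sketch. This is an illustration only, not how Room Lights or any Hubitat app is implemented; `on_commands_sent` and the event list are made up for the example.

```python
def on_commands_sent(events, use_state_test):
    """Count 'on' commands that would hit the mesh for a run of motion events."""
    light_on = False
    sent = 0
    for event in events:
        if event != "active":
            continue  # inactive events don't turn anything on
        if use_state_test and light_on:
            continue  # already on: skip the redundant command
        sent += 1
        light_on = True
    return sent


# Ten active/inactive pairs while someone is moving around the room.
events = ["active", "inactive"] * 10

print(on_commands_sent(events, use_state_test=False))  # 10
print(on_commands_sent(events, use_state_test=True))   # 1
```

With the state test, one command goes out instead of ten, but the result is only right if the recorded state is right, hence the need for an override.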

When there is a healthy mesh, open-loop with state is the right way to go. When there is not a healthy mesh, the home automation system is going to be flaky. Flaky automation is worse than no automation by a long shot. So --> fix the mesh. This is what has to happen for a reliable system.

In the case of thermostat mode setting, this is a very low frequency event, so just sending the command won't have much impact.

I like your reasoning. Although I can't help thinking that's a partial argument for cautious closed loop: one command, one test that the command has been actioned, if it failed then one retry, then log an error. Equally I can see that a failed command could be just as likely to fail the second time. Perhaps there's a hybrid where state is updated on a lazy basis.
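For what it's worth, the cautious closed loop described above could be sketched like this in Python. Everything here is hypothetical (`send_with_one_retry`, `DropsFirstCommand`, the `send` and `read_back` callables); it is not something Hubitat provides. It also ignores the hard part, which is how long to wait for a report before reading the state back.

```python
import logging

log = logging.getLogger("closed_loop_sketch")


def send_with_one_retry(send, read_back, desired, max_attempts=2):
    """One command, one check, one retry, then log an error and give up."""
    for _ in range(max_attempts):
        send(desired)
        # A real system would have to wait for the device's report here.
        if read_back() == desired:
            return True
    log.error("device did not reach %r after %d attempts", desired, max_attempts)
    return False


class DropsFirstCommand:
    """Hypothetical device that loses the first command it is sent."""

    def __init__(self):
        self.mode = "off"
        self.calls = 0

    def send(self, mode):
        self.calls += 1
        if self.calls > 1:
            self.mode = mode

    def read_back(self):
        return self.mode


dev = DropsFirstCommand()
print(send_with_one_retry(dev.send, dev.read_back, "auto"))  # True
print(dev.calls)                                              # 2
```

The retry count is capped so that a device that is completely unresponsive costs two commands and one log entry, rather than an endless stream of traffic.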

Probably different strategies work pragmatically for different situations, which is what you're suggesting in this case.

Is there already a software queue with some time delays to prevent radio congestion? I think there's a pattern here where the failures occur at change of hub mode. I've got 19 z-wave radiator thermostats and there's a lot going on in the logs at each mode change. When I manually change a setpoint or mode the thermostat generally responds immediately.

I jumped ship from Fibaro Home Center because of numerous problems like this, made worse by not being able to diagnose and troubleshoot them: the controller gave little information and there was no community resource like this one. I think you're entirely right that flaky automation is worse than no automation.

I'm beginning to slightly regret going down the z-wave route, I chose it because I wanted separate and lower frequencies than WiFi, and the things I read suggested that a decently set up mesh is quite stable. I've got a lot of mains powered devices that should be acting as repeaters too. I've just bought a second Hubitat so I can split the mesh up a bit.

Checking each command and retrying is not practical. Device response times vary widely. So when do you do the check, and how are you supposed to know when to do it? Room Lights does provide a way with debug logging to see the device responses, for diagnostic purposes. But the key strategy is to fix the underlying issue, not pound away at it with retries.

Radio congestion at the controller is not the issue. Of course the radios have queues. But, the mesh is not instantaneous, and there can be issues with propagation through the mesh. Additional issues are devices that don't respond, either because (a) the device itself is flaky, or (b) its mesh connection is flaky.

Which begs the question of what sort of mesh do you really have? If you are having problems you may need to reinforce it with repeaters. Also, if you have problems with many commands at a certain time as you mention, why not stagger the times a little? If you used Rule Machine instead of Thermostat Scheduler for thermostat mode setting, you could introduce time staggering down to a few seconds. From afar, it's not possible to diagnose your situation.
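As a rough idea of what staggering means in practice, here is a Python sketch that spreads 19 mode commands a few seconds apart. This is plain Python to show the timing, not Rule Machine syntax; `staggered_schedule`, the device names, the date and the 5 second gap are all assumptions picked for the example.

```python
from datetime import datetime, timedelta


def staggered_schedule(device_names, start, gap_seconds=5):
    """Give each device its own send time, gap_seconds after the previous one."""
    return [
        (name, start + timedelta(seconds=i * gap_seconds))
        for i, name in enumerate(device_names)
    ]


# 19 hypothetical radiator thermostats, all due to change at 07:00.
trvs = [f"TRV {n}" for n in range(1, 20)]
plan = staggered_schedule(trvs, datetime(2023, 1, 1, 7, 0, 0))

print(plan[0][1].strftime("%H:%M:%S"))   # 07:00:00
print(plan[-1][1].strftime("%H:%M:%S"))  # 07:01:30
```

With a 5 second gap the last of the 19 thermostats gets its command 90 seconds after the first, which is unlikely to matter for heating.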

Having said all of that, we think our radio / event / command architecture is solid, even if it doesn't fit with your experience from closed loop systems.

Release 2.3.4.139 is out which has the unguarded setThermostatMode feature.

Thank you for the rapid update release, it's installing as I type.

It's helpful to understand your thinking with regards to the architecture. I've no doubt that what you describe is the normal approach to domestic control systems.

I do already have four repeaters (z-wave sockets, three of which don't have anything to control) plus three hardwired light switches that should be repeating too. I haven't put my second hub in yet, so maybe I will find it a bit more stable with that in place.

I did think about using Rule Machine, but currently my setup is based around Thermostat Scheduler with mode changes for away from home and working from home. It would be a lot of rules to capture all of that, but I may give that a try if I'm still having issues.
