Simple Rule app response flakiness redux

All,

This is something of a continuation of a prior issue. Starting a new thread as the old one started with a red herring about possible RFI from a whole house fan. Here is that thread for reference:

One of the suggestions there was to do a reset and rebuild the Z-Wave network since it was built ad hoc the first time. This time I built the network outward from the Hub.

Here is the same picture of the physical layout with numbers indicating order of inclusion

(I did have to re-include the Outside Double as it did not load the right driver the first time, but that should not have an impact.)

One thing I note is that the Topology Map is actually worse than before:

old:

New:

Z-Wave Topology_2

This is background. I'll lay out the issue in the next post for some separation and possible ease of reading.

Cheers.

OK, here is what's going on.

Everything works very well and reliably, except the same thing that didn't work well before. The Circulator Light (100 kbps) connection does not reliably turn on when the Circulator (40 kbps) turns on. The Circulator is very reliable, but the indicator light is not.

But this is only from the app rule. When I sit at the hub and turn this on/off, it is rock solid. Did it like 8 or 9 times with no delay and 100% compliance.

The rule is a simple "turn this on when that turns on" (the circulator brand is Grundfos):

with a similar Off rule.

Not sure what I might be missing here.

Cheers.

Sounds like the Grundfos device is not reliably reporting its state. Check the logs for both the app and the device. Sure, you can control Grundfos Light reliably, that isn't the problem. The problem is probably that the rule is not even firing when it should.

Thanks for the suggestion. So, I knew I'd forget to add some details. In the failure cases, the Grundfos does report and the rule does trigger. Overall failure rate for the light is like 30%. The rule firing reliability is well above 90%. So the rule says "yeah, I turned that light on (or off) just like I was supposed to."

I'll also add that we turn on the Grundfos 2 ways: we have a time-based rule for weekday mornings, as well as two different switches that can do it manually. The failures don't seem specific to the triggering mechanism.

One thought I'm having is timing. Maybe the trigger rule should have some delay before trying to turn on the light? Not sure why this might help since by the time the rule triggers the Grundfos is on.

I had also considered that the version of the Simple Automation Rules app in the Hubitat version I'm using (2.2.6.140) might have an issue, but I use similar rules for turning on a parking laser and an interior light (at night) when a garage sensor opens, and this works reliably.

Cheers.

I would certainly suggest that you update to 2.2.9.146. There are lots of bug fixes across the entire platform since 2.2.6. Be sure to download a backup of your system before upgrading, just in case there is some problem.

Thanks. When I had the previous issue (and the system behavior was even less reliable), I did move versions and things seemed to get worse, so I went back. But I have a feeling that that may have been artifactual and the real issue was the network construction.

Do you know of any specific issues that might have been fixed along the way with the Simple Rule machine? I looked a bit and didn't find anything explicit.

Cheers.

It's not just the app, but the platform itself, Z-Wave, Zigbee and drivers. It sounds like a device issue, and there have been considerable improvements to Z-Wave over the past several months.

That does seem a good next step. I will say that the device itself (Leviton Outlet) is very reliable on its own. Also, the same device and driver (Generic Z Wave Driver) is used for the Grundfos and an espresso machine, both of which work close to 100% from Rules.

I will give this a try and see what happens.

Cheers.

OK, did the upgrade and have since used the functionality. Initial indications are that the new release is somewhat worse and more flaky than the prior release, in that the Grundfos responds but is now less reliable in reporting state, and the app is still flaky in triggering the light.

So, that does not seem to be a viable solution to this issue. Still looking for ideas on why the performance of the Hub/App in this particular instance is so unreliable.

Steps to try next? Data to gather to help troubleshoot?

Cheers.

Install Tony Fleisher’s (@tony.fleisher’s) Z-Wave Mesh Tool app, look at signal strength (RSSI), errors, etc.

Interesting suggestion, thanks. I will try that when I have a chance.

Not confident it's a signal strength thing because I can activate this path from the Dashboard very reliably and repeatably, so the connection seems good. But I can look at results for the RSR when the App fails to trigger the light and see if something is happening there.

Cheers.

Since I'm new to Developer App code, I have a newbie question:

I have now installed Tony Fleisher's Z-Wave Mesh Tool app and it is safely in the Apps Code area. What I don't know is how to actually execute this App to be able to get the Web Mesh Report and start to look for clues.

Thanks for pointers.

Since performance seems to have degraded with the newer OS version, seems important to use this App to see what's going on.

Cheers.

Some apps, when installed via HPM, give you the option of going ahead and configuring the app after deployment. If this one doesn't, or you didn't use HPM to deploy, you can go to the apps tab and click "Add User App." It should then appear in your apps list.

Most excellent, thanks. That did the trick (not yet familiar with HPM, but adding it worked).

With this I see some not unexpected stuff:

The Grundfos has a 57ms +/- 80ms RTT. Not sure why this is a direct connection when there are intervening nodes that would likely make a better connection. Signal strength is -10 dB, lowest of the bunch.

It also has 12 route changes (far and away the most), but I don't know the implications of this.

Still, with the older release, this device was actually very reliable, but is now flakier with the new release.

The light (which was the previous problem) is 100 kbps, 2ms +/- 6ms delay, +1 dB, 1 route change. So this seems a very good connection, but it was the item that had issues with the old release. So, it really seems that the issue is not related to the device or device connection, but something about the app interaction.
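As an aside on reading those RTT figures, here is a small Python sketch (not Hubitat code) of how a figure like "57ms +/- 80ms" can arise. It assumes the +/- value is a standard deviation, which I have not confirmed for the Mesh Tool, and the sample values are made up purely for illustration. The point is that a spread larger than the average usually means most round trips are quick with a few very slow ones, rather than a link that is uniformly slow.

```python
import statistics


def summarize_rtt(samples_ms):
    """Return (mean, population standard deviation) for round-trip times in ms."""
    mean = statistics.mean(samples_ms)
    spread = statistics.pstdev(samples_ms)
    return mean, spread


# Hypothetical samples, NOT taken from the thread.
# A steady link: every round trip takes about the same time.
steady = [2, 1, 3, 2, 2, 1, 3, 2]
# A bursty link: mostly quick, with a couple of very slow round trips.
bursty = [10, 12, 9, 11, 250, 10, 12, 142]

steady_mean, steady_spread = summarize_rtt(steady)
bursty_mean, bursty_spread = summarize_rtt(bursty)

print(f"steady: {steady_mean:.0f}ms +/- {steady_spread:.0f}ms")
print(f"bursty: {bursty_mean:.0f}ms +/- {bursty_spread:.0f}ms")

# When the spread exceeds the mean, a few outliers dominate the statistics.
print("bursty link dominated by outliers:", bursty_spread > bursty_mean)
```

If the assumption holds, the Grundfos numbers would suggest occasional long delays (possibly retries or rerouting) on an otherwise usable link, which would fit with the high route-change count.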

Going to give the current release a little more time to see if the Grundfos returns to prior stability before I restore the backup to the older, apparently more stable, version.

Cheers.

Is it possible that turning off "on/off optimization" would cause the on or off command to be sent regardless of the reported state of the zwave device, thus effectively over-riding any status reporting issue?

That's its intent, yes. For sure you should turn it off if you have devices not responding.

Have you tried turning off "on/off optimization"? If it is a problem with the Leviton plug controlling the light returning its status, turning this off might resolve your issue (though you still likely have something funky going on with that device).
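To make the idea concrete, here is a rough Python sketch (not Hubitat code, and not a claim about how the platform implements it) of how an optimization like this could swallow a command. The assumption is that the optimization skips sending "on" when the hub's cached state already says "on". The `Switch` class and `run_action` function are hypothetical names invented for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Switch:
    """Simulated switch. `actual` is the real relay state; `reported` is what the hub believes."""
    actual: str = "off"
    reported: str = "off"
    commands_received: list = field(default_factory=list)

    def send(self, command: str) -> None:
        """Deliver a command to the device and update both states."""
        self.commands_received.append(command)
        self.actual = command
        self.reported = command


def run_action(device: Switch, command: str, optimize: bool) -> bool:
    """Run a rule action. Return True if a command was actually transmitted."""
    # Assumed behavior: with optimization on, skip the command when the
    # hub's cached state already matches the requested state.
    if optimize and device.reported == command:
        return False
    device.send(command)
    return True


# Case 1: the hub's cached state is stale (it thinks the light is already on).
stale_light = Switch(actual="off", reported="on")
sent_with_optimization = run_action(stale_light, "on", optimize=True)
print("optimized:  sent =", sent_with_optimization, "| light is", stale_light.actual)

# Case 2: same stale state, but optimization is off, so the command always goes out.
forced_light = Switch(actual="off", reported="on")
sent_without_optimization = run_action(forced_light, "on", optimize=False)
print("always send: sent =", sent_without_optimization, "| light is", forced_light.actual)
```

In this model the rule "runs" in both cases, but only the second one puts anything on the radio. Whether a stale cached state is what is actually happening here would need to be checked against the device's event log.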

Thanks. Turned it off from all relevant apps after your initial suggestion. We'll see what that does.

It is possible that the device itself has issues. Since I can make it work reliably from the Hub with no fail (and it only has issues when triggered by this app), I find that less likely, but it is definitely possible.

Also wondering if, considering the long delay on the Grundfos, it may make sense to add a short delay (even a second or two) to the light activation? Then again, this shouldn't trigger until the response comes back to the hub, so that should be after the delay.

Also wondered about using a simple Mirror rather than a Simple Rule. I had tried this early on, but it wasn't as reliable as I might have liked. I think that this isn't one of the main line apps as documentation for this is scarce (easy enough to use, but it seems secondary in some sense).

Cheers.

OK, so I wanted to give this some run time to see what happened. Results are that this pretty much seems to have done the trick. Thanks to the expertise of the community for this suggestion!

It's now about 90% or so reliable. Any suggestions to eke out the incremental improvement to make it as solid as the rest of the system (basically about 100%) are certainly welcome.

I do have some questions about all of this, so starting with a recap. Before this change, the circulator would have appeared as the problematic link: Further away, longer latency, much greater jitter, slower data rate and lower signal strength (interestingly, the system has since learned a new path and improved much of this). The light, conversely, had very solid stats.

The behavior, though, was that the circulator was very reliable and the light not so much. The circulator just about always came on, reported that it had come on and triggered the rule for the light. Then the light did nothing.

So, the first question is why this change to remove the Optimization fixed anything. It's not clear to me what might have been optimized away (assumed by the system) that is now explicitly sent. Basically not clear just why this worked.

I saw several comments that there was likely something flaky with the device. Not sure I see a justification for this since the light controller, when triggered directly from the Hub, was 100% effective. I don't see how the device "knows" whether a turn on came from me pushing a button or from the app. Always worked for me; not always for the app.

Also saw a comment that the device might not be reporting status properly. But it seemed to always do that for both devices. Circulator came on, said it was on, told the Hub and triggered the rule. When failing, light did not record the turn on, did not turn on, and said it was not on. So, it seems status was accurate in all cases.

Thanks for any suggestions for incremental gain and also for help in understanding what may have happened here. Of course, getting it working was top priority. Thanks again!

Cheers.

Hi folks,

Just wanted to check back and see if anyone had any thoughts on why this connection may still be somewhat flaky, albeit less so than before, with the change suggested above.

The far away and low signal strength device is quite reliable. The close, higher signal strength device is not. But only in conjunction with the control by the app. Direct control is very solid, so the device is good.

Any thoughts welcome. And thanks again for prior suggestions.

Cheers.