What is best practice for setting triggers to account for sensors that fall off the network?

So I've been using Hubitat for a good year now. I have a lot of automations around the house dialed in, so there are lots of things turning off and on based on motion sensors and contact sensors.

The only times I see issues now and have to go back into Hubitat to figure out what's going on are when one of the dozens of sensors either has a dead battery or has just fallen off the network for whatever reason. (I've now set up Device Watchdog to help me identify dead sensors, but that's not what I'm after.)

For example, I have three ZigBee motion sensors in my kitchen at various locations. These are aggregated via the built-in "Zone Motion Control" app, and I use the output of that in Rule Machine to trigger various automations. However, I recently found that one of these three sensors had lost connectivity to the Hubitat, the last status it reported was "motion active", and the zone motion stayed "active" forever too. I guess zone motion adds another layer of complexity, but the same thing holds true even with a single sensor.

So what I'm wondering now is whether there's a best way of setting up triggers so Rule Machine can check how long it's been since a sensor last reported any status change, so that a dead sensor that last reported "active" two weeks ago doesn't keep triggering automations.

Here are a couple of apps that do what I think you are looking for:

Simple Idle Alerts
Device Activity Check


I agree with dennypage; I think an activity check would be the easiest here. If a device hasn't checked in within X hours, consider it "dead".
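
For anyone curious what that check looks like under the hood, here's a rough Groovy sketch of the idea. It leans on the device wrapper's getLastActivity() method; the app name, inputs, and hourly schedule are made up for illustration, and the community apps linked above are far more complete:

```groovy
definition(
    name: "Stale Sensor Check (sketch)",
    namespace: "example",
    author: "example",
    description: "Warn when a device has not reported within X hours",
    category: "Convenience",
    iconUrl: "", iconX2Url: "")

preferences {
    section("Devices to watch") {
        input "sensors", "capability.sensor", title: "Sensors", multiple: true
        input "maxHours", "number", title: "Hours of silence before a device is considered stale", defaultValue: 24
    }
}

def installed() { initialize() }
def updated()  { initialize() }

def initialize() {
    unschedule()
    runEvery1Hour("checkActivity")   // re-check periodically
}

def checkActivity() {
    long cutoff = now() - ((maxHours as long) * 60L * 60L * 1000L)
    sensors.each { dev ->
        Date last = dev.getLastActivity()   // hub's record of the device's last event; null if it has never reported
        if (last == null || last.time < cutoff) {
            log.warn "${dev.displayName} has not reported since ${last ?: 'ever'} -- treating it as dead"
        }
    }
}
```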

Community-based battery watchdog apps also exist; alternatively, you can use the built-in HSM app, which can track battery levels.


I would say the point of your question is literally how to "fail"... "safe", and it's an often overlooked aspect because the use cases are usually "not so critical". But I think that's an opportunity for improvement, and possibly for averting disaster, as people continue to rely on their HA/HE environments in ways that may not be life and death... but are otherwise consequential when they don't work right.

Running independent apps to "tell you" when a device is absent is only really effective if you create variables that those apps set, and then check those variables within your own app or rule, independently of the triggering.
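
One hedged sketch of how that wiring could look: recent hub firmware exposes hub variables to apps through setGlobalVar()/getGlobalVar(), so a watchdog-style app could publish its verdict into a hub variable (the name "KitchenMotionHealthy" and the maxHours/sensors inputs below are invented), and your rule would then use that variable as a required expression or condition:

```groovy
// Inside a watchdog-style app: after checking each sensor's last activity,
// publish a single boolean that rules can read.
def publishHealth() {
    long cutoff = now() - ((maxHours as long) * 60L * 60L * 1000L)
    boolean allAlive = sensors.every { dev ->
        Date last = dev.getLastActivity()
        last != null && last.time >= cutoff
    }
    // "KitchenMotionHealthy" is a hub variable you create yourself; a rule
    // can use it as a required expression so it simply skips its actions
    // while any sensor is out of touch.
    setGlobalVar("KitchenMotionHealthy", allAlive)
}
```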

To me this begs for a better solution... either within your rule construct or, even better, with some enhancement to the RM toolkit to help (consider it hand-holding, prevention). Something like: how should this rule behave if the triggering device(s) hasn't been heard from in X amount of time ("and I'll constantly check that for you, so don't worry").

Just puttin this out there, at the risk of being BLASTED away by one or more sorties of superior logic or otherwise.


Here I am! :grin: Not really. I am pretty dumb when it comes to many aspects of this.

The downfall to this, in my opinion, is the overhead you would have to process every time the rule triggered. It would have to figure out whether it was appropriate to run the rule every time you wanted to do something. That would surely slow the rule down, if not the whole hub, by processing all these extra checks.

And like most things, it is hard to prove a negative. How would the hub determine whether you have a sleepy sensor or an active one? There isn't machine learning (at least not in this hub) to figure that out on its own.

Maybe someday this can happen, but with the current state of technology I am not sure it can.

Yeah, I hear ya.

Maybe there are two categories/capabilities here.

a) Centrally maintain a best guess on the presence/health/stability of the installed device base, configurable per device, such that if you don't expect it to be reporting for days you can adjust for that, but if you know it should be reporting (AT LEAST temperature) FREQUENTLY then you can adjust expectations to that.

b) Expose the information from (a) for use in Rules at the user's discretion, so they can handle the exception of a device being "out of touch for longer than expected". (A rough sketch of both ideas follows.)
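
As a very loose sketch of what (a) and (b) might look like in app code, again assuming getLastActivity() is available (the device labels, intervals, and the state-map approach are all invented for illustration):

```groovy
// Per-device expectations: how long each device may stay silent before it is
// considered unhealthy (values are illustrative).
Map getExpectedHours() {
    [
        "Kitchen Motion 1"  : 4,     // chatty sensor, reports temperature frequently
        "Attic Leak Sensor" : 168    // only expected to check in about weekly
    ]
}

// Idea (a): a low-priority scheduled job maintains a best-guess health table.
def healthCheck() {
    Map health = [:]
    settings.watched?.each { dev ->
        Integer hours = expectedHours[dev.displayName] ?: 24
        long cutoff = now() - (hours * 60L * 60L * 1000L)
        Date last = dev.getLastActivity()
        health[dev.displayName] = (last != null && last.time >= cutoff)
    }
    // Idea (b): surface the table so rules (or other apps) can consult it,
    // e.g. via state, hub variables, or a virtual "health" device.
    state.deviceHealth = health
}
```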

Taking a lesson from the auto industry, one approach to efficient and (mostly) reliable problem detection is OBD (on-board diagnostics). It creates a table of all sensors and variables with upper and lower expected ranges, and durations that each may be allowed to venture out of those ranges before a warning message is produced. It uses simple counters and comparisons and is executed in lower-priority (non-realtime) tasks. Many other industries have taken this approach and modified it for their unique use cases.
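
To make the analogy concrete, here's a minimal Groovy sketch of that OBD-style table: an expected range per value, plus a tolerance for how many consecutive checks it may sit outside that range before a warning is raised (names and numbers are purely illustrative):

```groovy
// One row per monitored value: expected range and how many consecutive
// out-of-range checks to tolerate before complaining.
class SensorSpec {
    String name
    BigDecimal lo, hi
    int maxBadChecks
    int badCount = 0
}

def specs = [
    new SensorSpec(name: "Kitchen Temp (C)", lo: 5.0, hi: 40.0, maxBadChecks: 3),
    new SensorSpec(name: "Fridge Temp (C)",  lo: 1.0, hi: 8.0,  maxBadChecks: 6)
]

// Called from a low-priority periodic task with the latest reading.
def evaluate(SensorSpec spec, BigDecimal reading) {
    if (reading == null || reading < spec.lo || reading > spec.hi) {
        if (++spec.badCount > spec.maxBadChecks) {
            println "WARNING: ${spec.name} out of expected range too long (last=${reading})"
        }
    } else {
        spec.badCount = 0   // back in range, reset the counter
    }
}
```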

I think the hub event warnings were a first step in this direction. Unfortunately, they don't expose enough of the logic to make them useful for debugging and checking for problems. Hopefully this logic can be improved and expanded so users can make their implementations as resilient as each chooses.

Yes, this is essentially what I was trying to ask. You explained it better than I did :joy:

I personally run both a battery rule in HSM and the "Device Activity Check" smartapp.

If you have battery-powered devices, the battery monitor in HSM can be a useful indicator, but battery % reporting is flaky and not consistent across all devices. It doesn't mean anything on some devices. I then have Device Activity Check set up with a few different groups with different ranges for how often a device reports. Some devices are great with this, with good consistent reporting of, let's say, battery %. Ring Gen 2 contact sensors, for example, report every hour.

The problem with both of these methods is that you are depending on a device having failed to report for some length of time before you find it. Unfortunately, that is just the way it is when you consider sleepy devices.
