[RELEASE] Device Health Monitor

:new: Device Health Monitor v1.1.0-beta — Now Available!

After several weeks of real-world testing, I'm releasing the beta of Device Health Monitor, a companion app to Battery Monitor 2.0 that monitors how often your devices check in with the hub rather than their battery levels.

What it does: Device Health Monitor learns each device's normal check-in pattern using EWMA (exponentially weighted moving average) baseline learning and flags any device that goes quiet, checks in late, or stops responding. Unlike fixed-threshold apps, it adapts to each device's individual behavior: a motion sensor in a busy hallway and one in a rarely used room are each judged against their own baseline, not a single global setting.
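
For readers unfamiliar with EWMA, here is a minimal Python sketch of a baseline update. The smoothing factor `alpha = 0.3` and the name `update_baseline` are assumptions for illustration; the post does not say which weighting the app uses.

```python
def update_baseline(baseline, interval, alpha=0.3):
    """Blend the newest check-in interval into the learned baseline (EWMA).

    baseline: learned interval in minutes, or None before the first sample
    interval: minutes between the device's two most recent check-ins
    alpha:    hypothetical smoothing factor; larger values react faster
    """
    if baseline is None:
        return float(interval)  # the first sample seeds the baseline
    return alpha * interval + (1 - alpha) * baseline


# A sensor that normally checks in about every 60 minutes, then one late check-in.
baseline = None
for interval in [60, 62, 58, 60, 120]:
    baseline = update_baseline(baseline, interval)

print(round(baseline, 1))  # 77.9
```

One late check-in pulls the baseline only part of the way toward 120 minutes, so a single outlier nudges the learned pattern rather than replacing it.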

Health Ratings:

  • :hourglass_flowing_sand: Pending — learning baseline (need 3 samples)
  • :green_circle: Excellent — checking in within 1.2x of baseline
  • :green_circle: Good — checking in within 2x of baseline
  • :orange_circle: Fair — worth watching
  • :red_circle: Poor — likely a problem
  • :skull: Offline — gone dark
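
The ratings can be read as the time since the last check-in divided by the learned baseline. A minimal Python sketch under that reading: only the 1.2x and 2x limits and the 3-sample Pending rule come from the list above; the Fair/Poor boundary `FAIR_LIMIT = 4.0` is a hypothetical value, and Offline is left out because this release doesn't describe its trigger.

```python
# Only the 1.2x / 2x limits and the 3-sample Pending rule come from the list above;
# FAIR_LIMIT is a hypothetical boundary between Fair and Poor.
EXCELLENT_LIMIT = 1.2
GOOD_LIMIT = 2.0
FAIR_LIMIT = 4.0
MIN_SAMPLES = 3


def rate_device(minutes_since_checkin, baseline_minutes, samples):
    """Rate a device by how overdue it is relative to its own learned baseline."""
    if samples < MIN_SAMPLES or not baseline_minutes:
        return "Pending"  # still learning the baseline
    ratio = minutes_since_checkin / baseline_minutes
    if ratio <= EXCELLENT_LIMIT:
        return "Excellent"
    if ratio <= GOOD_LIMIT:
        return "Good"
    if ratio <= FAIR_LIMIT:
        return "Fair"
    return "Poor"


print(rate_device(70, 60, 5))   # Excellent (about 1.17x baseline)
print(rate_device(100, 60, 5))  # Good (about 1.67x)
print(rate_device(200, 60, 5))  # Fair (about 3.3x, under the assumed 4x limit)
print(rate_device(30, 60, 2))   # Pending (only 2 samples)
```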

Protocol Detection (automatic):

  • :large_blue_circle: Zigbee — directly paired
  • :purple_circle: Z-Wave — directly paired
  • :orange_circle: Matter
  • 🩵 Hub Mesh — linked from another Hubitat hub
  • :thong_sandal: LAN — Hue, Shelly, Ring, Harmony, Roborock, cloud integrations, app-created devices
  • :arrow_double_up: Bluetooth (coming soon, C-8 Pro only)

Key Features:

  • Single device picker — select all your devices from one list
  • EWMA baseline learning — adapts to each device's real behavior
  • Snooze — tap :sleeping: to silence a device from notifications for a configurable duration
  • Mode restriction — optionally suppress notifications during specific hub modes
  • Pushover support with markup tags
  • Very low hub overhead — no event subscriptions, scheduled scans only
  • Automatically detects broken automations and stale Rule Machine hub variable connectors

Real-world result from my own testing: The app identified a faulty Back Porch Motion sensor that was checking in every 14-30 minutes — way more frequently than the identical sensors nearby. Swapped it out and confirmed the hardware was failing. :dart:

Installation:
Option 1 — HPM (Recommended):
Search for Device Health Monitor in Hubitat Package Manager and install directly.

Option 2 — Manual Install:

  1. Go to Apps Code on your Hubitat hub
  2. Click New App
  3. Click Import and paste the URL below
  4. Click Save
  5. Go to Apps → Add User App → select Device Health Monitor

https://raw.githubusercontent.com/jdthomas24/Hubitat-Apps-Drivers/refs/heads/main/Device%20Health%20Monitor/Raw%20Code/DeviceHealthMonitor.groovy


With nearly 24 hours of data, the app does a good job with all device protocols.
I'm using the app to track virtual devices and hub variables to make sure my RM rules are running properly. The EWMA doesn't do a good job for these device types.
I suggest that the app identify these two device types separately from LAN devices. Then, for these two types, average only the check-in times where the Last Seen value is less than the scan interval. That way check-in times are only used when the rules are active.
EDIT: Once those values are selected, apply the EWMA to them.
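
The suggestion (filter first, then EWMA) could look like this minimal Python sketch. The input shape, one `(last_seen_min, checkin_interval_min)` pair per scan, and `alpha = 0.3` are assumptions for illustration.

```python
def filtered_baseline(scans, scan_interval_min, alpha=0.3):
    """EWMA baseline built only from scans where the device fired recently.

    scans: one (last_seen_min, checkin_interval_min) pair per scan, where
           last_seen_min is minutes since the device's last activity and
           checkin_interval_min is the gap between its last two activities.
    Returns None if no scan passed the filter.
    """
    baseline = None
    for last_seen, interval in scans:
        if last_seen >= scan_interval_min:
            continue  # idle window (rule not running): skip so the baseline doesn't stretch
        if baseline is None:
            baseline = float(interval)
        else:
            baseline = alpha * interval + (1 - alpha) * baseline
    return baseline


# Hourly scans of a hub variable connector whose rule fires about every
# 20 minutes during the day and sits idle overnight.
scans = [(5, 20), (12, 20), (300, 600), (8, 20), (420, 720)]
print(filtered_baseline(scans, scan_interval_min=60))            # stays near 20
print(filtered_baseline(scans, scan_interval_min=float("inf")))  # no filter: inflated
```

Without the filter, the two overnight gaps drag the baseline up to roughly 315 minutes, which is exactly the "baseline grows to match inactivity" problem.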

How about an option to let the user sort the report by any of the columns by selecting the desired column header.


More ideas about the different protocols:
Let each protocol have its own scan time. As long as each shorter scan time divides evenly into the longest scan time, this should be relatively easy to implement.
Since the rules for the window shades only run for a few hours a day, I'd like to scan them every 30 minutes.
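
The "divides evenly" idea can be sketched as one timer running at the shortest interval, with each protocol scanned only on every Nth tick. A minimal Python sketch; the protocol-to-interval mapping is hypothetical, and it illustrates the suggestion, not how the app schedules scans.

```python
def protocols_due(tick, intervals_min):
    """Return the protocols to scan on a given tick of the fastest timer.

    tick:          number of base-timer firings so far (0, 1, 2, ...)
    intervals_min: hypothetical per-protocol scan intervals in minutes; each
                   must be a whole multiple of the shortest one.
    """
    base = min(intervals_min.values())
    due = []
    for protocol, interval in intervals_min.items():
        if interval % base:
            raise ValueError(f"{protocol}: {interval} min is not a multiple of {base} min")
        if tick % (interval // base) == 0:
            due.append(protocol)
    return due


intervals = {"Hub Variable": 30, "Zigbee": 60, "LAN": 180}
for tick in range(4):
    print(tick * 30, protocols_due(tick, intervals))
```

Only one schedule is ever registered; the longer intervals simply skip ticks, which is why the even-multiple requirement matters.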

1. Virtual devices and hub variable connectors as separate protocols: They behave fundamentally differently from LAN devices. Virtual devices fire on demand; hub variable connectors only update when a rule runs. Both need different baseline logic.
Adding in today's update.

  • Virtual devices: device.typeName contains "Virtual" or driver name contains "Virtual"
  • Hub variable connectors: driver name contains "Hub Variable" or "Variable Connector"

2. Different baseline logic for those types: The suggestion makes sense. Only sample when Last Seen < scan interval (meaning the rule actually fired recently), then apply the EWMA to those filtered samples. This prevents the baseline from growing to match inactivity.
Adding in today's update.

3. Per-protocol scan intervals: This is a significant rewrite, not a quick add.
Not on the roadmap yet.

4. 30-minute scan option: Will add a "0.5": "Every 30 Minutes" option with cron "0 */30 * * * ?".
Adding in today's update.

5. Sortable columns: Not possible in Hubitat's native app UI. Paragraphs containing HTML tables don't support interactive JavaScript; this would need the OAuth web endpoint approach that Device Activity Check uses.
Not on the roadmap yet.
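
Hubitat's `schedule()` takes Quartz-style cron strings, whose first field is seconds and where `?` means "no specific day-of-week". A minimal Python sketch of mapping the scan-interval setting to such strings; the function name and the set of supported hour values are assumptions, and only the 30-minute expression comes from the post.

```python
def cron_for_interval(hours):
    """Map a scan-interval setting (in hours) to a Quartz cron string.

    Field order: second minute hour day-of-month month day-of-week.
    """
    if hours == 0.5:
        return "0 */30 * * * ?"  # Every 30 Minutes (the new option)
    if hours == 1:
        return "0 0 * * * ?"     # top of every hour
    if float(hours).is_integer() and 1 < hours <= 24 and 24 % int(hours) == 0:
        return f"0 0 */{int(hours)} * * ?"  # evenly spaced through the day
    raise ValueError(f"unsupported scan interval: {hours} h")


for setting in (0.5, 1, 3, 12):
    print(setting, cron_for_interval(setting))
```

Restricting hour values to divisors of 24 keeps the spacing even across midnight: `*/5` in the hour field would fire at 0, 5, 10, 15 and 20, leaving only a 4-hour gap before the next midnight run.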

:satellite: Device Health Monitor v1.2.0 — Now Available! SEE LINK IN OP

Improvements

  • Virtual Device Detection — Virtual switches, virtual sensors, and other app-created virtual devices are now identified as a separate protocol and displayed in a distinct color. Previously these appeared as LAN. Virtual devices fire on demand rather than on a fixed schedule, so their baseline learning now uses filtered sampling — only check-ins that occurred within the scan interval are used to build the baseline. This prevents the learned interval from growing to match periods of inactivity and gives a more accurate picture of whether a virtual device is actually firing when expected.
  • Hub Variable Connector Detection — Hub variable connector devices are now identified as their own protocol with a distinct color. These devices only update when a Rule Machine rule runs, making them excellent canaries for broken automations. Like virtual devices, their baseline now uses filtered sampling so the health rating reflects rule activity rather than time since last fire. A hub variable connector showing Fair, Poor or Offline is a strong signal that a Rule Machine rule has stopped running.
  • 30-Minute Scan Option — A new Every 30 Minutes option has been added to the Device Scan Interval setting. This is useful for devices that fire on short cycles — window shade automations, frequent virtual switch rules, and similar. Note that more frequent scanning increases hub load slightly — use hourly or longer for large device lists.
  • Updated App Guide — The App Guide & Reference page has been updated to document virtual device and hub variable connector detection, the filtered sampling behavior, and protocol color reference.
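
The two detection rules listed earlier in the thread (type or driver name containing "Virtual"; driver name containing "Hub Variable" or "Variable Connector") can be sketched in Python as below. The function name, the case-insensitive matching, and checking the hub-variable rule first are assumptions; the app's own code may differ.

```python
def classify_by_name(type_name, driver_name):
    """Return "Hub Variable", "Virtual", or None (fall back to normal detection)."""
    type_l = (type_name or "").lower()
    driver_l = (driver_name or "").lower()
    # Checked first: a connector is the more specific match.
    if "hub variable" in driver_l or "variable connector" in driver_l:
        return "Hub Variable"
    if "virtual" in type_l or "virtual" in driver_l:
        return "Virtual"
    return None  # let the regular protocol detection (Zigbee, Z-Wave, LAN, ...) decide


print(classify_by_name("Virtual Switch", "Virtual Switch"))                # Virtual
print(classify_by_name("Connector", "Hub Variable Connector"))             # Hub Variable
print(classify_by_name("Generic Zigbee Outlet", "Generic Zigbee Outlet"))  # None
```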

Protocol Colors (updated)

  • Blue: Zigbee
  • Purple: Z-Wave
  • Orange: Matter
  • Cyan: Hub Mesh
  • Teal: LAN
  • Pink: Virtual
  • Yellow: Hub Variable

Do I Need to Reset Device History? You do not need to reset anything. Existing history is preserved. Virtual and hub variable connector devices will be re-identified on the next scan and their baselines will begin using filtered sampling automatically. Devices may briefly show updated protocol labels and return to Pending while the filtered baseline rebuilds — this is expected and will clear within a few scan cycles.

Known Issues

  • Bluetooth devices (C-8 Pro only) — controllerType value unconfirmed. Bluetooth devices will appear as LAN until confirmed. If you have a C-8 Pro and can identify the controllerType value for a Bluetooth device in the hub logs, please post it in the thread.
  • Sortable columns: not currently possible in Hubitat's native app UI. Noted for future consideration if an OAuth web report endpoint is added.

Not in This Release

  • Per-protocol scan intervals — this requires multiple device groups and a full rewrite of the scan engine. Planned for v2.0.

Coming in v2.0

  • Multiple device groups with independent scan intervals and offline thresholds (maybe).

I guess there's nothing you can do to identify the device type for devices joined from another hub via Hub Mesh...

Nice, thank you :slight_smile:


Great update. Significantly decreased the problem children. Thanks!
Here's a thought. A button above each column to do the sort. Perhaps have the button stay highlighted until the next sort or report is done.
Default sort would be the current sorting method.

Edit: Hope this is easier to implement

A closer look found that the hub variables were still identified as LAN, perhaps because they all use variable connectors.

  • Sortable columns: not currently possible in Hubitat's native app UI. Noted for future consideration if an OAuth web report endpoint is added.

Send me more data on the LNK devices; this was a pain point, and I would like to find a workaround.

:satellite: Device Health Monitor v1.2.1 — Now Available! @danabw

Improvements

  • Hub Mesh Sub-Protocol Detection — Hub Mesh linked devices now show their underlying protocol where it can be determined, rather than displaying as plain Hub Mesh. The app uses a layered detection approach: Encoding data values (used by LUMI/Aqara devices), Zigbee cluster data values preserved on the linked device, driver name heuristics, and known manufacturer names (CentraLite, LUMI, IKEA, Sonoff, Tuya, and others). When detected, the device displays in its real protocol color with a sub-label — for example Hub Mesh (Zigbee) in blue or Hub Mesh (Z-Wave) in purple. When the underlying protocol cannot be determined the device continues to show as plain Hub Mesh in cyan.
  • Protocol Override Page — Some Hub Mesh linked devices and LAN devices cannot be auto-detected regardless of the heuristics used — they simply don't carry enough data on the receiving hub. The new :wrench: Protocol Overrides page (accessible from the main page) lists only those devices and lets you set the correct protocol manually from a dropdown. The override always wins over auto-detection. If you set an override and a future update improves detection to catch it automatically, the override still wins until you clear it back to Auto-detect. Overridden devices show a small (override) label in the Activity Summary table so you always know which devices have been manually set.
  • Updated App Guide — The App Guide & Reference page has been updated to document Hub Mesh sub-protocol detection, the layered detection approach, protocol override behavior, and the override-wins rule.
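
A minimal Python sketch of the layered approach together with the override-wins rule. The data keys (`Encoding`, `inClusters`, `outClusters`), the keyword matches, and treating the listed manufacturers as Zigbee are assumptions for illustration, not the app's actual lookup tables.

```python
# Hypothetical lookup: manufacturers assumed here to indicate Zigbee.
ZIGBEE_MANUFACTURERS = {"CentraLite", "LUMI", "IKEA", "Sonoff", "Tuya"}


def hub_mesh_protocol(data, driver_name, manufacturer, override="Auto-detect"):
    """Resolve the label for a Hub Mesh linked device, layer by layer."""
    if override != "Auto-detect":
        return override                                     # a manual override always wins
    if data.get("Encoding"):                                # layer 1: Encoding (LUMI/Aqara)
        return "Hub Mesh (Zigbee)"
    if data.get("inClusters") or data.get("outClusters"):  # layer 2: cluster data
        return "Hub Mesh (Zigbee)"
    name = (driver_name or "").lower()                      # layer 3: driver name heuristics
    if "zigbee" in name:
        return "Hub Mesh (Zigbee)"
    if "z-wave" in name or "zwave" in name:
        return "Hub Mesh (Z-Wave)"
    if manufacturer in ZIGBEE_MANUFACTURERS:                # layer 4: known manufacturers
        return "Hub Mesh (Zigbee)"
    return "Hub Mesh"                                       # nothing conclusive: plain Hub Mesh


print(hub_mesh_protocol({}, "Generic Z-Wave Dimmer", ""))            # Hub Mesh (Z-Wave)
print(hub_mesh_protocol({}, "Generic Z-Wave Dimmer", "", "Zigbee"))  # Zigbee (override wins)
```

Putting the override check before every detection layer is what makes "the override still wins until you clear it back to Auto-detect" hold even if a later layer would have matched.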

Do I Need to Reset Device History? No. Existing history is preserved. Hub Mesh devices will be re-identified on the next scan and will begin displaying their sub-protocol automatically where it can be detected. No reset is needed for any device.

Known Issues

  • Some Hub Mesh linked devices carry no cluster data, no Encoding field, and no protocol keyword in the driver name — these will continue to show as plain Hub Mesh. Use Protocol Overrides to set them manually.
  • Bluetooth devices (C-8 Pro only) — controllerType value unconfirmed. Bluetooth devices will appear as LAN until confirmed. If you have a C-8 Pro and can share the controllerType value from your hub logs, please post it in the thread.

Coming in v2.0

  • Multiple device groups with independent scan intervals and offline thresholds

:satellite: Device Health Monitor v1.2.2 — Now Available!

Bug Fixes

  • False Offline readings fixed — Offline is now triggered exclusively by the configured hour threshold. Previously a skewed baseline could mark a working device as Offline. The ratio check now maxes out at :red_circle: Poor — a device will never show Offline unless it has genuinely gone silent for the configured duration.
  • LAN & Hub Mesh devices require 5 samples — Hue bulbs, Shelly, cloud integrations, and Hub Mesh linked devices now need 5 check-in samples before health scoring begins, up from 3. This prevents false ratings from irregular early intervals.
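
Taken together, these two fixes change the order of the checks. A minimal Python sketch under stated assumptions: the 1.2x and 2x limits come from the original ratings list, `fair_limit` is hypothetical, the 48h default matches the Offline threshold described in this release, and checking Offline before Pending is an assumption.

```python
def health_rating(hours_since_activity, baseline_hours, samples, protocol,
                  offline_after_hours=48, fair_limit=4.0):
    """Health rating following the v1.2.2 rules (fair_limit is hypothetical)."""
    # Offline comes only from the configured hour threshold, never from the ratio.
    if hours_since_activity >= offline_after_hours:
        return "Offline"
    # LAN and Hub Mesh (including sub-labels like "Hub Mesh (Zigbee)") need 5 samples.
    needed = 5 if protocol == "LAN" or protocol.startswith("Hub Mesh") else 3
    if samples < needed or not baseline_hours:
        return "Pending"
    ratio = hours_since_activity / baseline_hours
    if ratio <= 1.2:
        return "Excellent"
    if ratio <= 2.0:
        return "Good"
    if ratio <= fair_limit:
        return "Fair"
    return "Poor"  # the ratio check tops out at Poor


print(health_rating(30, 1, 10, "Zigbee"))  # Poor: 30x baseline, but under 48h
print(health_rating(50, 1, 10, "Zigbee"))  # Offline: past the hour threshold
print(health_rating(1, 1, 4, "LAN"))       # Pending: LAN needs 5 samples
```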

New Features

  • :warning: Low Activity warning — Devices monitored for more than 7 days with fewer than 3 samples show :warning: Low Activity in the Samples column. Normal for infrequently used lights, fans, and switches — informational only.
  • Per-device snooze toggle — Snooze can now be enabled or disabled entirely from Monitoring Settings. When disabled the Manage Snoozed Devices link is hidden and all active snoozes are cleared.

UI Cleanup

  • Monitoring Settings — Scan Interval, Offline after inactivity, Snooze, and Mode Restriction are now in a single collapsible section with a one-line blue summary showing all current values.
  • Offline threshold renamed from "Mark device Offline if no activity for X hours" to Offline after inactivity (hours). Default raised from 24h to 48h.
  • Help & Support — App Guide, Community Thread, and Buy Me a Coffee merged into one section.

Do I Need to Reset? No — existing history is preserved. LAN and Hub Mesh devices with 3-4 samples will briefly return to Pending while collecting the additional samples needed.

Known Issues

  • Devices that are very infrequently used (guest room lights, attic fans, decorative fixtures) may show :warning: Low Activity indefinitely if they are rarely operated. This is expected, since the app can only learn from real activity. If you want to monitor these devices, consider using Rule Machine to periodically poll or trigger them.

:pushpin: Known Limitation — Fans, Lights & Switches Showing False Poor Health

Some of you have noticed manually controlled devices like fans, lights, and switches showing :red_circle: Poor health even when they're working fine. Here's the short version of why and what's coming.

Why it happens: The app learns each device's normal check-in pattern. For a fan used every hour during the day, it builds a ~1h baseline — then when nobody touches it overnight it looks sick. The device is fine, the baseline just can't tell the difference between "not being used" and "broken."
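
The arithmetic is easy to reproduce. A minimal Python sketch with made-up numbers (a hypothetical `alpha = 0.3` and a 9-hour overnight gap), using the same ratio idea behind the ratings:

```python
alpha = 0.3      # hypothetical EWMA smoothing factor
baseline = 1.0   # hours; learned from roughly hourly daytime use

# A few more daytime check-ins keep the baseline close to one hour.
for gap_hours in [1.0, 1.2, 0.8, 1.0]:
    baseline = alpha * gap_hours + (1 - alpha) * baseline

overnight_gap = 9.0  # nobody touches the fan overnight
ratio = overnight_gap / baseline
print(f"baseline {baseline:.2f} h, overnight ratio {ratio:.1f}x")  # baseline 0.99 h, overnight ratio 9.1x
```

A ratio around 9x is far past the 2x Good limit, so the fan reads as unhealthy even though nothing is wrong with it.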

The app works best for: Sensors, Zigbee/Z-Wave mesh devices, LAN integrations, hub variable connectors, and automations. These have predictable check-in patterns the app can learn reliably.

Workarounds for now:

  • Raise Offline after inactivity to 72-168h for irregular devices
  • Use Snooze for devices you know are fine
  • Don't add rarely used devices to the monitored list — the app adds the most value on sensors and integrations

What's coming:

v1.3.0 (Offline Verification): When a device reaches Poor or Offline, the app will send a refresh() command and wait one scan cycle. If it responds, health clears. If it doesn't, the alert fires. False positives on controlled devices should drop significantly.

v2.0 (Active vs Passive Classification): Manually controlled devices will get an "Active" mode with an offline threshold only and no ratio scoring. No more false Poor ratings on fans and lights.
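
One way the planned Active mode could work, as a minimal Python sketch: Active devices are judged only against the offline threshold, while Passive devices keep their ratio-based rating. The `mode` values, the "Not scored" label, and the function name are hypothetical; the 72h default mirrors the threshold adopted in v1.3.0.

```python
def rating_for_mode(mode, hours_since_activity, ratio_rating, offline_after_hours=72):
    """Hypothetical Active/Passive rule.

    mode:         "Active" (manually controlled) or "Passive" (sensors, integrations)
    ratio_rating: the Excellent/Good/Fair/Poor result from baseline ratio scoring
    """
    if hours_since_activity >= offline_after_hours:
        return "Offline"     # both modes still honor the hard threshold
    if mode == "Active":
        return "Not scored"  # hypothetical label: no ratio-based Fair/Poor
    return ratio_rating


print(rating_for_mode("Active", 9, "Poor"))   # Not scored: an idle fan overnight is not flagged
print(rating_for_mode("Passive", 9, "Poor"))  # Poor: sensors keep ratio scoring
```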

Thanks for the patience and feedback — this is exactly what makes the app better. :coffee:


:satellite: Device Health Monitor v1.3.0 — Now Available!

New Feature — Offline Verification

When a device reaches :red_circle: Poor or :skull: Offline health the app now automatically sends a refresh() command to the device and waits one scan cycle before treating the alert as confirmed.

  • Device reaches Poor or Offline during a scan
  • App sends refresh() to the device
  • Activity Summary shows :arrows_counterclockwise: verifying... next to the health rating
  • Next scan — if the device responded, Last Activity updates and health improves naturally
  • If it didn't respond, the problem is confirmed real

This significantly reduces false alerts on LAN integrations, manually controlled devices, and anything with an irregular check-in pattern. Hub load impact is minimal — only Poor and Offline devices are polled, not all devices.

Note: Not all devices support refresh(). Battery-powered Z-Wave devices (non-FLiRS) will not respond. Zigbee, mains-powered Z-Wave, LAN, and Hub Mesh devices generally will.
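
The verify-before-alert flow above can be sketched as a small per-device state machine. A minimal Python sketch; the `DeviceState` fields, the `send_refresh` callback, and the return labels are hypothetical names, and timestamps are plain numbers for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DeviceState:
    last_activity: float                     # timestamp of the device's last activity
    verifying_since: Optional[float] = None  # last_activity when refresh() was sent


def verify_step(state: DeviceState, rating: str, send_refresh: Callable[[], None]) -> str:
    """Run one scan's worth of the verification flow and return what to show."""
    if rating not in ("Poor", "Offline"):
        state.verifying_since = None         # healthy: nothing to verify
        return "ok"
    if state.verifying_since is None:
        send_refresh()                       # first bad scan: poke the device
        state.verifying_since = state.last_activity
        return "verifying"
    if state.last_activity > state.verifying_since:
        state.verifying_since = None         # it answered since the refresh
        return "recovered"
    return "alert"                           # no response after a full cycle


sent = []
dev = DeviceState(last_activity=100.0)
print(verify_step(dev, "Poor", lambda: sent.append("refresh")))  # verifying
print(verify_step(dev, "Poor", lambda: sent.append("refresh")))  # alert
```

Only devices already rated Poor or Offline ever reach `send_refresh`, which is why the extra hub load stays small. A sleepy battery-powered Z-Wave device never updates `last_activity` in response, so it falls through to alert.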

Other Changes

  • Offline threshold default raised to 72h — 24-48h was too aggressive for infrequently used devices. New installs default to 72h. Existing users can adjust in Monitoring Settings.
  • Info page updated — new sections covering Offline Verification and Manually Controlled Devices with guidance on when the app works best.

Do I Need to Reset? No — existing history is preserved and the verification state is added automatically on the next scan.

Coming in v2.0: Active/Passive device classification. Manually controlled devices will get an "Active Only" mode that skips ratio scoring entirely and only alerts on the hard offline threshold.

Once we get to BETA v2.0, it will be released to HPM.


It looks like you are getting the data through Maker API. All hub variables need a variable connector to be available for Maker API. These hub variables are still showing as LAN in the app.
Suggestion: for devices and hub variables that update intermittently, could you save the maximum Last Seen time for each day, then only start looking at the health once the current Last Seen time exceeds the typical one.

It looks like you are getting the data through Maker API. All hub variables need a variable connector to be available for Maker API. These hub variables are still showing as LAN in the app

The app uses the scheduled polling of Hubitat's built-in Last Activity and not Maker API. You can manually assign if they are not picked up correctly.

Suggestion: for devices and hub variables that update intermittently, could you save the maximum Last Seen time for each day, then only start looking at the health once the current Last Seen time exceeds the typical one.

There will be a combination approach released.
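
A minimal Python sketch of the daily-maximum suggestion, for illustration only: the class name, using day labels as keys, and taking the median of the daily maxima as the "typical" value are assumptions, and the app's combination approach may work differently.

```python
from collections import defaultdict
from statistics import median


class DailyMaxTracker:
    """Remember each day's longest Last Seen gap; only evaluate health
    once the current gap exceeds the typical daily maximum."""

    def __init__(self):
        self.daily_max = defaultdict(float)  # day label -> longest gap seen (hours)

    def record(self, day, last_seen_hours):
        self.daily_max[day] = max(self.daily_max[day], last_seen_hours)

    def should_evaluate(self, last_seen_hours):
        if not self.daily_max:
            return False                     # no history yet
        typical = median(self.daily_max.values())
        return last_seen_hours > typical


tracker = DailyMaxTracker()
for day, gap in [("Mon", 2), ("Mon", 9), ("Tue", 8), ("Wed", 10)]:
    tracker.record(day, gap)
print(tracker.should_evaluate(5))   # False: within a normal day's quiet spell
print(tracker.should_evaluate(12))  # True: quieter than a typical day gets
```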

:satellite: Device Health Monitor v1.3.1 — Now Available!

:bug: State persistence fix (critical): Samples were not being saved between scans due to a Groovy/Hubitat state mutation bug. Mutations to nested map collections (data.samples << value) modify the in-memory object, but Hubitat doesn't automatically persist the change. The fix forces an explicit re-assignment after every write. This was the root cause of devices showing 0 samples or "Never" for Last Seen even after multiple scan cycles.
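
The failure mode is easy to reproduce outside Hubitat. A minimal Python analogy, not Hubitat's implementation: the toy `PersistedState` class saves only when a top-level key is assigned, which is the behavior described above for nested `state` maps.

```python
import copy
import json


class PersistedState:
    """Toy store that saves only when a top-level key is assigned
    (an analogy for the behavior described above, not Hubitat's code)."""

    def __init__(self):
        self._saved = "{}"  # what would survive to the next scan
        self._live = {}

    def __getitem__(self, key):
        return self._live[key]

    def __setitem__(self, key, value):
        self._live[key] = value
        self._saved = json.dumps(self._live)  # persist on assignment only

    def reload(self):
        """Simulate the next scan: drop in-memory changes, load what was saved."""
        self._live = json.loads(self._saved)


state = PersistedState()
state["dev1"] = {"samples": []}

# Buggy pattern: mutate the nested list in place; nothing is saved.
state["dev1"]["samples"].append(60)
state.reload()
print(state["dev1"]["samples"])  # []

# Fixed pattern: copy, modify, then re-assign the top-level key.
data = copy.deepcopy(state["dev1"])
data["samples"].append(60)
state["dev1"] = data
state.reload()
print(state["dev1"]["samples"])  # [60]
```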

:zap: Dynamic sample gate: The minimum time between recorded samples is now dynamic instead of a hardcoded 10 minutes. The gate is set to half the scan interval, capped at 30 minutes. On a 3h scan the gate is 90 minutes; on an hourly scan it's 30 minutes; on the 30-minute scan option it's 15 minutes. This fixes devices like switches and contact sensors that fire multiple times in quick succession; previously all of those events were ignored, preventing samples from ever accumulating.
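
The gate rule can be sketched in Python as below. The three worked examples match a plain half-the-scan-interval rule (a 3h scan gives 90 minutes, which a 30-minute cap would not allow), so this sketch follows the examples and exposes the cap as an optional, hypothetical parameter. `accept_samples` and the event times are also hypothetical.

```python
def sample_gate_minutes(scan_interval_minutes, cap_minutes=None):
    """Minimum spacing between recorded samples: half the scan interval,
    optionally limited by a (hypothetical) cap."""
    gate = scan_interval_minutes / 2
    if cap_minutes is not None:
        gate = min(gate, cap_minutes)
    return gate


def accept_samples(event_minutes, gate_minutes):
    """Keep only events at least one gate after the previously kept sample."""
    kept = []
    for t in sorted(event_minutes):
        if not kept or t - kept[-1] >= gate_minutes:
            kept.append(t)
    return kept


for scan in (180, 60, 30):
    print(scan, sample_gate_minutes(scan))  # prints: 180 90.0 / 60 30.0 / 30 15.0

# A contact sensor that fires in bursts, on the 30-minute scan option (15-minute gate).
print(accept_samples([0, 2, 5, 40, 41, 70], sample_gate_minutes(30)))  # [0, 40, 70]
```

Each burst contributes one sample instead of being discarded, so the sample count keeps growing.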

:link: Clickable device names: Device names in the Activity Summary and Problem Devices tables are now clickable links that open the device edit page directly in your hub's UI. Works on your local network only, same behavior as Battery Monitor 2.0.


Did a protocol override for 2 devices that are child devices of a Zooz double outdoor plug. Now getting this error for every scan.


Would be great if the protocol override page allowed us to undo an override.

I will repair this shortly. The override was supposed to stay unlocked after you made the entry. When you fix or add something, other things tend to break.

This will be released in 1.3.2 today.
