[RELEASE] Hub Watchdog - Simple way to monitor if your hub is slowing down or not

That's my metric of choice... but I love options, so I've been keeping an eye on this thread. Looks like @bptworld made a very useful app... and it requires a LOT less time to set up. That said, my NR websocket setup has next to no impact on the hub, so I'll stick with it for now.

This app can be set up easily by a lot more users, and hopefully we can find some correlations with the hub slowdowns.


I did disable it for a while. The hub stayed slow, as measured by how quickly lights and outlets came on, so I turned it back on :slight_smile:

I'm about to set it up on my NAS in Docker too. Somebody reported that a Raspberry Pi doesn't have enough memory.

It would be awesome if Hubitat released an official built-in speedometer, but it may also open a can of worms: people who didn't notice a slowdown might start requesting help based on the measured numbers. I'm a good example; to my human perception it's not very noticeable. It's just numbers, and the platform works pretty well otherwise.

I get exactly what you mean. In my case, I wanted some "confirmation" that my perception of a slowdown every 3-4 days was not just my imagination, and @bptworld's Hub Watchdog confirmed that.

I tried finding a "solution" today. In the end, I think that scheduled reboots every 3 days work pretty well.


brutal but effective

Every driver we publish updates state based on the device-reported state change; none of them issue events when the digital command is sent.
For devices that aren't capable of self-reporting (which at this point is only very old Z-Wave devices), we issue a report request after the command request.
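To make the pattern above concrete, here is a toy model in Python (not real Hubitat code, which is Groovy): the command only queues a message to the device, a status request is added for devices that can't self-report, and state changes only when a report comes back. The class name, the `self_reporting` flag, and the `"set:"`/`"report_request"` messages are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToySwitchDriver:
    """Toy model of 'state comes from device reports' (not real Hubitat code).

    `self_reporting`, the "set:"/"report_request" messages, and all names
    here are hypothetical; real Hubitat drivers are written in Groovy.
    """
    self_reporting: bool
    state: Optional[str] = None                       # last state the device confirmed
    outbox: List[str] = field(default_factory=list)   # messages sent to the device
    events: List[str] = field(default_factory=list)   # events raised to the hub

    def command(self, value: str) -> None:
        # Send the digital command, but do NOT update state or raise an event yet.
        self.outbox.append(f"set:{value}")
        if not self.self_reporting:
            # Devices that can't report on their own get an explicit status request.
            self.outbox.append("report_request")

    def on_device_report(self, value: str) -> None:
        # Only a report coming back from the device changes state and fires an event.
        if value != self.state:
            self.state = value
            self.events.append(value)


old = ToySwitchDriver(self_reporting=False)
old.command("on")
print(old.state, old.outbox)   # None ['set:on', 'report_request']
old.on_device_report("on")
print(old.state, old.events)   # on ['on']
```

The drivers criticized further down would instead append the event inside `command()`, so the hub would show "on" even if the device never actually switched.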

Good to know! To me, that is the only logical way to do it...

But I can point to probably 10 ST or HE user drivers offhand that send the digital events immediately in the on/off/level functions.

Thanks for the info!

Sure, likely they don't understand the device and/or protocols well enough to get them to report as required, who knows...

Since it seems almost impossible to find a complete solution to hub slowdowns, and most of us want the openness and flexibility to run whatever we want on our hubs anyway (which will only perpetuate the problem), wouldn't it make more sense to put more effort into making sure that hub reboots don't negatively affect apps/drivers, and that best practices are available for code to recover elegantly from reboots?

I for one am perfectly OK with doing a scheduled reboot every night. The problem I have is the effort afterwards to fix up things that don't recover well (e.g. the Chromecast beta app and others need to run through a rediscovery process, and I don't have fixed IP addresses due to a router limitation at this time, which causes issues with some WiFi devices). I agree we need to keep pressure on fixing the underlying hub performance issues, but let's also work on improving how well things recover after a reboot.

Could someone explain the difference between the 'mean' delay and the 'median' delay, please?
Isn't the mean delay an average of all the measured delay times? If so, what's the median?
Thanks.

The mean is the sum of all the numbers in the set divided by how many numbers are in the set (the average).
The median is the middle value once the numbers are sorted: half the numbers are above it and half are below. (The value that occurs most often is the mode, not the median.)
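A quick way to see the difference is Python's standard `statistics` module. The five delay readings below are made up for illustration: four normal values and one outlier.

```python
import statistics

# Hypothetical delay readings: four normal ones and one large outlier.
delays = [0.40, 0.38, 0.41, 0.39, 12.0]

mean = statistics.mean(delays)      # sum / count: dragged up by the outlier
median = statistics.median(delays)  # middle of the sorted values: barely affected

print(f"mean={mean:.3f} median={median:.2f}")  # mean=2.716 median=0.40
```

A single slow reading multiplies the mean by almost seven, while the median stays at a typical value.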

Thanks for the reply. Much appreciated.
The reason I asked is that the figures you've given above show Zigbee to be wildly different. All your median figures are lower.
Just wondering which figure is best to use.

It means he must have one or more huge Zigbee delays. A few very large values pull the mean up but barely move the median.

Thanks. Makes sense.
Median it is then. :+1:

What's a reasonable warning / max threshold for Z-Wave? I set up the watchdog last night, and my Z-Wave timings have been pretty consistently high. For this dedicated purpose I'm currently controlling the laundry room light switch (a GE Z-Wave non-plus switch), until my additional Z-Wave plug gets here:

Number of Data Points: 57
Over Max Threshold: 48
Over Warning Threshold: 8
Current Max Delay: 0.5

Mean Delay: 0.77
Median Delay: 0.629
Minimum Delay: 0.269
Maximum Delay: 2.198

Zigbee hasn't been bad, with a mean delay of 0.412 and a median of 0.396; virtual has a mean of 0.18 and a median of 0.117.
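For anyone wondering how counts like "Over Max Threshold" and "Over Warning Threshold" relate to the threshold settings, here is a rough Python sketch of that kind of tally. It assumes a reading is "over warning" only when it is above the warning threshold but not above the max; Hub Watchdog itself may count differently. The function name and the sample readings are hypothetical.

```python
from typing import Dict, Sequence


def summarize(delays: Sequence[float], warn: float, max_threshold: float) -> Dict[str, float]:
    """Tally delay readings against a warning and a max threshold (hypothetical helper).

    Assumption: "over warning" means warn < delay <= max_threshold, so a
    reading is never counted in both buckets.
    """
    return {
        "data_points": len(delays),
        "over_max": sum(1 for d in delays if d > max_threshold),
        "over_warning": sum(1 for d in delays if warn < d <= max_threshold),
        "min": min(delays),
        "max": max(delays),
    }


# Hypothetical readings, not the data posted above.
print(summarize([0.3, 0.45, 0.6, 0.7], warn=0.4, max_threshold=0.5))
```

Under this assumption, a max threshold set below the device's typical delay (for example, below its median) will flag most readings, so it helps to pick thresholds relative to what the device normally does.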

@bptworld thank you very much for this great app!

Looks like I have some weird stuff going on. I only have one test set up, on a virtual device, running every minute. All data points are "black" except one reading of 49.966, which is way off. Any thoughts on that?

Lately I've also been experiencing random slowness in Zigbee switches reacting to hub commands (sometimes I even wait 30 seconds or so before one reacts).

@bptworld, do I use the same virtual device for all test child apps, or separate ones? Also, what type do I select for the dashboard? Is it Attribute?

Colors are set in the driver...

As for why you have the high reading, only you can look into that one. Try to see what else ran at the time. Since it's just one, it's probably a fluke. If you had high readings 3 or more times, then it would be something to look into.


Pick the device you created, then Attribute, then whichever State you want to display (dataPoint1, dataPoint2, dataPoint3, etc...). You can see the States in the device.

thanks


I'm definitely seeing a steady increase in delays for all device types since installing. The hub was rebooted around 6:30 on 10/2. Z-Wave has been terrible from the start. I deployed 3 Aeotec 6 Z-Wave repeaters on 10/3 and 10/4 with no noticeable improvement.

I'm running no custom apps with the exception of Hub Watchdog, and only custom drivers are Hub Watchdog and Zooz Double Plug. Zooz plug not included in any automations.

Hub details below. Not really sure what next steps to take.