[RELEASE] Hub Watchdog - Simple way to monitor if your hub is slowing down or not

bobbles · September 2, 2020, 3:18pm

Oh. That's a shame. I did get the odd 1024 warning very occasionally but just ignored it as I was well aware of the cause and the fact that it has no impact.

Would it be at all possible to make this somehow configurable?
I would like more data points if possible but fully understand if you wish to leave as is.
Maybe give us an idea if there are lines in the code we could tweak and on our heads so be it.

I fully understand if you tell me to put this request where the sun doesn't shine.

Rxich · September 3, 2020, 3:56pm

This app is a fantastic tool. However, now I'm really pissed off that the hub is slowing down so often. Before, I knew it was slow when the light didn't come on and I fell down the stairs. Now I have the data; before I only had bruises.
The 2 test devices are in the same room as the hub, within physical line of sight. They should never have released the C7 before fixing the slowdown issues with existing hubs.
09-03 11:50 - 0.722 Zwav
09-03 11:49 - 0.665 Zwav
09-03 11:48 - 0.754 Zwav
09-03 11:46 - 2.38 Zwav
09-02 23:27 - 0.957 - Zwav
09-02 23:26 - 0.718 - Zwav
09-02 23:25 - 0.758 - Zwav
09-02 23:24 - 0.755 - Zwav
09-02 22:39 - 10.295 - Zwav
09-02 22:28 - 12.387 - Zwav
09-02 22:26 - 0.717 - Zwav
09-02 22:25 - 0.782 - Zwav
09-02 22:24 - 1.532 - Zwav
09-02 21:39 - 0.807 - Zwav
09-02 21:25 - 0.862 - Zwav
09-02 21:24 - 1.075 - Zwav
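
For anyone who would rather count the slow readings than eyeball them, here is a rough Python sketch that parses the lines pasted above and lists the ones over 1 second. The 1 s cutoff and the names `parse` and `THRESHOLD` are my own choices for illustration, not anything taken from the Hub Watchdog app.

```python
import re

# Readings copied from the post above: "MM-DD HH:MM - seconds [-] Zwav".
RAW = """\
09-03 11:50 - 0.722 Zwav
09-03 11:49 - 0.665 Zwav
09-03 11:48 - 0.754 Zwav
09-03 11:46 - 2.38 Zwav
09-02 23:27 - 0.957 - Zwav
09-02 23:26 - 0.718 - Zwav
09-02 23:25 - 0.758 - Zwav
09-02 23:24 - 0.755 - Zwav
09-02 22:39 - 10.295 - Zwav
09-02 22:28 - 12.387 - Zwav
09-02 22:26 - 0.717 - Zwav
09-02 22:25 - 0.782 - Zwav
09-02 22:24 - 1.532 - Zwav
09-02 21:39 - 0.807 - Zwav
09-02 21:25 - 0.862 - Zwav
09-02 21:24 - 1.075 - Zwav
"""

# Timestamp, then the reading in seconds; the trailing device tag is ignored.
LINE = re.compile(r"^(\d\d-\d\d \d\d:\d\d) - (\d+(?:\.\d+)?)")

THRESHOLD = 1.0  # assumed cutoff in seconds, not a Hub Watchdog default


def parse(raw):
    """Return a list of (timestamp, seconds) pairs from the pasted log text."""
    pairs = []
    for line in raw.splitlines():
        match = LINE.match(line.strip())
        if match:
            pairs.append((match.group(1), float(match.group(2))))
    return pairs


readings = parse(RAW)
slow = [(ts, secs) for ts, secs in readings if secs > THRESHOLD]

print(f"{len(slow)} of {len(readings)} readings above {THRESHOLD} s")
for ts, secs in slow:
    print(ts, secs)
```

On the readings above this flags 5 of the 16, with the two worst at over 10 s.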

nh.schottfam · September 3, 2020, 4:25pm

So how to diagnose slow downs:

Odds are these are caused by the hub running out of memory. Another thread discusses how to see the hub's memory status (if you are running a current version of the firmware):
[Wiki] Hidden Features
So what can you do about it?
- assuming you are running current firmware (that has the most fixes for any memory leaks):
  - you can try trimming your db further
    Fastest my hub has ever been
    [Wiki] Hidden Features
- you can try to reduce apps and their use of memory
  - this might mean reducing how frequently they run (and therefore how much they try to store in state / settings variables)
  - weather apps have been notorious memory, event, and DB hogs
  - this might mean uninstalling apps (which I know may be painful)

In general

- the hub has a fixed amount of memory (today's total is 1GB for OS, JVMs, etc.)
  - I'm sure HE will offer us a hub with more memory one day
- anyone can run it out by just installing more and more things
- the database can be a big consumer of resources, so reducing the number of events it stores reduces the amount of memory it needs. This has been my first go-to for getting more available memory.
- app writers can try to be more memory and DB friendly, but that is a whole other discussion on the tricks for this
- you can see 'some' of the memory an app uses by looking at the settings, state, and events for an app or driver. This is under the 'gear' icon next to the app or device (HE console -> apps, HE console -> devices)
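
As one example of being memory friendly, an app can cap how much history it keeps instead of letting a list grow forever. This is only a sketch of the idea in Python, not Hubitat code. The name `MAX_POINTS` and the cap of 40 are example values I picked.

```python
from collections import deque

MAX_POINTS = 40  # example cap; pick whatever your app actually needs

# A deque with maxlen silently drops the oldest entry when a new one
# is appended to a full buffer, so memory use stays bounded.
history = deque(maxlen=MAX_POINTS)

for reading_number in range(100):  # pretend 100 readings arrive over time
    history.append(reading_number)

print(len(history))             # never more than MAX_POINTS
print(history[0], history[-1])  # oldest and newest readings still kept
```

The trade-off is the obvious one: fewer stored points means less history to chart, but the stored state stops growing.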

BrianP · September 6, 2020, 3:03am

So, ever since the update to Hub v2.2.3, my virtual device has been throwing occasional values that look like the correct value + 5 s. For example, my readings1 values are:
0.078, 0.099, 0.076, 0.136, 0.09, 0.133, 0.098, 0.111, 0.087, 0.104, 0.07, 0.095, 0.1, 0.111, 0.102, 0.106, 0.108, 0.107, 0.092, 0.082, 0.142, 5.171, 0.114, 0.1, 0.201, 0.082, 0.078, 0.067, 0.088, 0.09, 0.085, 0.109, 0.089, 0.11, 0.09, 0.112, 0.09, 0.084, 0.094, 0.085, 0.097, 0.112, 0.084, 0.108, 0.094, 0.116, 0.087, 0.085, 0.091, 0.153, 0.072, 5.171, 0.106, 0.104, 5.143, 0.112, 0.101, 0.069, 0.097, 0.122, 0.109, 0.103, 5.167, 0.091, 0.109, 0.1, 0.091, 0.092, 0.083, 0.12, 0.1, 0.099, 0.116, 0.095, 0.109, 0.101, 5.134, 0.14, 0.103, 0.219

The reading after each spike is a quick re-test to make sure the hub is actually slowing down, and that one goes back to about a tenth of a second.

I saw something in the code about waiting up to 5 s and then giving up (I looked a bit ago, maybe last week). I'm guessing that's happening for some reason, but I am not sure why.
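
To check the "correct value + 5 s" idea, here is a quick Python sketch that takes the spikes out of the readings above, subtracts 5 s, and compares what is left with the normal readings. The 5 s offset is just my guess from what I remember of the code, and the variable names are made up for this example.

```python
OFFSET = 5.0  # seconds; the suspected timeout, an assumption on my part

# The readings1 values quoted above, in the same order.
readings = [
    0.078, 0.099, 0.076, 0.136, 0.09, 0.133, 0.098, 0.111, 0.087, 0.104,
    0.07, 0.095, 0.1, 0.111, 0.102, 0.106, 0.108, 0.107, 0.092, 0.082,
    0.142, 5.171, 0.114, 0.1, 0.201, 0.082, 0.078, 0.067, 0.088, 0.09,
    0.085, 0.109, 0.089, 0.11, 0.09, 0.112, 0.09, 0.084, 0.094, 0.085,
    0.097, 0.112, 0.084, 0.108, 0.094, 0.116, 0.087, 0.085, 0.091, 0.153,
    0.072, 5.171, 0.106, 0.104, 5.143, 0.112, 0.101, 0.069, 0.097, 0.122,
    0.109, 0.103, 5.167, 0.091, 0.109, 0.1, 0.091, 0.092, 0.083, 0.12,
    0.1, 0.099, 0.116, 0.095, 0.109, 0.101, 5.134, 0.14, 0.103, 0.219,
]

normal = [r for r in readings if r < OFFSET]
spikes = [r for r in readings if r >= OFFSET]

# What each spike would be if a flat 5 s had been added to a normal reading.
residuals = [round(r - OFFSET, 3) for r in spikes]

print("normal range:", min(normal), "to", max(normal))
print("spikes:", spikes)
print("spikes minus 5 s:", residuals)
print("all inside normal range:",
      all(min(normal) <= r <= max(normal) for r in residuals))
```

All five spikes land back inside the normal range once 5 s is taken off, which fits the timeout theory but does not prove it.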

Does this happen to anyone else? Is there anything I can do to fix that?

I haven't looked at logs yet, as I know it's set to re-test after a minute (I have max number of fails set to 3 before notifying me), so it doesn't really bother me. Just wondering if I should look into why that is happening on my hub, or if that's just normal behavior. A minute later, I'm back to about a tenth of a second, so no notification/action.
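
Roughly how I picture the "max number of fails" setting working, as a Python sketch. This is not the app's actual code. I am assuming the fails have to be in a row and that one good reading resets the count, and `should_notify`, `max_value` and `max_fails` are names I made up.

```python
def should_notify(readings, max_value, max_fails):
    """Return True once `max_fails` readings in a row exceed `max_value`.

    Assumption: a single reading at or below `max_value` resets the count.
    """
    fails = 0
    for reading in readings:
        if reading > max_value:
            fails += 1
            if fails >= max_fails:
                return True
        else:
            fails = 0  # a good re-test clears the streak
    return False


# One 5 s spike followed by a normal re-test: no notification.
print(should_notify([0.142, 5.171, 0.114, 0.1], max_value=1.0, max_fails=3))

# Three slow readings in a row: notification.
print(should_notify([0.1, 5.2, 5.1, 5.3], max_value=1.0, max_fails=3))
```

With that logic a lone spike never trips the alert, which matches what I'm seeing.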

mattias · September 6, 2020, 10:30am

My hubs are having the same issue since 2.2.3 was installed. I believe others have also reported seeing this in other threads. I haven’t noticed any slowdowns with the hubs other than what watchdog reports so I haven’t actively troubleshot the issue myself.

BrianP · September 8, 2020, 1:14am

Thanks, I wondered if it was a new feature...

I'm not worried about it either, so I'm just going to live with the spikes. I have a plot on my dashboard of the last 24 h of readings, and it shows the range and last reading. It's often off-scale with 5+ s for the upper range, but I also haven't seen any slowdown issues, so I'm going to ignore it. I may get rid of the plot, anyway, so no big deal.

kahn-hubitat · September 8, 2020, 8:50pm

How do I set the warn and maxvalue in the driver? The current driver doesn't seem to let me enter a number and only sets them to the words.

ie

bptworld · September 8, 2020, 8:58pm

You set both in the app.

kahn-hubitat · September 8, 2020, 9:31pm

Got ya, it updated eventually. I went through the whole post but cannot find docs on what the reporting child is for. Thanks.

bptworld · September 8, 2020, 9:37pm

lol, you got the wrong developer if you want docs. This is for fun, not a job.

Follow the prompts within the app and it'll all work out.

What's a 'reporting child'? If you mean the device, it's to hold the data for the Examiner child app.

kahn-hubitat · September 8, 2020, 9:42pm

Ya, what is the Hub Watchdog Examiner child app? I have not set one up.

bptworld · September 8, 2020, 10:02pm

It's used to compare the data between devices (virtual, zwave, zigbee).

kahn-hubitat · September 9, 2020, 2:26pm

Unless I'm missing something, the reporting is backwards. The virtual switch has been running longer: 40 data points vs. 18 for the zigbee one I set up yesterday. It seems correct in the chart but backwards in the detail report?

bptworld · September 9, 2020, 2:55pm

Other than dropping from 80 data points down to 40 per device (see discussion above - I'm not going back into that), reporting hasn't changed since this was first created.

kahn-hubitat · September 9, 2020, 5:26pm

Ya, what I see is that the chart looks backwards. Purple is the zigbee device, with fewer data points, but purple is showing on the vt chart instead of the zigbee chart.

eibyer · September 11, 2020, 10:31pm

I keep seeing this red blip on my data points and noticed that it occurs at the top of the hour every time... then I remembered I have my Full Kiosk refreshing every hour when there's motion in the area.
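
A quick way to confirm a "top of the hour" pattern is to bucket the spike timestamps by minute. Here is a small Python sketch. The timestamps in `sample` are made up for illustration and are not my real readings, and `minute_counts` is just a helper name I picked.

```python
from collections import Counter


def minute_counts(timestamps):
    """Count timestamps by minute-of-hour; expects 'MM-DD HH:MM' strings."""
    return Counter(int(ts.split(":")[1]) for ts in timestamps)


# Made-up sample for illustration only (not data from this thread).
sample = ["09-11 14:00", "09-11 15:00", "09-11 16:01", "09-11 17:00"]

counts = minute_counts(sample)
for minute, hits in counts.most_common(3):
    print(f"minute :{minute:02d} -> {hits} spike(s)")
```

If nearly all the spikes pile up in one or two minute buckets, something scheduled is the likely cause.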

lewis.heidrick · September 12, 2020, 5:11am

I've seen a couple of people having issues with FK. For one user it completely locks up their hub.

eibyer · September 12, 2020, 2:50pm

Hmm, it wasn't my FK refresh that was causing it, something else is. I will have to watch the logs at that time period to see what's doing it.

cwwilson08 · September 12, 2020, 3:14pm

Depending on the firmware version, I believe there is a database cleanup once an hour as well as an hourly zwave repair.

eibyer · September 12, 2020, 3:19pm

Ah, I haven't read anything about an hourly database cleanup. I'm on a C7 with .145. I pulled up my past logs and nothing stood out as abnormal during the time frame where it has been spiking.