@Boredom @csteele Good points. From memory, the devices I have with a lot of attributes are things like DarkSky & OpenWeather and the Sensibo AC controllers (these are also cloud-based, so they can be quite slow at times). I don't have any power monitoring. I will try to review these in the days ahead and disable them if necessary. For weather I'm only polling periodically, not dynamically.
As an experiment, I deleted the Darksky WX device just a few minutes ago... and took a backup.
Seems those 24,000 attributes were a significant portion of the DB.
wow.
My results were the same as yours. DB size decreased significantly.
One item to add: just disabling the device will not reduce the DB size, nor will reducing the number of active attributes. The old values are retained in the database. Only deleting and recreating the device removed the data for the old attributes. Disabling should stop the database inserts, though, which makes it a good way to identify a device as a potential root cause.
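Since the disable-vs-delete distinction trips people up, here's a minimal sketch (in Python, purely illustrative; this is not Hubitat's actual schema or code) of why disabling only freezes database growth while deletion actually shrinks it:

```python
# Hypothetical model of an event table: disabling skips new inserts,
# but only deleting the device removes its existing rows.

class EventStore:
    def __init__(self):
        self.rows = []         # (device_id, attribute, value) tuples
        self.disabled = set()  # device ids whose inserts are skipped

    def log_event(self, device_id, attribute, value):
        if device_id in self.disabled:
            return             # disabled: no new rows, but old rows are kept
        self.rows.append((device_id, attribute, value))

    def disable(self, device_id):
        self.disabled.add(device_id)

    def delete_device(self, device_id):
        # deletion is the only operation that actually removes history
        self.rows = [r for r in self.rows if r[0] != device_id]

store = EventStore()
for i in range(1000):
    store.log_event("weather", "temperature", i)

store.disable("weather")
store.log_event("weather", "temperature", 1001)  # silently dropped
after_disable = len(store.rows)

store.delete_device("weather")
after_delete = len(store.rows)

print(after_disable)  # 1000: disabling froze growth but freed nothing
print(after_delete)   # 0: only deletion reclaimed the space
```

In other words, disabling is the diagnostic step and deletion is the cleanup.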
Interestingly, I deleted the DarkSky device too. There were 31,000 events logged.
Did a backup and my DB has remained the same size.
Very strange.
This is something that's been mentioned many times before: we need a way to optionally turn off event logging completely and/or set our own limits.
I can't imagine that anybody needs to scroll through hundreds of pages of event history. I would have thought that anybody who needs that level of data is likely doing their own external logging/graphing anyway?
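For what it's worth, the user-settable limit being asked for amounts to a bounded ring buffer per attribute. A hypothetical sketch in Python (the `EVENT_LIMIT` knob and the whole mechanism are assumptions for illustration, not a real Hubitat setting):

```python
from collections import deque

EVENT_LIMIT = 100  # assumed user-configurable cap per attribute

history = {}       # attribute name -> bounded ring buffer of events

def record(attribute, value):
    buf = history.setdefault(attribute, deque(maxlen=EVENT_LIMIT))
    buf.append(value)  # oldest entry is dropped automatically at the cap

for i in range(10_000):
    record("temperature", i)

print(len(history["temperature"]))  # 100, not 10,000
```

With a cap like this, storage stays constant no matter how chatty a device is, which is exactly the behaviour people want to opt into.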
I can't understand how that could happen. Even if you had 31,000 one-byte attributes, and none of the larger text and table data, the backup should go down by about 31 kB. After compression that might be 3.1 kB, and thus too small to "see", but again, I can't see how yours was so tiny compared to mine.
On the other hand, I did spend a bunch of time coding reductions into the ApiXU driver, and all of those improvements went into @Matthew's DarkSky version. So maybe I should be happy that YOU were getting all the advantages I coded for, oh so long ago.
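As a sanity check on the compression arithmetic above, here's a quick Python experiment compressing a pile of small, repetitive attribute events the way a backup might (the event text format is made up for illustration):

```python
import zlib

# 31,000 tiny attribute events, serialized as text.
events = "\n".join(f"temperature,{i % 100}" for i in range(31_000))
raw = events.encode()
packed = zlib.compress(raw)

print(len(raw))     # a few hundred KB of raw event text
print(len(packed))  # repetitive data compresses to a small fraction of that
```

Highly repetitive event data compresses extremely well, which is why a large number of deleted events can still translate into a surprisingly small change in a compressed backup.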
Latest update...for completeness...
Finally got around to switching all my Gledopto lamps over to my Hue hub. I was waiting to put these onto another HE hub, but with the current supply-chain issues I decided enough was enough and put them onto Hue instead. It was quite a job to replace the devices with virtual ones, then do the remove and switch, and finally re-associate them with the devices pulled across from the Hue integration app. Anyway, it's finally done, and performance is definitely improved (after a week or so of testing): fewer issues with motion sensors getting dropped from the mesh, faster response all around so far, and WAF a little improved (for a change, lol). I'm still seeing some errors in the logs, but this change was a step in the right direction, so I'm happy with that.

One thing I did decide was to leave most of the individual lamps in Hue and bring across just the groups to HE. My logic was that this would reduce the traffic further, especially the polling Hue requires to synchronise the HE dashboards where needed. So if I want to change an individual lamp I can do it in Hue, but the core automations that drive groups of lamps will still work fine in HE.
It is an indisputable fact that the number of events in the log contributes greatly to the database size.
I have had some (minor) success in reducing the size of my database by:
- changing the device driver for every power plug to a simple driver without power reporting (i.e. if you don't need that functionality, don't leave it in)
- remembering that every Z-Wave repair increases the size of the event log (if you have a lot of devices)
- using the built-in attributes instead of the "special" attributes of custom drivers, if you don't need them
- deleting all custom apps and drivers that you aren't using... they're only taking up unnecessary space
Yeah, good advice. But I don't have many Z-Wave devices and absolutely no power reporting going on. So it's still really a mystery to me as to why my backups are so enormous.
Database size is directly proportional to events.
Fewer events and fewer drivers with MANY attributes = fewer log entries.
I also think that there is a performance enhancement if you get the size of your database to less than 8GB, but I haven't been able to prove this.
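That proportionality claim can be turned into a rough back-of-envelope estimator. A Python sketch, where the ~1 KB-per-event figure is purely an assumption for illustration (not a measured Hubitat value):

```python
# Rough estimator: database size scales linearly with retained events.
BYTES_PER_EVENT = 1024  # assumed average row size incl. indexes/overhead

def estimated_db_mb(events_per_day, retention_days, devices):
    total_events = events_per_day * retention_days * devices
    return total_events * BYTES_PER_EVENT / (1024 * 1024)

# 20 chatty devices logging 500 events/day each, 7 days retained:
print(round(estimated_db_mb(500, 7, 20)))  # ~68 MB
```

The exact per-event cost is a guess, but the linear relationship is the point: halve the event rate (or the retention) and you halve the database footprint.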
Just checked my recent hub backups. They are now at around 28 MB, significantly less than before, and it's a bit of a surprise how much they've come down. I've been moving a few rules over to Node-RED for lighting (about 6 flows are disabled based on that recent work), and I deleted all my lamps from HE (moved them over to Hue), leaving just the groups in HE linked across to Hue. So that probably removed around 20 lamps. What on earth was going on when the backup was way over 100 MB, I really have no idea. Anyway, it's looking better now, my hub performance is reasonable, and there have been no crashes or lockups for a long time! I'm only rebooting the hub once a week now. I've definitely had my share of issues, but through the actions listed above and now this work to move time-sensitive flows to Node-RED, it's looking good! I was going to buy another hub, but maybe that's not necessary anymore (although I'm hitting the hub hard with websockets and Maker API now, for Node-RED and for my custom dashboard... so maybe just one more now that the C7 is imminent).
@aaiyar I found this issue to be one of the worst with regard to my C4 hub's responsiveness. Today my hub began not just to crawl, but to practically perform in geologic time, even though it has zero rules, just integrations and hardware... and...
dashboards... I checked the dashboards after a reboot didn't solve the Pliocene-era performance, and found two phantom devices. Deleted them, and voilà! Performance was restored.
Amazing. I think I'm going to nuke all but one or two dashboards, just to minimize the risk of this occurring!
If you don't mind me asking,
- What kind of phantom devices? zwave or zigbee? And how did you identify them?
- how did you delete them?
- I assume that you knew exactly where they were, and what they were?
In this case, they were probably both Z-Wave devices, but I can't say for sure. I just saw an "unknown" displayed on one tile, which usually indicates a sensor that I had configured to report temperature, and one with a circled "?", which I generally associate with a switch.
I had to exclude a Zooz ZSE19 siren the other day, so that was likely one of them, as it gets a new ID on include. The other could have been another Zooz switch that I updated the firmware on not long ago.
Getting rid of them was as simple as deleting the tiles from the dashboards. A little different from the phantom devices people find with their Zniffers and Z-Sticks, I think.
The only observation I have on this is that the performance hit is strangely large. My hub was all but unusable. That seems a little disproportionate for one bad reference, but I'm not sure how the dashboard works....
S.
I'm having daily lockup/reboot issues again as well, but I've got my hub plugged into a SPAN port to see whether it IS a network-stack issue or not. I have lots of network-based devices (HubDuino, Tasmota) and can see right where the hub dies, and the traffic doesn't appear to spike during those periods. So either it's a very fragile network stack that finally can't take any more, or it's not a network issue.
Guess I'll move my network-based devices to their own hub (I have 158 devices on one hub, but have another that is my "dev" hub), and continue to look at Node-RED and migrate what I can off Rule Machine.
It's super-frustrating to have an extensible platform like this but not be able to see what is going on under the hood, so you know what you need to get rid of. I realize the team continues to say that giving "super users" access to that information wouldn't be helpful, but I tend to disagree. If I have issues on my home server, my Raspberry Pi, or with an app I've written, I have the tools at my disposal to figure out what has gone wrong. Here I've been reduced to sniffing traffic and trial and error, all of which takes more time than I'd like to spend on it. Give me access to a beefy system/container/something to just handle rules/network devices, and then HubConnect can get me far enough to make this work.
There really isn't anything we can give you access to. Some people thought the "internal logs" were what they needed, but these wouldn't give you the insight you seek. They are of use to our engineers for looking at information internal to the hub that would be meaningless to a user. We would rather spend our engineering resources digging into the problem than creating diagnostic tools that may not even hit the mark.
We are pushing right now on digging into networking issues with the hub. We are frustrated that you have the problem you report. If we had a solution, we would release it. Our next release will have some improvements, and hopefully one of those will solve your problem.
It's awesome you guys are so honest and transparent on these issues and are working hard to fix remaining performance problems where possible. All due respect on this.
Good to hear there are plans in the works. So of course I took a backup, did a soft reset and restore, and it’s been up for 2 days. It was locking up daily. So what does a soft reset do that a reboot doesn’t?
A reboot doesn’t restart the radios. So besides the database cleanup, maybe the soft reset also reboots the radios.