Arrrgggh - here be Zombies!

storageanarchy · May 21, 2019, 9:14pm

I'm curious if anyone else has noticed that hard code failures in an app or a driver appear to significantly slow down processing on the Hub.

In my new Universal Ecobee Suite, there is a large main loop that gathers, filters and forwards data from the Ecobee servers to the Ecobee thermostat & sensor device drivers. This loop is scheduled to run periodically. I've noticed that if something fails in that main loop, or if one of its child applications fails for any reason, the whole hub will start slowing down. And it seems the more failures, the slower things get.

It's almost as if these scheduled threads and/or one-shot applications don't die when they fail, and they continue eating up resources. My only indication is that the loop cycle times will double or even triple once a failure has happened. And the only way to get back to "normal" performance is to reboot the hub.

Now of course, the code shouldn't fail, and for the most part if it does it is something I've coded wrong. I squash the bugs as fast as I find them, but there's really no such thing as bug-free code.

Is there perhaps a garbage collection/dead process eradication issue lurking inside the Hubitat code that's leaving Zombies free to roam and eat available resources?

UPDATE: My latest version uses far fewer resources, and will generally recover gracefully from any errors. Errors experienced by other applications on your Hubitat do still seem to affect overall performance (not just my Suite). As a result of code changes I have made, the latest versions do not experience as serious a slowdown, even when the overall Hubitat performance has been degraded.

stephack · May 21, 2019, 10:47pm

Interesting. There has been a lot of conjecture as to what causes hub slowdowns...that sometimes lead to a crash of the hub entirely. Nothing has been confirmed but errors are to be avoided at all costs. My hub has become a LOT less fragile with the past few firmware updates but I also removed a lot of unneeded apps and drivers.

Of great concern to me (unvalidated theory) are drivers and apps that talk on the LAN and use http calls of any kind...especially those that communicate out to the internet. I updated most of my apps and drivers that talk on the LAN/WAN to use async calls as well. My hub seemed to get hung up at times if the http calls went unanswered or generated any errors. Using async calls where possible helped a lot with this.
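
To illustrate the general idea of not letting an unanswered call hold things up, here is a minimal sketch in plain Java. It is not Hubitat code: Hubitat apps are written in Groovy and use the platform's own callback-style async HTTP methods. The class name AsyncCall, the method callWithTimeout, and the fallback value are all hypothetical, and the "request" is just a Supplier standing in for a real HTTP call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class AsyncCall {
    // Hypothetical helper: runs the request on a background thread and
    // yields the fallback if it takes too long or throws.
    public static CompletableFuture<String> callWithTimeout(
            Supplier<String> request, long timeoutMs, String fallback) {
        return CompletableFuture.supplyAsync(request)
                // Complete with the fallback if no answer arrives in time (Java 9+).
                .completeOnTimeout(fallback, timeoutMs, TimeUnit.MILLISECONDS)
                // Turn a failed request into the fallback instead of an uncaught error.
                .exceptionally(e -> fallback);
    }

    public static void main(String[] args) {
        // Simulated slow server: sleeps far longer than the timeout allows.
        Supplier<String> slowRequest = () -> {
            try {
                Thread.sleep(2000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "late response";
        };

        // The caller gets a result after roughly 100 ms instead of waiting 2 s.
        String result = callWithTimeout(slowRequest, 100, "no response").join();
        System.out.println(result);
    }
}
```

One caveat on this sketch: the timeout only frees the caller. The slow request itself keeps running on its background thread until it finishes, so it does not cancel the underlying work.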

stephack · May 21, 2019, 11:01pm

Also, there is some recent discussion about this here.

storageanarchy · May 21, 2019, 11:40pm

Thanks for the info.

I do use asynchronous http and local hubAction everywhere I can. I have also significantly reduced writes to state/atomicState because they can stall things if the writes are large and/or there are write errors.

But the errors that seem to really kill things are the uncaught ones (type casting, null devices, etc.).

ogiewon · May 21, 2019, 11:57pm

Try-Catch is your friend for dealing with unexpected exceptions. Adding this error handling to all functions has saved me time and time again.
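
As a rough sketch of that pattern, the example below wraps each unit of work in a try-catch so that a null device or a bad cast is logged rather than left uncaught. It is plain Java rather than Hubitat Groovy (the try/catch syntax is the same in both), and the names SafeHandler, runSafely, and errorLog are hypothetical; in a real app the catch block would presumably call the hub's logger instead of appending to a list.

```java
import java.util.ArrayList;
import java.util.List;

public class SafeHandler {
    // Hypothetical stand-in for the hub's error log.
    public static final List<String> errorLog = new ArrayList<>();

    // Runs one unit of work; any exception is caught and logged
    // instead of propagating out of the handler.
    public static boolean runSafely(String name, Runnable work) {
        try {
            work.run();
            return true;
        } catch (Exception e) {
            errorLog.add(name + " failed: " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        // A null device: calling a method on it throws NullPointerException.
        Object device = null;
        boolean polled = runSafely("pollDevice", () -> device.toString());

        // A bad type cast: a String cannot be cast to Integer.
        Object reading = "72.5";
        boolean parsed = runSafely("parseReading", () -> {
            Integer value = (Integer) reading;
            System.out.println(value);
        });

        // Both failures were caught, so execution reaches this point.
        System.out.println("polled=" + polled + ", parsed=" + parsed);
        errorLog.forEach(System.out::println);
    }
}
```

Returning a boolean lets the calling loop decide whether to skip the rest of a cycle or carry on with the next device.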