Trying to Debug Hub Lockups

So, I have had this strange thing happening. Every week or so, my hub will inexplicably lock up. I have looked through System Events and no, none of my 4 WebCore pistons was firing at the time. In fact, they fired about 20 hours before that and nothing bad happened. I'm not even home at the time, and nothing else is scheduled to happen at the time the hub locked up. The problem is, I was away, so I wasn't able to log in immediately to see if there was anything in the logs that might have caused it to lock up. The only thing I have is the System Events, which doesn't really provide me with any info.
So, my question is this: how do I start to figure out what could possibly be causing this, since it only happens once a week? Are there logs that persist for a longer period of time? Or a way to write the logs to a file? I don't really want to start removing apps, as each would have to stay off for a month or more to prove that it was the problem. So, I'm looking at months and months of debugging before I find a solution. Any advice would be appreciated.

You already know the official answer is going to be: turn off all user apps and see if it happens again.... lol. So if it takes a week, turn them off for a week.

After the hub locks up and you reboot it you should be able to go into past logs and get an idea of what may have happened. I would start there.

Also, you mentioned WebCoRE. It’s a known problem app on HE. There are a lot of posts around the forum about it. I don’t know what other apps you have installed, but that would be a big one to disable. Look to migrate your pistons to the native rule engine and remove it completely. I know it may be a lot of work, but the HE box will really appreciate it.

As I said, none of my webcore pistons fired at the time. I have 4 and the only action they perform is to parse JSON data from HTTP calls from IFTTT. I looked at my IFTTT activity and none of my applets fired at the time so webcore is not the cause.

Also, if you read most of the posts around webcore you'll see that I commented on them thoroughly. The likely cause of webcore lock-ups is related to the use of Global Variables and cross-firing of pistons. WebCore is not the issue.

And as I said, I'm not home when the lockup occurs so I can't get into the past logs.

No, unfortunately. This has been asked before, and that was the answer.

But in order to prove which app it is I would have to remove them one at a time. And then I could only definitively say which it is if it doesn't lock up for like 2 or 3 weeks. That's a LONG process.
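To put rough numbers on the "2 or 3 weeks" figure, here is a minimal sketch. It assumes lockups arrive at random at an average of one per week (a Poisson model, which is an assumption for illustration and not something measured on this hub). It estimates the chance that a hub which still has the problem would go a given number of weeks without locking up purely by luck.

```python
import math

def chance_of_quiet_streak(weeks_quiet, mean_weeks_between_lockups=1.0):
    """Probability of seeing no lockup for `weeks_quiet` weeks even though
    the problem is still present, assuming lockups are random (Poisson)
    with the given average spacing. Both the model and the one-week
    default are assumptions, not measured values."""
    return math.exp(-weeks_quiet / mean_weeks_between_lockups)

for weeks in (1, 2, 3, 4):
    print(f"{weeks} week(s) quiet by luck: {chance_of_quiet_streak(weeks):.0%}")
```

Under that assumption, one quiet week proves little (about a 37% chance of happening anyway), while three quiet weeks leaves roughly a 5% chance of a false all-clear.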

To me that is a big downside. If you're going to have an open system that supports user-developed applications, having support tools to allow for debugging is an essential piece of that.

No, you could choose to remove them all and see if it happens again. If not, then add them back one at a time. Not a fun process, which is why I bought a second hub.

How does having a second hub help me?

That doesn't mean it won't lock up. The program is still running, and I think webCoRE has a memory leak or something, but we don't have tools to see that.

Because you could put the user code on the other hub, en masse or individually, to segregate loads and figure out which one is killing you. Assuming it is user code.

Or if you get to the point you have no user code on a hub, and it still dies, then it is clearly a hardware or base Hubitat problem.
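One way to picture "segregating loads" across two hubs is a halving search, sketched below. This is a swapped-in illustration, not anything Hubitat provides: it assumes exactly one user app is the culprit and that a lockup reliably shows up within each test window. The app names and the `locks_up` callback are hypothetical, and the extra load from hub-to-hub communication is not modeled.

```python
def find_culprit(apps, locks_up):
    """Narrow a list of suspect apps down to one by halving.

    `locks_up(group)` is a hypothetical callback that answers: did the hub
    running only this group of apps lock up during the test window?
    Assumes a single culprit; returns (culprit, number_of_test_windows).
    """
    suspects = list(apps)
    rounds = 0
    while len(suspects) > 1:
        mid = len(suspects) // 2
        first_half, second_half = suspects[:mid], suspects[mid:]
        # One hub runs first_half, the other runs second_half, for one window.
        suspects = first_half if locks_up(first_half) else second_half
        rounds += 1
    return suspects[0], rounds

# Hypothetical example: 8 apps, with "app_f" secretly causing the lockups.
apps = ["app_a", "app_b", "app_c", "app_d", "app_e", "app_f", "app_g", "app_h"]
culprit, rounds = find_culprit(apps, lambda group: "app_f" in group)
print(culprit, rounds)  # app_f 3
```

With 8 apps this takes 3 test windows instead of up to 8 when testing one app at a time. If neither hub locks up, or both do, the single-culprit assumption is wrong and the search does not apply.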

Anyway, I can't add anything else to this discussion, and don't want it to devolve into arguing.

I'm just reiterating that every single time this has come up before, support says to turn off user code and see if it still happens. You may or may not want to do that, and I understand. But there is no other enhanced logging that can be done, so it is going to be hard to figure out without doing so.

Good luck!!!

But then don't you also have the added problem of having to run the hub-to-hub communication, which increases the load and the chance of a lock-up? It seems that introducing another hub would just complicate the matter, not make it easier.

And that is what I am pointing out is the problem. You may not like it but that's a feature that I think the hub needs. If you don't like it, then don't comment. But telling me the same answer over and over again doesn't help.

Then every lockup would follow within a period of time after a webcore piston. I was away for 5 days before the hub locked up. No webcore piston executed within that time.

Not necessarily. Probably when the program is executing pistons it will lock up in maybe 2 days instead of 5. Just a theory.

I agree it is needed, I've said so many times. So send support a feature request... I did.

But as the product doesn't have that today, that will do zero in helping figure out your problem right now. Again, good luck! I hope you figure it out.

I don't see how an app that ran 2 days ago could still be persistent and cause a lockup that much later, when it doesn't cause one while I am home and it runs more frequently. It seems I would have seen it by now: when I'm home for a week, the pistons execute multiple times a day and I never get a lockup.

And spending another $80 for a second hub, IMHO, is a ridiculous recommendation. I shouldn't have to spend another $80 to get the full use out of my first $100 purchase.

You don't "have to" buy another hub - you can disable the user apps on your existing hub.

I was trying to give you a second option that may be less painful long term. Don't kill the messenger, I was trying to be helpful.

And don't worry - I won't be making any more 'ridiculous' recommendations on this thread. Good luck.

And you made it...several times. Thank you. Moving on....

Your pistons don’t have to be firing to cause issues. WebCoRE is a resource-intensive app and does a lot in the background. It’s been a while, but one thing to try is to limit the devices WebCoRE has access to, so it only sees the ones you actually use. That way it won’t subscribe to stuff you don’t use, which may help.

Sorry. When I think of a lockup I think everything is hung, so after a reboot you should be able to see what was last done before it hung. But I guess in your case logs are still being written during the lockup. You can try turning off debug logging for everything you don’t need it for, and leave it enabled only for key apps. If it hangs again, the log would hopefully not overwrite itself before you have a chance to get to it. Do that on an app-by-app basis and see if it helps. Beats having to disable apps while you work on this.

WebCore has access to exactly 4 devices. Since the only thing my pistons do is parse JSON data, a function HE can't do, they don't need access to a lot of devices. And when I am home, and they execute multiple times in one day, I don't get a lock up any sooner. So, you can keep making that case and in most other people's case you might be right but you are wrong here. I don't know how to say it any plainer than that.

I have to reboot the hub remotely via a Wi-Fi smart plug. As I am not home at the time (and won't be for several days), I have to reboot it, since I need the security part of Hubitat to work correctly. Otherwise the system isn't much good.