Feature Request: Additional system info

erktrek · January 18, 2019, 1:29pm

Every time I contact support they provide some excellent information that I can't easily get or put together myself. It would be helpful and may cut down on support tickets if such info could be put on a support page off of the hub info page or something.

Things like ghost/non-responsive devices, drivers/apps that are too spammy or taking up too much processing - anything that the first-level support response would look at. Maybe with thresholds to filter out the normal stuff.

I know I can get some of that already with some digging but having it consolidated on a page, highlighting the problem areas would be great and maybe save HE some $$$ in support costs.
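As a rough illustration of the "thresholds to filter out the normal stuff" idea, here is a minimal Python sketch. It is not Hubitat code (apps and drivers on the hub are written in Groovy), and the event list format, the device names, and the `flag_spammy_devices` helper are all hypothetical. It only counts events per device and keeps the ones above a cutoff.

```python
from collections import Counter

def flag_spammy_devices(events, max_events):
    """Return {device: count} for devices that logged more than max_events events.

    `events` is a hypothetical list of (device_name, event_description) tuples,
    e.g. something exported from a log; it is not a real Hubitat data format.
    """
    counts = Counter(device for device, _ in events)
    return {device: n for device, n in counts.items() if n > max_events}

# Made-up sample data: one chatty power meter and two quiet devices.
sample_events = (
    [("Power Meter", "power report")] * 50
    + [("Hall Motion", "active")] * 3
    + [("Front Door", "open")] * 2
)

spammy = flag_spammy_devices(sample_events, max_events=10)
print(spammy)
```

With a cutoff of 10, only the chatty device is reported and the normal traffic is filtered out. Picking a sensible cutoff would be the hard part in practice.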

keithcroshaw · January 18, 2019, 2:40pm

I know it could open a can of worms but I would feel much more in control of my system if I knew the CPU, RAM, individual App consumption (Might be a huge pain.), and the Z-Wave ghost info that @erktrek mentioned.

erktrek · January 18, 2019, 2:51pm

I assume the community is growing faster than the support team, so it's probably becoming harder and harder to deal with all the issues that come up.

There is probably a basic set of information that the Support staff uses over and over. That data likely points to solutions for most of the common issues. I know it has for most of my support tickets. If that data were provided up front, then possibly it would eliminate a chunk of tickets and empower the community to handle the low-hanging-fruit issues, freeing up Support to handle the more esoteric stuff.

bravenel · January 18, 2019, 2:58pm

Enough said. We aren't going in this direction, certainly not with respect to CPU, RAM, app consumption, etc. This is not information that we ever look at, it's not helpful or meaningful, and would create more problems and support issues than it would solve.

We will continue to improve error reporting to the Logs, and will continue to improve the Z-Wave implementation wrt issues like ghosts.

Yes, there is. It's custom apps/drivers identification -- that they are present. You have that information already. 90% of our support tickets involve custom drivers and apps. I hope this sheds light on what we're facing. We aren't able to support the debugging of custom software. We've added tools to help you do that, such as the ability to disable any app or driver. If you are having a problem, there are two basic tools: the Logs (where errors are shown --> these reveal bugs, for both built-in and custom code), and disabling custom code to isolate the source of the problem. Logs are the primary tool used by support, and the primary means that engineering has to identify issues.

erktrek · January 18, 2019, 3:03pm

I was looking for basic info like ghosts, maybe devices that are barely in range, stuff like that. Some sort of operational dashboard or something.

Just was thinking about all my queries to Support - hate to keep bugging them.

JasonJoelOld · January 18, 2019, 3:15pm

Maybe I'm just a jerk, but if I don't have tools to troubleshoot myself I have zero problem bugging support. That is the developers' doing, in my opinion.

That said my system is basically trouble free at this point - other than the 4 or 5 usability bugs I've reported to support (which are minor).

erktrek · January 18, 2019, 3:18pm

I keep fiddling with things which has been my downfall but I'm learning. Recently have gotten a lot more conservative though.

JasonJoelOld · January 18, 2019, 3:25pm

You have to appreciate that turning off apps is easy, but turning off drivers is not and has other implications.

It would be good if there was a better user led troubleshooting step for drivers other than "turn it off".

bravenel · January 18, 2019, 3:28pm

There is. Put debug code in the driver -- you have the source for it. Or ask the author to do that. Adding log.debug is the method we use when developing apps and drivers. Sometimes you just have to hunt down a problem in the code step by step, using log.debug to look at what's going on.
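The step-by-step approach described here can be sketched as follows. This is a language-neutral illustration in Python using the standard `logging` module, not Hubitat driver code (which would be Groovy and use `log.debug`). The `parse_report` handler and its `name:value` input format are hypothetical.

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("driver")

def parse_report(raw):
    """Hypothetical handler: turn 'temperature:21.5' into ('temperature', 21.5)."""
    log.debug("parse_report called with raw=%r", raw)
    name, _, value = raw.partition(":")
    log.debug("split into name=%r value=%r", name, value)
    if not value:
        # Without the debug lines, this branch would drop the report silently.
        log.debug("no value found, ignoring report")
        return None
    result = (name.strip(), float(value))
    log.debug("returning %r", result)
    return result

parse_report("temperature:21.5")
parse_report("garbage")
```

Each debug line records what the code saw at that step, so a report that is quietly ignored still leaves a trail in the log even though no error is raised.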

JasonJoelOld · January 18, 2019, 3:29pm

And I do. Extensively.

If all driver issues originate with an error, and not loading or other undiagnosable issues, then I agree.

bravenel · January 18, 2019, 3:34pm

There are any number of ways that code can go off the rails. Loading the driver is not one of them. It's just code that executes like any other, including apps. Not all driver issues originate with an error; it's just that an error is a big red flag that demands investigation and resolution. With many support tickets we see, there are errors all over the logs. If something isn't working right, look at the logs first. Often, you can catch the culprit making a null pointer reference, or some other basic code failure. The harder ones don't show any error at all, but don't work properly. That's where log.debug becomes the tool of choice. BTW, this is not something that our support team does.

cuboy29 · January 18, 2019, 3:51pm

Maybe an enhancement can be made to help. Add another tab to System Events to show errors ("System Errors") and make these error entries persistent until the user clears them. That way we won't have to constantly leave live logging open to catch errors.

bravenel · January 18, 2019, 4:02pm

Sorry, but no. Wouldn't you rather we invest our efforts improving the basic hub for everybody, as opposed to investing for people with buggy code?

gavincampbell · January 18, 2019, 4:42pm

Being able to port code is a blessing and a curse.

It's nice that we can port (fairly easily) code over to work in HE... but on the other hand, so much bad code is being ported over and marked as working without proper testing.

I was guilty of porting over everything when I first came to HE. But as I learned Groovy and wrote apps and drivers, I started to see how bad the code really was. Just because it looks like it's working for you doesn't mean it's a good app for the hub. There is a lot going on you don't see, and it's hard to test every method in the code.

There are drivers (supplied by manufacturers) that have bugs in the code where the Z-Wave configure commands actually send nothing due to bad logic. There are apps that send way too much data and 'discovery packets' and subscribe to way too many events that are not required. There are apps that have polling in them but don't recover properly after a reboot, and some that get stuck in loops that run on forever and too fast, which you would never see in the logs. Even worse, a lot of these drivers for popular things are running on unsupported/undocumented APIs and hacks to get them working. Not even the company would support it.
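To make the "loops that run on forever and too fast" problem concrete, here is a minimal Python sketch of a polling loop with two guards: a pause between attempts and a hard cap on the attempt count. It is an illustration only, not Hubitat code; the `poll_until` helper and the fake device are hypothetical, and a real app would more likely rely on the platform's scheduler than on sleeping.

```python
import time

def poll_until(check, interval_s, max_attempts, sleep=time.sleep):
    """Call check() until it returns True, pausing between attempts.

    Two guards keep the loop from running forever or too fast:
    a fixed pause between attempts and a cap on the number of attempts.
    Returns the number of attempts used, or None if the cap was hit.
    """
    for attempt in range(1, max_attempts + 1):
        if check():
            return attempt
        sleep(interval_s)
    return None

# Hypothetical device that only answers on the third poll.
answers = iter([False, False, True])
attempts = poll_until(lambda: next(answers), interval_s=0.01, max_attempts=5)
print(attempts)
```

Returning `None` when the cap is reached gives the caller a clear signal to log the failure and give up, instead of spinning silently.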

I have since rewritten almost every app I have. I always start with the original code, strip it all the way down, learn how it works, and build it up to what I need with no extra fluff. I have done this with every app I need and rewritten a bunch of apps from scratch (even replaced some of the built-in apps) and my hub is working perfectly. The WAF is now very high (except for my dome siren that dings multiple times) and she doesn't even realize how much is automated for her.

The best part about doing this is that I know it inside out. If something isn't working right, I don't need to capture debug logs, contact a dev, and troubleshoot. I jump right in and fix it myself right away. If I need a new feature, then it's usually as simple as adding a few new input statements and logic changes and BAM, done in less than an hour.

I know not everybody can do this but hopefully it sheds some light on what support has to deal with sometimes. They have to draw a line between supporting their hub and code, 3rd party code (which can be thousands of lines of code) and 3rd party hardware issues with zigbee/zwave meshes and bad devices. Yes HE has bugs too but they know the % that are app/driver related.

As a power user, a CPU % is always a wanted feature. But if I did have it and saw my CPU at 100%, the first thing I'm going to look at is my apps/drivers. Identifying the specific one would be a pain, which is where this would come into play. I bet 90% of users (especially when they start to get more non-power users on board) will never use this though.

It would be nice to have a central repository of ported code so people could work on it together etc... but I also found you spend so much time trying to convince devs why their way is causing problems, or that there are better/more efficient ways to do it, that sometimes it's just not worth it. I can rewrite it in less than half the time and it works great for me.

Just my thoughts. I understand both sides of the argument though.

gavincampbell · January 18, 2019, 4:42pm

Damn. These posts never look as long when I'm typing them out.

bravenel · January 18, 2019, 5:10pm

Gavin, Thanks for your story. I really relate to what you are saying, as this is the way I got into writing my own apps. Writing my own apps led to how most of the built-in apps on Hubitat came about. Rule Machine, Motion Lighting and Mode Lighting are apps that I've worked on for 3+ years. When someone reports a bug against one of the apps I wrote, I almost always know right where to look. There's a large enough body of code now that bugs are inevitable, but thankfully they are usually easily fixed. Adding features is often an exercise done in an hour, for the reasons you articulate well.

So my comments below are not really directed to you, but to the general audience who advocate for cpu % being displayed in the hub.

This is of almost no value in our system. You have multiple ways to know your hub has gone south. The vast majority of times that I've done it to myself, I've known right away what caused it -- that last change I made to the code I just installed! Oops, that recursion went off the rails, or whatever. I have enough experience as a developer to know what is risky (loops with logic for termination, recursion, etc.), so if I'm in a risk prone piece of new or modified code, and the hub starts acting wonky, I don't have time to worry about %cpu usage (which I can see if I want to). I want to kill that app asap before it crashes the hub.

I know from internal discussions that none of us ever look at cpu% as a useful diagnostic tool. Those of you who advocate for this are projecting from other environments where perhaps this was useful. Our hub is implemented in Java, running on the JVM. If you look at cpu% at the OS level you would see that java is running -- whoopee, I already knew that. You aren't going to have any visibility into the JVM and its threads. For whatever it's worth, what you would see is that java is consuming from 5% to 25% of cpu with occasional spikes higher, usually at less than 10%, where 400% would be max cpu utilization (quad core cpus). In short, it's not meaningful information.
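For readers unfamiliar with the "400% is max" convention, the figures quoted above can be converted to a share of the whole machine. A small Python sketch of the arithmetic, assuming the usual OS convention that 100% means one fully busy core (the `share_of_total` helper is made up for illustration):

```python
def share_of_total(process_cpu_percent, cores=4):
    """Convert an OS-level per-process CPU figure (100% = one full core)
    into a share of the whole machine's capacity."""
    return process_cpu_percent / (cores * 100) * 100

for reading in (5, 10, 25):
    print(f"{reading}% of 400% -> {share_of_total(reading):.2f}% of total capacity")
```

On a quad core, the quoted 5% to 25% range works out to 1.25% to 6.25% of total capacity, and the "usually less than 10%" figure to under 2.5%.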

You will just have to take our word for it that it is not useful in this environment (Groovy running on our hub). If we can't use it, and we spend 60+ hours a week developing and debugging in this environment, what makes you think it's going to be of use to you?

Similar remarks could be made about RAM utilization. In this environment apps and drivers are constantly being loaded and discarded from RAM, and there is caching taking place. There is no valuable information to be had, except for one piece: stack overflow. We now report that as an error. RAM exhaustion will typically occur well after other things have collapsed, for example, a run-away loop or recursion has brought the hub to its knees. The RAM footprint of these apps and drivers and hub platform are not great, not pushing the boundaries of the physical resources. UNTIL, the software goes off the rails. The whole way forward is for software to not go off the rails. When hubs fail, it's almost always from this cause.

As Gavin says,

csteele · January 18, 2019, 5:27pm

Gavin...
There are at least two attempts at building that in play now: a Wiki and a Community GitHub. Brian @bptworld handles the Wiki (www.HubitatApps.com) and I elected myself as the owner of HubitatCommunity on GitHub (HubitatCommunity · GitHub). I'll send you an invite and you can either put your code there OR create a README.MD that points to your real GitHub (or decline the invite).

gavincampbell · January 18, 2019, 5:33pm

Ya I've seen those. Wish it was more "official" but a great start. I don't publish my code mainly because I just don't have the time to support it. Most of it is just stripped down to work exactly how I want. But most of them can be done using other community apps too. I'm a big fan of what @Cobra has done with his apps.

I'm hoping these efforts will help with producing less buggy apps/drivers, though. That would really help the platform.

csteele · January 18, 2019, 5:54pm

Hubitat is working to a different plan, they have said. Given all they have to do, I have to believe it's quite low on the list. And personally, I'm ok with that. They added Import from a URL into both App Code and Driver Code and (again) for me, that's plenty, for today.

Because Hubitat is working to a different goal, they can't "endorse" either of the existing efforts. That's OK too. But the ultimate problem is... content. Neither attempt is getting much participation. That's not to say none, or not enough, just not as much as the Requests For would indicate.

My original thought was that if the code is here in some buried thread, then it's been "published" and it should be more easily found. Both the Wiki and the Community Github are simply that. But the uptake has been low. I think a lot are in your shoes. "I just cobbled this together, and I'm not a programmer, I don't want to support it." But the reality is, people have said "I use this" or "I migrated this" and yet it gets lost in all the rest of what goes on in a Forum.

I think that was nearly 90% of my intent when I clicked the button on Github to create HubitatCommunity. At the time, at least half of the threads where a chunk of code was 'published' were followed by "I hope you don't mind but I fixed a couple bugs in that and here it is over on yet another repository."

Bottom line is... the potential is there, but it's not as highly utilized as I imagined when I clicked the button. I encourage you to accept the invite and fix some code. Start with mine, ok?

keithcroshaw · January 18, 2019, 6:37pm

I haven't forgotten it.
Soon I will put anything I feel useful up there.
I haven't had a lot of personal time lately that can't be filled with enjoying or taking care of family.