At the current time, for a number of other reasons, most of the smarts I originally had on my HE have been moved over to a microcontroller: a Teensy 4.1, if you're curious. Pretty much all that's left on the HE are 10 Rule Machine rules to forward Z-Wave switch presses to the Teensy, and a few devices exposed via Maker API so the Teensy can control them.
Other than that, the HE is idle. Yet it still managed to do this:
and in that condition it stopped relaying Z-Wave switch presses. Other than completely removing the HE from the picture, and replacing it with something like a HomeAssistant instance, what options do I have to keep the HE running smoothly?
WAF takes a very bad hit in cases like this, when every smart bulb in the house stops working. As I'm sure you can understand, this is something I really want to avoid.
Every type of Z-Wave device I have here is represented. I've got Zooz ZEN 71s using the "Zooz ZEN Switch Advanced" driver (e.g. device 278), Zooz ZEN 34s using the "Zooz Remote Switch ZEN34 Advanced" driver (e.g. device 394), and a Honeywell T6 Pro Z-Wave thermostat using the "Generic Z-Wave Thermostat" driver, which isn't on that list, but it's throwing these errors as well.
All of those alerts are in the "parse" method. That tends to be related to calls that send updates into the hub. If you are getting excessive events and they are hitting that method, I have to wonder what you are doing that makes the hub constantly parse data. Are you frequently refreshing a switch or making calls to some kind of endpoint?
Edit: all of the Zooz stuff is Z-Wave 500/700, so polling wouldn't be needed. I use those drivers myself for the same switches in my setup and don't get those errors. Do you have any RM rules that are doing any action on those devices on a regular basis, like a poll or refresh?
Can you provide some context around what you have set up with Maker API? It isn't hard to do bad things to the hub from external systems via Maker API. I have blown up my hub with a single device in Maker API.
I have Zooz Zen34 remotes, and they occasionally do this weird thing where they become unresponsive and cause all hell to break loose. I found it only happens when they are too far away from their usual location. Not sure why, but it's definitely a thing I've noticed. Sometimes they also blast out their battery status every few seconds causing the hub to slow down.
Honestly, between these and a multitude of other Zooz products I have, I'm starting to think maybe Zooz is rubbish. I have generally not been super satisfied with the reliability of their stuff.
Perhaps @jtp10181 can shed some light on those errors in his driver. I looked at the code briefly and it looks like those errors may be related to bad data being stored for the switches. If you pull up the switch details page and scroll down, what do the states and data values look like? Do you see anything that doesn't look right?
This switch was having issues and showing it was a UNK00 device type. I had to factory reset it and then use the swap device procedure to get it paired properly. I think it was related to either me flashing it and not doing an exclude and include after it was updated, or the pairing after the flash not fully completing.
I wouldn't say your hub is idle at all. On the contrary, it is trying to process a lot of events. As the logs you shared show, you have devices generating consecutive events every few milliseconds. Are you asking the Teensy to check on your devices? If so, how often? You may want to enable debug logging on Maker API to see how often it is triggered.
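For anyone who wants to quantify "consecutive events every few milliseconds", here is a rough Python sketch of a sliding-window check. It assumes you have already pulled (timestamp, device id) pairs out of the logs yourself. The function name `find_floods` and the thresholds are made up for illustration, not anything the hub provides.

```python
from collections import defaultdict, deque


def find_floods(events, window_s=1.0, max_events=5):
    """Return the set of device ids that exceed max_events within window_s.

    events: iterable of (timestamp_seconds, device_id), sorted by time.
    The thresholds are illustrative guesses, not hub limits.
    """
    recent = defaultdict(deque)  # device id -> timestamps inside the window
    flooding = set()
    for ts, dev in events:
        q = recent[dev]
        q.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while q and ts - q[0] > window_s:
            q.popleft()
        if len(q) > max_events:
            flooding.add(dev)
    return flooding
```

With input like that, a device reporting six times in 50 ms gets flagged, while one that reports twice ten seconds apart does not.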
There's something bizarre about your setup. I have a similar setup with two Hubitats, and the majority of my automations are done using Node-RED via MakerAPI. I have ~100 devices (70 physical) on one, and about ~120 devices (40 physical) on the other. I've never seen the errors you have.
Are you polling from the Teensy? If so, there is no need to do so, because Maker API can push device events as updates using HTTP POST.
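As a rough idea of what the receiving end of that push could look like, here is a minimal sketch in Python (on the Teensy it would be the C++ equivalent). It assumes Maker API posts a JSON body shaped like `{"content": {"deviceId": ..., "name": ..., "value": ...}}`, so check a real payload before relying on it. `parse_event`, `EventHandler`, `serve` and port 8080 are all made-up names and values for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_event(body: bytes) -> dict:
    """Pull the interesting fields out of one posted event.

    Assumes a JSON body like {"content": {"deviceId": ..., "name": ..., "value": ...}}.
    """
    content = json.loads(body).get("content", {})
    return {
        "device_id": str(content.get("deviceId")),
        "name": content.get("name"),
        "value": content.get("value"),
    }


class EventHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the sender declared.
        length = int(self.headers.get("Content-Length", 0))
        event = parse_event(self.rfile.read(length))
        print(event)  # react to the event here instead of polling for it
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Listen for pushed events until interrupted (port is hypothetical)."""
    HTTPServer(("0.0.0.0", port), EventHandler).serve_forever()
```

Calling `serve()` starts the listener. The URL of that listener is what you would enter as the POST target in the Maker API app.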
The reason the switch was getting hammered there was because my wife went into the master bathroom, and tried to turn on the lights. Which weren't working, so of course she tried pressing the switch paddle several times. Then I tried the bedroom lights a couple of times: dev 394, and of course those didn't work either.
Normally, those lights aren't generating any events at all, so I did a deep dive and found where it started: it's the Thermostat. It reports whenever the temperature or the humidity changes. This shows where the problem started:
I was working the various lights about 40 minutes earlier, because I was transitioning from the "test" version of my control software running on my desktop to the production version running on the Teensy, so the IP address in all of the RM rules to make HTTP calls needed to be changed, and I wanted to verify everything was working. Even so, that's no more than twenty or thirty button presses over a couple of minutes, finishing as you can see in the Master Toilet at 18:15H. And then at 19:03H the Thermostat starts erroring out, after which the entire Z-Wave system stopped producing events.
You can stop reading here unless you're curious enough to want to know the exact setup.
Still here? OK. Right now the HE is nothing other than a Z-Wave and Zigbee gateway to interface those two systems to the brains of the system running on the Teensy. When a Z-Wave switch is pressed, it triggers RM, which makes an HTTP call out to the Teensy, which then does all the heavy lifting of turning lights on and off. Most of the bulbs are Kasa TP-Link, so they're WiFi and the Teensy can control them directly.
However Zigbee devices, of which I have a few, are controlled via the exact reverse path: the Teensy makes an HTTP call into Maker API which in turn controls the appropriate Zigbee switch.
And then there's the Thermostat. All that happens there is that every five minutes or so, the Teensy calls Maker API to get thermostat status: operating mode, temperature etc. etc. etc. And on those rare events (maybe four times a day at maximum) when Teensy needs to change the thermostat mode, it'll again hit the Maker API to switch from (e.g.) heat mode with a setpoint of 68F to cool mode with a setpoint of 76.
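For reference, the thermostat calls described above would look roughly like this, sketched in Python rather than the Teensy's C++. The URL layout follows the usual Maker API pattern of `/apps/api/<app id>/devices/<device id>/<command>/<value>`. The hub IP, app id, device id and token below are placeholders, and `maker_url` is a made-up helper name. `setThermostatMode` and `setCoolingSetpoint` are the standard thermostat commands, but confirm them against the command list on your device page.

```python
from urllib.parse import quote


def maker_url(hub_ip, app_id, token, device_id, command=None, value=None):
    """Build a Maker API URL for a status read or a device command.

    With no command this is a status request; with a command (and an
    optional value) it is a command request.
    """
    path = f"/apps/api/{app_id}/devices/{device_id}"
    if command is not None:
        path += f"/{quote(str(command))}"
        if value is not None:
            path += f"/{quote(str(value))}"
    return f"http://{hub_ip}{path}?access_token={token}"


# Placeholder values, not taken from the setup described above.
HUB, APP, TOKEN, THERMOSTAT = "192.168.1.50", 12, "TOKEN", 401

status_url = maker_url(HUB, APP, TOKEN, THERMOSTAT)
mode_url = maker_url(HUB, APP, TOKEN, THERMOSTAT, "setThermostatMode", "cool")
setpoint_url = maker_url(HUB, APP, TOKEN, THERMOSTAT, "setCoolingSetpoint", 76)
```

A plain GET on `status_url` every five minutes is the polling case. The other two are the occasional mode and setpoint changes.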
I think the problem is that there is clearly more going on than what we are hearing. The hub can handle well over 100 devices, but it can also be overwhelmed by poor code or a bad implementation.
One thing I would try is to let Maker API send updates to the Teensy as well as receive commands from it, and stop all polling. Maker API can easily do that. Preferably you want the hub to react to events, so if at all possible let it send out updates when needed and receive commands when needed. Scheduled repetitive stuff slowly adds up. It can also be very detrimental to have commands come in to the hub too often.
Is that error message the only indication of a load issue? Do you see any alerts on the devices in the HE UI that indicate heavy load?
Are those errors still occurring? How busy do the logs look now? If there is as little on this hub as you mention, you should be able to turn logging on for everything so we can all see how busy it is. It would be nice to see all of that so we can see everything that is happening.
The only slight surprise is that Carriage Light is routing through the Master Toilet, since Carriage Light is almost twice as far from Master Toilet as it is from the hub itself. But whatever, it all works.
In terms of geography, Master Toilet, Master Bathroom and Bedroom switches (including Maralen and David) are all in relative close proximity to one another, Hall Bathroom and Carriage Lights are elsewhere in the house, and the Thermostat is just across the main hall from the HE. Maximum distance to any of them is no more than 30 ft, although there are a few walls getting in the way.
That's it. Trust me when I say there is nothing going on. When you say "stop all polling" what are you referring to? The only "polling" that I'm aware of is the Teensy pinging the HE every five minutes for Thermostat status. Other than that, everything should be event driven, in response to either a Z-Wave event from a device, or an HTTP call being made to Maker API.
How exactly are you anticipating I should remove RM and use Maker API. A typical RM rule uses a button press on a switch as a trigger, and then when that happens it makes an outbound HTTP call from the HE to the Teensy. I was under the impression that Maker API could only work with an inbound HTTP call from some other source on the LAN to the HE. I'm not seeing how this could be used to relay a switch button press from the HE to the Teensy.
The system had been complaining of heavy load for several months when I had all the brains on the HE; that's part of the reason I offloaded it all to the Teensy. There were several apps I'd written to do what is now being done by the Teensy: light control, turning the Carriage Lights on and off at sunset and sunrise, etc. etc. etc. As far as I know they were well behaved, in that they only responded to "input", e.g. a switch press or a variable change via a variable connector, did their work to set lights as appropriate, and then went back to sleep. But in any case, all those apps are now gone.
Alert is a firmware update, I can do that. Is there a major Z-Wave overhaul between 22.214.171.124 and the current 126.96.36.199?
How often do I reboot? Very rarely. I've been assuming that the HE software is well written and is capable of running for months at a time without needing any intervention.
No other devices, what you see on the device screen is all that's there.
I would definitely do that Z-Wave firmware update on the Z-Wave page. It is different from the hub firmware. There were significant improvements to Z-Wave radios with that update.
How many events are you storing for that thermostat? I would dial those way back; there really isn't a reason to store hundreds upon hundreds of thermostat events. In fact, I think it is good practice in general to limit events unless you really need them for some odd reason. Nobody in their right mind analyzes how many times you turned on a light in a month, so these events are mostly clutter.
And yes there are LOTS of updates in the latest firmware. There are even LOTS of updates in the 2.3.4.xxx series after the 188.8.131.52 update you are on. I would advise updating.
Edit: in particular...
* Possibly fixed elevated/severe hub load experienced on previous 2.3.4 builds.
* Fixed high CPU usage/memory leak under certain conditions.