Seeking help for C7 unresponsive

I would disable all the ecobee stuff and see if that helps. I feel like a lot of people who have lock up issues are using the ecobee integration.

1 Like

Thank you. Have done that before, disabling all: ecoBee, webcore, airthings and such. Everything was disabled and then turned back on in varying combinations. I can confirm the ecoBee helpers definitely needed to go.

Curiously, there was a recent 3 day stint - yay, but not repeated in months - when Adguard Home (pihole alternative) was disabled. I am testing ecoBee left out of Adguard to see if it changes anything (reports stated nothing was blocked so this is a double check).

Please know reverting to 236 was promising, until webcore was failing. The webcore downgrade did not seem to work well. The latest 238 is now installed.

A quick update... The upgrade to 2.3.128 seems to have stabilized things. Now testing adding an ecoBee contact helper back.

Should the contact helper work, it can mean everything is back to normal :slight_smile:

Here's hoping!

Thank you for the support.

1 Like

Update...

  1. Just in... the C7 on 2.3.8 went down within 6 hours of enabling the ecoBee Suite Contact Helper, which had run reliably on the C7 with 2.3.6 for over 10 months :frowning: Now testing its removal to see whether the multi-day uptime resumes.

  2. A new webcore piston using ecoBee Suite devices was introduced last week and ran fine during the 3+ day uptime. It is a webcore "smart circulation" piston using the ecoBee Suite devices. Assuming things restore, a "contact helper" piston may need to be tested.

QUESTION: What can be done to get C7 back on 2.3.6 and its HE webcore working?
The 2.3.6 downgrade broke HE webcore on the C7: webcore would not load, telling me to log out and back in to fix it, in a repeating loop. Once the C7 was restored to 2.3.8 its webcore was accessible again, yet the paused pistons needed some of their devices corrected. Please know there is also an HE webcore on the C8 (v2.3.8).

Not sure since I do not use webcore, but I think there is a way to back up all your pistons? You might need to do a backup, delete the app from the hub, reinstall the app, then restore the backup? Hopefully someone else who uses webcore can chime in on whether this is safe to try.

1 Like

Well, with the contact helper disabled and, hopefully, a new WC piston equivalent in its place, the C7 has been up for double the time (12h) compared to when the contact helper was enabled (6h). The ecoBee Suite's devices are still being used. A sample of the piston expression:

!contains([EcobeeTherm: ecobee:equipmentStatus], 'fan' ) && (abs([EcobeeSensor: Desk (QBZN) : temperature] - [EcobeeTherm: ecobee : temperature]) > {MAX_DELTA})

Hopefully, this can:

  • perhaps point to what the C7 change was between 236 and 238
  • perhaps help someone else that is encountering this issue
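For readers who do not use webcore, the piston expression above can be read roughly as the Python sketch below. The function and parameter names are made up for illustration, and it assumes `contains` is a plain substring check; this is not how webcore evaluates it internally.

```python
def should_circulate(equipment_status: str,
                     sensor_temp: float,
                     therm_temp: float,
                     max_delta: float) -> bool:
    """Mirror of the piston condition (hypothetical names).

    True when the fan is NOT already listed in the thermostat's
    equipment status AND the remote sensor differs from the
    thermostat by more than max_delta degrees.
    """
    # Assumption: 'contains' behaves like a simple substring match.
    fan_running = "fan" in equipment_status
    temp_gap = abs(sensor_temp - therm_temp)
    return (not fan_running) and temp_gap > max_delta
```

So a 3 degree gap with the fan idle would trigger circulation, while the same gap with the fan already running would not.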

For now, ecoBee Suite helpers are not being used, and things seem to be working. @jtp10181 perhaps I can test 236 with the piston backup this weekend. Thank you for the idea, I had not considered the remove and restore from a WC backup.

Otherwise, I can report back over the next while, hopefully to share that the C7 has been responsive with long uptime :slight_smile:

1 Like

ecoBee or Airthings may not be the problem on the C7.

ecoBee and Airthings were moved to the C8, and removed from the C7, and the C7 still went down after a day.

Summary of new setup:

  • moved ecoBee and Airthings off C7 to a C8
  • C7 now has webcore for light and motion pistons, Lutron, Hue, zwave motion/sensors, and the HubitatAlarm
  • C7 hosts HubitatAlarm (DSC alarm part is on a separate PC/SBC) and shares some contact sensors to C8. C8 uses these sensors in some WC pistons.
  • C7 does share some other Zooz SE18 and SE11 to C8.
  • Confirmed no jumbo frames on a NAS, or used by anything else, i.e. MTUs are all default 1500.
  • C8 has been up for days now.

Is it still the case where you can access the diagnostic port to reboot?
That basically means the main platform / UI is crashing.

Try a different power supply and cable, just to be sure it's not some sort of underpower issue.
Do you check the temps at all using Hub Info? If temp gets too hot it can cause CPU issues, which might lead to the platform crash.

Otherwise I am thinking you should downgrade to 2.3.6 and see how that works.

1 Like

Yes, you understand correctly: 8081 works, UI/main platform unresponsive.

Will do for the PS and cable. The C7 original PS and cable are still in use. These I can test soon to rule out.

As to the CPU temp, I can confirm the temp is in the "green". I can also confirm the CPU usage does start spiking, even significantly. It was observed above 0.9, and even 1.8+. All per hub-a-dashery.

I can also confirm your idea to restore pistons from a backup works. It was used to convert ecobee and airthings from C7 to C8; it is a lot of work. So, the remaining complete test to downgrade to 2.3.6 again is being put off for a bit due to the light/motion pistons to restore.

For what it is worth, I have considered attempting to shape the IP tx/rx traffic to the C7. My suspicion is there is some change after 2.3.6 in this area. This past weekend, while installing a new switch, there was a significant load observed on the C7 CPU (the UI was not responsive, and the zwave devices were sluggish), which it recovered from, yet memory did not recover. Additionally, I noticed before moving Airthings off the C7 that CPU load on the C7 increased when the Airthings IP connection was being converted.

Thank you for the thoughts!!

Interesting... Two hue C7 bulbs were mesh shared to the C8 for use in a piston. The piston uses an ecoBee temperature sensor's motion on the C8 to turn the bulbs on and off.

The C7 did not stay up 3 hours when this was implemented. ecoBee is on C8, and not shared to the C7. Crazy...

The C7 has other hue bulbs tied into its own webcore piston and stays up.

I might try the other way - ecoBee sensor shared to C7 and C7 runs the piston.

So... should two C7 Hue bulbs mesh shared to a C8 create high RAM usage: 304MB? In an attempt to fix the above, the bulbs were removed, and the C7's available RAM improved to 441MB.

Otherwise, the morning will hopefully result in the C7 being up and responsive :slight_smile:

RAM went from 304MB to 441MB, or it increased by 304MB, meaning it was at 137 prior?
304MB vs 441MB free RAM won't make any difference. Technically 137 should be fine as well, but sometimes when you get to that point (below ~150ish) the hub starts to get sluggish.

2 Likes

Sorry for being confusing...

The remaining RAM has typically stayed at about 290 on the C7 before all this started. Removing ecoBee and Airthings from the C7 improved RAM to 304MB; not as much as hoped, yet it is what it is.

Then, with the removal of the shared bulbs the free RAM jumped to 441 for several hours. It seems the shared hue bulbs to the C8 piston were using more RAM than ecoBee and Airthings on the C7.

FWIW, the morning did not go well. So, more pistons are disabled to see if the C7 can stay up longer than a few hours.

Aside... my wife and I love what Hubitat does and offers. So, we are getting a C8 Pro in hopes of migrating the C7 to the C8 Pro. It will be interesting to see if the problem follows or remains on the C7.

Some good news, ecoBee and Airthings on the C8 are working amazingly well. Yay.

Just FYI there is a potential fix for the unexplained hub crashes that is being beta tested right now. So if you wait it out a short while longer, it might fix it.

4 Likes

Will do :slight_smile: Thank you for heads up.

Okay different update:

  • Migrated C7 to a C8 Pro 2.3.138
  • C8 Pro does not have the same symptom of unpredictable unresponsiveness; it functions fine.
  • C8 Pro may have some Homekit Migration issues… working on it.
  • C8 has been running ecoBee and Airthings functions fine: been up for days, until it was rebooted to install 2.3.138.

I may shut down the C8 Pro to put the C7 back online to see if the new 2.3.138 helps. Not sure if it will be done. For now, enjoying the relaxation of not having to monitor things.

1 Like

It has been a bit. The title should change; however, it is kept the same to share these observations in case they help. The C7 was replaced with a C8 Pro. Effectively everything remains the same, as a migration backup was used to move to the C8 Pro. Still running 2.3.8.

The C8 Pro has been working and stays up for days, which is great. The observations, after about 5 days:

  • no change in CPU usage
  • no change in memory usage
  • zwave Zooz SE11 devices tied to a webcore piston start to become slower in response to motion: motion, or the lack of it, turns a Meross switch on and off
  • zwave Zooz SE11 devices tied to a webcore piston start to become slower in response to motion: motion, or the lack of it, turns a Lutron switch on and off

If the C8 Pro is not rebooted, then eventually zwave seems to become unresponsive with webcore pistons.

Question: is there a metric that can be watched other than memory and CPU? There is obviously something that is not working, and it would be great to monitor it and at least automatically reboot when performance degrades.
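This does not answer which hub metric to watch, but the "reboot when performance degrades" idea could be sketched along these lines. It assumes the delay between a motion event and the switch turning on can be measured somehow (for example from log timestamps); the class name, baseline, and thresholds are all hypothetical, and the actual reboot call is deliberately left out.

```python
from collections import deque
from statistics import median


class LatencyWatchdog:
    """Flags sustained slowdown in motion-to-switch response time.

    baseline_s: typical response time in seconds when the hub is healthy.
    factor:     how many times slower than baseline counts as degraded.
    window:     number of recent samples the decision is based on.
    """

    def __init__(self, baseline_s: float, factor: float = 3.0, window: int = 5):
        self.baseline_s = baseline_s
        self.factor = factor
        self.samples = deque(maxlen=window)

    def add(self, latency_s: float) -> bool:
        """Record one measured latency; return True if a reboot looks warranted."""
        self.samples.append(latency_s)
        return self.degraded()

    def degraded(self) -> bool:
        # Wait for a full window so one slow event cannot trigger a reboot.
        if len(self.samples) < self.samples.maxlen:
            return False
        # The median ignores isolated spikes and reacts to sustained slowness.
        return median(self.samples) > self.baseline_s * self.factor
```

Using the median over a window rather than a single reading means one slow motion event will not trigger anything, only a run of them.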

Aside, if things go right I plan to have grafana and influx set up to capture data.
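As a rough idea of what capturing hub data for influx could look like, here is a small sketch that formats one reading as InfluxDB line protocol (`measurement,tags fields timestamp`). The measurement, tag, and field names are invented for illustration, all fields are written as floats, and the measurement name is assumed to contain no spaces or commas.

```python
def to_line_protocol(measurement: str, tags: dict, fields: dict, ts_ns: int) -> str:
    """Format one data point as an InfluxDB line protocol string.

    ts_ns is a Unix timestamp in nanoseconds (the default precision).
    """
    def esc(value) -> str:
        # Tag keys/values and field keys must escape commas, equals, and spaces.
        text = str(value)
        for ch in (",", "=", " "):
            text = text.replace(ch, "\\" + ch)
        return text

    # Sorting keeps the output stable between runs.
    tag_part = "".join(f",{esc(k)}={esc(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{esc(k)}={float(v)}" for k, v in sorted(fields.items()))
    return f"{measurement}{tag_part} {field_part} {ts_ns}"


# Illustrative values only, not real measurements.
line = to_line_protocol(
    "hub_stats",
    {"hub": "C8 Pro"},
    {"free_mem_mb": 441, "cpu_load": 0.9},
    1700000000000000000,
)
```

Tagging each point with the hub name would let one grafana dashboard compare several hubs side by side.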

Thank you.

1 Like

Probably should split this out into its own thread, but in the meantime can you post a screenshot of your Zwave Details page?

4 Likes

I can create a new topic in a bit... I think I have influxdb and grafana working in a docker-compose file. If it works out I can start the post with that and seek items to monitor.

Please find the ZWave details. I do not see anything suggesting a ghost. Please let me know. Thank you for the guidance.

1 Like

0x1a and 0x2b seem to be switching routes more often than the rest of your mesh, but not in the extreme, and nothing that looks to be an issue. Might check the device tab under logs to see if you have a device that seems particularly chatty.

1 Like