2.2.4.147 Broke Scenes

I don’t have to; I had an open ticket for this when I migrated to the C-7. The fix was to do just this, which fixed everything except one device, and the fix for that one device was to perform a Z-Wave repair. Others did this and it fixed their issue too. As the source is closed, we have nothing to go on but experience, so we share our experiences and our known fixes. You have the benefit of knowing the underlying system; we can only guess at what the features do.

We post in these forums because the system is behaving oddly. We are seeking help from anyone who is willing to offer it. Some of us have experienced similar issues and share our solutions. My issue is mostly documented here, but there was also a PM thread to work through it. Since @r.p.ulivella commented with an issue similar to mine, I am sharing how mine got fixed.

You may feel I am attacking Hubitat. I am not; I am trying to help.
You may feel I am wrong. I probably am; we don’t have the insight into what the system is actually doing to know.
You may feel responses like this are counterproductive, and they may be. But when the response is that nothing was changed and we are left to solve the issues ourselves, we derive what we can from what is available to us. Responses like that also turn a blind eye to regression bugs: bugs introduced by other, unrelated changes that have a negative impact on a subsystem that wasn’t even touched.

So, what else are we to do, than to share with others what has worked for us?

Not in the least. But what you are suggesting makes no sense. It's not my role to either bash or just allow wrong information to be left here as fact. Having a ticket doesn't mean that your diagnosis was correct. So it's not really helping others when what you say isn't based in fact. There are lots of things that get said in this forum that are tantamount to "black magic" -- something seemed to solve a problem but no one involved actually knew what they were doing.

Truly get to the bottom of a problem, and don't post "solutions" that aren't tested in a confirmable way. There is a definable set of things that can go wrong with a hub when restoring a backup. Having to touch every Scene after restoring a backup is not one of them unless the prior state of those Scenes was itself corrupted. You don't have to believe me, but this is very testable. Try it yourself, I did.

I have one data point in this area.

I recently did a soft reset and restore backup. I have two scenes I use daily for setting blinds. One to open them in the morning, one to close them in the evening. Those scenes have continued to work without modification or change or me re-saving them since the restore of the backup.


I did this based on one of your earlier posts in this thread. It seemed at first to have improved things but I definitely still have broken scenes.

I expected scenes to be quite lightweight, and I prefer that to burying the recording of a "lighting design" within a rule, which I prefer to keep cleanly focused on logic and function. I looked through the Application State pages as you suggested and could not find anything obviously erroneous there, based on what I could decipher of the data (it was quite simple to read).

The screenshots below are taken from the state pages of two scenes (one large, one small) that exhibit this failure. The small scene contains only ZWave devices. Some are S2, some are older with no security. I mention this because it is consistently the S2 devices (paired as non-authenticated) that show the worst symptoms, while the older-style ZWave devices consistently work correctly. All of the devices in the simpler scene were part of the restore from backup, followed by a manual reconnect of the ZWave devices to the new C-7 per your instructions elsewhere on this forum.

The larger scene has devices of different types: ZWave (the majority, about 50% S2 non-authenticated and the rest older-style ZWave), WiFi (local, no cloud devices here), and ZigBee. Strangely, this scene fails in all sorts of ways, but only one or two devices at a time. Those are sometimes ZWave, sometimes WiFi, and rarely ZigBee, though there are only two ZigBee devices, so the odds are in their favor. The other strange thing about this scene is that, on repeated attempts, I have seen some lights dim to values from ancient versions of the scene. For example, there used to be a WiFi desk lamp in this scene at some non-zero value. Then I removed it completely from the scene. When I run the scene, sometimes that WiFi lamp does nothing, as expected. Sometimes it turns off (strange). Sometimes it goes up to that non-zero value from a version of the scene that's over a year old. Yes, this is erratic behavior, and I wish I had more to give you to help troubleshoot it. I do have "ignore activate switch off" checked for all of my scenes.

Last but not least, I should mention that if I send several different lights to different values from a rule (including some of the same lights that seem to suffer from this issue within the scenes in question), I get quick responses from my devices, as I would expect.

Also, @bravenel: thanks for your time, support, and the creation of a fantastic system. I've been a happy Hubitat owner for a long time now (since the C-3 first became available) and am a much more enthusiastic HA user in general as a result of Hubitat. This is my first interaction with you directly, so I have to say thanks while I have the moment!

Ya, that's a butt load of devices; I am not at all surprised it is failing. You are overwhelming your mesh trying to send that much all at once. Maybe use a rule instead, or something like that, where you can send commands to 5-10 devices and then add a delay to give the mesh time to deliver/recover. Or fire off multiple scenes or groups with delays between them.

My biggest group is 2 dimmers plus 8 light switches.

It works fine. There probably should be limits on how many devices scenes or groups can control at once, but I'd guess the setting would be arbitrary and vary with hub and device configuration. Still, I'd imagine that with too many devices you could easily overload any hub's spectrum.

I have years of history to the contrary. Until I moved over to the C-7/2.2.5, all of my scenes functioned reliably, with not a single device being forgotten (or being sent the wrong value, which to me is even stranger). So I have to expect that the issue lies elsewhere. In the past I had more ZWave devices, as my entire HA environment was ZWave, with no WiFi or ZigBee to share the load. The fact that scenes worked on the older hub with older 300 series ZWave devices suggests the problem is something else. Whether it is with scenes or not I am not sure, which is why I'm posting so much here.

This also does not explain why the same problem is seen with WiFi and ZigBee devices. Also, if this were a ZWave mesh problem, I should be suffering from other ZWave mesh problems, no? I have mostly 500 series devices and a really well-distributed mesh around my apartment. I am also experiencing this issue with scenes that contain only 3–4 devices as well as scenes that contain a dozen; the number of devices does not matter. In fact, a scene with only one ZWave device and one WiFi device exhibits this strange behavior with both devices within 6' of the hub.

Lastly, I'll add that I see in the logs that Hubitat either doesn't send an instruction to the forgotten device(s) at all or sends the wrong instruction. I do not see the hub attempt to send a message that then quietly fails with the device not responding; I just see no message sent. That could be how scene actions are logged, or it could be how ZWave mesh overloads quietly fail. But then again, why do I see the exact same forgotten/incorrect values with WiFi and ZigBee devices?

I tried making a rule that duplicates the effect of a big scene by sending lots of devices instructions all at once. The result is perfect performance. So there is a difference. I do not like this because I prefer to separate concerns: scenes record/recall state, rules express logic and function… However this might be my only solution since I have scenes failing that only contain two devices.

Perhaps there is an opportunity here to add to scenes a thin layer of smarts where scenes can optionally "confirm" that they're correctly completed or (and this is probably better as it is lighter weight) expose a way of partitioning scenes into separate chunks to be run with a 0.5s delay between for mesh reasons?
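The chunking idea proposed here (and the 5-10 device batches suggested earlier in the thread) can be sketched in Python. This is not Hubitat's Scene code or API: `send` is a hypothetical stand-in for whatever issues one command to one device, and `chunk_size=5` and `delay_s=0.5` are illustrative values taken from the posts, not tested recommendations.

```python
from __future__ import annotations

import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> list[list[T]]:
    """Split items into consecutive chunks of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def run_scene_in_chunks(
    commands: Sequence[tuple[str, int]],
    send: Callable[[str, int], None],  # hypothetical: issues one (device, level) command
    chunk_size: int = 5,
    delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send commands chunk by chunk, pausing between chunks; return the number of pauses."""
    groups = chunked(commands, chunk_size)
    pauses = 0
    for index, group in enumerate(groups):
        for device, level in group:
            send(device, level)
        # No pause is needed after the last chunk.
        if index < len(groups) - 1:
            sleep(delay_s)
            pauses += 1
    return pauses
```

Passing `sleep` in as a parameter makes the pacing testable without actually waiting. On a real hub the delay would come from the platform's own scheduling rather than `time.sleep`.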


Can't prove a negative. Scenes does not log the commands it sends, and you'd have no way to see them but for the events of the devices that receive commands. So if a device doesn't show the command in its logs or events, we don't really know whether or not Scene sent the command, or if something else happened. Perhaps logging should be added to Scene-1.2.


This is really helpful insight. Based on what I’m seeing in the logs, it did look like the scene was logging its instructions, but what I’m actually seeing is the device traffic.

@bravenel, based on what you know about this mystery, if you were experiencing it, what would you do next? I feel like I’ve exhausted what can be done examining the scenes themselves, and my mesh appears robust in all other respects.

We are looking into it -- it's not at all clear what is going on.

Look at the device events (button at top of each device page). Tie together those events to Scene activation time. Are there events for each device activated by the Scene? Or are some missing? This is a critical piece of information.
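One way to tie device events to the Scene activation time is sketched below in Python. It assumes you have copied each device's events (device name and timestamp) off its Events page by hand; no export API is implied. The 10-second window after activation is an arbitrary assumption.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def devices_missing_events(
    scene_devices: list[str],
    events: list[tuple[str, datetime]],
    activated_at: datetime,
    window: timedelta = timedelta(seconds=10),
) -> list[str]:
    """Return the scene's devices that logged no event within `window` of activation."""
    responded = {
        device
        for device, timestamp in events
        if activated_at <= timestamp <= activated_at + window
    }
    # Preserve the scene's device order so the report is easy to compare.
    return [device for device in scene_devices if device not in responded]
```

An event logged before activation, or long after it, does not count, so a device whose only events fall outside the window is reported as missing.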

What I'd do next and what I can suggest to you are two different things. Based on what you've said, I would suggest using a rule since that seems to work for you. But, keep those scenes around. We will figure out what is going on... Logging in Scene-1.2 will help, but so will the answer about device events.


Is there any discernible pattern to the types of devices that fail? Zigbee, Z-Wave? CT bulbs? etc.

I’ll do this the next time I have a chance to fiddle, in the next few days.

Wow, I’ve asked myself this question so many times while staring at the issue. From what I can tell, the devices that have the most trouble are ZWave dimmers. Most are paired S2 without authentication and contain 500 series chipsets. But I have seen the issue occur with all types of ZWave dimmers.

The only ZigBee devices that have experienced the issue are a pair of Hue GU10 CCT-adjustable bulbs in a group that’s exposed by CoCoHue (and therefore technically not ZigBee driven by Hubitat, but controlled over IP through my Hue Hub instead). Driven individually (but still through CoCoHue and my Hue Hub), I have not seen a forgotten device or an erroneous value. All other ZigBee devices in my network are sensors or switched plugs, and none of those appear in scenes.

I have four WiFi connected bulbs/lamps on the network. All have DHCP reserved IP addresses with maximum length lease times. So they never take on another address. Two are dimmer only types (fixed CCT), one has a CCT parameter in addition to its dimmer, and one is CCT+RGB. The dimmer only types and the CCT adjustable type have experienced the issue. The CCT+RGB has never failed but it rarely changes and in fact is 99% of the time left off.

There is a known issue with some CT bulbs in Scene: Scene sends two commands in quick succession, first to set the color temperature and then the level. Some drivers fail with this. We are going to address this issue in the near future. For comparison, a color bulb accepts a map of hue/saturation/level in a single command, while a CT bulb needs two commands.
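The post says only that some drivers fail with the back-to-back CT commands, not why. The Python toy model below illustrates one plausible mechanism, under an explicit assumption: a hypothetical driver that discards any command arriving while it is still processing the previous one. It is not Hubitat code, and `BusyDriver`, `busy_for`, and the timings are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BusyDriver:
    """Toy model of a driver that drops any command arriving while it is busy."""

    busy_for: float  # seconds each accepted command keeps the driver busy
    busy_until: float = 0.0
    applied: list[tuple[str, object]] = field(default_factory=list)

    def handle(self, now: float, name: str, value: object) -> bool:
        if now < self.busy_until:
            return False  # dropped: previous command still in flight
        self.applied.append((name, value))
        self.busy_until = now + self.busy_for
        return True


def activate_ct(driver: BusyDriver, now: float, kelvin: int, level: int, gap: float) -> None:
    # CT bulbs: two separate commands, color temperature first, then level.
    driver.handle(now, "setColorTemperature", kelvin)
    driver.handle(now + gap, "setLevel", level)


def activate_color(driver: BusyDriver, now: float, hue: int, sat: int, level: int) -> None:
    # Color bulbs: one command carrying a hue/saturation/level map.
    driver.handle(now, "setColor", {"hue": hue, "saturation": sat, "level": level})
```

With `busy_for=0.2`, `activate_ct(..., gap=0.01)` leaves only the color temperature applied, while `gap=0.5` applies both; the single `setColor` map cannot be split this way. Spacing or coalescing the two CT commands would both avoid this particular failure, but the post does not say which fix is planned.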

Our impression is that this might be Z-Wave related. We will be doing extensive testing...


Totally understood and an interesting detail.

I’ll stay tuned. Anything I can do or share just ask.

If it’s relevant or helpful, you should know that I’m a lighting professional: I’m a gaffer in the feature film and television production industry. So a lot of the lighting aspects of this are my language (and passion). The HA parts, of course, are not!

For me, I had Z-Wave and integrated Hue devices. For the Hue devices, I was able to make a code change to accommodate the scene replay. The Z-Wave devices were fixed by the steps I described, except for one device, which was fixed by issuing a repair. All my scenes are working correctly at this time, so I cannot chime in on the current state to reproduce the issue.

Yes. This is what I believe I saw in the case of the Hue lights. My Hue driver allows setting all attributes, and they are stored locally while the bulb is off. When the bulb is turned on, I send all the state data in one message. I may be wrong, but I think the issue was that the scenes only issued the level and color temperature, not the on command. I had to change my driver so that a non-zero level triggers an on state. I wonder if this carries over to Z-Wave bulbs? I stopped using my one Z-Wave bulb because it was too clumsy (color changes and brightness changes behaved oddly, and colors didn't match up). Could it be that a level change does not turn on the bulb? With the RGB Z-Wave bulb I had, I know I had to turn the bulb on first before I could send it commands, like HSV or CT values. If I sent anything while it was off, it did nothing.

Only if he's using a custom driver that doesn't follow the platform standard and every built-in driver. setLevel(x) for x > 0 should always turn on the device implicitly. To the best of my knowledge this applies to all Z-Wave devices, all Lutron devices, etc. Only a non-conforming custom driver would not turn on the device from setLevel().
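The two behaviors discussed in the last two posts can be modeled together in Python: a non-zero `setLevel()` implicitly turns the device on, and (as in the custom Hue driver described) attributes set while the bulb is off are cached and sent in one message when it turns on. This is a hypothetical model, not a Hubitat driver. The method names merely mirror the `setLevel()`, `setColorTemperature()`, `on()`, and `off()` commands, the message format is invented, and treating `setLevel(0)` as off is an assumption of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DimmerModel:
    """Hypothetical bulb driver: caches attributes while off; setLevel(x > 0) implies on."""

    switch: str = "off"
    level: int = 0
    pending: dict[str, int] = field(default_factory=dict)  # attributes cached while off
    sent: list[dict[str, int]] = field(default_factory=list)  # messages sent to the bulb

    def set_color_temperature(self, kelvin: int) -> None:
        if self.switch == "off":
            self.pending["colorTemperature"] = kelvin  # hold until the bulb turns on
        else:
            self.sent.append({"colorTemperature": kelvin})

    def on(self) -> None:
        # Turning on flushes all cached state in a single message.
        self.sent.append({**self.pending, "on": 1})
        self.pending.clear()
        self.switch = "on"

    def off(self) -> None:
        self.sent.append({"on": 0})
        self.switch = "off"

    def set_level(self, value: int) -> None:
        if value <= 0:
            self.off()  # assumption: level 0 means off
            return
        self.level = value
        if self.switch == "off":
            # Platform convention: a non-zero level implicitly turns the device on.
            self.pending["level"] = value
            self.on()
        else:
            self.sent.append({"level": value})
```

A Scene that sends only a color temperature and a level still ends with the bulb on at the right values, because the non-zero level carries the implicit on.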

Okay. That's what I figured, but I thought I would throw it out there. I don't recall the specifics, but I recall there was another system where setLevel(>0) would not turn the light on until an on() command was sent.

I see something absolutely bizarre with the last two devices that had trouble tonight. Both are Inovelli Red Series LZW31-SN ZWave dimmers, and both were migrated from the C-3. RSSI looks solid, and control from the device page is perfectly smooth (and faster than ever). However, the only recent events for those devices are "lastActivity" events, timestamped Feb 2, one every second. There is nothing showing that a dimmer command was sent when the sleep scenes were triggered tonight.

Clicking the ON button on the device page or sending the dimmer to a level works perfectly. But the event is not logged in the device events page nor does the device page update until I click refresh. This is not the case with other dimmers of the same type on the network. I also see other dimmers of a different type that show the same exact symptom and those too were migrated from the C-3 hub (on which they were working perfectly within scenes).

Which driver does this device use?

Try using the internal drivers. I just tested mine (same model) with the internal drivers and am getting the reports/events as expected.
