What is Hubitat's thread safety model?

You may not fully understand the issues that arise when concurrent processes interact and need mutually exclusive access to a resource.

Just as one example, the Litter Robot driver needs to poll a cloud server to get status and receive events. On my hub, a rule needs to interact with that driver and with another rule that changes the color (red, yellow, green, and flashing red) of Hue under-vanity lightstrips to indicate the level of cat poop in the litter drawer, and whether that level is critical and the drawer needs to be emptied.

Those processes are asynchronous and unrelated, and critical regions are needed to control the lightstrips. The concurrency issues come from the cloud server's responses, which arrive asynchronously relative to both the polling of the server and the flashing period of the under-vanity lights.
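
A minimal sketch of a critical region for the lightstrips, assuming color changes and the flash sequence can be triggered from different threads. It is plain Java (Hubitat apps and drivers are Groovy on the JVM, but no Hubitat API is used here), and `LightstripController`, `setColor`, `flash`, and the recorded command log are hypothetical stand-ins for the real Hue commands. One `ReentrantLock` keeps a multi-step flash from being interleaved with a color change coming from the polling side.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical controller standing in for the Hue lightstrip commands.
class LightstripController {
    private final ReentrantLock lock = new ReentrantLock();
    // Every command sent to the strip, in order, so interleaving can be checked.
    private final List<String> commands = Collections.synchronizedList(new ArrayList<>());

    // Single color change (e.g. driven by the Litter Robot status).
    void setColor(String color) {
        lock.lock();
        try {
            commands.add("color:" + color);
        } finally {
            lock.unlock();
        }
    }

    // Multi-step flash sequence: the whole loop is one critical region, so no
    // setColor() call from another thread can land between "on" and "off".
    void flash(int times) {
        lock.lock();
        try {
            for (int i = 0; i < times; i++) {
                commands.add("on");
                commands.add("off");
            }
        } finally {
            lock.unlock();
        }
    }

    List<String> commandLog() {
        synchronized (commands) {
            return new ArrayList<>(commands);
        }
    }

    // Runs color changes and flashes on two threads and returns the command log.
    static List<String> runDemo() {
        LightstripController strip = new LightstripController();
        Thread poller = new Thread(() -> {
            for (int i = 0; i < 100; i++) strip.setColor(i % 2 == 0 ? "yellow" : "red");
        });
        Thread flasher = new Thread(() -> {
            for (int i = 0; i < 20; i++) strip.flash(3);
        });
        poller.start();
        flasher.start();
        try {
            poller.join();
            flasher.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return strip.commandLog();
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

In the log from `runDemo()`, a `color:` entry should never appear inside a flash's run of on/off commands, which is the property the rules need.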

Just one example. I’ve got several.

First - energy metering devices can send many events per second.

Second - you sound like you’re assuming a single event type for one device. What if I have a handler that is handling all temp changes for 20 different devices?

Yeah, I find synchronization to be a challenge sometimes. Not because it’s hard (though it is a more challenging programming topic), but because, as @artyom.tokmakov said, none of this is documented, so we just have to try to figure it out and make our best guesses about what is going on under the hood.

I'm really curious if you could show me an example. Are you having events from all 20 devices touch a single shared resource? (And thus wanting to synchronize it.)

No, I've spent years of my career working on parallel systems and synchronization. I'm fine with "issues in concurrent processes that interact". I'm more asking about your specific use cases here, because in my own personal code inside the Hubitat runtime, synchronization has never been an issue. So I'm curious what your use case is.

Just as one example, the Litter Robot driver needs to poll a cloud server to get status and receive events. On my hub, a rule needs to interact with that driver and with another rule that changes the color (red, yellow, green, and flashing red) of Hue under-vanity lightstrips to indicate the level of cat poop in the litter drawer, and whether that level is critical and the drawer needs to be emptied. Those processes are asynchronous and unrelated, and critical regions are needed to control the lightstrips.

I think this is where I'm not understanding your case yet. It seems that your rule would be triggered by a state change on one of the Litter Robot's attributes, and would then send a command to the light strip. The rule doesn't need to know about the internals of the litter robot driver, and doesn't need to know about the asynchronous API call. What am I missing that makes it need to be more complicated than that? Is it the "and with another rule" part? What is going on in that rule?

Yeah. I’m not at home, but I’ll find you a sample. Imagine a situation where each event should increment a state variable. I’ve had situations where 10 events happen but the end result is 8. Why? The state.variable++ operations stomp on each other because they’re not synchronized. It’s basically a TOCTOU bug, because the increment isn’t atomic: when one thread reads state.variable, it still gets the old value, because the other thread hasn’t written its update yet. Using atomicState reduces the odds of this, but that’s still not actually atomic. This is the classic “ATM withdrawal” example all software engineers learn in school. The solution is thread synchronization.
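
A minimal sketch of the lost update described above, in plain Java rather than Hubitat Groovy. The static `counter` is a hypothetical stand-in for `state.variable`; this does not model how Hubitat stores `state` or `atomicState`, it only shows why a non-atomic read-modify-write drops increments and how a critical section (`synchronized`) fixes it.

```java
import java.util.ArrayList;
import java.util.List;

class LostUpdateDemo {
    // Hypothetical stand-in for state.variable.
    private static int counter = 0;
    private static final Object LOCK = new Object();

    // Unsynchronized read-modify-write: two threads can read the same old
    // value, and one of the two increments is lost (a TOCTOU race).
    static void unsafeIncrement() {
        int current = counter;   // read (may already be stale)
        counter = current + 1;   // write (may overwrite another thread's update)
    }

    // The same read-modify-write inside a critical section.
    static void lockedIncrement() {
        synchronized (LOCK) {
            counter = counter + 1;
        }
    }

    // Each of `threads` threads applies `increment` `events` times; returns the final count.
    static int run(int threads, int events, Runnable increment) {
        synchronized (LOCK) {
            counter = 0;
        }
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < events; i++) increment.run();
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        synchronized (LOCK) {
            return counter;
        }
    }

    public static void main(String[] args) {
        System.out.println("unsafe: " + run(8, 100_000, LostUpdateDemo::unsafeIncrement) + " of 800000");
        System.out.println("locked: " + run(8, 100_000, LostUpdateDemo::lockedIncrement) + " of 800000");
    }
}
```

On a multi-core machine the unsafe run usually reports fewer than 800000 increments, but not on every run, which is part of why this class of bug is hard to reproduce. For a plain counter, `AtomicInteger.incrementAndGet()` would be an equally valid fix.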

I have had to synchronize access to atomicState for device discovery in my apps where I want to maintain a global list of what is discovered.

I send out a UPnP discovery message, and everything on my network chatters back, asynchronously and unpredictably relative to each other, with a lot of potential for collisions and concurrent runs of the response handler.
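
A minimal sketch of a thread-safe discovery list, assuming each response carries a unique device ID (for UPnP this could be the USN, but here it is just a string). `DiscoveryRegistry`, `onResponse`, and `simulate` are hypothetical names, and this is plain Java rather than Hubitat code. The key point is that `ConcurrentHashMap.putIfAbsent` performs the check and the insert as one atomic step, so duplicate and overlapping responses cannot double-count a device.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical global list of discovered devices, keyed by a unique device ID.
class DiscoveryRegistry {
    private final Map<String, String> discovered = new ConcurrentHashMap<>();
    private final AtomicInteger newDevices = new AtomicInteger();

    // Response handler; may run on many threads at once. putIfAbsent checks
    // and inserts in one atomic step, so two handlers racing on the same
    // device cannot both treat it as new.
    void onResponse(String deviceId, String location) {
        if (discovered.putIfAbsent(deviceId, location) == null) {
            newDevices.incrementAndGet();   // where a "new device found" step would go
        }
    }

    int size() {
        return discovered.size();
    }

    int newDeviceCount() {
        return newDevices.get();
    }

    // Simulates `devices` devices each answering `repeats` times, handled on a thread pool.
    static DiscoveryRegistry simulate(int devices, int repeats) {
        DiscoveryRegistry registry = new DiscoveryRegistry();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int r = 0; r < repeats; r++) {
            for (int d = 0; d < devices; d++) {
                String id = "uuid:device-" + d;
                String location = "http://device-" + d + "/description.xml";
                pool.execute(() -> registry.onResponse(id, location));
            }
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("handlers did not finish");
            }
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return registry;
    }

    public static void main(String[] args) {
        DiscoveryRegistry registry = simulate(50, 20);
        System.out.println(registry.size() + " devices, " + registry.newDeviceCount() + " reported as new");
    }
}
```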

So, it's either fiddle with it until it seems synchronized "enough", or say "just run again if it doesn't work" in the user notes. Or both.

Very true! That was the first time I saw the synchronization issues on HE. At the time I didn’t know enough about their thread model to know what was happening.

Do you do any Z-Wave driver development? Hubitat recommends using a couple of static field variables to handle Supervision: [GUIDE] Writing Z-Wave Drivers for S2

I've seen some people complain about errors that look like concurrency issues, and I see that the 2.2.8 release notes mention a fix for them:

C7: Fixed concurrency error produced on some Z-Wave drivers when using S2 and Supervision.

However, I'm curious what the fix was. Switching to ConcurrentHashMap instead of the default Map, maybe? 🙂 In any case, that is a practical example of where this matters. Not only could Hubitat get two Z-Wave reports back from a device in quick succession (so that the driver has not finished handling the first before it starts on the second), but static fields are shared among all devices that use the driver, not just one particular installed instance. That is why these are static Maps indexed by device ID. (If someone could tell me why they recommend converting the ID to a String instead of keeping the actual Long type, I'd be interested to know, but that's unrelated...) So if two devices using the same driver each get just a single report back at nearly the same time, the same concern applies.

(If ConcurrentHashMap was the fix, I don't see it mentioned anywhere, and certainly not in the guide above. It might be worth asking, unless this was a platform-level change, though the "some drivers" phrasing suggests it wasn't. Probably worth asking in that other thread...)
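
A minimal sketch of what per-device static state could look like with `ConcurrentHashMap`, in plain Java. This is not Hubitat's actual fix or the guide's code: `SupervisionState`, `track`, `complete`, and the wraparound at 64 session IDs are assumptions for illustration. Keys are device IDs as Strings, mirroring the recommendation mentioned above. `merge`, `computeIfAbsent`, and `remove` are each atomic per key, which is what protects the maps when two reports (from one device, or from two devices sharing the driver) are handled at the same time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-driver static state. Static fields are shared by every
// device using the driver, so each map is keyed by device ID (as a String).
class SupervisionState {
    // deviceId -> last session ID handed out for that device
    static final ConcurrentHashMap<String, Integer> sessionIds = new ConcurrentHashMap<>();
    // deviceId -> (sessionId -> command still waiting for a Supervision Report)
    static final ConcurrentHashMap<String, Map<Integer, String>> pending = new ConcurrentHashMap<>();

    // merge() is atomic per key, so concurrent callers for the same device get
    // distinct IDs. Wrapping at 64 is an assumption for illustration.
    static int nextSessionId(String deviceId) {
        return sessionIds.merge(deviceId, 1, (old, one) -> (old + one) % 64);
    }

    // Records a supervised command and returns its session ID.
    static int track(String deviceId, String command) {
        int session = nextSessionId(deviceId);
        pending.computeIfAbsent(deviceId, id -> new ConcurrentHashMap<>()).put(session, command);
        return session;
    }

    // Handles a Supervision Report. remove() is atomic, so if the same report
    // is processed twice, only one caller gets the command back.
    static String complete(String deviceId, int session) {
        Map<Integer, String> forDevice = pending.get(deviceId);
        return forDevice == null ? null : forDevice.remove(session);
    }

    // `threads` threads each send `perThread` commands to every device in the list.
    static void simulate(List<String> deviceIds, int threads, int perThread) {
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    for (String id : deviceIds) track(id, "switchBinarySet");
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
    }

    public static void main(String[] args) {
        simulate(List.of("101", "102"), 4, 10);
        System.out.println(pending.get("101").size() + " " + pending.get("102").size());
    }
}
```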

Not much, no. Most of my issues have been in apps.

Yes

Ah, that's interesting. I see what you're saying there. It's not a situation I've had to solve in my apps, but I follow now. What kind of state variables are you incrementing?

I'm really curious if you could show me an example. Are you having events from all 20 devices touch a single shared resource? (And thus wanting to synchronize it.)

I've hit this with InfluxDB Logger, which collects events from different devices and then sends them to InfluxDB in a batched HTTP request. It therefore has to share state between the threads that add events and the thread that sends them to the database.
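
A minimal sketch of the producer/consumer shape described above, assuming event handlers can run on several threads while a separate flush periodically sends a batch. It is plain Java, the names and the point format are hypothetical, and this is not how InfluxDB Logger is actually implemented. A common unsafe version reads the pending list, sends it, and then clears it, which loses any events added between the read and the clear; polling from a `ConcurrentLinkedQueue` removes each event exactly once, so there is no such window.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical buffer shared by event handlers (producers) and a flush
// that would POST each batch to InfluxDB (consumer).
class EventBatcher {
    private final ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();

    // Called from any event-handler thread.
    void add(String point) {
        queue.offer(point);
    }

    // Takes up to maxBatch events. poll() removes each element exactly once,
    // so events added during a flush stay queued for the next batch.
    List<String> drain(int maxBatch) {
        List<String> batch = new ArrayList<>();
        String point;
        while (batch.size() < maxBatch && (point = queue.poll()) != null) {
            batch.add(point);
        }
        return batch;
    }

    // Producers add events while a flusher drains concurrently; returns the total sent.
    static int simulate(int producers, int eventsEach) {
        EventBatcher batcher = new EventBatcher();
        AtomicInteger sent = new AtomicInteger();
        List<Thread> handlers = new ArrayList<>();
        for (int p = 0; p < producers; p++) {
            final int device = p;
            Thread t = new Thread(() -> {
                for (int i = 0; i < eventsEach; i++) {
                    batcher.add("temperature,device=" + device + " value=" + i);
                }
            });
            handlers.add(t);
            t.start();
        }
        // Flusher keeps sending batches while any handler is still running.
        Thread flusher = new Thread(() -> {
            while (handlers.stream().anyMatch(Thread::isAlive)) {
                sent.addAndGet(batcher.drain(100).size());   // stand-in for the HTTP POST
                Thread.yield();
            }
        });
        flusher.start();
        try {
            for (Thread t : handlers) t.join();
            flusher.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        sent.addAndGet(batcher.drain(Integer.MAX_VALUE).size());   // final flush
        return sent.get();
    }

    public static void main(String[] args) {
        System.out.println("sent " + simulate(6, 5_000) + " of 30000");
    }
}
```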

But it doesn't have to be as complicated as this or the examples others have shared. A single timer or HTTP handler is enough: the events it triggers can already conflict with your lock events if they share any state.

I really wish Hubitat's documentation were clear about this issue and about its solutions (maybe it is now, and I'm just not up to date?). The lack of clarity hurts developers and the ecosystem for no reason.

I was building a presence app. Geofencing is so unreliable. I was basically using multiple apps and keeping a count: if more than N of them report that you’re away, then you’re away. But I noticed the count didn’t match reality, and it turned out to be this issue.

Couldn't this be done without keeping your own independent state? Recalculate the total each time by inspecting the attributes of all your presence sensors? Then you wouldn't need your own state variable for the count.

That assumes the state doesn’t change between the event and checking the value. That’s another TOCTOU issue.

I have the same issue with a voting app that counts the number of Ring Alarm Extenders that have seen “switch to battery” events during a power failure. When the power flaps, the Ring extenders flip back and forth from mains to battery to mains. The point of the voting is to detect true power failures, as opposed to my wife knocking a Ring extender out of its socket or a circuit breaker tripping.
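
A minimal sketch of the voting idea, assuming each extender reports separate battery and mains events and that a failure is declared when a quorum of distinct extenders is on battery at once. `PowerFailVote`, `quorum`, and the alert counter are hypothetical, and this is plain Java rather than a Hubitat app. A concurrent set tracks the votes, and `AtomicBoolean.compareAndSet` ensures that only one of several simultaneous handlers raises the alert.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical voting logic: declare a power failure once `quorum` distinct
// extenders are on battery at the same time.
class PowerFailVote {
    private final int quorum;
    private final Set<String> votes = ConcurrentHashMap.newKeySet();
    private final AtomicBoolean declared = new AtomicBoolean(false);
    private final AtomicInteger alerts = new AtomicInteger();

    PowerFailVote(int quorum) {
        this.quorum = quorum;
    }

    // "Switched to battery" handler. The set ignores repeat votes from a flapping extender.
    void onBattery(String extenderId) {
        votes.add(extenderId);
        // Several handlers may see the quorum at once; compareAndSet lets only one alert.
        if (votes.size() >= quorum && declared.compareAndSet(false, true)) {
            alerts.incrementAndGet();   // stand-in for the real notification
        }
    }

    // "Switched to mains" handler: the extender withdraws its vote.
    void onMains(String extenderId) {
        votes.remove(extenderId);
    }

    int alerts() {
        return alerts.get();
    }

    // Every extender reports "on battery" at nearly the same time, each on its own thread.
    static PowerFailVote simulate(int extenders, int quorum) {
        PowerFailVote vote = new PowerFailVote(quorum);
        List<Thread> handlers = new ArrayList<>();
        for (int i = 0; i < extenders; i++) {
            String id = "extender-" + i;
            Thread t = new Thread(() -> vote.onBattery(id));
            handlers.add(t);
            t.start();
        }
        try {
            for (Thread t : handlers) t.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return vote;
    }

    public static void main(String[] args) {
        System.out.println("alerts: " + simulate(10, 3).alerts());
    }
}
```

The `declared` latch never resets in this sketch; a real app would presumably clear it once mains power is stable again.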

Exactly. If you check the attributes in that example they may have already switched back since the event fired. The only accurate way is to keep track of the events.

But all events will fire, and at the final one, you’ll get the true count, with no more TOCTOU. So your final count can set your true output.

I’ve had a similar case, where a bunch of input events can come in quickly, but I really only want to run my output command once, after things settle down. I use runIn() to set a 1-second “debounce”.
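
A minimal sketch of the same debounce idea in plain Java, with `ScheduledExecutorService` standing in for Hubitat's scheduler; it does not use or describe `runIn()` itself. `Debouncer` and `simulateBurst` are hypothetical. Each event cancels the pending task and schedules a new one, and `synchronized` makes that cancel-and-reschedule step atomic, so a burst of events from several threads still produces a single run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Runs `action` once, `delayMs` after the most recent event.
class Debouncer {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final Runnable action;
    private final long delayMs;
    private ScheduledFuture<?> pending;   // guarded by `this`

    Debouncer(Runnable action, long delayMs) {
        this.action = action;
        this.delayMs = delayMs;
    }

    // Event handler, callable from any thread. synchronized makes
    // cancel-then-reschedule atomic, so concurrent events leave one task pending.
    synchronized void onEvent() {
        if (pending != null) {
            pending.cancel(false);
        }
        pending = scheduler.schedule(action, delayMs, TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        scheduler.shutdown();
    }

    // Fires a burst of events from several threads, waits for it to settle,
    // and returns how many times the action actually ran.
    static int simulateBurst(int threads, int eventsEach, long delayMs) {
        AtomicInteger runs = new AtomicInteger();
        Debouncer debouncer = new Debouncer(runs::incrementAndGet, delayMs);
        List<Thread> senders = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread s = new Thread(() -> {
                for (int i = 0; i < eventsEach; i++) debouncer.onEvent();
            });
            senders.add(s);
            s.start();
        }
        try {
            for (Thread s : senders) s.join();
            Thread.sleep(delayMs * 3);   // let the final scheduled run fire
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        debouncer.shutdown();
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("action ran " + simulateBurst(4, 25, 500) + " time(s)");
    }
}
```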

If you want the value after it settles, yeah. But if I want to know whether it EVER exceeded the threshold, not just the final resting state, I need a counter. Events are just one example; I’ve had to implement synchronization for multiple uses.
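
A minimal sketch of tracking whether a count EVER exceeded a threshold, in plain Java with hypothetical names (`PeakCounter`, `everExceeded`). `AtomicInteger.incrementAndGet` keeps the live count exact under concurrency, and `accumulateAndGet(now, Math::max)` records the peak atomically, so a brief spike is still visible after the count falls back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Tracks a live count (e.g. sensors currently reporting "away") plus the
// highest value it ever reached.
class PeakCounter {
    private final AtomicInteger current = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    void increment() {
        int now = current.incrementAndGet();
        peak.accumulateAndGet(now, Math::max);   // atomic "peak = max(peak, now)"
    }

    void decrement() {
        current.decrementAndGet();
    }

    int current() {
        return current.get();
    }

    int peak() {
        return peak.get();
    }

    boolean everExceeded(int threshold) {
        return peak.get() > threshold;
    }

    // `threads` threads each repeatedly go "away" and come back.
    static PeakCounter simulate(int threads, int rounds) {
        PeakCounter counter = new PeakCounter();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < rounds; i++) {
                    counter.increment();
                    counter.decrement();
                }
            });
            workers.add(w);
            w.start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return counter;
    }

    public static void main(String[] args) {
        PeakCounter c = simulate(8, 10_000);
        System.out.println("current=" + c.current() + " peak=" + c.peak());
    }
}
```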
