LabVIEW Idea Exchange

BertMcMahan

New function: Atomic semaphore unlock with reference release

Status: Declined

Any idea that has received less than 2 kudos within 2 years after posting will be automatically declined.

I ran across this issue as described in this thread:

 

https://forums.ni.com/t5/Actor-Framework-Discussions/MGI-Panel-Actor-hanging/td-p/3759297

 

If you're using named semaphores, you can run into an issue when one process wants to release its reference without the entire semaphore being released. If you have multiple processes looking at the same semaphore reference, you can end up in a situation where one process holds the lock when another releases the reference. When the process holding the lock tries to release the lock, it finds its semaphore reference has been released, so the function returns an error. When this happens, the lock is "eaten" and is never returned to the pool.
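To make the failure mode concrete, here is a minimal Python sketch of the sequence described above. It is a toy model, not LabVIEW's implementation: `ToySemaphore`, `RefnumError`, and every method name are hypothetical stand-ins, and the only behaviour assumed is what this post reports (releasing the lock through an already-released refnum errors out and the lock is never returned).

```python
import threading


class RefnumError(Exception):
    """Raised when an operation goes through a refnum that was already released."""


class ToySemaphore:
    """Hypothetical single-resource semaphore reached through refnums."""

    def __init__(self):
        self._lock = threading.Semaphore(1)  # the one available resource
        self._valid = set()                  # refnums that are still open
        self._next = 0

    def obtain_reference(self):
        self._next += 1
        self._valid.add(self._next)
        return self._next

    def release_reference(self, refnum):
        # Invalidates the refnum only; the lock state is left untouched.
        self._valid.discard(refnum)

    def acquire(self, refnum, timeout=None):
        self._check(refnum)
        return self._lock.acquire(timeout=timeout)

    def release(self, refnum):
        self._check(refnum)  # raises before the lock is handed back
        self._lock.release()

    def _check(self, refnum):
        if refnum not in self._valid:
            raise RefnumError(f"refnum {refnum} has been released")


sem = ToySemaphore()
shared = sem.obtain_reference()  # one refnum shared by processes A and B
other = sem.obtain_reference()   # separate refnum held by process C

sem.acquire(shared)              # A takes the lock
sem.release_reference(shared)    # B closes the shared refnum

try:
    sem.release(shared)          # A tries to give the lock back
    release_failed = False
except RefnumError:
    release_failed = True        # the lock is now "eaten"

# C would wait forever without a timeout; the timeout keeps the demo finite.
c_got_lock = sem.acquire(other, timeout=0.1)
print(release_failed, c_got_lock)
```

In this model the release through the stale refnum fails and process C never gets the lock, which mirrors the hang reported in the linked thread.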

 

To combat this, ideally you'd have your "Close" process wait until it obtains the lock, then close the reference right after. However, if it waits on the lock (so other processes finish what they're doing), then releases the lock, then releases the reference, there is a small window of time in which another process can grab the lock before the Close process can release the reference, in which case that process can't release the semaphore. There is currently no way to perform "obtain lock, release reference" atomically.

 

A workaround is to use a new temporary reference (as I describe in that thread), but it would be nice if Release Semaphore Reference could have a "Wait on lock(s) before releasing" input. I'm not sure how this would work with semaphores with more than 1 element... I've never used one with more than 1 resource available 🙂
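As an illustration of what the requested input could mean, the following Python sketch models a single-resource semaphore in which waiting for the lock to be free and invalidating the refnum happen under one internal mutex, so no other process can slip in between the two steps. This is an assumption-laden toy, not a proposal for how LabVIEW would implement it: `ToySemaphore`, `RefnumError`, and the `wait_on_lock` parameter are hypothetical names, and semaphores with more than one resource are not modelled.

```python
import threading
import time


class RefnumError(Exception):
    """Raised when an operation goes through a released refnum."""


class ToySemaphore:
    """Hypothetical size-1 semaphore with an atomic wait-then-close."""

    def __init__(self):
        self._cond = threading.Condition()  # guards lock state and refnum table
        self._locked = False
        self._valid = set()
        self._next = 0

    def _check(self, refnum):
        if refnum not in self._valid:
            raise RefnumError(f"refnum {refnum} has been released")

    def obtain_reference(self):
        with self._cond:
            self._next += 1
            self._valid.add(self._next)
            return self._next

    def acquire(self, refnum):
        with self._cond:
            self._check(refnum)
            while self._locked:
                self._cond.wait()
                self._check(refnum)  # refnum may have been closed while waiting
            self._locked = True

    def release(self, refnum):
        with self._cond:
            self._check(refnum)
            self._locked = False
            self._cond.notify_all()

    def release_reference(self, refnum, wait_on_lock=False):
        with self._cond:
            self._check(refnum)
            if wait_on_lock:
                while self._locked:
                    self._cond.wait()
            # The internal mutex is still held here, so nobody can acquire
            # the lock between the wait finishing and the refnum closing.
            self._valid.discard(refnum)
            self._cond.notify_all()


sem = ToySemaphore()
shared = sem.obtain_reference()
other = sem.obtain_reference()
errors = []
holding = threading.Event()


def worker():
    try:
        sem.acquire(shared)
        holding.set()
        time.sleep(0.05)     # simulated work while holding the lock
        sem.release(shared)  # succeeds: the refnum is still valid
    except RefnumError as exc:
        errors.append(exc)


t = threading.Thread(target=worker)
t.start()
holding.wait()
sem.release_reference(shared, wait_on_lock=True)  # blocks until worker is done
t.join()

sem.acquire(other)  # a separate refnum can still take the lock
sem.release(other)
```

In this sketch the worker's release never fails, and any later attempt to acquire through the closed refnum errors out at the acquire step instead of stranding the lock.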

10 Comments
AristosQueue (NI)
NI Employee (retired)

The intended use is that every individual process should have its own refnum, all pointing to the same underlying semaphore. If you call Obtain Semaphore multiple times with the same name, you'll get different refnums. Release Semaphore Reference invalidates only one refnum. It does not try to acquire the lock first because it would just have to release the lock in order to destroy the refnum -- if it doesn't, then after it destroys the refnum, it can't release the lock. In fact, the intention is to not acquire the lock so that anyone still using that refnum errors out, highlighting a problem in your reference management.

 

TL;DR: Don't share the refnum between processes.

drjdpowell
Trusted Enthusiast
> highlighting a problem in your reference management.

There is a BIG difference between an arguable coding weakness leading to an error that can be handled and one that produces an unrecoverable lock-up of the application. I am, right now, implementing this workaround for this issue in mission-critical code that can handle errors but must not lock up.

 

Following the principle of least astonishment I would have expected the following:

 

If I acquire a semaphore reference then destroy it, I expect parallel code trying to acquire that same reference to error out on the "Acquire Semaphore.vi", and any code using a separate reference (to the same named semaphore) to be able to continue to function.   

 

Instead, I have parallel code erroring out on "Release Semaphore.vi", and code using a separate reference hanging forever.  This is deeply flawed behaviour and I would recommend against using the current Semaphore implementation in LabVIEW.

AristosQueue (NI)
NI Employee (retired)

> If I acquire a semaphore reference then destroy it

 

You destroyed the semaphore reference, not the semaphore. There's a huge difference. Anyone attempting to acquire the semaphore through the particular reference that you destroyed would get the behavior you specified, but would open up other race conditions. Release Semaphore Reference *cannot know* (not just "does not" as in R&D coded it that way, but "cannot" as in logically impossible) that it should do a Release Semaphore as part of its operations -- there's no way for it to know that logically there is an acquire upstream from it aside from some trivial cases, but even those would arguably violate some of the encapsulation of activity of dataflow nodes. LabVIEW does not have one thread per diagram. There's no guarantee that the thread that executed the Acquire Semaphore is the same one executing Release Semaphore Reference (RSR), even on the same diagram. Once you put RSR into a subVI, all bets are off.

 

The API is quite robust when used as designed.

 

Let me fix this for you:

> I would recommend against using the current Semaphore implementation in LabVIEW.

 

I did not cover it in the AF thread because I was looking to untangle your current code with as light a touch as possible, but the fact that you even need the semaphores is a red flag for me. The activity of coordinating the hand off of the subpanels should be possible by moving the handoff to a coordinating caller actor or similar activity. ALL uses of semaphores that I have ever seen in LabVIEW in almost 2 decades have been patches for architecture mistakes. I will not say that is absolutely true of your code, but long experience makes me suspect it is so. You might consider refactoring to avoid the need for them. 🙂

BertMcMahan
Active Participant

AQ, just FYI the guy you replied to wasn't the guy in the AF code, that was me 🙂 And again, that code wasn't something I'd written, just something I'd been using and had to make work. I think I've used semaphores exactly one time before this, and never had a need for them any other time. I agree the way they're handled in the subpanel code makes for a dangerous implementation, as evidenced by the race conditions I discovered in the MGI code.

AristosQueue (NI)
NI Employee (retired)

Oops. Sorry! Mixed up customer threads. 🙂

drjdpowell
Trusted Enthusiast
> ALL uses of semaphores that I have ever seen in LabVIEW in almost 2 decades have been patches for architecture mistakes.

This, my one and only use of a Semaphore in 19 years of LabVIEW, is indeed a patch for an architecture mistake: global variables used internally by a C DLL that I am wrapping.

-- James (not Bert)

drjdpowell
Trusted Enthusiast
> Release Semaphore Reference *cannot know* (not just "does not" as in R&D coded it that way, but "cannot" as in logically impossible) that it should do a Release Semaphore as part of its operations -- there's no way for it to know that logically there is an acquire upstream from it aside from some trivial cases

If I were to rewrite the Semaphore, I would have to make "Acquire Semaphore" actually acquire something; the current implementation is misnamed, as it merely locks the Semaphore, with no state info anywhere to say which reference actually has acquired ownership of the key to that lock.  With knowledge of ownership, "Release Semaphore Reference" can know if it needs to unlock the Semaphore.
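A rough Python sketch of the ownership idea in this comment: the semaphore records which refnum currently holds the lock, so releasing that refnum can also unlock. This is only an illustration of the suggestion, not LabVIEW's design; `OwnedSemaphore`, `RefnumError`, and all method names are hypothetical, and only a single resource is modelled.

```python
import threading


class RefnumError(Exception):
    """Raised when an operation goes through a released or non-owning refnum."""


class OwnedSemaphore:
    """Hypothetical size-1 semaphore that remembers which refnum holds the lock."""

    def __init__(self):
        self._cond = threading.Condition()
        self._owner = None   # refnum currently holding the lock, if any
        self._valid = set()
        self._next = 0

    def _check(self, refnum):
        if refnum not in self._valid:
            raise RefnumError(f"refnum {refnum} has been released")

    def obtain_reference(self):
        with self._cond:
            self._next += 1
            self._valid.add(self._next)
            return self._next

    def acquire(self, refnum, timeout=None):
        with self._cond:
            self._check(refnum)
            free = self._cond.wait_for(lambda: self._owner is None, timeout)
            if free:
                self._check(refnum)  # refnum may have been closed while waiting
                self._owner = refnum
            return free

    def release(self, refnum):
        with self._cond:
            self._check(refnum)
            if self._owner != refnum:
                raise RefnumError("this refnum does not hold the lock")
            self._owner = None
            self._cond.notify_all()

    def release_reference(self, refnum):
        with self._cond:
            self._valid.discard(refnum)
            if self._owner == refnum:
                # Ownership is known, so the lock goes back to the pool.
                self._owner = None
                self._cond.notify_all()


sem = OwnedSemaphore()
shared = sem.obtain_reference()
other = sem.obtain_reference()

sem.acquire(shared)
sem.release_reference(shared)  # also unlocks, because `shared` owned the lock

other_got_lock = sem.acquire(other, timeout=0.1)  # no hang in this model
```

Note that ownership here is tracked per refnum, so if two parallel operations share one refnum the model still cannot tell which of them holds the lock.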

AristosQueue (NI)
NI Employee (retired)

DRJDPowell: That would ameliorate the situation in some cases but still wouldn't solve the problem. People do create unnamed semaphores and share them with two parallel operations within the same top-level VI.

 

Acquire and Release should be used as a pair. Obtain and Release Reference should be used as a pair.

drjdpowell
Trusted Enthusiast
> People do create unnamed semaphores and share them with two parallel operations within the same top-level VI.

I don't believe unnamed semaphores have any problem.  If I Acquire a Semaphore then destroy it from one loop, then the parallel thread will error-out at "Acquire".  That is the exact behaviour I would want to happen.

Darren
Proven Zealot
Status changed to: Declined

Any idea that has received less than 2 kudos within 2 years after posting will be automatically declined.