LabVIEW


Queues - Enqueue / Dequeue Element Scheduling

I modified the original example to show this more clearly. If you open and run the main VI in the project, and then go to the bottom loop and drill two VIs down, you will see that the EnQ prim gets a timeout of 1000 ms, but the timed sequence takes ~2200 ms to execute and the timed out output is FALSE. I don't know whether the timed sequence allows playing with the thread allocation (I'm on a VM where I can't even assign it to a separate processor), so to completely discount your clump suggestion we might need two separate VIs configured to run in different threads.


___________________
Try to take over the world!
Message 21 of 28

@AristosQueue (NI) wrote:

Looking at tst's latest posted VI, I have an explanation for its behavior. I'm pretty confident of this... though I haven't put it under a debugger to make this definitive. If you have evidence to contradict this theory, please share it.

 

Remember that LabVIEW does cooperative multitasking between loops. We divide a block diagram into clumps. Each clump runs atomically and then provides an opportunity to switch to another clump. The Enqueue primitive always creates a clump division because of its asynchronous nature. So it is entirely possible that the very first call to Tick Count executes, then a thread swap occurs, allowing some other clump (like that third while loop, or a VI somewhere inside LV itself) some time to execute, and then the Enqueue happens. That interim adds time to the final calculation between the two Tick Counts. After the Enqueue executes, there is then *another* opportunity for the clumps to swap out, possibly adding more time before the second Tick Count is read. The reason you don't see this hit every iteration of the loop is that sometimes there isn't another clump ready to run, so the loop ends up running as a block.

 

Now, the info I've presented above would result in some amount of skew. But exactly 965 milliseconds every time on both your machine and mine? That's odd. However, I note that if I change from "2533" as the delay on the bottom loop to "2333", then that reproducible time changes to "1765" -- drop one by 200 ms, the other drops by 200 ms. On the other hand, if you go up to 2733, the reproducible time changes to 1133. So there's some amount of interaction here, having to do with the timing of thread swaps.

 

You also have to remember that you're on Windows. Windows thread swapping happens when it happens, and the OS is capricious. These kinds of time slips are more than reasonable. If it bothers you, you'll need a real-time operating system.
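
To make sure I follow that clump model, here is a rough sketch of it in a text language (Python; the generators and the toy round-robin loop below stand in for clumps and the execution system, and the 50 ms of "other work" is an arbitrary number I picked -- this is only a model of the idea, not what LabVIEW actually does internally):

import time

# Toy model of cooperative clumping: each "clump" is a generator that yields
# at the points where a clump swap could happen, e.g. around an Enqueue.

def measuring_clump(results):
    t0 = time.perf_counter()          # first Tick Count
    yield                             # switch point created by the Enqueue
    yield                             # another chance to swap after the Enqueue
    t1 = time.perf_counter()          # second Tick Count
    results.append((t1 - t0) * 1000)

def other_clump():
    time.sleep(0.05)                  # 50 ms of work in some other ready clump
    yield

results = []
ready = [measuring_clump(results), other_clump()]
while ready:                          # naive round-robin "execution system"
    clump = ready.pop(0)
    try:
        next(clump)                   # run the clump up to its next switch point
        ready.append(clump)
    except StopIteration:
        pass

print(f"{results[0]:.0f} ms measured between the two Tick Counts")
# Prints roughly 50 ms even though the measuring clump itself did almost
# nothing: the other clump's work landed between the two reads.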


I understand what you're saying with regards to both the OS scheduling and the clump execution via abstracted "threads" as part of LV's execution system. What I'm still puzzled by, I guess, is why the second loop never times out until the first loop dequeues, no matter how large I set the "Time 1" control. This works as expected in the first simple example in the Enqueue Issue Test virtual folder, but not in the larger example that tst is playing with. A certain amount of jitter is fair enough, but the larger example demonstrates some form of blocking operation that never times out. I can never seem to make the second loop time out.

 

For example, if I set Time 1 to 10000 ms, the second loop blocks for that time minus the setting of the "Time 2" control, which is the time during which the second loop is executing; i.e. the Frame Duration indicator reads 9900 ms (10 s minus the 100 ms setting in Time 2). How do OS jitter and thread execution explain a block of almost 10 seconds when the Timeout input is only 1000 ms?
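
For comparison, here is what I would expect a 1000 ms enqueue timeout on a size-1 queue to do, sketched with a text-language stand-in (Python's queue.Queue in place of the LabVIEW queue; purely illustrative, not the actual code):

import queue
import time

q = queue.Queue(maxsize=1)            # stands in for the fixed-size-1 LabVIEW queue
q.put("first loop owns the resource") # the first loop has already enqueued

start = time.perf_counter()
try:
    q.put("second loop", timeout=1.0) # analogue of Enqueue with a 1000 ms timeout
    timed_out = False
except queue.Full:
    timed_out = True

print(timed_out, round(time.perf_counter() - start, 2))
# Expected: True after roughly 1.0 s, no matter how long the first loop keeps
# its element in the queue -- which is not what the larger example does.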

 

I also notice that the Get Queue Status primitive doesn't show any pending inserts (i.e. this count is 0) in the same example (this is derived via the Devices Waiting property of the RS485Network class). So it's almost as if the Enqueue itself is blocked. Again, if I modify the simple example to obtain the pending insert status, it clearly shows 1, since there is almost always one loop waiting to iterate, exactly as you'd expect.

 

I've tried not using the NetworkRef queue refnum in the Release Network VI, instead obtaining the queue by name as shown below, but this appears to make no difference. I've also pulled out the Processor option of the timed sequence so that the top-level VI can configure the processor that each loop uses for that timed sequence, but this doesn't appear to make any difference either (sorry, the snippet below doesn't show this, but the attached code does).

 

It seems all kinda strange and is putting me off using timeouts on the Queue primitives in LV classes at all, which is a pity, since this is a great way to prevent the caller blocking in some of our applications, e.g. simultaneous TestStand calls. I know that the Dequeue primitive is a lot more popular than my use case and has thus had a lot more exposure to debugging as new features have been added to the LV environment.

 

I have attached a modified version of tst's example that I was playing with. Any thoughts appreciated.

 

EDIT: I forgot to mention that I have updated my threadconfig to have 8 logical threads per priority per Execution System, and I'm running in a non-VM, Win64 environment with 8 logical cores, which gives me 64 little execution engines to play with in this simple example.

 

 

temp.png

Message 22 of 28

@AristosQueue (NI) wrote:

And as a senior software engineer talking to my new hire self, I express the same shocked expectations. And my new hire self looks back at me and asks, "And where were you 13 years ago?" No one caught it until this year.

Everyone should be forced to talk to their former selves from earlier (in my case nearly 20 years ago!) in order to humble themselves, and also to have the nice side effect of realising how much they've (hopefully) learned in the time between.

 

"Never forget where you came from".

Message 23 of 28

As soon as you put the timed loop in there, you're out of my domain. Does that even compile? I thought a timed loop would break if it included an asynchronous node? I guess not. Are you testing this on an RT system or on Windows? On Windows, the timed loop is just as flakey as any other timing operation.

Message 24 of 28

@AristosQueue (NI) wrote:

As soon as you put the timed loop in there, you're out of my domain. Does that even compile? I thought a timed loop would break if it included an asynchronous node? I guess not. Are you testing this on an RT system or on Windows? On Windows, the timed loop is just as flakey as any other timing operation.


Yep, it compiles just fine on that Win7 x64 machine I was talking about in my earlier post. It's actually the code that tst was working with and posted; I just tried a few things out, using it as a sandbox. Having said that, I have had problems before with calls to any function that absorbs and blocks the thread (e.g. .NET calls that force the calling thread to block). Of course, removing the timed loop doesn't change the behaviour I described in my previous post, only how tst was trying to detect and prove it existed. I don't care how non-deterministic Windows is (and it certainly can't claim to be quasi-realtime), I'm still looking for a logical argument that explains this:

 


@tyk007 wrote:

For example, if I set Time 1 to 10000 ms, the second loop blocks for that time minus the setting of the "Time 2" control, which is the time during which the second loop is executing; i.e. the Frame Duration indicator reads 9900 ms (10 s minus the 100 ms setting in Time 2). How do OS jitter and thread execution explain a block of almost 10 seconds when the Timeout input is only 1000 ms?


... or, if I continue this train of thought, no matter what value I set Time 1 to (try 60 seconds for a laugh), the second loop blocks for that time minus 100 ms. This behaviour, unlike the OS scheduling you pointed out, is a pattern and therefore not random. What, the scheduler never comes back to check this blocked Enqueue Element call at all in 60 seconds? That sounds unlikely to me. Hey, I even tried setting it to 5 minutes. Guess what happened. I'd be happy if it were just a bug somewhere in my code (at least then I'd know what I'd done wrong).

 

There's definitely a "bug" going on here somewhere, and I suspect its root is retrieving a queue refnum out of a DVR. This would also explain why the Queue Status VI doesn't show any pending insert requests during this blocked time. That's right: in my last test of 5 minutes, the number of pending insert requests was 0 the entire time.

 

(Yeah, I love bolding my comments to emphasise since my posts tend to drag on. I'm trying to kick the habit, honest).

Message 25 of 28

If you're getting the queue refnum out of the DVR, are you doing the Dequeue inside or outside of the In Place Element structure? If you're doing it inside then the DVR itself is locked and keeping anyone else from getting the refnum until the current operation times out. Could that be the issue?
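
If that is what's happening, a rough text-language model of the effect would look like this (Python; threading.Lock stands in for the DVR and the 10 s sleep for whatever keeps the structure busy -- a sketch of the idea only, not LabVIEW internals):

import queue
import threading
import time

dvr = threading.Lock()                 # stands in for the DVR holding the class data
q = queue.Queue(maxsize=1)
q.put("owner")                         # the queue is already full

def first_client():
    with dvr:                          # blocking work done while the DVR is locked,
        time.sleep(10)                 # e.g. a long wait inside the In Place Element

def second_client():
    start = time.perf_counter()
    with dvr:                          # must get the refnum out of the DVR first,
        try:                           # so it blocks here, not on the queue...
            q.put("me", timeout=1.0)   # ...and the 1000 ms timeout hasn't even started
        except queue.Full:
            pass
    print("waited", round(time.perf_counter() - start, 1), "s")  # ~11 s, not ~1 s

threading.Thread(target=first_client).start()
time.sleep(0.1)
second = threading.Thread(target=second_client)
second.start()
second.join()
# While the second client is stuck on the DVR, the queue itself has never been
# asked to insert anything, so "pending insert requests" would read 0 the whole time.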

Message 26 of 28

@AristosQueue (NI) wrote:

@tst wrote:

@AristosQueue (NI) wrote:
If you have multiple readers from a queue and a single enqueue, the readers are serviced in random order

Really? I would have expected them to work in the order in which they started executing. It would seem to make more sense, precisely to avoid a case like this.


And as a senior software engineer talking to my new hire self, I express the same shocked expectations. And my new hire self looks back at me and asks, "And where were you 13 years ago?" No one caught it until this year.


I just want to add something to this specific discussion:

I expect LV to handle the dequeue functions within different threads, so on multicore systems those threads could arbitrarily collide on this shared resource. To make things even more complicated, the dequeues could be placed in threads with different priorities, essentially messing up any order (which might have been "present") because of pre-emption.

This is the major reason why I always insist that a queue should only ever have a single reader. EVER.
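
A quick stand-in sketch of the multiple-reader point (Python; which reader is woken for which element is left entirely to the scheduler and the queue implementation -- the point is only that no ordering is promised):

import queue
import threading
import time

q = queue.Queue()

def reader(name):
    element = q.get()                  # several readers all waiting on one queue
    print(f"{name} got element {element}")

readers = [threading.Thread(target=reader, args=(f"reader {i}",)) for i in range(3)]
for t in readers:                      # the readers start waiting in a known order...
    t.start()
time.sleep(0.1)

for element in range(3):               # ...but nothing guarantees that the elements
    q.put(element)                     # are handed out in that same order
for t in readers:
    t.join()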

 

Norbert
----------------------------------------------------------------------------------------------------
CEO: What exactly is stopping us from doing this?
Expert: Geometry
Marketing Manager: Just ignore it.
Message 27 of 28

@AristosQueue (NI) wrote:

If you're getting the queue refnum out of the DVR, are you doing the Dequeue inside or outside of the In Place Element structure? If you're doing it inside then the DVR itself is locked and keeping anyone else from getting the refnum until the current operation times out. Could that be the issue?


I totally agree; at face value it sounds exactly like that. In this case, though, I'm obtaining the queue refnum via a Property Node rather than putting the entire code into an IPE. I'm guessing the Property Node is syntactic sugar for "IPE the DVR, call the Property Node VI on the object, return the results and then unlock the DVR by exiting the IPE structure". Below is the code that I posted previously, modified to remove the Timed Structure and add back in the Property Node call that obtains the queue refnum from the private data of the RS485Network class:

 

temp.png
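
If that guess about the Property Node is right, its behaviour should be equivalent to this sketch (Python model again; the lock stands in for the DVR, and enqueue_via_property_node is a name I made up for illustration -- grab the refnum under the lock, release, then enqueue outside it, in which case the 1000 ms timeout ought to apply):

import queue
import threading
import time

dvr = threading.Lock()
shared_queue = queue.Queue(maxsize=1)
shared_queue.put("owner")              # the resource is already taken

def enqueue_via_property_node(item, timeout_s):
    with dvr:                          # brief lock: just long enough to copy the refnum
        q = shared_queue
    try:
        q.put(item, timeout=timeout_s) # the wait happens *outside* the DVR
        return True
    except queue.Full:
        return False                   # so the timeout behaves as advertised

start = time.perf_counter()
print(enqueue_via_property_node("second loop", 1.0),
      round(time.perf_counter() - start, 2))   # False after roughly 1 s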

 

The nice part of using the enqueue as you see here is that, in one atomic operation, I have both taken a lock on the resource and recorded who locked it (via that Device name property).
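
In text-language terms, the pattern amounts to something like this (Python sketch; SEQLock and the device names are made up for illustration, not the actual RS485Network class):

import queue

class SEQLock:
    # Toy model of the single-element-queue lock: enqueueing your name acquires
    # the resource and records the owner in one atomic operation; dequeueing
    # releases it; the enqueue timeout bounds how long a caller can wait.
    def __init__(self):
        self._q = queue.Queue(maxsize=1)   # fixed size of 1, like the LabVIEW SEQ

    def acquire(self, device_name, timeout_s):
        try:
            self._q.put(device_name, timeout=timeout_s)
            return True
        except queue.Full:
            return False                   # timed out instead of blocking forever

    def release(self):
        return self._q.get_nowait()        # hands back the owner's name

lock = SEQLock()
print(lock.acquire("Device A", 1.0))       # True  -- resource taken, owner recorded
print(lock.acquire("Device B", 1.0))       # False -- gives up after ~1 s
print(lock.release())                      # Device A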

 

As Norbert points out in his response, there is no guarantee that one thread (perhaps of a low priority) wouldn't be starved using this basic lock technique (though there is only ever one item in the queue, not several, since the queue has a fixed size of 1, so ordering isn't actually important here). However, this is true of any non-deterministic locking-based system. I believe the DVR lock mechanism with IPE structures and the Semaphores would have the same issue. A separate, abstracted queue-based system would be needed to maintain a list of clients and service them in turn if we wanted to avoid starvation. Of course, in my example I only have two "clients".

 

I feel like I'm fighting a losing battle here, but the problem is not that the second loop is starved. I expect that to happen; in fact I'm deliberately causing it to happen because I want to see the second loop time out. That's right, I want to see the "timed out?" terminal of the Enqueue Element actually operate for a change, which is what tst has been trying to demonstrate doesn't happen in this example (thank you tst for helping here, btw). No matter what I try, the second loop will not time out, and the only lock mechanisms that exist in the code above are the Property Node and the Enqueue Element. And the Enqueue Element has a time-out built into it, right?

 

Yeah, I could ditch all this and just run with an IPE in each client call, as several have suggested. Of course, the problem with that is that IPE nodes do not time out, blocking all client calls to the resource if one call goes rogue inside an IPE. So I'm stuck with the old SEQ method we're all familiar with (albeit enqueuing rather than dequeuing), which leverages the queue timeouts and relinquishes CPU time.

 

But I'm also interested in other approaches that don't block their callers indefinitely (i.e. that have time-out capability). Using a SEQ is a roundabout, though supposedly tried-and-true, way of achieving that.

 

And yeah, I was supposed to come up with a simpler example than this... working on it. :)

Message 28 of 28