04-24-2012 08:33 AM
Hi there,
At this point I'm a bit confused... I'm trying to implement an object-based "test sequencer" (no TestStand so far) on a quad-core CPU. My basic idea was to define a test step as a class, instantiate up to four objects, and call Execute VIs operating on these objects in parallel:
This is what the Execute VI actually does:
What I see when playing with the reentrancy settings of the Execute method:
Execute NOT reentrant --> ~53900 ms run time @ ~27% CPU load (--> only one core)
Execute reentrant (shared clones, since the terminals are dynamic dispatch) --> ~38400 ms run time @ 100% CPU load (all four cores)
My interpretation is that when switching from one core to four, run time only drops by about 30%. Is this possible, or am I missing something fundamentally important?
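The figures above can be checked with a little arithmetic. The Python sketch below (the helper name `parallel_stats` is hypothetical) computes speedup, per-core efficiency, and run-time reduction from the two measured run times, assuming both runs did the same total amount of work.

```python
def parallel_stats(t_single_ms, t_parallel_ms, cores):
    """Speedup, parallel efficiency and run-time reduction for one benchmark pair."""
    speedup = t_single_ms / t_parallel_ms         # how many times faster the parallel run is
    efficiency = speedup / cores                  # 1.0 would mean perfect scaling
    time_saved = 1 - t_parallel_ms / t_single_ms  # fraction of run time removed
    return speedup, efficiency, time_saved

speedup, efficiency, saved = parallel_stats(53900, 38400, 4)
print(f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}, run time -{saved:.0%}")
# prints: speedup 1.40x, efficiency 35%, run time -29%
```

So the ~30% is the reduction in run time. Expressed as throughput it is a 1.40x speedup, roughly 35% of the ideal 4x. With the CPU at ~100% load but only ~35% efficiency, the cores appear to be busy with something other than useful work.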
Thanks for your comments!
Oli
PS: running on LV2011
04-24-2012 09:30 AM
I might be wrong, but is it because you are sharing clones? I have a feeling that LabVIEW will still only allow one of the VIs to execute at a time, but you get a little bit of advantage because it is always already in memory (I might be wrong here). However, I guess that doesn't explain why it is using all 4 cores, unless it allows the cores to wait for the VI to become free in this mode(?).
I guess you could check this by switching the terminals to non-dynamic, although presumably you wanted dynamic dispatch for some other reason?
04-24-2012 09:41 AM
Thanks for the hint!
For this simple case, I set the terminals to static and went for the Pre-Allocate option. The performance gain was only ~3% compared to the dynamic version (which is what I'd like to use), so I must be doing something else wrong.
04-24-2012 09:45 AM
Do you have an event structure where all 4 elements are going against the same event?
04-24-2012 09:48 AM
OK, sorry for the bad suggestion. Another thought: perhaps the In Place Element structure doesn't like being run in multiple places (I haven't used it myself). Maybe replacing it with nodes that read the cluster out and write it back in would work(?).
I say this because, although (as I said) I haven't used the In Place structure, I have seen this behaviour before when I had a subVI that wasn't set to be reentrant.
04-24-2012 10:13 AM
It is possible that the square root operator itself is not reentrant and that it is your bottleneck. I would also recommend that you put a Wait of 0 ms in your loop. This allows the scheduler to check whether other work is ready to run; as written, once execution is inside your loop, the scheduler cannot do anything else.
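The idea behind a 0 ms wait can be illustrated, by rough analogy only, with Python's `asyncio`, where `await asyncio.sleep(0)` hands control back to the event loop so other ready tasks can run. This is not how LabVIEW's execution system is implemented; the sketch only shows a loop voluntarily yielding. The names `worker` and `run` are hypothetical.

```python
import asyncio

async def worker(name, steps, log, yield_each_step):
    for i in range(steps):
        log.append(f"{name}{i}")       # one unit of "work"
        if yield_each_step:
            await asyncio.sleep(0)     # yield to the scheduler, like a 0 ms wait

async def run(yield_each_step):
    log = []
    await asyncio.gather(worker("A", 3, log, yield_each_step),
                         worker("B", 3, log, yield_each_step))
    return log

no_yield = asyncio.run(run(False))
with_yield = asyncio.run(run(True))
print(no_yield)    # ['A0', 'A1', 'A2', 'B0', 'B1', 'B2']
print(with_yield)  # ['A0', 'B0', 'A1', 'B1', 'A2', 'B2']
```

Without the yield, task A runs all its steps before B gets a turn; with it, the two interleave.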
04-24-2012 12:46 PM
@wideofthemark wrote:
I might be wrong, but is it because you are sharing clones? I have a feeling that LabVIEW will still only allow one of the VIs to execute at a time, but you get a little bit of advantage because it is always already in memory (I might be wrong here). However, I guess that doesn't explain why it is using all 4 cores, unless it allows the cores to wait for the VI to become free in this mode(?).
Unfortunately your choice of screen name is accurate here 😉
You've misunderstood reentrancy and shared clones. Multiple instances of a reentrant VI can always run at the same time. The difference is whether LabVIEW pre-allocates copies of the reentrant subVI or creates new copies on demand. If you have a reentrant VI that's called from 20 different places in your code, but you know that at most three of them will ever need to execute at the same time, then shared clones will be more memory-efficient. The first call to the subVI allocates one copy. The next time that subVI is called, if there is not already an idle copy in memory, a new clone is created and run. If you pre-allocate clones, 20 copies are created when the top-level VI starts, but most of the time they'll be sitting in memory unused. In this case, pre-allocating clones makes more sense because you know you want 4 copies, but even with shared clones they should run in parallel.
There is one situation where you might want to pre-allocate clones even if you don't expect multiple copies to run in parallel: if you use an uninitialized shift register inside the reentrant VI. With pre-allocated clones, each call site always uses the same clone, so its shift register value is predictable, whereas with shared clones you don't know which clone you'll get, so you won't know what value is in the shift register.
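The shared-clone behaviour described above is essentially an object pool. The Python sketch below models it under that assumption; `SharedClonePool` is a hypothetical name, and a plain `dict` stands in for a clone's data space. It replays the 20-call-site, at-most-3-concurrent scenario.

```python
import threading

class SharedClonePool:
    """Hypothetical model of 'shared clone' reentrancy: clones are created
    on demand, and an idle clone is reused before a new one is made."""

    def __init__(self, factory):
        self._factory = factory        # builds one new clone (its own data space)
        self._idle = []                # clones not currently executing
        self._lock = threading.Lock()  # callers may run on different threads
        self.created = 0               # total clones ever allocated

    def acquire(self):
        with self._lock:
            if self._idle:
                return self._idle.pop()  # reuse an idle clone
            self.created += 1
        return self._factory()           # none idle: allocate a new one

    def release(self, clone):
        with self._lock:
            self._idle.append(clone)

# 20 call sites, but at most 3 calls ever overlap
pool = SharedClonePool(dict)
overlapping = [pool.acquire() for _ in range(3)]
for clone in overlapping:
    pool.release(clone)
for _ in range(17):                      # the remaining calls run one at a time
    pool.release(pool.acquire())
print(pool.created)                      # 3

# Pre-allocated: one clone per call site, all created up front
preallocated = [dict() for _ in range(20)]
print(len(preallocated))                 # 20
```

The pool never grows beyond the peak number of simultaneous calls, which is why shared clones save memory when many call sites rarely overlap.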
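The shift-register caveat can be modelled in the same spirit by treating the uninitialized shift register as state stored inside each clone. In this Python sketch, `Clone` and `call_shared` are hypothetical names, not LabVIEW APIs. The clone simply counts its own calls. The pool is single-threaded and LIFO so the outcome is deterministic; in LabVIEW, which clone a call receives depends on what happens to be idle at run time.

```python
class Clone:
    def __init__(self):
        self.count = 0           # plays the role of an uninitialized shift register

    def execute(self):
        self.count += 1          # state persists between calls on the same clone
        return self.count

# Pre-allocated: one clone bound to each call site
site_a, site_b = Clone(), Clone()
pre = [site_a.execute(), site_a.execute(), site_b.execute()]   # calls from A, A, B

# Shared: each call borrows whichever clone is idle
idle = []
def call_shared():
    clone = idle.pop() if idle else Clone()
    result = clone.execute()
    idle.append(clone)           # back to the pool for the next caller
    return result

shared = [call_shared(), call_shared(), call_shared()]         # calls from A, A, B
print(pre)     # [1, 2, 1]
print(shared)  # [1, 2, 3]
```

With pre-allocation, site B starts from a fresh count; with the shared pool, site B inherits the value left behind by site A's calls.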
You might experiment with the threadconfig utility or the INI file tokens that allow you to adjust the number of threads that are allocated (the threadconfig utility might just be a nicer interface to the INI tokens, I haven't checked). Also, for benchmarking purposes you may want to disable debugging, and you should probably run the subVIs in some thread other than the user interface thread.
04-24-2012 12:57 PM
Fair enough. That makes it clearer.
I have to say that I mostly use either reentrant + pre-allocate or not reentrant at all. When I've needed something to run fast in a couple of places, it's usually been a small bit of code, so I haven't worried much about memory. But it's worth bearing in mind for the future.
JP
04-24-2012 01:15 PM
Q:
Is there any difference if you use 4 separate class constants instead of a single constant feeding all of them?
Just curious,
Ben
04-24-2012 07:43 PM