LabVIEW

Darren's Weekly Nugget 08/10/2009

The Parallel For Loop is one of my favorite new features of LabVIEW 2009.  With most desktop computers containing two or more processors these days, it's nice that LabVIEW already enables us to write parallel code without a second thought.  But the Parallel For Loop takes things one step further by allowing us to configure some For Loops to run their iterations in parallel across multiple processors.  You can check out the link above for a complete description of the feature, but I figured I'd focus on one specific tip for using the Parallel For Loop that I think needs extra attention.

When you run the Parallel For Loop Detector (Tools > Profile > Find Parallelizable Loops...), it will detect all For Loops in your code that could potentially be parallelized.  You can run this option from the toolbar of a project to analyze all VIs in the project, or from the toolbar of a VI to analyze just that VI.

When you browse the results of this analysis, be careful about nested loops.  The Parallel For Loop Detector will detect *any* loop that could be parallelized, but as a general rule, you'll want to avoid parallelizing a loop that resides within another loop that can be parallelized.  This is because there is some extra overhead before and after the loop execution that is responsible for setting up the parallel iterations, then combining the results once the iterations are complete.  This overhead is negligible for a top-level loop because it only happens once.  But for a nested loop, that overhead will occur every time the nested loop runs, and the parallelism overhead may end up taking more execution time than just letting the loop run serially in the first place.  So I recommend sticking with parallelizing only top-level loops, or at the very least, benchmarking your code if you decide to parallelize nested loops.
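LabVIEW diagrams are graphical, so here is a rough Python analogue of the nested-loop advice, offered as a sketch only. It assumes a `ThreadPoolExecutor` stands in for LabVIEW's parallel loop instances and that creating a pool models the "extra overhead before and after the loop"; `SetupCounter`, `cell`, `parallel_outer`, and `parallel_inner` are hypothetical names. It counts setups rather than measuring time, because in standard CPython builds threads do not run CPU-bound Python code in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


class SetupCounter:
    """Counts how often parallel setup/teardown (pool creation) happens."""

    def __init__(self):
        self.setups = 0

    def pool(self, workers):
        # Assumption: one pool creation models one round of parallel setup.
        self.setups += 1
        return ThreadPoolExecutor(max_workers=workers)


def cell(i, j):
    # Hypothetical per-iteration work for the nested loop.
    return i * j


def parallel_outer(outer, inner, counter, workers=4):
    # Parallelize only the top-level loop: one setup for the whole run,
    # and each task runs a complete inner loop serially.
    with counter.pool(workers) as pool:
        return list(pool.map(lambda i: sum(cell(i, j) for j in range(inner)),
                             range(outer)))


def parallel_inner(outer, inner, counter, workers=4):
    # Parallelize only the nested loop: the outer loop runs serially, so the
    # setup (split work) and combine (sum results) steps repeat per outer iteration.
    rows = []
    for i in range(outer):
        with counter.pool(workers) as pool:
            rows.append(sum(pool.map(lambda j: cell(i, j), range(inner))))
    return rows


a, b = SetupCounter(), SetupCounter()
print(parallel_outer(3, 4, a), a.setups)  # [0, 6, 12] 1
print(parallel_inner(3, 4, b), b.setups)  # [0, 6, 12] 3
```

Both versions compute the same rows; the difference is that the nested-loop version pays the setup cost once per outer iteration, which is the overhead described above.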

Message Edited by Support on 08-10-2009 02:09 PM
Message 1 of 12

Darren wrote:

But the Parallel For Loop takes things one step further by allowing us to configure some For Loops to run their iterations in parallel across multiple processors. 


I'm not sure I understood this correctly.

I understand that if you have two loops running in parallel, and have two processors, you can have each loop run on a different processor.

But are you saying you can have multiple processors working on a single loop?

I didn't even think that was possible... but then again, I don't know a lot of things. 😄

Cory K
Message 2 of 12

Nested loops:

Well, the issue with overhead should have something to do with the number of iterations. So if I had an outer loop with n=4 and an inner loop with n=some incredibly large number, would it be advisable to parallelize the inner loop? The same would go for any loop that only iterates a small number of times (can you give some benchmarks on this, please?): the overhead would cancel out the speed increase from multi-core performance.

 

Anyhow, it is really amazing how NI is trying to get the most out of modern multicore PCs.

 

Felix

Message 3 of 12

Cory:  Yes, in LabVIEW 2009, you can have multiple processors running different iterations of the same For Loop at the same time.  Pretty cool, huh?  🙂
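To make "different iterations of the same For Loop at the same time" concrete, here is a minimal Python sketch, assuming a thread pool stands in for the processors LabVIEW would use; `iteration`, `serial_for`, and `parallel_for` are hypothetical names. Each iteration depends only on its own index, so the pool may run them in any order, and `Executor.map` still returns results in index order, much like an auto-indexed output array.

```python
from concurrent.futures import ThreadPoolExecutor


def iteration(i):
    # Hypothetical loop body: iteration i depends only on i.
    return i * i


def serial_for(n):
    return [iteration(i) for i in range(n)]


def parallel_for(n, workers=4):
    # Iterations may run on different workers and in any order, but
    # Executor.map yields results in index order, so the output matches
    # the serial loop.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(iteration, range(n)))


print(parallel_for(6))  # [0, 1, 4, 9, 16, 25]
```

In standard CPython builds, threads illustrate the structure but not a CPU speedup; `ProcessPoolExecutor` is the usual choice when the body is CPU-bound Python code.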

 

Felix:  For most cases, it would still be preferable to only parallelize the outer loop, as you would still see the benefits of parallelism because each of the instances of the nested loop would be running on a different processor.  Now for an extreme case, like if you have an outer loop that only runs twice, an inner loop that runs millions of times, and an octal-core PC, then yes, I think it would make more sense to parallelize the nested loop.  But I would argue that this is a corner case, and would be easily recognized with the benchmarking I recommended in my original post.  🙂
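The trade-off in this reply can be summarized as back-of-envelope arithmetic. The sketch below is a simplified model, not LabVIEW's scheduler: it assumes at most one task per iteration of the parallelized loop can run at once, and that setup/combine happens once for a parallel outer loop but once per outer iteration for a parallel nested loop. `estimate` is a hypothetical helper.

```python
def estimate(outer_count, inner_count, cores, parallelize):
    """Simplified model returning (max_busy_cores, parallel_setups).

    max_busy_cores: never more than the iteration count of the parallelized
    loop, and never more than `cores`.
    parallel_setups: once for the outer loop, but once per outer iteration
    when only the nested loop is parallelized.
    """
    if parallelize == "outer":
        return min(outer_count, cores), 1
    if parallelize == "inner":
        return min(inner_count, cores), outer_count
    raise ValueError("parallelize must be 'outer' or 'inner'")


# The corner case above: outer loop runs twice, inner loop millions of times, 8 cores
print(estimate(2, 1_000_000, 8, "outer"))  # (2, 1)
print(estimate(2, 1_000_000, 8, "inner"))  # (8, 2)
# A more typical case: enough outer iterations to keep every core busy
print(estimate(100, 50, 8, "outer"))       # (8, 1)
print(estimate(100, 50, 8, "inner"))       # (8, 100)
```

In the typical case both choices keep all cores busy, so the outer loop wins by paying the setup cost only once; in the corner case only the nested loop can occupy all eight cores. Actual benchmarking, as recommended, remains the real test.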

Message 4 of 12

Cory K wrote:

Darren wrote:

But the Parallel For Loop takes things one step further by allowing us to configure some For Loops to run their iterations in parallel across multiple processors. 


I'm not sure I understood this correctly.

I understand that if you have two loops running in parallel, and have two processors, you can have each loop run on a different processor.

But are you saying you can have multiple processors working on a single loop?

I didn't even think that was possible... but then again, I don't know a lot of things. 😄


Bingo!

 

Yes, in older versions of LV you could split your huge array into multiple parts and pass each piece to a separate For Loop. By doing this you could get all of the CPUs crunching the same pile. But with LV 2009, LV will do this for you!
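The manual workaround described here (split the array, hand each piece to its own loop, then reassemble) can be sketched in Python as follows. This is an analogue under stated assumptions: a thread pool stands in for the separate For Loops, and `process_chunk`, `split`, and `manual_parallel` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk):
    # Hypothetical per-element work done by one copy of the For Loop.
    return [x * 2 + 1 for x in chunk]


def split(data, parts):
    # Contiguous, nearly equal pieces; the first len(data) % parts pieces
    # get one extra element.
    q, r = divmod(len(data), parts)
    pieces, start = [], 0
    for k in range(parts):
        end = start + q + (1 if k < r else 0)
        pieces.append(data[start:end])
        start = end
    return pieces


def manual_parallel(data, parts=4):
    pieces = split(data, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        results = list(pool.map(process_chunk, pieces))
    # Reassemble the processed pieces in their original order.
    return [y for piece in results for y in piece]


print(split(list(range(10)), 4))  # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
```

The bookkeeping here (choosing chunk sizes, reassembling in order) is exactly what a parallelized For Loop spares you from writing by hand.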

 

Ben

Retired Senior Automation Systems Architect with Data Science Automation, LabVIEW Champion, Knight of NI and Prepper
Message 5 of 12

Cory K wrote:

But are you saying you can have multiple processors working on a single loop?

I didn't even think that was possible.... 


YES!  If iterations of the loop do not depend on previous values, the loop can be parallelized.  Thus one CPU can work on the i=0 case and another on the i=1 case if they don't depend on each other.  These are not separate loops executing in parallel, but the iterations (frames) of a single FOR loop executing in parallel.
Unwrapping code into multiple parallel For loops raises a good question: I can see how unwrapping at multiple levels could cause problems with the optimization.  The general rules for nested loops are also a good question.  For example, if a loop of count N is inside a loop of count M, what is the execution time over a large range of N and M in these 4 cases:
1. both N and M serial
2. N parallel M serial
3. N serial M Parallel
4. both N and M parallel
For the student:
Build a test harness for all 4 cases that takes an input integer P and does some significant calculation in a nested parallel FOR loop.
Produce 3 color image plots for N and M from 2 to 2^P by factors of 2, normalizing the times for cases 2, 3, and 4 to the times from case 1, plotted as functions of N and M, all on the same Z scale for comparison.
Bonus credit, run the code while stepping the number of CPUs used from 1 to 8....
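A rough Python starting point for this exercise might look like the sketch below. It assumes thread pools stand in for LabVIEW's loop parallelism, and `work`, `inner_loop`, `run_case`, `benchmark`, and `CASES` are hypothetical names. It returns the normalized ratios instead of drawing the color image plots, which are left out. In standard CPython builds, threads will not show real CPU speedups, so the numbers mainly reflect parallel setup overhead; the structure of the four cases is the point.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Case number -> (inner loop N parallel?, outer loop M parallel?)
CASES = {1: (False, False), 2: (True, False), 3: (False, True), 4: (True, True)}


def work(i, j):
    # Hypothetical stand-in for "some significant calculation".
    return sum(k * k for k in range(200)) + i * j


def inner_loop(i, n, parallel, workers):
    # Nested loop of count N.
    if parallel:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return sum(pool.map(lambda j: work(i, j), range(n)))
    return sum(work(i, j) for j in range(n))


def run_case(n, m, n_par, m_par, workers=4):
    # Outer loop of count M containing the nested loop of count N.
    start = time.perf_counter()
    if m_par:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            total = sum(pool.map(lambda i: inner_loop(i, n, n_par, workers), range(m)))
    else:
        total = sum(inner_loop(i, n, n_par, workers) for i in range(m))
    return total, time.perf_counter() - start


def benchmark(p, workers=4):
    """Times for cases 2-4 normalized to case 1, for N, M = 2, 4, ..., 2**p."""
    sizes = [2 ** k for k in range(1, p + 1)]
    ratios = {}
    for n in sizes:
        for m in sizes:
            runs = {c: run_case(n, m, *flags, workers=workers) for c, flags in CASES.items()}
            # Every case must compute the same answer; only the timing may differ.
            if len({total for total, _ in runs.values()}) != 1:
                raise RuntimeError(f"results differ for N={n}, M={m}")
            base = runs[1][1]
            ratios[(n, m)] = {c: runs[c][1] / base for c in (2, 3, 4)}
    return ratios
```

For the bonus credit, calling `benchmark` repeatedly with `workers` stepped from 1 to 8 would be one way to vary the amount of parallelism.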


Message 6 of 12

Darren wrote:

Cory:  Yes, in LabVIEW 2009, you can have multiple processors running different iterations of the same For Loop at the same time.  Pretty cool, huh?  🙂


.... awesome

Cory K
Message 7 of 12

sth wrote:

Cory K wrote:

But are you saying you can have multiple processors working on a single loop?

I didn't even think that was possible.... 


YES!  If iterations of the loop do not depend on previous values, the loop can be parallelized.  Thus one CPU can work on the i=0 case and another on the i=1 case if they don't depend on each other.  These are not separate loops executing in parallel, but the iterations (frames) of a single FOR loop executing in parallel.


 

That is a VERY IMPORTANT fact to consider!  You should only use this new feature if the loop does not depend on previous values.    I'll have to experiment with this to wrap my mind around its functionality, because I can picture many caveats with such a feature...  I'm sure it's not for the weak of heart to play with... 
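Here is why "does not depend on previous values" matters, as a small Python sketch. It assumes a running sum stands in for a loop that carries a value between iterations (like a shift register); `running_sum` and `naive_parallel_running_sum` are hypothetical names. It shows what would go wrong if such a loop were naively split into independent pieces, not how LabVIEW itself treats dependent loops.

```python
from concurrent.futures import ThreadPoolExecutor


def running_sum(values):
    # Loop-carried dependency, like a shift register: iteration i needs the
    # accumulator produced by iteration i-1, so iterations must run in order.
    acc, out = 0, []
    for v in values:
        acc += v
        out.append(acc)
    return out


def naive_parallel_running_sum(values, parts=2):
    # Deliberately WRONG: splitting the loop into independent pieces restarts
    # each piece's accumulator at 0, because no piece can see the previous
    # piece's final value.
    size = -(-len(values) // parts)  # ceiling division
    chunks = [values[k:k + size] for k in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=parts) as pool:
        pieces = list(pool.map(running_sum, chunks))
    return [x for piece in pieces for x in piece]


vals = [1, 2, 3, 4]
print(running_sum(vals))                 # [1, 3, 6, 10]
print(naive_parallel_running_sum(vals))  # [1, 3, 3, 7]
```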

Message 8 of 12

Ray.R wrote:

sth wrote:
YES!  If iterations of the loop do not depend on previous values, the loop can be parallelized.  Thus one CPU can work on the i=0 case and another on the i=1 case if they don't depend on each other.  These are not separate loops executing in parallel, but the iterations (frames) of a single FOR loop executing in parallel.


 

That is a VERY IMPORTANT fact to consider!  You should only use this new feature if the loop does not depend on previous values.    I'll have to experiment with this to wrap my mind around its functionality, because I can picture many caveats with such a feature...  I'm sure it's not for the weak of heart to play with... 


From what I recall of the discussion of the implementation in the beta program, this is a fairly smart feature.  If you throw it at a loop that cannot be parallelized, it will not give incorrect results.  It may try to parallelize the loop by multi-threading and then stall the processors to handle the dependency, so there would be no gain, only a loss from the overhead of setting up the parallel processors.  So you won't get the wrong answer, but you may get poor performance.
Fortunately you can control the amount of parallelism on the loop so it should be easy to benchmark.
sth (running LV 8.5 until all my PPC systems die off) 
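Since the amount of parallelism is adjustable, a benchmark can simply rerun the same loop at several worker counts. The Python sketch below assumes the `workers` argument plays the role of LabVIEW's configured number of parallel instances; `body`, `run_with_workers`, and `sweep` are hypothetical names. In standard CPython builds, threads will not speed up this CPU-bound body, so a real measurement would swap in `ProcessPoolExecutor` (with the usual `if __name__ == "__main__":` guard).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def body(i):
    # Hypothetical independent loop iteration.
    return sum((i + k) % 7 for k in range(2000))


def run_with_workers(n, workers):
    # Assumption: `workers` stands in for the number of parallel loop instances.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        result = list(pool.map(body, range(n)))
    return result, time.perf_counter() - start


def sweep(n, max_workers=8):
    """Elapsed time for the same loop at 1..max_workers workers."""
    baseline, _ = run_with_workers(n, 1)
    timings = {}
    for w in range(1, max_workers + 1):
        result, elapsed = run_with_workers(n, w)
        # More parallelism may change the time, never the answer.
        if result != baseline:
            raise RuntimeError(f"result changed with {w} workers")
        timings[w] = elapsed
    return timings


for workers, seconds in sweep(100, max_workers=4).items():
    print(f"{workers} worker(s): {seconds:.4f} s")
```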

 


Message 9 of 12

Thanks sth,

 

So that is what Darren was explaining in the original post.

 

R

Message 10 of 12