LabVIEW


Do Parallel For Loops run on dedicated CPU cores?


I don't believe this is the case. My confusion comes from the literature on this structure, which discusses entering the maximum number of cores of the computer on which the code may run in the For Loop Iteration Parallelism dialog box, as well as using the CPU Information function to determine how many instances to parallelize. I suspect parallel For Loops are first decomposed into iteration clumps/tasks, which are then cooperatively multitasked and/or multithreaded within the same prioritized execution system before they are ever mapped to dedicated cores per instance. Is this correct?

The literature I refer to is Improving Performance with Parallel Loops (http://zone.ni.com/devzone/cda/tut/p/id/9393), under Performance Tips, and the LabVIEW Help (Multiprocessing and Hyperthreading in LabVIEW). I just don't see how this information applies to how the loop instances are parallelized and executed. I hope someone can enlighten me and perhaps others reading these docs.

Thank you!

Message 1 of 11

Hi jorgeinSD,

The literature you provided describes the parallelization of the iterations of the For Loop. You are right in understanding that the For Loop is first decomposed into single iterations; iterations with different iteration numbers are then run in parallel on multiple cores.

Suppose you have a For Loop with 10 iterations and 2 cores, and you want to run the loop in parallel. Core 1 will then run iterations 0, 2, 4, 6, 8 and Core 2 will run iterations 1, 3, 5, 7, 9.
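The interleaved split in this example can be written out as a small Python sketch. It only illustrates the arithmetic of a static round-robin assignment; the function name `interleaved_partition` is hypothetical, "worker" stands in for whatever actually executes the iterations, and nothing here is taken from LabVIEW's scheduler.

```python
def interleaved_partition(num_iterations, num_workers):
    """Statically assign iteration i to worker i % num_workers (round-robin)."""
    return [list(range(w, num_iterations, num_workers))
            for w in range(num_workers)]


# 10 iterations split across 2 workers, as in the example above.
assignment = interleaved_partition(10, 2)
for worker, iterations in enumerate(assignment, start=1):
    print(f"Worker {worker}: {iterations}")
# Worker 1: [0, 2, 4, 6, 8]
# Worker 2: [1, 3, 5, 7, 9]
```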

 

 

Message 2 of 11

Right-click the For Loop and select "Configure Iteration Parallelism...".

Configure it to your computer's capabilities. Done.
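As a rough text-language analogy for "configure it to your computer's capabilities", the sketch below picks a number of parallel instances from the logical CPU count, much as the CPU Information function is used in the cited tutorial. The helper name `choose_parallel_instances` and its `max_instances` parameter are made up for illustration; this is not how LabVIEW itself chooses the value.

```python
import os


def choose_parallel_instances(num_iterations, max_instances=None):
    """Pick a worker count: no more than the logical CPUs, the optional
    configured maximum, or the number of iterations, and at least 1."""
    logical_cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    limit = logical_cpus if max_instances is None else min(logical_cpus, max_instances)
    return max(1, min(limit, num_iterations))


print(choose_parallel_instances(10))                   # depends on the machine
print(choose_parallel_instances(10, max_instances=1))  # always 1
```

Capping by the iteration count avoids creating instances that would never receive any work.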

Message 3 of 11

Hello ritesh024,

Thanks for the illustration; that actually helps with my question. I'm not convinced the loop iterations (once decomposed or clumped) execute on dedicated cores the way you describe. I think they either map to the same thread (created by an execution system with a defined priority level), in which case LabVIEW resorts to cooperative multitasking across all these iterations, or each clump is mapped to a dedicated thread (multithreaded).

To use your illustration of 10 iterations, a Parallel For Loop running in a VI set to run in the Standard Execution System with Normal Priority will likely spawn up to 4 threads:

First possibility: one of these threads is used by all 10 iterations, in which case they will all multitask (the execution system will share processor time equally across all clumps/iterations).

Second possibility: two or more threads are assigned to the Parallel For Loop iteration clumps. All these threads will work to maintain load balance across all cores assigned to the LabVIEW process. In other words, no specific cores will be negotiated to run specific clumps.

The third possibility is probably the one you describe; however, I can't see how this is implemented, since the configuration options it offers are not "a la Timed Loop". So I'm just trying to get my head around this possibility, since the literature appears to use it to describe the benefits and functionality of the Parallel For Loop.

I hope I didn't lose you and my explanation makes some sense.

Regards

Message 4 of 11

jorgeinSD,

@jorgeinSD wrote:

I think they either map to the same thread (created by an execution system with a defined priority level), in which case LabVIEW resorts to cooperative multitasking across all these iterations, or each clump is mapped to a dedicated thread (multithreaded).


You are right in saying that each clump is mapped to dedicated threads, but the catch is that the threads run on different cores. The purpose of running the loop iterations in parallel would be defeated if LabVIEW simply created threads on a single core and executed them on the basis of priority or time stamps.

 



@jorgeinSD wrote:

To use your illustration of 10 iterations, a Parallel For Loop running in a VI set to run in the Standard Execution System with Normal Priority will likely spawn up to 4 threads:

First possibility: one of these threads is used by all 10 iterations, in which case they will all multitask (the execution system will share processor time equally across all clumps/iterations).


If I understood it properly, I think this is the same as running the loop sequentially. Whether the iterations run sequentially or are multitasked on one thread seems the same to me, since nothing is actually parallelized.

 



@jorgeinSD wrote:

Second possibility: two or more threads are assigned to the Parallel For Loop iteration clumps. All these threads will work to maintain load balance across all cores assigned to the LabVIEW process. In other words, no specific cores will be negotiated to run specific clumps.

The third possibility is probably the one you describe; however, I can't see how this is implemented, since the configuration options it offers are not "a la Timed Loop". So I'm just trying to get my head around this possibility, since the literature appears to use it to describe the benefits and functionality of the Parallel For Loop.


Basically, the system load-balances the iterations across the available processors. The example in my previous post was only meant to illustrate that the system distributes the work equally among all the processors. It is equally possible that Core 1 executes iterations 0, 1, 2, 3, 4 and Core 2 executes iterations 5, 6, 7, 8, 9. The For Loop configuration doesn't give the user the option to specify which core will run which iterations. The compiler must be using a default algorithm to distribute the work equally.
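The contiguous split mentioned here (iterations 0 to 4 on one worker, 5 to 9 on the other) can be sketched as below. This is one common way to divide a range evenly, shown purely as an illustration; the name `block_partition` is hypothetical and the post itself only guesses at what default algorithm LabVIEW uses.

```python
def block_partition(num_iterations, num_workers):
    """Split 0..num_iterations-1 into contiguous blocks whose sizes
    differ by at most one iteration."""
    base, extra = divmod(num_iterations, num_workers)
    blocks, start = [], 0
    for w in range(num_workers):
        size = base + (1 if w < extra else 0)  # first `extra` workers get one more
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


print(block_partition(10, 2))  # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
print(block_partition(10, 3))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```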

 

 

 

Message 5 of 11

On a side note: if you have LabVIEW 2011, you can configure the P terminal to output the parallel instance ID. It is a useful tool for studying how the iterations are scheduled, depending on the options you pick.
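The same kind of experiment can be imitated in Python: start a fixed number of worker instances, let each one claim the next unclaimed iteration, and log which instance ID handled which iteration. This is an analogy under stated assumptions, not LabVIEW code; `run_parallel_loop` and its parameters are hypothetical names, and the claim-the-next-iteration scheme is just one simple dynamic schedule.

```python
import threading


def run_parallel_loop(num_iterations, num_instances, body):
    """Run body(i) for every i; return a dict {iteration: instance_id}."""
    next_iteration = 0
    lock = threading.Lock()
    log = {}

    def instance(instance_id):
        nonlocal next_iteration
        while True:
            with lock:  # claim the next unclaimed iteration
                if next_iteration >= num_iterations:
                    return
                i = next_iteration
                next_iteration += 1
            body(i)
            with lock:
                log[i] = instance_id

    threads = [threading.Thread(target=instance, args=(p,))
               for p in range(num_instances)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


log = run_parallel_loop(10, 2, lambda i: sum(range(10_000)))
for i in sorted(log):
    print(f"iteration {i} -> instance {log[i]}")
```

The mapping usually changes from run to run, which is the point: with a dynamic schedule, no iteration belongs to a particular instance ahead of time.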

 

 

Message 6 of 11
Solution
Accepted by topic author jorgeinSD

jorgeinSD, thanks for the insightful question. Your second possibility is correct. Each parallel loop instance is in a different clump, and the execution system schedules those clumps among the available threads. LabVIEW does not bind threads to cores for the Parallel For Loop, so the threads may switch between cores as they execute.

 

Mary Fletcher

Software Engineer

LabVIEW R&D
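The accepted answer (clumps are scheduled among the available threads, and threads are not bound to cores) resembles an ordinary thread pool. The sketch below is a Python analogy only: `run_clumps` is a hypothetical name, each "clump" is simply a task submitted to a `ThreadPoolExecutor`, and the code never sets CPU affinity, so the operating system remains free to move the pool threads between cores. It illustrates scheduling, not performance.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_clumps(num_clumps, num_threads):
    """Run num_clumps tasks on a pool of num_threads threads and report
    which pool thread executed each one."""
    def clump(clump_id):
        time.sleep(0.01)  # stand-in for the work of one loop instance
        return clump_id, threading.current_thread().name

    with ThreadPoolExecutor(max_workers=num_threads,
                            thread_name_prefix="exec") as pool:
        return list(pool.map(clump, range(num_clumps)))


for clump_id, thread_name in run_clumps(8, 2):
    print(f"clump {clump_id} ran on {thread_name}")
```

Which thread picks up which clump is decided at run time by the pool, and which core each thread runs on is decided by the operating system.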

Message 7 of 11

Thanks for the response and clarification, Mary.

Do you know how many threads are spawned by a prioritized execution system? Is the answer "it depends"?

Regards

Message 8 of 11

@mfletcher wrote:

Each parallel loop instance is in a different clump, and the execution system schedules those clumps among the available threads.


Does that mean that the loop diagram of a parallel loop cannot have multiple clumps? What if each iteration has several large sections without data dependency and we use fewer parallel instances than we have CPU cores? 

Message 9 of 11

Altenbach, each loop instance begins its own clump, and the diagram of the loop instance can split into multiple clumps. It doesn't depend on how many parallel instances or cores you have, since the clumping is done at compile-time.

Message 10 of 11