How does iteration parallelism work?

Hi all,

How does iteration parallelism work?

Can anyone give a brief explanation of it?


Message 1 of 16

The parallel instances terminal specifies the number of loop instances LabVIEW uses to run parallel loop iterations. If you leave the input of the parallel instances terminal unwired, LabVIEW automatically detects the number of logical processors in the machine and uses it as the default parallel instances value.

You can use the input of the parallel instances terminal and the Number of generated parallel loop instances option in the For Loop Iteration Parallelism dialog box to improve For Loop performance by oversubscribing (using more parallel instances than logical processors) or undersubscribing (using fewer).
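As a rough analogy outside LabVIEW, the sketch below uses Python's `concurrent.futures` to mimic how the number of parallel instances is chosen: an explicit value if one is given, otherwise the logical processor count. The names `parallel_for` and `square` are hypothetical, and this is not how LabVIEW schedules loops internally. Threads are used only to show the choice of P; CPU-bound pure-Python code would need `ProcessPoolExecutor` to see a real speedup in CPython.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def parallel_for(body, n, instances=None):
    """Run body(i) for i in range(n) using `instances` worker threads.

    Loosely mimics an unwired P terminal: if `instances` is None, use the
    number of logical processors reported by the OS.
    """
    if instances is None:
        instances = os.cpu_count() or 1  # cpu_count() can return None
    with ThreadPoolExecutor(max_workers=instances) as pool:
        # map() returns results in iteration order, like an indexing output tunnel.
        return list(pool.map(body, range(n)))


def square(i):
    return i * i


logical = os.cpu_count() or 1
default_run = parallel_for(square, 10)                                      # P = logical processors
oversubscribed = parallel_for(square, 10, instances=2 * logical)            # more instances than CPUs
undersubscribed = parallel_for(square, 10, instances=max(1, logical // 2))  # fewer instances than CPUs
print(default_run)
```

Whatever P is chosen, the results are identical; only the scheduling (and therefore the run time) changes, which is why benchmarking different values is worthwhile.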


G# - Award winning reference based OOP for LV, for free! ADDQ VIPM Now on GitHub
"Only dead fish swim downstream" - "My life for Kudos!" - "Dumb people repeat old mistakes - smart ones create new ones."
Message 2 of 16

Note that parallelizing the loop has overhead, so make sure you get accurate benchmarks before committing to this feature. We have seen many people use it only to find that it actually decreased performance.

There are only two ways to tell somebody thanks: Kudos and Marked Solutions
Unofficial Forum Rules and Guidelines
"Not that we are sufficient in ourselves to claim anything as coming from us, but our sufficiency is from God" - 2 Corinthians 3:5
Message 3 of 16

Also note that performance is not the only use case.


When you have an array of class objects, you can also execute their methods in parallel, whatever those methods do. Loop parallelism is a very clean way to do that, and the alternatives (e.g. dynamically starting VIs with Call and Collect) are much more work.
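A minimal Python sketch of the same idea, assuming a hypothetical parent class `Instrument` with two hypothetical children: each object in the array runs its own override of `run()`, and the calls are dispatched concurrently. Threads are a reasonable fit here because this kind of per-object work is often I/O-bound.

```python
from concurrent.futures import ThreadPoolExecutor


class Instrument:
    """Hypothetical parent class, standing in for a LabVIEW parent class."""

    def __init__(self, name):
        self.name = name

    def run(self):
        raise NotImplementedError


class Scope(Instrument):
    def run(self):
        return f"{self.name}: captured waveform"


class Dmm(Instrument):
    def run(self):
        return f"{self.name}: read voltage"


devices = [Scope("scope1"), Dmm("dmm1"), Scope("scope2")]

with ThreadPoolExecutor(max_workers=len(devices)) as pool:
    # The lambda calls d.run() so each object's own override is used
    # (dynamic dispatch); pool.map keeps results in array order.
    results = list(pool.map(lambda d: d.run(), devices))

print(results)
```

Note that mapping `Instrument.run` directly would call the parent's method instead of each child's override, so the dispatch has to go through the object.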

Message 4 of 16

A FOR loop typically repeats the same calculation N times, each time with different data. If none of the data depends on the results of previous iterations, all iterations can be done at the same time (or in random order), each on a different processor core, and the results combined at the end.
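To make the "any order, combine at the end" point concrete, here is a small Python sketch (the loop body `body` is a hypothetical stand-in). Iterations are started in a shuffled order and finish in whatever order the threads complete, yet the assembled result matches a plain sequential loop because no iteration reads another iteration's output.

```python
import random
from concurrent.futures import ThreadPoolExecutor, as_completed


def body(x):
    # The loop body depends only on its own input element,
    # never on the result of another iteration.
    return 3 * x + 1


data = list(range(8))
results = [None] * len(data)

order = list(range(len(data)))
random.shuffle(order)  # start the iterations in an arbitrary order

with ThreadPoolExecutor() as pool:
    futures = {pool.submit(body, data[i]): i for i in order}
    for fut in as_completed(futures):         # results arrive in any order...
        results[futures[fut]] = fut.result()  # ...and are put back by index

print(results)  # [1, 4, 7, 10, 13, 16, 19, 22]
```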


If the code depends on previous iterations (e.g. contains a shift register that cannot be unfolded) or contains a termination terminal, parallelism is not allowed. There is a tool to tell you which loops are or can be parallelized.
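A shift register carries a value from one iteration to the next, which is exactly the dependency that breaks independent execution. The Python sketch below (function names are hypothetical) computes a running sum sequentially, then naively splits the iterations into independent chunks that each start their "shift register" at 0. The carried value is lost at each chunk boundary, so the answer is wrong.

```python
def running_sum(values, start=0):
    # Shift-register style: each iteration needs the previous iteration's total.
    out, acc = [], start
    for v in values:
        acc += v
        out.append(acc)
    return out


def naive_chunked_running_sum(values, chunks):
    # What independent chunks would compute: each chunk restarts from 0,
    # so the value carried across the chunk boundary is lost.
    size = -(-len(values) // chunks)  # ceiling division
    out = []
    for c in range(0, len(values), size):
        out.extend(running_sum(values[c:c + size]))
    return out


data = [1, 2, 3, 4, 5, 6]
print(running_sum(data))                   # [1, 3, 6, 10, 15, 21]
print(naive_chunked_running_sum(data, 2))  # [1, 3, 6, 4, 9, 15]
```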


If the FOR loop contains many expensive subVIs that are not reentrant, parallelism does not give you any benefit: all iterations share a single instance of each such subVI, so the other scheduled parallel iterations spend most of their time waiting for it. Most of the heavy lifting should be done in reentrant subVIs, but you might need a critical section that is non-reentrant. That is no problem as long as it is cheap.
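One way to picture this, as a hypothetical Python model rather than LabVIEW's actual mechanism: treat the single shared instance of a non-reentrant subVI as a lock that every parallel iteration must acquire, and a reentrant subVI as code each caller runs on its own copy. With eight workers, the "non-reentrant" version still takes roughly eight times the per-call cost because the calls queue up.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical model: one shared instance behaves like a lock.
shared_instance = threading.Lock()


def non_reentrant_subvi(x):
    with shared_instance:  # other iterations queue here
        time.sleep(0.01)   # stand-in for expensive work
    return x * 2


def reentrant_subvi(x):
    time.sleep(0.01)       # each caller has its own copy, so no queueing
    return x * 2


def timed(fn, n=8, workers=8):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        start = time.perf_counter()
        results = list(pool.map(fn, range(n)))
        return results, time.perf_counter() - start


serial_results, serial_time = timed(non_reentrant_subvi)
parallel_results, parallel_time = timed(reentrant_subvi)
print(f"non-reentrant: {serial_time:.3f} s, reentrant: {parallel_time:.3f} s")
```

Exact timings depend on the machine and OS timer resolution; the point is the ratio, not the absolute numbers.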


There is certainly overhead with parallelism. The input data needs to be split up, all loop code is duplicated P times in memory, and the results need to be reassembled at the end. You need to benchmark carefully to see if the extra effort is worth it. Done wrong, your code slows down. Done right, you can get a speed improvement proportional to the number of processor cores.
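The split / run / reassemble steps can be sketched in Python as follows. This is a simplified model under stated assumptions (contiguous, nearly equal chunks; one worker per chunk; hypothetical names `split`, `run_chunk`, `chunked_parallel_for`), not LabVIEW's actual partitioning scheduler. Each commented step is work that a plain sequential loop never does, which is where the overhead comes from.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, p):
    # Divide the input into p contiguous chunks whose sizes differ by at most 1.
    q, r = divmod(len(data), p)
    chunks, start = [], 0
    for k in range(p):
        end = start + q + (1 if k < r else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def run_chunk(chunk):
    # Each instance runs its own copy of the loop body over its chunk.
    return [x * x for x in chunk]


def chunked_parallel_for(data, p):
    chunks = split(data, p)                           # overhead 1: splitting the input
    with ThreadPoolExecutor(max_workers=p) as pool:
        partials = list(pool.map(run_chunk, chunks))  # overhead 2: starting P instances
    return [y for part in partials for y in part]     # overhead 3: reassembling results


print(split(list(range(10)), 3))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(chunked_parallel_for(list(range(10)), 3))
```

If the loop body is cheap, these three steps can easily cost more than the work itself, which is why the benchmark matters.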


By default, you can have up to 64 parallel instances, but this limit can be raised to 256.


Recent processors can get close to the 256 limit, but you probably don't have the budget for a system with 8 x 28 cores (~$13,000 per CPU, so over $100,000 for the CPUs alone), for a total of 224 cores. Note that these cores are hyperthreaded, so we actually have 448 virtual cores. 😮

Message 5 of 16


While I would have to think far more than I am willing to think this morning to offer an insightful addition to this discussion, I can offer two tidbits.


With speculative execution, CPUs are already executing instructions before the program counter gets to them. The hardware then selects which results to keep.


One of my rookies was teaching a LV Basics class and came to me during a break asking how a For Loop could sum the values of an input array in a shift register without the order affecting the result. He wondered, "Why does it work?" This gave me an opportunity, rare in most lives, to respond:

"Addition is commutative." And then I smiled.

Of course, the family did not think that story was nearly as funny as I did, and there were grumbles of Grampa doing math jokes again.

Retired Senior Automation Systems Architect with Data Science Automation LabVIEW Champion Knight of NI and Prepper LinkedIn Profile YouTube Channel
Message 7 of 16

I just don't get the limit at all... Tying it to the number of CPU cores just doesn't make sense: other threads are already occupying an arbitrary number of cores/threads/whatever. I'm not saying there isn't a good reason; I just don't see one.


All the work is divided into chunks, at most 64. But when I copy a 64-instance loop 10 times, I can execute 640 chunks in parallel. So why the limit?


I'll have to talk to someone at NI about this, maybe next week at the CLA Summit. If I remember to do it (there will be self-inflicted beer trauma and not enough sleep), I'll post my findings.


BTW1: The configuration dialog limits the number to 64, but you can wire any value to the "P" terminal on the diagram. However, only 64 iterations will execute in parallel. So wiring 65 in my little example takes 10 seconds: 5 seconds for the first 64 iterations and another 5 seconds for the 65th.
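The 10-second figure follows from simple arithmetic: iterations run in "waves" of at most 64, and each wave takes as long as one iteration. The helper below is a hypothetical back-of-the-envelope model that assumes every iteration takes exactly the same time, enough cores are available for every instance, and scheduling overhead is zero.

```python
import math


def estimated_seconds(iterations, instances, seconds_per_iteration):
    # Idealized model: at most `instances` iterations run at once,
    # and each "wave" lasts as long as a single iteration.
    waves = math.ceil(iterations / instances)
    return waves * seconds_per_iteration


print(estimated_seconds(64, 64, 5))  # 5
print(estimated_seconds(65, 64, 5))  # 10
```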


BTW2: I posted a link to this thread on the related Idea Exchange idea.

Message 8 of 16

I didn't forget!


It came up during the CLA summit. The "compiler team" also did not have an answer ready. I forgot about the question, but got an answer by email just now...


The For Loop actually creates parallel instances of its code, and because this consumes memory, a limit seemed reasonable/desirable. 64 is just an arbitrary "not too large, not too small" limit. It isn't related to the number of cores, threads, or anything else...


Adding "ParallelLoop.MaxNumLoopInstances=10000000" to LabVIEW.ini virtually removes any limit.

Message 9 of 16

Sadly, ParallelLoop.MaxNumLoopInstances is read once and then stored internally. Using LV Config Write Numeric (I32).vi changes the ini file, but For Loops keep using the old value, so a restart of LabVIEW is needed.

Message 10 of 16