02-05-2018 01:33 AM
Hi all,
How does For Loop iteration parallelism work?
Can anyone give a brief explanation of it?
02-05-2018 01:45 AM
The parallel instances terminal specifies the number of loop instances LabVIEW uses to run parallel loop iterations. If you leave the input of the parallel instances terminal unwired, LabVIEW automatically detects the number of logical processors in the machine and uses it as the default parallel instances terminal value.
You can use the input of the parallel instances terminal and the Number of generated parallel loop instances in the For Loop Iteration Parallelism dialog box to improve For Loop performance by oversubscribing or undersubscribing.
/Y
02-05-2018 05:18 AM
Do note that there is overhead associated with parallelizing the loop, so make sure you get some accurate benchmarks before committing to this feature. We have seen a lot of people enable it only to find that it actually decreased performance.
02-05-2018 06:23 AM
Also note that performance is not the only use case.
When you have an array of classes, you can execute their methods, doing whatever is in them, in parallel too. Loop parallelism is a very clean way to do that, and the alternatives (e.g. dynamically starting VIs with call and collect) are much more work.
LabVIEW Programming ((make LV more popular, read this)
02-05-2018 10:53 AM
A FOR loop typically repeats the same calculation N times, each time with different data. If none of the data depends on results of previous iterations, all iterations can be done at the same time (or in any order), each on a different processor core, and the results combined at the end.
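Since LabVIEW is graphical, here is a rough text-language analogy of the idea above, written in Python. It is only a sketch: `process` and `parallel_for` are hypothetical names, `instances` stands in for the value wired to P, and Python threads will not give the CPU speedup that LabVIEW's compiled parallel instances can. The point is only the structure: independent iterations, split over workers, results reassembled in input order.

```python
from concurrent.futures import ThreadPoolExecutor


def process(x):
    # Hypothetical per-iteration work: it depends only on its own input,
    # never on the result of another iteration.
    return x * x


def parallel_for(inputs, instances):
    # 'instances' plays the role of P (the number of parallel loop instances).
    with ThreadPoolExecutor(max_workers=instances) as pool:
        # map() returns results in input order, much like an
        # auto-indexed output tunnel on the loop.
        return list(pool.map(process, inputs))


data = list(range(10))
sequential = [process(x) for x in data]
parallel = parallel_for(data, instances=4)
```

Because no iteration reads another iteration's result, `parallel` and `sequential` hold the same values in the same order.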
If the code depends on previous iterations (e.g. contains a shift register that cannot be unfolded) or contains a termination terminal, parallelism is not allowed. There is a tool to tell you which loops are or can be parallelized.
If the FOR loop contains many expensive subVIs that are not reentrant, parallelism does not give you any benefit, because the various iterations can only use one instance of these subVIs, so all other scheduled parallel iterations need to wait for these resources most of the time. While most of the heavy lifting should be done with reentrant subVIs, you might need a critical section that is non-reentrant. That is no problem as long as it is cheap.
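As a loose analogy in Python (not how LabVIEW implements it), a non-reentrant subVI behaves like a function guarded by a single lock: however many workers you start, only one can be inside it at a time. The class `NonReentrantStep` and its counters are hypothetical names made up for this sketch.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class NonReentrantStep:
    """Stand-in for a non-reentrant subVI: one shared instance, one caller at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0      # callers currently inside the step
        self.max_active = 0   # highest number of simultaneous callers observed

    def call(self, x):
        # Every parallel worker has to queue up for the same lock here.
        with self._lock:
            self._active += 1
            self.max_active = max(self.max_active, self._active)
            result = x + 1    # the (ideally cheap) critical work
            self._active -= 1
        return result


step = NonReentrantStep()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(step.call, range(100)))
```

Even with 8 workers, `step.max_active` never exceeds 1. If the guarded part is expensive, the workers spend most of their time waiting, which is the situation described above.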
There is certainly an overhead with parallelism. The input data needs to be split up, all loop code is duplicated P times in memory, and the data needs to be reassembled at the end. You need to carefully benchmark to see if the extra effort is worth it. If done wrong, your code slows down. If done right, you can get a speed improvement proportional to the number of processor cores (example).
By default, you can have up to 64 parallel instances, but this limit can be raised to 256.
Recent processor developments can get close to the 256 limit, but you probably don't have hundreds of thousands of dollars to buy a system with 8 x 28 cores (~$13,000 per CPU) for a total of 224 cores. Note that these are hyperthreaded cores, so we actually have 448 virtual cores. 😮
02-06-2018 04:41 AM
To me it's weird that this parallelism is so tightly tied to the CPU. After all, we can start thousands of parallel VIs dynamically. Automatic parallelism like that is one of LV's favourite features. When you want absolute top notch performance, the CPU link makes sense. However, shouldn't there be an option to tell the For Loop to simply start the iterations in parallel? Or better: shouldn't LabVIEW be able to detect that?
Here (Support-more-parallel-instances-in-the-parallel-FOR-loop) you describe the use case of simply starting things in parallel. OO is perfect for that, you can simply call object methods and they magically run in parallel. However, the 64 is an artificial limit in that case. It simply does not make sense (from a user PoV). Why is it there?
One of the replies suggests that the limit is needed for nested loops. So I figured an inner loop at 64 parallel instances nested in one with 2 would give me 128 parallel processes. However, LabVIEW (2013) has problems compiling that. It's a memory bomb. Even when the values of N and P are given dynamically, LabVIEW hangs the entire system (mouse, sound etc.) and keeps allocating memory, reaching at least gigabytes (I did not wait for it to finish). At some point Task Manager gets a share of the CPU and you can kill LV...
There is absolutely no reason why LV can't perform the loop more than 256 times in parallel. As a proof, you can copy the For Loop above 5 times... Execution time is still 5 seconds (even with the unchanged 64 limit).
It's really sad LabVIEW doesn't simply deal with it. I'll deal with the "overhead"...
02-06-2018 08:07 AM
While I would have to think far more than I am willing to think this morning to offer an insightful addition to this discussion, I can offer two tidbits.
With "hyper-threading", CPUs are already executing instructions before the program counter gets to them. The wonders of hyper-threading select which result to keep.
One of my rookies was teaching a LV basics class and came to me during a break and asked how a For Loop could sum values from an input array in a shift register without the order affecting the result. He wondered why it worked. This gave me the opportunity (that is rare in most lives) to respond:
"Addition is commutative." and then smiled.
Of course the family did not think that story was nearly as funny as I did, and there were grumbles of Grampa doing math jokes again.
Ben
02-07-2018 02:45 AM
I just don't get the limit at all... The CPU nr. of cores just doesn't make sense. Other threads are already occupying an arbitrary nr. of cores\threads\whatever. Not saying there's not a good reason, I just don't see one.
All workload is divided into chunks. Max. 64. But when I copy a 64-instance loop 10 times, I can execute 640 chunks in parallel. So why the limit?
Have to talk to someone at NI about this... Maybe next week at the CLA summit. If I remember to do it (there will be self inflicted beer trauma and not enough sleep), I'll post my findings.
BTW1. The configuration UI limits the nr. to 64, but you can wire anything to the "P" on the diagram. However, only 64 iterations will be executed in parallel. So wiring 65 in my little example will take 10 seconds: 5 seconds for the first 64, 5 seconds for the 65th.
BTW2. Posted a link to this thread on the idea exchange idea.
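The arithmetic in BTW1 can be written as a small Python model. This is a simplification, not how LabVIEW schedules iterations: it assumes every iteration takes the same time and that iterations run in batches of at most the effective instance count. `estimated_seconds` and its parameter names are made up for this sketch; the default of 64 is the limit discussed in this thread.

```python
import math


def estimated_seconds(n_iterations, requested_p, seconds_per_iteration,
                      max_instances=64):
    # Simplified model: the effective parallelism is capped by the limit
    # (and by the number of iterations there actually are).
    if n_iterations <= 0:
        return 0
    effective_p = max(1, min(requested_p, max_instances, n_iterations))
    # Iterations are assumed to run in equal-length batches of effective_p.
    batches = math.ceil(n_iterations / effective_p)
    return batches * seconds_per_iteration


one_batch = estimated_seconds(64, 64, 5)     # 64 iterations fit in one batch
two_batches = estimated_seconds(65, 65, 5)   # P=65 is capped at 64, so 2 batches
```

With 5-second iterations this gives 5 seconds for N=64 and 10 seconds for N=65, matching what I measured.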
02-28-2018 09:03 AM
I didn't forget!
It came up during the CLA summit. The "compiler team" also did not have an answer ready. I forgot about the question, but got an answer by email just now...
The For Loop actually creates parallel instances, and because this consumes memory, a limit seemed reasonable\desirable. 64 is just an arbitrary "not too large, not too small" limit. It's not related to the nr. of cores, threads or anything else...
Adding "ParallelLoop.MaxNumLoopInstances=10000000" to LabVIEW.ini virtually removes any limit.
02-28-2018 09:48 AM
Sadly, ParallelLoop.MaxNumLoopInstances is read once and then stored internally. Using LV Config Write Numeric (I32).vi changes the ini file, but the old value is still used in For Loops, so a restart of LabVIEW is needed.