Incomprehension about parallel for loops.

Naunaut · 11-27-2023

When I enable parallelism on a For Loop with the number of generated parallel instances set to 1, it runs faster than the same loop without parallelism. The two should behave identically 🤔, so I don't understand the difference.

santo_13 · 11-27-2023

Using a Parallel for loop incurs additional overhead in spawning and collecting the multiple threads. You will reap the benefits of the parallel execution only if the content of the loop takes longer to execute than the overhead.
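The trade-off described above can be put into numbers with a deliberately idealized cost model. This is only a sketch: it assumes a fixed overhead and perfectly divisible work, and the function names are hypothetical, not anything LabVIEW exposes.

```python
def parallel_time(work_s: float, overhead_s: float, workers: int) -> float:
    """Idealized run time: fixed spawn/collect overhead plus evenly divided work."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return overhead_s + work_s / workers


def worth_parallelizing(work_s: float, overhead_s: float, workers: int) -> bool:
    """True when the modeled parallel time beats running the work serially."""
    return parallel_time(work_s, overhead_s, workers) < work_s


if __name__ == "__main__":
    # One worker: the overhead is paid but nothing is gained.
    print(worth_parallelizing(2.0, 0.5, 1))
    # Four workers on a long-running body: the overhead is amortized.
    print(worth_parallelizing(2.0, 0.5, 4))
    # Four workers on a tiny body: the overhead dominates.
    print(worth_parallelizing(0.001, 0.5, 4))
```

Under this model a single parallel instance can never be faster than the serial loop, which is why a measured speed-up in that case points to a problem in the measurement rather than to a real gain.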

Note that you are assuming both parallel pieces under test start at the same time, but they don't!

Use a separate start time for each of the sections.
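In a text language the same idea might look like the sketch below: each section gets its own start time and the sections run one after the other. The two workloads and the helper `time_section` are hypothetical stand-ins, not the code from the posted VI.

```python
import time


def time_section(func, *args):
    """Run func once with its own start time; return (elapsed_seconds, result)."""
    start = time.perf_counter()          # start time belongs to this section only
    result = func(*args)
    elapsed = time.perf_counter() - start
    return elapsed, result


def sum_of_squares(n):
    # Hypothetical workload A.
    return sum(i * i for i in range(n))


def sum_of_cubes(n):
    # Hypothetical workload B.
    return sum(i * i * i for i in range(n))


if __name__ == "__main__":
    # The sections run strictly one after the other, so neither one
    # competes with the other for CPU time while it is being measured.
    t_a, r_a = time_section(sum_of_squares, 1_000_000)
    t_b, r_b = time_section(sum_of_cubes, 1_000_000)
    print(f"squares: {t_a:.4f} s, cubes: {t_b:.4f} s")
```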

Santhosh
Soliton Technologies


Kevin_Price · 11-27-2023

There may be several *additional* problems about using the posted code for benchmarking -- I'll let others with more in-depth knowledge discuss constant-folding, memory allocations, CPU caches, etc.

The first main thing I'll point out is that your 2 methods are set to run in parallel to one another, each interfering with the benchmarking you try to do on the other.

Fix that problem first, then try to generate random values to avoid constant-folding and caching optimizations, and (probably) also avoid array splitting and re-building. Try something more along the lines of repeatedly generating a large number of random values and applying an analysis function to them, and try the two methods one at a time using fresh random values.
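A rough text-language sketch of that recipe follows. It assumes two hypothetical analysis functions (`mean_loop` and `mean_builtin`) standing in for the two methods; data generation stays outside the timed region, every repeat uses fresh random values, and each result is consumed so the work cannot be skipped.

```python
import random
import statistics
import time


def make_data(n):
    """Fresh random input for every run, so nothing can be precomputed or reused."""
    return [random.random() for _ in range(n)]


def mean_loop(values):
    # Hypothetical method A: explicit accumulation.
    total = 0.0
    for v in values:
        total += v
    return total / len(values)


def mean_builtin(values):
    # Hypothetical method B: library routine.
    return statistics.fmean(values)


def benchmark(method, n_values=100_000, repeats=5):
    """Time one method alone; return (best_seconds, checksum_of_results)."""
    best = float("inf")
    checksum = 0.0
    for _ in range(repeats):
        data = make_data(n_values)       # generated outside the timed region
        start = time.perf_counter()
        result = method(data)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
        checksum += result               # use the output so the work is not dead code
    return best, checksum


if __name__ == "__main__":
    # One method at a time, never side by side.
    for method in (mean_loop, mean_builtin):
        best, checksum = benchmark(method)
        print(f"{method.__name__}: best {best:.5f} s (checksum {checksum:.3f})")
```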

-Kevin P


Yamaeda · 11-27-2023

You're trying to measure them in parallel. Don't. Use the time output of the 1st loop as the start time of the 2nd.

Qestit Systems

rolfk · 11-27-2023

Once more the proverb in German applies very accurately:

Wer misst, misst Mist!

Translated:

Who measures, measures bullshit!

It's not meant to say that you shouldn't measure things, but that if you measure things without VERY carefully thinking about what you are measuring and how you are measuring it, you had better not rely on your measurements.

This is true for physical measurements (for instance, you can't measure both voltage and current in a circuit at the same time with full accuracy, since depending on the order in which you connect the instruments, one of them causes an error that is measured by the other), but it applies maybe even more to performance measurements in computers, especially when multithreading and multitasking come into play!

I'm sure Christian Altenbach has a lot more to say about the specifics of the measurements in this specific example, if he sees this. 😀

Rolf Kalbermatter

altenbach · 11-27-2023

Your VI has debugging enabled, but enabling parallelism disables debugging for the parallel FOR loop code (unless you click that checkbox). You are comparing apples and oranges.

Besides, as has been mentioned, your benchmarking code is completely meaningless because both code sections run in parallel and step on each other's toes, timing-wise.

Then we have that crazy loop code and no outputs, so once you disable debugging (as happens in the parallel FOR loop!), the entire thing turns into nothing via dead code elimination. Also, all your inputs are constants, so constant folding might even play a role if you are not careful.
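Constant folding is not specific to LabVIEW. As a loose analogy only, the sketch below shows the CPython compiler doing the same kind of thing: an expression made entirely of literals is computed at compile time, while one that depends on an input is not. The function names are hypothetical, and the check relies on CPython implementation details (the `co_consts` tuple of a code object).

```python
def folded():
    # Every operand is a literal, so CPython can compute 86400 at compile time.
    return 60 * 60 * 24


def not_folded(seconds_per_minute):
    # The first operand is only known at run time, so the multiplications remain.
    return seconds_per_minute * 60 * 24


def has_constant(func, value):
    """Report whether `value` is stored as a constant in the compiled function."""
    return value in func.__code__.co_consts


if __name__ == "__main__":
    print(has_constant(folded, 86400))
    print(has_constant(not_folded, 86400))
```

A benchmark whose inputs are all constants risks timing the precomputed answer rather than the computation, which is the reason for feeding it values that are only known at run time.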

Please start over and tell us what you are actually trying to do. Obviously you want to optimize some algorithm and we can help with that. The current code is pure Rube Goldberg! It seems you want to turn an array into an array of clusters containing unevenly sized 1D array subsections. You don't need any shift registers or split functions for that!
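For readers without LabVIEW at hand, a text-language analogue of that transformation might look like the sketch below. It is not the diagram from this thread: the `Section` class stands in for a cluster, and `split_uneven` and the list of lengths are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Section:
    """Stand-in for a cluster holding one 1D array subsection."""
    values: List[float]


def split_uneven(data: Sequence[float], lengths: Sequence[int]) -> List[Section]:
    """Cut `data` into consecutive sections whose sizes are given by `lengths`."""
    sections = []
    start = 0
    for length in lengths:
        # Take the next `length` elements; no running remainder has to be carried.
        sections.append(Section(list(data[start:start + length])))
        start += length
    return sections


if __name__ == "__main__":
    for section in split_uneven([1, 2, 3, 4, 5, 6], [1, 2, 3]):
        print(section)
```

Only a running start index is carried from one section to the next; the remaining data is never split off and rebuilt.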

LabVIEW Champion.

Novgorod · 11-27-2023

The benchmark setup is extremely janky, as was pointed out already, but the result is quite interesting regardless. There's no CPU on the planet which could give you 4 orders of magnitude of multi-threading advantage, and that's also why it's irrelevant in this case that both tests run simultaneously.

If I had to guess, I'd say it's most likely a compiler/debugging thing. The output data is not used, so the code wouldn't be executed with debugging off. But debugging must be on, otherwise the non-parallel loop wouldn't take 6 seconds to run. Also, the outer loop repeats exactly the same action with exactly the same input and output values. Maybe for parallel loops (no matter how many threads) the compiler throws out identical iterations or removes debugging altogether. It's very easy to test by wiring the output to an indicator (outside the sequence) and using random data for each iteration.

EDIT: Dang, altenbach was 2 minutes faster.

altenbach · 11-27-2023

@altenbach wrote:

It seems you want to turn an array into an array of clusters containing unevenly sized 1D array subsections. You don't need any shift registers or split functions for that!

Quick draft for your inner code:

LabVIEW Champion.

Naunaut · 11-28-2023

Thank you for your reply. I chose to display everything in a single VI, but I can assure you that when I deactivate one or the other of the loops, the time is indeed different. So it was the debugging.
