11-27-2023 07:58 AM - edited 11-27-2023 08:11 AM
When I enable parallelism on a FOR loop with the number of generated parallel loop instances set to 1, it runs faster than without parallelism. Isn't it supposed to be the same thing? 🤔 I don't understand.
11-27-2023 08:50 AM
Using a parallel FOR loop incurs additional overhead for spawning the threads and collecting their results. You will only reap the benefits of parallel execution if the loop body takes longer to execute than that overhead.
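To make the overhead argument concrete outside LabVIEW, here is a minimal Python sketch. It is an analogy, not how LabVIEW implements parallel FOR loops: it runs the same trivially cheap body once in a plain loop and once through a thread pool with a single worker. The function names and the iteration count are illustrative assumptions. With a body this cheap, the pooled version typically comes out slower, because every iteration pays for dispatch and result collection.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def tiny_work(i):
    # Deliberately cheap loop body: much less work than dispatching it to a thread.
    return i * i


def run_serial(n):
    # Plain loop: no thread management at all.
    return [tiny_work(i) for i in range(n)]


def run_pooled(n, workers=1):
    # Even with one worker, every iteration is submitted to the pool
    # and its result collected afterwards; that bookkeeping is the overhead.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(tiny_work, range(n)))


def time_it(fn, *args):
    # Wall-clock duration of a single call, plus its result.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    n = 20_000  # illustrative iteration count
    serial_result, serial_s = time_it(run_serial, n)
    pooled_result, pooled_s = time_it(run_pooled, n)
    assert serial_result == pooled_result  # same work, different scheduling
    print(f"serial: {serial_s * 1e3:.1f} ms, pooled (1 worker): {pooled_s * 1e3:.1f} ms")
```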
Note that you are assuming both pieces of code under test start at the same time, but they don't!
Use a separate start time for each section.
11-27-2023 08:54 AM
There may be several *additional* problems with using the posted code for benchmarking. I'll let others with more in-depth knowledge discuss constant folding, memory allocations, CPU caches, etc.
The main thing I'll point out is that your 2 methods are set to run in parallel with one another, so each one interferes with the benchmarking you are trying to do on the other.
Fix that problem first. Then generate random values to avoid constant-folding and caching optimizations, and (probably) also avoid splitting and rebuilding arrays. Try something more along the lines of repeatedly generating a large number of random values and applying an analysis function to them, and test the two methods one at a time, each with fresh random values.
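A sketch of that benchmarking pattern in Python, assuming two hypothetical analysis methods (mean and standard deviation, computed two ways) as the code under test. Fresh random data is generated outside the timed region on every repetition, only the method call is timed, each method is benchmarked on its own, and the result is folded into a checksum so it is actually consumed. Python itself does little constant folding or dead-code elimination of function calls, so this shows the structure of the test rather than reproducing LabVIEW's compiler behavior.

```python
import math
import random
import statistics
import time


def analyze_stats(values):
    # Hypothetical method A: library mean and population standard deviation.
    return statistics.fmean(values), statistics.pstdev(values)


def analyze_manual(values):
    # Hypothetical method B: the same quantities computed by hand.
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)


def benchmark(method, n_values=50_000, repeats=5, seed=None):
    rng = random.Random(seed)
    timings = []
    checksum = 0.0
    for _ in range(repeats):
        # Fresh random input on every repetition, generated OUTSIDE the timed region.
        data = [rng.random() for _ in range(n_values)]
        start = time.perf_counter()
        mean, std = method(data)
        timings.append(time.perf_counter() - start)
        # Consume the result so the work under test is never dead code.
        checksum += mean + std
    # Best-of-N is less sensitive to interference from other processes.
    return min(timings), checksum


if __name__ == "__main__":
    # Benchmark one method at a time, never both concurrently.
    for name, method in (("statistics", analyze_stats), ("manual", analyze_manual)):
        best, checksum = benchmark(method, seed=42)
        print(f"{name}: best of 5 = {best * 1e3:.2f} ms (checksum {checksum:.6f})")
```

Passing the same `seed` to both runs gives both methods identical inputs, so their checksums should agree, which doubles as a sanity check that the two methods compute the same thing.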
-Kevin P
11-27-2023 09:42 AM
You're trying to measure them in parallel. Don't. Use the time output of the 1st loop as the start time of the 2nd.
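In a text language, statement order already serializes the sections; in LabVIEW, it is the wire from the first timestamp that forces that order. A minimal Python sketch of the chained-timestamp idea, with two hypothetical placeholder sections: the timestamp taken at the end of section A is reused as the start of section B, so each section gets its own interval and the two never overlap.

```python
import time


def section_a(n):
    # Placeholder for the first piece of code under test.
    return sum(i * i for i in range(n))


def section_b(n):
    # Placeholder for the second piece of code under test.
    return sum(3 * i for i in range(n))


def timed_sequence(n):
    t0 = time.perf_counter()
    a = section_a(n)
    t1 = time.perf_counter()  # end of A is reused as the start of B
    b = section_b(n)
    t2 = time.perf_counter()
    # Each section gets its own, non-overlapping interval.
    return {"a": t1 - t0, "b": t2 - t1}, (a, b)


if __name__ == "__main__":
    durations, results = timed_sequence(1_000_000)
    print(durations, results)
```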
11-27-2023 10:49 AM - edited 11-27-2023 10:51 AM
Once more, the German proverb applies very accurately:
Wer misst, misst Mist!
Translated:
Who measures, measures bullshit!
It's not meant to say that you shouldn't measure things, but that if you measure things without VERY carefully thinking about what you are measuring and how you are measuring it, you had better not rely on your measurements.
This is true for physical measurements (for instance, you can't measure both voltage and current in a circuit at the same time with full accuracy, since, depending on the order in which you connect the instruments, one of them introduces an error that the other one measures), but it applies maybe even more to performance measurements on computers, especially when multithreading and multitasking come into play!
I'm sure Christian Altenbach has a lot more to say about the specifics of the measurements in this specific example, if he sees this. 😀
11-27-2023 11:26 AM - edited 11-27-2023 11:35 AM
Your VI has debugging enabled, but enabling parallelism disables debugging for the parallel FOR loop code (unless you check the debugging checkbox in the loop's parallelism dialog). You are comparing apples and oranges.
Besides, as has been mentioned, your benchmarking code is completely meaningless because both code sections run in parallel and step on each other's toes, timing-wise.
Then we have that crazy loop code with no outputs, so once debugging is disabled (as happens in the parallel FOR loop!), the entire thing turns into nothing via dead code elimination. Also, all your inputs are constants, so constant folding might play a role too if you are not careful.
Please start over and tell us what you are actually trying to do. Obviously you want to optimize some algorithm, and we can help with that. The current code is pure Rube Goldberg! It seems you want to turn an array into an array of clusters, each containing an unevenly sized 1D array subsection. You don't need any shift registers or split functions for that!
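A minimal Python sketch of that idea, assuming the subsection lengths are already known as a list of sizes (how the original VI determines them isn't shown in the thread): compute the running offsets once and slice, with no shift registers or repeated splitting. A Python list of lists stands in for LabVIEW's array of clusters of 1D arrays, and `split_uneven` is a hypothetical name.

```python
from itertools import accumulate


def split_uneven(values, sizes):
    """Split `values` into consecutive subsections of the given lengths.

    Hypothetical helper: `sizes` is assumed to be known up front.
    """
    if any(s < 0 for s in sizes):
        raise ValueError("sizes must be non-negative")
    # Running offsets, e.g. sizes [1, 2, 3] -> offsets [0, 1, 3, 6].
    offsets = list(accumulate(sizes, initial=0))
    if offsets[-1] > len(values):
        raise ValueError("sizes add up to more than len(values)")
    # One slice per subsection: no shift register, no repeated splitting.
    return [values[start:end] for start, end in zip(offsets, offsets[1:])]


if __name__ == "__main__":
    print(split_uneven([10, 20, 30, 40, 50, 60], [1, 2, 3]))
    # prints [[10], [20, 30], [40, 50, 60]]
```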
11-27-2023 11:37 AM - edited 11-27-2023 11:38 AM
The benchmark setup is extremely janky, as was pointed out already, but the result is quite interesting regardless. No CPU on the planet could give you a 4-orders-of-magnitude speedup from multithreading, which is also why it's irrelevant in this case that both tests run simultaneously.
If I had to guess, I'd say it's most likely a compiler/debugging thing. The output data is not used, so the code wouldn't be executed with debugging off. But debugging must be on, otherwise the non-parallel loop wouldn't take 6 seconds to run. Also, the outer loop repeats exactly the same action with exactly the same input and output values. Maybe for parallel loops (no matter how many threads) the compiler throws out identical iterations or removes debugging altogether. It's very easy to test by wiring the output to an indicator (outside the sequence) and using random data for each iteration.
EDIT: Dang, altenbach was 2 minutes faster.
11-27-2023 12:00 PM
@altenbach wrote:
It seems you want to turn an array into an array of clusters containing unevenly sized 1D array subsections. You don't need any shift registers or split functions for that!
Quick draft for your inner code:
11-28-2023 01:52 AM
Thank you for your reply. I chose to put everything in a single VI, but I can assure you that when I deactivate one or the other of the loops, the time is indeed different. So it was the debugging.