LabVIEW


Dataflow problems with FOR loop parallelism / LabVIEW 2009

Hi

 

I'm having a bit of bother with FOR loop parallelism - in particular, the dataflow model seems to break in a subtle way if parallelism is enabled. The loop outputs "appear" as soon as the loop starts, even though it hasn't done what it needs to do.

 

Please see attached VI. Originally I was doing some FFTs but I've simplified the question to this.

 

The top loop is simple enough and does what I'd expect.

The middle loop shows what I struggled with for ages. Why is the time difference zero?

The bottom loop shows one way to fix it - but I don't understand why it fixes it. Surely the middle loop shouldn't "complete" until all the iterations are complete?

 

Many thanks

 

John

 

Message 1 of 8

Hi John,

 

You should use proper structures that force dataflow to get a correct time measurement; see the attachment...

 

You have to ensure the correct sequence of program steps, and this is the one good reason to use a sequence structure. Measure the time before the loop starts and again after the loop has finished; don't do this in parallel with the loop. To avoid constant folding, don't use constants in the loop; use a RandomNumber instead...
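As a rough textual analogy (LabVIEW diagrams are graphical, so this is Python rather than anything from the attached VI), the ordering described above could be sketched like this. The names `timed` and `workload` are hypothetical, and the random inputs simply stand in for the RandomNumber suggestion:

```python
import random
import time


def timed(work):
    """Run work() with the clock read strictly before and strictly after it."""
    start = time.perf_counter()            # first clock read, before the code under test
    result = work()                        # the code under test
    elapsed = time.perf_counter() - start  # second clock read, only after work() has returned
    return result, elapsed


def workload(n=100_000):
    # Random inputs cannot be precomputed, so the loop body cannot be
    # folded away into a constant ahead of time.
    return sum(random.random() ** 0.5 for _ in range(n))


result, elapsed = timed(workload)
print(f"result={result:.3f}, elapsed={elapsed * 1000:.2f} ms")
```

In a text language the statement order gives this sequencing for free; on a dataflow diagram the same ordering has to be forced explicitly, which is the role of the sequence structure here.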

Best regards,
GerdW


using LV2016/2019/2021 on Win10/11+cRIO, TestStand2016/2019
Message 2 of 8

Hi

 

Thanks for the reply. Really I'm only using the timings to indicate execution order, not for precise timings per se.

 

Your fix is equivalent to the bottom of the three loops in my example, and yes it works. However, what I'm getting at is this: why doesn't the array passed in my middle example prevent the sequence executing, and thus the time value being read, until the loop is complete? In particular, if loop parallelism is disabled, it then does work as I expect.

 

I suspect the answer is some subtle optimisation that goes on. Attached is another example. In my view, the two time values should be the same (OK, give or take a millisecond), as the flat sequences shouldn't be able to execute until the associated FOR loops have finished and passed their output arrays on. However, in the bottom case, the sequence runs right away. In the top case, adding the extra indicator has forced LabVIEW to wait for the loop to complete before running the sequence, which is what wiring the array to the border of the sequence should have forced it to do anyway, in my opinion.

 

The point is, changing from a simple FOR loop to a parallel one has changed how dataflow works, which seems a bit wrong to me.

 

cheers

 

John

Message 3 of 8

Hi John,

 

it seems to be one of the wonders of compiler optimization. Making one wire a little bit longer helps to exorcise the ghosts ;)

 

And why do you convert all the timestamps to I32?

Best regards,
GerdW


Message 4 of 8

Hi

 

Maybe it is one of the "wonders" of optimisation. It means one has to be careful when using the parallelism feature.

To be fair the Profile->Find Parallelizable Loops dialog does come up with:

"This For Loop may or may not be safe to parallelize. Warning(s):
- A node in the For Loop may have side effects."

though it doesn't specify which node, or what side effects. Removing the delay (which I only put in to highlight execution order anyway) removes the warning.

 

Why did I cast the timer into I32 instead of U32? I had some problem in the past, but I forget what right now.
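For anyone weighing I32 against U32 here, a small Python sketch of the arithmetic involved. It assumes the timer is a free-running 32-bit millisecond counter that wraps around at 2^32; the helper names `elapsed_ms` and `as_i32` are hypothetical. It does not try to reconstruct the original problem, only to show how the two interpretations differ:

```python
MODULUS = 2 ** 32  # a 32-bit counter wraps after this many ticks


def elapsed_ms(start_tick, end_tick):
    """Elapsed ticks between two readings of a wrapping 32-bit counter."""
    # Subtracting modulo 2**32 stays correct even if the counter wrapped
    # once between the two readings.
    return (end_tick - start_tick) % MODULUS


def as_i32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit value."""
    value %= MODULUS
    return value - MODULUS if value >= 2 ** 31 else value


start = 2 ** 32 - 100  # reading taken 100 ticks before the counter wraps
end = 50               # reading taken 50 ticks after it wrapped

print(elapsed_ms(start, end))  # 150
print(as_i32(start))           # -100
```

The difference comes out right under modular subtraction either way; what changes with a signed cast is how the individual readings look, since large unsigned values show up as negative numbers.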

 

Ho hum. I'll be more wary of the handy parallel FOR loop feature in future - maybe just do it the hard way!

 

John

Message 5 of 8

In case anybody is reading: I finally got around to sorting this out.

Apparently it's a known bug in LabVIEW 2009, fixed in the next version.

 

John

 

Message 6 of 8

 


@camtest wrote:

 

Your fix is equivalent to the bottom of the three loops in my example


No, it's not. When benchmarking, you must wrap the initial timer in a sequence structure that executes before the code you want to benchmark. In your examples, the initial timer essentially runs in parallel with the process under test, so it could be read before that code executes, or even after.

 

Message 7 of 8

Back to trying to optimise parallel for loops, and now with LV2010. It's a bit strange. See attached VIs. They're not useful as such; just a much reduced version of what I'm trying to do.

The "outer for parallelism" just runs the parallelism VI and tells how long it took. This is the VI I run each time.

 

On my machine, if I open both and also both their diagrams, and then run the "outer", it reports around 530ms.

The times in "parallelism" itself do what I'd expect: i.e. time 1 + time 2 = time 3; also time 3 = that reported by the outer.

 

Now configure iteration parallelism in the "parallelism" VI, with one instance and one worker. Run the outer VI - it's about the same (good!).

 

Two instances and one worker - run the outer VI - a bit slower? (say, 610ms)

More interestingly, time 1 + time 2 != time 3? If time 1 and time 2 are reported, what more is there to do in the middle frame before moving on to the next frame in the sequence?

 

Two instances and two workers - slower again (800ms) and it gets slower the more times I run it (up to 1600ms after ten runs). Sometimes it slows down quite quickly, other times it runs fine for a few runs.

 

Turn off parallelism, run the outer VI once more. Now it takes 1300ms, but at least the times add up once more.

 

Save to disk, close LabVIEW (which takes an age - maybe 15 seconds). Reload, rerun, and we're back to 600ms again.

 

The point of all this? I'm trying to optimise some number crunching. However I'm struggling to separate the difference I may make by changing the code, and the difference I get from running the code several times. Any suggestions how to get some more repeatable results?
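One common approach to the repeatability question, sketched in Python as a textual stand-in (it is not LabVIEW code and makes no claim about what causes the slowdown described above): discard a few warm-up runs, time many runs, and compare the minimum and median rather than any single run. The names `benchmark` and `crunch`, and the warm-up and repeat counts, are hypothetical choices:

```python
import statistics
import time


def benchmark(work, warmup=3, repeats=15):
    """Time work() several times and summarise the spread of the samples."""
    for _ in range(warmup):
        work()  # warm-up runs are executed but not recorded

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        work()
        samples.append(time.perf_counter() - start)

    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
        "samples": samples,
    }


def crunch(n=50_000):
    # Placeholder number crunching; substitute the code being optimised.
    return sum(i * i for i in range(n))


stats = benchmark(crunch)
print(f"min={stats['min'] * 1000:.2f} ms, "
      f"median={stats['median'] * 1000:.2f} ms, "
      f"max={stats['max'] * 1000:.2f} ms")
```

If the median drifts upward from one batch to the next while the code is unchanged, the variation is coming from the environment rather than from the edit being tested.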

 

thanks

John

 

Machine is a dual-core Centrino Dell laptop: 4 GB RAM (3 GB accessible), Win7 32-bit, LV2010.

I've seen similar behaviour on another (desktop) machine: Core i7, 4 GB, Win7 32-bit, LV2010.

 

Message 8 of 8