LabVIEW


parallel for-loop configuration not possible

Solved!

Hi

I have trouble running a parallel FOR loop. Task Manager shows only 8% CPU load, which indicates that only one of the 12 cores is busy. I need the loop for image processing of a directory of about 20,000 TIFF images (U16, greyscale).

At first I thought it was a problem with the Vision VIs, because they use pointers as image references, and pointers would cause trouble in a parallel FOR loop. So I quickly rewrote the morphological filter VIs without using the VIs from the Vision toolbox, but it did not help. The program still runs at 8% load, indicating that only one core is active.
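The 8% figure is consistent with exactly one saturated core, assuming Task Manager reports the average load across all 12 cores:

```latex
\text{CPU load} = \frac{1\ \text{busy core}}{12\ \text{cores}} \times 100\,\% \approx 8.3\,\%
```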

 

A small test showed that LabVIEW is able to run a task on all 12 cores, so it is not a Windows configuration problem.

Maybe someone has an idea why this doesn't work. The main program is called 'StreifenWegmachenAlleBilderv11Multicore'; all other VIs are subVIs. At the moment it would take about one week to process all images. I want to speed it up enough to run overnight.

Greetings

b

Message 1 of 11

Hi,

First of all: that's some impressive programming. You seem to have mastered multicore threading in LabVIEW.

However, there are very few comments in your program, and it is quite large. I find it rather difficult to understand. Could you please reduce the problem to a small VI for testing?

Please also refer to the LabVIEW Style Guide: http://www.ni.com/white-paper/4434/en

Greetings

Johannes

Message 2 of 11

Hi Johannes

Thanks for the reply. I quickly removed all unnecessary overhead from the program. I think it is easy to understand now.

It's a typical problem in prototyping: programs seem clear to the person who wrote them but are hard for others to follow. I hope it's easier now.

Greetings

B

Message 3 of 11

GaussFilterOhneVision isn't reentrant; that effectively prevents parallelization.
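As a rough analogy in Python (not LabVIEW), a non-reentrant subVI behaves like a function guarded by one shared lock: it has a single data space, so parallel loop instances that call it have to wait their turn. The sketch below records how many calls are inside the filter at the same time. All names (`ConcurrencyProbe`, `make_filter`, `peak_concurrency`) are hypothetical, and threads plus `time.sleep` model only the scheduling, not real CPU-bound parallelism.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyProbe:
    """Records how many callers are inside a function at the same time."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self.active = 0
        self.peak = 0

    def enter(self) -> None:
        with self._guard:
            self.active += 1
            self.peak = max(self.peak, self.active)

    def leave(self) -> None:
        with self._guard:
            self.active -= 1


def make_filter(reentrant: bool, probe: ConcurrencyProbe):
    # Hypothetical stand-in for a subVI. A non-reentrant VI has one shared
    # data space, so only one caller may run it at a time; one lock models that.
    single_instance = threading.Lock()

    def body(image_id: int) -> int:
        probe.enter()
        time.sleep(0.02)  # placeholder for the actual filtering work
        probe.leave()
        return image_id

    def gauss_filter(image_id: int) -> int:
        if reentrant:
            return body(image_id)  # each caller effectively gets its own copy
        with single_instance:      # callers queue up here
            return body(image_id)

    return gauss_filter


def peak_concurrency(reentrant: bool, workers: int = 4, jobs: int = 8) -> int:
    probe = ConcurrencyProbe()
    fn = make_filter(reentrant, probe)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fn, range(jobs)))
    return probe.peak


print("non-reentrant peak:", peak_concurrency(reentrant=False))  # always 1
print("reentrant peak:", peak_concurrency(reentrant=True))       # usually greater than 1
```

Enabling the reentrancy option in VI Properties corresponds to dropping the shared lock in this model: the parallel callers no longer serialize on one instance.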

/Y

G# - Award winning reference based OOP for LV, for free! - Qestit VIPM GitHub

Qestit Systems
Certified-LabVIEW-Developer

Message 4 of 11
Solution
Accepted by topic author werner456

@Yamaeda wrote:

GaussFilterOhneVision isn't reentrant, that'll effectively stop parallelization.


Yes, the missing reentrancy is the main problem. You could even inline it.

I also get a more than 2x faster result by parallelizing the outer FOR loop instead of the inner one (~1.2 seconds on my 16-core Xeon).

Have you tried using the 2D convolution directly? It would probably save you quite a bit of programming effort, and it even has the padding built in 😉
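For comparison, this is roughly what a same-size 2D convolution with built-in padding computes, written out in plain Python. Edge replication is an assumed padding mode chosen for illustration; it is not necessarily the mode LabVIEW's own convolution uses, and `convolve2d_same` and `to_u16` are hypothetical helper names.

```python
from typing import List

Matrix = List[List[float]]


def convolve2d_same(image: Matrix, kernel: Matrix) -> Matrix:
    """2D convolution whose output has the same size as `image`.

    Pixels outside the image are taken from the nearest border pixel
    (edge replication, an assumed padding mode).
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ch, cw = kh // 2, kw // 2  # kernel centre
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    # True convolution: offsets are subtracted (kernel flipped).
                    yy = min(max(y - (i - ch), 0), h - 1)
                    xx = min(max(x - (j - cw), 0), w - 1)
                    acc += image[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


def to_u16(matrix: Matrix) -> List[List[int]]:
    """Round and clamp to the unsigned 16-bit range of the source TIFFs."""
    return [[min(max(int(round(v)), 0), 65535) for v in row] for row in matrix]


# Usage: a 3x3 box blur leaves a flat image unchanged, borders included.
box = [[1.0 / 9.0] * 3 for _ in range(3)]
flat = [[1000] * 5 for _ in range(4)]
print(to_u16(convolve2d_same(flat, box))[0])  # [1000, 1000, 1000, 1000, 1000]
```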

 

Also, your generation of the Gaussian kernel is a bit convoluted. You could use the outer product of a 1D Gaussian with itself. The results agree to within ~1e-18 (see snippet; probably irrelevant after converting back to U16).
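The outer-product construction can be sketched in Python as follows. The 2D Gaussian is separable, exp(-(x^2 + y^2) / (2 sigma^2)) = exp(-x^2 / (2 sigma^2)) * exp(-y^2 / (2 sigma^2)), so the outer product of a normalised 1D kernel equals the normalised 2D kernel up to floating-point rounding. The radius and sigma values are arbitrary examples, and the function names are hypothetical.

```python
import math
from typing import List


def gaussian_1d(radius: int, sigma: float) -> List[float]:
    """Normalised 1D Gaussian sampled at -radius..radius."""
    g = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(g)
    return [v / total for v in g]


def outer(u: List[float], v: List[float]) -> List[List[float]]:
    """Outer product u v^T."""
    return [[a * b for b in v] for a in u]


def gaussian_2d_direct(radius: int, sigma: float) -> List[List[float]]:
    """Reference: evaluate the 2D Gaussian on the grid, then normalise."""
    rng = range(-radius, radius + 1)
    k = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma)) for x in rng] for y in rng]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]


radius, sigma = 3, 1.0  # example values, not taken from the thread
g1 = gaussian_1d(radius, sigma)
separable = outer(g1, g1)
direct = gaussian_2d_direct(radius, sigma)
max_diff = max(abs(a - b) for ra, rb in zip(separable, direct) for a, b in zip(ra, rb))
print(max_diff)  # on the order of floating-point rounding
```

Separability also means the 2D filter can be applied as a row pass followed by a column pass with the 1D kernel, which needs far fewer multiplications for large kernels.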

 

Message 5 of 11

Thanks, I played around a bit today. I got it working by configuring the FOR loops in the subVI as parallelizable. According to my estimate, the remaining time dropped from 280 hours to 19 hours. That's acceptable.

I didn't know about the reentrancy option in the VI Properties. With it enabled, it works fine. That was the problem.

I also didn't know about the 2D convolution. That looks cool. However, the LabVIEW code is just a prototype. The Gaussian filter generation was 3 minutes of work, with no claims to performance or mathematical correctness 😉

When I have some free time, I can optimize the data types and remove some unnecessary conversions, but it is fine for now.

Thanks again for the reentrancy hint. Now it works fine.

Werner

Message 6 of 11

Also note that you have configured only 4 parallel instances. You should set that to at least 12 on your 12-core machine.

Message 7 of 11

There's a CPU information block that you can use to set up as many loop instances as there are logical processors, regardless of the system.

/Y

Message 8 of 11

@Yamaeda wrote:

There's a CPU-information block which you can use to set up as many loops as logical processors, regardless of system.

/Y


That is incorrect. If you configure parallelism on the loop, you need to manually set it to the maximum number of parallel instances you expect on the target systems. If you configure only 4 here, you will not be able to use more than that, no matter what you later wire to P.

Read the details here.

In addition, if you have configured enough parallel instances and want to use the maximum number for the current system, just leave P unwired. No "block" is needed. You can also wire special values (0, -1, etc.) to P for specific scenarios (table 1 here).

(There is also currently an upper limit of 64 parallel instances, but you can increase that with an ini entry.)
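The run-time rule discussed here can be modelled in a few lines of Python. This is a hypothetical sketch, not LabVIEW's implementation: `effective_instances` is an invented name, the default ceiling of 64 comes from the note above, and the special P values (0, -1, ...) are intentionally not modelled because their exact meanings are not spelled out in this thread.

```python
import os
from typing import Optional

# Default ceiling on the compile-time setting, as stated in the thread
# (reportedly raisable with an ini entry).
DEFAULT_MAX_CONFIGURABLE = 64


def effective_instances(configured: int,
                        p_wired: Optional[int] = None,
                        max_configurable: int = DEFAULT_MAX_CONFIGURABLE) -> int:
    """Hypothetical model of how many parallel FOR-loop instances run.

    configured -- number of generated instances set in the loop's dialog
                  (fixed at compile time)
    p_wired    -- positive value wired to the P terminal, or None if unwired
    """
    if not 1 <= configured <= max_configurable:
        raise ValueError("configured instance count out of range")
    if p_wired is None:
        # Unwired P: use as many instances as this machine has logical processors.
        p = os.cpu_count() or 1
    elif p_wired >= 1:
        p = p_wired
    else:
        raise ValueError("special P values (0, -1, ...) are not modelled here")
    # The loop never runs more instances than were generated at compile time.
    return min(configured, p)


print(effective_instances(4, 12))   # 4: configured too low for 12 cores
print(effective_instances(12, 12))  # 12
print(effective_instances(64, 8))   # 8: P limits a generous configuration
```

With `configured=4`, as in the VI discussed above, even a 12-core machine runs at most 4 instances, which is why the dialog value has to be raised.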

 

 

Message 9 of 11

"The Number of generated parallel loop instances setting specifies the number of loop instances to generate at compile time. At run-time, the loop will use the minimum of the value entered in the dialog box and the value wired to (P), so enter the maximum amount of parallelism you ever expect to use in the dialog. For example, if you expect to execute this application on an eight-core computer in the future, enter eight for this setting."

 

Isn't this design misleading for users? By default, new programs will only parallelize to 4 threads. Will LV2013 raise that to 8 as a performance upgrade?

Why wouldn't you just use 64 all the time (can the default be changed?), if the loop uses the lower of this number and P anyway ...

 

/Y

Message 10 of 11