LabVIEW

allocation woes in parallelized loop?

I have a For Loop that calls a function inside a DLL which I wrote. I'm observing something strange when I parallelize the loop, and I wonder whether it rings a bell with anyone (a specific bug or caveat of LV parallelization, for instance) or whether I have to look elsewhere.

The platform is Windows 7 with LV2010 SP1; the problem is observed on two different computers, a quad-core and an octa-core.

My DLL function (I have the C source code) interfaces with certain OpenCV 1.0 routines, allocates its own buffers, does its own work, returns its results and deallocates its storage. I don't think the details matter much here; what is relevant is the following:

 

  • if I call this routine repeatedly in a single thread, even tens of thousands of times, I observe no memory leak (process memory size monitored in the Task Manager);
  • if I call this routine in an unparallelized loop (like in the image below, "parallel" off), all is fine, no matter the "number of workers";
  • if I activate parallelization and "number of workers" is up to 3, all is fine: no error is reported, and execution times and CPU usage suggest that the job is being run in parallel;
  • if I activate parallelization and "number of workers" is 4, 5 or 6, some of the instances return error 1097, at random, and others do not; however, no memory leak appears, and the resulting data is sound;
  • if "number of workers" > 6 (on both machines), the routine immediately crashes LV irrecoverably, with OpenCV GUI dialogs popping up, complaining about null pointers and negative matrix dimensions.

[Image: Screenshot from 2013-04-14 16:30:43.png]

I have no a priori guarantee about the thread safety of the underlying OpenCV 1.0 routines, but the fact that they DO run in parallel without problems for up to 3 threads, and that execution time scales inversely with the number of threads for up to 6 threads when the job is split across instances, hints that they hopefully are thread-safe.
I vaguely remember that the number of threads which LV reserves for some of its subprocesses may be 4 or 8 (e.g. the GUI thread, configurable somehow IIRC), and I wonder if that has anything to do with what I observe.
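
One way to probe the thread-safety question is a diagnostic build of the DLL in which every call goes through a single lock, to see whether the random 1097 errors go away. A minimal sketch, assuming POSIX threads for brevity (on Windows a CRITICAL_SECTION would play the same role); `process_frame` and `process_frame_serialized` are hypothetical names standing in for the real routine, not functions from the actual DLL:

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the real OpenCV-based routine:
   sums n samples into *out and returns 0 on success. */
static int process_frame(const double *data, size_t n, double *out)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += data[i];
    *out = sum;
    return 0;
}

/* One lock shared by every caller of the DLL. */
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;

/* Diagnostic entry point: only one thread at a time can be inside
   process_frame, regardless of how many loop workers call it. */
int process_frame_serialized(const double *data, size_t n, double *out)
{
    int rc;
    pthread_mutex_lock(&g_lock);
    rc = process_frame(data, n, out);
    pthread_mutex_unlock(&g_lock);
    return rc;
}
```

If the errors disappear with the lock in place (at the cost of the parallel speedup), the routine or something it calls is probably sharing state between calls.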

Any hint?

TIA, Enrico

Message 1 of 6

Hi Enrico,

 

The default number of threads in LabVIEW is 4.

That is 1 UI thread and 3 others, which is why you can get an error when the number is more than 3. Have a look at this knowledge base article, which shows how you can increase the number of threads.

 

Regards

 

Arham H
Applications Engineer
National Instruments
Message 2 of 6

@Arham-H wrote:

The default number of threads in LabVIEW is 4.

That is 1 UI thread and 3 others, which is why you can get an error when the number is more than 3. Have a look at this knowledge base article, which shows how you can increase the number of threads.


This statement disagrees with "How Many Threads Does LabVIEW Allocate?", which says the number is much higher and depends on the number of CPU cores. Can you clarify your statement?

I've been running up to 32 parallel instances of a parallel for loop without the need of any special thread configuration setting on a system with 16 hyperthreaded cores (32 virtual cores).

Message 3 of 6

@Enrico_Segre wrote:

Any hint?


You said you made the DLL yourself. I am not familiar with OpenCV. Is the DLL compiled so that it includes everything, or does it call external DLLs itself? Does your DLL include any internal parallelization (e.g. parallel_for)? If you parallelize in LabVIEW, you should probably make sure that the DLL does not include parallel code (just guessing).

 

You did not say how many parallel instances of the FOR loop you have configured. What are the exact models of your CPUs?

 

LabVIEW 2010 seems a bit old. Have you tried 2012? (You can download the free evaluation to test.)

 

(I have a DLL written in Fortran that cannot be called concurrently. I ended up dynamically creating N unique copies of the DLL at startup, then calling the right copy by using the parallel instance ID to index into an array of DLL names; the CLFN is configured to take the path on the diagram. I don't think LabVIEW 2010 has the parallel instance ID output, so you might be out of luck. I do get a 17x parallelization speedup on a system with 16 hyperthreaded cores, and there was no need to tweak the thread configuration at all.)
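
The bookkeeping behind that workaround can be sketched in C. This only illustrates the naming and indexing idea; the `solver` base name and the `make_copy_name` and `pick_copy` helpers are hypothetical, and the actual file copying and path wiring would be done on the LabVIEW diagram:

```c
#include <stdio.h>
#include <stddef.h>

/* Build the file name of the idx-th private copy, e.g. "solver_3.dll".
   Returns 0 on success, -1 if the buffer is too small. */
int make_copy_name(char *buf, size_t buflen, const char *base, unsigned idx)
{
    int n = snprintf(buf, buflen, "%s_%u.dll", base, idx);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* Map a parallel instance ID to a copy index, so that each worker
   always calls into its own copy. Returns 0 if there are no copies. */
unsigned pick_copy(unsigned instance_id, unsigned n_copies)
{
    if (n_copies == 0)
        return 0;
    return instance_id % n_copies;
}
```

With as many copies as loop workers, no two workers ever share a loaded copy, so state inside a non-reentrant DLL is never touched concurrently.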

Message 4 of 6

Thanks to both for the replies. I have to admit that I reported a little too early, before doing sufficient analysis. I apologize for that and plan to carry out further tests.

By now I have discovered that the issue of "if N>6 the routine halts irrecoverably" was due to a bug of mine, which passed a misdimensioned matrix to the routine in that case, rather than to a LV fault. Still, I confirm that when errors 1097 pop up, they do so seemingly at random. It may be that the innards of my routine (which certainly call other stuff in the OpenCV DLLs; no idea about parallelization pragmas) violate allocations somewhere, possibly also depending on the input data; but I still wonder whether there is some issue when errors are generated in parallel threads, so that they are not always concurrently reported. I will see if I can shed more light.
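
A misdimensioned input of that kind can be caught at the DLL boundary before it reaches OpenCV. A minimal sketch, assuming a hypothetical entry point `process_matrix` that receives a row-major matrix with its dimensions; the computation and the error codes are illustrative only, not LabVIEW or OpenCV constants:

```c
#include <stddef.h>

enum { PM_OK = 0, PM_ERR_NULL = -1, PM_ERR_DIMS = -2 };

/* Hypothetical DLL entry point: computes the mean of a rows x cols
   row-major matrix, rejecting bad arguments instead of crashing. */
int process_matrix(const double *m, int rows, int cols, double *mean_out)
{
    if (m == NULL || mean_out == NULL)
        return PM_ERR_NULL;          /* null pointer from the caller */
    if (rows <= 0 || cols <= 0)
        return PM_ERR_DIMS;          /* zero or negative dimensions */

    size_t n = (size_t)rows * (size_t)cols;
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += m[i];
    *mean_out = sum / (double)n;
    return PM_OK;
}
```

Returning an error code lets the caller handle bad input per iteration, instead of the whole process going down with a dialog.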


As for trying more recent LV versions: yes, I have access to them; LV2010 was just the version installed on the target systems where I became aware of the problem.

Enrico

Message 5 of 6

I stand corrected: the number of threads does depend on the number of CPU cores. I stated 4 threads because that is the default number when a For Loop is configured for parallelism.

 

Regards

Arham H
Applications Engineer
National Instruments
Message 6 of 6