07-17-2014 01:50 AM - edited 07-17-2014 01:55 AM
Hi,
We do some image processing in a parallel loop. Our process is still not fast enough, so we are thinking about buying a board with four CPU sockets and the best available Xeon processors. That would mean 48 "real" cores or 96 threads. So I would like to know: is the number of threads that LV can use limited?
best regards
lesnah
PS
We use LV 2013 SP1
07-17-2014 06:17 AM
This article is quite old, but it will give you an idea: http://digital.ni.com/public.nsf/allkb/84eca015aa496b23862565bc006c0f19
07-17-2014 07:08 AM
Thanks for your answer. Your link deals with the threads used by the OS. Unfortunately, it doesn't say anything about parallel programming that runs on several cores (or at least virtual cores) of the CPU.
It would be helpful to hear about any experience with, or limits imposed by, LV so that we can evaluate the use of such an expensive "super computer".
07-17-2014 07:14 AM
Once a program makes threads, it is up to the OS to put the different threads on different CPU cores. Ideally, the OS will spread the threads across the many cores, which has been my experience when I have some heavy processing.
07-17-2014 08:48 AM
Win 7 (64-bit) supports up to 256 logical processors (Link MS). So if Windows is able to handle it, does LV as well? It would be interesting to know whether there is a point at which (almost) linear scaling breaks down because of increasing administrative overhead.
07-17-2014 09:07 AM
That depends heavily on the nature of the type of processing and the algorithms used.
Lynn
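One common way to reason about where scaling flattens out is Amdahl's law: if only a fraction p of the run time can be parallelized, the speedup on n cores is at most 1 / ((1 - p) + p / n). The sketch below is a rough illustration in Python, not a LabVIEW measurement; the function name and the example values of p are assumptions chosen for illustration, and the model ignores memory bandwidth, cache effects, and hyperthreading.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Upper bound on speedup from Amdahl's law.

    parallel_fraction: share of single-core run time that can run in parallel (0..1).
    n_cores: number of cores the parallel part is spread across.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


# Hypothetical parallel fractions; measure your own code to get a real value.
for p in (0.90, 0.95, 0.99):
    speedups = [round(amdahl_speedup(p, n), 1) for n in (4, 8, 48, 96)]
    print(f"p={p:.2f}: speedup on 4/8/48/96 cores = {speedups}")
```

Under this model, code that is 95% parallel gets roughly a 14x speedup on 48 cores and can never exceed 20x, no matter how many cores are added. This is why the serial part of the algorithm matters more than the core count once the machine gets large.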
07-17-2014 11:10 AM
As seen from the link already quoted, the number of allocated threads depends on the number of available CPU cores. The parallel FOR loop can be set for up to 256 parallel instances with the following ini setting:
ParallelLoop.MaxNumLoopInstances=256 (see here)
If programmed correctly, LabVIEW can use all the cores you throw at it. Of course, if the code forces sequential execution through poor architecture, extensive critical sections, or artificial data dependencies, the hands of the LabVIEW compiler are tied.
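The difference between independent iterations and a data dependency can be sketched in text form. The following is a minimal Python analogy, not LabVIEW code: the function names and the toy "image" rows are hypothetical, and `ProcessPoolExecutor` merely stands in for a parallel FOR loop.

```python
from concurrent.futures import ProcessPoolExecutor
import os


def brighten(row, gain=2):
    """Independent work: each output row depends only on its own input row."""
    return [min(255, pixel * gain) for pixel in row]


def process_rows_sequential(rows):
    """Reference version: one row after another on a single core."""
    return [brighten(row) for row in rows]


def process_rows_parallel(rows, workers=None):
    """Same result as the sequential version, but rows may run on different cores.

    On platforms that spawn worker processes (Windows, macOS), call this from
    inside an `if __name__ == "__main__":` guard.
    """
    workers = workers or os.cpu_count()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(brighten, rows))


def running_total(rows):
    """Loop-carried dependency: iteration i needs the result of iteration i-1,
    so this loop cannot be split across workers as written."""
    total = 0
    totals = []
    for row in rows:
        total += sum(row)
        totals.append(total)
    return totals
```

`process_rows_parallel` can scale with the core count because no row waits for another, whereas `running_total` stays sequential however many cores are available. In an image-processing pipeline, the goal is to keep as much work as possible in the first form.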
I have a dual Xeon with 32 virtual cores (= 16 hyperthreaded cores) and I can keep them all busy, getting a 17x speedup over the single-core performance (details). The code is highly optimized for parallel execution.
If you need to do highly parallel matrix operations, you might also want to look into the MASM toolkit.
07-18-2014 02:50 AM - edited 07-18-2014 03:17 AM
That is very interesting. The table on your site gives a nice overview of the different processors. To sum it up, I would say that (if you use an Intel processor) the speed really does increase with the number of cores. Hyperthreading doesn't help that much, but it does lead to a speedup larger than the number of real cores.
Of course the performance depends on the algorithm. Right now we are using an Intel i7-3770k, and when our program is running, all 8 virtual cores are at almost 100% usage. So we are looking forward to extrapolating that performance to a larger (but more expensive...) system.
Concerning matrix operations, our problem is that our algorithms include a lot of non-linear equations, which is why matrix operations cannot be used. Some steps contain standard operations like FFT, but we have not yet considered the MASM toolkit for accelerating them. I am going to have a closer look at it. Until now we have been playing with the GPU analysis toolkit (Nvidia CUDA GPU), but it was not that promising, mainly because of a lot of trouble with initialization and allocation.