01-16-2013 08:45 AM
rothloup wrote:
I guess it's one graph per logical processor, right? So i've got 4 cores but each core is shared across two CPUs?
With hyperthreading, each core is logically 2 CPUs. So 4 cores is logically 8 CPUs when you have hyperthreading.
01-16-2013 09:17 AM - edited 01-16-2013 09:19 AM
rothloup wrote:
If so, then it seems that the problem might be related to how well the OS and/or LabVIEW is hyper-threading the work.
Well, a hyperthreaded core (two logical processors) can only do the real work of one processor. Each logical processor can keep some state information so switching between the processes is faster, but you won't get twice the speed. (details)
All that said, I don't know exactly how CPU usage is defined for these cases. I can keep all my hyperthreaded processors at 100% if needed, but the parallelization speedup I get (~16x in my case) only matches the number of real cores, which is half the number of logical processors.
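To put rough numbers on that, here is a minimal Python sketch of the rule of thumb described above. The function name `speedup_ceiling` and the default of two hardware threads per core are assumptions for illustration only; the real ratio depends on the CPU, and `os.cpu_count()` reports logical processors, not physical cores.

```python
import os


def speedup_ceiling(logical_cpus, threads_per_core=2):
    """Rough upper bound on parallel speedup for CPU-bound work.

    Assumes (hypothetically) that each physical core exposes
    `threads_per_core` logical processors and does the real work of one.
    """
    return max(1, logical_cpus // threads_per_core)


# os.cpu_count() counts logical processors and may return None.
logical = os.cpu_count() or 1
print(f"logical processors: {logical}")
print(f"assumed speedup ceiling: ~{speedup_ceiling(logical)}x")
```

With the 8 logical CPUs mentioned earlier in the thread, this gives a ceiling of about 4x, even if all 8 graphs show 100% usage.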
01-16-2013 09:23 AM
Instead of looking at the CPU usage, you should try to compare serialized versus parallelized code to determine the parallelization advantage.
Do a good benchmark and run it with and without parallelization. Look at the speed ratio.
I can't really tell from your snippets what your code is actually doing. How much data is involved?
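The serial-versus-parallel comparison suggested above can be sketched in a text language as follows. This is only an illustration in Python, not LabVIEW code: `work` is a hypothetical CPU-bound stand-in for one loop iteration, and the task sizes and worker counts are arbitrary assumptions.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def work(n):
    """Hypothetical CPU-bound task standing in for one loop iteration."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def best_time(func, repeats=3):
    """Return the fastest wall-clock time of several runs, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(t_serial, t_parallel):
    """Speed ratio: how many times faster the parallel run is."""
    return t_serial / t_parallel


def run_serial(tasks):
    """Run every task one after another in this process."""
    return [work(n) for n in tasks]


def run_parallel(tasks, workers):
    """Run the same tasks across a pool of worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(work, tasks))


if __name__ == "__main__":
    tasks = [300_000] * 16  # arbitrary sizes, chosen only for the demo
    t_serial = best_time(lambda: run_serial(tasks))
    for workers in (2, 4, 8):
        t_par = best_time(lambda: run_parallel(tasks, workers))
        print(f"{workers} workers: speedup {speedup(t_serial, t_par):.2f}x")
```

The number to watch is the speed ratio as the worker count grows, not the CPU usage graph. If the ratio flattens well below the number of real cores, overhead or shared resources are the likely suspects.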
01-17-2013 11:42 PM
OK, so I did some more research and found an app note which suggested that I profile the executable code using "Intel VTune Amplifier". I did that, and the "Thread Concurrency" analysis showed that my program spends "a significant portion of CPU time in synchronization or threading overhead".
It also reported that more than half the CPU time is spent calling an "EventLoggingEnumLogEntry" function, and about 1/5 of the time is spent calling a "LvVariantCStrSetUI8Attr" function, both of which are contained in the run-time engine (lvrt.dll). But I have no idea what in my code causes those functions to be called. I have no event logging of any type set up. I do have an event structure, but it sits inside a case structure that bypasses it when the program is in the intensive computation mode (i.e. after all the user input is done, which is the only place the event structure matters).
Any ideas what those functions are? I also posted a new thread with the same question, since it seems different enough from the main thrust of this discussion that someone who is not following this thread might have an answer.
01-18-2013 02:30 AM
You haven't really shown us what's inside any of the subVIs that you have in the parallel FOR loops, so we are completely in the dark. Make sure that you only parallelize the outermost FOR loop. For example, you would not want to parallelize anything inside these subVIs.
Can you show us some of the inner code? Are these your own or from a toolkit?
01-18-2013 08:32 AM
Yeah, unfortunately, posting snippets of all of my code would be cumbersome because it's a bunch of case structures and increments, other subVIs, etc.
So I've created a source distribution, attached here. I've password-protected a few of the subVIs for proprietary reasons. Those subVIs contain no parallelized code.
Note that the event structure in the main outer loop is bypassed during the processor intensive portion of the application, because the case structure will execute the "false" case. The code snippets I posted earlier in this thread are from the case structure in the upper right.
Open the "Simulation.vi" VI as the top-level VI.
Let me know if you have any questions regarding the code.