04-24-2012 09:51 PM
I get the feeling this may have a lot to do with thread swapping.
While processing on multiple cores is faster in theory, the actual implementation is harder. Getting software to run well across multiple cores, with all the thread swapping involved, is very difficult.
As such, I would certainly expect to see a performance increase when using multiple cores (as you have); however, I would not expect the performance to increase four-fold (which, again, matches what you're seeing). There is simply a large amount of overhead involved in splitting tasks among different physical processors, synchronizing them when needed, and moving data between them as necessary.
04-25-2012 03:26 AM
Thanks for all the input. I did some experimenting based on your suggestions:
1.) Replace the square root operator with the increment operator: gain in absolute processing time; savings reentrant vs. non-reentrant 44%
2.) Integrate a Wait (0 ms) into the SubVIs: processing time increases as expected; savings reentrant vs. non-reentrant 33%
3.) SubVI in a different execution system (standard): gain in absolute processing time; savings reentrant vs. non-reentrant 47%
4.) Use four object instances instead of implicit copies via wire branches, with the execution system as in 3.): same result as 3.)
5.) Setup as in 4.), but replacing the In Place Element structure in the SubVI with an ordinary Unbundle / Bundle: increase in execution time; savings reentrant vs. non-reentrant ~42%
I did not expect to reach savings of 75%, but I had hoped for something around 65%. BTW, there are no event structures in the code.
Any more suggestions? I'll be happy to try them out.
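For readers comparing the percentages above with the "4-fold" figure mentioned earlier, here is a small Python sketch that converts a time saving into a speed-up factor. It assumes that "savings" means (non-reentrant time - reentrant time) / non-reentrant time; that definition is an assumption, and the function name is made up for illustration.

```python
def savings_to_speedup(savings_percent):
    """Convert a relative time saving (in percent) into a speed-up factor.

    Assumes savings = (t_old - t_new) / t_old, so t_new = t_old * (1 - savings).
    """
    remaining = 1.0 - savings_percent / 100.0  # fraction of the old time still needed
    return 1.0 / remaining


# Savings values quoted in the post, plus the hoped-for 65% and the ideal 75%.
for s in (33, 44, 47, 65, 75):
    print(f"{s}% saved -> {savings_to_speedup(s):.2f}x faster")
```

Under that definition, a 75% saving corresponds to the ideal 4x on four cores, while savings in the mid-40s correspond to a bit less than 2x.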
04-25-2012 04:17 AM
Put MORE code in the benchmark. What you're essentially doing here (because the actual workload is almost zero per iteration) is measuring the overhead of multithreading, as others have pointed out.
Put an interpolation step or something slightly more taxing in the code and you'll likely see a larger increase when the work is spread over 4 cores. This is always an important point in benchmarking: overhead vs. code execution.
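The same effect can be reproduced outside LabVIEW. The Python sketch below is only an illustration of "measuring the overhead": it times a nearly empty task run directly and run through a thread pool. The task, the helper names and the task count are all made up for this example, and Python threads are not a model of LabVIEW's execution systems; the only point is that with a near-zero workload the timing is dominated by dispatch cost.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def tiny_task(x):
    # Almost no work per call, comparable to a single increment in a SubVI.
    return x + 1


def timed(fn):
    """Run fn() once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def run_direct(n):
    # Plain loop: no scheduling or hand-off between threads.
    return [tiny_task(i) for i in range(n)]


def run_pooled(n, workers=4):
    # Same work, but every call is dispatched to a pool of worker threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(tiny_task, range(n)))


n = 5000
_, t_direct = timed(lambda: run_direct(n))
_, t_pooled = timed(lambda: run_pooled(n))
print(f"direct: {t_direct * 1e3:.2f} ms, pooled: {t_pooled * 1e3:.2f} ms")
```

Replacing tiny_task with something heavier changes the ratio between the two timings, which is the experiment being suggested here.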
04-25-2012 05:46 AM
Shane, you're right!
I played around a bit with array-based functions to produce some processor load and found that the gain from using reentrant SubVIs depends strongly on the type of operation you execute. Using a large (initialized) array and the Linear Fit.vi, I managed to speed up the reentrant version by ~69% compared to the non-reentrant one.
It really seems to be an issue of "the right operation" to minimize overhead.
I'm excited to see how it will turn out in the real-world application.
04-25-2012 06:33 AM - edited 04-25-2012 06:36 AM
Please remember one thing......
Overhead stays constant no matter how complex the code in your class is, assuming all other factors remain the same. Increasing the execution time of your class operation will make the overhead-to-actual-code ratio drop in your favour. Only then will you see good scalability over multiple cores. The larger the code execution time is relative to the overhead, the better your results will be. Regarding the absolute value of the overhead, I'll let someone more informed on the subject answer that one.
Glad to be of help.
PS: Can you post the execution time numbers too? I would actually expect more than 69% for a 4x core difference given the right environment. I think you mean that the 4-core example only takes a bit less than a third of the time of the original, right? That would be kind of OK.
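The point about constant overhead can be written down as a toy model. This is a deliberately simplified sketch, not a description of how LabVIEW schedules reentrant SubVIs: it assumes four equal calls that run one after another in the non-reentrant case and fully in parallel in the reentrant case, with a fixed overhead added once to the parallel run. All names and numbers are hypothetical.

```python
def modeled_speedup(work_ms, overhead_ms, cores=4):
    """Speed-up of the parallel run over the serialized run in the toy model.

    serialized: the calls run back to back   -> cores * work_ms
    parallel:   the calls run simultaneously -> work_ms + overhead_ms
    """
    serialized = cores * work_ms
    parallel = work_ms + overhead_ms
    return serialized / parallel


# Keep the overhead fixed at 1 ms and grow the per-call workload.
for work in (1, 10, 100, 1000):
    print(f"work {work:>4} ms: {modeled_speedup(work, overhead_ms=1):.2f}x")
```

With the overhead held constant, the modeled speed-up climbs toward the core count as the per-call workload grows, which is why a heavier benchmark shows better scaling.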
04-25-2012 08:16 AM
So here is what I changed the reentrant SubVI to:
For the test I used an array size of 9,000,000.
With the reentrant SubVI I measured 7797 ms (mean value), with the non-reentrant one 27967 ms. This is a ratio I can live with rather well!
04-26-2012 04:36 AM
Well, that looks kind of good.
7797 ms on 4 cores versus 27967 ms / 4 = 6992 ms for perfect scaling. Nice. That's a speed-up factor of roughly 3.6x on 4 cores.
Way to go LabVIEW. (And Oli of course).
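For anyone who wants to redo the arithmetic, here is a short Python sketch using the two mean timings posted above. The "parallel efficiency" figure (speed-up divided by core count) is an extra metric added for illustration and was not part of the original discussion.

```python
t_non_reentrant_ms = 27967  # mean time, non-reentrant SubVI (from the post above)
t_reentrant_ms = 7797       # mean time, reentrant SubVI (from the post above)
cores = 4

ideal_ms = t_non_reentrant_ms / cores            # time under perfect scaling
speedup = t_non_reentrant_ms / t_reentrant_ms    # measured speed-up factor
efficiency = speedup / cores                     # fraction of the ideal speed-up
savings = 1 - t_reentrant_ms / t_non_reentrant_ms

print(f"ideal time:  {ideal_ms:.2f} ms")
print(f"speed-up:    {speedup:.2f}x")
print(f"efficiency:  {efficiency:.1%}")
print(f"time saved:  {savings:.1%}")
```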