12-19-2012 01:50 AM
I didn't have a chance to test core affinity and/or BIOS settings, but a quick test with an i5 shows very interesting results:
i5 4 parallel:
i5 2 parallel:
i5 parallel disabled:
These numbers show that your rewrite is the fastest all the way from double loops. Which leads to the question: is it an NI, Windows, or AMD problem that the Phenom is "bad" in comparison? Most multithreading tests (DivX compile, 3D Studio and such) show very good results for AMD. Could the SSE2 code be a factor? Is it used in these instances?
/Y
12-19-2012 03:07 AM
@Yamaeda wrote:
These numbers show that your rewrite is the fastest all the way from double loops. Which leads to the question: is it an NI, Windows, or AMD problem that the Phenom is "bad" in comparison? Most multithreading tests (DivX compile, 3D Studio and such) show very good results for AMD.
I also got much better results running my Fortran programs on Intel compared to the newer AMDs.
For a while I had a 64-core AMD machine (4 x Opteron 6274 with 16 cores each). On fully parallelized LabVIEW code it was roughly equivalent to a 6-core i7. The parallelization was good, but the performance per core was horrible. The "cached" performance was also bad, probably due to contention in the critical section where all 64 cores need to access the shared data.
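To illustrate the contention idea in text form (this is a Python sketch, not the LabVIEW code discussed here, and every name in it is hypothetical): the first function makes every worker take one shared lock for each item, which is the pattern that serializes many cores; the second lets each worker accumulate privately and take the lock only once. Python's interpreter lock means the timings will not mirror a 64-core machine, so the sketch only shows the structure, not the measured effect.

```python
import threading


def count_with_shared_lock(n_workers, n_items):
    """Every increment passes through one critical section (high contention)."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(n_items):
            with lock:  # all workers queue up here for every single item
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def count_with_local_accumulation(n_workers, n_items):
    """Each worker sums privately and enters the critical section only once."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        local = 0
        for _ in range(n_items):
            local += 1  # no shared data touched inside the loop
        with lock:  # one lock acquisition per worker
            total += local

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

Both versions return the same total; the difference is how often the workers have to wait on each other.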
Now I have a dual Xeon E5-2687W with 16 cores (32 when counting hyperthreading) and it is significantly faster than the 64-core AMD system. (See also my footnote at the bottom of this page.) Of course, I built an i7 for 10% of the cost of the dual Xeon, and it is less than 3x slower than the dual Xeon (even less if I overclock it ;)).
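As a rough price/performance check of those two figures (a back-of-the-envelope sketch in Python; the function name is made up, and the inputs are only the "10% of the cost" and "less than 3x slower" numbers above, taken at their limits):

```python
def relative_perf_per_cost(relative_speed, relative_cost):
    """Performance per unit cost, relative to a baseline machine set to 1.0."""
    return relative_speed / relative_cost


# Assumed limits from the post: the i7 runs at no less than 1/3 of the
# dual Xeon's speed and costs about 0.10 of its price.
i7_vs_xeon = relative_perf_per_cost(1 / 3, 0.10)
print(f"i7 delivers at least {i7_vs_xeon:.1f}x the performance per dollar")
```

Since "less than 3x slower" is an upper bound on the slowdown, the result is a lower bound on the i7's advantage per dollar.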
AMDs had other problems and incompatibilities, so maybe it is just that the software infrastructure has not been optimized yet.
I am polishing up my new code for release and it will include a detailed benchmarking feature. We will have a lot of data once my users submit their results. 😄
12-19-2012 11:03 AM
Alright, turning off C1E in the BIOS did improve things slightly, from 10.5x slower to 9.5x slower (with no parallel instances, on the rewrite). Changing the LV core affinity didn't change anything. Is the SSE2 optimization active in the development environment, and can I disable it?
/Y
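To put the C1E change in perspective, here is the arithmetic as a small Python sketch (the function name is hypothetical; the only inputs are the 10.5x and 9.5x slowdown factors quoted above):

```python
def relative_improvement(before, after):
    """Fraction of run time saved when a slowdown factor drops from before to after."""
    return (before - after) / before


# Slowdown factors relative to the reference machine, as reported in the post.
gain = relative_improvement(10.5, 9.5)
print(f"Disabling C1E saved about {gain:.1%} of the run time")
```

That is a gain of roughly a tenth, so C1E explains only a small part of the gap.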
12-20-2012 03:40 AM
I built an exe with and without SSE2 optimization, and it didn't make any difference.