11-17-2014 02:54 PM
I was playing with large matrices and was testing execution speed on my laptop (the code is destined to run on RT).
The code below compares the Matrix Multiply function against straight array math. Unsurprisingly, the Matrix Multiply function executed faster on the PC. But when I ran it on RT, both methods took the same amount of time, and Matrix Multiply takes LONGER on RT than on the PC. Is there some Windows-optimized code under the hood that is not available on RT?
I was hoping it would be faster since my laptop is several years old and this is NI's super duper 8135 PXI controller. I did deploy this as a startup app to ensure running from the IDE did not affect timing when running on the RT.
11-17-2014 03:02 PM
I don't trust your benchmark. You have the two processes happening in parallel. That ruins any kind of benchmarking you may be hoping to do. They need to happen in series with a timer in between.
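For illustration, here is a minimal text-language sketch of a serial benchmark. It is Python rather than LabVIEW, and the function names (`matmul_loops`, `matmul_zip`, `time_serially`) and the 40x40 test size are invented for the example. The only point is that each method is timed by itself, one after the other, with the timer read immediately before and after each call.

```python
import time

def matmul_loops(a, b):
    """Explicit triple loop: multiply and sum, element by element."""
    n, m, p = len(a), len(b), len(b[0])
    out = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            acc = 0.0
            for k in range(m):
                acc += a[i][k] * b[k][j]
            out[i][j] = acc
    return out

def matmul_zip(a, b):
    """Same product, written as row-by-column dot products."""
    cols = list(zip(*b))  # columns of b
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def time_serially(funcs, args, repeats=5):
    """Time each function on its own, one after another, never concurrently."""
    results = {}
    for f in funcs:
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()   # timer read right before the call
            f(*args)
            t1 = time.perf_counter()   # and right after it
            best = min(best, t1 - t0)
        results[f.__name__] = best     # best-of-N, in seconds
    return results

# Hypothetical test data: two 40x40 matrices.
n = 40
a = [[float(i + j) for j in range(n)] for i in range(n)]
b = [[float(i - j) for j in range(n)] for i in range(n)]

timings = time_serially([matmul_loops, matmul_zip], (a, b))
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.3f} ms")
```

Taking the best of several repeats is just one choice; a mean or median would work as well, as long as the two methods never run at the same time.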
11-17-2014 03:15 PM - edited 11-17-2014 03:16 PM
I agree with Tim that your benchmark is flawed. If you are lucky, the two code fragments each execute on a different core in parallel, but there is no guarantee how things are scheduled. You need to isolate the two code paths to make sure they don't step on each other's toes.
To speed things up, you could change the upper code to a parallel FOR loop. See if it makes a difference. (There is also the Multicore Analysis and Sparse Matrix Toolkit, which can parallelize the operation.)
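As a rough text-language analogue of iteration parallelism (in LabVIEW this is a setting on the FOR loop, not code), the sketch below hands each output row to a worker, since the rows of a matrix product are independent of each other. The names `row_times_matrix` and `matmul_parallel_rows` and the worker count are made up for the example. Note that this only illustrates the partitioning: in standard CPython, threads will not actually speed up pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor

def row_times_matrix(row, cols):
    """One loop iteration: a single output row, independent of all others."""
    return [sum(x * y for x, y in zip(row, col)) for col in cols]

def matmul_parallel_rows(a, b, workers=4):
    """Distribute the per-row iterations over a pool of workers."""
    cols = list(zip(*b))  # columns of b, shared read-only by all workers
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so rows stay in place
        return list(pool.map(lambda row: row_times_matrix(row, cols), a))

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul_parallel_rows(a, b))  # [[19, 22], [43, 50]]
```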
11-17-2014 03:17 PM
You are correct, sir. I shouldn't have slapped that code together so fast. The fun and danger of rapid prototyping in LabVIEW.
All results (both methods and both platforms) now report ~4.3ms. I sure wish I could have that <2ms back...
11-17-2014 03:18 PM - edited 11-17-2014 03:23 PM
@Jeremy_Marquis wrote:
I sure wish I could have that <2ms back...
Have you tried using a parallel FOR loop?
It also looks like you have debugging enabled. Disable that.
Can you attach your actual VI so I can play around with it?
11-17-2014 03:35 PM
You mean iteration parallelism? I tried that and it didn't seem to make a difference. But here is the VI, knock yourself out, Christian!
11-17-2014 04:05 PM - edited 11-17-2014 04:26 PM
I get the following values (note that you get negative, or wrapped U32, times because you subtract wrong ;)).
Serial FOR loop: 3.8ms
Parallel FOR loop (4 cores): 0.9ms. Yes, we get a proportional speedup!
Parallel FOR loop (32 cores): 0.45ms
Stock: 1.2ms
Multicore toolkit: 0.04ms to 0.3ms (High jitter)
(This is on a 16-core Xeon machine: dual E5-2687w, 3.1GHz, 32 virtual cores.)
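On the negative or wrapped U32 times mentioned above, a small sketch of the arithmetic may help. It assumes the tick counter is an unsigned 32-bit millisecond count and that "subtract wrong" means the start and end readings were swapped; `elapsed_ms_u32` is a made-up helper name, written in Python for illustration.

```python
U32_MOD = 2 ** 32  # an unsigned 32-bit counter wraps at this value

def elapsed_ms_u32(start, end):
    """Elapsed ticks between two U32 readings, computed modulo 2**32."""
    return (end - start) % U32_MOD

print(elapsed_ms_u32(1000, 1004))      # 4: end minus start, as intended
print(elapsed_ms_u32(1004, 1000))      # 4294967292: operands swapped, result wraps
print(elapsed_ms_u32(U32_MOD - 2, 3))  # 5: correct even across a counter rollover
```

With the operands in the right order, unsigned subtraction gives the right answer even when the counter rolls over between the two readings. With them swapped, a 4 ms interval shows up as a huge U32 value (or as -4 in a signed type).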
11-17-2014 04:27 PM
When I plot the time differences I see large spikes early in the process but only on the Multiply and Sum loop. This pattern persists even if loop parallelism is turned off.
I have no idea about what is going on. When I increase the number of iterations, I often see more spikes in the timing, but the largest is almost always early in the process. With larger iteration counts, I sometimes see regions where the matrix multiply loop time also jumps up, though not nearly as high and not consistently early.
Lynn
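For anyone who wants to reproduce this kind of plot outside LabVIEW, here is a hedged Python sketch that records a duration per iteration and flags the outliers. The names (`per_iteration_times`, `find_spikes`, `workload`) and the "3x the median" threshold are assumptions chosen for the example, and it says nothing about what causes the spikes.

```python
import statistics
import time

def per_iteration_times(func, iterations):
    """Run func repeatedly and record how long each individual call took."""
    durations = []
    for _ in range(iterations):
        t0 = time.perf_counter()
        func()
        durations.append(time.perf_counter() - t0)
    return durations

def find_spikes(durations, factor=3.0):
    """Indices of iterations that took more than factor x the median time."""
    median = statistics.median(durations)
    return [i for i, d in enumerate(durations) if d > factor * median]

def workload():
    """Hypothetical stand-in for one pass of the loop being measured."""
    return sum(i * i for i in range(2000))

durations = per_iteration_times(workload, 200)
spikes = find_spikes(durations)
print(f"median: {statistics.median(durations) * 1e3:.3f} ms")
print(f"spike iterations: {spikes}")
```

Looking at the indices rather than just the maximum shows whether the slow iterations cluster at the start of the run or are spread throughout.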
11-17-2014 04:29 PM
johnsold wrote: I have no idea about what is going on.
Is there anything else running on the computer?
11-17-2014 04:32 PM
Several things, but why would they tend to delay one loop over the other and only early in each run?
Lynn