LabVIEW


Matrix Multiply slower on RT than PC?


I was playing with large matrices and was testing execution speed on my laptop (the code is destined to run on RT). 

 

The code below compares the Matrix Multiply function against plain array math.  Unsurprisingly, the Matrix Multiply function executed faster on my laptop.  But when I ran it on RT, both methods took the same amount of time, and Matrix Multiply took LONGER on RT than on the PC.  Is there some Windows-optimized code under the hood that is not available on RT?

 

I was hoping it would be faster since my laptop is several years old and this is NI's super duper 8135 PXI controller. I did deploy this as a startup app to ensure running from the IDE did not affect timing when running on the RT.

 

 

[Image: Matrix-Vector Benchmark.vi block diagram]

0 Kudos
Message 1 of 14
(3,516 Views)
Solution
Accepted by Jeremy_Marquis

I don't trust your benchmark.  You have the two processes happening in parallel.  That ruins any kind of benchmarking you may be hoping to do.  They need to happen in series with a timer in between.


Message 2 of 14
(3,512 Views)

I agree with Tim that your benchmark is flawed. If you are lucky, both code fragments execute on a different core in parallel, but there is no guarantee how things are scheduled. You need to isolate the two code paths to make sure they don't step on each other's toes.
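Since LabVIEW diagrams are graphical, here is a rough text-language analogue of the fix in Python: each code path is timed on its own, one after the other, so the two never compete for the same cores. The function names, the matrix size, and the repeat count are assumptions made for illustration and are not taken from the posted VI.

```python
import time


def matvec_loops(matrix, vector):
    """Multiply-and-sum written out with explicit loops."""
    result = []
    for row in matrix:
        total = 0.0
        for a, b in zip(row, vector):
            total += a * b
        result.append(total)
    return result


def matvec_builtin(matrix, vector):
    """Same product, using the built-in sum() over each row."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def time_serially(funcs, args, repeats=5):
    """Time each function by itself, in sequence, and keep the best run."""
    timings = {}
    for func in funcs:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            func(*args)
            elapsed = time.perf_counter() - start
            best = min(best, elapsed)
        timings[func.__name__] = best
    return timings


size = 200  # hypothetical size, not the one used in the thread
matrix = [[float(i + j) for j in range(size)] for i in range(size)]
vector = [1.0] * size

timings = time_serially([matvec_loops, matvec_builtin], (matrix, vector))
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.3f} ms")
```

In LabVIEW the equivalent would be a sequence structure (or data-flow dependency) that forces the first measurement to finish before the second one starts.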

 

To speed things up, you could change the upper code to a parallel FOR loop and see if it makes a difference. (There is also the Multicore Analysis and Sparse Matrix Toolkit, which can parallelize the operation.)

0 Kudos
Message 3 of 14
(3,503 Views)

You are correct, sir; I shouldn't have slapped that code together so fast.  The fun and danger of rapid prototyping in LabVIEW.

 

All results (both methods and both platforms) now report ~4.3ms.  I sure wish I could have that <2ms back...

 

[Image: Matrix-Vector Benchmark.vi block diagram, revised]

0 Kudos
Message 4 of 14
(3,499 Views)

@Jeremy_Marquis wrote:

 I sure wish I could have that <2ms back...


Have you tried using a parallel FOR loop?

It also looks like you have debugging enabled. Disable that.

 

Can you attach your actual VI so I can play around with it?

0 Kudos
Message 5 of 14
(3,495 Views)

You mean iteration parallelism? I tried that and it didn't seem to make a difference.  But here is the VI, knock yourself out, Christian!

 

 

0 Kudos
Message 6 of 14
(3,477 Views)

I get the following values. (Note that you get negative, or wrapped U32, times because you subtract incorrectly ;))
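To illustrate the subtraction issue, here is a small Python sketch. It assumes the timestamps come from an unsigned 32-bit counter (as "wrapped U32" suggests); the function name and the tick values are made up for the example.

```python
U32_MODULUS = 1 << 32  # an unsigned 32-bit counter wraps at 2**32


def elapsed_ticks_u32(start, end):
    """Elapsed ticks as end - start, reduced modulo 2**32 like U32 math."""
    return (end - start) % U32_MODULUS


start_tick, end_tick = 1000, 1004  # hypothetical readings

correct = elapsed_ticks_u32(start_tick, end_tick)   # end minus start
swapped = elapsed_ticks_u32(end_tick, start_tick)   # operands reversed

print(correct)  # 4
print(swapped)  # 4294967292, the "wrapped" value

# The same modular subtraction also survives a counter rollover.
print(elapsed_ticks_u32(0xFFFFFFF0, 0x10))  # 32
```

With signed arithmetic the swapped case would show up as -4 instead; with U32 arithmetic it appears as a value just below 2**32.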

 

Serial FOR loop: 3.8ms

Parallel FOR loop (4 cores): 0.9ms.  Yes, we get a proportional speedup!

Parallel FOR loop (32 cores): 0.45ms

Stock: 1.2ms

Multicore toolkit: 0.04ms to 0.3ms (High jitter)

 

(This is on a 16-core Xeon machine (dual E5-2687W, 3.1 GHz) with 32 virtual cores.)

Message 7 of 14
(3,456 Views)

When I plot the time differences I see large spikes early in the process but only on the Multiply and Sum loop. This pattern persists even if loop parallelism is turned off. 

 

[Image: Times.png, plot of the per-iteration time differences]

 

I have no idea what is going on.  When I increase the number of iterations, I often see more spikes in the timing, but the largest is almost always early in the process.  With larger iteration counts, I sometimes see regions where the matrix multiply loop time also jumps up, though not nearly as high and not consistently early.
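For anyone who wants to locate such spikes numerically rather than by eye, here is a minimal Python sketch that flags iterations whose time is well above the median. The sample values and the threshold factor are invented for illustration; they are not measurements from this thread, and the sketch says nothing about what causes the spikes.

```python
import statistics


def find_spikes(samples_ms, factor=3.0):
    """Return (index, time) pairs whose time exceeds factor * median."""
    median = statistics.median(samples_ms)
    return [(i, t) for i, t in enumerate(samples_ms) if t > factor * median]


# Hypothetical per-iteration times in milliseconds.
samples_ms = [4.3, 12.0, 4.2, 4.4, 4.3, 4.3, 9.0, 4.2]

spikes = find_spikes(samples_ms, factor=2.0)
print(spikes)  # [(1, 12.0), (6, 9.0)]
```

Comparing against the median rather than the mean keeps a few large outliers from raising the threshold.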

 

Lynn

 

0 Kudos
Message 8 of 14
(3,441 Views)

johnsold wrote:

I have no idea about what is going on. 


Is there anything else running on the computer?

0 Kudos
Message 9 of 14
(3,438 Views)

Several things, but why would they tend to delay one loop over the other and only early in each run?

 

Lynn

0 Kudos
Message 10 of 14
(3,435 Views)