LabVIEW


Why is RT matrix multiplication so much slower than Windows?

Solved!

Hello,

I ran the attached VI on both a cRIO-9035 and my Windows machine (a Lenovo ThinkPad) running LabVIEW 2020. All it does is find the average time cost of multiplying a random vector by a random square matrix over 20,000 runs. With a vector size of 32, the average time on the cRIO was 21.88 microseconds, while on my Windows machine it was 2.04 microseconds. Does anyone know why there's such a huge disparity between the two times, or if there's any way to speed up matrix multiplication on the RT target? The cRIO-9035 has a clock speed of 1.33 GHz while my Windows machine has a clock speed of 2.60 GHz, so certainly it isn't just the clock speeds.

[Attached image: VI.PNG]
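For anyone who wants to reproduce this outside LabVIEW, a rough C sketch of an equivalent benchmark might look like the following. This is only an illustration of the same loop structure, not the actual VI; the timing call, the random fill, and the use of double precision are my own choices.

/* Rough sketch: time 20,000 multiplies of a random 32x32 matrix by a
   random 32-element vector and report the average per multiply. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N    32
#define RUNS 20000

int main(void)
{
    static double A[N][N], x[N], y[N];

    /* Fill the matrix and vector with random values, as the VI does. */
    srand(42);
    for (int i = 0; i < N; i++) {
        x[i] = (double)rand() / RAND_MAX;
        for (int j = 0; j < N; j++)
            A[i][j] = (double)rand() / RAND_MAX;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    for (int r = 0; r < RUNS; r++) {
        for (int i = 0; i < N; i++) {          /* y = A * x */
            double sum = 0.0;
            for (int j = 0; j < N; j++)
                sum += A[i][j] * x[j];
            y[i] = sum;
        }
    }

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double total_us = (t1.tv_sec - t0.tv_sec) * 1e6
                    + (t1.tv_nsec - t0.tv_nsec) / 1e3;
    /* Print y[0] so the compiler can't optimize the loop away. */
    printf("average per multiply: %.2f us  (y[0]=%f)\n",
           total_us / RUNS, y[0]);
    return 0;
}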

Thanks,

Daniel

Message 1 of 8

Hi Daniel,

 


@dncoble wrote:

The cRIO-9035 has a clock speed of 1.33 GHz while my Windows machine has a clock speed of 2.60 GHz, so certainly it isn't just the clock speeds.


Yes, it's not just clock speed!

Which CPU does your Windows computer use? Which kind/type/spec of RAM?

The cRIO uses an old Atom E3825 with just two cores…

Best regards,
GerdW


using LV2016/2019/2021 on Win10/11+cRIO, TestStand2016/2019
Message 2 of 8

Thanks for the response!

 

To answer your question, the Windows machine has an Intel i7-6600U (2 cores), with 16 GB of DDR3 RAM. I expected that the laptop would be faster, but I doubt that hardware could account for a 10x difference in speed.

I'm interested in improving performance on the real-time system, and it doesn't seem like the Intel Atom E3825 on the cRIO-9035 is performing as fast as I expect it could. So I guess my question is whether there is some library or setting in LabVIEW for Windows that accelerates matrix multiplication and doesn't exist in LabVIEW Real-Time. If so, is there a way to improve the real-time target's performance, maybe by installing some module or changing a setting, or is this type of slowdown inherent to the real-time system?

 

Thanks,

Daniel

Message 3 of 8

Atom processors are simpler: they are optimized for low power, and in this case the Atom only has 25% of the cache.

 

Comparing the two shows the huge difference:

 

[Attached image: Screenshot_20221123-204650.png, comparing the specs of the two CPUs]

While I don't have benchmarks for exactly these CPUs, here is my list of benchmarks showing the dramatic differences.

 

 

Message 4 of 8
I wanted to benchmark the performance of my cRIO-9035 by running a 32-element matrix-vector multiply (see the attached picture). Using this code, an average multiply took 22.88 microseconds. However, if the Intel Atom inside the cRIO were working as fast as possible and performing one multiply per clock cycle, at 1.33 GHz it would take 32^2 / 1.33 GHz = 0.77 microseconds. Compared to this value, the measured speed was about 30 times slower (one multiply per ~30 clock cycles). I doubt it could actually ever run that fast, but when I ran the same code on my laptop, it was only ~5.6 times slower than the theoretical value computed from the PC's clock speed. I'm wondering what produces this disparity: is there a linear algebra package on LabVIEW for Windows that isn't on LabVIEW Real-Time, or is this slowdown an inherent quality of how NI Linux Real-Time guarantees determinism?
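As a sanity check, here is the arithmetic above as a small C snippet. The one-multiply-accumulate-per-cycle figure is of course an idealized assumption, not something the Atom can actually sustain.

/* Back-of-the-envelope check of the numbers above. */
#include <stdio.h>

int main(void)
{
    double clock_hz    = 1.33e9;      /* cRIO-9035 Atom E3825 clock          */
    double ops         = 32.0 * 32.0; /* multiply-accumulates in a 32x32 * 32 product */
    double ideal_us    = ops / clock_hz * 1e6;  /* ~0.77 us at 1 MAC/cycle   */
    double measured_us = 22.88;                 /* average from the benchmark */

    printf("ideal (1 MAC/cycle): %.2f us\n", ideal_us);
    printf("measured / ideal   : %.1fx  (~%.0f cycles per MAC)\n",
           measured_us / ideal_us,
           measured_us * 1e-6 * clock_hz / ops);
    return 0;
}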
 
This question is a repost for clarity.
[Attached image: f25a76e1-a384-4b2c-8f32-31a0236c75b0.png]
Message 5 of 8

The CPU in the cRIO is a low-power Atom chip and quite old (launched in 2013). Clock speed is mostly useful for comparing CPUs of the same family, or their successors, to see how instructions per cycle have changed.

 

Atom CPUs lack a lot of the Core-family CPU features, most notably AVX. If you want fast matrix multiplication, you need vector instructions.
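To illustrate the point (this is my own sketch, not anything LabVIEW generates): an AVX kernel like the one below handles 8 single-precision elements per instruction, whereas the E3825 only has 128-bit SSE and no AVX at all, so each row of the matrix-vector product takes far more instructions there.

/* AVX dot product of two float arrays; compile with e.g. gcc -O2 -mavx */
#include <immintrin.h>

float dot_avx(const float *a, const float *b, int n)
{
    __m256 acc = _mm256_setzero_ps();
    int i;
    for (i = 0; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb)); /* 8 products per step */
    }
    float tmp[8];
    _mm256_storeu_ps(tmp, acc);
    float sum = tmp[0] + tmp[1] + tmp[2] + tmp[3]
              + tmp[4] + tmp[5] + tmp[6] + tmp[7];
    for (; i < n; i++)        /* scalar tail when n is not a multiple of 8 */
        sum += a[i] * b[i];
    return sum;
}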

 

So until you run your benchmark on a Windows PC with a similar CPU and show that it is faster than the cRIO, I will believe the CPU is to blame.

Message 6 of 8

Did you forget the thread with the same title that you posted last week? https://forums.ni.com/t5/LabVIEW/Why-is-RT-matrix-multiplication-so-much-slower-than-Windows/m-p/426...

The facts are still the same!

Rolf Kalbermatter
My Blog
Message 7 of 8
Solution
Accepted by topic author dncoble

Thankfully somebody combined the duplicate threads.

 

There is nothing magic going on: one of the processors is simply much more powerful. Intel learned the hard way back in the day that clock speed is just a meaningless marketing term. When they came out with the 3 GHz P4, people were wowed by the GHz, but performance was not that great. Later they came out with the Core architecture, which was significantly more powerful at lower clocks. (Analogy with cars: you cannot say that a 50 cc motorcycle at 12,000 rpm is 3x more powerful than a 7-liter seventies muscle car at 4,000 rpm.)

 

A modern processor takes about 7 CPU cycles for a multiply, but there is much more to it. My benchmarks have a column "S/GHz", which shows the single-core performance normalized to the clock speed.

 

  • An old Intel Atom N450 can do about 3
  • An Intel P4 can do about 5.5
  • Older Intel Core CPUs can do about 12-20
  • Modern Intel and AMD processors can do over 30!
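As a rough illustration (the exact S/GHz values for the E3825 and the i7-6600U below are my guesses based on the table above, not measurements), multiplying the per-GHz score by the clock gives the expected single-core ratio:

/* Rough extrapolation from the S/GHz figures above. */
#include <stdio.h>

int main(void)
{
    double atom_perf = 3.0  * 1.33;  /* Atom-class core at 1.33 GHz (assumed S/GHz ~3)      */
    double core_perf = 20.0 * 2.60;  /* Skylake-class core at 2.60 GHz (assumed S/GHz ~20)  */
    printf("expected single-core ratio: ~%.0fx\n", core_perf / atom_perf);
    return 0;
}

That lands in the same ballpark as the roughly 10x difference measured in the original post.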
Message 8 of 8