How fast does LabVIEW perform integer, float and matrix calculations?

What is the speed of a modern CPU? OK, it is 3 GHz, one to four cores, etc., but how much work can I get done in one clock cycle at this speed?

These kinds of questions have been in the back of my mind for some time, so I decided to test it. Using LabVIEW, of course, since it is so easy to use!

I wrote a really simple program that adds 3 to a 32-bit integer 10,000 times in a loop. It repeats this until 1000 ms have elapsed and then calculates the rate in MegaInts (millions of integer adds per second, or whatever I should call it). I then did the same thing by adding 3 to a matrix of 10,000 32-bit ints.
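
For readers without LabVIEW at hand, the measurement structure described above (run a fixed batch of adds, repeat until about one second has passed, then divide) can be sketched in Python. This is only an illustration under stated assumptions: the names `mega_ops_per_second` and `scalar_batch` are hypothetical, and interpreted Python is far slower than compiled LabVIEW code, so the absolute numbers are not comparable.

```python
import time

N = 10_000  # adds per batch, as in the original test


def mega_ops_per_second(batch, ops_per_batch, duration_s=1.0):
    """Call `batch` repeatedly until `duration_s` seconds have elapsed and
    return the throughput in millions of operations per second."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    batches = 0
    start = time.perf_counter()
    while True:
        batch()
        batches += 1
        elapsed = time.perf_counter() - start
        if elapsed >= duration_s:
            break
    return batches * ops_per_batch / elapsed / 1e6


def scalar_batch():
    """The "loop" variant: add 3 to a single integer N times."""
    x = 0
    for _ in range(N):
        x += 3
    return x


if __name__ == "__main__":
    print(f"{mega_ops_per_second(scalar_batch, N):.1f} M adds/s")
```

A whole-array variant can be timed the same way by passing a different batch function, but in pure Python it will not reproduce the loop-versus-matrix gap, because a list comprehension is not vectorized.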

 

On a dual-core 3 GHz PC from 2005 I got 261 MInts in a loop and 1400 MInts with the matrix. Only one core seems to be used (CPU load around 50%). So in the matrix calculation I get one add done every two clock cycles or so. Older 1.8 GHz Celeron computers also perform about the same in this test, even though they feel very slow in normal use.

On a newer 4-core/8-thread 2.8 GHz PC from this year I got 830 MInts in a loop and 5900 MInts with the matrix. Only one core seems to be used (CPU load around 12%). So that is actually more than one add per clock cycle, roughly two!
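
The adds-per-cycle figures come from dividing the measured rate by the clock rate. A small sketch of that arithmetic, assuming (as the CPU-load numbers suggest) that all the work ran on a single core; the dictionary labels are just descriptive names for the four measurements reported above:

```python
# Reported throughput (millions of adds per second) and clock rate (GHz),
# copied from the measurements above.
results = {
    "2005 dual-core, loop":   (261, 3.0),
    "2005 dual-core, matrix": (1400, 3.0),
    "new 2.8 GHz, loop":      (830, 2.8),
    "new 2.8 GHz, matrix":    (5900, 2.8),
}


def adds_per_cycle(mega_adds_per_s, clock_ghz):
    # (adds / s) / (cycles / s) = adds per clock cycle on the one busy core
    return (mega_adds_per_s * 1e6) / (clock_ghz * 1e9)


for name, (mints, ghz) in results.items():
    print(f"{name}: {adds_per_cycle(mints, ghz):.2f} adds/cycle")
```

This prints 0.09 and 0.47 adds/cycle for the 2005 PC and 0.30 and 2.11 for the newer one.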

 

This indicates that either the compiler is smarter than me and does not actually perform all the integer adds, or the CPU is smart and does them very fast!
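
The first explanation is at least plausible: adding 3 ten thousand times to the same integer gives the same result as adding 30,000 once, so an optimizing compiler is allowed to collapse the loop. A sketch of that equivalence in Python, assuming I32 wraps around on overflow; the function names are hypothetical, and this says nothing about what LabVIEW's compiler actually emits:

```python
def wrap_i32(v):
    """Emulate signed 32-bit two's-complement wraparound (assumed I32 behaviour)."""
    return (v + 2**31) % 2**32 - 2**31


def add_in_loop(x, n, step=3):
    """What the benchmark asks for: n separate 32-bit adds."""
    for _ in range(n):
        x = wrap_i32(x + step)
    return x


def add_folded(x, n, step=3):
    """What an optimizer may legally do instead: one multiply and one add."""
    return wrap_i32(x + step * n)
```

Because wraparound is arithmetic modulo 2**32, both functions agree for every input, including cases that overflow.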

 

Don't see this as a complete benchmark of LabVIEW integer, float and matrix calculations, but as a teaser to make you do your own tests.

Play around and have fun!

Message 1 of 8

I suggest you read up on the changes in the microarchitecture of modern processors over the last few years.

Multiple adds per cycle have been around at the processor level for a while, I believe. I certainly know that modern CPUs can multiply TWO doubles or FOUR singles in a single (SIMD) operation.
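
The "several values per operation" idea can be illustrated without real SIMD hardware using SWAR (SIMD within a register): four 32-bit lanes are packed into one 128-bit integer, the same width as an SSE register, and added with one masked addition so that no carry leaks from one lane into the next. This is an emulation in Python chosen for illustration, not SSE itself, and it uses integer adds (as in the original test) rather than the floating-point multiplies mentioned above; all names are hypothetical:

```python
LANES = 4        # four 32-bit lanes in one 128-bit word
LANE_BITS = 32
LANE_MASK = (1 << LANE_BITS) - 1
# Mask with only the top bit of every lane set.
HIGH = sum(1 << (LANE_BITS * i + LANE_BITS - 1) for i in range(LANES))


def pack(values):
    """Pack LANES unsigned 32-bit values into one integer, lane 0 lowest."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (LANE_BITS * i)
    return word


def unpack(word):
    return [(word >> (LANE_BITS * i)) & LANE_MASK for i in range(LANES)]


def packed_add(a, b):
    """Lane-wise add modulo 2**32 in one wide addition.
    The low 31 bits of each lane are added with the top bits cleared, so no
    carry can cross a lane boundary; the top bit of each lane is then fixed
    up with XOR, which drops any carry out of the lane."""
    low = (a & ~HIGH) + (b & ~HIGH)
    return low ^ ((a ^ b) & HIGH)


if __name__ == "__main__":
    print(unpack(packed_add(pack([1, 2, 0xFFFFFFFF, 100]), pack([3, 3, 3, 3]))))  # [4, 5, 2, 103]
```

Note how the third lane wraps from 0xFFFFFFFF to 2 without disturbing the fourth lane.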

 

This is more a test of CPU architecture than a test of LabVIEW, although the fact that the CPU differences are visible hints at decent compilation behind the scenes.

Shane.

Message 2 of 8

A standard loop only runs in 1 thread, so did you activate loop parallelization?

G# - Award winning reference based OOP for LV, for free! - Qestit VIPM GitHub

Qestit Systems
Certified-LabVIEW-Developer
Message 3 of 8

There are a few benchmark utilities out there for LabVIEW.

Here is one: http://www.ni.com/white-paper/10341/en/

Machine Vision, Robotics, Embedded Systems, Surveillance

www.movimed.com - Custom Imaging Solutions
Message 4 of 8

Yamaeda, nice suggestion to try loop parallelism! Trying a hands-on test program is always fun.

I found some interesting things regarding loop parallelization. If I configure it by right-clicking the loop, I get the expected behaviour: about double the speed (actually a little more) on a dual-core CPU. But when I use the P terminal to set the number of parallel loop instances, I get a different behaviour: my timing code stops working and the result looks as if the CPU speed is very high. This is probably because I have not written a truly parallel loop. Also, activating parallelism and choosing 1 as the number of instances seems to give the same result. The 'i' in the outer loop gets very high for some reason.

If you'd like to comment on the code, I have attached a screenshot.

Message 5 of 8

@Yamaeda wrote:

A standard loop only runs in 1 thread,[...]


That is not true.

A standard loop follows the clumping algorithm during compilation. Clumps can be distributed on any number of threads.

So, a standard loop CAN be distributed between several threads.

There are, however, some settings and structures which will result in a single thread per loop:

- Setting the calling VI to "subroutine" priority (not recommended)

- Using a Timed Loop instead of a standard one

- AFAIK: Containing the loop in an InPlace Element Structure with "Data Value Reference" border nodes

 

There is at least one additional option, but it is so unlikely (as it messes up most of LV!) that I will not point it out here...
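
As a loose analogy for "clumps distributed on threads" (LabVIEW's real clumping happens at compile time and is not reproduced here), the sketch below runs a tiny dataflow graph: every node runs exactly once, as soon as its inputs exist, and nodes that do not depend on each other are handed to a thread pool together. The function `run_dataflow` and the graph are hypothetical illustrations; note that no node's code is duplicated:

```python
from concurrent.futures import ThreadPoolExecutor


def run_dataflow(nodes, workers=4):
    """nodes maps a name to (function, [names of its inputs]).
    A node runs once all of its inputs exist; nodes that do not depend on
    each other are submitted together and may run on different threads."""
    results = {}
    pending = dict(nodes)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            ready = [name for name, (_, deps) in pending.items()
                     if all(d in results for d in deps)]
            if not ready:
                raise ValueError("cycle or missing input in the graph")
            futures = {}
            for name in ready:
                func, deps = pending.pop(name)
                futures[name] = pool.submit(func, *(results[d] for d in deps))
            for name, fut in futures.items():
                results[name] = fut.result()
    return results


graph = {
    "a": (lambda: sum(range(1000)), []),    # independent
    "b": (lambda: max(range(1000)), []),    # independent of "a"
    "c": (lambda x, y: x + y, ["a", "b"]),  # waits for both
}

if __name__ == "__main__":
    print(run_dataflow(graph)["c"])  # 500499
```

Here "a" and "b" may execute concurrently, while "c" has to wait for both, much like independent parts of a block diagram versus a node fed by both of them.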

 

Norbert

Norbert
----------------------------------------------------------------------------------------------------
CEO: What exactly is stopping us from doing this?
Expert: Geometry
Marketing Manager: Just ignore it.
Message 6 of 8

@Norbert_B wrote:

[...]

So, a standard loop CAN be distributed between several threads.

[...]


Why would we have parallelization options for a loop if it's already multithreaded? I'd love it if loops parallelized automatically.

If you mean the content inside the loop, then it'll follow ordinary optimization and use several threads if possible. In this case the OP mentioned that one CPU core was at 100%. 🙂

/Y

Message 7 of 8

Clumping can distribute parts of the algorithm that are already available as distinct, independent pieces. It does not duplicate code.

Parallelizable For Loops duplicate code and chunk data to increase the parallelism of the loop, in order to improve CPU usage and decrease overall execution time. So this is only possible if the data can be separated into individual chunks.

Using shift registers makes the data depend on previous iterations, so chunking is not possible; hence the For Loop cannot be parallelized.
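
The chunk-and-duplicate idea, and why a shift-register-style dependency breaks it, can be sketched in Python. This is an analogy rather than LabVIEW's implementation: `parallel_add`, `add_chunk` and `running_total` are hypothetical names, `instances` plays the role of the number of parallel loop instances (assumed to be at least 1), and because of Python's GIL the threads will not actually speed up this CPU-bound body; the sketch only shows the structure.

```python
from concurrent.futures import ThreadPoolExecutor


def add_chunk(chunk, step):
    """The loop body; every parallel instance runs its own copy of it."""
    return [v + step for v in chunk]


def parallel_add(data, step=3, instances=4):
    """Split `data` into up to `instances` contiguous chunks, run the loop
    body on each chunk concurrently, and concatenate results in order."""
    size = max(1, -(-len(data) // instances))  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=instances) as pool:
        parts = list(pool.map(add_chunk, chunks, [step] * len(chunks)))
    return [v for part in parts for v in part]


def running_total(data):
    """A shift-register-style loop: iteration i needs the value produced by
    iteration i - 1, so the data cannot simply be split into chunks."""
    acc, out = 0, []
    for v in data:
        acc += v
        out.append(acc)
    return out
```

Running `running_total` separately on the two halves of `list(range(10))` gives `[0, 1, 3, 6, 10, 5, 11, 18, 26, 35]` instead of `[0, 1, 3, 6, 10, 15, 21, 28, 36, 45]`: the second chunk never sees the value carried over from the first, which is exactly the chunking problem described above.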

 

I hope this is understandable at all (it reads a little like Martian, I feel...)

Norbert

Message 8 of 8