LabVIEW


32 bit vs 64 bit for loops


altenbach wrote:
Since both FOR loops depend entirely on diagram constants, they will probably be folded at compile time and benchmarking might again be impossible.

Sure, you're right. It was just too simple an illustration.

 

Andrey.

 

Message 11 of 16
(1,806 Views)

So the question remains: how does LabVIEW handle the different datatypes with regard to performance?

 

The difference in architecture is bound to make a difference, the difference in memory allocation is bound to make a difference.

-Regards

eximo
_______________________________________________
UofL Bioengineering M.S.
Neuronetrix

"I had rather be right than be president" -Henry Clay
Message 12 of 16
(1,791 Views)

eximo wrote:

So the question remains: how does LabVIEW handle the different datatypes with regard to performance?

 


Sorry, I misunderstood your initial question. If you are interested in the performance difference between 32-bit and 64-bit handling, then the first answer is that in general you will pay a penalty with 64 bit. Of course it depends on the task. If you really need 64-bit integer arithmetic, then the native 64-bit type may have advantages over your own solution. But in general 64 bit will slow down your code (as you can see in the benchmarks above).

 

To be honest, for a deep understanding we should explore the code that LabVIEW generates.

 

Before continuing, I recommend that you read the following articles:

Inside LabVIEW - How the Compiler Works

Assembly Review 

 

 

Well, for example, I will prepare a pretty simple VI with a 32-bit increment:

 

bild 1.png

Now I will disassemble the generated code (I will not explain how I do that, but hopefully I am not breaking the license agreement, because it's my code):

 

bild 2.png

 

At ebp+430h is the pointer to my I32 variable; inc edi is the increment. Pretty easy.

 

Now I'll change the type to 64 bit:

 

bild 3.png

The following code was generated:

 

 bild 4.png

 

As you can see, LabVIEW needs more instructions to increment a 64-bit variable. The inc instruction was replaced with an add instruction, and LabVIEW handles the 64-bit variable in two steps: the additional 4 bytes are transferred by mov [esi+4], eax, which follows mov [esi], eax.
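To make the "two steps" concrete, here is a minimal C sketch of how a 64-bit addition can be built from 32-bit halves on a 32-bit CPU. It is an illustration only, not LabVIEW's actual generated code: the type u64_pair and the helper names are made up for this example, and the remark about add followed by add-with-carry describes what x86 compilers typically emit, not something read off the screenshots.

```c
#include <stdint.h>

/* A 64-bit value stored as two 32-bit halves, the way a 32-bit CPU
   has to handle it. Hypothetical type, for illustration only. */
typedef struct {
    uint32_t lo;  /* low 4 bytes, stored at [addr]    */
    uint32_t hi;  /* high 4 bytes, stored at [addr+4] */
} u64_pair;

/* Add two 64-bit values using only 32-bit arithmetic. */
static u64_pair add64_in_two_steps(u64_pair a, u64_pair b)
{
    u64_pair r;
    uint32_t carry;

    r.lo = a.lo + b.lo;          /* step 1: low halves (typically ADD)      */
    carry = (r.lo < a.lo);       /* unsigned wrap-around means a carry out  */
    r.hi = a.hi + b.hi + carry;  /* step 2: high halves plus carry (ADC)    */
    return r;
}

/* Helpers to convert between the pair and a native 64-bit integer. */
static u64_pair from_u64(uint64_t v)
{
    u64_pair p;
    p.lo = (uint32_t)v;
    p.hi = (uint32_t)(v >> 32);
    return p;
}

static uint64_t to_u64(u64_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}
```

Incrementing 0xFFFFFFFF this way wraps the low half to zero and carries one into the high half, which is why a single inc is no longer enough and two stores are needed to write the result back.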

 

What about performance? No magic:

 

bild 5.png

 

The 64-bit code is more than twice as slow as the 32-bit code.

 

One more point about the LabVIEW code generator. Usually code generated by a "traditional" C compiler is faster than code generated by LabVIEW. I use this fact to optimize some bottlenecks in my applications.

 

For example, I will add one more increment here:

 bild 6.png

What is expected? One more inc edi command? No, see below: 

 

bild 7.png

 

At address 000000BC our variable is in edi. The value is then moved out and moved back in, so there are three mov instructions between the two increments.

You will not pay much of a penalty with the code above (at least on my PC it executes in nearly the same time as a single increment), but I believe that more optimal assembly code is possible here.

 

Now back to your loops with I32 / I64 counters.

 

The while loop like this:

 

bild 8.png 

 

will be compiled to code like this (I hope the body of the loop between AC and F7 was extracted correctly):

 

bild 9.png

 

and 64 bit code like this:

 

bild 10.png

 

will be compiled to

 

bild 11.png

 

 

Just compare the number of instructions passed to the CPU and you will understand why the "32 bit" loop is faster than the "64 bit" one.
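For anyone who wants to repeat the comparison outside LabVIEW, a rough C harness could look like the sketch below. It measures what the C compiler generates, not what LabVIEW generates, so the ratio may differ from the benchmarks in this thread. The function names are invented for this example. The counters are declared volatile so that the compiler cannot fold the loop into a constant, which is the same trap altenbach pointed out for loops that depend only on diagram constants.

```c
#include <stdint.h>
#include <time.h>

/* Increment a 32-bit counter `iterations` times.
   Returns elapsed CPU time in seconds; the final count goes to *result. */
static double time_i32_increments(uint32_t iterations, int32_t *result)
{
    volatile int32_t counter = 0;  /* volatile: every load and store must happen */
    uint32_t i;
    clock_t start, stop;

    start = clock();
    for (i = 0; i < iterations; ++i) {
        counter = counter + 1;
    }
    stop = clock();

    *result = counter;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Same loop with a 64-bit counter. */
static double time_i64_increments(uint32_t iterations, int64_t *result)
{
    volatile int64_t counter = 0;
    uint32_t i;
    clock_t start, stop;

    start = clock();
    for (i = 0; i < iterations; ++i) {
        counter = counter + 1;
    }
    stop = clock();

    *result = counter;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Call both functions with the same iteration count (keep it below 2^31 so the 32-bit counter does not overflow) and compare the returned times. Building once as a 32-bit executable and once as a 64-bit executable shows how much of the difference comes from the target architecture.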

 

And finally, if you need to speed up your application, first determine the bottlenecks in your code, and then look for a faster solution.

General solutions for performance optimization:

1. Rewrite part of the LabVIEW code in C (using a good compiler such as the Intel C++ compiler), then call this code from a DLL

2. Look for the parts of the code which can be executed in parallel and take advantage of multiple cores

3. Change the algorithm (for example, if you are sorting, look for a different approach, etc.)
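As a sketch of the first option, a bottleneck moved into C might look like the function below. The function sum_i32_array and the EXPORT macro are hypothetical names chosen for this example; the idea is that the function is built into a DLL and called through a Call Library Function Node, with the array passed as an array data pointer and its length as a separate I32.

```c
#include <stddef.h>
#include <stdint.h>

/* Export the symbol when building a Windows DLL; no-op elsewhere. */
#ifdef _WIN32
#define EXPORT __declspec(dllexport)
#else
#define EXPORT
#endif

/* Hypothetical hot spot: sum an array of I32 values into an I64.
   `data` points to the first element, `length` is the element count. */
EXPORT int64_t sum_i32_array(const int32_t *data, int32_t length)
{
    int64_t total = 0;
    int32_t i;

    if (data == NULL || length <= 0) {
        return 0;  /* nothing to sum */
    }
    for (i = 0; i < length; ++i) {
        total += data[i];  /* widen to 64 bit so the sum cannot overflow */
    }
    return total;
}
```

Whether this pays off depends on how much work is done per call, so measure the VI before and after the change.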

 

Other questions? 

 

Andrey.

 

PS

The code above was generated under WinXP 32 bit/LabVIEW 32 bit. I am pretty sure that the code will be the same under Windows 7 64 bit/LabVIEW 64 bit, but this has not been tested yet. I will probably do that at the weekend.

 

Message 13 of 16
(1,765 Views)

I know this is a long-deceased thread, but I've been looking into the differences between 32-bit and 64-bit LabVIEW and was curious as to the final result of the disassembled code for 64-bit LabVIEW.

 

I was under the impression that part of the reason why the 64-bit calculations done in 32-bit LabVIEW were more than two times slower than the 32-bit calculations is because for every single 64-bit calculation, LabVIEW has to queue up two 32-bit calculations (it can't reference more than 32 bits at a time, so it has to split it up into two 32-bit calculations). Under this reasoning, the 32-bit and 64-bit calculations should take approximately the same amount of time in 64-bit LabVIEW.

 

When I tried this on 64-bit LabVIEW 2015 (64-bit Windows 7, Intel Xeon CPU E5-1620 v3 @3.50GHz, 8.00 GB RAM), this theory didn't hold up: the 64-bit increment loop still took just over twice as long as the 32-bit increment loop! (Please see the default values saved in the attached 64-bit LV2015 VI.) Does anyone have any input as to why this might be? Is it just that 64-bit LabVIEW only lets you access more memory, and that the calculations themselves don't utilize the extra bit width?

Message 14 of 16
(1,108 Views)

This particular application might be (most likely is) memory bandwidth limited. This means that the amount of memory that has to be accessed and copied is the most limiting factor for the speed of this algorithm. Since a 64-bit value is double the size of a 32-bit value, a doubling of execution time doesn't seem that strange! Possibly an In Place Element Structure inside the loop might make a difference here, though I would not bet on that: the LabVIEW DFIR optimizer is pretty good at optimizing simple shift register operations into the equivalent of an in-place operation.

 

It would be different if the entire code could be kept within the CPU registers, but while an empty loop could theoretically be optimized to such code, it is a very academic exercise and I don't see why the LabVIEW developers should spend lots of development time optimizing such a theoretical use case. If you intend to run an empty loop, you could just as well use a simple delay, which costs less CPU time and also burns less energy to do nothing.

Rolf Kalbermatter
My Blog
Message 15 of 16
(1,055 Views)

Thanks for humoring me, Rolf. As you say, this whole thing is just an academic exercise for me to try to better understand 64-bit computing with the LabVIEW platform, and what potential benefits the 64-bit version may have over the 32-bit version. For now, I'll just be satisfied with the answer that the behavior observed in this particular exercise is due to the bottleneck being in memory access, not calculation.

Message 16 of 16
(1,036 Views)