Performance Double versus Single Precision

Michael.Proctor · ‎04-27-2017

Maybe a very naive question:

If LabVIEW Real Time is 32 bit but running on a 64 bit processor such as the Atom in a NI-9039, what is the performance of SGL calculation versus DBL?

Initially we coded everything in SGL for maximum performance, but several native NI VIs, such as those for matrix manipulation, are DBL only, so we have conversions from SGL to DBL and back.

So now I am thinking that if the CPU is 64 bit, perhaps there is not such a big performance loss in running all the calculation-intensive algorithms in DBL precision.

It would be great to have some clarification on 32 bit OS versus 64 bit CPU when it comes to using SGL or DBL.

thanks

Michael

tyk007 · ‎04-27-2017

EDIT: Hopefully this clarifies your question better:

I don't have any reference handy, but my assumption is that the 32-bit LV version would have been compiled to use 32-bit CPU registers for all arithmetic operations. On a 32-bit CPU, data would need to be moved in and out of these registers more than once to perform the calculations. If you are running 32-bit LV on a 64-bit CPU, then the words are likely expanded to fit the 64-bit width; thus nothing is gained by using SGL versus DBL.

Moving to a 64-bit processor and running a 64-bit OS has other implications: the virtual memory available per process increases, but processes typically consume more memory anyway (e.g. now we have 64-bit pointers), which can sometimes reduce performance in other areas.

The bigger issue may be loss of precision over multiple data mutations (calculations) with the reduced bit resolution of 32-bit, regardless of the CPU or OS running the application.
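To illustrate the precision point, here is a minimal sketch in Python (standing in for LabVIEW, which is graphical). It adds 0.1 repeatedly, once with every intermediate result rounded to 32-bit (SGL) and once in 64-bit (DBL). The helper names `to_sgl` and `accumulate` are made up for this example, and the workload is an assumption, not the algorithm from the original question.

```python
import struct


def to_sgl(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step: float, count: int, single: bool) -> float:
    """Add `step` to a running total `count` times.

    With single=True, the step and every intermediate total are rounded
    to 32-bit, which emulates doing the whole sum in SGL.
    """
    if single:
        step = to_sgl(step)
    total = 0.0
    for _ in range(count):
        total += step
        if single:
            total = to_sgl(total)
    return total


count = 100_000
expected = 10_000.0  # 0.1 added 100,000 times, in exact arithmetic

sgl_error = abs(accumulate(0.1, count, single=True) - expected)
dbl_error = abs(accumulate(0.1, count, single=False) - expected)

print(f"SGL accumulated error: {sgl_error:.6g}")
print(f"DBL accumulated error: {dbl_error:.6g}")
```

The gap between the two errors grows with the number of chained operations, which is why long-running accumulations are usually the first place SGL causes trouble.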

Premature optimization is the root of all evil, as the saying goes, and application bottlenecks are often in places you didn't think of. My suggestion would be not to concern yourself with this until you have an issue - then use the profiling tools to see whether this really is a problem.

nanocyte · ‎04-27-2017

FYI, the 32 bit vs 64 bit is referring to memory addressing. I don't think it's directly related to the size of the math registers.
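A small sketch of that distinction, using Python's standard `struct` module on whatever machine runs it (not LabVIEW RT itself): the pointer width follows the bitness of the running process, while the storage widths of single and double precision floats stay the same either way. The function name `word_sizes` is hypothetical.

```python
import struct


def word_sizes() -> dict:
    """Report pointer width and float widths, in bits, for this process."""
    return {
        # Native pointer size: 32 in a 32-bit process, 64 in a 64-bit one.
        "pointer_bits": struct.calcsize("P") * 8,
        # C float and double: 32 and 64 bits on mainstream platforms,
        # independent of the pointer width.
        "sgl_bits": struct.calcsize("f") * 8,
        "dbl_bits": struct.calcsize("d") * 8,
    }


sizes = word_sizes()
for name, bits in sizes.items():
    print(f"{name}: {bits}")
```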

You're likely right that there's some performance benefit to using SGL. But, yeah, if you have to convert it back, there's a small penalty there. So I'd keep doing what you're doing unless there's a compelling performance problem, and then write the code both ways and compare which is faster and by how much. Here are some general benchmarks I found: http://nicolas.limare.net/pro/notes/2014/12/12_arit_speed/

ks30 · ‎05-09-2017

Hi Michael,

From a high-level perspective if you aren’t encountering any performance issues with your current configuration then I would not recommend making any changes. To my knowledge, we currently do not have specific benchmarks comparing single and double precision performance on our ARM targets.

In regards to benchmarking, although general benchmarks could serve as a guide, benchmarking execution using your specific algorithm would probably provide you with the most value. Would it be feasible for you to benchmark your algorithm(s) using SGL and DBL precision for a comparison?
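As a rough sketch of such a comparison, the Python harness below times the same routine on SGL-typed and DBL-typed data and keeps the best of several runs. Everything here is an assumption for illustration: `best_time` and `dot` are hypothetical names, the dot product is a placeholder workload, and Python always computes in double precision, so only the storage type differs. The numbers it prints say nothing about a cRIO; on the target, the equivalent would be timing the real VI in both precisions.

```python
import time
from array import array


def best_time(func, repeats: int = 5) -> float:
    """Run `func` several times and return the shortest wall-clock time."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def dot(a, b) -> float:
    """Placeholder workload: dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


n = 50_000
values = [i * 0.001 for i in range(n)]
sgl_data = array("f", values)  # stored as 32-bit floats
dbl_data = array("d", values)  # stored as 64-bit floats

sgl_time = best_time(lambda: dot(sgl_data, sgl_data))
dbl_time = best_time(lambda: dot(dbl_data, dbl_data))

print(f"SGL storage: {sgl_time * 1e3:.3f} ms")
print(f"DBL storage: {dbl_time * 1e3:.3f} ms")
print(f"DBL/SGL ratio: {dbl_time / sgl_time:.2f}")
```

Taking the minimum over several runs reduces the influence of scheduling noise, which matters when the difference being measured is small.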

As a general guideline, we recommend a maximum 80% CPU usage when a Real-time application is running on the target. Are you consistently seeing CPU usage in excess of 80%?

Regards,

Kyle S.

Michael.Proctor · ‎05-09-2017

Hi all,

thank you all for your replies.

The answer to my question is that data register size and memory addressing width are two separate things.

So I have marked that answer as the solution.

BTW, the Linux used on the cRIO 9030 and cRIO 9039 reports as 64 bit if one runs the Linux command:

uname -a

The processors are 64 bit Intel Atoms.

I am guessing that LabVIEW RT is still 32 bit, and that would determine the address space available.

Our application on the 2-core cRIO 9030 runs at 65 - 75% CPU, so it is good to know we are just below the recommended maximum of 80%. The cRIO 9039 runs at about 20%, so its usage is low. However, the main SCTL, the core of the application, executes only 33% faster according to the last iteration time metric, so we do not get the full advantage of the extra two cores; given the way it is programmed, this is expected.

We have some transformations that can benefit from the precision of 64 bit, and it also seems that all the matrix transformation primitives supplied by NI are 64 bit, so I converted most of the VIs to DBL to avoid wasting too much time converting between SGL and DBL.

Most of the application speed-up I observed came from setting the calculation-intensive VIs to 'subroutine' priority.

Thanks again for all your input.

Michael
