What is faster? Dividing x modulo 2^n oder using bitshift x by n?

crossrulz · ‎01-20-2018

I did a similar benchmark to GerdW. Turned off debugging and made sure the indicators were not updated until after everything was done.

For the Quotient, the logical shift was about 7x faster. Granted, the shift will take longer the more bits you have to shift.

For the Remainder, the AND was about 3x faster.

Surprisingly, doing the Quotient and Remainder was only slightly slower than just getting the Remainder. I expected more of a jump.

Attached VI saved in 2016.

There are only two ways to tell somebody thanks: Kudos and Marked Solutions
Unofficial Forum Rules and Guidelines
"Not that we are sufficient in ourselves to claim anything as coming from us, but our sufficiency is from God" - 2 Corinthians 3:5

JÞB · ‎01-20-2018

Sometimes you have to be very careful with benchmarks.

Now for the brain damage. Run vs Run Continuous shows a difference for the execution time of each Q&R operation. that tells me that as long as the vi does not go idle the buffer allocations remain allocated. Prove it with a request de-allocation node in the final FSS frame. Essentially all three Q&R operations become equivalent in time unless you specifically ask for deallocation or perform deallocation by returning to Idle state.

Be careful on those little gotchas.

you can even have a bit of fun inlining the code and calling it from a level up

watch that optimizer destroy everything but the hi-res timer calls 🙂

EDIT: Sorry Natasftw! but we can find the answer

"Should be" isn't "Is" -Jay

natasftw · ‎01-20-2018

1) I doubt it. Even if it did, I doubt it'll try to deconstruct the internals to find the parts that only matter to the one (which honestly, probably aren't any different than finding the other output. The two tie together rather well).

Probably makes a difference? Not likely. The margins you're seeing are OS jitter. That'll be your difference.

rolfk · ‎01-20-2018

@crossrulz wrote:

For the Quotient, the logical shift was about 7x faster. Granted, the shift will take longer the more bits you have to shift.

Actually I'm pretty sure that the execution time of a logical shift assembly instruction is independent of the number of bits it needs to shift. According to my 80386 programmers reference manual the SHL, SHR, SAR, SAL, ROL and ROR take one clock cycle when the operand is in a register and 3 when it is in memory.

Things get a little more complicated with the RCR and RCL operations when the shift is not 1.

I doubt that the LabVIEW to DFIR compiler will optimize Quotient calculations with a binary divisor into a logical shift right operation. What the LabVIEW to DFIR compiler however likely will do is to only generate the necessary code that is needed for the wired Q&R outputs. The underlying LLVM compiler on the other hand may then decide that a Quotient calculation with a constant binary divisor (obviously this can not be done for a parametric divisor that can change at runtime into any value including non-binary values) may then indeed decide to generate an SHR instruction. If it does or not may however depend on the used LLVM version, so not something that is easily definable with any authority.

But all that said, worrying about such micro optimizations if you aren't writing the actual LLVM compiler or another compiler, is definitely burning up time uselessly. If you finally have written your code and then determine that it is to slow for what it needs to do and determined with 99% of certainity that the culprit is exactly at this operation, then you can go and try to optimize the code further. But typically a single string resize done to often burns up lots and lots more CPU performance than several thousend times calculating the Quotient with the default division operation rather than an optimized shift operation, which only can be done if the divisor is a binary value anyways.

The gain you can get from this types of optimizations only really counts up if you perform this operations several million times inside a loop over and over again. For a few thousend executions the difference is almost impossible to measure and consequently totally irrelevant for the actual execution time of a program.

Rolf Kalbermatter
My Blog

crossrulz · ‎01-20-2018

@rolfk wrote:

@crossrulz wrote:

For the Quotient, the logical shift was about 7x faster. Granted, the shift will take longer the more bits you have to shift.

Actually I'm pretty sure that the execution time of a logical shift assembly instruction is independent of the number of bits it needs to shift.

You are likely correct. Thinking back, I saw the jump from 0 to 8 in the shift values. The 0 case was likely optimized to a simple wire.

There are only two ways to tell somebody thanks: Kudos and Marked Solutions
Unofficial Forum Rules and Guidelines
"Not that we are sufficient in ourselves to claim anything as coming from us, but our sufficiency is from God" - 2 Corinthians 3:5

LabVIEW

What is faster? Dividing x modulo 2^n oder using bitshift x by n?

Re: What is faster? Dividing x modulo 2^n oder using bitshift x by n?

Re: What is faster? Dividing x modulo 2^n oder using bitshift x by n?

Re: What is faster? Dividing x modulo 2^n oder using bitshift x by n?

Re: What is faster? Dividing x modulo 2^n oder using bitshift x by n?

Re: What is faster? Dividing x modulo 2^n oder using bitshift x by n?