Questions about the High Throughput Math Functions

westgate · 02-09-2011

Hello,

I am trying to understand what advantages the High Throughput Math functions have, so I have a few questions.

All of my questions assume the code is inside a SCTL (single-cycle Timed Loop).

1. In the image you can see four Add functions: one with U32, which should use more resources than the one with the U16 data type, which in turn uses more than the U8. But does my FXP High Throughput Math function use fewer resources than the U8 version?

2. Which of these four Add functions will take the least time to execute?

3. If I add two 32-bit numbers, once with the normal Add and once with the High Throughput Add, which of the functions will use fewer resources, and which will be faster?

4. How would it be if I had a multiplication instead? If I understand multiplication correctly, it is done with a DSP48E. This logic block can multiply a 25-bit number by an 18-bit number. So the U32 Multiply would use 2 DSP48Es, and the other three functions would use one DSP48E each.

I guess the U32 version will have the slowest execution?

What about the other three: will their execution speed be equal, or will the versions with smaller data types be faster?

With kind regards

Westgate

westgate · 02-11-2011

Hello,

Maybe it is important that we use a PXIe-7962R FlexRIO card.

westgate

JLewis · 02-16-2011

I don't see a big rush to answer this, so I'll give it a shot:

1. The HT version uses fewer resources, but only because it is configured with the smallest data types. You should get exactly the same results with the same data types and an Add function. The only difference with the HT version is the ability to specify an output register, and the handshaking signals that account for that delay. If the add is implemented in a DSP48, the integrated register can result in better timing, but in practice it is usually equivalent to an Add function followed by a feedback node.

2. The actual delay through an add is proportional to the number of bits, where the critical path is the sequentially computed carry chain. So you could run the last one at the highest clock rate. The FPGA has dedicated fast carry logic, so the difference isn't too significant.

3. The first one will be VERY slightly smaller and faster, just because you're computing one extra output bit on the second one.

4. I would expect the speed to depend only on the number of DSP48s used, so the last 3 should be similar. You'd be likely to see different results in practice, though, due to routing differing numbers of bits to registers for the indicators. This assumes you're not taking advantage of any of the pipelining configuration options in the HT Multiply. Those options, and the associated handshaking signals, are really what differentiates the HT versions from the regular numeric functions. They allow you to achieve higher clock rates and throughput at the expense of latency (i.e., it will take more clock cycles to produce a valid result, but you can get more data through the function in a given amount of time).
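To illustrate the latency/throughput trade-off in point 4, here is a minimal Python sketch. It is not LabVIEW code and not how the HT Multiply is actually implemented. It assumes a hypothetical multiplier with a fixed number of register stages and models only clock cycles; the name `simulate_pipeline` and the use of `None` to mark "output not valid" are made up for this illustration.

```python
def simulate_pipeline(samples, stages):
    """Model a multiply followed by `stages` registers (stages >= 1).

    `samples` is a sequence of (x, y) pairs, one presented per clock cycle.
    Returns a list of (cycle, value, valid) tuples, one per clock cycle.
    """
    if stages < 1:
        raise ValueError("this sketch needs at least one register stage")

    # None marks a register that does not hold a valid result yet.
    pipeline = [None] * stages
    outputs = []

    # Feed every sample, then keep clocking with no input to drain the pipeline.
    stream = list(samples) + [None] * stages
    for cycle, item in enumerate(stream):
        # The last register drives the output during this cycle.
        out = pipeline[-1]
        outputs.append((cycle, out, out is not None))

        # On the clock edge, every register takes the value of its predecessor.
        product = None if item is None else item[0] * item[1]
        pipeline = [product] + pipeline[:-1]

    return outputs


if __name__ == "__main__":
    for cycle, value, valid in simulate_pipeline([(2, 3), (4, 5), (6, 7), (8, 9)], 3):
        print(cycle, value, "valid" if valid else "not valid")
```

With three stages, the first valid result only shows up on cycle 3 (the latency), but from then on a new result comes out every cycle (the throughput). The "valid" flag plays roughly the role of the handshaking signals mentioned above. The possible gain in clock rate from shorter register-to-register paths is not modelled here.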

Caveats: All your examples have constant inputs, so the LabVIEW compiler and/or Xilinx tools can and will optimize them to no-ops. Small multiplies, multiplies with one constant input, or those just larger than 25x18 may also use some non-DSP48 logic for all or part of the implementation. Note that the HT palettes provide a DSP48E function in case you want control over exactly how a multiply and/or add gets implemented. Placing and routing can result in unexpected behaviors, so estimating timing is much more difficult than simply adding up component delays.
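To make points 2 and 3 a bit more concrete, here is a small bit-level Python sketch of a ripple-carry add. It is only a software model under simplifying assumptions: real FPGA adders use the dedicated fast carry logic, and `ripple_carry_add` is a hypothetical helper written for this post. It shows that each sum bit has to wait for the carry from the bit below it, and that the full sum of two N-bit unsigned numbers needs N+1 bits.

```python
def ripple_carry_add(a, b, width):
    """Add two unsigned `width`-bit integers one bit at a time.

    Returns (sum_bits, carry_out), where sum_bits is the low `width` bits
    of the result and carry_out is the extra (width + 1)-th bit.
    """
    limit = 1 << width
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("inputs must fit in `width` unsigned bits")

    carry = 0
    sum_bits = 0
    # Bit i cannot be finished before the carry out of bit i - 1 is known,
    # so this loop stands in for the sequential carry chain.
    for i in range(width):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        sum_bit = bit_a ^ bit_b ^ carry
        carry = (bit_a & bit_b) | (carry & (bit_a ^ bit_b))
        sum_bits |= sum_bit << i

    return sum_bits, carry


if __name__ == "__main__":
    low, carry_out = ripple_carry_add(0xFFFFFFFF, 1, 32)
    full = (carry_out << 32) | low
    print(low, carry_out, full.bit_length())
```

An 8-bit add walks through 8 carry steps and a 32-bit add through 32, which is the "proportional to the number of bits" part. Keeping the carry out as an output bit is what makes a 32-bit add produce a 33-bit result.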

westgate · 03-14-2011

@JLewis: thank you for your reply, it helped us a lot to understand the HT functions!
