Feedback nodes / delays and Resource Usage on FPGA

Intaris · 11-18-2014

Once again it's time for an exotic semi-noob FPGA question from me.

This has been bugging me for a long time:

When implementing a delay stage on a Virtex-5 target, we have a few options available.

1. Feedback nodes: these use LUTs, and the Virtex-5 has 6-input LUTs. Does this mean that a Feedback node with a delay of 1 requires the same resources as one with a delay of 6, and that a Feedback node with a delay of 7 requires twice the LUTs of one with a delay of 6?
   Example: a single-unit-delay Feedback node for a U16 requires 16 LUTs. What is the LUT usage for delays of 6, 7, and 9?
2. BRAM: uses few LUTs and registers. I reckon I understand this one.
3. Discrete Delay: can't be used in a feedback loop, but is it more efficient than Feedback nodes? The help says that Feedback nodes with reset support disabled CAN be implemented as SRLs, allowing the compiler to choose the best option, whereas the Discrete Delay primitive forces an SRL. Is the SRL implemented using LUTs?

Which of these options is recommended for which purpose? We're really filling our chip and need to start considering these aspects of number storage.

Sorry again for the over-reaching, vague questions.

On the other hand, being on a steep learning curve is actually almost thrilling. Every bit of information helps me learn, so thanks for that in advance.

Shane

Intaris · 11-18-2014

Here's an image to illustrate the comparisons I'm interested in.

For very large delays the BRAM solution is obviously very good, but to judge the best solution for a given data size and delay, it would be nice to have some numbers or rules of thumb for the other methods.

JLewis · 11-19-2014

Hi Shane,

Great questions!

1. The number of inputs is only indirectly related to the supported delay. The CLBs (Configurable Logic Blocks) in Virtex-5 and later families can be configured as dedicated shift registers with delays of up to 32 in a single LUT per bit. The main restriction is that these shift registers are not resettable, so you only get this implementation when the node is configured without an initialization value. Delays above 32 can be efficiently implemented in multiple LUTs (i.e., 1 LUT per bit for every 32 stages of delay). These shift registers are known as SRL16 or SRL32, depending on the target family.
2. Correct. Sometimes clock rate may be a concern here.
3. Discrete Delay maps to the same shift register implementation as Feedback nodes if the reset condition is met. Otherwise, the main difference is that Discrete Delay exposes the dynamic delay feature available in the hardware shift registers and, as you noted, can't be used in a feedback cycle. If neither of those considerations is a factor in your design, it's just a matter of preference.
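To make the rule of thumb above concrete, here is a small Python sketch (a software model, not LabVIEW FPGA code). `srl_luts` estimates LUT usage assuming, as described above, one LUT per bit for every 32 stages of delay; `DynamicSRL` is a hypothetical behavioral model of a single addressable shift register whose tap can change at run time, similar to the dynamic delay that Discrete Delay exposes. Both names are illustrative, and real compile results may differ, for example if the tools choose flip-flops for very short delays.

```python
import math
from collections import deque


def srl_luts(width_bits: int, delay: int, srl_depth: int = 32) -> int:
    """Rule-of-thumb LUT count for a fixed delay mapped to SRL shift registers.

    Assumes one LUT per bit for every `srl_depth` stages of delay
    (32 for SRL32 on Virtex-5 and later, 16 for SRL16 on older families).
    Flip-flops, routing, and any logic the compiler adds are not modeled.
    """
    if width_bits < 1 or delay < 1:
        raise ValueError("width_bits and delay must be >= 1")
    return width_bits * math.ceil(delay / srl_depth)


class DynamicSRL:
    """Behavioral model of one addressable shift register (hypothetical helper).

    One sample shifts in per clock. `tap` picks how many clocks ago the
    returned sample entered (1..depth), like a Discrete Delay whose delay
    is changed at run time. There is no reset: contents start at `fill`.
    """

    def __init__(self, depth: int = 32, fill: int = 0):
        self.depth = depth
        self.stages = deque([fill] * depth, maxlen=depth)

    def clock(self, sample: int, tap: int) -> int:
        if not 1 <= tap <= self.depth:
            raise ValueError("tap must be between 1 and depth")
        out = self.stages[tap - 1]  # stages[0] holds the previous clock's sample
        self.stages.appendleft(sample)  # maxlen drops the oldest stage
        return out


if __name__ == "__main__":
    # The U16 example from the question, for a range of delays.
    for d in (1, 6, 7, 9, 32, 33, 64, 65):
        print(d, srl_luts(16, d))
```

Under this rule of thumb a U16 delay costs 16 LUTs for any delay from 1 to 32 (so 6, 7, and 9 all cost the same) and 32 LUTs from 33 to 64; the compile report for the actual design remains the final word.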

This document from Xilinx contains the keys to the kingdom, as far as what hardware capabilities are available: http://www.xilinx.com/support/documentation/user_guides/ug474_7Series_CLB.pdf

Jim

Intaris · 11-19-2014

@JLewis wrote:

The number of inputs is only indirectly related to the supported delay. The V-5 and above CLBs (Configurable Logic Blocks) can be configured as dedicated shift registers with delays up to 32 in a single LUT per bit. The main restriction is that these shift registers are not resettable, so you only get this implementation when configured without an initialization value. Delays above 32 can be efficiently implemented in multiple LUTs (i.e., 1 LUT per 32 stages of delay). These shift registers are known as SRL16 or SRL32, depending on the target family.

So does this mean that, on a LUT basis, a shift register (with the reset conditions met) costs the same amount of resources for any delay between 1 and 32, and that a delay of 33-64 costs twice as much as a single delay? Is this correct? I think I need some benchmarking code...

@JLewis wrote:

Discrete Delay maps to the same shift register implementation as feedback nodes if the reset condition is met. Otherwise, the main difference is that the Discrete Delay exposes the dynamic delay feature available in the hardware shift registers and, as you noted, can't be used in a feedback cycle. If neither of those considerations is a factor in your design, it's just a matter of preference.
This document from Xilinx contains the keys to the kingdom, as far as what hardware capabilities are available: http://www.xilinx.com/support/documentation/user_guides/ug474_7Series_CLB.pdf

That's kind of what I thought.

@JLewis wrote:

Hi Shane,

Great questions!

Well, thank you! And thanks for the answers.

JLewis · 11-19-2014

@Intaris wrote:

So does this mean that, on a LUT basis, a shift register (with the reset conditions met) costs the same amount of resources for any delay between 1 and 32, and that a delay of 33-64 costs twice as much as a single delay? Is this correct? I think I need some benchmarking code...

That's correct. The typical pattern would be to use uninitialized feedback nodes in all data paths, saving the initialized nodes for control logic that handles the data valid computations to deal with priming and flushing of those shift registers on startup or reset.
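A minimal Python sketch of that pattern, as a behavioral model only (not LabVIEW FPGA code): the wide data path is a delay line with no reset, the 1-bit valid path is an initialized, resettable delay line of the same length, and downstream logic only trusts samples whose valid bit is set. `DelayLine`, `ValidatedDelay`, and the `GARBAGE` placeholder for unknown power-up contents are hypothetical names chosen for illustration.

```python
from collections import deque

# Stand-in for the unknown contents of an uninitialized (SRL-mapped) delay
# at power-up. This value is an assumption made for the simulation only.
GARBAGE = -1


class DelayLine:
    """Fixed delay: each clock returns the sample that entered `delay` clocks ago."""

    def __init__(self, delay: int, fill):
        self.stages = deque([fill] * delay, maxlen=delay)

    def clock(self, value):
        out = self.stages[-1]  # oldest sample leaves the line
        self.stages.appendleft(value)  # newest enters; maxlen drops the oldest
        return out

    def clear(self, value):
        """Overwrite every stage; models the reset of an initialized node."""
        for i in range(len(self.stages)):
            self.stages[i] = value


class ValidatedDelay:
    """Hypothetical helper: wide data path without reset, 1-bit valid path with reset."""

    def __init__(self, delay: int):
        self.data = DelayLine(delay, fill=GARBAGE)  # uninitialized: SRL-friendly
        self.valid = DelayLine(delay, fill=False)  # initialized: resettable control

    def reset(self):
        # Only the valid bits are cleared. Stale samples stay in the data path
        # but are never flagged as valid, so downstream logic ignores them.
        self.valid.clear(False)

    def clock(self, value, value_valid: bool = True):
        return self.data.clock(value), self.valid.clock(value_valid)


if __name__ == "__main__":
    p = ValidatedDelay(3)
    primed = [p.clock(x) for x in (10, 20, 30, 40, 50)]
    print([d for d, v in primed if v])  # priming: the first 3 outputs are masked
    p.reset()
    after = [p.clock(x) for x in (60, 70, 80, 90)]
    print([d for d, v in after if v])  # stale 30, 40, 50 are masked after reset
```

The design choice here is that only one bit per delay stage needs reset logic, so the wide data path stays eligible for the cheaper SRL mapping.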

WalterRebsch · 01-16-2019

Sorry for the stupid question, but I'm trying to figure out what the two little elements in the block diagram in the post above are. I put red arrows pointing at the elements I'm interested in:

What are those little guys? I can't seem to find them in the palette.

Thanks!

Intaris · 01-16-2019

Feedback nodes. They're on the same palette as While Loops and such.

WalterRebsch · 01-16-2019

OK, thanks. I've never tried reversing them and adding a delay; that's why they looked different. Duhhh... I knew it was simple and right in front of me. 🙂
