11-18-2014 05:17 AM
Again it's time for an exotic FPGA semi-noob question from me.
This has been bugging me for a long time:
When implementing a delay stage on a Virtex-5 target, we have a few options available.
Which of these options is recommended for which purpose? We're really filling our chip and need to start considering such aspects of number storage.
Sorry for the over-reaching vague questions again.
On the other hand, being on a steep learning curve is actually almost thrilling. Every bit of information helps me learn so thanks for that in advance.
Shane
11-18-2014 05:54 AM - edited 11-18-2014 05:56 AM
Here's an image to illustrate the comparisons I'm interested in.
For very large delays, the BRAM solution is obviously very good, but to judge the best solution for a given data size and delay it would be nice to have some numbers or rules of thumb for the other methods.
11-19-2014 09:16 AM
Hi Shane,
Great questions!
This document from Xilinx contains the keys to the kingdom, as far as what hardware capabilities are available: http://www.xilinx.com/support/documentation/user_guides/ug474_7Series_CLB.pdf
Jim
11-19-2014 09:33 AM
@JLewis wrote:
- The number of inputs is only indirectly related to the supported delay. The V-5 and above CLBs (Configurable Logic Blocks) can be configured as dedicated shift registers with delays up to 32 in a single LUT per bit. The main restriction is that these shift registers are not resettable, so you only get this implementation when configured without an initialization value. Delays above 32 can be efficiently implemented in multiple LUTs (i.e., 1 LUT per 32 delay). These shift registers are known as SRL16 or SRL32, depending on the target family.
So does this mean that on a LUT basis, a shift register (with the reset conditions met) with a delay between 1 and 32 costs the same amount of resources? And a delay of 33-64 costs twice that of a single delay? Is this correct? I think I need some benchmarking code.....
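To put rough numbers on that reading, here is a minimal sketch of the LUT count implied by the rule quoted above (one LUT per bit for every 32 stages of delay, when no initialization value is used). The function name `srl_luts` and its parameters are hypothetical, chosen only for illustration. It is an estimate under that one assumption, not a substitute for a real compile report, which may add flip-flops or be limited by how many LUTs can act as shift registers.

```python
import math

# Assumption taken from the quoted reply: one LUT holds up to 32 delay
# stages for a single bit (SRL32) on Virtex-5 and later.
SRL_DEPTH = 32


def srl_luts(delay: int, width_bits: int, srl_depth: int = SRL_DEPTH) -> int:
    """Estimate LUTs for an uninitialized delay of `delay` cycles on a
    `width_bits`-wide signal, using only the 1-LUT-per-32-stages rule."""
    if delay < 1 or width_bits < 1:
        raise ValueError("delay and width_bits must be positive")
    luts_per_bit = math.ceil(delay / srl_depth)
    return luts_per_bit * width_bits


if __name__ == "__main__":
    # Compare a few delays for a 16-bit signal.
    for d in (1, 32, 33, 64, 65):
        print(d, srl_luts(d, 16))
```

Under this rule, a 16-bit signal costs the same estimated number of LUTs for any delay from 1 to 32, and the estimate doubles for 33 to 64.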
@JLewis wrote:
Discrete Delay maps to the same shift register implementation as feedback nodes if the reset condition is met. Otherwise, the main difference is that the Discrete Delay exposes the dynamic delay feature available in the hardware shift registers and, as you noted, can't be used in a feedback cycle. If neither of those considerations is a factor in your design, it's just a matter of preference. This document from Xilinx contains the keys to the kingdom, as far as what hardware capabilities are available: http://www.xilinx.com/support/documentation/user_guides/ug474_7Series_CLB.pdf
That's kind of what I thought.
@JLewis wrote:
Hi Shane,
Great questions!
Well, thank you. Thanks for the answers.
11-19-2014 10:03 AM
@Intaris wrote:
So does this mean that on a LUT basis, a shift register (with the reset conditions met) with a delay between 1 and 32 costs the same amount of resources? And a delay of 33-64 costs twice that of a single delay? Is this correct? I think I need some benchmarking code.....
That's correct. The typical pattern would be to use uninitialized feedback nodes in all data paths, saving the initialized nodes for control logic that handles the data valid computations to deal with priming and flushing of those shift registers on startup or reset.
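As a rough behavioral illustration of that pattern, here is a small Python model (not LabVIEW FPGA code) in which the data path is a delay line that is never cleared, and a separate resettable counter produces the data valid flag. The class name `DelayLine` and its methods are hypothetical, and `None` merely stands in for unknown startup contents.

```python
from collections import deque


class DelayLine:
    """Software model: uninitialized data delay plus resettable valid logic."""

    def __init__(self, delay: int):
        if delay < 1:
            raise ValueError("delay must be positive")
        self.delay = delay
        # Data path: contents are unknown at startup and are never reset.
        self.data = deque([None] * delay, maxlen=delay)
        # Control path: counts how many samples have entered since reset.
        self.primed = 0

    def reset(self) -> None:
        # Only the control logic is reset; stale data stays in the line.
        self.primed = 0

    def step(self, sample):
        """Advance one cycle. Returns (delayed_sample, valid)."""
        out = self.data[0]           # oldest entry leaves the line
        self.data.append(sample)     # maxlen drops the oldest automatically
        valid = self.primed >= self.delay
        if self.primed < self.delay:
            self.primed += 1         # still priming after startup or reset
        return out, valid


if __name__ == "__main__":
    line = DelayLine(3)
    for x in (10, 11, 12, 13, 14):
        print(line.step(x))
```

In this model the first `delay` outputs after startup or reset are flagged invalid, so downstream logic can ignore whatever was left in the line while it refills.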
01-16-2019 03:49 PM
Sorry for the stupid question, but I'm trying to figure out what the 2 little elements in the block diagram in the post above are. I put red arrows pointing at the elements I'm interested in:
What are those little guys? I can't seem to find them in the palette.
Thanks!
01-16-2019 04:53 PM
Feedback nodes. They're on the same palette as while loops and such.
01-16-2019 04:59 PM
Ok, thanks. I've never tried reversing them and adding a delay. That's why they looked different. Duhhh... I knew it was simple and right in front of me ... 🙂