FPGA Block ram data width vs resource usage (Virtex-5)

Time for another FPGA question.

 

According to the Virtex-5 User Guide from Xilinx, the Block RAM implementation has a maximum data width of 72 bits.  Data widths wider than this require multiple Block RAM units to be used in parallel.

 

If I have a data set which needs to store 424 bits of data per cycle, does this mean I need a minimum of 6 BRAM units (424 / 72 ≈ 5.9, rounded up)?  I'm referring to a target-only BRAM memory, so the data width is freely definable beyond the 64-bit limit (using a typedef).

 

I have a single read and a single write for the Block RAM, so the basic data width is 72 bits, right?
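(For concreteness, the rounding works out like this. This is just a minimal Python check, assuming each Virtex-5 RAMB36 is used in its widest simple dual-port aspect ratio of 512 x 72; the constant names are only illustrative.)

```python
import math

# Minimum number of Virtex-5 36 Kb block RAMs needed just to cover the word
# width, assuming each RAMB36 runs in its widest simple dual-port aspect
# ratio (512 x 72).
DATA_WIDTH_BITS = 424      # bits stored per cycle
MAX_PRIMITIVE_WIDTH = 72   # widest RAMB36 port width (64 data + 8 parity)

min_brams_for_width = math.ceil(DATA_WIDTH_BITS / MAX_PRIMITIVE_WIDTH)
print(min_brams_for_width)  # 424 / 72 = 5.89 -> 6
```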

 

Shane.

Message 1 of 6

Hi Shane,

 

I'm not sure I understand exactly what you mean, but the BRAM interface I think of is creating a Memory Item in the project, configuring it, for example, with a custom data type (a Boolean array of 424 bits) and n elements of that type, and then reading/writing this item in the code.

 

Does this approach work for you too?

If not, please tell me exactly what you need.

 

 

With kind regards,

 

Marco Brauner NIG AES

Message 2 of 6

Hey Shane,

 

I believe the simple answer to your question is yes. In the background, it looks like we just take the width of the data type you're using and plug that into the LogiCORE IP Block Memory Generator with the Minimum Area algorithm selected.

 

From the documentation:

Minimum Area Algorithm: The memory is generated using the minimum number of block RAM primitives. Both data and parity bits are utilized.

 

Minimum Area Algorithm
The minimum area algorithm provides a highly optimized solution, resulting in a minimum number of block RAM primitives used, while reducing output multiplexing. Figure 3-6 shows two examples of memories built using the minimum area algorithm.

 

[Attached image: min area alg.PNG — Figure 3-6, example memories built with the minimum area algorithm]

 

Note: In Spartan-6 devices, two 9K block RAMs are used for one 1Kx18.
In the first example, a 3kx16 memory is implemented using three block RAMs. While it may have been possible to concatenate three 1kx18 block RAMs in depth, this would require more output multiplexing. The minimum area algorithm maximizes performance in this way while maintaining minimum block RAM usage.
The second example, a 5kx17 memory, further demonstrates how the algorithm can pack block RAMs efficiently to use the fewest resources while maximizing performance by reducing output multiplexing.

 

The Block Memory Generator generates memories with widths from 1 to 4096 bits, and with depths of two or more words. The memory is built by concatenating block RAM primitives, and total memory size is limited only by the number of block RAMs on the target device.
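Putting rough numbers on that for the 424-bit case: below is a small Python sketch of the lower bound those aspect ratios imply. It is not the generator's actual packing algorithm (which also weighs output muxing), just an estimate; the function name and constants are mine, with the RAMB36 figures taken from the Virtex-5 documentation.

```python
import math

# Rough lower bound on Virtex-5 RAMB36 usage for a (depth x width) memory.
# NOT the actual LogiCORE packing algorithm -- just the two obvious limits:
# total bit capacity, and maximum port width (72 bits, simple dual-port).
RAMB36_BITS = 36 * 1024    # 36 Kb per block RAM, parity bits included
RAMB36_MAX_WIDTH = 72      # widest aspect ratio: 512 x 72

def min_ramb36_lower_bound(depth_words: int, width_bits: int) -> int:
    by_capacity = math.ceil(depth_words * width_bits / RAMB36_BITS)
    by_width = math.ceil(width_bits / RAMB36_MAX_WIDTH)
    return max(by_capacity, by_width)

# 424-bit words: for a shallow memory the width term dominates (6 BRAMs);
# for deeper memories the capacity term takes over.
print(min_ramb36_lower_bound(512, 424))   # -> 6
print(min_ramb36_lower_bound(4096, 424))  # -> 48
```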

 

Hope that answers your question...

Cheers!

TJ G
Message 3 of 6

Yup, that answers my question indeed.

 

This explains why I've had higher BRAM usage than expected in the past.  I hadn't paid attention to the data width and thus used more units of BRAM than I had originally anticipated.

 

Another thing I've learned.  I'm getting there.

 

So a side effect of this is that moving to an arbitrarily large data width MAY cause some timing issues at high clock speeds?  Or is this overhead in the area of "theoretically, but not really"?

 

Thanks.

 

Shane.

Message 4 of 6

Yeah, I think this falls into the "theoretically" category. It looks like there will always be at least some output MUXing for larger types, and the wider your input, the more MUXing is needed. That means increased routing and the potential for increased congestion around the resources... but I don't know practically how it will affect your clock rates.

Cheers!

TJ G
Message 5 of 6

A pretty good rule of thumb when using fixed resources like I/O, memory, and DSP blocks (multipliers) is to put a few pipeline registers after the operation to give the compiler room to move logic around the chip. This is especially important when the size of the resource, like the memory in your example, might require resources across separate columns of the FPGA, because routing will start becoming an issue very quickly.

Message 6 of 6