
FPGA Block Ram FIFO - Resource usage

I'm modifying code running on a Virtex-5 FPGA card (PXIe-7965R).

 

We are approaching saturation of the device with our current functionality and I'm spending a little time looking through the code trying to find places where we can juggle resources in order to fit more on the device.  One area we are really under-using at the moment is Block RAM.  Our design uses only 45 of 244 Block RAM units.  We also only use 65 of 640 DSPs.

 

While the use of DSPs is reserved for very specific operations, I was thinking I could use the Block RAM strategically to implement certain pipelining operations in a different way, offloading data to Block RAM instead of using Registers and LUTs for that.  The problem is that such FIFOs would be really shallow (maybe 8x 48 bit or so), meaning that the overhead of the FIFO function is an important factor to bear in mind when considering such a change.

 

The only document known to me which lists the resource requirements of various functions of this card (http://www.ni.com/white-paper/7727/en/) tells me that a simple Block RAM FIFO costs 300 Registers and 331 LUTs (Values were measured with LabVIEW 8.5 !!).  My problem is that this seems already to be very close to what my data requires when implemented in fabric.  I'm also aware that each Block RAM has a limited width so that exceeding this will probably end up using more than one FIFO and multiply the resource requirement accordingly.
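
To make the width concern concrete, here is a rough sketch that estimates how many 36 Kb blocks a FIFO of a given depth and width would need. The aspect ratios in FIFO36_CONFIGS are an assumption taken from general knowledge of the Virtex-5 family (only the 512 x 72 case is mentioned in this thread), the function name is made up for illustration, and the estimate ignores any control logic and the exact usable depth of a real FIFO.

```python
from math import ceil

# Assumed (depth, width-in-bits) aspect ratios of one 36 Kb Virtex-5 block
# used as a FIFO. Not taken from this thread; verify against the Xilinx
# user guide for the device before relying on them.
FIFO36_CONFIGS = [(8192, 4), (4096, 9), (2048, 18), (1024, 36), (512, 72)]


def blocks_needed(depth, width):
    """Rough lower bound on 36 Kb blocks for a depth x width FIFO.

    For each aspect ratio, tile enough blocks to cover the requested
    depth and width, then keep the cheapest tiling.
    """
    return min(ceil(depth / d) * ceil(width / w) for d, w in FIFO36_CONFIGS)


if __name__ == "__main__":
    # (depth, width) pairs: the shallow 48-bit case from the post, plus
    # two wider hypothetical cases that spill over one block.
    for depth, width in [(8, 48), (8, 96), (8, 160)]:
        print(f"{depth} x {width} bit -> {blocks_needed(depth, width)} block(s)")
```

Under these assumptions an 8 x 48 bit FIFO fits in a single block, while anything wider than 72 bits needs at least two.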

 

I know that the Virtex 5 has built-in FIFO circuitry which can be utilised under certain conditions.

 

What is the resource requirement for a Block RAM FIFO implemented with and without the built-in FIFO circuitry and how do I make sure my code makes use of this?

 

Shane.

Message 1 of 10
(6,700 Views)

Hmm, I'm trying to work up an example so that I can benchmark this but I'm seeing weird results.

 

I have generated sub-VI as illustrated below.  This should theoretically use 4 BRAM FIFOs (1023 element U32 no arbitration).  I place up to 32 of these in my main VI and compile.

 

[Image: BRAM FIFO Sub-VI.png]

 

The iteration counter propagates through all the FIFOs (4 in total) until it gets to a Register which is read on the top-level diagram and all values are put on the FP.

 

The code compiles, and it runs.

 

The weird part is the resource usage.

 

[Image: BRAM FIFO Compilation results.png]

 

Either it's not counting the BRAM FIFOs as Block RAM or something's being really really well optimised to remove my nonsensical operations.

 

I know the vhd files are being created for the correct number of built-in FIFOs, but the resource usage simply cannot be correct here.

 

Any comments?

Message 2 of 10
(6,678 Views)

Whoah, I just found something interesting.....

 

I decided that the most likely cause was that the Xilinx compiler was counting the Block RAMs incorrectly.  I increased the number of instances of my silly FIFO operations to a point where it was actually above the number of available units.

 

It wouldn't compile.  The Compilation window says that I have used 2 of 244 Block RAMs but a peek in the log file says that the compilation failed because there weren't enough Block RAMs.

 

ERROR: Pack:2310 - Too many comps of type "FIFO36_EXP" found to fit this device.
ERROR:Map:237 - The design is too large to fit the device.  Please check the
   Design Summary section to see which resource requirement for your design
   exceeds the resources available in the device. Note that the number of slices
   reported may not be reflected accurately as their packing might not have been
   completed

 

 

This would seem to be a bug in the Xilinx compiler where the number of used Block RAMs is not taking all forms of Block RAM usage into account.  I'm pretty sure it's not a LabVIEW bug because I can find the exact same "2 of 244 Block RAM" info in the resource utilisation statistics within the Xilinx log itself.

 

Oh, I'm using LV 2012 SP1.

Message 3 of 10
(6,655 Views)

The documentation for the LogiCORE FIFO Generator 8.4 (the version supplied with my version of LV FPGA) references the following document:

 

http://www.xilinx.com/support/documentation/ip_documentation/fifo_generator_ds317.pdf

 

where several FIFO configurations are listed with SIGNIFICANTLY lower resource usage than I thought.

 

For example, a 512-deep, 72-bit-wide FIFO36 using the built-in implementation requires a mere 0 LUTs, 2 FFs and 1 Block RAM (versus 300 Registers, 331 LUTs and 1 Block RAM according to the NI document mentioned earlier).

 

I really wish these resource utilisation statistics were more up to date because making design choices based on really outdated information is error-prone and really inefficient.

 

Shane.

Message 4 of 10
(6,632 Views)

Hi Shane,

 

sorry that it took so long to get back to you.

First of all: Thanks for the clearly structured analysis and nice documentation.

 

I filed a CAR (corrective action request) asking to update the documentation. If you want to come back to this, just reply to the thread and I should be informed automatically about the thread activity.

 

One thing: could you upload the project you used for testing, in case someone wants to have a look at your code?

 

Best regards

Christoph

Staff Applications Engineer
National Instruments
Certified LabVIEW Developer (CLD), Certified LabVIEW Embedded Systems Developer (CLED)


Message 5 of 10
(6,558 Views)

I'll pass on the test VIs later.

 

Shane.

Message 6 of 10
(6,555 Views)

Here's the file.

 

Just try compiling the x64 instance test.

 

In the sub-VI I had a clock defined in my project which was simply double the base clock to make sure that the Xilinx compiler wasn't optimising away my FIFOs.  Seems like that fear was unwarranted, but still....  It may be necessary to re-assign a clock within the sub-VI.

 

I tried compiling on a Virtex-5 target with 244 Block RAM units.  This failed due to NRAM_FIFO Overmapping even though the official Xilinx resource usage was kind of small (not over 100% anywhere).

 

Shane.

Message 7 of 10
(6,548 Views)

OK, a bit of thread necro required.  Due to other compilation problems I've been having, I started digging into the Xilinx logs for other information and found out that the number of BRAMs used is actually reported correctly in the Xilinx log, but LV displays numbers whose source I can't fathom.

 

I think this needs to be escalated to a bug, because it does seem to be a mistake in LV's parsing of the Xilinx log after all.

Message 8 of 10
(4,868 Views)

Reported.

 

CAR created: 186707

Message 9 of 10
(4,832 Views)

Apparently, the CAR has been rejected because it's not viewed as a bug....  Well, I just got a compilation error telling me that my design couldn't place 3 instances of BRAM.  LV wants me to think that only 79 of 244 BRAMs are actually being used.  Somehow I'm not sure that's correct.

 

LV takes values from an XML file produced by Xilinx and apparently that's where false values are reported.  If, however, I look into the Xilinx.log file (which is displayed during compilation) I find the following:

 

Given the following, what is my BRAM usage really?

 

Device Utilization Summary:

   Number of BUFGs                          20 out of 32     62%
   Number of LOCed BUFGs                     1 out of 20      5%

   Number of BUFGCTRLs                       5 out of 32     15%
   Number of DCIRESETs                       1 out of 1     100%
   Number of DCM_ADVs                        7 out of 12     58%
   Number of LOCed DCM_ADVs                  1 out of 7      14%

   Number of DSP48Es                        65 out of 640    10%
   Number of FIFO36_72_EXPs                 14 out of 244     5%
   Number of FIFO36_EXPs                     3 out of 244     1%
   Number of IDELAYCTRLs                     1 out of 22      4%
   Number of LOCed IDELAYCTRLs               1 out of 1     100%

   Number of ILOGICs                        41 out of 800     5%
   Number of External IOBs                 260 out of 640    40%
   Number of LOCed IOBs                    260 out of 260   100%

   Number of External IOBMs                 19 out of 320     5%
   Number of LOCed IOBMs                    19 out of 19    100%

   Number of External IOBSs                 19 out of 320     5%
   Number of LOCed IOBSs                    19 out of 19    100%

   Number of IODELAYs                       18 out of 800     2%
   Number of ISERDESs                       17 out of 800     2%
   Number of OLOGICs                       110 out of 800    13%
   Number of PLL_ADVs                        1 out of 6      16%
   Number of RAMB18X2s                      22 out of 244     9%
   Number of RAMB18X2SDPs                   25 out of 244    10%
   Number of RAMB36SDP_EXPs                 12 out of 244     4%
   Number of RAMB36_EXPs                    43 out of 244    17%
   Number of RAMBFIFO18_36s                  5 out of 244     2%
   Number of STARTUPs                        1 out of 1     100%
   Number of Slices                      12718 out of 14720  86%
   Number of Slice Registers             26342 out of 58880  44%
   Number used as Flip Flops             26339
   Number used as Latches                    0
   Number used as LatchThrus                 3

   Number of Slice LUTS                  26218 out of 58880  44%
   Number of Slice LUT-Flip Flop pairs   36820 out of 58880  62%
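
One way to answer the question is to add up every block RAM primitive in the summary above. The sketch below does that under the assumption that each listed primitive (FIFO36_72_EXP, FIFO36_EXP, RAMB18X2, RAMB18X2SDP, RAMB36SDP_EXP, RAMB36_EXP, RAMBFIFO18_36) occupies one of the 244 36 Kb block sites, which the shared "out of 244" denominator suggests but which this thread does not confirm. The function and constant names are made up for illustration.

```python
import re

# Primitives assumed to occupy one 36 Kb block RAM site each (assumption,
# based on all of them being reported "out of 244" in the summary).
BRAM_PRIMITIVES = {
    "FIFO36_72_EXPs", "FIFO36_EXPs", "RAMB18X2s", "RAMB18X2SDPs",
    "RAMB36SDP_EXPs", "RAMB36_EXPs", "RAMBFIFO18_36s",
}

# Matches lines such as "Number of RAMB36_EXPs   43 out of 244   17%".
LINE_RE = re.compile(r"Number of (\S+)\s+(\d+) out of\s+(\d+)")

# The block RAM related lines copied from the log above.
SAMPLE = """
   Number of DSP48Es                        65 out of 640    10%
   Number of FIFO36_72_EXPs                 14 out of 244     5%
   Number of FIFO36_EXPs                     3 out of 244     1%
   Number of RAMB18X2s                      22 out of 244     9%
   Number of RAMB18X2SDPs                   25 out of 244    10%
   Number of RAMB36SDP_EXPs                 12 out of 244     4%
   Number of RAMB36_EXPs                    43 out of 244    17%
   Number of RAMBFIFO18_36s                  5 out of 244     2%
   Number of Slices                      12718 out of 14720  86%
"""


def bram_usage(summary_text):
    """Return (used, available) block sites summed over BRAM primitives."""
    used, available = 0, None
    for match in LINE_RE.finditer(summary_text):
        name, count, total = match.group(1), int(match.group(2)), int(match.group(3))
        if name in BRAM_PRIMITIVES:
            used += count
            available = total
    return used, available


if __name__ == "__main__":
    used, available = bram_usage(SAMPLE)
    print(f"{used} of {available} block RAM sites")  # prints "124 of 244 block RAM sites"
```

If the one-site-per-primitive assumption holds, the log accounts for 124 of 244 blocks, well above the 79 that LV displays.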
Message 10 of 10
(4,690 Views)