Not sure I am understanding FPGA FIFO resource allocation

Hello all,

 

I am having an issue with FPGA FIFOs.  I am performing a Target-to-Host DMA transfer, and I have two configurations that use the same FIFO but show dramatically different resource utilization.  In the first case, shown below, the FIFO resides in a loop that runs 1k times on every iteration of the while loop.  In this case, about 15% of the slice LUTs are used after mapping.

 

less resources.png

 

In the second case shown below, the FIFO is called once each time the while loop executes.  In this case, the estimated resource utilization with respect to slice LUTs is 107%. I guess I don't know why this behavior should be expected.  Can anyone explain this to me?  Thanks, Matt

 

more resources.png

Message 1 of 11
(3,683 Views)

What's going on in the "A -> |A|" VI? In the first case, the compiler can see that the output is never used, and doesn't generate any code for that function. In the second case, it does need to generate code for that function because the output is wired to the FIFO. My guess is that it has nothing to do with the FIFO.

Message 2 of 11
(3,662 Views)

Ah....there you go.  I keep being caught by the compiler.  I will have to check this when I get a chance.  I understand why the compiler would remove a case that might never execute, but why would it remove code that is in the loop?  The FPGA compiler is outsmarting me!

 

That being said, what is happening is the calculation of a determinant of a 3x3 matrix.  This would also explain why this jumps to something > 150% when the output of the bottom loop is wired to a FIFO. 

 

It does seem odd to me that these might consume so many resources given that it is all just simple integer math.  I guess I will have to turn my energy to streamlining these VIs.
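For reference, the calculation described here (cofactors and the determinant of a 3x3 integer matrix) can be sketched in text form. This is only an illustration in Python, not the poster's VI; the function names `minor`, `cofactor`, and `det3` are made up for the example.

```python
def minor(m, r, c):
    """Determinant of the 2x2 matrix left after deleting row r and column c."""
    rows = [row for k, row in enumerate(m) if k != r]
    (a, b), (p, q) = [[v for j, v in enumerate(row) if j != c] for row in rows]
    return a * q - b * p


def cofactor(m, r, c):
    """Signed minor: the sign alternates in a checkerboard pattern."""
    sign = -1 if (r + c) % 2 else 1
    return sign * minor(m, r, c)


def det3(m):
    """Determinant by cofactor expansion along the first row."""
    return sum(m[0][c] * cofactor(m, 0, c) for c in range(3))
```

Written this way, the determinant takes nine multiplies and five add/subtract operations, each at the full bit width of the data, which gives a feel for how "simple integer math" can still add up on an FPGA.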

Message 3 of 11
(3,651 Views)

Often the best thing you can do to reduce FPGA utilization is move as much code as possible into single-cycle timed loops. Even if those loops only run for a single iteration within the larger loop, it will probably still help. You may want to see the Resource Utilization Statistics for FPGA VIs.

Message 4 of 11
(3,643 Views)

At some point, I will post these VIs for comment.  They are pretty primitive, but at the time I wasn't really trying to save space because, to be honest, the VIs are just elementary integer math (determining the cofactor and determinant of a 3x3 matrix).  Maybe swapping the array constants with a register would be better?

Message 5 of 11
(3,627 Views)
Solution
Accepted by topic author cirrusio

Again, take a look at the resource utilization statistics link. You're doing most or all of the math in 64 bits, and the resource utilization grows with increasing bit widths. In particular, logical shift, which you use a lot, is an especially expensive operation, and for a 64-bit value it requires more than twice as many lookup tables as the same operation on a 32-bit value. I suspect the implementation really is a huge lookup table - for each possible input there's a corresponding output. This allows it to be very fast at the cost of FPGA fabric. Other mathematical operations (add, subtract, multiply) are more space-efficient.

 

The entire block of logic in the lower-left corner of your VI is an easy candidate to wrap in a single-cycle timed loop, which may save you some space, although it will save mostly flip-flops and not look-up tables.
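Since utilization grows with bit width, it may be worth checking how many bits the determinant actually needs. The sketch below is a rough estimate in Python, not something taken from the VI: it assumes signed two's-complement matrix entries of a known width and uses the conservative bound |det| <= 6 * M^3, where M is the largest entry magnitude (six products of three entries each). The function name `det3_signed_bits` is hypothetical.

```python
def det3_signed_bits(entry_bits: int) -> int:
    """Signed bit width that safely holds any 3x3 determinant of entry_bits-wide entries."""
    max_abs = 1 << (entry_bits - 1)   # largest magnitude of a signed entry
    bound = 6 * max_abs ** 3          # conservative bound on |det|
    return bound.bit_length() + 1     # +1 for the sign bit


# Print the estimate for a few assumed entry widths.
for width in (16, 21, 24):
    print(width, det3_signed_bits(width))
```

Under this bound, entries of up to 21 bits fit in a 64-bit result, and narrower inputs would allow a narrower data path throughout.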

Message 6 of 11
(3,617 Views)

OK - that link is awesome!  This is exactly what I was looking for!  So....what I am doing in the bottom left hand corner is a divide by 3.  Now, the question is whether I could improve performance via fixed point math.  I suspect that I would pay a big penalty, but it is not clear to me.  Unfortunately, I don't think I would benefit from putting the code into a SCTL, given that I am getting killed on LUTs and not flip-flops.  There is a possibility I can knock down the precision on the divide operation with all of the shifts, but I need the precision for the cofactoring and determinant, as the numbers coming out of the top loop with the memory block are quite large (adding a lot of squares).  But definitely something to look into.
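One common fixed-point way to divide by 3 is to multiply by a scaled reciprocal and shift once. The sketch below shows the idea in Python for unsigned 32-bit inputs; it is an illustration, not the code in the VI, and the names `RECIP`, `SHIFT`, and `div3_reciprocal` are made up for the example. A 64-bit version would need a wider constant and product.

```python
SHIFT = 33
RECIP = 0xAAAAAAAB  # ceil(2**33 / 3), the scaled reciprocal of 3


def div3_reciprocal(n: int) -> int:
    """Return n // 3 for an unsigned 32-bit n using one multiply and one shift."""
    if not 0 <= n < 2 ** 32:
        raise ValueError("n must fit in an unsigned 32-bit integer")
    return (n * RECIP) >> SHIFT
```

The shift amount is a constant here, so in hardware it is just a choice of which product bits to keep; the cost is essentially the one multiply.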

Message 7 of 11
(3,611 Views)

Unfortunately I haven't seen an updated version of the FPGA Resource Utilization guides that includes functions that were introduced in more recent versions, but it's possible the fixed-point math won't be as problematic as you think. Depending on the algorithm, it may take more cycles but less logic. Your current algorithm can probably execute in a single clock cycle, but you don't need that level of performance. Your loop has a wait of 1000 something units (can't tell from the image). Have you considered moving some of the computation to the host computer, which may be able to do it fast enough?

Message 8 of 11
(3,587 Views)

Well....I have code that offloads everything to the host, but we have moved down to a smaller board (sbRIO) and I am having issues running the code on board.  If necessary, I will move it up, but this is an attempt to get the FPGA to do some of the heavy lifting so we don't have to have everything on the host. 

 

I will try the high throughput division to see if we can knock down the usage of LUTs....

Message 9 of 11
(3,565 Views)

I completely understand this; we've just started using a sbRIO and while it's a great board for our application, we're running into the limits of the on-board processor.

 

You might consider trying the standard divide as well as the high-throughput one; you don't seem to need high throughput here.

 

Out of curiosity, what are the units for the timing? Ticks, usec? I wonder if there's some clever way you could make use of all those clock cycles to do a divide by three in less logic. Repeated subtraction alone takes too many cycles for a 64-bit value (do you really need 64 bits?), but maybe there's an algorithm out there that gets you close, and then you can do repeated subtraction for the remainder. I'm thinking, for example, that you can do both a divide by 2 and a divide by 4 easily, and a divide by three will be somewhere between them. If the wait is in usec, you have a lot of clock cycles available to do a repeated operation, especially inside a single-cycle loop.
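One possible algorithm along these lines, sketched in Python rather than as a LabVIEW diagram: since 1/3 = 1/4 + 1/16 + 1/64 + ..., summing n shifted right by 2, 4, 6, ... bits gives an estimate that never exceeds n/3, and a short repeated-subtraction loop fixes up what is left. This is an assumption about what such an algorithm could look like, not a tested FPGA design; the name `div3_shift_add` is made up.

```python
def div3_shift_add(n: int):
    """Return (n // 3, n % 3) for n >= 0 using only shifts, adds, and subtracts."""
    q = 0
    term = n >> 2              # n / 4
    while term:                # one term per iteration: n/4, n/16, n/64, ...
        q += term
        term >>= 2
    r = n - ((q << 1) + q)     # remainder left by the estimate; 3*q as a shift and add
    while r >= 3:              # repeated subtraction corrects the truncation error
        r -= 3
        q += 1
    return q, r
```

Each truncated shift loses less than one unit, so for a 64-bit input the first loop runs at most 32 times and the correction loop about as many, which is a small number of cycles if the wait really is in microseconds. Every shift is by a fixed 2 bits, so no variable shifter is needed.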

Message 10 of 11
(3,518 Views)