LabVIEW


FPGA: why does my loop run so slow?

I have a loop executing on my FPGA (7854R card) that is running much slower than I would expect. Looking at the code, I would expect the limiting factor to be an analog output; that block takes around 42-43 ticks to execute on its own, and I have pipelined the loop to isolate that block. I have taken the loop apart and benchmarked many of the separate parts, finding that all of the other parts take much less than 40 ticks to execute. But when I put all of the parts together, the loop takes around 133 ticks to execute. Can you help me figure out what might be causing this slowdown?


I have attached the VI in question. You may notice that it is incomplete: I stripped out the rest of my program's functionality to make it as simple to understand as possible.


You can put a lot of your logic and math inside an SCTL (single-cycle Timed Loop): http://zone.ni.com/reference/en-XX/help/371599H-01/lvfpgaconcepts/using_sctl_optimize_fpga/

That should help.


There are other optimisation techniques as well; these links may be useful if you haven't seen them already:

http://www.ni.com/white-paper/3749/en

http://digital.ni.com/public.nsf/allkb/311C18E2D635FA338625714700664816?OpenDocument

http://digital.ni.com/public.nsf/allkb/722A9451AE4E23A586257212007DC5FD


Thanks, I am trying the SCTLs right now. I am also taking a look at the links you sent. Will let you know if that helps.


Here's the part that I don't get: I don't see why that logic should take more than the 40-ish ticks required by the analog output block, much less the 133 clock ticks it currently takes. If I break the code down into smaller chunks, they all seem to run at reasonable speed. It seems that it's only when I put them all together that the slowdown happens. Any insight into that behavior would be appreciated too.


Thanks again for the assistance.


I read through all of your links. I had already read the first two, but I picked up something new along the way: an SCTL not only makes code run faster, it also saves FPGA fabric. I didn't realize that.


My issue seems to be the divide function: it takes 60+ cycles to execute. I had two of them cascaded, resulting in 133 cycles for the whole loop. I could swear I had benchmarked the divide function separately, but when I benchmarked it today it came out at 60+ cycles. The high-speed divide function is about twice as fast, so I've switched to that.
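To see why the cascaded divides dominated, here is a rough Python model (not LabVIEW, and not how the compiler actually schedules logic). It assumes that operations chained inside one pipeline stage add up and that the loop period is set by the slowest stage. The per-operation tick counts are hypothetical values picked to roughly match the numbers in this thread, and the 40 MHz clock is an assumed top-level clock rate.

```python
# Rough model of a pipelined FPGA loop (illustrative only).
# Assumptions: operations chained inside one stage add up; stages run in
# parallel, so the loop period is set by the slowest stage.

CLOCK_HZ = 40_000_000  # assumed 40 MHz top-level clock


def stage_ticks(ops):
    """Ticks for one stage whose operations are chained back to back."""
    return sum(ops.values())


def loop_period(stages):
    """Ticks per iteration: the slowest pipeline stage wins."""
    return max(stage_ticks(ops) for ops in stages)


def loop_rate_hz(period_ticks, clock_hz=CLOCK_HZ):
    """Loop iterations per second for a given period in ticks."""
    return clock_hz / period_ticks


# Before: both divides cascaded in one stage, analog output in its own stage.
before = [
    {"divide_1": 65, "divide_2": 65, "other_logic": 3},  # hypothetical ticks
    {"analog_output": 43},
]

# After: faster divides (about half the ticks), each in its own stage.
after = [
    {"fast_divide_1": 32},  # hypothetical ticks
    {"fast_divide_2": 32},  # hypothetical ticks
    {"analog_output": 43},
]

for name, stages in (("before", before), ("after", after)):
    p = loop_period(stages)
    print(f"{name}: {p} ticks/iteration, ~{loop_rate_hz(p):,.0f} Hz")
```

With these assumed numbers, the model gives 133 ticks before and 43 after, the latter close to the 44 ticks reported later in the thread: once no stage is slower than the analog output, the analog output sets the pace.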


I have pipelined the two divide functions (that will require some adaptation of adjacent code, but that's OK). I'm also playing with how much of the other logic I can put in one SCTL; I know I can break it into two SCTLs and it will compile.
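Pipelining the divides means each stage works on data from a different iteration, which is why adjacent code needs adapting. The Python sketch below simulates a divide, divide, output chain with a register (a shift register or feedback node in LabVIEW) between stages. The two stage functions are hypothetical stand-ins, not the math in the original VI.

```python
def first_divide(x):   # hypothetical stand-in for the first divide
    return x / 2


def second_divide(x):  # hypothetical stand-in for the second divide
    return x / 5


def run_pipeline(samples):
    """Simulate divide -> divide -> output with a register between stages."""
    reg1 = None  # register after the first divide
    reg2 = None  # register after the second divide
    outputs = []
    for x in samples:
        # Every stage reads the registers written in the previous iteration.
        out = reg2
        next_reg2 = second_divide(reg1) if reg1 is not None else None
        next_reg1 = first_divide(x)
        reg1, reg2 = next_reg1, next_reg2
        outputs.append(out)
    return outputs


print(run_pipeline([10.0, 20.0, 30.0, 40.0]))  # [None, None, 1.0, 2.0]
```

The result for the first sample (10.0 / 2 / 5 = 1.0) only reaches the output two iterations later. Any signal that must stay aligned with it (a channel index or a valid flag, for example) needs the same two-iteration delay.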


My loop is currently running at 44 cycles/iteration, so there's not much room for optimization in speed. I will make a few more attempts with SCTLs to save some fabric, but I'm just about where I wanted to be.


Thanks for the help!


If you want to save fabric, the SCTL is definitely a good choice. In addition, you can save some by:

- Checking the FXP (fixed-point) number representations (for the divisions, this will also save ticks). For example, the 40020000 constant does not need to be 30 or even 50 bits wide.

- Avoiding divisions when possible. One of your divisions divides by a constant, which you can easily change into a multiplication by its reciprocal. You can also specify the output type of an FXP calculation.
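Here is a minimal Python sketch of replacing a divide by a constant with a multiply, using integer fixed-point arithmetic. The divisor (3) and the 16 fractional bits are hypothetical choices for illustration; the thread does not say which constant the VI divides by. In LabVIEW the equivalent would be a Multiply with a fixed-point reciprocal constant.

```python
# Replace "x / DIVISOR" with a multiply by a fixed-point reciprocal that is
# computed once, ahead of time (hypothetical divisor and bit widths).

DIVISOR = 3     # hypothetical constant divisor
FRAC_BITS = 16  # fractional bits kept in the reciprocal
RECIP = round((1 << FRAC_BITS) / DIVISOR)  # 21845 for DIVISOR = 3


def div_by_const(x: int) -> int:
    """Approximate round(x / DIVISOR) with one multiply and one shift."""
    # Adding half an LSB before the shift rounds to nearest instead of
    # truncating toward zero.
    return (x * RECIP + (1 << (FRAC_BITS - 1))) >> FRAC_BITS


print(div_by_const(300), 300 / DIVISOR)  # 100 100.0
```

The reciprocal is slightly inexact (21845 instead of 21845.33), so the error grows with the input range; for 16-bit unsigned inputs it stays within one count here. More fractional bits in the reciprocal reduce that error at the cost of a wider multiplier.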


Also, pipelining makes sense, but if you do so, try splitting your critical path (the two divisions) as well. For instance, the second division could probably run in parallel with the first one (this may also make sense after switching to multiplication).
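The critical path is the longest chain of dependent operations. The Python sketch below computes it for a small dataflow graph. The formula y = (a / b) / c and the tick counts are hypothetical, chosen only to show how rewriting it as (a / b) * (1 / c) lets both divides start at the same time.

```python
from functools import lru_cache


def critical_path(latency, deps):
    """Longest finish time, in ticks, through a dataflow graph.

    latency: node -> ticks the node takes
    deps:    node -> nodes whose outputs it must wait for
    """

    @lru_cache(maxsize=None)
    def finish(node):
        # A node can start only after all of its inputs are ready.
        start = max((finish(d) for d in deps.get(node, ())), default=0)
        return start + latency[node]

    return max(finish(n) for n in latency)


# Hypothetical formula y = (a / b) / c, with assumed latencies in ticks.
DIV, MUL = 32, 4

# Cascaded: the second divide waits for the first.
cascaded_lat = {"a_div_b": DIV, "div_c": DIV}
cascaded_deps = {"div_c": ["a_div_b"]}

# Rewritten as (a / b) * (1 / c): both divides start together.
parallel_lat = {"a_div_b": DIV, "one_div_c": DIV, "multiply": MUL}
parallel_deps = {"multiply": ["a_div_b", "one_div_c"]}

print(critical_path(cascaded_lat, cascaded_deps))  # 64
print(critical_path(parallel_lat, parallel_deps))  # 36
```

In fixed point, (a / b) * (1 / c) rounds differently from (a / b) / c, so the intermediate word lengths should be checked after such a rewrite.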


A quick way to reduce the types of constants is to right-click on the constant and select "Adapt to Entered Data". This will give you the optimal fixed-point range that fits the number automatically. The reduced ranges will automatically propagate through the diagram, reducing bits needed for downstream operations as well.
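As a rough cross-check outside LabVIEW, the Python sketch below computes the smallest word length that holds an integer-valued constant exactly. It is not LabVIEW's own algorithm and ignores fractional constants. For the 40020000 constant mentioned in the thread, it gives 26 bits unsigned.

```python
def min_int_bits(value: int, signed: bool = False) -> int:
    """Smallest word length that holds an integer constant exactly."""
    if signed:
        # Two's complement needs one extra bit for the sign.
        if value >= 0:
            return value.bit_length() + 1
        return (-value - 1).bit_length() + 1
    if value < 0:
        raise ValueError("an unsigned type cannot hold a negative value")
    return max(value.bit_length(), 1)


print(min_int_bits(40020000))               # 26
print(min_int_bits(40020000, signed=True))  # 27
```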
