FPGA: why does my loop run so slow?

dvt729 · ‎04-19-2013

I have a loop executing on my FPGA (7854R card) that is running much slower than I would expect. Looking at the code, I would expect the limiting factor to be an analog output; that block takes around 42-43 ticks to execute on its own, and I have pipelined the loop to isolate that block. I have taken the loop apart and benchmarked many of the separate parts, finding that all of the other parts take much less than 40 ticks to execute. But when I put all of the parts together, the loop takes around 133 ticks to execute. Can you help me figure out what might be causing this slowdown?

I have attached the .vi in question. You may notice that it is incomplete, since I have eliminated the rest of the function of my program to make it as simple to understand as possible.

.aCe. · ‎04-20-2013

You can put a lot of your logic and math inside a single-cycle Timed Loop (SCTL): http://zone.ni.com/reference/en-XX/help/371599H-01/lvfpgaconcepts/using_sctl_optimize_fpga/

That should help.

There are other optimisation techniques as well. These links may be useful if you haven't seen them already:

http://www.ni.com/white-paper/3749/en

http://digital.ni.com/public.nsf/allkb/311C18E2D635FA338625714700664816?OpenDocument

http://digital.ni.com/public.nsf/allkb/722A9451AE4E23A586257212007DC5FD


dvt729 · ‎04-21-2013

Thanks, I am trying the SCTLs right now. I am also taking a look at the links you sent. Will let you know if that helps.

Here's the part that I don't get: I don't see why that logic should take more than the 40-ish ticks required by the analog output block, much less the 133 clock ticks it currently takes. If I break the code down into smaller chunks, they all seem to run at reasonable speed. It seems that it's only when I put them all together that the slowdown happens. Any insight into that behavior would be appreciated too.

Thanks again for the assistance.

dvt729 · ‎04-21-2013

I read through all of your links. I had already read the first two, but I picked up a new piece of information somewhere along the way: an SCTL not only makes code run faster, it also saves FPGA fabric. I didn't realize that.

My issue seems to be the divide function. It takes 60+ cycles to execute. I had two of them cascaded, resulting in 133 cycles for the whole loop. I could swear I had benchmarked the divide function separately, but when I benchmarked it today it came out at 60+ cycles. The high speed divide function is about twice the speed; I've switched to that function.

I have pipelined the two divide functions (that will require some adaptation of adjacent code, but that's OK). I'm also playing with how much of the other logic I can put in one SCTL; I know I can break it into two SCTLs and it will compile.
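To illustrate why the cascaded divides dominated the loop time, here is a rough timing model in Python (not LabVIEW code). It assumes each pipeline stage runs in parallel with the others, so the loop period is set by the slowest stage, while operations chained inside one stage add up. The exact tick counts (62, 31, 9) are assumptions chosen to match the "60+", "about twice the speed" and 133-tick figures above; the model ignores loop overhead.

```python
def loop_period(stages):
    """Return the ticks per iteration of a pipelined loop.

    `stages` is a list of pipeline stages; each stage is a list of
    latencies (in ticks) of operations that execute one after another.
    Stages run in parallel, so the slowest stage limits the loop rate.
    """
    return max(sum(ops) for ops in stages)


ANALOG_OUT = 43   # "around 42-43 ticks" from the thread
DIVIDE = 62       # assumed value for "60+ cycles"
FAST_DIVIDE = 31  # assumed value for "about twice the speed"
OTHER_LOGIC = 9   # hypothetical remainder of the math

# Before: analog output isolated, but both divides chained in one stage.
before = [[ANALOG_OUT], [DIVIDE, DIVIDE, OTHER_LOGIC]]

# After: high speed divides, each in its own pipeline stage.
after = [[ANALOG_OUT], [FAST_DIVIDE], [FAST_DIVIDE, OTHER_LOGIC]]

print(loop_period(before))  # 133
print(loop_period(after))   # 43
```

In this simplified picture the analog output becomes the limiting stage once the divides are split up, which is consistent with the loop ending up close to the analog output's own latency.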

My loop is currently running at 44 cycles/iteration, so there's not much room for optimization in speed. I will make a few more attempts with SCTLs to save some fabric, but I'm just about where I wanted to be.

Thanks for the help!

dan_u · ‎04-22-2013

If you want to save fabric the SCTL is definitely a good choice. In addition you can also save some by:

- checking the FXP number representations (for the divisions this will also save ticks). E.g. the 40020000 constant does not need to be 30 or even 50 bits wide.

- avoiding divisions when possible. One of your divisions divides by a constant; you can easily change that into a multiplication by the reciprocal. Also, you can specify the output type of an FXP calculation.

Also, pipelining makes sense, but if you do so, try splitting your critical path (the 2 divisions) as well. For instance, the second division could probably run in parallel with the first one (this might also make sense after switching to multiplication).
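As a sketch of the divide-by-constant idea, the Python below (an illustration, not LabVIEW code) replaces a division by a fixed divisor with one multiplication by a precomputed reciprocal and a right shift. The 16 fractional bits and the divisor 10 are assumptions picked for the example; on the FPGA the precision would follow from the FXP types you choose.

```python
FRAC_BITS = 16  # assumed number of fractional bits for the reciprocal


def reciprocal_constant(divisor, frac_bits=FRAC_BITS):
    """Precompute round(2**frac_bits / divisor) as an integer constant."""
    return round((1 << frac_bits) / divisor)


def divide_by_constant(x, recip, frac_bits=FRAC_BITS):
    """Approximate x / divisor with one multiply and one right shift."""
    return (x * recip) >> frac_bits


# The reciprocal is computed once, ahead of time (hypothetical divisor).
recip = reciprocal_constant(10)

print(divide_by_constant(1000, recip))  # 100
```

The result is an approximation: the rounding of the reciprocal introduces a small error that grows with the input, so the number of fractional bits has to be sized for the input range you actually use.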

JLewis · ‎04-23-2013

A quick way to reduce the types of constants is to right-click on the constant and select "Adapt to Entered Data". This will give you the optimal fixed-point range that fits the number automatically. The reduced ranges will automatically propagate through the diagram, reducing bits needed for downstream operations as well.
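To see why a constant such as 40020000 does not need 30 or 50 bits, the minimum unsigned integer width can be checked with a few lines of Python. This is only an illustration of the arithmetic; it is not a description of how LabVIEW computes the adapted type, and a signed representation would need one more bit.

```python
def unsigned_integer_bits(value):
    """Return the number of bits needed to hold a non-negative integer."""
    return max(1, int(value).bit_length())


print(unsigned_integer_bits(40020000))  # 26
```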
