06-26-2017 11:26 PM
Hi all,
I am trying to do linear algebra on an FPGA, as I need a fast computation speed to have an accurate observer for control applications. I need to compute at >=100kHz; my FPGA target has a 40MHz clock, which gives me 400 clock cycles to compute.
I have broken down my matrices into 1D row arrays, and I am using the Dot Product VI. (My FPGA target only supports 1D arrays; and I am using the 2013 version.)
The computation flow is as follows: Case 0 would compute a1*x1, a2*x2, a3*x3, and a4*x4, and store them in y1, y2, y3, and y4. Something would trigger the case to advance. Then, Case 1 would compute a5*x5, a6*x6, a7*x7, and a8*x8, and store them in y5, y6, y7, and y8; and so on.
Using 8 Dot Product VIs, I can do all necessary computations in four cases. (Ten of the computations are size 1x8 * 8x1; sixteen computations are size 1x2 * 2x1.) I cannot use more than 8 Dot Product VIs because I run out of resources on the FPGA, so I am using case structures to "schedule shifts" of inputs/outputs to a set of 8 Dot Product VIs. (See screenshot below.)
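For readers who want to see the "schedule shifts" idea written out, here is a minimal Python sketch of the bookkeeping. It is not LabVIEW code and not the attached VI: the function name build_schedule and the round-robin assignment are assumptions made for illustration, and it assumes every unit can take either product size (for example by zero-padding the 1x2 products).

```python
from math import ceil

def build_schedule(num_products, num_units):
    """Assign product indices 0..num_products-1 to (case, unit) slots.

    Hypothetical helper: fills the units in order, one case at a time.
    Slots left as None are idle in that case.
    """
    num_cases = ceil(num_products / num_units)
    schedule = [[None] * num_units for _ in range(num_cases)]
    for k in range(num_products):
        case, unit = divmod(k, num_units)
        schedule[case][unit] = k
    return schedule

# 10 products of size 1x8 * 8x1 plus 16 of size 1x2 * 2x1 = 26 products, 8 units.
schedule = build_schedule(num_products=26, num_units=8)
for case, slots in enumerate(schedule):
    print(f"case {case}: {slots}")
```

With 26 products and 8 units this gives four cases, the last of which uses only two units. The table also makes explicit which output each unit should write in each case, which is the mapping that goes wrong when the results arrive later than expected.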
However, I am struggling to get the computed result of the Dot Product VI into the right variable. The Dot Product VI has a computational delay of >=1 cycle that seems to be unpredictable, and this causes my results to get stored in the wrong outputs (e.g. a1*x1 gets stored in y5).
Does anyone have any advice on how to make this work better? Here are some thoughts I have had:
1. Use a boolean timing flag in conjunction with the "Ready for Output" input on the Dot Product VI in order to use them like a register, triggering them to "flush" when the flag is activated. This sort of works, except if the flag stays on for more than one cycle or doesn't get detected when triggered, things can go wonky.
2. Offset the later cycle by some fixed number of cycles. This only works if the Dot Product VIs have a constant cycle delay. I've had mixed results with this; under some conditions it seems to work, under others it does not. I don't fully understand the cycle delay.
3. Try to use the "ready for input" output flag from the Dot Product VI to trigger the case to advance. Again, I can't quite get consistent results. It seems like everything triggers at different times. I also don't fully understand what the "ready for output" and "ready for input" flags cause the Dot Product VI to do (does it cache its computed output until the downstream signals it is ready?).
Code is attached (runs on an FPGA target, mine is a cRIO-9030), and here's a screenshot showing the idea I've been working with.
I'm quite tired of waiting 10 minutes to compile only to find out it doesn't behave in a way that makes sense over and over! I would appreciate any kind of insight FPGA experts out there might have. Thanks 🙂
06-27-2017 10:00 AM
@phototr0pe wrote:
(...)
I'm quite tired of waiting 10 minutes to compile only to find out it doesn't behave in a way that makes sense over and over! I would appreciate any kind of insight FPGA experts out there might have. Thanks 🙂
One quick tip: if you want to test a VI, you don't have to compile it for the FPGA. You can simply run the VI on the My Computer target (add it there in the project). There are also FPGA simulation options available (but this is a quick tip, so I don't have time to elaborate on them now 😉).
06-27-2017 10:18 AM - edited 06-27-2017 10:18 AM
In addition to that quick tip, here's a link describing checking FPGA code without compiling:
06-27-2017 11:41 AM - edited 06-27-2017 12:07 PM
Thanks for the suggestions, all. Unfortunately both Simulation mode on the FPGA target and trying to run the VI on another target e.g. the host PC do not work, at least using my cRIO-9030 and LabVIEW 2014. The VI will run but the Dot Product computation always returns zero. Perhaps the single cycle timed loop is not implemented for these features?
EDIT: Ok, after clearing the VI from memory and trying the Simulation mode again, it does actually work with the SCTL. I'm not sure why it didn't before! Good tip.
06-27-2017 03:33 PM
Ok, for anyone else out there who wants to use this technique, I've figured out a few things.
How I solved the problem:
It's not strictly deterministic, as not all eight states update at the same time; but I can update all elements at >=500kHz, which should be fast enough to appear deterministic to my 1kHz control loop.
06-28-2017 12:46 AM
@phototr0pe, I'm glad you got things working, although you can make the code a bit more robust with a few changes.
First, I would look over the documentation for the Dot Product function. It's possible some of the issues you are seeing are due to differing configurations of the nodes, specifically the pipeline stages. This function uses a high-speed handshake protocol for operation to ensure data can be pipelined through the design safely. If designed properly, you should not need to time anything in the function but can rather just let the data flow through the system using the control flow handshake signals.
Like you mentioned, blocks using the handshake protocol only maintain their output value for the cycle the Output Valid signal goes high. If you don't capture the value on that cycle, it will be lost. However, the output will only be delivered if Ready for Output is asserted.
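To make the capture rule concrete, here is a toy Python model of a pipelined block with a valid/ready handshake. It is a sketch under stated assumptions, not NI's implementation: the class DotProductUnit, its fixed latency, and the way it stalls when the downstream side is not ready are all invented for illustration. The point it shows is that if you queue the destination index when an input is accepted and pop it when the output is valid, each result lands in the right place regardless of the pipeline latency.

```python
from collections import deque

class DotProductUnit:
    """Toy pipelined dot product (assumed behaviour, not the real node)."""

    def __init__(self, latency):
        self.pipe = deque([None] * latency)  # one slot per pipeline stage

    def step(self, input_valid, a, x, ready_for_output=True):
        """Advance one clock. Returns (output_valid, result, input_accepted)."""
        head = self.pipe[-1]
        if head is not None and not ready_for_output:
            # Downstream not ready: hold the result and refuse new input.
            return False, None, False
        self.pipe.pop()
        new = sum(ai * xi for ai, xi in zip(a, x)) if input_valid else None
        self.pipe.appendleft(new)
        return head is not None, head, input_valid

def run(latency, a_rows, x, ready=lambda cycle: True):
    """Feed every row through one unit and route results by queued index."""
    unit = DotProductUnit(latency)
    pending = deque()          # destination indices of products in flight
    y = [None] * len(a_rows)
    next_row, cycle = 0, 0
    while None in y:
        feeding = next_row < len(a_rows)
        row = a_rows[next_row] if feeding else []
        out_valid, result, accepted = unit.step(feeding, row, x, ready(cycle))
        if accepted:
            pending.append(next_row)   # remember where this product belongs
            next_row += 1
        if out_valid:
            y[pending.popleft()] = result  # capture only on the valid cycle
        cycle += 1
    return y

rows = [[1, 2], [3, 4], [5, 6]]
for latency in (1, 3, 7):
    print(latency, run(latency, rows, x=[10, 1]))
```

The same destination mapping comes out for every latency, and also when the downstream side is only ready on some cycles, because nothing in the routing depends on counting cycles.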
I'm not sure exactly how the application will service inputs and outputs, but using arrays to hold all the values can consume a lot of resources. If possible, it would be good to store the values in a memory and read one row or column at a time. If the values are coming through a DMA channel, there are a number of schemes to use to partition the data efficiently.
Again, the handshake signals are helpful here as you have many options to partition the data across the Dot Product functions. For instance, if you increase the pipeline depth of the Dot Product function, and decrease the number of elements per cycle from memory, you may be able to increase the clock rate of the loop up to 120 MHz or higher and actually get better throughput, possibly with fewer Dot Product functions. You should have no problem hitting a 1 kHz response time for an application like this!
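As a rough way to compare such configurations before compiling, the following Python sketch estimates an ideal update rate. The product sizes come from the original post; everything else is an assumption: the helper names are hypothetical, the work is assumed to split evenly across units, each unit is assumed to accept a fixed number of element pairs per clock, and pipeline fill latency is ignored. Treat the result as an optimistic bound, not a prediction of what a given design will achieve.

```python
from math import ceil

# Product sizes from the original post: ten 1x8 * 8x1 and sixteen 1x2 * 2x1.
SIZES = [8] * 10 + [2] * 16

def cycles_per_update(sizes, elems_per_cycle, num_units):
    """Ideal cycles for one full update (assumed even split, no fill latency)."""
    per_product = [ceil(n / elems_per_cycle) for n in sizes]
    return ceil(sum(per_product) / num_units)

def update_rate_hz(clock_hz, sizes, elems_per_cycle, num_units):
    """Ideal update rate in Hz for the given clock and configuration."""
    return clock_hz / cycles_per_update(sizes, elems_per_cycle, num_units)

# One unit, one element pair per clock, at the 120 MHz mentioned above.
cycles = cycles_per_update(SIZES, elems_per_cycle=1, num_units=1)
rate = update_rate_hz(120e6, SIZES, elems_per_cycle=1, num_units=1)
print(cycles, "cycles per update,", round(rate), "updates per second")
```

Under these assumptions a single unit needs 112 cycles per update, which at 120 MHz is comfortably above the 100 kHz target from the original post, so trading parallel units for a faster, narrower pipeline looks plausible on paper.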