LabVIEW FPGA - high speed matrix calculation

Hello,

 

I am trying to do a [1 x 10] * [10 x 10] matrix multiplication in a 1 MHz loop on a cRIO-9039.

 

Base clock is set to 80 MHz.

 

I made a subVI that handles all the multiplication with High Throughput Math functions.

Each row is calculated in a For Loop.

In the picture:

1: Row calculations

2: Selector for which matrix gain row to use, so the matrix can be changed within one 1 MHz cycle.

3: Gain input

 

There should be 80 cycles available in each loop iteration, but I get a timing violation. Any input?
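The 80-cycle figure follows directly from the two clock rates given in the post. A minimal Python sketch of that arithmetic (the function and variable names are illustrative, not part of any LabVIEW API):

```python
# Cycle budget per loop iteration, using the two rates stated in the post.
BASE_CLOCK_HZ = 80_000_000   # FPGA base clock: 80 MHz
LOOP_RATE_HZ = 1_000_000     # required loop rate: 1 MHz


def cycles_per_iteration(base_clock_hz: int, loop_rate_hz: int) -> int:
    """Whole base-clock ticks that fit into one loop iteration."""
    return base_clock_hz // loop_rate_hz


def iteration_time_us(loop_rate_hz: int) -> float:
    """Duration of one loop iteration in microseconds."""
    return 1e6 / loop_rate_hz


budget = cycles_per_iteration(BASE_CLOCK_HZ, LOOP_RATE_HZ)
iteration_us = iteration_time_us(LOOP_RATE_HZ)
print(f"{budget} ticks per iteration, {iteration_us} us per iteration")
```

Note that this budget only says how many ticks fit in one iteration. It says nothing about whether the logic between two registers can settle within a single 80 MHz tick, which is what a timing violation reports.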

 

 

 

 

Message 1 of 7

What timing violation are you getting? e.g. are you at 101% of loop time or 1000%? Then we know how much to optimise.

Can you upload your VI? Or even better, a project with all the clocks etc set up? (I don't have FPGA right now, but others can open it). Or at the very minimum, a picture of the whole loop where the timing violation occurs? Maybe a screenshot of the timing violation information?

 

Some points:

1) If this is inside a timed loop (can't see from your pic), then you don't think about having 80 cycles of a base clock. Inside the loop everything runs as fast as it can, and the compiler just makes sure there is enough time to get to the end (1ms loop time in this case). Base clock only applies outside a SCTL.

2) Using the high throughput functions might not particularly help here. Add and multiply (multiply is lots of adding) are fairly fast functions anyway, and your loop is slower than the base clock. However, if you do use the HT functions, maybe you will need to use the handshaking terminals too to ensure valid data. How many cycles is the multiply function set up to take? I expect at 1MHz it could execute in one cycle.

3) Using a for loop, your multiply functions all happen in series (e.g. 10 in a row). Why not do the multiplying in parallel, then just add all the numbers at the end? This will use less time and might solve your timing issue. Your code is equivalent to just using multiply function on the array, which does element by element multiplication, then doing add array elements (I think these functions exist in FPGA?)
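Point 3 can be illustrated with a small Python model. This is only a sketch of the arithmetic, not LabVIEW FPGA code: integers stand in for fixed-point values, and the function names are made up for the example. It shows that a serial multiply-accumulate loop and a "multiply everything, then add pairwise" structure give the same result, while the second has a much shorter chain of dependent additions.

```python
from typing import List


def dot_serial(row: List[int], x: List[int]) -> int:
    """One multiply-accumulate per step: len(row) dependent steps in a chain."""
    acc = 0
    for gain, value in zip(row, x):
        acc += gain * value
    return acc


def dot_parallel(row: List[int], x: List[int]) -> int:
    """All products formed independently, then summed with a pairwise adder tree.

    Assumes non-empty inputs of equal length.
    """
    level = [gain * value for gain, value in zip(row, x)]
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # odd element passes through to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]


def adder_tree_depth(n: int) -> int:
    """Number of addition levels needed to reduce n products to one sum."""
    depth = 0
    while n > 1:
        n = (n + 1) // 2
        depth += 1
    return depth


row = list(range(1, 11))   # example gain row, 10 elements
x = [2] * 10               # example input vector
print(dot_serial(row, x), dot_parallel(row, x), adder_tree_depth(10))
```

For 10 elements the tree needs 4 levels of additions instead of a chain of 10 accumulate steps. The trade-off is fabric: the parallel form needs 10 multipliers per row instead of 1.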

 

Hope that helps a bit

Ian
LabVIEW since 2012
Message 2 of 7

Hello,

 

Thanks for the prompt reply!

Will upload project soon!

 

The "Final Timing" report just says:

Clocks | Requested (MHz) | Maximum (MHz)
80MHz  | 80.00           | 55.33

 

But I do not know exactly why; this is the only error reported.
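The two numbers in the report are enough to estimate how far off the design is. A minimal sketch, assuming the "Maximum (MHz)" column is the highest clock rate at which the slowest path still meets timing (variable names are illustrative):

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


requested_mhz = 80.00   # from the timing report
achieved_mhz = 55.33    # from the timing report

required_period = period_ns(requested_mhz)   # time available per tick
critical_path = period_ns(achieved_mhz)      # time the slowest path needs
overshoot_pct = (critical_path / required_period - 1) * 100

print(f"available: {required_period:.2f} ns, needed: {critical_path:.2f} ns, "
      f"overshoot: {overshoot_pct:.1f} %")
```

Under that assumption the slowest path needs roughly 18.1 ns where only 12.5 ns are available, so it runs at about 145% of the clock period. The same path would fit comfortably in the 25 ns period of a 40 MHz clock.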

 

1) It is inside a for loop with an 80-tick loop time, set using the express timer in ticks. (PS. 1 µs corresponds to 1 MHz, not 1 ms 😉 )

 

2) HT Multiply is set up to take 1 cycle.

 

3) Yes, I do the multiplications within a row in series to reduce fabric usage, and run the rows in parallel instead; the picture only shows 2 of the 10 rows. Shouldn't this be able to run in the 80 cycles available? I expect the calculation loop to use only 30 of the 80 cycles.

 

 

 

Message 3 of 7

Zip of the model attached. Input is welcome.

Message 4 of 7

Hi Mauritius,

 

Not all native functions will necessarily support the increase in base-clock speed. On your timing violations screen there should be a button to investigate timing violations which should tell you what is failing.

 

Because it is probably a native block failing, you may not be able to do anything about it. Instead you may have to look at using single-cycle Timed Loops (SCTLs) to optimize just this part of the code at a higher clock rate.

 

I'm just loading up a compatible version of LabVIEW to look at your project but this is my suspicion.

 

Cheers,

James

James Mc
========
CLA and cRIO Fanatic
My writings on LabVIEW Development are at devs.wiresmithtech.com
Message 5 of 7

Question before proceeding: Do you have a reason to run everything at 80MHz base clock? Both your loops have a loop timer which slows them down. Bottom loop says 10kHz rate, and the top loop just has to keep up with the data, unless you have 80000 iterations of that loop compared to the bottom loop. Why not run at the normal rate of 40MHz where everything always compiles fine? If there are particular sections of code that need a higher throughput you can use a single cycle timed loop, and even can run this loop faster if needed.

Might save us all a lot of time (pun unintended)

 

If you do actually need to go 80MHz:

 

Seeing as nothing here is in a SCTL, it means one of the functions is failing to compile. Could be block memory; these have an option you can set to take multiple clock cycles. Could be your multiply function; try setting it to 2 cycles. Does the compilation window not tell you more details? I can't remember how good it is at being informative.

Ian
LabVIEW since 2012
Message 6 of 7

Hello,

 

Sorry for digging this up, but new problems have come up, so I still have not solved everything.

The SCTL was the solution for the matrix calc. 

 

The main FPGA loop is running at 1 MHz, so I have 80 cycles in each iteration. They are all needed, so 80 MHz is a must.

I have now made an SCTL for the matrix calc, which works perfectly, or at least does what it should, when used by itself.

The .vi and a picture are included. It was a good hint to use that, thanks!

 

Each channel of the input is iterated over and summed. Matrix * Array = Pout   (10x10 * 10x1 = 10x1)

Internally there is a 200-element array where the start index can be switched, so one can update the matrix elements in the low or high half while calculating with the other. This is needed because the switch has to happen in a single instant; it is handled outside the SCTL.

Block memory is used to feed this internal array.
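The low/high switching scheme described above is a double-buffer (ping-pong) arrangement. The following Python model is only a behavioural sketch, not the attached VI: the class and method names are hypothetical, the row-major layout of each 100-element half is an assumption, and integers stand in for fixed-point gains.

```python
N = 10
BANK = N * N   # 100 gains per half; two halves make the 200-element array


class PingPongGains:
    """Two gain matrices in one flat array; only the start index changes on a swap."""

    def __init__(self) -> None:
        self.mem = [0] * (2 * BANK)
        self.active = 0                      # 0 = low half, 1 = high half

    def write_inactive(self, matrix) -> None:
        """Load a new N x N matrix into the half that is not being used."""
        base = (1 - self.active) * BANK
        for r in range(N):
            for c in range(N):
                self.mem[base + r * N + c] = matrix[r][c]   # assumed row-major

    def swap(self) -> None:
        """Switch halves in one step by flipping the start index."""
        self.active = 1 - self.active

    def multiply(self, x):
        """Matrix * Array = Pout (10x10 * 10x1 = 10x1) using the active half."""
        base = self.active * BANK
        return [sum(self.mem[base + r * N + c] * x[c] for c in range(N))
                for r in range(N)]


gains = PingPongGains()
identity = [[1 if r == c else 0 for c in range(N)] for r in range(N)]
x = list(range(10))

gains.write_inactive(identity)       # update happens in the background half
before = gains.multiply(x)           # still computed with the old (all-zero) gains
gains.swap()                         # single-step switch to the new matrix
after = gains.multiply(x)
print(before, after)
```

The point of the arrangement is that the calculation never sees a half-updated matrix: writes only touch the inactive half, and the only thing that changes at the switch is one index.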

I have 2 of these matrix modules, and some SCTL subVIs with array[10] in and out.

 

The compile fails unless I choose "Optimize congestion", which leads me to believe that congestion is the issue.

Any input on how to make this design better with respect to FPGA congestion?

It sometimes fails routing, or it is very sensitive to changes in the code and then fails with timing violations in code (subVIs) that I have already tested and validated on the target.

Is it better not to use SCTLs in subVIs, to allow the compiler more freedom?

 

The FPGA is not running out of resources, as usage is around 70%.

Is any more information needed?

Message 7 of 7