FPGA constant matrix times a vector using Linear Algebra Matrix Multiply

etgohomeok · 05-15-2014

Hi, I am attempting to use the Linear Algebra Matrix Multiply function in LabVIEW 2013 to multiply a 288x576 matrix by a 576x1 vector. The matrix is loaded from a file on the host computer and sent via a DMA FIFO to the target, and the vector is (for now) just filled with constants. In the future, the vector will be populated by data from an I/O module attached to the FPGA unit.

I am wondering if anyone knows about/has any experience with using the Linear Algebra Matrix Multiply function to multiply a constant 2D matrix by a vector.

I have attached a screenshot of my current attempt at a solution to this post. Because FPGA only supports 1D arrays, I have the matrix coming into my timed loop as a 1D array with a length of 288 * 576 = 165,888. I then take the first 576 elements using the Array Subset function and feed them into the Matrix Multiply function. The next iteration takes the second 576 elements, then the third, etc. Each of these sets corresponds to a row of my matrix, so I am effectively feeding the matrix in one row at a time, which is the type of input I have the Matrix Multiply function configured to accept (see the screenshot of the dialog attached).
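As a rough illustration of the row-at-a-time scheme described above, here is a minimal Python sketch (not LabVIEW FPGA code). It assumes the matrix is stored row-major in one flat array; the names `row_of` and `matvec_by_rows` are made up for this example. The slice start `row * cols` changes on every iteration, which is the variable index that Array Subset cannot turn into a fixed-size output on the FPGA.

```python
ROWS, COLS = 288, 576  # matrix dimensions from the post


def row_of(flat, row, cols=COLS):
    """Return one matrix row from a row-major flat array."""
    start = row * cols  # variable start index, one row per iteration
    return flat[start:start + cols]


def matvec_by_rows(flat, vec, rows=ROWS, cols=COLS):
    """Multiply a flattened rows x cols matrix by a vector, one row at a time."""
    assert len(flat) == rows * cols
    assert len(vec) == cols
    result = []
    for r in range(rows):
        row = row_of(flat, r, cols)
        # Each output element is the dot product of one row with the vector.
        result.append(sum(a * x for a, x in zip(row, vec)))
    return result


# Small 2x3 example instead of the full 288x576 case.
small = matvec_by_rows([1, 2, 3, 4, 5, 6], [1, 1, 1], rows=2, cols=3)
print(small)  # [6, 15]
```

For the full-size case the flat array has 288 * 576 = 165,888 elements and the loop produces 288 outputs.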

The issue with my current solution is that the Array Subset function does not return a fixed-size array unless the "index" input is wired to a constant (http://zone.ni.com/reference/en-XX/help/371599H-01/lvfpgahelp/returning_fixedsize_arrays/). For my application, the index at which I take the subset needs to change, so the wire coming out is typed as "bounded size" instead of "fixed size." The code in the attached screenshot behaves as I want when I run it on the development computer; however, when I try to compile it for the FPGA, I get an error about the small bit of wire (circled in the picture) that is "bounded size" instead of "fixed size."

If anyone knows any alternative methods to perform the type of matrix multiplication I described above, or a fix for the problem I am having with my code, it would be greatly appreciated.

MrQuestion · 05-15-2014

That is a lot of resources you will be eating up.

FYI: Last I looked, you can't have a Quotient & Remainder function inside a SCTL. Also, if each segment is 576 elements long, why are you using a Quotient & Remainder function? Shouldn't you be multiplying by 288 instead?

Taking an array segment like this will not work because LabVIEW FPGA in its current state isn't aware that the value you are feeding into the Array Subset index is a multiple of the fixed array size, and there are no checks to ensure that the start location stays within bounds.

What happens when you want to take the 289th segment?

With regard to breaking up a complex array, I have come across this issue before.

There are two ways (architecturally) to do this.

Both ways share a common technique.

Since you have a 1D array that essentially represents a 2D array, you need to have a mechanism to break up this large array into smaller 576 element chunks. You cannot do this with the Array Subset.

You have a choice to make for architecture.

1.) Have your array creation mechanism work in parallel with your Matrix Multiply, using parallel pipelining techniques, where you must have timing/syncing in place to ensure clean data out. (This is not for the faint of heart, and I suggest you don’t do this.)

2.) Create a state-machine architecture (much easier).

In one state you will have a “Create Array”, and in the other you will have a “Multiply”

All arrays will be fixed size. In the “Create Array” state you will extract a single element from your “multiply” array and inject it into your defined array that is 576 elements long. After this array is filled you then transition to the “Multiply” state. You will then monitor the “Output Valid” boolean. Once this boolean is true you go back to the “Create Array” state. Repeat 288 times.
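The two-state idea can be sketched in Python as a software model (this is not LabVIEW FPGA code and says nothing about real timing). The state names, the function `run_state_machine`, and the assumption that the multiply finishes in a single step are all hypothetical; on the target, the "Multiply" state would instead wait on the “Output Valid” boolean.

```python
from enum import Enum


class State(Enum):
    CREATE_ARRAY = 0
    MULTIPLY = 1


def run_state_machine(flat, vec, rows, cols):
    """Model of the two-state loop: fill a fixed-size row buffer, then multiply."""
    row_buf = [0] * cols  # fixed-size buffer, only ever written element by element
    results = []
    state = State.CREATE_ARRAY
    row = 0
    col = 0
    while row < rows:
        if state is State.CREATE_ARRAY:
            # Copy a single element per step into the fixed-size buffer.
            row_buf[col] = flat[row * cols + col]
            col += 1
            if col == cols:
                col = 0
                state = State.MULTIPLY
        else:
            # Stand-in for the multiply step; assumed valid after one step.
            results.append(sum(a * x for a, x in zip(row_buf, vec)))
            row += 1
            state = State.CREATE_ARRAY
    return results


print(run_state_machine([1, 2, 3, 4, 5, 6], [1, 1, 1], rows=2, cols=3))  # [6, 15]
```

The point of the model is that the buffer length never changes: only single elements are read and replaced, so every array in the loop keeps a size known up front.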

I hope that explains things. It might sound complex, but it isn’t really that hard.

I don’t have LabVIEW FPGA installed on my PC at the moment, or else I would show you a picture. If you need more help, I can make this on my home PC tonight.

Engineering - The art of applied creativity ~Theo Sutton

GregSands · 05-15-2014

Have you looked at the Matrix*Vector VI which is in the Math/Control palette? It looks like a simpler way to do what you want, given that your Matrix is constant.

GregSands · 05-15-2014

Oops, just noted that the maximum array size for Matrix*Vector is 50x50. That won't work for you. 😞 Unless perhaps you can break both your Matrix and Vector up into 288x50 and 50x1 segments, and recombine the results after?
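The split-and-recombine idea works because a matrix-vector product can be computed tile by tile, with the partial results for each output row summed. Below is a minimal Python sketch of that arithmetic only. It assumes the 50-element limit applies to both dimensions, so it tiles rows as well as columns; `matvec_tiled` and its `tile` parameter are names invented for this example, not part of any NI VI.

```python
def matvec_tiled(matrix, vec, tile=50):
    """Compute matrix * vec by summing products over tiles of at most tile x tile."""
    rows, cols = len(matrix), len(vec)
    result = [0] * rows
    for r0 in range(0, rows, tile):
        r1 = min(r0 + tile, rows)  # last row tile may be smaller
        for c0 in range(0, cols, tile):
            c1 = min(c0 + tile, cols)  # last column tile may be smaller
            # Multiply one tile by the matching vector segment and
            # accumulate the partial sums into the output rows it covers.
            for r in range(r0, r1):
                result[r] += sum(matrix[r][c] * vec[c] for c in range(c0, c1))
    return result


# Check against a direct product on a small 7x5 case with 2x2 tiles.
m = [[r * 5 + c for c in range(5)] for r in range(7)]
v = [1, 2, 3, 4, 5]
direct = [sum(a * x for a, x in zip(row, v)) for row in m]
print(matvec_tiled(m, v, tile=2) == direct)  # True
```

For the 288x576 case with 50-element tiles, this gives 6 row tiles (the last one 38 rows tall) and 12 column tiles (the last one 26 columns wide), so 72 small products whose partial results must be accumulated.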

MrQuestion · 05-16-2014

Have you made any progress on this yet?

Dragis · 05-16-2014

Have you tried using the Row-wise or Column-wise element interfaces? Since the matrix is coming from the host one element at a time, using the elemental interface will allow the Matrix Multiply to buffer things internally in the most efficient manner.

http://zone.ni.com/reference/en-XX/help/371599J-01/lvfpga/la_matrixmultiply/#DBox

etgohomeok · 05-20-2014

Hi, I'm not sure as to how I would extract the single element from the multiply array if the array is being stored on the FPGA target. Would that not require the use of the Index Array or Array Subset functions, which return bounded-size or variable-size arrays unless the index input is a constant?

I'm also worried that switching between states in such an architecture would slow the processing. For this particular application, the large matrix will be a fixed constant for the duration of the program (it is a "calibration" matrix of sorts that is loaded from a file on the host computer's disk at the beginning of the program's execution and does not change until a new calibration is done). The 1D array that is being multiplied by it will be populated by data from a sensor. As such, I would ideally like to create the calibration array only once, then stay in a state where I'm constantly reading data from my sensor and multiplying it by my calibration matrix. I believe that if the program is frequently switching to a new state that reads bits of the calibration matrix from the DMA FIFO, I wouldn't be able to achieve the rapid processing that I'm aiming for with the FPGA.

GregS, I was initially trying to use that particular VI, but as you said, it has the 50x50 limitation.

Dragis, I've looked into those interfaces rather than the Row Vector and Column Vector interfaces; however, the latency jumps from ~1500 to over 300,000 clock cycles, which is too slow for this application.

I've considered the possibility of having 288 separate arrays and wiring them manually through switches so that I can cycle through them. Would that perhaps be the best (although also the most painful) solution that will retain the processing speed I'm hoping to achieve?

MrQuestion · 05-20-2014

When you replace an array element in a fixed-size array, the output is a fixed-size array.

If timing is a consideration, you can always create a new clock and do your multiplication in a different clock domain.

There are many ways to optimize designs depending on your needs. The Linear Algebra Matrix Multiply that you are using is built out of fabric, and a lot of it, so you will probably hit an upper clock limit fairly fast. I'm kind of surprised that you aren't already hitting timing violations in a 40 MHz clock domain.

Of course once you create the calibration matrix you wouldn't create it again. It would be stored in a shift register. You would never go back to that state again, unless you want to.

What type of FPGA are you using? There are many tricks to manage and manipulate large datasets in LabVIEW FPGA.

Depending on your FPGA, you can really take advantage of the DSP48E multiply-and-accumulate function with a dot product algorithm. The DSP48E has no problem chugging through data at a much higher clock speed.
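To show what a multiply-and-accumulate dot product looks like as an algorithm, here is a small Python model. It only mirrors the arithmetic (one multiply-add per incoming element, one result per completed row) and does not represent DSP48E timing or the LabVIEW block diagram; `streaming_matvec` is a hypothetical name, and the matrix is assumed to arrive row-major, one element at a time.

```python
def streaming_matvec(element_stream, vec):
    """Yield one dot product per matrix row from a row-major element stream."""
    cols = len(vec)
    acc = 0  # running accumulator for the current row
    col = 0  # position within the current row
    for a in element_stream:
        acc += a * vec[col]  # one multiply-accumulate per element
        col += 1
        if col == cols:
            yield acc  # row finished: emit result and reset
            acc = 0
            col = 0


print(list(streaming_matvec(iter([1, 2, 3, 4, 5, 6]), [1, 1, 1])))  # [6, 15]
```

Only a single accumulator and the vector need to be held at once in this form, since no full row of the matrix is ever assembled into an array.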

Also, are you familiar with the LabVIEW FPGA IP Builder? If you do take the approach of splitting the matrix into 288 separate arrays, the FPGA IP Builder will save you a lot of time.

Search for Figure 21 at http://www.ni.com/white-paper/14036/en/

dfjuggler · 08-11-2014

I'm attempting a similar thing. I wrote the following code, which doesn't work as expected.

I was expecting this to work and I can't see why the first result is just repeated. I guess it has something to do with the hand-shaking, but I have no clue how to make it work.

Can someone help out? Thanks!

Matt_L · 08-12-2014

Hi dfjuggler,

So this part stood out to me:

Can you try just passing the array out? I don't think you should need to index. Also, since you're indexing at 0 each time, you're passing the same value into the new array.
