10-10-2017 07:01 AM
Hi there,
I currently use CUDA in LabVIEW for matrix multiplication, as shown in the attachment.
The GPU is used to process a large amount of data.
So I wonder: how can I multiply a large number of vectors (each one individually) by an N*N matrix to produce a new set of vectors?
like this
x0,y0,z0 1st vector
x1,y1,z1 2nd vector
:
:
:
x1000,y1000,z1000
These vectors are stored as a 3*1000 array.
How are these multiplications (vector by N*N matrix) distributed across the cores of the GPU?
Any advice, please.
10-10-2017 07:59 AM
No idea about the specifics of CUDA,
but to split the problem into smaller parts,
you could just let different parts of the vector multiplication run on different cores/processes/magic.
By which I mean: the matrix multiplication rule is "row by column" (loosely translated),
so you let the multiplication of row 1 by column 1 run on the first core,
row 2 by column 1 on the next, and so forth.
(Assuming right-sided vector multiplication, but left-sided multiplication would be similar.)
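As a rough illustration of the row-by-column split described above, here is a minimal sketch in plain Python. It only simulates the idea: each (row, column) pair is an independent task that a GPU could hand to its own thread. The function names `output_element` and `multiply_by_independent_tasks` are hypothetical and are not part of CUDA or of any LabVIEW toolkit.

```python
def output_element(A, V, row, col):
    """Work of one hypothetical 'thread': a single row-by-column dot product."""
    return sum(A[row][k] * V[k][col] for k in range(len(V)))


def multiply_by_independent_tasks(A, V):
    """Compute A * V by treating every output element as a separate task."""
    rows, cols = len(A), len(V[0])
    # Each (row, col) pair depends only on one row of A and one column of V,
    # so no task needs the result of any other task.
    tasks = [(r, c) for r in range(rows) for c in range(cols)]
    result = [[0] * cols for _ in range(rows)]
    for r, c in tasks:  # run serially here; conceptually these run in parallel
        result[r][c] = output_element(A, V, r, c)
    return result


# Example data (assumed for illustration): a 3x3 scaling matrix and
# two vectors, (1, 2, 3) and (4, 5, 6), stored as the columns of V.
A = [[1, 0, 0],
     [0, 2, 0],
     [0, 0, 3]]
V = [[1, 4],
     [2, 5],
     [3, 6]]
R = multiply_by_independent_tasks(A, V)
```

With 1000 vectors of length 3 this gives 3000 independent tasks, which is the kind of work a GPU library can spread over its cores without the caller dividing anything by hand.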
10-10-2017 10:32 AM
@jwscs wrote:
No idea about the specifics of CUDA,
but to split the problem into smaller parts,
you could just let different parts of the vector multiplication run on different cores/processes/magic.
By which I mean: the matrix multiplication rule is "row by column" (loosely translated),
so you let the multiplication of row 1 by column 1 run on the first core,
row 2 by column 1 on the next, and so forth.
(Assuming right-sided vector multiplication, but left-sided multiplication would be similar.)
But the vector-matrix multiplication is done internally on the GPU and I can't divide it!
I want to multiply each vector by a specified matrix, such as A, and store the result back in the vector.
10-10-2017 11:21 AM - edited 10-10-2017 11:22 AM
Why did you upload that bad quality screenshot? Even worse, packed in a zip. Can you upload the VI itself?
10-10-2017 11:55 AM - edited 10-10-2017 11:55 AM
@Blokk wrote:
Why did you upload that bad quality screenshot? Even worse, packed in a zip. Can you upload the VI itself?
Oh, I'm sorry about that. I have attached the VI here.
Please help me.
10-10-2017 12:24 PM - edited 10-10-2017 12:52 PM
OK, let's forget CUDA for a while and use only the CPU first. As I understand it, you want to multiply 1000 3-dimensional vectors by a single 3x3 matrix. What you could do: bundle the 1000 vectors (1D arrays in LabVIEW) into a 2D array, transpose it, and finally multiply the single matrix by this 3x1000 matrix. The result matrix will contain the result vectors as columns. See this snippet below. So I imagine you could do the same with CUDA and get the results in one step.
Edit: actually, for 1000 such small vectors you do not need CUDA. If you do not need to scale this operation up later, just use the CPU.
Edit 2: "these vectors are stored as a 3*1000 array." OK, so you already have the vertex matrix. Just multiply the two matrices and you get the results in the columns.
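To make the batching idea concrete, here is a minimal sketch in plain Python (not LabVIEW and not CUDA) showing that one matrix-matrix product over a 3xN array gives the same vectors as N separate matrix-vector products. The helper names `mat_vec` and `mat_mat` and the sample data are assumptions made for this illustration.

```python
def mat_vec(A, v):
    """Multiply matrix A by a single vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def mat_mat(A, B):
    """Multiply matrix A by matrix B (row-by-column rule)."""
    cols = list(zip(*B))  # columns of B
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


# Assumed example: a 90-degree rotation about the z axis and three vertices.
A = [[0, -1, 0],
     [1,  0, 0],
     [0,  0, 1]]
vectors = [[1, 0, 0], [0, 1, 0], [2, 3, 4]]

# Transpose so that each vector becomes one column of a 3xN matrix.
V = [list(col) for col in zip(*vectors)]

batched = mat_mat(A, V)                          # one product for all vectors
one_by_one = [mat_vec(A, v) for v in vectors]    # N separate products

# Read the result vectors back out of the columns of the batched result.
batched_columns = [list(col) for col in zip(*batched)]
```

Here `batched_columns` and `one_by_one` hold the same vectors, so the whole set can be transformed with a single call to whatever matrix multiplication routine is available, on the CPU or the GPU.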
10-10-2017 03:01 PM - edited 10-10-2017 03:06 PM
@Blokk wrote:
OK, let's forget CUDA for a while and use only the CPU first. As I understand it, you want to multiply 1000 3-dimensional vectors by a single 3x3 matrix. What you could do: bundle the 1000 vectors (1D arrays in LabVIEW) into a 2D array, transpose it, and finally multiply the single matrix by this 3x1000 matrix. The result matrix will contain the result vectors as columns. See this snippet below. So I imagine you could do the same with CUDA and get the results in one step.
Oh yes, with Build Array. Thank you for your help.
@Blokk wrote:
Edit: actually, for 1000 such small vectors you do not need CUDA. If you do not need to scale this operation up later, just use the CPU.
In fact, I have a 3D object with hundreds of thousands of vertices, and the A matrix isn't constant; it is produced after many operations. Why don't I need the GPU? I want to compare the GPU with the CPU. Can I get a significant difference in execution times?
Secondly, I would like to ask about real-time operation on the GPU. Is it supported if I import a real moving object, or if I change any parameter in the matrix?
10-10-2017 03:31 PM
Oh yes, with Build Array. Thank you for your help.
Blokk wrote: Edit: actually, for 1000 such small vectors you do not need CUDA. If you do not need to scale this operation up later, just use the CPU.
In fact, I have hundreds of thousands of vertices for a 3D object. Also, the A matrix isn't constant; it is produced after many operations. Why did you say I do not need CUDA? If I compare the execution time between the CPU and the GPU, will I get any significant difference?
Secondly, I would like to ask about real-time execution on the GPU, for example if I change any parameter in the matrix or import a moving 3D object (i.e. variable vertices). Is this supported in LabVIEW when using the GPU toolkit?
10-10-2017 03:51 PM
@ssara wrote:
Oh yes, with Build Array. Thank you for your help.
Blokk wrote: Edit: actually, for 1000 such small vectors you do not need CUDA. If you do not need to scale this operation up later, just use the CPU.
In fact, I have hundreds of thousands of vertices for a 3D object. Also, the A matrix isn't constant; it is produced after many operations. Why did you say I do not need CUDA? If I compare the execution time between the CPU and the GPU, will I get any significant difference?
Secondly, I would like to ask about real-time execution on the GPU, for example if I change any parameter in the matrix or import a moving 3D object (i.e. variable vertices). Is this supported in LabVIEW when using the GPU toolkit?
Since you already have your vectors in a 2D array, you do not need the Build Array function, as I wrote in the edit.
I said you do not need CUDA for only 1000 vectors. If you need to scale up, then you might need it. But I already explained this, so why do you ask?
I am not really familiar with LabVIEW CUDA, so I can't really help further. I imagine you can hold the two 2D arrays in GPU memory and only update their contents as needed. But since I have no idea what your actual procedure is, I cannot really help. One important thing is to minimize the frequency of data copies between CPU RAM and GPU RAM, since these copies slow things down.
10-10-2017 04:26 PM
@Blokk wrote:
Since you already have your vectors in a 2D array, you do not need the Build Array function, as I wrote in the edit.
OK, I understand that. The vectors are not yet in the 2D form I need.
@Blokk wrote:
I said you do not need CUDA for only 1000 vectors. If you need to scale up, then you might need it. But I already explained this, so why do you ask?
Sorry for the misunderstanding. I need to scale up.
Thank you.