LabVIEW


Multiply vectors by a matrix on the GPU


If I find some time later, I can put together a benchmarking VI, CPU vs. GPU. Up to a certain problem size you are certainly better off with the CPU, since the data copies between the video card and the host RAM also take time.
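
To make that concrete, here is a rough text-form sketch of such a benchmark (plain C with the CUDA runtime and cuBLAS, since a LabVIEW diagram cannot be pasted as text; the size, the all-ones data, and the naive CPU loop are made up for illustration, not taken from any VI):

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main(void)
{
    const int n = 1000;                      /* n x n matrix, length-n vector */
    float *A = malloc((size_t)n * n * sizeof(float));
    float *x = malloc(n * sizeof(float));
    float *y = malloc(n * sizeof(float));
    for (int i = 0; i < n * n; ++i) A[i] = 1.0f;
    for (int i = 0; i < n; ++i)     x[i] = 1.0f;

    /* CPU path: naive loop (column-major, to match cuBLAS) */
    clock_t c0 = clock();
    for (int i = 0; i < n; ++i) {
        float s = 0.0f;
        for (int j = 0; j < n; ++j) s += A[(size_t)j * n + i] * x[j];
        y[i] = s;
    }
    printf("CPU loop:                    %.3f ms\n",
           1000.0 * (clock() - c0) / CLOCKS_PER_SEC);

    /* GPU path: time upload + SGEMV + download as one block */
    float *dA, *dx, *dy;
    cudaMalloc((void **)&dA, (size_t)n * n * sizeof(float));
    cudaMalloc((void **)&dx, n * sizeof(float));
    cudaMalloc((void **)&dy, n * sizeof(float));
    cublasHandle_t h;
    cublasCreate(&h);

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0); cudaEventCreate(&t1);
    cudaEventRecord(t0);
    cudaMemcpy(dA, A, (size_t)n * n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dx, x, n * sizeof(float), cudaMemcpyHostToDevice);
    const float one = 1.0f, zero = 0.0f;
    cublasSgemv(h, CUBLAS_OP_N, n, n, &one, dA, n, dx, 1, &zero, dy, 1);
    cudaMemcpy(y, dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, t0, t1);
    printf("GPU upload + SGEMV + download: %.3f ms\n", ms);

    cublasDestroy(h);
    cudaFree(dA); cudaFree(dx); cudaFree(dy);
    free(A); free(x); free(y);
    return 0;
}
```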

Besides, modern CPUs are multicore, so you could even parallelize your matrix multiplications on the CPU; I believe LabVIEW can even do this automatically in some cases (a sketch of the idea follows below).
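
In text form, that CPU-side parallelism looks roughly like this (C with OpenMP standing in for a LabVIEW For loop with iteration parallelism enabled; the row-major layout and the function name are just for the example):

```c
#include <omp.h>

/* y = A*x with the rows split across cores: each y[i] is independent,
   so this is exactly the kind of loop a parallel For loop can chop up. */
void matvec_parallel(const float *A, const float *x, float *y, int n)
{
    #pragma omp parallel for            /* one chunk of rows per core */
    for (int i = 0; i < n; ++i) {
        float s = 0.0f;
        for (int j = 0; j < n; ++j)
            s += A[(size_t)i * n + j] * x[j];   /* row-major A here */
        y[i] = s;
    }
}
```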

So I will try to find time and put together some benchmarks; I got interested 🙂 I have a beefy NVIDIA gaming card and an Intel i7 CPU, so I just need the time to install the CUDA drivers...

Message 11 of 16

Ok, I have installed CUDA on my home PC. If I find time in the evening, I will put together some test code...

[Image: Example_VI_FP.png]

Message 12 of 16

By the way, do you use the 32-bit or the 64-bit version of LabVIEW?

I just came across this still-unresolved issue with the GPU Toolkit:

https://forums.ni.com/t5/LabVIEW/CUDA-Matrix-Multiplication-Fails/m-p/3262686/highlight/true#M951836

Since I have 32-bit LabVIEW installed on a 64-bit Windows 10 OS, along with CUDA 9.0 (which has 64-bit support only), I cannot use the CUDA libraries: no error is shown, but the cuBLAS version comes back as 0.0, indicating that LabVIEW cannot load the x64 cuBLAS DLL. So if the VI you posted does not work (the result is an array of zeroes), this explains it. If you use 64-bit LabVIEW, the CUDA GPU Toolkit should work fine. Sorry, I cannot do the benchmarking yet; first I will need to install the 64-bit LabVIEW version when I find more time... (I do not want to follow the "hack" mentioned in the link above, that is, manually copying 32-bit DLLs from an older CUDA Toolkit version.)
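
For anyone who wants to reproduce the symptom outside LabVIEW: a minimal C check along these lines shows whether a cuBLAS library matching your process bitness can be initialized at all (my own sketch, not the Toolkit's version VI):

```c
#include <stdio.h>
#include <cublas_v2.h>

int main(void)
{
    cublasHandle_t h;
    /* If the DLL cannot be loaded for this process's bitness,
       you never get a valid handle or a sensible version number. */
    if (cublasCreate(&h) != CUBLAS_STATUS_SUCCESS) {
        fprintf(stderr, "cuBLAS init failed (wrong bitness / missing DLL?)\n");
        return 1;
    }
    int version = 0;
    cublasGetVersion(h, &version);      /* e.g. ~9000 for CUDA 9.0 */
    printf("cuBLAS version: %d\n", version);
    cublasDestroy(h);
    return 0;
}
```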

Message 13 of 16

I installed LabVIEW 2017 64-bit, and now it works fine with CUDA 9.0. I got cuBLAS working fine too.

To recall our discussion, the point with GPU calculations is that it takes time to upload/download data to/from the GPU. I just made two simple test VIs, but I am not even close to considering myself a skilled benchmarker, so do not take these values too seriously 🙂

The first snippet uses just the CPU, the second the GPU. As you can see, for 100k vectors the matrix multiplication is actually faster on the CPU if we compare total operation times. I played with 10M vectors too (100M killed my GPU VI for some reason; overload?); in that case the total execution times are in the same range.

However, if you can write smart code which keeps as many operations on the GPU as possible and minimizes the frequency of data copies between the host and the GPU, you can gain a lot of speed! But it all depends on your algorithm...
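
In text form the pattern is: upload once, iterate on the device, download once. A rough C + cuBLAS sketch (the kIters count and the pointer-swap trick are my own illustration):

```c
#include <cuda_runtime.h>
#include <cublas_v2.h>

/* Repeatedly apply x <- A*x on the device; only two host<->device
   copies in total, no matter how many multiplications run. */
void repeated_matvec(cublasHandle_t h, const float *A_host,
                     float *x_host, int n, int kIters)
{
    float *dA, *dx, *dy;
    cudaMalloc((void **)&dA, (size_t)n * n * sizeof(float));
    cudaMalloc((void **)&dx, n * sizeof(float));
    cudaMalloc((void **)&dy, n * sizeof(float));

    /* One upload... */
    cudaMemcpy(dA, A_host, (size_t)n * n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dx, x_host, n * sizeof(float), cudaMemcpyHostToDevice);

    const float one = 1.0f, zero = 0.0f;
    for (int k = 0; k < kIters; ++k) {
        /* ...many device-side operations... */
        cublasSgemv(h, CUBLAS_OP_N, n, n, &one, dA, n, dx, 1, &zero, dy, 1);
        float *tmp = dx; dx = dy; dy = tmp;   /* swap pointers, no copy */
    }

    /* ...one download at the end. */
    cudaMemcpy(x_host, dx, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dA); cudaFree(dx); cudaFree(dy);
}
```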

[Images: matrix_1.png, simpleTestGPU.png]

Message 14 of 16

But on the GPU you can (maybe?) use SGLs if you don't require the accuracy. On the CPU that is a pain, since the matrix VIs are all made for DBLs. And since uploading/downloading seems to be the bottleneck, the break-even point could be at a much lower number. It's comparing apples with pears, but if speed is important, it might help.

Message 15 of 16

wiebe@CARYA wrote:

But on the GPU you can (maybe?) use SGLs if you don't require the accuracy. On the CPU that is a pain, since the matrix VIs are all made for DBLs. And since uploading/downloading seems to be the bottleneck, the break-even point could be at a much lower number. It's comparing apples with pears, but if speed is important, it might help.


Yep; actually, older GPUs only supported the SGL data type. The CUDA VIs support SGLs, and the memory upload/download and allocation VIs support lots of data types:

[Image: cuda1.png]

The cuBLAS matrix multiplication VI supports matrices with the following data types: SGL, DBL, CSG, CDB.
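
For reference, those four types line up with the standard cuBLAS GEMM families: SGL → cublasSgemm, DBL → cublasDgemm, CSG → cublasCgemm, CDB → cublasZgemm. A minimal single-precision call, which is also the cheapest type to move over the bus, looks like this in C (the wrapper function is mine; cublasSgemm and its arguments are standard cuBLAS):

```c
#include <cublas_v2.h>

/* C (m x n) = A (m x k) * B (k x n); all column-major device arrays */
void sgl_matmul(cublasHandle_t h, const float *dA, const float *dB,
                float *dC, int m, int n, int k)
{
    const float one = 1.0f, zero = 0.0f;
    cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N,
                m, n, k,
                &one,  dA, m,    /* lda = m */
                       dB, k,    /* ldb = k */
                &zero, dC, m);   /* ldc = m */
}
```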

Message 16 of 16