10-10-2017 10:41 PM
If I find some time later, I can put together a benchmarking VI, CPU vs GPU. For sure, up to a certain size you are better off with the CPU, since data copies between the video card and the host RAM also take time.
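Just to make that copy-overhead point concrete, here is a rough CUDA C sketch (not LabVIEW; the toolkit hides this, but it is more or less what happens underneath) that times only the host-to-device upload with CUDA events. The array size and contents are arbitrary placeholders:

```c
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const size_t n = 100000;                 /* 100k elements, arbitrary */
    const size_t bytes = n * sizeof(float);

    float *h = (float *)malloc(bytes);
    for (size_t i = 0; i < n; ++i) h[i] = (float)i;

    float *d;
    cudaMalloc((void **)&d, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    /* Time only the host->device copy, no math at all. */
    cudaEventRecord(start, 0);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("upload of %zu floats: %.3f ms\n", n, ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    free(h);
    return 0;
}
```

Any fair CPU vs GPU comparison has to include this transfer time in the GPU total, not just the kernel time.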
Besides, modern CPUs are multicore, so you could even parallelize your matrix multiplications on the CPU; I guess LabVIEW can even do this automatically in some cases.
So I will try to find time to put together some benchmarks; I got interested 🙂 I have a massive gaming NVIDIA card and an Intel i7 CPU, so I just need the time to install the CUDA drivers...
10-10-2017 11:59 PM
Ok, I have installed CUDA on my home PC. If I find time in the evening, I will put together some test code...
10-11-2017 01:15 PM
By the way, do you use the 32-bit or the 64-bit version of LabVIEW?
I just realized that this issue with the GPU Toolkit still exists:
https://forums.ni.com/t5/LabVIEW/CUDA-Matrix-Multiplication-Fails/m-p/3262686/highlight/true#M951836
Since I have LV 32-bit installed on a 64-bit Windows 10 OS, along with CUDA 9.0 (which has 64-bit support only), I cannot use the CUDA libraries: no error is shown, but the CUBLAS version comes back as 0.0, indicating LabVIEW cannot load the x64 cuBLAS DLL. So if the VI you posted does not work (the result is an array of zeroes), this explains it. If you use LV 64-bit, the CUDA GPU Toolkit should function fine. So sorry, I cannot do the benchmarking yet; first I will need to install the LV 64-bit version when I find more time... (I do not want to follow the "hack" mentioned in the link above, manually copying 32-bit DLLs from an older CUDA Toolkit version.)
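In case anyone wants to check the same thing outside LabVIEW: a minimal C sketch against the cuBLAS C API (cublasCreate / cublasGetVersion, which I assume is roughly what the toolkit's version query wraps) that verifies the DLL actually loads and reports a nonzero version:

```c
#include <cublas_v2.h>
#include <stdio.h>

int main(void)
{
    cublasHandle_t handle;
    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS) {
        printf("cublasCreate failed -- DLL or bitness problem?\n");
        return 1;
    }

    int version = 0;
    cublasGetVersion(handle, &version);  /* should be nonzero on a working install */
    printf("cuBLAS version: %d\n", version);

    cublasDestroy(handle);
    return 0;
}
```

A 32-bit build of this against the 64-bit cuBLAS DLL would fail right at load time, which is the same mismatch LabVIEW 32-bit runs into.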
10-17-2017 01:13 PM
I installed LV 2017 64-bit, and now it works fine with CUDA 9.0. I got cuBLAS working fine too.
So to recall our discussion: the point with GPU calculations is that it takes time to upload/download data to/from the GPU. I just made two simple test VIs, but I am not even close to considering myself a skilled benchmarker, so do not take these values too seriously 🙂
The first snippet uses just the CPU, the second the GPU. As you can see, for 100k vectors the matrix multiplication is actually faster on the CPU if we compare total operation times. I played with 10M vectors too (100M killed my GPU VI for some reason, overload?); in that case the total execution times are in the same range.
However, if you can write smart code which keeps as many operations on the GPU as possible and minimizes the frequency of data copies between the host and the GPU, you can gain a lot of speed! A sketch of the idea follows below. But it all depends on your algorithm...
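Here is roughly what that looks like in plain CUDA/cuBLAS C (a sketch of the idea, not the toolkit's actual implementation; the helper name chained_gemm is made up): the inputs are uploaded once, two multiplications are chained on the device, and only the final result comes back to the host.

```c
#include <cublas_v2.h>
#include <cuda_runtime.h>

/* Computes D = (A*B)*C for n x n single-precision matrices.
   Note: cuBLAS assumes column-major storage. Error checking omitted. */
void chained_gemm(const float *A, const float *B, const float *C,
                  float *D, int n)
{
    cublasHandle_t h;
    cublasCreate(&h);

    size_t bytes = (size_t)n * n * sizeof(float);
    float *dA, *dB, *dC, *dT, *dD;
    cudaMalloc((void **)&dA, bytes); cudaMalloc((void **)&dB, bytes);
    cudaMalloc((void **)&dC, bytes); cudaMalloc((void **)&dT, bytes);
    cudaMalloc((void **)&dD, bytes);

    /* One upload of the inputs... */
    cudaMemcpy(dA, A, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dC, C, bytes, cudaMemcpyHostToDevice);

    const float one = 1.0f, zero = 0.0f;
    /* ...both multiplications stay on the device (T = A*B, D = T*C)... */
    cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &one, dA, n, dB, n, &zero, dT, n);
    cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &one, dT, n, dC, n, &zero, dD, n);

    /* ...and one download of the final result. */
    cudaMemcpy(D, dD, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dA); cudaFree(dB); cudaFree(dC); cudaFree(dT); cudaFree(dD);
    cublasDestroy(h);
}
```

Done naively, the same computation would copy the intermediate A*B back to the host and re-upload it, paying the transfer cost twice for no reason.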
10-17-2017 02:21 PM
But on the GPU you can (maybe?) use SGLs if you don't require the accuracy. On the CPU that is a pain, since the matrix VIs are all made for DBLs. And since uploading/downloading seems to be the bottleneck, the break-even point could be at a much lower number. It's comparing apples with pears, but if speed is important, it might help.
10-17-2017 02:30 PM
wiebe@CARYA wrote:
But on the GPU you can (maybe?) use SGLs if you don't require the accuracy. On the CPU that is a pain, since the matrix VIs are all made for DBLs. And since uploading/downloading seems to be the bottleneck, the break-even point could be at a much lower number. It's comparing apples with pears, but if speed is important, it might help.
Yep, actually older GPUs only supported the SGL data type. The CUDA VIs support SGLs, and the memory upload/download and allocation VIs support lots of data types:
The cuBLAS matrix multiplication VI supports matrices with the following data types: SGL, DBL, CSG, CDB.
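Those four presumably map onto the standard S/D/C/Z naming of the cuBLAS C API: cublasSgemm (SGL), cublasDgemm (DBL), cublasCgemm (CSG), cublasZgemm (CDB). For example, the double-precision call differs from the single-precision one above only in the scalar and pointer types (sketch, assuming the buffers are already resident on the device):

```c
#include <cublas_v2.h>

/* C = A*B for n x n doubles; all buffers already on the device (column-major). */
void gemm_dbl(cublasHandle_t h, const double *dA, const double *dB,
              double *dC, int n)
{
    const double one = 1.0, zero = 0.0;
    cublasDgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &one, dA, n, dB, n, &zero, dC, n);
}
```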