01-16-2014 01:48 AM
Hi everyone, I am interested in CUDA, so I downloaded Xsword's code to try the FFT. However, when I change the batch number, the program gives almost entirely wrong results; only the result of the first batch is correct. The code is attached. Could someone tell me where I made a mistake in the code? Thanks.
03-09-2014 08:48 PM
Can someone help me?
03-10-2014 10:03 AM
Because CUFFT is based on a C API, all calls to do the (inverse) FFT use a void pointer (void*) for the input signal (spectrum) and output spectrum (signal).
LabVIEW has no concept of a general void data pointer, so mapping the overloaded cuFFT function calls into LabVIEW requires special typing. Your instincts are correct regarding in-place operations on SGL or CSG data. The CUFFT API has only a single function because callers can pass either SGL or CSG data to it. In LabVIEW, these calls incorporate the type (either SGL in and out, or CSG in and out). The only other choice is to support just one of the types, but that means someone's use case will be left out.
In terms of the CSG inplace version, it is not the same as the complex FFT case because the output is packed in the first N/2 CSG elements. It does not produce N CSG elements where the imaginary components are set to zero.
Regarding your example recommendation, there are many applications that use real FFTs but as you see, they are far more complex to handle because of all of the inplace options exported by CUFFT. In this sense, they are meant for advanced users and not necessarily suitable as a basic toolkit example.
Because the API for the real (inverse) FFT is more complicated, I'm not sure a crafted example would have covered what you needed and prevented the discussion thread we built over time. In the end, the toolkit is not designed to teach someone how to use CUDA/CUBLAS/CUFFT functions but to help someone already familiar with them call them safely from LabVIEW. Examples can be a hindrance in this regard, as some may feel they don't need to be familiar with the CUDA interfaces to call the functions from LabVIEW.
03-10-2014 10:26 AM
I wish I could supply the modified version of the VI captured here but I can't since your version of LV won't be able to load it. The problem you are running into is very common when trying to do the real FFT in-place. I did not try to correct the padding needed for the input signals. Instead, I changed to use the non-inplace version (SGL->CSG) which uses different GPU buffers for the signal and spectrum.
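To make the padding issue concrete, here is a minimal sketch in plain Python (not LabVIEW or CUDA code). It assumes the default in-place layout described in the cuFFT documentation, where each signal of N reals must occupy 2*(N/2+1) real slots so that its N/2+1 complex outputs fit in the same buffer. The function names are hypothetical and chosen only for illustration.

```python
def r2c_inplace_layout(n, batch):
    """Return the padded length and per-signal start offsets (in real elements)."""
    complex_per_signal = n // 2 + 1        # unique bins of an n-point real FFT
    padded_reals = 2 * complex_per_signal  # each complex bin occupies two reals
    offsets = [b * padded_reals for b in range(batch)]
    return padded_reals, offsets


def pad_signals(signals):
    """Copy equal-length real signals into one flat buffer with padding per signal."""
    n = len(signals[0])
    padded_reals, offsets = r2c_inplace_layout(n, len(signals))
    buffer = [0.0] * (padded_reals * len(signals))
    for signal, start in zip(signals, offsets):
        buffer[start:start + n] = signal   # the trailing slots stay as padding
    return buffer


padded_reals, offsets = r2c_inplace_layout(8, 5)
unpadded_offsets = [b * 8 for b in range(5)]
print(padded_reals, offsets, unpadded_offsets)
```

For N = 8 and batch = 5, the padded signals start at 0, 10, 20, 30, 40, whereas tightly packed signals start at 0, 8, 16, 24, 32. Only the first offset agrees, which would be consistent with only the first batch coming out right when the input is not padded.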
Unless you have very limited resources on the GPU, this should offer better performance. How much may or may not be meaningful to your use case. To do this I allocate an extra CSG buffer up front, pass it to the real FFT call which now accepts SGL and CSG inputs, and free it at the end.
The output still contains only half of the spectrum for each signal, but your N = 8, batch = 5 case produces results equivalent to the CPU version.
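If the full spectrum is needed for a comparison with a CPU result, the missing half can be rebuilt from conjugate symmetry. The following is a rough sketch in plain Python, assuming the half spectrum holds the first N/2+1 bins (the convention documented for cuFFT). The helpers `dft` and `expand_half_spectrum` are hypothetical names used for illustration, and the naive DFT stands in for a CPU reference.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT, used here only as a CPU reference."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def expand_half_spectrum(half, n):
    """Rebuild all n bins from the first n//2 + 1 using X[n-k] = conj(X[k])."""
    full = list(half)
    for k in range(n // 2 + 1, n):
        full.append(half[n - k].conjugate())
    return full


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
reference = dft(signal)
half = reference[: len(signal) // 2 + 1]   # what a real FFT would return
rebuilt = expand_half_spectrum(half, len(signal))
max_error = max(abs(a - b) for a, b in zip(rebuilt, reference))
print(len(half), len(rebuilt), max_error < 1e-9)
```

The symmetry holds only for real-valued input, which is why the real FFT can skip the redundant bins in the first place.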
03-10-2014 10:58 AM
I also encountered your problem after I posted the above code, but I have not had time to resolve it. As I am also a beginner in GPU computation and the GPU Analysis Toolkit, it is really not easy for me to understand Mathguy's explanation in a short period of time.
I can do out-of-place r2c and c2c FFTs using the GPU Analysis Toolkit now, but I am still curious about how to do the in-place r2c computation, as I want to save GPU memory when processing very large data sets.
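As a rough guide to how much memory the in-place variant would save, here is a back-of-the-envelope sketch in plain Python. It assumes 4 bytes per SGL value, 8 bytes per CSG value, and N/2+1 complex outputs per signal, and it ignores any work area the FFT library allocates internally. The function name is hypothetical.

```python
SGL_BYTES = 4  # single-precision real
CSG_BYTES = 8  # single-precision complex (two SGL values)


def r2c_gpu_bytes(n, batch):
    """Estimate signal/spectrum buffer bytes for out-of-place vs in-place R2C."""
    bins = n // 2 + 1
    # Out-of-place: separate SGL input and CSG output buffers.
    out_of_place = batch * (n * SGL_BYTES + bins * CSG_BYTES)
    # In-place: one buffer, padded so the complex output fits.
    in_place = batch * bins * CSG_BYTES
    return out_of_place, in_place


print(r2c_gpu_bytes(8, 5))  # → (360, 200)
```

Under these assumptions the in-place version needs a little more than half the buffer memory of the out-of-place version, at the cost of having to pad the input correctly.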
08-15-2014 08:11 AM
I attach an R2C out-of-place FFT example that processes a 2D array of real input, following Mathguy's guidance. My environment is LabVIEW 2012 64-bit, GPU Analysis Toolkit 2012, and Win7 64-bit. I hope it is useful to those who need it.
08-15-2014 08:42 AM
01-19-2016 07:19 AM
I also need a 1D Fourier transform, like Xsword, so I tried his out-of-place example to understand how I can adjust his VI for my application. But there is one problem: I don't get any output. I'm working with LabVIEW 2015 on a PC with Win7. I have already tested that the CUDA Toolkit is installed correctly using example code provided by NVIDIA. With these examples everything worked fine. At the moment I am not able to figure out why the VI isn't working properly. I attached a screenshot of my output with Xsword's example and a screenshot of the output I obtain when I use the 2D FFT example provided by NI.
Could someone help me with this issue?
Thanks in advance.
EDIT: I solved the problem. I used version 7.5 of the CUDA Toolkit, but the GPU Analysis Toolkit cannot work with it, so I installed version 4.1 and now everything is working.
Edit: After an additional try (I didn't change anything) I got an error message:
Error code: -359631
call to cudaMalloc in cudart32_75.dll.
NVIDIA provides the following information on this error condition:
cudaErrorUnknown = 30
This indicates that an unknown internal error has occurred.
library version supplying error info:
The following are details specific to LabVIEW execution.
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\bin
-> lvcuda.lvlib:CUDA SGL Device Ptr.lvclass:Allocate Memory.vi:6140001
-> r2c_out-o-f-place_fft_ example.vi
Most NVIDIA functions execute asynchronously. This means the function that generated this error information may not be the function responsible for the error condition.
If the functions are from different NVIDIA libraries, the detailed information here is for a unrelated error potentially.