GPU Computing


Inplace R2C FFT problem

Hi everyone, I am interested in CUDA, so I downloaded Xsword's code to try the FFT. However, when I change the batch number, the program gives almost entirely wrong results; only the result of the first batch is correct. The code is attached. Could someone tell me where I made mistakes in the code? Thanks.


Message 11 of 18

Can someone help me?

Message 12 of 18

Because CUFFT is based on a C API, the calls that perform the (inverse) FFT effectively treat the input signal (spectrum) and the output spectrum (signal) as untyped pointers (void*), so the same buffer can be passed as both.

LabVIEW has no concept of a generic void data pointer, so mapping the overloaded cuFFT function calls into LabVIEW requires special typing. Your instincts are correct regarding in-place operations on SGL or CSG data. The CUFFT API needs only a single function because C callers can pass either SGL or CSG data to it. In LabVIEW, these calls incorporate the type (either SGL in and out, or CSG in and out). The only alternative would be to support just one of the types, but that would leave someone's use case out.

As for the CSG in-place version, it is not the same as the complex FFT case, because the output is packed into the first N/2+1 CSG elements. It does not produce the N CSG elements you would get from a complex FFT of the signal with its imaginary components set to zero.

Regarding your suggestion of an example: many applications use real FFTs, but as you have seen, they are far more complex to handle because of all the in-place options CUFFT exposes. In that sense, they are meant for advanced users and are not necessarily suitable as a basic toolkit example.

Because the API for the real (inverse) FFT is more complicated, I'm not sure a purpose-built example would have covered what you needed or prevented the discussion we have had in this thread. In the end, the toolkit is not designed to teach someone how to use the CUDA/CUBLAS/CUFFT functions, but to help someone already familiar with them call them safely from LabVIEW. Examples can work against that goal, because some users may conclude they don't need to be familiar with the CUDA interfaces to call the functions from LabVIEW.

Message 13 of 18

I wish I could supply the modified version of the VI shown here, but I can't, since your version of LabVIEW won't be able to load it. The problem you are running into is very common when trying to do the real FFT in-place. Rather than fixing the padding needed for the input signals, I switched to the out-of-place version (SGL->CSG), which uses separate GPU buffers for the signal and the spectrum.
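A sketch of that padding for the in-place case, based on our reading of the cuFFT data-layout documentation: because the N/2+1 complex outputs overwrite the input, each real signal must occupy 2*(N/2+1) floats rather than N. If the batched signals are stored back to back with no padding, every signal after the first starts at the wrong offset, which would be consistent with only the first batch coming out right. The helpers `inplace_pitch` and `pack_for_inplace_r2c` are hypothetical names for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: floats per signal in an in-place R2C buffer.
 * Each signal must leave room for n/2 + 1 complex outputs = 2*(n/2 + 1) floats. */
static int inplace_pitch(int n) { return 2 * (n / 2 + 1); }

/* Hypothetical helper: copy `batch` back-to-back signals of length n from src
 * into dst, starting each signal on an inplace_pitch(n) boundary.
 * dst must hold batch * inplace_pitch(n) floats; the padding is zeroed. */
void pack_for_inplace_r2c(const float *src, float *dst, int n, int batch)
{
    const int pitch = inplace_pitch(n);
    memset(dst, 0, (size_t)batch * pitch * sizeof(float));
    for (int b = 0; b < batch; ++b)
        memcpy(dst + (size_t)b * pitch, src + (size_t)b * n,
               (size_t)n * sizeof(float));
}
```

With N = 8 the pitch is 10 floats, so for batch = 5 the buffer holds 50 floats and the second signal starts at float 10, not 8. If you describe the layout to cufftPlanMany yourself, that pitch would be the input distance (idist) in floats and N/2+1 the output distance (odist) in complex elements; check the cuFFT documentation for your version before relying on this.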

Unless you have very limited resources on the GPU, this should also offer better performance, although whether the difference is meaningful depends on your use case. To do this, I allocate an extra CSG buffer up front, pass it to the real FFT call (which now accepts SGL and CSG inputs), and free it at the end.

The output still contains only half of the spectrum for each signal, but for your case of N = 8 and batch = 5 it produces results equivalent to the CPU version.
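Since the output holds only half of the spectrum, here is a minimal sketch of recovering the full N-bin spectrum on the host when you need it, using the Hermitian symmetry of a real signal's DFT, X[N-k] = conj(X[k]). `csg` and `expand_half_spectrum` are hypothetical names, and the struct assumes CSG is stored as interleaved (re, im) floats.

```c
#include <stddef.h>

/* Hypothetical type: one complex bin stored as interleaved (re, im) floats, like CSG. */
typedef struct { float re; float im; } csg;

/* Expand the n/2 + 1 bins returned by an R2C transform of a real signal
 * into the full n-bin spectrum using X[n - k] = conj(X[k]) for 0 < k < n. */
void expand_half_spectrum(const csg *half, csg *full, int n)
{
    const int nbins = n / 2 + 1;
    for (int k = 0; k < nbins; ++k)      /* bins 0..n/2 are stored directly */
        full[k] = half[k];
    for (int k = nbins; k < n; ++k) {    /* remaining bins mirror the stored ones */
        full[k].re =  half[n - k].re;
        full[k].im = -half[n - k].im;
    }
}
```

For x = {1, 2, 3, 4}, the half spectrum is {10, -2+2i, -2} and the missing bin X[3] comes out as -2-2i.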

Message 14 of 18

Hi, DestinyS,

I also ran into your problem after I posted the above code, but I have not had time to resolve it further. As I am also a beginner with GPU computation and the GPU Analysis Toolkit, it is not easy for me to understand Mathguy's explanation in a short time.

I can now do out-of-place R2C and C2C FFTs using the GPU Analysis Toolkit, but I am still curious how to do the R2C computation in-place, as I want to save GPU memory when processing very large data sets.
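To put numbers on the memory savings, here is a small C sketch that counts device bytes for the two layouts, assuming 4-byte SGL floats and 8-byte CSG values. The function names are hypothetical. The out-of-place version always needs exactly batch*N*4 more bytes than the padded in-place version (one extra copy of the real input), so for large data the in-place transform needs roughly half the memory.

```c
#include <stddef.h>

/* Hypothetical helper: bytes for one padded in-place buffer,
 * 2*(n/2 + 1) floats per signal (assumes 4-byte float for SGL). */
size_t inplace_bytes(size_t n, size_t batch)
{
    return batch * 2 * (n / 2 + 1) * sizeof(float);
}

/* Hypothetical helper: bytes for separate SGL input and CSG output buffers.
 * One CSG value is two floats. */
size_t out_of_place_bytes(size_t n, size_t batch)
{
    size_t signal   = batch * n * sizeof(float);               /* SGL input  */
    size_t spectrum = batch * (n / 2 + 1) * 2 * sizeof(float); /* CSG output */
    return signal + spectrum;
}
```

For N = 8 and batch = 5, this gives 200 bytes in-place versus 360 bytes out-of-place.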


Message 15 of 18

Dear all,

I have attached an example of an out-of-place R2C FFT that processes a 2D real input array, following Mathguy's guidance. My environment is LabVIEW 2012 64-bit, GPU Analysis Toolkit 2012 and Windows 7 64-bit. I hope it is useful for those who need it.


Message 16 of 18

Here is the final in-place R2C example. Same environment as above.

Message 17 of 18

Hi everyone,

I also need a 1D Fourier transform like Xsword, so I tried his out-of-place example to understand how to adapt his VI for my application. But there is one problem: I don't get any output. I'm working with LabVIEW 2015 on a PC with Windows 7. I have already checked that the CUDA Toolkit is installed correctly using the example code provided by NVIDIA, and those examples worked fine. At the moment I can't figure out why the VI isn't working properly. I attached a screenshot of my output from Xsword's example and a screenshot of the output I get from the 2D FFT example provided by NI.

Could someone help me with this issue?

Thanks in advance.


EDIT: I solved the problem. I was using version 7.5 of the CUDA Toolkit, but the GPU Analysis Toolkit cannot work with it, so I installed version 4.1 and now everything is working.

Picture FFT example NI.PNG

Block Diagram FFT example Xsword.PNG

Picture FFT example Xsword.PNG

Edit: After an additional try (I didn't change anything), I got this error message:

Error code: -359631

call to cudaMalloc in cudart32_75.dll.

NVIDIA provides the following information on this error condition:

cudaErrorUnknown = 30

This indicates that an unknown internal error has occurred.

library version supplying error info:

The following are details specific to LabVIEW execution.

library path:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\bin

call chain:
-> lvcuda.lvlib:CUDA SGL Device Ptr.lvclass:Allocate
-> r2c_out-o-f-place_fft_

Most NVIDIA functions execute asynchronously. This means the function that generated this error information may not be the function responsible for the error condition.

If the functions are from different NVIDIA libraries, the detailed information here is for a unrelated error potentially.

Message 18 of 18