GPU Computing

Inplace R2C FFT problem

Hi, MathGuy,

I am now trying to do multiple 1D R2C in-place FFTs. My input data come from images, and I convert them from U16 to SGL to feed into Download Data.vi after I allocate memory matching the number of signal elements and create the 1D CUFFT_R2C plan. When I subsequently call FFT.vi, I manually select the Real:SGL->CSG instance, but I get an error saying that the spectrum input cannot be left unwired. However, the help document for FFT.vi says that an in-place FFT is executed if the spectrum input is unwired. Why does this happen? How can I perform an in-place R2C FFT?

Untitled-1.jpg

Many thanks.

Xsword.

Message 1 of 18

You are using the wrong instance for the FFT function. Here's a picture of the options and I've circled the two which you might use for the inplace operation.

Inplace SGL-based FFT Options.png

The in-place operation can be defined based on the input type (SGL) or output type (CSG). You'll have to look at NVIDIA's CUFFT library documentation to determine which is more appropriate for your use of data and how you have to prepare/interpret real data stored in complex form or vice versa.

This matters in LabVIEW because G is a strongly typed language and an N-element CSG array is different than a 2*N element SGL array despite using the same amount of memory.
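To illustrate the point about memory, here is a minimal sketch in Python (not LabVIEW) that reinterprets the same bytes first as 2*N single-precision reals and then as N complex values. The helper name `sgl_to_csg` is made up for this example, and it assumes the interleaved (real, imaginary) storage that `cufftComplex` uses.

```python
import struct

def sgl_to_csg(values):
    """Reinterpret 2*N float32 values as N complex numbers (hypothetical helper).

    Assumes interleaved storage: real part first, imaginary part second.
    """
    if len(values) % 2:
        raise ValueError("need an even number of SGL elements")
    raw = struct.pack(f"<{len(values)}f", *values)   # 4 bytes per SGL element
    pairs = struct.iter_unpack("<2f", raw)           # 8 bytes per CSG element
    return raw, [complex(re, im) for re, im in pairs]

raw, csg = sgl_to_csg([1.0, 2.0, 3.0, 4.0])
print(len(raw), csg)  # same byte count, but half as many elements
```

The bytes never change; only the element type and count do, which is roughly what a type cast between the two array types amounts to.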

Message 2 of 18

I followed your recommendation and checked the CUFFT 5.0 documentation again. I found two places that are relevant to my problem:

1. The cufftExecR2C() has the following description:

"cufftExecR2C() (cufftExecD2Z()) executes a single-precision (double-precision) real-to-complex, implicitly forward, CUFFT transform plan. CUFFT uses as input data the GPU memory pointed to by the idata parameter. This function stores the nonredundant Fourier coefficients in the odata array. Pointers to idata and odata are both required to be aligned to cufftComplex data type in single-precision transforms and cufftDoubleComplex data type in double-precision transforms. If idata and odata are the same, this method does an in-place transform."

2. data layout description:

"......Finally, R2C demands an input array of real values and returns an array of nonredundant complex elements. In real-to-complex and complex-to-real transforms the size of input data and the size of output data differ. ...... For in-place transforms the user can specify one of two supported data layouts: native or padded. The first is used for best performance and the latter for FFTW compatibility......."
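As a rough illustration of the size difference described in the quote, the sketch below computes element counts for an in-place R2C transform. It assumes the padded layout, where each length-n real signal must leave room for its n/2 + 1 complex outputs; `r2c_inplace_sizes` is a hypothetical helper, not a CUFFT or toolkit function.

```python
def r2c_inplace_sizes(n, batch=1):
    """Element counts for `batch` real signals of length `n`.

    Assumes the padded in-place layout (hypothetical helper).
    """
    complex_per_signal = n // 2 + 1              # nonredundant coefficients
    floats_per_signal = 2 * complex_per_signal   # each complex value is two floats
    return {
        "complex_out": batch * complex_per_signal,
        "floats_needed": batch * floats_per_signal,
        "padding_floats": batch * (floats_per_signal - n),
    }

print(r2c_inplace_sizes(1024))
```

For the 1024-element sine signal used here, that means 513 complex outputs, so the shared buffer needs 1026 floats rather than 1024.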

I tried to understand the above descriptions and also referred to the "1D Real-to-Complex Transforms" code example in C. Below is my LV code for successive in-place R2C and C2C 1D FFTs on a sine signal of 1024 elements:

Untitled-1.jpg

But the output of the in-place R2C FFT seems wrong compared with that of the in-place C2C FFT, as shown below:

Untitled-2.jpg

My expectation was that the output of the R2C FFT should be exactly the left half of the C2C FFT result, but it is not. I really cannot find an explanation for the difference. Is it related to what you mentioned: "... to determine which is more appropriate for your use of data and how you have to prepare/interpret real data stored in complex form or vice versa"? I checked the CUFFT library documentation several times and could not find a useful explanation.

Could you please point out my mistake?

Xsword

Message 3 of 18

Any advice?

Message 4 of 18

May I seek your help, Mathguy?

Xsword

Message 5 of 18

First, I would implement the R2C as an out-of-place operation to get valid results. This means that you allocate an SGL buffer for the input and a separate CSG buffer for the output. Once this is working, you'll have your baseline for an expected real FFT result.

Your most recent in-place example creates DBL data and converts it to CSG which should interlace the data according to NVIDIA's documentation. This means that you are using CSG as the data type for both input & output. The FFT instance you should call is the CSG inplace version. It looks as if you are calling the SGL inplace version.

Why do I recommend doing an out-of-place version first? Because the library may only pass back half of the spectrum as an optimization. This will differ from the purely complex version of the FFT.
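The relationship between a half spectrum and the full complex spectrum can be sketched with a naive DFT in plain Python. This is only an illustration of the math, not how CUFFT or the toolkit computes anything; the 16-point length and the test frequency of 3 cycles are arbitrary choices for the example.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) forward DFT, for illustration only."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

n = 16
signal = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]

full = dft(signal)            # what a complex transform of the real data yields
half = full[: n // 2 + 1]     # the nonredundant part a real transform keeps

# Every remaining bin is the complex conjugate of a bin already in `half`.
mirror_ok = all(abs(full[n - k] - half[k].conjugate()) < 1e-9
                for k in range(1, n // 2))
print(len(half), mirror_ok)
```

For real input the upper bins carry no new information, which is why a real transform can return only n/2 + 1 coefficients.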

Message 6 of 18

Hi, MathGuy,

Here is my R2C out-of-place program:

ofpr2c01.jpg

I create a sine function with 2048 points as the input and get the correct FFT output:

ofpr2c02.jpg

I will paste my inplace R2C code next.

Xsword

Message 7 of 18

Dear MathGuy,

Please accept my belated wishes for a happy new year!

Following your recommendation, I made the R2C out-of-place example above, which runs the FFT on a sine function and gives a perfectly correct result. Now I am trying to do the R2C in-place FFT on the same signal, but when I select the FFT instance, I am unsure whether to select "real->SGL(inplace)" (the default when Automatic is ticked) or "real->SGL->CSG" (which reports an error when the spectrum input is unwired). I have attached my VI for your reference. Could you point out which setting I should use?

Thank you!

Xsword

Message 8 of 18

You'll have to use one of the options with '(inplace)' in the instance text. You pick SGL or CSG based on which element type is most efficient. Typically, if you are doing lots of pre-processing, you use SGL and do a type cast to CSG when you get the results. If you are going to do a lot of post-processing, choose CSG and type cast the data to CSG in the beginning.

I don't know what other processing you're doing so I can't answer which one is best for you. When you select the data type you want to use (SGL or CSG), the inplace instance that's correct will not cause a 'broken' arrow when wired.

What I don't see in your in-place examples up to this point is the type cast. When you do in-place processing, you will be downloading and uploading data using the same data type (not a mixture of SGL and CSG). Before you download data or after you upload data, you'll need to use the Type Cast function to re-interpret the array data as the appropriate type.

As I mentioned above, you also have to be aware of (a) whether the FFT returns the entire spectrum or only half of it and (b) whether the input data has to be packed or spaced for the real FFT.
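One possible reading of "packed or spaced" in point (b), sketched in Python: in a spaced (padded) layout, each real signal starts at a stride wide enough to hold its own complex output, with unused slots after the input samples. The helper name and the zero fill are assumptions made for this example; check NVIDIA's documentation for the layout your plan actually expects.

```python
def pad_for_inplace_r2c(signals):
    """Copy equal-length real signals into one flat, spaced buffer.

    Hypothetical helper: each signal gets 2 * (n // 2 + 1) float slots,
    enough room for its n // 2 + 1 complex outputs.
    """
    n = len(signals[0])
    stride = 2 * (n // 2 + 1)                  # floats reserved per signal
    buffer = [0.0] * (stride * len(signals))
    for i, sig in enumerate(signals):
        if len(sig) != n:
            raise ValueError("all signals must have the same length")
        buffer[i * stride : i * stride + n] = sig
    return buffer, stride

buf, stride = pad_for_inplace_r2c([[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]])
print(stride, buf)
```

A packed layout, by contrast, would store the two signals back to back with no gap between them.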

Message 9 of 18

Thank you for your clarification on the FFT type instance selection.

I then adopted the first approach you recommended: using SGL and type casting to CSG after the FFT. This gives me the correct half spectrum (which is what I want), as the attached LV code shows.

For the second approach, type casting to CSG first, I haven't figured out how to implement it, and I also wonder how it differs from a C2C FFT. Aren't they essentially the same?

Could you further explain your warning (b)? What does "packed or spaced" data mean for the real FFT?

BTW: Since input data are real-valued in many practical applications, why not include some typical R2C (D2Z) examples in the GPU Analysis Toolkit? It would be very helpful for newcomers trying to understand and use the toolkit!

Xsword

Message 10 of 18