05-28-2013 06:39 PM
Mathguy,
I noticed that the CUFFT VI used in the example runs in C2C mode, even though the incoming data is real and is converted to CSG (complex single) data first. Is there a reason we do it this way? Does this mean that C2C mode is as fast as R2C mode? If so, I'll use C2C mode too, since I haven't figured out how to make in-place R2C mode work properly: it keeps giving results with an unexpected length.
BTW, an unrelated question. I managed to get CUSPARSE working, but there's a problem with the cusparseScsrmm_v2 function. I was trying to multiply an n*n matrix by a tall n*m matrix, but I have to set the ldc parameter (leading dimension of the output matrix C) to n+1 to make it work; otherwise it returns error 3 (an unacceptable ldc value or something similar). The problem is that the output matrix then has a column of zeros at the end, which gets in the way because I want to do more calculations on the card. Is there a fast way to delete this column without transferring the data out of graphics memory?
Thanks!
Zhihao
05-29-2013 11:21 AM
I converted from real to complex to simplify the example. I would not expect the complex FFT to be faster than the real FFT unless (a) the out-of-place version was in use and (b) the combined size of the input signal and output spectrum consumed most of the GPU memory.
As you mentioned, doing the operation in-place is tricky. I have done it successfully for each interface, but I had to read the CUFFT documentation carefully. You don't have to use the in-place version if you're willing to use more storage on the GPU; just use the default FFT instance that's selected when you wire in a real GPU buffer.
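A quick storage comparison, as a minimal C sketch. It assumes one 1D transform of n single-precision points, the n/2+1 complex outputs per R2C row described in the CUFFT reference, FFTW-style padding for in-place R2C (the real input row padded to 2*(n/2+1) floats), and an out-of-place C2C transform for the convert-to-complex approach. The function names are hypothetical helpers, not CUFFT calls.

```c
#include <assert.h>
#include <stddef.h>

/* Complex outputs per row of an R2C transform of n real points. */
static size_t r2c_out_len(size_t n) {
    assert(n > 0);
    return n / 2 + 1;
}

/* Out-of-place R2C: n floats in, n/2+1 complex values (2 floats each) out. */
static size_t bytes_r2c_out_of_place(size_t n) {
    return n * sizeof(float) + r2c_out_len(n) * 2 * sizeof(float);
}

/* In-place R2C: a single buffer of 2*(n/2+1) floats (assumed FFTW-style padding),
   so the complex result fits where the padded real input was. */
static size_t bytes_r2c_in_place(size_t n) {
    return 2 * r2c_out_len(n) * sizeof(float);
}

/* Convert real -> complex, then out-of-place C2C: n complex in, n complex out. */
static size_t bytes_c2c_out_of_place(size_t n) {
    return 2 * n * 2 * sizeof(float);
}
```

With 4-byte floats and n = 2049, this gives 16396 bytes for out-of-place R2C, 8200 bytes for in-place R2C and 32784 bytes for C2C after conversion, so the C2C route needs roughly twice the GPU storage of out-of-place R2C.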
As for CUSPARSE, I don't have much experience with this library, so I can't offer much help. Searching the developer forums on NVIDIA's site might help; if you don't find anything, posting the question there may get some attention. I know that I found some issues in CUBLAS involving leading-dimension inputs, but these were fixed in CUDA v4.1.
If you were dealing with an additional column in a dense matrix, removing it is as simple as reducing the column dimension by one, because matrices are stored column-wise. I don't know how the matrix data is stored in CUSPARSE. If the entries really are zeros, it seems they would not be physically stored (given the sparse format), so manipulating the column dimension might work there too.
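To illustrate the dense case, a minimal C sketch assuming a column-major matrix with no padding (leading dimension equal to the row count). Element (i, j) lives at i + j*rows, so the first cols-1 columns occupy exactly the first rows*(cols-1) elements, and dropping the last column only changes the column count passed to later calls. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Linear index of element (i, j) in a column-major matrix with leading dimension ld. */
static size_t cm_index(size_t i, size_t j, size_t ld) {
    return i + j * ld;
}

/* Number of leading buffer elements that form the rows x (cols-1) matrix left after
   dropping the last column. Valid only when ld == rows (no padding): the kept data
   is a prefix of the buffer, so no copy is needed. */
static size_t trimmed_element_count(size_t rows, size_t cols) {
    assert(cols > 0);
    return rows * (cols - 1);
}
```

The prefix trick does not apply when the unwanted slot sits inside each column (a padded leading dimension), because then the discarded entries are interleaved with the kept ones.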
When I find issues like this with a 3rd party library, I try to test different CUDA versions as well as different OS versions (e.g. Windows 32-bit vs 64-bit). I have found that bugs don't always show up in all cases.
Darren
05-29-2013 01:19 PM
Darren,
The information is very helpful. A further question: since I'm running an FFT on a single-precision (float) matrix in graphics memory, I'm trying to use the out-of-place R2C FFT. However, the output is always unexpected: the first n/2 data points (where n is the number of points in one row) seem correct, but the following data looks like the beginning half of another transformed row, as shown in the following figure:
In the picture, n=2049. Does this mean that the out-of-place R2C transform accepts n points and returns only n/2 outputs? I know the second half is redundant, but I've never seen an FFT function throw it away.
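The symptom matches the documented R2C output size of n/2+1 complex values per row (1025 for n = 2049), with the rows of a batched result stored back to back. Reading a row as if it had n outputs therefore runs into the next row after n/2+1 values. A minimal C sketch, assuming that packed output layout; the helpers are hypothetical, and cfloat only mirrors the two-float layout of cufftComplex. If the full spectrum is needed, the missing half follows from Hermitian symmetry of a real input's spectrum, X[n-k] = conj(X[k]).

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float re, im; } cfloat;  /* two floats, like cufftComplex */

/* Offset (in complex elements) of bin k of row r in a batched out-of-place R2C
   result, assuming each row holds n/2+1 bins and rows are stored back to back. */
static size_t r2c_out_offset(size_t r, size_t k, size_t n) {
    assert(k <= n / 2);
    return r * (n / 2 + 1) + k;
}

/* Rebuild the full n-point spectrum of a real signal from its n/2+1 stored bins
   using Hermitian symmetry: X[n-k] = conj(X[k]). */
static void expand_half_spectrum(const cfloat *half, cfloat *full, size_t n) {
    size_t h = n / 2 + 1;
    for (size_t k = 0; k < h; ++k)
        full[k] = half[k];
    for (size_t k = h; k < n; ++k) {   /* n-k runs from n-h down to 1, always < h */
        full[k].re = half[n - k].re;
        full[k].im = -half[n - k].im;
    }
}
```

For n = 2049, bin 0 of the second row sits at offset 1025, which is where the "beginning of another row" appears when a row is read with length n.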
About the extra column: I tried reading the data into a correctly sized matrix, but then the zero at the end of the first row ended up as the first point of the second row, and every following row was shifted accordingly. Is there a function in the CUDA libraries that can change the size of a matrix? I dug into the CUBLAS library but did not find one.
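This behavior is consistent with a padded leading dimension rather than a true extra column: with ldc = n+1, each stored column of the column-major result has one spare slot, and viewing the buffer row by row shows those slots as a trailing column of zeros. Because the spare slots are interleaved with the data, shrinking the dimension only shifts them; removing them takes a strided copy. A minimal host-side C sketch of that copy follows, with a hypothetical helper name. On the device, the CUDA runtime's cudaMemcpy2D with cudaMemcpyDeviceToDevice (source pitch ld*sizeof(float), destination pitch and width rows*sizeof(float), height cols) should perform the same copy without leaving GPU memory, although this has not been verified with the LabVIEW toolkit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a column-major rows x cols matrix stored with leading dimension ld
   (ld >= rows, e.g. ld = rows + 1 as with the ldc workaround) into a tightly
   packed buffer with leading dimension rows. Copying column by column drops the
   spare slot at the end of each column instead of shifting it into the next one. */
static void compact_leading_dim(const float *src, size_t ld, float *dst,
                                size_t rows, size_t cols) {
    assert(ld >= rows);
    for (size_t j = 0; j < cols; ++j)
        memcpy(dst + j * rows, src + j * ld, rows * sizeof(float));
}
```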
I also posted on the CUDA forum, but no one there seems to know. Switching to the x64 library might be a good idea; I may try installing x64 LabVIEW on another machine later.
Thank you!
Zhihao
11-24-2013 08:26 AM
Dear MathGuy and Zhihao,
How would I do multiple R2C FFTs at once (a batched R2C FFT)? Could you please post a snapshot of example LabVIEW code to explain? I tried to implement this but don't know where to start.
Thanks.
Xsword.
11-25-2013 09:49 AM
The pattern is the same for all FFTs. What differs is how to deal with the 'packed' data when doing an in-place R2C or C2R operation. Details can be found in NVIDIA's CUFFT library reference online.
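For the batched case, the per-row layout is the main thing to get right. A minimal C sketch, assuming 1D transforms of n points, rows stored back to back, and the FFTW-style padded layout for in-place R2C (the CUFFT reference describes its data layouts in detail, so check which one your plan uses). The idist/odist values correspond to the input and output distances that cufftPlanMany takes; the struct and helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Distances between consecutive rows of a batched 1D R2C transform.
   idist counts real (float) elements, odist counts complex elements. */
typedef struct { size_t idist, odist; } r2c_batch_layout;

static r2c_batch_layout r2c_layout(size_t n, int in_place) {
    assert(n > 0);
    r2c_batch_layout L;
    L.odist = n / 2 + 1;                   /* n/2+1 complex bins per row */
    L.idist = in_place ? 2 * L.odist : n;  /* in-place rows are padded */
    return L;
}

/* Copy `batch` contiguous rows of n reals into a buffer laid out for an in-place
   batched R2C transform (row stride 2*(n/2+1) floats), zeroing the padding slots. */
static void pack_for_in_place(const float *rows, float *buf, size_t n, size_t batch) {
    size_t stride = 2 * (n / 2 + 1);
    for (size_t b = 0; b < batch; ++b) {
        memcpy(buf + b * stride, rows + b * n, n * sizeof(float));
        for (size_t k = n; k < stride; ++k)
            buf[b * stride + k] = 0.0f;
    }
}
```

After an in-place transform, the n/2+1 complex outputs of row b start at complex index b*odist of the same buffer, which is float index b*idist.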