Is there a way to improve this code? (Large 2dFFT Window with Small Output Window)

jiangliang · ‎12-17-2017

Dear All:

I need to run this function a huge number of times, so improving its performance is important to me. Can anyone tell me whether there is something I can do to speed it up?

Thanks!

altenbach · ‎12-17-2017

Can you attach a small VI containing typical data? What does the data look like? Is zero filling appropriate?

The FFT itself could be parallelized using the equivalent function from the MASM toolkit.

Your last function could be replaced by the absolute value function which also gives you r. Why calculate theta if you don't need it? What are your speed requirements?

I have some other ideas, but please post some code&data first.

LabVIEW Champion.

jiangliang · ‎12-17-2017

Thank you for your reply!

My initial motivation for posting this thread was to find out whether it is possible to calculate only part of the data, since I need only a part of the 2D FFT output. If so, it would make my program run much faster.

I'm using this subVI as part of the objective function in a global optimization program. It will run for hours or even days, so any improvement to this objective function would be nice.

I did try the MASM toolkit after I posted the thread, and it does give me a boost of about 20% (on a 2C4T computer).

I also tried using Abs instead of Complex to r and theta, but it doesn't seem to give much of a performance boost (let's say less than 3%).

Also, I noticed that if I disable "shift?", the whole subVI runs much faster, about 50% faster! I think I can do the shift at a later stage.

And I changed DBL to SGL, which also helps. Now I can do an 8192*8192 FFT in about 400 ms.

Any more suggestions are welcome.

altenbach · ‎12-18-2017

jiangliang wrote:
Also, I noticed that if I disable "shift?", the whole subVI runs much faster, about 50% faster! I think I can do the shift at a later stage.

Yes, while abs is faster than "complex to r/theta", it is only a small part of the total calculation here and will not make a large difference.

Instead of using the "shift" input, you could probably shift the data once so the objective function does not need to do any shifting at all. Zero filling does not add any new information.
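One way to read the "no shift in the objective function" idea is sketched below in Python with NumPy (the thread itself uses LabVIEW, so this is an illustration, not the poster's code). It assumes the wanted output is a small window centered on DC; the function names and the `half` parameter are hypothetical. Instead of shifting the whole spectrum and then cropping, the same window is read from the unshifted spectrum with wrapped indices.

```python
import numpy as np

def centered_window_shifted(x, half):
    """Reference: shift the full spectrum, then crop a window around DC."""
    spec = np.fft.fftshift(np.fft.fft2(x))
    cy, cx = x.shape[0] // 2, x.shape[1] // 2
    return spec[cy - half:cy + half, cx - half:cx + half]

def centered_window_unshifted(x, half):
    """Same window, read from the unshifted spectrum with wrapped indices."""
    spec = np.fft.fft2(x)
    rows = np.arange(-half, half) % x.shape[0]
    cols = np.arange(-half, half) % x.shape[1]
    return spec[np.ix_(rows, cols)]

rng = np.random.default_rng(0)
x = rng.random((64, 64))
a = centered_window_shifted(x, 8)
b = centered_window_unshifted(x, 8)
same = np.allclose(a, b)  # both approaches give the same 16 x 16 window
```

The second version only touches the elements of the small window, so the cost of moving the full array disappears from every call.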

Yes, you could probably reduce it to some partial problem, but it might not be faster because the FFT is so optimized.
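A minimal sketch of one such partial problem, in Python with NumPy for illustration: when only a few samples are nonzero and only a small block of output bins is needed, the wanted bins of the zero-padded 2D DFT can be written as two small matrix products. The function name `partial_dft2` and its parameters are hypothetical. Whether this beats a full optimized FFT depends on the sizes involved, as the caveat above says, so it would need to be benchmarked.

```python
import numpy as np

def partial_dft2(x, n_fft, rows, cols):
    """Selected bins of the n_fft x n_fft 2D DFT of x (x zero-padded at the end).

    x    : small 2D input holding the only nonzero samples
    rows : wanted output row indices (frequency bins)
    cols : wanted output column indices
    """
    n0, n1 = x.shape
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    # Partial DFT matrices: one row per wanted bin, one column per input sample.
    w_r = np.exp(-2j * np.pi * np.outer(rows, np.arange(n0)) / n_fft)
    w_c = np.exp(-2j * np.pi * np.outer(cols, np.arange(n1)) / n_fft)
    return w_r @ x @ w_c.T

rng = np.random.default_rng(0)
x = rng.random((32, 32))
rows = np.arange(0, 16)      # bins near DC along the first axis
cols = np.arange(240, 256)   # negative-frequency bins along the second axis
part = partial_dft2(x, 256, rows, cols)
full = np.fft.fft2(x, s=(256, 256))[np.ix_(rows, cols)]
match = np.allclose(part, full)
```

For an n0 x n1 input and an m x m output window, the two products cost roughly m*n0*n1 + m*m*n1 complex multiply-adds, independent of the padded size.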

Maybe you could even transform the data so you can fit the function before the FFT? Can you tell us a little bit more about the data and what it represents?

Make sure to disable debugging and also keep the front panel of the model VI closed during run. Also make sure to avoid any code that forces the front panel to be in memory.

It would be nice to have some typical synthetic data to play with.

LabVIEW Champion.

Blokk · ‎12-18-2017

You could also get a decent NVIDIA video card, and use the GPU Analysis toolkit. http://sine.ni.com/nips/cds/view/p/lang/en/nid/210829

(I would recommend installing the 64-bit version of LabVIEW for this toolkit: you get better compatibility with the CUDA compiler, and you can also address much more RAM from LV.)

jiangliang · ‎12-18-2017

Thank you for your advice. I do have a GTX 1060 and want to give it a try, but I have also learned that when the GPU does the math, most of the time usually goes to transferring data back and forth, so I am not sure how much we can gain from this.

Unless I can do the padding and trimming on the GPU. Is that possible? I'm not really familiar with this toolkit.

In that case I believe we could get a much, much faster result.

Also, I think we might lose the performance boost we get from using 0x40 when we call the objective function, so overall I'm not sure it is really worth it. Anyway, I'll give it a try tomorrow.

jiangliang · ‎12-18-2017

The raw data is just tens of points in a 2D array. We are trying to figure out the best arrangement for a solid lidar prototype.

The FFT acts like a thin lens. When we use a lens, we need to make sure its diameter is large enough to give full resolution; in our case the size of the FFT is somewhat like the diameter of the lens.
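A small numerical check of what a larger FFT size does, sketched in Python with NumPy (the array sizes and the `pad` factor are arbitrary illustration values): zero padding samples the same underlying spectrum on a finer grid, so every `pad`-th bin of the padded result coincides with a bin of the unpadded one.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random((16, 16))
pad = 8  # padding factor: 16 x 16 data placed in a 128 x 128 FFT

small = np.fft.fft2(x)
big = np.fft.fft2(x, s=(16 * pad, 16 * pad))

# Every pad-th bin of the padded spectrum equals a bin of the unpadded one;
# the bins in between are interpolated samples of the same spectrum.
coincide = np.allclose(big[::pad, ::pad], small)
```

This is consistent with both remarks in the thread: the padding adds no new information, but it does give a denser sampling of the pattern, which can matter when locating narrow peaks.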

Blokk · ‎12-18-2017

@jiangliang wrote:

Thank you for your advice. I do have a GTX 1060 and want to give it a try, but I have also learned that when the GPU does the math, most of the time usually goes to transferring data back and forth, so I am not sure how much we can gain from this.

Unless I can do the padding and trimming on the GPU. Is that possible? I'm not really familiar with this toolkit.

In that case I believe we could get a much, much faster result.

Also, I think we might lose the performance boost we get from using 0x40 when we call the objective function, so overall I'm not sure it is really worth it. Anyway, I'll give it a try tomorrow.

Yes, the trick is that you need to do as much as possible on the GPU, and minimize CPU-GPU(-CPU) data transfers.

jiangliang · ‎12-18-2017

So from your experience, is it possible to do the whole job on the GPU? Or do we have to transfer massive amounts of data to and from the GPU?

Actually, this is only part of the objective function. I have to map some coordinates onto this 512*512 array (set elements to 1 based on these coordinates, and leave the rest of the array at 0).

After the FFT, I have to measure the peak values of the side lobes and use them as the objective function outputs.

It seems to me that the input and output of the objective function are not much data, so I think this should be a perfect case for the GPU, as long as the GPU is capable of doing these jobs.
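The pipeline described in this post can be sketched as follows, in Python with NumPy for illustration only. It assumes the quantity of interest is the largest side lobe relative to the main peak, and it defines "side lobe" as anything more than `guard` bins away from DC along either axis. The function name, the `guard` parameter, and that definition are all assumptions, not the poster's actual implementation.

```python
import numpy as np

def peak_sidelobe_ratio(coords, grid, n_fft, guard):
    """Largest magnitude outside a guard region around DC, relative to the DC peak.

    coords : (row, col) integer positions set to 1 on a grid x grid array
    n_fft  : zero-padded FFT size
    guard  : half-width (in bins) of the region treated as the main lobe (assumed)
    """
    coords = np.asarray(coords)
    aperture = np.zeros((grid, grid))
    aperture[coords[:, 0], coords[:, 1]] = 1.0
    mag = np.abs(np.fft.fft2(aperture, s=(n_fft, n_fft)))
    # All samples are non-negative, so the global maximum sits at DC (bin 0, 0).
    main = mag[0, 0]
    # Wrapped distance of each bin from DC, in bins.
    offset = np.abs(np.fft.fftfreq(n_fft) * n_fft)
    outside = (offset[:, None] > guard) | (offset[None, :] > guard)
    return mag[outside].max() / main

# A single point has a flat spectrum, so the ratio is 1.
single = peak_sidelobe_ratio([[3, 5]], grid=16, n_fft=64, guard=4)

# A filled 8 x 8 block; its first spectral null is at n_fft / 8 = 16 bins.
block = [(r, c) for r in range(8) for c in range(8)]
psl = peak_sidelobe_ratio(block, grid=16, n_fft=128, guard=16)
```

Only the coordinate list goes in and a single number comes out, which is the property that makes the data-transfer cost small if the whole function runs on an accelerator.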

I also have a FlexRIO 7931. Would it be a good idea to run the objective function on the FPGA?

Blokk · ‎12-18-2017

@jiangliang wrote:

So from your experience, is it possible to do the whole job on the GPU? Or do we have to transfer massive amounts of data to and from the GPU?

Actually, this is only part of the objective function. I have to map some coordinates onto this 512*512 array (set elements to 1 based on these coordinates, and leave the rest of the array at 0).

After the FFT, I have to measure the peak values of the side lobes and use them as the objective function outputs.

It seems to me that the input and output of the objective function are not much data, so I think this should be a perfect case for the GPU, as long as the GPU is capable of doing these jobs.

I also have a FlexRIO 7931. Would it be a good idea to run the objective function on the FPGA?

You can do any sort of calculation on the GPU. You can even use different memory types on the GPU. Start by reading the CUDA literature and the GPU computing forum board: https://forums.ni.com/t5/GPU-Computing/gp-p/5053

I am not familiar with FPGAs, but as far as I know, they are also good candidates for FFT calculations.
