12-22-2023 10:21 AM
Sometimes you have to be really careful with how you set up your benchmarking to get useful and valid results. Others around here have a lot more knowledge and insight than I, but at the very least, please post the code and an example data file that you used for your benchmarking. Code should be back-saved several versions so more people are able to try out the code -- let me suggest LV 2018 or thereabouts.
My earlier reply pointed out one flaw in your benchmark timing method and your recent results hint that it may not have been corrected. My earlier suggestion is *not* an ideal way to fix the benchmark timing though, just something quick to try with a chance of pointing you in a direction.
-Kevin P
12-22-2023 10:39 AM
@BabarAly1 wrote:
Thank you for your kind responses. It is really helpful in understanding the problem.
I have tried to benchmark timings separately for all three loops again but this time, I just started with one DDC.
Experiment#1
In the first while loop, I read 4160 I-16 samples and feed them into the Enqueue Function.
In the second while loop, I dequeue the 4160 samples, discard half of them, decimate by 2, form a complex array of 1040 samples from I & Q, and feed it to the SubVI (Digital Down-Conversion (DDC)). The DDC output goes to a downsampler (downsample factor of 1040), so the output is only one complex sample.
In this case, the first while loop takes 6833 ms and the second while loop takes 6934 ms for the complete 16 GB file.
Experiment#2
Now, I have added a third while loop. The output of the downsampler from the second while loop goes into the Enqueue Function and is dequeued in the third while loop. I perform no other function in this loop.
Now the results are surprising to me: the first while loop takes 19393 ms, whereas the second while loop takes 34176 ms and the third while loop takes 34178 ms.
I am unable to understand this strange behavior.
Specs:
I have set the max queue size of both queues to -1. SubVIs are reentrant. I have disabled debugging in the main VI for timing benchmarking.
Additionally, I have tried to use DVRs as well.
Why not decimate in the first while loop, as soon as you read the file? You need to help the computer out as much as possible, and you can do that by reducing the amount of memory it has to handle as early as possible.
You also want to pre-allocate memory for all operations: have one block for the read and one block for the decimated data. I forget whether you can do pipes in LabVIEW; I think the DVR is a wrapper on a pipe, but I would have to go back and read the docs.
Again, reading the file into chunks of memory and processing those chunks in parallel, instead of reading and processing everything in one go, will help speed things up. Have you ever written any C code? There are things you learn writing C code that help you understand how to make code that runs fast on a computer. LabVIEW abstracts many of these things to make code that runs generally fast, but if you want the rocket-powered drag racer of code execution, you will need to dig under the hood of LabVIEW a bit to expose its inner workings and learn how to manipulate them to make it go really fast.
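Since LabVIEW diagrams can't be pasted as text, here is a rough Python sketch of the "decimate right after the read, into pre-allocated blocks" idea. It is only an illustration, not the original VI. The function name `read_and_decimate`, the callback `on_block`, and the default sizes are made up for the example. It assumes the file holds native-endian 16-bit samples, and "decimate" here just means keeping every Nth sample with no filtering.

```python
import io
from array import array


def read_and_decimate(stream, on_block, chunk_samples=4160, decimation=2):
    """Read int16 chunks into one reused buffer, decimate into another.

    Both buffers are allocated once, before the loop, and reused for every
    chunk. `on_block` (hypothetical callback) gets a view of the decimated
    block and must use or copy it before the next read overwrites it.
    """
    out_samples = (chunk_samples + decimation - 1) // decimation
    read_buf = array("h", bytes(2 * chunk_samples))   # block for the read
    out_buf = array("h", bytes(2 * out_samples))      # block for decimated data
    read_bytes = memoryview(read_buf).cast("B")       # byte view for readinto()
    read_view = memoryview(read_buf)
    out_view = memoryview(out_buf)

    total_out = 0
    while True:
        n_bytes = stream.readinto(read_bytes)
        if not n_bytes:
            break
        n_samples = n_bytes // 2                      # ignores a trailing odd byte
        n_out = (n_samples + decimation - 1) // decimation
        # Strided copy from the read block into the decimated block,
        # without growing or replacing either buffer.
        out_view[:n_out] = read_view[0:n_samples:decimation]
        on_block(out_view[:n_out])
        total_out += n_out
    return total_out


if __name__ == "__main__":
    demo = io.BytesIO(array("h", range(10)).tobytes())
    blocks = []
    read_and_decimate(demo, lambda v: blocks.append(list(v)),
                      chunk_samples=4, decimation=2)
    print(blocks)
```

The point is that the downstream loop only ever sees the smaller decimated block, so whatever gets queued is half the size of what was read.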
Break down your VIs like this:
VI1: [read initial file into chunks of memory -> decimate data into other chunks of memory -> pass data to a VI2 process to handle the decimated memory chunk]
VI2: [do all data processing to the point of having the final output in memory (don't save to file)]
VI3: [join all processes and concatenate the output of the processes into the final file]
VI1 and VI3 will each have only a single instance, and VI2 will have many instances (they are the helper processes, or process pool). Launch all the VI2s before running VI1 so they are all spun up before you start reading the file.
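The VI1 / VI2 / VI3 split above could look roughly like this in Python. Treat it as a sketch of the structure only. The names `read_chunks`, `process_chunk`, and `run_pipeline` are invented for the example, and the "processing" is a placeholder (the mean of each chunk), not a DDC. The worker pool is created before the read starts, and `max_in_flight` (an assumed knob) keeps the reader from queuing the whole file in memory.

```python
import io
from array import array
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def read_chunks(stream, chunk_bytes):
    """VI1: hand out the file as fixed-size byte chunks."""
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:
            return
        yield chunk


def process_chunk(chunk):
    """VI2: placeholder processing, one output value per chunk."""
    samples = array("h")                              # native-endian int16 assumed
    samples.frombytes(chunk[: len(chunk) - len(chunk) % 2])
    return sum(samples) / len(samples) if samples else 0.0


def run_pipeline(stream, executor, chunk_bytes=8320, max_in_flight=8):
    """VI3: join the workers and concatenate their outputs in file order."""
    pending = deque()
    results = []
    for chunk in read_chunks(stream, chunk_bytes):
        pending.append(executor.submit(process_chunk, chunk))
        if len(pending) >= max_in_flight:
            # Wait for the oldest chunk before reading further ahead.
            results.append(pending.popleft().result())
    while pending:
        results.append(pending.popleft().result())
    return results


if __name__ == "__main__":
    data = array("h", range(16)).tobytes()
    # Workers are started first, then the read begins.
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(run_pipeline(io.BytesIO(data), pool, chunk_bytes=8))
```

A thread pool is used here only to keep the sketch short. In CPython, threads will not speed up CPU-bound work like this, so `concurrent.futures.ProcessPoolExecutor` (same `submit` interface) is the closer match to the process pool described above.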