I'm searching, but can't find any documentation on what underlying algorithm is used for the Sort 1D Array primitive in LabVIEW.
I'm trying to set up some benchmarking to determine if it is efficient enough, or if there is a better method for what I'm trying to do.
My situation is that I have two 1-D arrays that I want to "interleave" with each other, where each of those arrays is already sorted. So array A is 1 2 5 8 9 10 and array B is 3 6 7 11 12, and I want the merged/sorted array 1 2 3 5 6 7 8 9 10 11 12.
The easy way is to concatenate the two arrays and run Sort 1-D Array. But that is probably not as efficient. I'm thinking I should instead create a 1-D array of size(A) + size(B), use Replace Array Subset while keeping track of a position in each array, and just grab whichever of the two "next up" values is smaller.
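In text form (Python here as a stand-in, since a LabVIEW block diagram can't be shown inline), the two-pointer merge described above might look like the following sketch. The pre-allocated output plays the role of the Replace Array Subset approach, and the two indexes i and j are the counters tracked per input array:

```python
def merge_sorted(a, b):
    """Merge two already-sorted lists in O(len(a) + len(b)) time."""
    out = [0] * (len(a) + len(b))   # pre-allocate, like Replace Array Subset
    i = j = 0
    for k in range(len(out)):
        # Take from a when b is exhausted, or when a's next value is smaller.
        if i < len(a) and (j >= len(b) or a[i] <= b[j]):
            out[k] = a[i]
            i += 1
        else:
            out[k] = b[j]
            j += 1
    return out

print(merge_sorted([1, 2, 5, 8, 9, 10], [3, 6, 7, 11, 12]))
# [1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12]
```

Note the `j >= len(b)` guard: once one input runs dry, the remaining elements are drawn entirely from the other, which also handles the case where one array starts out empty.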
If using a GPU is an option in your application, I would give CUDA a try. I used it in the past, and for parallelizable problems you can get a very decent speed-up. I did not use it (at least explicitly) for sorting, but FFT operations on large data were impressive compared to the CPU functions...
I have found some interesting reading on GPU approaches (see below), which also mentions algorithms using multiple CPU cores. It is a good question whether LabVIEW's sort function uses a single core or whether there is some parallel approach as well.
EDIT: I have found an older discussion: http://188.8.131.52/t5/LabVIEW/Which-algorithm-does-Sort-1D-Array-use/td-p/2913744
Since your two arrays cannot simply be interleaved, and a decision will be required for each new value added to the array, I suspect you will get the best results with an array that is pre-allocated to hold both arrays: use Replace Array Subset to put the two arrays into that common "sort array", then let Sort 1D Array do its thing.
That assumption is based on the idea that you can eliminate the memory-allocation time by allocating ahead of time, and that Sort 1D Array can run faster than explicit checking in LV.
But while I am typing... a compare from the Comparison palette inside an In Place Element structure will still require deciding which of the two indexes (into the A and B arrays) needs to be incremented.
Still betting on the Sort 1D using a preallocated array.
You will of course share your findings?
Thanks Blokk. In the thread you linked, Jeff suggested that the underlying algorithm was quicksort (one I'll have to study more to grasp), but there wasn't any confirmation of that. For my use case, coding on a GPU is not an option.
Thanks Ben, I'll try your idea as well. I just finished coding a version where I initialize a 1-D array, call it C, with size(A) + size(B); set two counters at zero, tracked in a shift register; check the elements A(i) and B(j); replace C(i+j) with the smaller of the two; and increment either the i or the j counter.
Now I just need to create a couple typical data sets and set up the benchmarking VI. I'll let you know the findings.
This is probably slower, but you could try sorting both arrays and then merge-sorting them together.
Both arrays are sorted to begin with. The merge step is effectively what I'm going to try instead of the Sort 1-D Array primitive: basically the last frame of the animation shown in that Wikipedia article.
The built-in function uses a randomized quicksort algorithm which is quite reasonable as a general-purpose sorting algorithm. What you are describing is the last step of a merge sort. Whether it is faster depends on the vagaries of G versus primitives so you just have to test it.
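For anyone wanting to study the algorithm mentioned above, here is a minimal textbook illustration of randomized quicksort in Python. This is only the idea behind the reported built-in behavior, not LabVIEW's actual implementation (which is not documented):

```python
import random

def quicksort(xs):
    """Textbook randomized quicksort (out-of-place, for clarity).

    Choosing the pivot at random avoids the O(n^2) worst case that a
    fixed-pivot quicksort hits on already-sorted input.
    """
    if len(xs) <= 1:
        return list(xs)
    pivot = random.choice(xs)
    less    = [x for x in xs if x < pivot]
    equal   = [x for x in xs if x == pivot]
    greater = [x for x in xs if x > pivot]
    return quicksort(less) + equal + quicksort(greater)
```

Real implementations partition in place and fall back to simpler sorts on small subarrays, but the expected O(n log n) behavior is the same.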
Your example does show integers (unsigned) which is interesting because I have routines that sort arrays of 1M unsigned integers between 4 and 16 times faster than the built-in function (U32-U8). Would be much better if not for the G overhead of my version.
What I'm actually sorting are clusters where the first element I'm sorting by is a U64. I'm taking data generated by two different areas of my application, where each record is timestamped (the U64) and has a flag indicating what the data represents, plus the payload of data. I want to merge the two sources into sorted order so I can feed them through the single DMA channel I have available going to an FPGA. On the other side, the FPGA unpacks the timestamp, waits for that timestamp to occur on my FPGA clock, then decides which analog output to write the payload data to based on the flag indicator.
I'm basically buffering up about 4-5 seconds of data that I want to write out onto two channels at a rate of about 4 kHz (so about 8 kHz worth of data pumping through). Something is falling behind on my host side, so I'm trying to benchmark what is causing the problem. My current merge and sort of the two streams seems to be taking longer than it should, so I'm measuring how long it actually takes and trying to speed it up. One theory is that I'm really merging into order more than sorting, so my merge/sort algorithm could be made more efficient than it is now.
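Since each stream is already in timestamp order, the cluster case reduces to the same merge, just comparing on the timestamp field only. A sketch in Python, with a hypothetical (timestamp, flag, payload) tuple standing in for the cluster layout (the field names and sample values are my assumptions):

```python
import heapq
from operator import itemgetter

# Hypothetical records mirroring the cluster: (timestamp_u64, flag, payload).
# Each source stream is already sorted by timestamp.
stream_a = [(100, 0, 1.5), (250, 0, 2.0), (900, 0, -0.5)]
stream_b = [(120, 1, 0.7), (300, 1, 0.9)]

# heapq.merge consumes already-sorted inputs and yields records in
# timestamp order -- the same "merge, don't re-sort" idea, keyed on
# element 0 so flag and payload ride along untouched.
merged = list(heapq.merge(stream_a, stream_b, key=itemgetter(0)))

for ts, flag, payload in merged:
    print(ts, flag, payload)
```

In LabVIEW terms this corresponds to comparing only the U64 element of the two "next up" clusters and carrying the whole cluster into the output array, rather than sorting an array of clusters after the fact.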
Personally I'd go with a For loop rather than a pre-allocated array. Here's one such version, not fully tested for corner cases like one array being empty.
Your code resembles what I've drawn quite a bit.
Here is what I've done. I've just created my datasets to test it out.
I do see something in yours I need to pay attention to: when it reaches the end of one array, it always draws the remaining elements from the other. I think that is effectively what you're doing with your index comparisons. I don't think I accounted for that yet, though I did account for the situation where one array comes through empty.
In this case, I went with preallocate-and-replace. I could possibly have gone with just auto-indexing on the output; since the array size is known before the loop starts, there shouldn't be any array resizing or relocation to slow things down either way.