06-13-2009 09:11 PM
Hi all,
I want to process acquired data on the fly, and here is the question. I have a 1D array and I want to get the INDICES of the 40 largest values. I can sort the array to get the 40 largest values and then search for their corresponding indices, but the array is very large and I don't think that is an efficient method. So, is there a better idea? Thank you.
Best wishes,
Bo
06-13-2009 10:03 PM
Take a look at the attached code....
06-13-2009 10:19 PM - edited 06-13-2009 10:23 PM
Here's a solution that gives the indices of the 40 largest elements.
It can still be improved a little bit, but is certainly faster than your search approach. How big is your array?
(the "solution" in the post above gives the values of the largest 39 elements (not indices!), something completely different)
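The VIs in this thread aren't reproduced here, but the sort-based approach above can be sketched in text. This is a rough Python/NumPy illustration of the same idea (the function name and use of NumPy are mine, for illustration only): sort the indices by value, then take the last 40.

```python
import numpy as np

def top_k_indices_sort(data, k=40):
    """Return the indices of the k largest elements via a full sort.
    Simple, but it sorts the whole array: O(n log n)."""
    order = np.argsort(data)   # indices that sort the array ascending
    return order[-k:][::-1]    # indices of the k largest, biggest first

rng = np.random.default_rng(0)
arr = rng.random(1_000_000)
idx = top_k_indices_sort(arr, 40)   # 40 indices; arr[idx] are the top values
```

Note this is a sketch of the concept, not the posted VI; in LabVIEW the equivalent is sorting an array of (value, index) pairs.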
06-13-2009 11:19 PM - last edited on 06-15-2009 11:35 AM by Support
altenbach wrote:It can still be improved a little bit, ....
OK, here are the improvements I made on a quick hunch.
For a 10M array of random data it is about 60x faster (100ms vs 7000ms) than my simple solution above. (it's about 30x faster for a 1M array).
Basically, here we only keep the top 40 elements in an array and keep the currently smallest of those elements in a shift register. If a new element is larger than the currently smallest element, we replace the smallest element with the new element, sort the top-40 array, and tag the new smallest element. If the new element is smaller, we don't do anything.
You have to be careful, though, because the algorithm is now data dependent and can be slow for certain pathological data.
For example, if the input is a perfectly sorted array the penalty is very big, while for a reverse-sorted array the algorithm is extremely fast. For random input arrays, the chance of finding a new top-40 element shrinks very quickly, and we win because we mostly execute the empty case.
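Since the VI itself isn't shown inline, here is a rough Python sketch of the shift-register idea described above (names and data layout are my own assumptions, not the actual VI): keep ascending top-40 value and index arrays, and only re-sort when a new element beats the current smallest.

```python
import numpy as np

def top_k_indices_stream(data, k=40):
    """One pass over the data, tracking the k largest (value, index) pairs.
    Only re-sorts the small top-k buffer when a new element beats the
    currently smallest kept value -- the empty 'else' branch is the
    common case for random data."""
    top_vals = np.full(k, -np.inf)  # kept values, ascending; start at -inf
    top_idx = np.full(k, -1)        # matching indices; -1 = not yet filled
    smallest = -np.inf              # currently smallest kept value
    for i, x in enumerate(data):
        if x > smallest:
            top_vals[0] = x         # overwrite the smallest entry
            top_idx[0] = i
            order = np.argsort(top_vals, kind="stable")  # re-sort top k
            top_vals = top_vals[order]
            top_idx = top_idx[order]
            smallest = top_vals[0]  # tag the new smallest element
        # else: do nothing
    return top_idx[::-1]            # indices, largest value first
```

As in the VI, a perfectly ascending input hits the sort branch on every element (worst case), while a descending input almost never does.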
Modify as needed. You need to test on your typical data to make sure it is suitable.
I am sure there are even more optimized possibilities, anyone want to try? 😄
06-14-2009 11:05 AM
Hi Jaime,
Thank you for your reply, but the result was the 40 largest values, not the indices.
Regards,
Bo
06-14-2009 11:09 AM
Hi altenbach,
Thank you for your replies. Actually, I got your previous solution from someone else, but I'm sure your later one is much better, and I'll try that. Thank you for your help!!
Best wishes,
Bo
06-14-2009 11:13 AM - edited 06-14-2009 11:20 AM
If you are looking for trivially simple code, here's yet another alternative. For a 10M array it's about 30x slower than the optimized solution but about twice as fast as my first solution, so it is certainly not competitive. It is probably sufficient for small datasets.
It just shows that there are quite a few ways to do all this. 😉
(slow, but simple!)
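The "trivially simple" VI isn't shown, but one plausible reading of a slow-but-simple LabVIEW approach (this is my guess at the technique, not the actual attachment) is to find the maximum k times, blanking out each winner:

```python
import numpy as np

def top_k_indices_simple(data, k=40):
    """Find the maximum k times, removing each winner from contention.
    O(n*k): trivially simple, fine for small data, slow for 10M elements."""
    work = np.asarray(data, dtype=float).copy()  # don't clobber the input
    out = []
    for _ in range(k):
        i = int(np.argmax(work))  # index of the current maximum
        out.append(i)
        work[i] = -np.inf         # blank it out so the next max is found
    return out
```

For ties, `argmax` returns the first occurrence, so duplicates each get their own index.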
Maybe we should do a challenge competition. Who can make the fastest solution? 😄
06-14-2009 11:36 AM
06-14-2009 11:56 AM
Coq rouge wrote:
I was just wondering: how will your largest40IndicesFaster.vi cope with several equal numbers in the array? Will it not fail in such cases? I think your last suggestion will cope better with this situation.
Well, you can always try, but I don't think it will fail. The array is initialized with all -inf, and the smallest element will remain -inf until the top-40 array is filled with real data. For example, if you create duplicates by appending the input array to itself, all solutions will still give the same result.
The results differ slightly if the input has fewer than 40 elements, because some solutions will be padded to 40 elements with -1 values.
The results may also differ for bad data, e.g. if the input contains NaN.
06-14-2009 12:08 PM
Well, perhaps I am wrong ;)