LabVIEW Idea Exchange

cancel
Showing results for 
Search instead for 
Did you mean: 
altenbach

"Search Array": Add "array is sorted" boolean input

Status: New

Searching an array for a certain element can be expensive for large arrays. The speed could be dramatically increased if we can assume that the input array is sorted. The speed difference can be several orders of magnitude.

 

There is an old example that discusses this in more detail. I also wrote a similar tool long ago when I needed to recursively score positions for the tic-tac-toe solver, using a retrograde analysis similar to what's used to generate endgame tablebases for chess. (It literally took hours with the plain old "search array"!).

 

(A similar tool is the "search ordered table.vi", which only works for DBL and returns a fractional index.)

 

Suggestion:

 

"Search array" should have a boolean input (default=FALSE) where we can tell the algorithm that the array is sorted.

 

(The output would be exactly as before, with -1 indicating "not found".). Setting this input to TRUE would switch to a binary search where in the worst case only ln2(N) instead of N comparisons need to be made. (e.g. 20 instead of 1048576 for a 50000x speedup!!!)

 

It could look as follows.

 

Of course it should continue to accept all possible datatypes (including clusters) and not be limited to simple datatypes as the polymorphic example quoted above.

.
Message Edited by altenbach on 08-31-2009 03:30 PM
10 Comments
Knight of NI

How do you envision this working for arrays of clusters, or would that input simply be ignored in this case?

 

Aside: I took a look at the old example you had mentioned and found an interesting result. Running it in 8.2  (after having mass-compiled), the linear search took far less time than the binary search. It was anywhere between 5x to 10x faster. I did not look in detail at the actual implementation to see what caused this, but found the result worth further investigation. 

altenbach
Knight of NI

Interesting. I had the binary search much slower than the linear search only until did a "save all" in LV2009.  Now the search with "notaword" default shows linear=30355.87µs, binary=229.58µs, for a 130x speedup. 🙂 Maybe something went wrong with your masscompile?

 

It is well known that freshly converted, but not yet saved VIs, are often fatter and slower. Could you try again after a "save all"? Maybe the masscompile did not stick for some reason.

 

For clusters it should also work. Currently, arrays of clusters are sorted using the cluster order as priority. I assume that if we sort an array of clusters first, the binary search algorithm should work just fine.

 

Still, the binary search needs some tweaks for a final release, because a sorted array may contain duplicate elements, where the current examples fails. for example seraching for 3 in [1,3,3,3,3,5] would not return the second element. This can (and should) easily be handled with a little tweak in the code: We simply treat a match as a potential "fail high" until we get the matching element with the smallest index. This probably means we always need log2(N) comparisons, same as for a "not found" and still much faster than a linear search. I have not written any code, but it does not sound complicated... 😉

This also means we should retain the "start index" input. It would keep the same functionality.

 

Of course the openg variation for multiple matches can be simplified for sorted inputs, because duplicate elements are necessarily adjacent. 🙂

AristosQueue (NI)
NI Employee (retired)
There's another suggestion on the Idea Exchange to make Sort 1D Array take a VI Reference to provide the comparison operator. If that idea is acted upon then obviously the comparison would need to be supplied to this node as well, if the boolean were set to True.
Knight of NI

altenbach wrote:

It is well known that freshly converted, but not yet saved VIs, are often fatter and slower. Could you try again after a "save all"? Maybe the masscompile did not stick for some reason.


Actually, I had done a "save all" and had incorrectly stated in my response that I had done a mass-compile. The results I mentioned were after having saved the VIs. I just opened the VI again now and ran it, and this time I got far lower numbers with the binary search (consistent with what you were saying). Quite bizarre.

AristosQueue (NI)
NI Employee (retired)
After loading a new hierarchy, all the Front Panels are in memory until you save. When a subVI's front panel is in memory, LV takes time to update the FP controls display values as the subVIs execute so that the values are up-to-date if the panel is actually displayed. That takes time, time that evaporates when you save the VIs and the FPs leave memory. Normally an FP is only in memory if there's a control reference, local variable or an explicit call to load the FP into memory.
Lavezza
Active Participant

Don't forget about usint Variant Attributes. It is much faster than linear and does better than the binary search.

Variant Search.PNG

altenbach
Knight of NI

> Don't forget about usint Variant Attributes. It is much faster than linear and does better than the binary search

 

Yes, but only if you don't include the first loop in the timing measurement. It is not a fair comparison.

 

This method is probably only useful if the list is static.

Lavezza
Active Participant

Point taken, but I find a lot of my arrays are static (read from a large config file at startup, etc.)

 

We had a lot of code where two arrays were passed around, the 'name' array and the 'value' array. The name array was static, the value array was changing. Searching for a name in the name array would return the index that we could use to pull data from the value array. In this case, it was much faster to have a single variant with each name-value pair as an attribute. And if we added a new name at run-time, I'm not sure we'd be any slower than if we were using a binary search. For a binary search we'd have to add the name to the name array, sort the new name array, search the new name array for the name we just added and then use that index to insert a new value into the value array.

 

However, I should have been clearer in my original post. I just wanted everyone to remember about the variant option, I'm not suggesting that it is always the best choice. I'd be more than happy to have a 'sorted array' option for the search array and I'm giving you kudos for it.

altenbach
Knight of NI

> Quite bizarre

 

Could it be you had the FP of the subVI open?

 

Anyway, here's a quick histogram for the search time of 40000 randomly picked words from the list. Note that the y axis is log scale and the x axis is log10(time). I did not modify the code and the somewhat funky timers, still, as you can see, binary search is better by about 3+ orders of magnitude for a ~210000 sized string array.

 

Message Edited by altenbach on 09-01-2009 03:17 PM
Knight of NI

Aristos wrote:
After loading a new hierarchy, all the Front Panels are in memory until you save. When a subVI's front panel is in memory, LV takes time to update the FP controls display values as the subVIs execute so that the values are up-to-date if the panel is actually displayed. That takes time, time that evaporates when you save the VIs and the FPs leave memory. Normally an FP is only in memory if there's a control reference, local variable or an explicit call to load the FP into memory.

But I did do a Save All before running.

 


altenbach wrote:
Could it be you had the FP of the subVI open?

Nope.

 

Like I said, it was bizarre, and I can't reproduce it, and I'm not even worrying about it since I believe that a binary search is faster than a linear search. Smiley Wink

 

 

Message Edited by smercurio_fc on 09-01-2009 05:21 PM