LabVIEW

cancel
Showing results for 
Search instead for 
Did you mean: 

Searching Array of Strings for Element Quick sanity check

So I was whipping up a demo for our group to optomize searching an array for elements with strings.  I eventually got to this benchmark vi.

String Search NameIndex.png

 

I got unexpected results that are not explained by the help file. 

Match First String Function

Owning Palette: Additional String VIs and Functions

Requires: Base Package

Compares each string in string array with the beginning of string until it encounters a match.

 

This led me to believe that for strings that follow the naming convention Parameter_Index Reversing the strings would greatly improve search times using the Match First String function.  It did not!  In fact just the opposite occured. so:

Does this function work by searching from the end?

Can the documentation here and in Search 1D Array be improved to explain the benefit of using Match First?


"Should be" isn't "Is" -Jay
0 Kudos
Message 1 of 19
(3,352 Views)

@Jeff Bohrer wrote:

[..]This led me to believe that for strings that follow the naming convention Parameter_Index Reversing the strings would greatly improve search times [...]


Why should it? You still have to compare all of the string since it is possible that not only the index is changing for each element within the array.

 

On the other hand, i concur that Match First String should result in the same timings regardless if the strings are reversed or not.

I will try to dig a little more, but currenlty i dont see any "explainable" reason, why reversing each string before Match First String should result in a slowdown of about 20 times....

 

Norbert 

Norbert
----------------------------------------------------------------------------------------------------
CEO: What exactly is stopping us from doing this?
Expert: Geometry
Marketing Manager: Just ignore it.
0 Kudos
Message 2 of 19
(3,319 Views)

Can you attach the actual VI, since the snippet got mutiliated with property nodes?

0 Kudos
Message 3 of 19
(3,314 Views)

SInce there is no buffer allocation at the "reverse string", I wonder if the strings remain untouched and are only flagged to be indexed from the end instead. (similar to what the compiler does for reverse array or transpose array). This will of course mess with the algorithm.

 

You probably have two faster choices:

 

  • Sort the "array to search" and then use a binary search (worst case: logN comparisons instead of N)
  • Use variant attributes
Message 4 of 19
(3,307 Views)

@Norbert_B wrote:

@Jeff Bohrer wrote:

[..]This led me to believe that for strings that follow the naming convention Parameter_Index Reversing the strings would greatly improve search times [...]


Why should it? You still have to compare all of the string since it is possible that not only the index is changing for each element within the array.

 

On the other hand, i concur that Match First String should result in the same timings regardless if the strings are reversed or not.

I will try to dig a little more, but currenlty i dont see any "explainable" reason, why reversing each string before Match First String should result in a slowdown of about 20 times....

 

Norbert 


Comparing the element to find with the matching element will take the same time in any case.  I believe the speed-up is in discarding the non-matches.  Quick test case
with the benchmark provided.  Size=300, Len =30, # of searches=300 (Find all in random order)

@Search array  = @2.4mSec

@Match First = @ 65uSec 

@Reverse and Match First = @ 320uSec

The difference between Reverse Match First and Match First is directly proportional to len (The number of common charaters at the front)

 

From this I suspect:

  •  Search Array searches every element regardless of if a match has already been found and Match First Returns as soon as it encounters a match.  This is hinted at in the documentation.
  • Match First Actually compares character by charater and aborts the compare on the first charater that does not match. This would somwhat explain the use of the phrase "..With the begining of String..." otherwise "Begining" makes no sense in the contex.
  • Some bright developer at the Nidiface understood the common "Param_Index" naming convention (or unintentionally wrote the nicest bug I've seen) and did the compare in reverse order.

Of course this is all a theory to explain observation.  Until it can be documented, I have only uncovered a means to improve string array searches that may change in future versions of LabVIEWSmiley Wink


"Should be" isn't "Is" -Jay
0 Kudos
Message 5 of 19
(3,304 Views)

Jeff,

 

my numbers differ a bit, but show a comparable trend.

 

Search 1D Array has to be slower in performance, because it can handle different types of data. Hence, it should use a "brute force" bit-wise compare (excpetion: Boolean which would require some more logic).

Match First String works only with strings, so searching the strings can be improved by using a better algorithm like KPM. Even though the whole string has to match, it is quite obvious that performance is different....

 

Norbert

 

PS: I like Altenbach's explanation with the reverse index. It can indeed explain the slowdown.

Norbert
----------------------------------------------------------------------------------------------------
CEO: What exactly is stopping us from doing this?
Expert: Geometry
Marketing Manager: Just ignore it.
0 Kudos
Message 6 of 19
(3,295 Views)

@altenbach wrote:

SInce there is no buffer allocation at the "reverse string", I wonder if the strings remain untouched and are only flagged to be indexed from the end instead. (similar to what the compiler does for reverse array or transpose array). This will of course mess with the algorithm.

 

You probably have two faster choices:

 

  • Sort the "array to search" and then use a binary search (worst case: logN comparisons instead of N)
  • Use variant attributes

Sorting both arrays and tossing out the all of the array before each first match would speed it up a bit tooSmiley Very Happy but thats outside the scope of this investigationSmiley Wink


"Should be" isn't "Is" -Jay
0 Kudos
Message 7 of 19
(3,292 Views)

Did you actually look at the results?

 

Find first match without "reversing" bails after the first numeric and your found indices are only single digits! It is faster because it does not do enough! It always find a match in the first few elements.

 

Not sure if this is a bug or expected behavior.

Message 8 of 19
(3,286 Views)

Ahahahaha, Altenbach, you really are a boy scout, aren't you?

 

And to add: It is obviously expected behavior (questionable: desired?)

If the "string" (so what you are comparing in your array) is longer than the string in the array index, the comparison is simply done to its end.

EDIT: What i'm trying to say: The shorter string limits the comparison, regardless if it's string or string array index. 

So since you have "counter" in the end of the string without prefexing '0's, it is somehow expected:

 

XXX_1 (string array index) 

XXX_123456 (string, what you are searching for)

 

=> This is a match.

 

Since it does not search further, the algorithm will stop and pass a true and the index of the string array containing this "matching" string....

 

Norbert 

Norbert
----------------------------------------------------------------------------------------------------
CEO: What exactly is stopping us from doing this?
Expert: Geometry
Marketing Manager: Just ignore it.
0 Kudos
Message 9 of 19
(3,280 Views)

And now let's compare the variant attribute method.

 

For a 500k array, I get the following times:

  • Search 120ms
  • Find first: fail!
  • Reverse first: 100ms
  • Variants: 170us (>500x faster)

 That should give you something to talk about at your group demo! 😄

 

(sorry for the ugly code, you started it!)

 

 

Message 10 of 19
(3,266 Views)