LabVIEW

cancel
Showing results for 
Search instead for 
Did you mean: 

Search strings with misspellings

I have a 1D array of strings that I wrote a simple VI to search through. Of course with straight-forward methods, I have to spell the search string exactly the same way as the element I'd like it to return. I was contemplating ways around this and realized people must have needed to do this before. Has anyone out there come up with an efficient way of searching an array of strings that accounts for user errors such as spelling mistakes and just returns the closest match?

0 Kudos
Message 1 of 11
(3,696 Views)

How do you define closest match? Least amount of matching characters? Closest on a QWERTY keyboard? At what point do you say the word is not in your list of words, rather than just spelled wrong?

 

Why is the user typing in a word that may or may not be in your list? You could put all the strings into a combo box and let the user select the exact string available. 

Message 2 of 11
(3,676 Views)

if you'd like to read up on the topic

https://en.wikipedia.org/wiki/Approximate_string_matching

 

but i would go with gregryj and define the selection beforehand.

don't trust user input 😉 they are the worst


If Tetris has taught me anything, it's errors pile up and accomplishments disappear.
Message 3 of 11
(3,616 Views)

Back in the mid seventies in a pascal course (?) we did something similar to the following.

 

  • Create groups of similarly sounding characters (A: soft consonants, B: sharp consonants, C: high pitch vowels, 😧 low pitch vowels, etc.)
  • For each word,create a transformed word where each character is replaced with the group name (A, B, C ...).
  • Delete all consecutive duplicates (e.g. ABBCCD > ABCD).
  • Compare the results for the transformed test word with all transformed words form the dictionary
  • Similar sounding words will have an equal transform. 

I don't remember the details. Probably would need some research.... 😄

Message 4 of 11
(3,602 Views)

So I have some pretty terrible code I don't mind sharing that I wrong a while ago with zero research.  It is a function that takes two strings and tells you how similar they are.  I used it for finding an artist that matched a google search closely and then if the score was above some value it would just use the new artist name otherwise it would prompt if the new name was the right one or the old one.

 

Written in the 8.x era and appears to care about capitalization which it probably shouldn't.  Also there isn't much documentation so good luck.

0 Kudos
Message 5 of 11
(3,594 Views)

I’ve not used it, but you look into the Spelfix1 extension to SQLite.

0 Kudos
Message 6 of 11
(3,578 Views)

@Hooovahh wrote:

So I have some pretty terrible code ...


Just for kicks, here is a quick and rough draft of a near literal translation that uses more modern tools (lexical class, swap values, conditional tunnels, etc) and is mostly blue....

 

Seems to give the same results if you convert all to lowercase in your inputs. Not sure how sound the algorithm is 😄

 

StringConfidence.png

 

 

Message 7 of 11
(3,569 Views)

Thank you and kudo'd.  Yeah this technique mostly is about a letter, following an expected letter.  So if my name is Brian, and someone spells it Brain, it will find the 'r' following 'b', and that's it, 'a' shouldn't come after 'r', 'i' shouldn't come after 'a', and 'n' shouldn't come after 'i'.  This would result in a pretty poor score of 0.3 and all that you have is one letter swapped.  Now that I look at it this is a pretty bad way of doing things and someone should come up with a better technique.  Maybe add the number of each letter to be seen?  Or maybe look at the letters that are wrong and see if they are close to the correct key on the keyboard?  Then you might also need a list of commonly misspelled words.

 

Now that I'm talking this through this feels more like a cryptography problem with brute force and rainbow tables.

0 Kudos
Message 8 of 11
(3,556 Views)

Oh wow lots of things on the internet.  One I found on LAVA is an attempt at implementing the Levenshtein Distance.  The lower the number the more similar two strings are.

 

https://lavag.org/files/file/64-strings-levenshtein-distance/

 

Here's some other stuff in one dimensional programming:

http://php.net/manual/en/function.similar-text.php

https://stackoverflow.com/questions/26446348/checking-if-one-string-is-similar-to-another

https://stackoverflow.com/questions/17388213/find-the-similarity-percent-between-two-strings

0 Kudos
Message 9 of 11
(3,550 Views)

You might also obtain some insights Googling "spell checking algorithm".  I would think that the algorithm for listing suggestions is the same kind of thing.

Bill
CLD
(Mid-Level minion.)
My support system ensures that I don't look totally incompetent.
Proud to say that I've progressed beyond knowing just enough to be dangerous. I now know enough to know that I have no clue about anything at all.
Humble author of the CLAD Nugget.
0 Kudos
Message 10 of 11
(3,535 Views)