Finding the closest similar text

Contents

If four mistakes are made in the word “bread”,

then the word “beer” will turn out!

We have two lists. Both in that and in another approximately the same elements, but they can be written a little differently. The task is to select for each element in the first list the most similar element from the second list, i.e. implement the search for the nearest maximally similar text.

The big question, in this case, is what is considered the criterion of “similarity”. Just the number of matching characters? Or the number of consecutive matches? Should it be case sensitive or spaces? Different word positions in a phrase? There are many options and there is no unambiguous solution – for each situation, one or the other will be preferable to the rest.

If you follow the principle of Occam and do not complicate unnecessarily, then with the help of a small macro function in VBA you can implement the most obvious option – search by the maximum number of character matches. It’s not perfect, but works quite reliably for most situations:

To add such a user-defined function, we first enter the Visual Basic editor (tab Developer – Visual Basic or Alt + F11) and add a new module there through the menu Insert – Module. Then, in the resulting empty module, we need to copy the following code of our function:

Function FuzzyLookup(Lookup_Value As String, Tbl As Range) As String Dim cell As Range, txt As String, p As Integer, pos As Integer, maxp As Integer, maxstr As String For Each cell In Tbl ' loop through all the cells in the table txt = cell p = 0 For i = 1 To Len(Lookup_Value) 'Looking through the characters in the search text pos = InStr(1, txt, Mid(Lookup_Value, i, 1), vbTextCompare) 'Looking for an occurrence If pos > 0 Then p = p + 1 'increment the match counter txt = Left(txt, pos - 1) & Right(txt, Len(txt) - pos) 'remove the found character from the text End If Next i If p > maxp Then 'remember the best match maxp = p maxstr = cell End If Next cell FuzzyLookup = maxstr End Function  

You can now close the editor and return to Excel. In the function wizard on the tab Formulas – Insert Function (Formulas — Insert Function) in category Custom (User defined) our new feature is coming FuzzyLookup, which can be used with the following arguments:

=FuzzyLookup(text_we are looking for; query_range)

Those. in the example shown above, in cell B2, enter

=FuzzyLookup(A2;$D$2:$D$22)

and copy the formula to the entire column.

Please note that our function is not case sensitive and the position of individual letters in a word.

The scope of such a function can be the widest: from automatically correcting the crooked hands of an operator entering data to comparing lists with different spellings of the same information.

  • Checking text by mask (for pattern matching)
  • Search for Latin characters in text

 

Leave a Reply