Text cleanup

The text we have to work with in the cells of a Microsoft Excel sheet is often far from perfect. If it was entered carelessly by other users, or exported from a corporate database or ERP system, it can easily contain:

  • extra spaces before, after or between words (for looks!)
  • unnecessary characters (a “g.” before the name of a city)
  • invisible non-printing characters (a non-breaking space left over after copying from Word or after a botched export from 1C, line breaks, tabs)
  • apostrophes (the text prefix, a special character that sets the text format for the cell)

Let’s look at ways to get rid of such “garbage”.

Replacement

An old but far from obsolete trick. Select the range of cells to be cleaned and use the Replace tool on the Home tab (Home – Find & Select – Replace), or press the keyboard shortcut Ctrl + H.

This window was originally designed for the wholesale replacement of one piece of text with another, on the principle “find Masha – replace with Petya”, but we can also use it to remove unwanted text. For example, type “g.” (without the quotes!) in the Find what box, leave the Replace with box empty and click Replace All. Excel will remove every “g.” before the city names.

Just remember to select the desired range of cells first; otherwise the replacement will be applied to the entire sheet!
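
The same replacement can also be done from a macro. The following is only a minimal sketch: the macro name is hypothetical, “g.” is the sample text from above, and it relies on the standard Range.Replace method with a partial, case-sensitive match.

```vba
Sub Remove_Text_From_Selection()
    ' Hypothetical helper: deletes every "g." inside the selected cells.
    ' Do nothing if something other than cells (e.g. a chart) is selected.
    If TypeName(Selection) <> "Range" Then Exit Sub

    ' LookAt:=xlPart matches the text anywhere inside a cell,
    ' and an empty Replacement simply deletes what was found.
    Selection.Replace What:="g.", Replacement:="", _
                      LookAt:=xlPart, MatchCase:=True
End Sub
```

Because the macro works on Selection, the range still has to be selected before it is run.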

Removing spaces

If you need to remove every space from the text (for example, when spaces are used as thousands separators inside large numbers), you can use the same replacement: press Ctrl + H, type a space in the Find what box, leave the Replace with box empty and click Replace All.

More often, though, you do not want to remove all the spaces, only the extra ones; otherwise all the words will run together. Excel has a dedicated function for this: TRIM, from the Text category. It removes all spaces from the text except single spaces between words, which is exactly what we need.
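
If the cleanup has to happen inside a macro, the worksheet version of TRIM can be called from VBA. This is a small sketch with a hypothetical function name; note that VBA's built-in Trim only strips spaces at the ends of a string and leaves runs of spaces between words untouched, so the worksheet function is used instead.

```vba
Function Collapse_Spaces(ByVal s As String) As String
    ' Worksheet TRIM: removes leading and trailing spaces and
    ' collapses every run of spaces between words to a single space.
    Collapse_Spaces = Application.WorksheetFunction.Trim(s)
End Function
```

For example, Collapse_Spaces("  Ivanov   Ivan  ") gives "Ivanov Ivan", the same result as =TRIM(A1) on a cell holding that text.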

Removing non-printing characters

In some cases, however, the TRIM function may not help. Sometimes what looks like a space is not actually a space but an invisible special character (a non-breaking space, line break, tab, etc.). The character code of such characters differs from the code of the space (32), so TRIM cannot clean them up.

There are two possible solutions:

  • Carefully select these special characters in the text with the mouse, copy them (Ctrl + C) and paste them (Ctrl + V) into the Find what box of the Replace window (Ctrl + H). Then click Replace All to remove them.
  • Use the CLEAN function. It works like TRIM, except that it removes non-printing characters rather than spaces. Unfortunately, it cannot cope with every special character either: it removes the control characters with codes 0 to 31, but leaves the non-breaking space (code 160), for example, in place.
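
The two approaches can be combined in one helper. The sketch below is only one possible arrangement, and the function name is hypothetical: it first turns non-breaking spaces, tabs and line breaks into ordinary spaces (so that neighbouring words do not get glued together), then lets CLEAN drop any remaining control characters, and finally applies TRIM.

```vba
Function Clean_Text(ByVal s As String) As String
    ' Turn "space-like" characters into ordinary spaces first.
    s = Replace(s, ChrW(160), " ")   ' non-breaking space
    s = Replace(s, vbTab, " ")       ' tab
    s = Replace(s, vbCr, " ")        ' carriage return
    s = Replace(s, vbLf, " ")        ' line feed

    ' Worksheet CLEAN removes the remaining control characters (codes 0-31).
    s = Application.WorksheetFunction.Clean(s)

    ' Worksheet TRIM then collapses the extra spaces.
    Clean_Text = Application.WorksheetFunction.Trim(s)
End Function
```

Replacing line breaks with spaces rather than deleting them is a design choice; if the text should lose them without a trace, those Replace lines can be dropped and CLEAN will remove the characters instead.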

SUBSTITUTE function

Replacing some characters with others can also be done with formulas. For this, the Text category in Excel has the SUBSTITUTE function. It has three required arguments:

  • The text in which the replacement is made
  • The old text, i.e. the text to be replaced
  • The new text, i.e. the text to replace it with

With it you can easily fix typos (replacing “a” with “o”), remove extra spaces (replacing them with an empty string “”), or strip extra separators from numbers (do not forget to multiply the result by 1 afterwards so that the text becomes a number).
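
On the sheet, the last case could be written as something like =SUBSTITUTE(A1," ","")*1, assuming the text sits in A1 (the argument separator may be a semicolon, depending on regional settings). A rough VBA equivalent is sketched below; the function name is hypothetical, and CDbl plays the role of “multiply by 1”.

```vba
Function Text_To_Number(ByVal s As String) As Double
    Dim digits As String

    ' Same job as the worksheet formula: replace every space with nothing.
    digits = Application.WorksheetFunction.Substitute(s, " ", "")

    ' Convert the remaining text to a number.
    ' CDbl raises a "Type mismatch" error if the text is not numeric.
    Text_To_Number = CDbl(digits)
End Function
```

VBA's own Replace function would do the same job as SUBSTITUTE here; the worksheet function is used only to mirror the formula.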

Removing apostrophes at the beginning of cells

The apostrophe (‘) at the beginning of a cell in a Microsoft Excel worksheet is a special character, officially called the text prefix. It tells Excel that everything that follows in the cell should be treated as text, not as a number. In effect, it is a convenient alternative to setting the text format for the cell in advance (Home – Number – Text), and it is simply indispensable for entering long sequences of digits (bank account numbers, credit card numbers, inventory numbers, etc.). But sometimes it ends up in cells against our will (after an export from a corporate database, for example) and starts to interfere with calculations. To remove it, you will need a small macro. Open the Visual Basic Editor with the keyboard shortcut Alt + F11, insert a new module (menu Insert – Module) and enter the following code:

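	Sub Apostrophe_Remove()
	    Dim cell As Range, v As Variant
	    For Each cell In Selection
	        If Not cell.HasFormula Then
	            v = cell.Value      ' remember the contents
	            cell.Clear          ' clearing drops the text prefix (and the cell's formatting)
	            cell.Formula = v    ' re-enter the contents without the apostrophe
	        End If
	    Next cell
	End Sub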

Now, if we select a range on the sheet and run our macro (Alt + F8, or the Developer tab – Macros button), the apostrophes before the contents of the selected cells will disappear.
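
Before running the cleanup, it can be useful to see how many cells are affected. The sketch below uses the PrefixCharacter property of a cell for that; the macro name is hypothetical, and it only counts the cells without changing anything.

```vba
Sub Count_Apostrophes()
    Dim cell As Range
    Dim n As Long

    ' Do nothing if something other than cells is selected.
    If TypeName(Selection) <> "Range" Then Exit Sub

    For Each cell In Selection.Cells
        ' PrefixCharacter holds the apostrophe for cells that carry the text prefix.
        If cell.PrefixCharacter = "'" Then n = n + 1
    Next cell

    MsgBox n & " cell(s) in the selection carry a text prefix."
End Sub
```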

English letters instead of Russian ones

This is usually plain human error. When typing text into a cell, someone accidentally enters look-alike English letters instead of Russian ones (Latin “c” instead of Cyrillic “с”, Latin “y” instead of Cyrillic “у”, etc.). On screen everything looks fine, because the glyphs are sometimes exactly the same, but Excel naturally treats them as different values, which leads to errors in formulas, duplicates in filters, and so on.

You can, of course, manually replace the Latin characters with their Cyrillic counterparts, but it is much faster to do this with a macro. Open the Visual Basic Editor with the keyboard shortcut Alt + F11, insert a new module (menu Insert – Module) and enter the following code:

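	Sub Replace_Latin_to_()
	    Dim cell As Range, i As Long
	    Dim Rus As String, Eng As String, c1 As String, c2 As String
	    Rus = "асекорхуАСЕНКМОРТХ"    ' Cyrillic letters
	    Eng = "acekopxyACEHKMOPTX"    ' look-alike Latin letters, in the same order
	    For Each cell In Selection
	        For i = 1 To Len(cell)
	            c1 = Mid(cell, i, 1)
	            If c1 Like "[" & Eng & "]" Then
	                c2 = Mid(Rus, InStr(1, Eng, c1), 1)
	                cell.Value = Replace(cell, c1, c2)
	            End If
	        Next i
	    Next cell
	End Sub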

Now, if we select a range on the sheet and run our macro (Alt + F8, or the Developer tab – Macros button), all English letters found in the selected cells will be replaced with their Cyrillic equivalents. Just be careful not to accidentally replace Latin letters that you actually need 🙂


 
