Contents
- Method 1. Looking for repetitions: text in columns and an array formula
- Method 2. Highlighting repetitions inside a cell with a macro
- Method 3. We display repetitions in the adjacent column
- Method 4. Removing repetitions inside a cell with a macro
- Method 5. Removing duplicates inside a cell using Power Query
I have already written more than once about searching and highlighting duplicates in different cells and ranges, but what if you need to find and, possibly, delete duplicate words inside a cell? For example, we have such a table with data (separators may not necessarily be spaces):
It is clearly seen that some names in the lists inside the cells are repeated. Let’s see what we can do with it.
Method 1. Looking for repetitions: text in columns and an array formula
This is not the most convenient and fastest, but the easiest option for solving the problem “on the knee”. Select the original list and split it into columns by spaces using the command Data – Text by Columns (Data — Text to columns). In the opened window of the three-step Wizard, select the format By separator (By delimiter) in the first step and check the box Space (Space) on the second:
If there may be extra spaces in the source data, then it is better to enable the option Treat consecutive delimiters as one (Treat consecutive delimiters as one) – this will save us from extra columns.
On the third step in the field Place in set an empty cell next to the table so that the results do not overwrite our original data and click on Finish (Finish):
Our data will be divided into cells. It remains to count the number of repetitions in each line using a small but tricky array formula:
In the English version it will be =SUMPRODUCT(N(COUNTIF(B2:G2,B2:G2)>1))
Let’s analyze the logic of its work on the example of the first line.
- First we use the formula COUNTIF(B2: G2;B2: G2) we calculate in turn the number of occurrences of each name in the range B2: G2 and get an array at the output {1,2,1,2,1}Because Ivan occurs 1 time in the first line, Elena – 2 times, Sergei – 1, etc.
- Checking with COUNTIF(B2:G2;B2:G2)>1 which of the numbers obtained is greater than one, i.e. where we have repetitions. At the output, this formula will give us an array of test results in the form {FALSE, TRUE, FALSE, TRUE, FALSE}.
- We translate the logical values FALSE and TRUE into more convenient for counting 0 and 1, respectively, using the function Ч. The output is an array {0,1,0,1,0}.
- We sum all the elements of the resulting array with the function SUMPRODUCT. You could also use the normal function SUM, but then you would have to press instead of the usual Enter keyboard shortcut Ctrl+Shift+Enterto enter the formula as an array formula.
By the resulting column, you can easily filter rows with repetitions and then work with them further manually.
The disadvantages of this method, however, are very obvious: if you change the source data, you will have to repeat the whole procedure again, the duplicates are not very noticeable and you also need to delete them hand-to-hand. So let’s move on.
Method 2. Highlighting repetitions inside a cell with a macro
If duplicates need to be clearly shown, then it will be more convenient to use a special macro for this. Open the Visual Basic editor with the button of the same name on the tab developer (Developer — Visual Basic) or keyboard shortcut Alt+F11. Insert a new empty module into the book through the menu Insert – Module and copy the following code there:
Sub Color_Duplicates() Dim col As New Collection Dim curpos As Integer, i As Integer On Error Resume Next For Each cell In Selection Set col = Nothing curpos = 1 'remove extra spaces and split the text from the cell by spaces arWords = Split(WorksheetFunction. Trim(cell.Value), " ") For i = LBound(arWords) To UBound(arWords) 'Loop through the words in the resulting array Err.Clear 'Clear errors curpos = InStr(curpos, cell, arWords(i)) 'Start position current word col.Add arWords(i), arWords(i) 'trying to add the current word to the collection If Err.Number 0 Then 'if an error occurs, it means it's a repetition, highlight in red cell.Characters(Start:=curpos, Length :=Len(arWords(i))).Font.ColorIndex = 3 cell.Characters(Start:=InStr(1, cell, arWords(i)), Length:=Len(arWords(i))).Font.ColorIndex = 3 End If curpos = curpos + Len(arWords(i)) 'go to next word Next i Next cell End Sub
Now you can return to the main Excel window, select cells with text and run the created macro using the button Macros tab developer (Developer — Macros) or keyboard shortcut Alt+F8. This macro iterates over all selected cells and marks the repetitions in red font right inside the cell:
If you want only clones to be highlighted, but not the first occurrences (i.e. only the second and third, but not the first Alice, for example), it will be enough just to remove line 20 from the code.
Method 3. We display repetitions in the adjacent column
If repetitions inside cells need not just be highlighted, but explicitly displayed, for example, in an adjacent column, then it will be more convenient to use a macro function created in the image of the previous macro for this. Let’s add a new module in the Visual Basic editor and paste our function code there GetDuplicates:
Function GetDuplicates(cell As Range) As String Dim col As New Collection Dim i As Integer, sDupes As String On Error Resume Next Set col = Nothing 'divide the text in the cell by spaces arWords = Split(WorksheetFunction.Trim(cell.Value), " ") 'loop through all resulting words For i = LBound(arWords) To UBound(arWords) Err.Clear 'clear errors col.Add arWords(i), arWords(i) 'try to add word to collection 'if errors does not occur, it is not a repetition - add a word to the result If Err.Number 0 Then sDupes = sDupes & " " & arWords(i) Next i GetDuplicates = Trim(sDupes) 'print the results End Function
This function, as you might guess, takes a cell with text as its only argument and displays as a result all the repetitions that it finds there:
Method 4. Removing repetitions inside a cell with a macro
If you just need to remove duplicates inside the cell so that all the words remaining there are not repeated, then the macro will be similar to the previous one, but simpler:
Sub Delete_Duplicates() Dim col As New Collection Dim i As Integer On Error Resume Next For Each cell In Selection Set col = Nothing sResult = "" " ") 'loop through all resulting words For i = LBound(arWords) To UBound(arWords) Err.Clear 'clear errors col.Add arWords(i), arWords(i) 'try to add word to collection 'if errors does not occur, this is not a repetition - add a word to the result If Err.Number = 0 Then sResult = sResult & " " & arWords(i) Next i cell.Value = Trim(sResult) 'display results without repetitions Next cell End Sub
Method 5. Removing duplicates inside a cell using Power Query
This method uses a free data science add-in for Excel called Power Query. For Excel 2010-2013, you can download it from the Microsoft website, and in Excel 2016-2019 it is already built-in by default. A huge advantage of this option is the ability to automatically update – if the source data changes in the future, then we do not have to do all the processing again (as in Method 1) or run a macro (as in Method 4) – it will be enough just to update the created query.
First, our data needs to be loaded into Power Query. The easiest way to do this is to turn our table into a “smart” one with a keyboard shortcut Ctrl+T or by button Format as a table tab Home (Home — Format as Table)and then click the button From Table/Range (From table/range) tab Power Query (if you have Excel 2010-2013) or on the tab Data (if you have Excel 2016 or newer):
A Power Query Query Editor window will open on top of the Excel window with our data loaded there:
Then we do the following chain of actions:
Delete an unnecessary step modified type (Changed Type) on the right in the applied steps panel using the cross to the left of the step.
So that you can then identify the belonging of each name to the original line – add a column with line numbering on the tab Add Column – Index Column – From 1 (Add Column — Index Column — From 1):
Select the column with names and click on the tab Transform – Split Column – By Delimiter (Transform — Split Column — By delimiter), and in the window that opens, select division by each space and, most importantly, division into rows, not columns in the advanced options:
After clicking on OK we will see the following:
Now select both columns (hold down the key Ctrl or Shift) and remove duplicates via Home — Delete Rows — Delete Duplicates (Home — Remove Rows — Remove Duplicates).
It remains to collect everything back into the cells 🙂 To do this, select the column Index and use the command Group by tab Transformation (Transform — Group By) with the following parameters:
After clicking on OK our names will be grouped into nested tables that mimic the initial cells – only this time without repetitions. You can see the contents of the collapsed tables if you click in the background of the cell next to the word Table (but not in the word Table!):
It remains to pull out all the names from the first column of each table and glue them with a space. This can be done using a small formula in the M language built into Power Query. Select on the tab Adding a column Command Custom column (Add Column — Custom Column) and enter in the window that opens the name of the new column and the formula (case sensitive!):
=Text.Combine([Cells][Names],” “)
Here the expression [Cells][Names] retrieves the contents of a column Names from each table in a column Cellsand the function Text.Combine glues then all of them through the given separator (space). After clicking on OK, we will finally see what we want:
It remains to remove columns that are no longer needed Index и Cellsby right-clicking on their titles and selecting the command Remove columns (Remove Columns) and dump the results to a sheet via Home — Close and load — Close and load in (Home — Close & Load — Close & Load to..):
Problem solved! If in the future the data in the original “smart” table changes or new rows are added to it, then it will be enough just to update the query by right-clicking on the resulting green table and selecting the command Update & Save or by pressing the keyboard shortcut Ctrl+Alt+F5.
- How to Find and Display Differences in Two Lists in Excel
- How to highlight matching pairs of duplicates with the same color
- Comparing two tables using VLOOKUP, Pivot or Power Query