Searching source text for keywords is one of the most common tasks when working with data. Let's look at several ways to solve it, using the following example:
Suppose we have a list of keywords (car brand names) and a large table of spare parts whose descriptions may contain one brand, or several at once if the part fits more than one make of car. Our task is to find every keyword that occurs in each description and output the matches in the adjacent cell, separated by a given delimiter (for example, a comma).
Method 1. Power Query
First, we convert both tables into dynamic ("smart") tables using the keyboard shortcut Ctrl+T or the command Home – Format as Table, give them names (for example, Brands and Spare parts), and load them one at a time into the Power Query editor with Data – From Table/Range. In older versions, Excel 2010-2013, where Power Query is installed as a separate add-in, the button is on the Power Query tab. In the newest builds of Excel 365, the From Table/Range button is now called From Sheet.
After loading each table into Power Query, we return to Excel with the command Home – Close & Load – Close & Load To… – Only Create Connection.
Now duplicate the Spare parts query by right-clicking it and choosing Duplicate, rename the copy to Results, and continue working with that copy.
The sequence of steps is as follows:
- On the Add Column tab, choose Custom Column and enter the formula = Brands. After clicking OK we get a new column in which every cell holds a nested table with the list of our keywords, the car brands.
- Use the double-arrow button in the header of the new column to expand all the nested tables. Each row with a part description is repeated once per brand, giving us every possible part-brand pair.
- On the Add Column tab, choose Conditional Column and set a condition that checks whether the keyword (brand) occurs in the source text (part description).
- To make the search case-insensitive, manually add a third argument, Comparer.OrdinalIgnoreCase, to the Text.Contains function in the formula bar (if the formula bar is not visible, enable it on the View tab).
- Filter the resulting table, keeping only the ones (that is, the matches) in the last column, and then remove the now unnecessary Occurrences column.
- Group identical descriptions with the Group By command on the Transform tab. As the aggregation operation, choose All Rows. The output is a column of tables, each containing all the details for one spare part, including the car brands we need.
- To extract the brands for each part, add another calculated column with Add Column – Custom Column, using a formula that consists of the table (these sit in our Details column) followed by the name of the column to extract, in square brackets.
- Click the double-arrow button in the header of the resulting column and choose Extract Values to output the brands separated by any delimiter you like.
- Remove the unnecessary Details column.
- To bring back the parts that dropped out of the table because no brands were found in their descriptions, merge the Results query with the original Spare parts query using Home – Merge Queries. Join kind: Right Outer.
- All that remains is to remove the extra columns and rename and reorder the remaining ones, and the task is solved.
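Taken together, the steps above could correspond to M code roughly like the following. This is only a sketch under assumptions that are not in the original: the Spare parts table is assumed to have a single text column named Description with no blank cells, the Brands table a single column named Brand, and the step names and the Found brands column name are made up for illustration. The Occurrences and Details column names follow the text.

```
let
    // Start from the Spare parts query (assumed column: "Description")
    Source = #"Spare parts",
    // Custom column "= Brands": every row gets the whole Brands table
    AddedBrands = Table.AddColumn(Source, "Brands", each Brands),
    // Expand the nested tables into all part-brand pairs (assumed column: "Brand")
    Expanded = Table.ExpandTableColumn(AddedBrands, "Brands", {"Brand"}, {"Brand"}),
    // Conditional column: 1 if the brand occurs in the description, ignoring case
    AddedCheck = Table.AddColumn(Expanded, "Occurrences",
        each if Text.Contains([Description], [Brand], Comparer.OrdinalIgnoreCase) then 1 else 0),
    // Keep only the matches, then drop the helper column
    Filtered = Table.SelectRows(AddedCheck, each [Occurrences] = 1),
    RemovedCheck = Table.RemoveColumns(Filtered, {"Occurrences"}),
    // Group by description, keeping all rows of each group as a nested table
    Grouped = Table.Group(RemovedCheck, {"Description"}, {{"Details", each _, type table}}),
    // Table[Column] syntax: take the Brand column of each nested table as a list
    AddedList = Table.AddColumn(Grouped, "Found brands", each [Details][Brand]),
    // "Extract values": join each list into text with a comma delimiter
    Extracted = Table.TransformColumns(AddedList,
        {"Found brands", each Text.Combine(List.Transform(_, Text.From), ", "), type text}),
    RemovedDetails = Table.RemoveColumns(Extracted, {"Details"}),
    // Right outer join brings back the parts in which no brand was found
    Merged = Table.NestedJoin(RemovedDetails, {"Description"},
        #"Spare parts", {"Description"}, "Parts", JoinKind.RightOuter),
    ExpandedParts = Table.ExpandTableColumn(Merged, "Parts", {"Description"}, {"Part description"}),
    // Keep only the full description and the list of brands found in it
    Result = Table.SelectColumns(ExpandedParts, {"Part description", "Found brands"})
in
    Result
```

Parts without any brand end up with an empty (null) Found brands cell. The row order after the merge is not guaranteed to match the source table, so a sort step may be needed if order matters.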
Method 2. Formulas
If you have Excel 2016 or later, the problem can be solved very compactly and elegantly with the new TEXTJOIN function.
The logic behind this formula is simple:
- The SEARCH function looks for each brand in turn in the current part description and returns either the position of the character at which the brand starts, or the #VALUE! error if the brand is not in the description.
- Then, using the IF and ISERROR functions, we replace the errors with an empty text string "" and the character positions with the brand names themselves.
- The resulting array of empty strings and found brands is joined into a single string, separated by the given delimiter, by the TEXTJOIN function.
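A formula matching the three steps above might look like the one below. It is a reconstruction from the description, not necessarily the exact formula used in the original, and the names are assumptions: a table called Brands with a column Brand, and the formula entered in a column of the spare parts table next to a column called Description.

```
=TEXTJOIN(", ", TRUE, IF(ISERROR(SEARCH(Brands[Brand], [@Description])), "", Brands[Brand]))
```

The second argument, TRUE, tells TEXTJOIN to skip the empty strings, so only the brands actually found end up in the result. In versions of Excel without dynamic arrays, a formula like this may need to be confirmed with Ctrl+Shift+Enter, and in some regional settings the argument separator is a semicolon rather than a comma.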
Performance Comparison and Buffering Power Query Queries for Speed
For a performance test, let's take a table of 100 spare part descriptions as the source data. It gives the following results:
- Recalculation time for the formulas (Method 2): 9 seconds when the formula is first copied down the whole column, and 2 seconds on repeat runs (probably thanks to caching).
- Refresh time for the Power Query query (Method 1) is much worse: 110 seconds.
Of course, a lot depends on the hardware of a particular PC and the installed version of Office and updates, but the overall picture, I think, is clear.
To speed up the Power Query query, let's buffer the Brands lookup table, because it does not change while the query runs and there is no need to keep recalculating it (which is what Power Query does in practice). For this we use the Table.Buffer function from M, Power Query's built-in language.
To do this, open the Results query and click Advanced Editor on the View tab. In the window that opens, add a line with a new variable, Brands2, which will hold a buffered copy of our brand lookup table, and use this new variable in the query step that follows:
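As a rough illustration, the start of the query in the Advanced Editor might then look like this. The step names and the variable name Brands2 are illustrative assumptions, and only the first steps are shown; the rest of the query would stay as it was, continuing from the AddedBrands step.

```
let
    Source = #"Spare parts",
    // Load the brand lookup table into memory once instead of re-evaluating it per row
    Brands2 = Table.Buffer(Brands),
    // The custom column now refers to the buffered copy rather than to Brands itself
    AddedBrands = Table.AddColumn(Source, "Brands", each Brands2)
in
    AddedBrands
```

Table.Buffer keeps a stable in-memory copy of the table for the duration of the evaluation, which is why it tends to help when a small lookup table is referenced from every row of a larger one.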
After this tweak, the refresh of our query becomes almost 7 times faster, dropping to 15 seconds. Quite a different story 🙂
- Fuzzy text search in Power Query
- Bulk text replacement with formulas
- Bulk text replacement in Power Query with List.Accumulate function