We have already looked at how to bulk-replace text from a reference list using formulas. Now let’s try to do the same in Power Query.
As often happens, performing this task is much easier than explaining why it works, but let’s try to do both 🙂
So, we have two “smart” dynamic tables created from ordinary ranges with the keyboard shortcut Ctrl+T or the command Home – Format as Table:
I named the first table Data and the second Directory, using the Table Name field on the Design tab.
Task: in the addresses in the Data table, replace every occurrence of a value from the Find column of the Directory table with its correct counterpart from the Replace column. The rest of the text in the cells should remain untouched.
Step 1. Load the directory into Power Query and turn it into a list
With the active cell anywhere in the Directory table, click the From Table/Range button on the Data tab (or on the Power Query tab, if you have an older version of Excel with Power Query installed as a separate add-in).
The Directory table will be loaded into the Power Query editor:
So that it does not get in the way, the automatically added Changed Type step can be safely deleted from the Applied Steps panel on the right, leaving only the Source step:
Now, to perform further transformations and replacements, we need to turn this table into a list.
Lyrical digression
Power Query works with several kinds of objects:
- Table – a two-dimensional array consisting of several rows and columns.
- Record – a one-dimensional array (a row) consisting of several named fields, for example [Name = "Masha", Gender = "f", Age = 25]
- List – a one-dimensional array (a column) consisting of several elements, for example {1, 2, 3, 10, 42} or {"Faith", "Hope", "Love"}
For our task, the List type is the one that matters most.
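As a small illustration of these three object types, here is a sketch that can be pasted into a blank query. The names SampleList, SampleRecord and SampleTable are made up for this example and are not part of the workbook described in the article.

```powerquery
let
    // A list: items in curly brackets
    SampleList = {1, 2, 3, 10, 42},
    // A record: named fields in square brackets
    SampleRecord = [Name = "Masha", Gender = "f", Age = 25],
    // A table built by hand: column names first, then the rows
    SampleTable = #table({"Find", "Replace"}, {{"SPb", "St. Petersburg"}}),
    // List items are addressed by zero-based position
    FirstItem = SampleList{0},
    // Record fields are addressed by name
    NameField = SampleRecord[Name],
    // A single table row comes back as a record
    FirstRow = SampleTable{0}
in
    [Item = FirstItem, Name = NameField, Row = FirstRow]
```

Note that a single row of a table is itself a record, which is why a table can be turned into a list of records.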
The trick is that the items of a list in Power Query can be not only plain numbers or text, but also other lists or records. It is exactly such a list of records that we need to turn our directory into. In Power Query notation (records in square brackets, lists in curly brackets), it would look like this:
{
    [ Find = "S-Pb", Replace = "St. Petersburg" ],
    [ Find = "SPb", Replace = "St. Petersburg" ],
    [ Find = "Peter", Replace = "St. Petersburg" ],
    etc.
}
This transformation is performed by a special function of the M language built into Power Query – Table.ToRecords. To apply it, wrap the code of the Source step in this function directly in the formula bar.
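Before the change, the Source step contains only the reference to the Directory table; after it, the same reference is wrapped in Table.ToRecords.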
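In code, the change might look roughly like this. This is a sketch that assumes the Source step holds the formula Excel normally generates for From Table/Range on a table named Directory; the exact text in your formula bar may differ.

```powerquery
let
    // Before the change, the step was:
    //   Source = Excel.CurrentWorkbook(){[Name = "Directory"]}[Content]
    // After the change, the same expression is wrapped in Table.ToRecords,
    // so the step returns a list of records instead of a table
    Source = Table.ToRecords(Excel.CurrentWorkbook(){[Name = "Directory"]}[Content])
in
    Source
```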
After adding the Table.ToRecords function, the appearance of our table changes: it turns into a list of records. The contents of an individual record can be seen at the bottom of the preview pane by clicking the cell background next to any Record word (but not on the word itself!)
In addition, it makes sense to add one more touch: caching (buffering) the list we have created. This forces Power Query to load the lookup list into memory once and not recalculate it each time we later access it to perform the replacements. To do this, wrap our formula in one more function – List.Buffer:
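A sketch of the buffered version of the Directory query, again assuming the default Source formula generated for a table named Directory:

```powerquery
let
    // Table.ToRecords turns the table into a list of records;
    // List.Buffer then keeps that list in memory for the rest of the evaluation
    Source = List.Buffer(
        Table.ToRecords(Excel.CurrentWorkbook(){[Name = "Directory"]}[Content])
    )
in
    Source
```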
With a large amount of source data to clean, such caching gives a very noticeable speed-up (several times over!).
This completes the preparation of the directory.
It remains to click Home – Close & Load – Close & Load To…, select the Only Create Connection option and return to Excel.
Step 2. Loading the data table
Nothing special here. As with the directory before, select any cell in the table, click the From Table/Range button on the Data tab, and our Data table gets into Power Query. The automatically added Changed Type step can be removed here too:
No special preparation is needed for this table, so we move on to the most important part.
Step 3. Perform replacements using the List.Accumulate function
Let’s add a calculated column to our data table using the command Add Column – Custom Column, then in the window that opens enter the name of the new column (for example, Corrected address) and our magic function List.Accumulate:
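For reference, the whole Data query after this step might look like the sketch below. The List.Accumulate formula is the one discussed later in this article; the step name AddedCustom and the column name "Corrected address" are illustrative, and the Source formula is assumed to be the default one generated for a table named Data.

```powerquery
let
    // Load the Data table from the current workbook (Changed Type step removed)
    Source = Excel.CurrentWorkbook(){[Name = "Data"]}[Content],
    // For each row, start from the text in [Address] and apply
    // every Find/Replace pair from the Directory query in turn
    AddedCustom = Table.AddColumn(
        Source,
        "Corrected address",
        each List.Accumulate(
            Directory,
            [Address],
            (state, current) => Text.Replace(state, current[Find], current[Replace])
        )
    )
in
    AddedCustom
```

In the Custom Column dialog itself, only the List.Accumulate(...) part is typed; Power Query adds the Table.AddColumn wrapper.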
It remains to click OK, and we get a column with the replacements made:
Note that:
- Since Power Query is case sensitive, there was no replacement in the penultimate row, because the directory contains “SPb”, not “SPB”.
- If the source text contains several substrings to replace at once (for example, in the 7th row both “S-Pb” and “Prospectus” need replacing), this causes no problems (unlike replacement with formulas from the previous method).
- If there is nothing to replace in the source text (9th row), no errors occur (again, unlike replacement with formulas).
The speed of such a query is very decent. For example, for a source table of 5000 rows, this query refreshed in less than a second (without buffering, by the way, it took about 3 seconds!)
How the List.Accumulate function works
In principle, this article could end here (for me to write and for you to read). But if you want not only to be able to do it but also to understand how it works “under the hood”, you will have to dive a little deeper into the rabbit hole and get to grips with the List.Accumulate function, which did all the bulk replacement work for us.
The syntax for this function is:
=List.Accumulate(list, seed, accumulator)
where
- list – the list whose elements we iterate over
- seed – the initial state
- accumulator – a function that performs some operation (mathematical, text, etc.) on the next element of the list and accumulates the result in a special variable
In general, the syntax for writing functions in Power Query looks like this:
(argument1, argument2, … argumentN) => some actions with arguments
For example, the summation function could be represented as:
(a, b) => a + b
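To try this out, such a function can be given a name and then called like any built-in one. The name Sum below is made up for this sketch:

```powerquery
let
    // Define a function of two arguments and store it under a name
    Sum = (a, b) => a + b,
    // Call it with concrete values: 2 + 3
    Result = Sum(2, 3)
in
    Result
```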
For List.Accumulate, this accumulator function has two required arguments (they can be named anything, but the usual names are state and current, as in the official help for this function), where:
- state – a variable where the result is accumulated (its initial value is the seed mentioned above)
- current – the next value being iterated over from the list
For example, let’s walk through the logic of the following construction step by step:
=List.Accumulate({3, 2, 5}, 10, (state, current) => state + current)
- The value of the variable state is set equal to the seed argument, i.e. state = 10
- We take the first element of the list (current = 3) and add it to the variable state (10). We get state = 13.
- We take the second element of the list (current = 2) and add it to the accumulated value in the variable state (13). We get state = 15.
- We take the third element of the list (current = 5) and add it to the accumulated value in the variable state (15). We get state = 20.
This last accumulated value of state is what the List.Accumulate function returns as its result:
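The same walk-through can be written out by hand as a sketch. The step names Step0 to Step3 are made up here to mirror the bullets above:

```powerquery
let
    // The accumulator function from the example
    Accumulator = (state, current) => state + current,
    Step0 = 10,                        // seed: state = 10
    Step1 = Accumulator(Step0, 3),     // 10 + 3 = 13
    Step2 = Accumulator(Step1, 2),     // 13 + 2 = 15
    Step3 = Accumulator(Step2, 5),     // 15 + 5 = 20
    // List.Accumulate performs the same chain of calls for us
    Result = List.Accumulate({3, 2, 5}, 10, Accumulator)
in
    [Manual = Step3, Accumulated = Result]
```

Both fields of the returned record hold 20.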
With a little imagination, you can use List.Accumulate to imitate, for example, the Excel function CONCATENATE (its Power Query analogue is Text.Combine) with the expression:
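One possible form of such an expression is sketched below; the sample list Letters is an assumption chosen for illustration, and the seed is an empty text string so that the first item has something to be joined to.

```powerquery
let
    Letters = {"a", "b", "c"},
    // Start from an empty string and append each item in turn with the & operator
    Joined = List.Accumulate(Letters, "", (state, current) => state & current)
in
    Joined  // "abc", the same text that Text.Combine(Letters) returns
```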
Or even find the maximum value (imitating Excel’s MAX function, which in Power Query is called List.Max):
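A sketch of one way to do this, using the list from the earlier example. Taking the first item as the seed is a choice made here so that the result is also correct for lists of negative numbers.

```powerquery
let
    Numbers = {3, 2, 5},
    // Keep whichever is larger: the value accumulated so far or the current item
    Maximum = List.Accumulate(
        Numbers,
        Numbers{0},
        (state, current) => if current > state then current else state
    )
in
    Maximum  // 5, the same value that List.Max(Numbers) returns
```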
However, the main feature of List.Accumulate is its ability to process not only simple text or numeric lists as arguments, but also more complex objects, for example lists of lists or lists of records (hello, Directory!)
Let’s look again at the construction that performed the replacement in our problem:
List.Accumulate(Directory, [Address], (state,current) => Text.Replace(state, current[Find], current[Replace]) )
What is really going on here?
- As the initial value (seed) we take the first messy text from the [Address] column of our table: 199034, St. Petersburg, str. Beringa, d. 1
- Then List.Accumulate iterates over the elements of the Directory list one by one. Each element of this list is a record consisting of a pair of fields, “what to find” and “what to replace it with”, or, in other words, the next row of the directory.
- The accumulator function places the initial value (the first address, 199034, St. Petersburg, str. Beringa, d. 1) into the variable state and performs the replacement operation on it using the standard M function Text.Replace (analogous to Excel’s SUBSTITUTE function). Its syntax is:
Text.Replace( original text, what we are looking for, what we are replacing with )
and here we have:
- state is our dirty address, which sits in state (having arrived there from seed)
- current[Find] – the value of the Find field from the next record of the Directory list being iterated, which sits in the variable current
- current[Replace] – the value of the Replace field from the next record of the Directory list being iterated, which sits in current
Thus, for each address, a full pass over all rows of the directory is run, each time replacing the text from the [Find] field with the value from the [Replace] field.
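To make this pass concrete, here is a self-contained sketch that unrolls it by hand for one address. The two-entry Directory list and the Address value are hypothetical, made up for illustration; they are not the actual contents of the workbook.

```powerquery
let
    // Hypothetical two-entry directory, for illustration only
    Directory = {
        [Find = "S-Pb", Replace = "St. Petersburg"],
        [Find = "str.", Replace = "street"]
    },
    // Hypothetical dirty address playing the role of the seed
    Address = "199034, S-Pb, str. Beringa, d. 1",
    // What List.Accumulate does internally, unrolled by hand:
    // each replacement works on the result of the previous one
    AfterFirst = Text.Replace(Address, Directory{0}[Find], Directory{0}[Replace]),
    AfterSecond = Text.Replace(AfterFirst, Directory{1}[Find], Directory{1}[Replace]),
    // The same result in a single call
    Result = List.Accumulate(
        Directory,
        Address,
        (state, current) => Text.Replace(state, current[Find], current[Replace])
    )
in
    [Manual = AfterSecond, Accumulated = Result]
```

Both fields contain "199034, St. Petersburg, street Beringa, d. 1". Because every replacement is applied to the output of the one before it, the order of rows in the directory can matter when one Find value occurs inside another row's Replace text.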
Hope you got the idea 🙂
- Bulk replace text in a list using formulas
- Regular Expressions (RegExp) in Power Query