Assembling tables from different Excel files with Power Query

Formulation of the problem

Let’s look at a beautiful solution for one of the very standard situations that most Excel users face sooner or later: you need to quickly and automatically collect data from a large number of files into one final table. 

Suppose we have the following folder, which contains several files with data from branch cities:

Assembling tables from different Excel files with Power Query

The number of files does not matter and may change in the future. Each file has a sheet named Saleswhere the data table is located:

Assembling tables from different Excel files with Power Query

The number of rows (orders) in the tables, of course, is different, but the set of columns is standard everywhere.

Task: to collect data from all files into one book with subsequent automatic updating when adding or deleting city files or rows in tables. According to the final consolidated table, then it will be possible to build any reports, pivot tables, filter-sort data, etc. The main thing is to be able to collect.

We select weapons

For the solution, we need the latest version of Excel 2016 (the necessary functionality is already built into it by default) or previous versions of Excel 2010-2013 with the free add-in installed Power Query from Microsoft (download it here). Power Query is a super flexible and super powerful tool for loading data into Excel from the outside world, then stripping and processing it. Power Query supports almost all existing data sources – from text files to SQL and even Facebook 🙂

If you do not have Excel 2013 or 2016, then you can not read further (just kidding). In older versions of Excel, such a task can only be accomplished by programming a macro in Visual Basic (which is very difficult for beginners) or by monotonous manual copying (which takes a long time and generates errors).

Step 1. Import one file as a sample

First, let’s import data from one workbook as an example, so that Excel “picks up the idea”. To do this, create a new blank workbook and…

  • if you have Excel 2016, then open the tab Data and then Create Query – From File – From Book (Data — New Query- From file — From Excel)
  • if you have Excel 2010-2013 with the Power Query add-in installed, then open the tab Power Query and select on it From file – From book (From file — From Excel)

Then, in the window that opens, go to our folder with reports and select any of the city files (it does not matter which one, because they are all typical). After a couple of seconds, the Navigator window should appear, where you need to select the sheet we need (Sales) on the left side, and its contents will be displayed on the right side:

Assembling tables from different Excel files with Power Query

If you click on the button in the lower right corner of this window Download (Load), then the table will be immediately imported to the sheet in its original form. For a single file, this is good, but we need to load many such files, so we will go a little differently and click the button Correction (Edit). After that, the Power Query query editor should be displayed in a separate window with our data from the book:

Assembling tables from different Excel files with Power Query

This is a very powerful tool that allows you to “finish” the table to the view we need. Even a superficial description of all its functions would take about a hundred pages, but, if very briefly, using this window you can:

  • filter out unnecessary data, empty lines, lines with errors
  • sort data by one or more columns
  • get rid of repetition
  • divide sticky text by columns (by delimiters, number of characters, etc.)
  • put text in order (remove extra spaces, correct case, etc.)
  • convert data types in every possible way (turn numbers like text into normal numbers and vice versa)
  • transpose (rotate) tables and expand two-dimensional cross-tables into flat ones
  • add additional columns to the table and use formulas and functions in them using the M language built into Power Query.

For example, let’s add a column with the text name of the month to our table, so that later it would be easier to build pivot table reports. To do this, right-click on the column heading dateand select the command Duplicate column (Duplicate Column), and then right-click on the header of the duplicate column that appears and select Commands Transform – Month – Month Name:

Assembling tables from different Excel files with Power Query

A new column should be formed with the text names of the month for each row. By double-clicking on a column heading, you can rename it from Copy Date to a more comfortable Month, eg.

Assembling tables from different Excel files with Power Query

If in some columns the program did not quite correctly recognize the data type, then you can help it by clicking on the format icon on the left side of each column:

Assembling tables from different Excel files with Power Query

You can exclude lines with errors or empty lines, as well as unnecessary managers or customers, using a simple filter:

Assembling tables from different Excel files with Power Query

Moreover, all performed transformations are fixed in the right panel, where they can always be rolled back (cross) or change their parameters (gear):

Assembling tables from different Excel files with Power Query

Light and elegant, isn’t it?

Step 2. Let’s transform our request into a function

In order to subsequently repeat all the data transformations made for each imported book, we need to convert our created request into a function, which will then be applied, in turn, to all of our files. To do this is actually very simple.

In the Query Editor, go to the View tab and click the button Advanced Editor (View — Advanced Editor). A window should open where all our previous actions will be written in the form of code in the M language. Please note that the path to the file that we imported for the example is hardcoded in the code:

Assembling tables from different Excel files with Power Query

Now let’s make a couple of adjustments:

Assembling tables from different Excel files with Power Query

Their meaning is simple: the first line (filepath)=> turns our procedure into a function with an argument filepath, and below we change the fixed path to the value of this variable. 

All. Click on Finish and should see this:

Assembling tables from different Excel files with Power Query

Don’t be afraid that the data has disappeared – in fact, everything is OK, everything should look like this 🙂 We have successfully created our custom function, where the entire algorithm for importing and processing data is remembered without being tied to a specific file. It remains to give it a more understandable name (for example getData) in the panel on the right in the field First name and you can reap Home — Close and download (Home — Close and Load). Please note that the path to the file that we imported for the example is hardcoded in the code. You will return to the main Microsoft Excel window, but a panel with the created connection to our function should appear on the right:

Assembling tables from different Excel files with Power Query

Step 3. Collecting all the files

All the hardest part is behind, the pleasant and easy part remains. Go to the tab Data – Create Query – From File – From Folder (Data — New Query — From file — From folder) or, if you have Excel 2010-2013, similarly to the tab Power Query. In the window that appears, specify the folder where all our source city files are located and click OK. The next step should open a window where all the Excel files found in this folder (and its subfolders) and details for each of them will be listed:

Assembling tables from different Excel files with Power Query

Click Change (Edit) and again we get into the familiar query editor window.

Now we need to add another column to our table with our created function, which will “pull” the data from each file. To do this, go to the tab Add Column – Custom Column (Add Column — Add Custom Column) and in the window that appears, enter our function getData, specifying for it as an argument the full path to each file:

Assembling tables from different Excel files with Power Query

After clicking on OK the created column should be added to our table on the right.

Now let’s delete all unnecessary columns (as in Excel, using the right mouse button – Remove), leaving only the added column and the column with the file name, because this name (more precisely, the city) will be useful to have in the total data for each row.

And now the “wow moment” – click on the icon with its own arrows in the upper right corner of the added column with our function:

Assembling tables from different Excel files with Power Query

… uncheck Use original column name as prefix (Use original column name as prefix)and click OK. And our function will load and process the data from each file, following the recorded algorithm and collecting everything in a common table:

Assembling tables from different Excel files with Power Query

For complete beauty, you can also remove the .xlsx extensions from the first column with file names – by standard replacement with “nothing” (right-click on the column header – Substitute) and rename this column to City. And also correct the data format in the column with the date.

All! Click on Home – Close and Load (Home — Close & Load). All data collected by the query for all cities will be uploaded to the current Excel sheet in the “smart table” format:

Assembling tables from different Excel files with Power Query

The created connection and our assembly function do not need to be saved separately in any way – they are saved together with the current file in the usual way.

In the future, with any changes in the folder (adding or removing cities) or in files (changing the number of lines), it will be enough to right-click directly on the table or on the query in the right panel and select the command Update & Save (Refresh) – Power Query will “rebuild” all the data again in a few seconds.

PS

Amendment. After the January 2017 updates, Power Query learned how to collect Excel workbooks by itself, i.e. no need to make a separate function anymore – it happens automatically. Thus, the second step from this article is no longer needed and the whole process becomes noticeably simpler:

  1. Choose Create Request – From File – From Folder – Select Folder – OK
  2. After the list of files appears, press Change
  3. In the Query Editor window, expand the Binary column with a double arrow and select the sheet name to be taken from each file

And that’s all! Song!

  • Redesign of the crosstab into a flat one suitable for building pivot tables
  • Building an animated bubble chart in Power View
  • Macro to assemble sheets from different Excel files into one

Leave a Reply