It’s very common that you need to combine data from multiple worksheets in the same Excel workbook when you’re using Power BI or Power Query/Get&Transform in Excel. Indeed a lot of people have blogged about how to solve this problem, but none of the solutions I’ve found on the internet work in more complex scenarios when the data on each sheet needs some kind of transformation before it can be combined. I was asked to explain how to do this recently while teaching a Power BI class, so in this blog post I’m going to walk through a worked example and point out a few issues that might trip up even experienced Power BI users.
First of all, the source data. Let’s say you have an Excel workbook with four worksheets: Q1, Q2, Q3 and Q4. On each worksheet is some sales data for the three months in each quarter; for example the Q1 worksheet looks like this:
…the Q2 worksheet looks like this:
…and so on. The required output for Power BI should be a table that looks like this:
Now most of the blog posts that describe this problem, such as Ken Puls’s post here, assume each worksheet has a table with the same column names on it. If each sheet has the same columns, this means you can just connect to the Excel workbook and get a table containing the contents (Miguel Escobar has a great post describing how to do this here) and then click the Expand/Aggregate button:
However in this particular case it doesn’t solve the problem, because we get this:
Aha, you may say, we have to transform the data before we can combine it and so we need to create a function and call it for every worksheet – the technique I’ve already blogged about here. And yes, that is basically what needs to happen, but the devil’s in the detail.
Here’s the solution, step-by-step:
Step 1: Get a table with all the worksheets listed
In Power BI connect to your Excel file as normal, then in the Navigator pane right-click on the name of the Excel workbook and select Edit rather than selecting any of the individual worksheets:
The result will be a table that looks something like this:
If you need to, filter out any rows that do not contain “Sheet” in the Kind column and also filter out any worksheets that you don’t want to combine data from.
Step 2: Create your template query
Duplicate the query above and call the new query Template.
Now, in the Template query, select one of the worksheets to use to build the query whose logic will be applied to all the other worksheets, and filter the table above so it only contains the row for that worksheet. In this case I’m using the worksheet called Q1:
Then – and this is important – remove all the other columns in the table except the Data column:
Doing this changes the M code generated for the next thing you’ll do; removing all these columns changes the way the row is referenced (see the section on “The effect of primary keys” in this post) and makes sure the name of the worksheet won’t be hard-coded anywhere.
After that click the Table link inside the cell, and you’ll see the contents of the worksheet:
There will probably be a Changed Type step in the query that sets the data types for each of the columns, and you will need to delete it:
You can now perform any other transformations you need on this query, but you will need to avoid any transformations that generate M code referring to any columns on the original worksheet that aren’t present on other worksheets. Remember, these transformations will need to be applied to the other worksheets and they will fail if they refer to columns that aren’t present – this is why you had to delete the Changed Type step earlier, because it sets the types on the January, February and March columns, and you’ll probably need to delete any other Changed Type steps that are created elsewhere in the query. Open up the Advanced Editor and check the M code for the whole query just to be sure.
In this case all I need to do is unpivot the month columns by selecting the Product column and using the Unpivot Other Columns button on the Transform tab, and then renaming the columns appropriately:
Step 3: Create a function
Next you need to create a new parameter by clicking the Manage Parameters/New Parameter button, call the parameter Worksheet, set the data type to text and have it return the name of the worksheet you chose in the previous step:
Now, go back to the Template query, find the step called Filtered Rows towards the beginning where you filtered down to a single worksheet, and click the gear icon next to the step to edit it:
Then, edit the step so it uses the value returned by the parameter to filter by instead of the hard-coded value you entered earlier. To do this, click on the button shown below, select Parameter and then select the Worksheet parameter in the next dropdown box along:
Finally, go to the Queries pane on the left-hand side of the screen and right-click on the Template query and select Create Function..
You’ll be prompted to give the new function a name; call it GetData:
Step 4: Invoke the function and combine the data
Finally, go back to the duplicate copy of the original query created at the beginning of step 2. Then go to the Add Column tab on the ribbon and click the Invoke Custom Function button and invoke the GetData function, passing in the contents of the Name column to the function’s only parameter:
Last of all, click the Expand/Aggregate button on the new column and expand the nested tables:
After removing any unnecessary columns, you’ll see the data from all the worksheets combined into a single table as desired:
Don’t forget to set the data types on each column.