Combining Data From Multiple Excel Workbooks With Power Query–The Easy Way!

If there’s one feature of Power Query that’s guaranteed to get Excel users very, very excited indeed it’s the ability to combine data from multiple workbooks into a single table. The bad news is that this is something that Power Query can’t do through the user interface (although so many people have asked for it I wouldn’t be surprised if it gets added to the product soon) and it’s not obvious how to do it.

This is a topic that has been blogged about many times over the past year or so (see DutchDataDude, Mike Alexander, Ken Puls, Miguel Escobar – apologies to anyone I’ve missed) so why should I write about it? Well, all these other posts show you the steps you have to go through to build your own function and then use that function inside a query, which is fine, but it involves a lot of clicking and typing code each time you want to do it. It’s all very time-consuming if you don’t know Power Query that well, though, and not something a regular Excel user would want to do. I’ve got an easier way though: a generic function that can combine data from workbooks in any folder you point it at. Once you’ve created it it’s very easy for anyone to use, can be reused over and over, and of course you can share this function through the Power BI Data Catalog if you have a Power BI for Office 365 subscription.

Steps to add the Power Query function to your workbook

You can either follow the steps below to add the function to your workbook, or instead just download the sample workbook containing the function here – which is a lot quicker!

1) Copy the following code onto the clipboard

//Define function parameters
(#"Directory containing Excel files to combine" as text, 
optional #"Name of each Excel object to combine" as text, 
optional #"Use first rows as headers" as logical) =>
let
    //If the optional Excel object name parameter is not set, then default to Sheet1
    ExcelName = if #"Name of each Excel object to combine" = null 
                then "Sheet1" 
                else #"Name of each Excel object to combine",
    //If the optional Use first rows as headers parameter is not set, then default to true
    UseFirstRowsAsHeaders = if #"Use first rows as headers"= null 
                            then true 
                            else #"Use first rows as headers",
    //Get a list of all the files in the folder specified
    Source = Folder.Files(#"Directory containing Excel files to combine"),
    //Filter these to only get Excel files
    OnlyGetExcelFiles = Table.SelectRows(Source, 
                              each ([Extension] = ".xlsx") 
                              or ([Extension] = ".xls")),
    //Find the full path of each file
    FullPath = Table.CombineColumns(
                    OnlyGetExcelFiles ,
                    {"Folder Path", "Name"},
                    Combiner.CombineTextByDelimiter("", QuoteStyle.None),"Merged"),
    //Get a list containing each file path
    ExcelFiles = Table.Column(FullPath, "Merged"),
    //Define a function to get the data from the specified name in each Excel workbook
    GetExcelContents = (FileName as text) =>
     let
      //Connect to the workbook
      Source = Excel.Workbook(File.Contents(FileName), UseFirstRowsAsHeaders),
      //Get a table of data from the name specified
      //If the name doesn't exist catch the error and return null
      ExcelData = try Source{[Item=ExcelName]}[Data] 
                           otherwise try Source{[Name=ExcelName]}[Data] 
                           otherwise null
     in
      ExcelData,
    //Call the above function for each Excel file
    ReadAllWorkbooks = List.Transform(ExcelFiles, each GetExcelContents(_)),
    //Remove any null values resulting from errors
    IgnoreNulls = List.RemoveNulls(ReadAllWorkbooks),
    //Combine the data from each workbook into a single table
    CombineData = Table.Combine(IgnoreNulls)
in
   CombineData

 

2) Open Excel and go to the Power Query tab on the ribbon. Click on the From Other Sources button and then click Blank Query.

image

3) The Power Query query editor window will open. Go to the View tab and click on the Advanced Editor button.

image

4) The Advanced Editor will open. Delete all the code in the main textbox and replace it with the code above. Click OK to close the Advanced Editor.

image

5) In the Query Settings pane on the right-hand side of the Query Editor, change the name of the query to CombineExcel, then go to the Home tab on the ribbon and click the Close & Load button. The Query Editor will close.

image

image

6) You can now see your function in the Workbook Queries pane in Excel! It should look like this:

image

 

Using the function to combine data from multiple workbooks

To use the function, double-click on it in the Workbook Queries pane or right-click and select Invoke. The following dialog will appear:

image

You can enter three parameters here:

  • The path of the directory containing the Excel workbooks that you want to read data from. The function can read from xlsx and xls files (though for the latter to work you need the Access 2010 engine installed, a free download if you only have Excel 2013) and will ignore any other files in the folder. The function will also read any Excel files in any subfolders.
  • Where you want to get data from in each Excel workbook. This can be the name of a worksheet (for example you could enter Sheet2 here) or a named range or a table. It’s an optional parameter so if you leave it blank it will get data from the worksheet Sheet1. If a workbook doesn’t contain the name you enter here it will be ignored. If the format of the data in each worksheet is not consistent (for example if you have different column names) then be warned: you may get some strange results.
  • Whether data on your worksheet (if you’re getting data from a worksheet) contains headers. Enter true here if your data does have a header row in every worksheet; false otherwise. This is also an optional parameter and if you leave this box empty the default value is true.

When you click OK, a new Power Query query will be created, the Query Editor window will open and you’ll see all the data from all of the Excel workbooks combined. The first step of this query is a call to the CombineExcel() function and you can carry on working with your data in Power Query as normal.

image

Disclaimer: I’ve done a reasonable amount of testing on this and I’m pretty sure it works well, but of course there will be bugs. Please leave a comment if you find a bug or can suggest any other improvements.

39 thoughts on “Combining Data From Multiple Excel Workbooks With Power Query–The Easy Way!

  1. Great info here, Chris. I’ll be buying your book this weekend 🙂

    In terms of reusable processes and automation – is there some kind of template that can be created and reused from client to client with Power View’s? Say we want to offer some PowerViews of typical indicators (top 10 customers, gross profit by region, etc), would they have to be built every time? I know we can reuse the DAX/MDX query when connecting to a particular cube in PowerPivot.

  2. Hey, Chris
    Thanks so much for this great post. Very strait forward way of handling functions.
    If there’s one feature of Power Query that’s guaranteed to get Excel users very, very excited indeed it’s the ability to combine data from multiple workbooks into a single table.

    Learn Analytics

    • I would love to see Power Query being able to call VBA functions as well. Sometimes we don´t have time to port some VBAs functions to the M language. Therefore, being able to call the functions directly from Power Query would be really very very exciting as well… What do you think Chriss?

  3. Here is my take on it that I developed some time ago. My approach dynamically gets the sheet names from the files – so you could have each file with different number of sheets and named differently. It also takes care of .xlsx/.xlsb/.xls file formats

    First Define a Function – GetPath- This gets the Path from a cell named File_Path on a sheet called SET_PARAMETERS

    let fGetPath=()=>

    let Source = Excel.CurrentWorkbook(),
    fPath = Source{[Name=”File_Path”]}[Content],
    fPath1 = fPath{0}[Column1]
    in
    fPath1
    in
    fGetPath

    Then define another function called GetSheets – This gets all the sheet names from the file

    let fGetShts=(fPath ,fName)=>
    let
    Source = Excel.Workbook(File.Contents(fPath & fName),true),
    DataTbl = Table.SelectColumns(Source,{“Data”}),
    Column_Names = Table.ColumnNames(DataTbl{0}[Data]),
    Expand_tbl = Table.ExpandTableColumn(DataTbl, “Data”, Column_Names, Column_Names)

    in
    Expand_tbl
    in
    fGetShts

    Finally a Query called GetData

    let
    fPath = GetPath(),
    Source = Folder.Files(fPath),
    Custom = Table.AddColumn(Source, “Data”, each GetShts([Folder Path],[Name])),
    DataTbl = Table.SelectColumns(Custom,{“Data”}),
    Column_Names = Table.ColumnNames(DataTbl{0}[Data]),
    Expand_Content = Table.ExpandTableColumn(DataTbl, “Data”, Column_Names, Column_Names)
    in
    Expand_Content

  4. Hi Chris

    I saw this blog when you first posted it and made a note for future reference. Today I had a need – WOW! this is a fabulous, easy to use solution. I have now saved a master Excel template with this code loaded. In future when I need to do this, I will just start with that Workbook rather than doing the whole copy/paste thing again.

    Thanks for the great tool and your ongoing sharing.

  5. Hey Chris,

    Is it possible to combine the data models between worksheets? I have 2 worksheets, one with 24 million, one with 5 million rows loaded into their respective data models through Power Query. I need to merge/join these on a column. I was hoping not to have to re-download 5 million rows since it is already sitting in Power Pivot. Any ideas on this?

  6. I suppose I can live with that for now. Then, a more serious concern is how refreshing this much data will work. I’m pretty sure it always refreshes the entire data set, right? It’d be great if it could just load changes/new records.

  7. Chris, is there a way that you can pass a more meaningful prompt to the user when you invoke the function? Ie because the input parameters are set as text, the prompt says ‘Example: ABC’. I would like to change this default prompt to something more meaningful. Is this possible?

  8. Chris,
    I’ve been using this function since you published it, with great success. One thing I would like to add for debugging is the ability to add a column which contains the file path of the source file as it is combined. Is that possible using this approach?

  9. Thanks for this tool, it has saved me countless hours and eliminates excel copy and paste error on our end.

    Is there a way that we could add a column, stating the file name that it combined from?

    I think I saw this in another combine method before, but I can’t find it.

    • Hi Kyle, if you look for my comment above (dated February 24) you’ll see a link to a workbook containing a version of the function that does exactly that.

  10. Excellent post, thanks. I am wondering if there is any performance improvement, if binary XLSB workbooks are used, instead of xlsx/xlsm. I will try put this theory to the test with 1K workbooks soon!

  11. I’m newbie, please excuse me for stupid question. Is it possible to manipulate data (for example transpose it) before combining it?
    When I work on file and use Source = Excel.CurrentWorkbook(){[Name=”Table1″]}[Content], it’s work. But when I try to use same manipulation for all files in folder, I failed to change code.

    Can you help me?

  12. Chris – this is awesome. I am using this to load ancient xls workbooks from a client still using Excel 2003. It worked a treat – just had to change the file extension parameter to xls.
    Note that tables don’t work – you need to use sheet names instead. Also you need to ensure that the Power BI Desktop, Office 365 (or whatever version) and Access Database Engine are all the same bit – either 32 bit or 64 bit. You can test this by using Get Data and loading an xls workbook before trying this function.

  13. A couple of observations on the function Excel.Workbook(): it can’t read Hidden or xlVeryHidden worksheets. However, it can read ranges that exist on a hidden sheet – a possible workaround. It also only reads ranges that are more than one cell, i.e. single cell ranges are ignored.

  14. Hi Chris – this is great! A colleague of mine discovered your post about 2 years ago and we’ve been using this “magic formula” quite a lot!
    I’ve come to face an issue to which you’ll probably answer “why would you want to do that?”. It would be a fair question, but please ignore it and help me out with the solution (if there’s one, that is!) 🙂

    Is it possible to ask the query to check some subfolders only, and not all of them? For example, add a line to the function saying “only look into folders that contain “xxxxx” in their folder name”.

    That would be absolutely brilliant.

    • Yes, it would be possible – but there’s no need to use this function now that the latest versions of Power BI and Power Query allow you to combine data from multiple Excel workbooks out of the box. Since this is the case, you just need to use the From Folder data source, apply whatever filter you need to get your list of Excel workbooks, then click the Combine button.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s