Loading Data From Multiple Excel Workbooks Into Power BI–And Making Sure Data Refresh Works After Publishing

I can hear you yawning already – yet another blog post on getting data from multiple Excel workbooks in Power Query and Power BI. Just about everyone who has ever written a blog post on Power BI has written about this subject, including me. However there’s a twist this time: what if your Excel workbooks are stored in SharePoint or OneDrive For Business? If they are, then your dataset may not refresh successfully after you have published unless you load your data in a particular way.

Some background first. A few weeks ago I was contacted by a reader who had seen my post on data refresh errors and the Web.Contents() function and was experiencing the same issue when using Web.Contents() to get data from multiple Excel workbooks stored in SharePoint. Up until recently the Web.Contents() function – which is used by the From Web option in Power BI’s Get Data experience – was the only way to get data reliably from single Excel files stored in SharePoint or OneDrive For Business. However the limitations on Web.Contents(), M functions and data refresh described in my blog post meant that it wasn’t possible to use Web.Contents() to get data from multiple Excel files stored in SharePoint or OneDrive For Business.

The April 2016 Power BI Desktop update introduced a new way of getting data from Excel files stored in SharePoint: the SharePoint Files data source, based on the SharePoint.Files() M function. Both Mike Carlo and Ken Puls have already blogged about this in depth and so I won’t repeat what they’ve written; their posts have a lot of good information on how to construct the URLs to find your files in SharePoint. How do you use it to combine data from multiple Excel workbooks though?

Imagine you have four identically-structure Excel workbooks stored in a folder in OneDrive For Business:

image

Each one has a table called Table1 with some sales data in it:

In Power BI Desktop, create a new query and click the Get Data button. In the Get Data dialog, go to Files and click on SharePoint Folder:

Enter the URL for your OneDrive For Business site:

You’ll then see all the files in all your folders in OneDrive For Business:

Filter the folders in the Folder Path column so you only see the files in the folder containing your Excel workbooks:

Right-click on the Content column and select Remove Other Columns to get rid of all but the Content column. Then click the Add Custom Column button and add the following expression:

[sourcecode language='text' ]
Excel.Workbook([Content])
[/sourcecode]

This tells Power BI to treat each file in this folder as an Excel Workbook. Click OK, and then right-click on the Content column and select Remove (you won’t need this any more). Next, click on the Expand icon in the top right-hand corner of the Custom column and click OK on the flyout:

This will give you a table listing all of the contents of each workbook:

Filter this table so you only have the four tables from each workbook:

Next, right-click on the Data column and select Remove Other Columns, then finally click OK on the Expand icon again:

You’ll now have a table containing all of the data from the tables called Table1 in each workbook in the folder:

Don’t forget to set appropriate data types for each column (columns will have the data type Any by default, which will be treated as text later on)! You can now build your report and publish it:

In order for refresh to work, you’ll need to configure the credentials used by the Power BI service to connect to your data. In the browser, click on the ellipses for the Dataset for your report and select Schedule Refresh:

This will take you to the Datasets tab on the Settings page. You can schedule refresh here if you want, but the important thing is to click the Edit Credentials link:

If like me you have an Office 365 subscription and use SharePoint Online, then the dialog choose oAuth2 in the Authentication Method dropdown box and click Sign In:

You’ll see the Windows Organizational Account sign-in page appear briefly while you are signed in but you shouldn’t need to do anything. And that’s it!

You can now edit the data in any of your Excel workbooks and, once the dataset has refreshed, those changes will show up in the report. If you want to do a manual refresh of the data, clicking the Refresh button at the top of the report won’t do any good; you have to click on the Refresh Now option on the dataset (seen in the screenshot above, just below the Schedule Refresh option).

28 thoughts on “Loading Data From Multiple Excel Workbooks Into Power BI–And Making Sure Data Refresh Works After Publishing

  1. You know… I could almost swear that there was a config change to the Scheduled Refresh part. I set up a dashboard ages ago that consolidated Excel files stored in OneDrive for Business, and scheduled the refresh. It ran fine for months, then started failing about 3 weeks ago due to “incorrect credentials” or some such. Logging in to fix it showed the OAuth2 authentication which I’d never seen before. Easy enough to fix, fortunately, but still a bit weird.

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      I’ve seen the same thing too. I think it’s related to updates to the service…

  2. Gents, I’d kind of mentioned this in passing the other day but it’s starting to become an annoyance!
    When I enter my credentials which are of OAuth2 and save all is good and I can refresh from files saved in OneDrive
    When I come back another day my manual refresh fails via PBI webpage . I go to check the problem and it is credentials required
    When I go to the box it’s defaulted to anonymous. I wasn’t joking when I said I cannot seem to make them stick
    It won’t be long before I hand over my application to a user and I know its going to cause no end of confusion if this issue persist
    Am I missing something?

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      Hmm, not as far as I know. As I said, I’ve seen it happen once or twice but not every day.

    2. It’s my understanding that there’s effectively a display bug where updating your credentials doesn’t remember the type of credentials you used the previous time and so it defaults to the first choice in the list. This is unrelated to your token becoming invalid and causing your refresh to fail.

      As for the invalid AAD token, I know of at least two “non-bug” reasons why this can happen: there can be a maximum lifespan for the token (which may depend on your particular AAD tenant) and changing the password will probably invalidate the token. Then of course you may be the victim of a bug of some kind :/.

      1. Thanks Curt!
        I think on this occasion I’ll hold my hand up and say it was because one of my data files was missing from the OneDrive folder
        I however seen this type of issue before and the token thing has definitely got me thinking. Here’s a question, if someone else performs an on demand refresh from the website will it fail because they also need to enter their credentials? Will they be saved and stored unless their token is invalidated in some way?
        I have permissed a number of users to access my group workspace where the data files are saved. Ideally a master set of credentials would apply being me as the group workspace owner/administrator

    3. Hi,

      We used this post’s approach, but we’re facing the same issue as Anthony, and it’s becoming more than just an annoyance, it has upgraded to the serious problem category. We update the credentials, they stick for two or three days, and then the scheduled refresh just fails until we go there and update them again.

      We actually just opened an issue, because it’s becoming a serious problem for our customers:
      https://ideas.powerbi.com/forums/360879-issues/suggestions/16301404-scheduled-refresh-problem-credentials-provided-f

      If you still have this problem, please vote, and share it around. If not, I would really appreciate to learn of a solution for this.

      Tks

  3. Hi Joanna, I’m relieved it’s not only me, I thought I was losing my mind
    Agreed, this needs to be addressed as priority
    The nomenclature for the refresh types also needs review because it is confusing
    I think refreshes should be of 2 types only, on demand or scheduled (one drive refreshes should fall into the latter category)
    For now however I’d settle for my credentials sticking and the hourly OneDrive refresh working reliably
    I’m actively championing usage of these technologies so don’t want to be caught with my pants down when something doesn’t do what it says on the tin

  4. Thank you Chris for your valuable insights about loading multiple workbooks into 1 query. How about if workbooks are password protected? There is error Format not recognized

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      I don’t think you are able to extract data from password protected workbooks, unfortunately…

  5. Thank you for this, it has gotten me further than anything else. However, I have a question where I’m stumped.
    in the original files that I’m combining (and consequently how each file comes into the folder and will be added on refresh) there are 4 rows at the top that I don’t need then the 5th row is the column headers.
    is there a step in between these 2 steps you have to add that specific clean up?
    “Next, right-click on the Data column and select Remove Other Columns, then finally click OK on the Expand icon again:”
    and (clean up here?)
    “You’ll now have a table containing all of the data from the tables called Table1 in each workbook in the folder:”

    how do I do this after if not between those steps? If I say remove top 4 rows and promote headers, it only does that for the combined report not each sheet.

    thank you

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      Hi Britni, my example is a bit simpler than yours – my Excel tables don’t need any extra cleanup. In your case it sounds like you need to use a function to remove the first four rows, but unfortunately that will also cause problems with scheduled refresh in Power BI… see https://blog.crossjoin.co.uk/2016/05/15/creating-m-functions-from-parameterised-queries-in-power-bi/

  6. Thanks for this, Chris. I struggled a few hours and searched the doc with no luck, till I found your post. This has occured me other times so confirms the hard work you has its benefits for us, the power bi community.

    Anyway we are now facing an error with this solution, as the Oauth2 combo in the cloud conecction manager/updated is not showing for the SharepointListOnPremise (it shows some strange “undefined” value. the issue has been escalated and we are supossely wating any answer.

    If you don’t find this bug at the moment, mi suspicions are that there is a bug in that Oauth2 functionality only in Western Europe o Europe globally. Any workaround or clue you could throw would be appreciated. The bug is now the first in the Issues list at the community (http://community.powerbi.com/t5/Issues/idb-p/Issues and direct link: http://community.powerbi.com/t5/Issues/Invalid-credentials-on-Sharepoint-folder/idi-p/93059).

    Thanks!
    Regards

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      Sounds like a bug in the latest release…?

      1. Yes, it is surely a bug. The annoying thing is that the MSFT team is not giving any response on it yet, nor ETA, and many clients are suffering it in the delivered solutions.

  7. Does this still work with Power BI Desktop Feb 2017 edition? I followed your example and get an error 500. Can you please elaborate on the URL chunk(s) that should be used after /personal/?

    Also, can this pick up new xlsx (or csv for that matter) added to the watched folder, or only updates to the files that were there when the query was initially set in Power BI Desktop?

    Thanks for any help. It’s frustrating to jump through hoops while Onedrive is listed as a data source in Power BI Service.

  8. Hi, I tried to do it and I get : Unable to connect

    Details: “Microsoft.Mashup.Engine1.Library.Resources.httpResource:
    Request Failed

    Odata Version: 3 and 4, Error: The remote server returned an error: 404 Not Found”

    any idea how to deal with this?

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      Do you get this error in Power BI Desktop or when you publish to PowerBI.com?

  9. Chris,

    Great blog on combining excel files. I have created a number of steps for excel files with unstructured data – report headers contain date and category, columns containing more than one set of row data, etc.
    Is it possible to apply these steps to each file, and then combine them? My goal is for a non-Power BI user to add files to a folder to pull in automatically without need for cleaning data in the excel file beforehand.

    Thanks,

    Scott

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      Yes, it is possible to do this. However this blog post is now obsolete: the new From Folder functionality in the latest releases of Power BI/Excel will allow you to do what you want.

  10. HI, great post!
    I have a question: do you know how many files you can combine at max?
    I’m pulling data from a sharepoint library which is likely to contain hundreds of excel files so I was wondering how many files this procedure can handle before breaking up.

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      Hi Elis, there is no limit – but in fact you don’t need to use the code from this post any more because Power BI now has native functionality that allows you to combine data from multiple Excel files: https://blog.crossjoin.co.uk/2016/09/19/loading-data-from-multiple-excel-workbooks-into-power-bi-and-making-sure-data-refresh-works-after-publishing/

Leave a ReplyCancel reply