Excel Dynamic Arrays And FilterXML

I’ll admit that I’m a bit less excited about Excel dynamic arrays than I was when I last blogged about them. Don’t get me wrong: from a pure Excel point-of-view they are still very cool, but I’ve since found out that the CubeValue function can’t be used with dynamic arrays which prevents me from doing all the really fun SSAS/Power BI/Power Pivot things I wanted to do with cube formulas.

It’s not all doom and gloom though. Several years ago I blogged about the then-new WebService and FilterXML functions (here and here). I very quickly found that the WebService function was very limited indeed and that Power Query did everything it did but better; on the other hand I felt FilterXML I had some unexplored potential, although I never got round to doing any exploring. Fast forward to last year and I saw that FilterXML was one of a number of existing functions that are affected by the new dynamic array behaviour, a change that makes it even more flexible.

Consider the following, publicly-available XML file:

http://www.hmrc.gov.uk/softwaredevelopers/rates/exrates-monthly-0719.xml

It’s a list of exchange rates published by the UK tax authorities and the contents look like this:

image

Just the kind of data you want to bring into Excel. The WebService function doesn’t work with this URL for some reason but it’s very easy to bring this data into an Excel table using Power Query with a few clicks using the Web data source:

image

If you prefer to work with dynamic arrays though (and I’m sure there are lots of reasons why that might be true), you can do that with a combination of Power Query and FilterXML.

The first thing to do is to use Power Query to load the entire XML document into a single cell in a worksheet. Here’s a query to do this:

let
    Source = 
    Text.FromBinary(
        Web.Contents(
            "http://www.hmrc.gov.uk/softwaredevelopers" &
            "/rates/exrates-monthly-0719.xml"
        )
    ),
    #"Converted to Table" = 
    #table(
        1, 
        {{Source}}
    ),
    #"Cleaned Text" = 
    Table.TransformColumns(
        #"Converted to Table",
        {{"Column1", Text.Clean, type text}}
    ),
    #"Replaced Value" = 
    Table.ReplaceValue(
        #"Cleaned Text",
        "> <",
        "><",
        Replacer.ReplaceText,{"Column1"}
    ),
    #"Replaced Value1" = 
    Table.ReplaceValue(
        #"Replaced Value",
        ">  <",
        "><",
        Replacer.ReplaceText,{"Column1"}
    ),
    #"Replaced Value2" = 
    Table.ReplaceValue(
        #"Replaced Value1",
        ">   <",
        "><",
        Replacer.ReplaceText,
        {"Column1"}
    ),
    #"Replaced Value3" = 
    Table.ReplaceValue(
        #"Replaced Value2",
        ">    <",
        "><",
        Replacer.ReplaceText,{"Column1"}
    )
in
    #"Replaced Value3"

One interesting point to make here: the FilterXML function does not like spaces between closing and opening angle brackets in XML (maybe this is why WebService errors too?) so I’m removing all occurrences of this, as well as removing any unprintable characters. Here’s the output, a table with one column and one row where the only cell contains the full XML:

image

You can then use an Excel formula like this to run an XPath query against this XML document:

=FILTERXML($A$2, "/exchangeRateMonthList/exchangeRate/countryName")

to get a list of all the country names spilling out to as many rows in the worksheet as necessary:

3d266a16-fa27-4927-9f79-558be7891293

[If you want a comparison with how FilterXML used to work in all its CTRL+SHIFT+ENTER glory see the “Scraping a whole XML document” section here; if you want to learn XPath, the query language used by FilterXML, there is a good tutorial here]

I’m not an XPath expert, or even an Excel expert, so I’ll finish here but hopefully this will prove useful to someone. You can download an Excel workbook containing the demos from this post here – note that neither Power Query nor dynamic arrays work in Excel Online yet, so don’t look at the workbook in the browser.

Power BI Data Privacy Levels And Cloud /Web-Based Data Sources Or Dataflows

UPDATE: it is now possible to set privacy levels for cloud data sources in the Power BI portal. See https://powerbi.microsoft.com/en-us/blog/privacy-levels-for-cloud-data-sources/

I have already blogged in great detail many times about Power BI/Power Query data privacy settings (see this series for example) but there’s always something new to learn. Recently I was asked a question by Ian Eckert about how Power BI handles data privacy for cloud or web-based data sources after a dataset has been published, and it prompted yet more revelations…

Consider the following M query:

let
    Source = Xml.Tables(
        Web.Contents("https://blog.crossjoin.co.uk/feed/")
        ),
    channel = Source{0}[channel],
    language = channel{0}[language],
    out = Json.Document(
        Web.Contents(
            "https://data.gov.uk/api",
            [
                RelativePath="3/action/package_search",
                Query=[q=language]
            ]
            )
            ),
    result = out[result],
    results = result[results],
    #"Converted to Table" = Table.FromList(
        results,
        Splitter.SplitByNothing(),
        null,
        null,
        ExtraValues.Error
        ),
    Column1 = #"Converted to Table"{0}[Column1],
    #"Converted to Table1" = Record.ToTable(Column1)
in
    #"Converted to Table1"

It doesn’t do anything particularly interesting, but it does take data from one web-based data source (the RSS feed for this blog) and sends it to another (the UK government’s open data metadata search web service). As a result, in Power BI Desktop, if you set the data privacy settings for both data sources to Public then the query runs, but if you set the data privacy settings for both data sources to Private:

image

image

…As expected, you get the following error:

image

Formula.Firewall: Query ‘Test’ (step ‘Converted to Table1’) is accessing data sources that have privacy levels which cannot be used together. Please rebuild this data combination.

Now the strange thing is that, when you publish the dataset that contains this M query, refresh always works. Why? What’s more, other datasets that do something similar will always fail when refreshed.

It turns out that when you publish a dataset that uses cloud or web-based data sources like the two used here, the Power BI service does not use the data privacy settings you have set in Power BI Desktop but instead it automatically assigns data privacy levels as follows:

  • Data sources, like the ones used here, that use Anonymous authentication are automatically given the privacy level Public
  • All other data sources are given the privacy level Private.

Interestingly, Power BI dataflows also count as cloud-based data sources and because they do not use Anonymous authentication they default to Private too, so if you are combining data from a dataflow with another data source in your dataset then you need to be careful of this.

What’s more there is at the time of writing no way to change these data privacy levels in the Power BI web-based portal. Hopefully this will change soon.

There are some workarounds though!

First of all, you can force refresh to take place through a gateway. This might sound strange because in theory, if you’re only using cloud or web-based data sources, a gateway should not be necessary. However there are already similar scenarios where a gateway is needed, for example if you are scraping data from a web page you need to use a gateway, and if you are combining data from a cloud-based data source with an on-premises data source you also need to use a gateway. If you add your cloud/web-based data sources as data sources in your gateway (unfortunately it does not seem to be possible to add a dataflow as a data source in a gateway, though) you can set their data privacy levels in the Advanced Settings section in the Manage Gateways screen:

image

You will also need to set the “Use a data gateway” option to On in the Settings dialog for your dataset after it has been published:

image

The other workaround is to copy the M code for your query and paste it into a new blank M query in an entity in a dataflow, as Matthew Roche shows here. While it does not seem to be possible to set data privacy levels for individual data sources when creating an entity, it is possible to turn off data privacy checks for an entity completely. If you create a query that sends data from one data source to another (regardless, as far as I can see, of the authentication mechanism used), you will see the following message in the Power Query Online query editor:

image

The evaluation was cancelled because combining data from multiple sources may reveal data from one source to another. Click Continue if the possibility of revealing data is okay.

If you click Continue, data privacy checks are turned off and the query runs; you can also click the Options button on the ribbon and check the “Allow combining data from multiple sources” option:

image

If one of your data sources is already itself a dataflow you may need to do some editing of the M query to make things work, but as Matthew Roche shows here it is possible to have an entity in a dataflow refer to another entity without using a computed entity (which is a Premium-only feature).

[Thanks to Arthi Ramasubramanian Iyer from Microsoft for providing background information for this post]

Credentials, Data Privacy Settings And Data Sources In Power Query/Power BI

Recently I’ve been doing some more investigations into how data privacy settings work in Power BI. This is a subject I’ve blogged about in great detail already in a series of posts last year, but this functionality is so complex that there is always more to learn. I don’t have any profound new insights to offer; instead this blog post is a write up of a series of experiments whose results shed light onto how the process of setting data privacy levels works end-to-end.

Consider the following M query:

let
    Source = 
        Json.Document(
            Web.Contents(
                "https://data.gov.uk/api/3/action/package_search?q=cows"
                )
            ),
    result = Source[result],
    results = result[results],
    results1 = results{0},
    id = results1[id],
    output = 
        Json.Document(
            Web.Contents(
                "https://data.gov.uk/api/3/action/package_show?id=" & id
                )
            )
in
    output

The query does the following:

  • Runs a query against the UK government’s open data search API (the same API I use in this post on the RelativePath and Query options for the Web.Contents() function) to get a list of datasets related to the search term “cows” via the Package Search endpoint
  • Gets the first dataset returned by the search and finds its ID
  • Sends this ID to the Package Show endpoint in the same API to get the full JSON representation of this data set. Note that the entire URL is dynamically generated and that the Query option of Web.Contents() is not used here.

It’s a typical scenario where data privacy settings can cause problems: data from one data source, the package_search endpoint, is sent to another data source, the package_show endpoint. My series from last year on data privacy settings provides some useful background information on why this is such an important thing for the Power Query engine.

Assuming that you have never used this API before, when you try to run this query in the Power Query Editor in Power BI Desktop, you’ll see the following prompt to edit the credentials used:

image

Before you click the Edit Credentials button, there are two interesting things to point out. First, in the Query Dependencies view, you see this:

image

Notice that the Package Search endpoint is shown but not the Package Show endpoint.

Second, if you click the Data Source Settings button, you’ll see the following in the dialog that appears:

image

Not only does it only show the Package Search endpoint, there is a warning that says:

“Some data sources may not be listed because of hand-authored queries”

This refers to the output step in the query that calls the Package Show endpoint with the dynamically-generated url.

Closing this dialog and going back to the Query Editor, if you click the Edit Credentials button, you can set credentials for the data source (anonymous access is fine in this case). These credentials can be set at all levels in the path down to https://data.gov.uk/api/3/action/package_search.

image

Setting credentials at the level of https://data.gov.uk means you only get prompted once; however if you select https://data.gov.uk/api/3/action/package_search from the dropdown list and click Connect you will get prompted again to set credentials, this time with a dropdown that shows all paths down to package show:

image

Assuming you set credentials at the level of https://data.gov.uk and click Connect, then Fiddler shows that a call is made to https://data.gov.uk/api/3/action/package_search?q=cows, presumably to check whether the credentials entered actually work and you move back to the Query Editor.

image

Next, in the Query Editor, you see the data privacy settings prompt:

image

Clicking Continue brings up the data privacy levels dialog:

image

You have the choice to ignore privacy levels for this file, but of course you should always try to avoid doing that. You also have two dropdown boxes that both show https://data.gov.uk on the left-hand side and another two dropdown boxes next to them, although only the top one of these is enabled.

In the first column of dropdown boxes, in the first dropdown, you can see all points in the path from https://data.gov.uk to https://data.gov.uk/api/3/action/package_search:

image

In the dropdown box immediately underneath you can see for the first time all points in the path from https://data.gov.uk to https://data.gov.uk/api/3/action/package_show:

image

If you select https://data.gov.uk in the top-left dropdown only the top-right dropdown is enabled, and in the top-right dropdown you can set the data privacy levels Public, Organizational and Private.

image

The meanings of these three levels are described in my earlier series and here, and I won’t go into detail about what they mean in this post. The bottom-right dropdown is disabled because if you set privacy levels for https://data.gov.uk then all urls that start with this path inherit this setting. This is similar to what happens with the None data privacy setting that I describe here, I think.

Setting the data privacy level to Public on https://data.gov.uk in the top-right dropdown means the query runs successfully:

image

The expected activity is shown in Fiddler:

image

And at last the Package Show endpoint is shown in the Query Dependencies view:

image

The Data Source Settings dialog shows the same as it does above in the “Data sources in current file” tab, including the warning about hand-authored queries, but on the “Global permissions” tab there is now an entry for https://data.gov.uk :

image

Although you only set a privacy level for https://data.gov.uk earlier, it’s interesting to note that the entry for https://data.gov.uk/api/3/action/package_search has a privacy level set explicitly to Public and not to None:

image

Stepping back a few steps to the Privacy Levels dialog, if you set a privacy level of Private for https://data.gov.uk like so:

image

…then the query fails with the error “Formula.Firewall: Query ‘Query1’ (step ‘output’) is accessing data sources that have privacy levels which cannot be used together. Please rebuild this data combination.”:

image

From my point of view, this is the first really interesting finding: the two endpoints, https://data.gov.uk/api/3/action/package_search and https://data.gov.uk/api/3/action/package_show, are considered as separate data sources (which tallies with what is shown in the Query Dependencies view) even though they have both inherited their data privacy level setting from https://data.gov.uk. Since they are both Private then data cannot be sent from one source to the other, hence the error.

The second interesting finding becomes apparent if you follow the steps above with a slightly different version of the original query that uses the Query option in the call to the Package Show endpoint:

let
    Source = 
        Json.Document(
            Web.Contents(
                "https://data.gov.uk/api/3/action/package_search?q=cows"
                )
            ),
    result = Source[result],
    results = result[results],
    results1 = results{0},
    id = results1[id],
    output = 
        Json.Document(
            Web.Contents(
                "https://data.gov.uk/api/3/action/package_show",
                [Query=[#"id"=id]]
                )
            )
in
    output

In this case when you look in the Data Source Settings dialog you now see both endpoints listed and you no longer see the “hand-authored queries” warning:

image

It looks like whatever method it is that the Power Query engine searches for data sources inside a query is confused by dynamically generated urls – which might also explain why data sources that use dynamic urls can’t be refreshed after publishing.

Listing Windows Language Code Identifiers And Their Associated Date And Number Formats With M In Power BI/Power Query

In a comment on my blog post about international date and number formats and changing data types with the “using locale” option in Power Query/Power BI, Jan Karel Pieterse asked if there was any way to get a list of the thousand and decimal separators used for number formatting by each language and region. Since this is exactly the kind of geeky question that fascinates me I decided to write an M query to answer it and – for bonus points – to find the default date format used too.

To start off, I found a table of all Windows Language Code Identifiers on this page:

https://msdn.microsoft.com/en-us/library/cc233982.aspx

Of course this can be loaded into Power Query easily using the “From Web” source. After that it’s easy to add a column to the table that takes a sample date (March 22nd 2018) and number (one hundred thousand and one tenth) and converts it to text using the language code identifier on each row: the Text.From() function does this for dates, and for numbers you have to use Number.ToText() if you want to get thousand separators and decimal separators. There are a few minor problems to deal with, such as the fact that Power Query doesn’t know what to do with dates for the “Congo Swahili” language code identifier and some rows have multiple language tags, but nothing serious.

Here’s the full code:

let
    //Sample dates and numbers to show
    SampleDate = #date(2018,3,22),
    SampleNumber = 100000+(1/10),
    //MS web page with list of language tags
    LocaleWebPage =
        Web.Page(
        Web.Contents(
         "https://msdn.microsoft.com/en-us/library/cc233982.aspx"
         )
         ),
    LocaleList = LocaleWebPage{1}[Data],
    RemoveColumns =
        Table.SelectColumns(
            LocaleList,
            {"Language", "Location (or type)", "Language tag"}
            ),
    SplitColumn =
        Table.SplitColumn(
            RemoveColumns,
            "Language tag",
            Splitter.SplitTextByAnyDelimiter(
                {",","or"},
                QuoteStyle.Csv
                ),
                {"Language tag"}
                ),
    //Create example columns
    DateExample =
        Table.AddColumn(
            SplitColumn,
            "Date",
            each Text.From(SampleDate, [Language tag])
            , Text.Type),
    NumberExample =
        Table.AddColumn(
            DateExample,
            "Number",
            each Number.ToText(SampleNumber,"N", [Language tag])
            , Text.Type),
    //Remove any rows containing errors
    RemoveErrors = Table.RemoveRowsWithErrors(NumberExample)
in
    RemoveErrors

Here’s some of the output:

image

So, if you’ve ever wondered how the Cornish speakers of south-west England like to format their dates or whether the Oromo speakers of Ethiopia use a comma or a full stop as a decimal separator, wonder no more. And if you are not interested in M at all and just want to download an Excel workbook with a list of all LCIDs and how numbers and dates are formatted for them, you can do so here.

Implementing Basic Query Folding On A Web Service In Power Query/M And Power BI

The more advanced Power Query/M developers among you will know about query folding, the way that the Power Query engine pushes as much of the heavy-lifting of a query back to a data source.  You may also know that it’s possible to implement query folding yourself inside a custom data connector, and there’s a very detailed (but perhaps a little intimidating) tutorial on how to do it here. I’ve been doing a lot of custom data extension development recently and have been learning how to implement query folding myself so I thought another, simple, demonstration of how it works with web services might be useful for those of you in the same situation.

For this example I’m going to use the web service I used in my blog post on Web.Contents() and the Query and RelativePath options, a blog post you should reread before you carry on here because it provides some useful background. It’s the metadata API for the UK government’s open data website and allows you to search for open data datasets. It doesn’t require any authentication so you’ll be able to run all the queries in this post yourself. Note that all the example M code in this post works in Power BI Desktop and does not need to be used in a custom data connector.

Consider the following M function, which I’m going to call SearchData:

(query as text, optional rowlimit as number) as table =>
let
    QueryRecord = 
        [q=query],
    AddRowLimit = 
        if 
            rowlimit=null 
        then 
            QueryRecord 
        else 
            Record.AddField(
                QueryRecord,
                "rows",
                Text.From(rowlimit)
                ),
    CallWebService = 
        Web.Contents(
        "https://data.gov.uk/api", 
            [
                RelativePath="3/action/package_search", 
                Query= AddRowLimit
            ]
        ),
    Source = 
        Json.Document(CallWebService)[result][results],
    ToTable = 
        Table.FromList(
            Source, 
            Splitter.SplitByNothing(), 
            null, 
            null, 
            ExtraValues.Error
            ),
    Expand = 
        Table.ExpandRecordColumn(
            ToTable, 
            "Column1", 
            {"title"},
            {"title"}
            ),
    ChangeType = 
        Table.TransformColumnTypes(
            Expand,
            {{"title", type text}}
            )
in
    ChangeType

 

The function takes two parameters:

  • query, the search term to use
  • rowlimit, an optional parameter that limits the number of rows returned by the search

It then calls the API, gets the search results in JSON format and converts it to a table with just one column containing the title of the datasets returned.

Calling the function with the search term “data” and now row limit, like so:

SearchData("data")

…returns a table of ten rows, the default maximum number of rows returned:

image

[Note to self: I really need to check out the “Treasure data” dataset]

Using Fiddler to inspect the calls make from the Power Query engine back to the web service (I describe how to do this here) when the query is loaded into a Power BI dataset reveals the following:

image

Everything is pretty much as you would expect: every time this query is used, no matter how it is used, the same request is used to get data.

Query folding is implemented using the Table.View() M function, and here’s an example of how it can be used with the table above (let’s say this new query is called SearchForDataWithFolding):

Table.View(
        null, 
        [
        GetType = () => 
                type table [title = Text.Type],
        GetRows = () => 
                SearchData("Data"),
        OnTake = (count as number) => 
                SearchData("Data", count)
        ]
        )

In the second parameter of Table.View() in this example there are three records in the handler field:

  • GetType, which is called when the Power Query engine needs to know about the data types of the columns of the table returned by this expression. In this case it’s a table with one text column.
  • GetRows, which is called when the Power Query engine wants all the rows from the table (for example when it’s loading data into the dataset)
  • OnTake, which is called when the Power Query engine only wants the top n rows from the table; in this case it provides the top n through the count parameter, and I’m passing that back to my SearchData function via the rowlimit parameter.

Other handlers can be implemented too, but for this web service it only really makes sense to implement OnTake because that’s the only operation that can be folded back.

Here’s what it returns in the Query Editor:

image

The first thing to point out is that, in the Query Editor, it returns more than ten rows – it returns one thousand rows. Fiddler confirms this:

image

As the official documentation states at the bottom of this page:

The Power Query experience will always perform an OnTake of 1000 rows when displaying previews in the navigator and query editor, so your users might see significant performance improvements when working with larger data sets.

Similarly, using the Table.FirstN() function on the rows of this table, as follows:

Table.FirstN(SearchForDataWithFolding,3)

Shows a row limit of three passed back to the web service:

image

There’s something else interesting to note when the query is loaded into the dataset. Fiddler now shows two calls to the web service:

image

Two calls to the web service are being made: the first asks for zero rows, the second asks for all the data with no row limit. In this case the following change to SearchForDataWithFolding stops the first call happening and results in only one call to the web service:

Table.View(
        null, 
        [
        GetType = () => 
                type table [title = Text.Type],
        GetRows = () => 
                SearchData("Data"),
        OnTake = (count as number) => 
                if count=0 
                then 
                #table(type table [title = Text.Type], {}) 
                else 
                SearchData("Data", count)
        ]
        )

But why is the Power Query engine making this call? Why didn’t it make it on the other query? Did it make the same call twice in the other query but did it cache the result of the first call and then reuse it? Is it trying to find out what columns this query returns? Hmm, a subject for future research I think.

You can download the sample pbix file for this post here.

Using OpenApi.Document() To Create A Power BI Custom Connector For The Power BI REST API

The idea of creating a Power BI custom connector for the Power BI REST API is not a new one: Miguel Escobar wrote one earlier this year (see here for the source code and documentation). However the Power BI REST API has a lot of endpoints so writing code to support them all would be extremely time-consuming, and there is another option: using the new OpenApi.Document() M function – only available for custom connectors, and not in Power BI Desktop or Excel – to read the API definition from an Open API (aka Swagger) definition file.

Full documentation and samples for using OpenApi.Document are available here, and an Open API definition file for the Power BI REST API is available here. The Power BI REST API uses Azure Active Directory authentication and handling AAD authentication in custom connectors is well documented here; Miguel’s sample shows how to adapt this code for Power BI.

After that, just download the Power BI REST API definition, save it as a .json file and add it to your Visual Studio custom connector project and set the Build Action property of the file to Compile:

image

You can then use the Extension.Contents() M function to read the contents of the file in code and pass that to OpenApi.Document(). The last problem to solve is that OpenApi.Document() only supports basic and anonymous authentication by default, so support for OAuth2 has to be handled using the ManualCredentials option. Here’s the M function definition from my demo project showing how to do this:

[DataSource.Kind="OAuth2Demo", Publish="OAuth2Demo.Publish"]
shared OAuth2.OpenAPIDemo = () =>
    let
        OAuthCredential = Extension.CurrentCredential(),
        OAuthToken = OAuthCredential[access_token],        
        SecurityHeaders = [ Authorization = "Bearer " & OAuthToken],
        PBISwagger = Extension.Contents("PBISwagger.json"),
        CallWebService = 
 	 OpenApi.Document(
	  PBISwagger,
	  [ Headers = SecurityHeaders, ManualCredentials = true ]
	 )
    in
        CallWebService;

Once you build your custom connector, you see all of the endpoints (or at least those that support GET and POST requests) exposed as functions in the Navigation table:

image

It looks super-easy to build a custom connector this way, but is it a actually good idea? I can see a few pros and cons:

  • As you’ll find if you try this, not everything ‘just works’ – there are a few functions that return errors.
  • In this case I embedded the Open API definition file in the project but it’s also possible to download it dynamically using Web.Contents(). This means that whenever the published Open API definition file is updated the custom connector also updates automatically. Are you sure the published Open API definition file will get updated when the API changes, though? What if it doesn’t, or there is an error in it?
  • Connecting to an API is all very well but it’s very important that a custom connector delivers data in a format that is modelled appropriately for Power BI, and very few APIs do this on their own. Pretty much every custom connector I have built has a lot of code in it to transform the data output by the API into something like a star schema, and the need to do this cancels out all the magic automatic stuff that OpenApi.Document() does for you.

Loading Data From Multiple Excel Workbooks Into Power BI–And Making Sure Data Refresh Works After Publishing

I can hear you yawning already – yet another blog post on getting data from multiple Excel workbooks in Power Query and Power BI. Just about everyone who has ever written a blog post on Power BI has written about this subject, including me. However there’s a twist this time: what if your Excel workbooks are stored in SharePoint or OneDrive For Business? If they are, then your dataset may not refresh successfully after you have published unless you load your data in a particular way.

Some background first. A few weeks ago I was contacted by a reader who had seen my post on data refresh errors and the Web.Contents() function and was experiencing the same issue when using Web.Contents() to get data from multiple Excel workbooks stored in SharePoint. Up until recently the Web.Contents() function – which is used by the From Web option in Power BI’s Get Data experience – was the only way to get data reliably from single Excel files stored in SharePoint or OneDrive For Business. However the limitations on Web.Contents(), M functions and data refresh described in my blog post meant that it wasn’t possible to use Web.Contents() to get data from multiple Excel files stored in SharePoint or OneDrive For Business.

The April 2016 Power BI Desktop update introduced a new way of getting data from Excel files stored in SharePoint: the SharePoint Files data source, based on the SharePoint.Files() M function. Both Mike Carlo and Ken Puls have already blogged about this in depth and so I won’t repeat what they’ve written; their posts have a lot of good information on how to construct the URLs to find your files in SharePoint. How do you use it to combine data from multiple Excel workbooks though?

Imagine you have four identically-structure Excel workbooks stored in a folder in OneDrive For Business:

image

Each one has a table called Table1 with some sales data in it:

image

In Power BI Desktop, create a new query and click the Get Data button. In the Get Data dialog, go to Files and click on SharePoint Folder:

image

Enter the URL for your OneDrive For Business site:

image

You’ll then see all the files in all your folders in OneDrive For Business:

image

Filter the folders in the Folder Path column so you only see the files in the folder containing your Excel workbooks:

image

Right-click on the Content column and select Remove Other Columns to get rid of all but the Content column. Then click the Add Custom Column button and add the following expression:

Excel.Workbook([Content])

image

This tells Power BI to treat each file in this folder as an Excel Workbook. Click OK, and then right-click on the Content column and select Remove (you won’t need this any more). Next, click on the Expand icon in the top right-hand corner of the Custom column and click OK on the flyout:

image

image

This will give you a table listing all of the contents of each workbook:

image

Filter this table so you only have the four tables from each workbook:

Next, right-click on the Data column and select Remove Other Columns, then finally click OK on the Expand icon again:

image

You’ll now have a table containing all of the data from the tables called Table1 in each workbook in the folder:

image

Don’t forget to set appropriate data types for each column (columns will have the data type Any by default, which will be treated as text later on)! You can now build your report and publish it:

image

In order for refresh to work, you’ll need to configure the credentials used by the Power BI service to connect to your data. In the browser, click on the ellipses for the Dataset for your report and select Schedule Refresh:

image

This will take you to the Datasets tab on the Settings page. You can schedule refresh here if you want, but the important thing is to click the Edit Credentials link:

image

If like me you have an Office 365 subscription and use SharePoint Online, then the dialog choose oAuth2 in the Authentication Method dropdown box and click Sign In:

image

You’ll see the Windows Organizational Account sign-in page appear briefly while you are signed in but you shouldn’t need to do anything. And that’s it!

You can now edit the data in any of your Excel workbooks and, once the dataset has refreshed, those changes will show up in the report. If you want to do a manual refresh of the data, clicking the Refresh button at the top of the report won’t do any good; you have to click on the Refresh Now option on the dataset (seen in the screenshot above, just below the Schedule Refresh option).

%d bloggers like this: