Web.Contents(), M Functions And Dataset Refresh Errors In Power BI

One slightly frustrating feature of Power BI is that some of the cool stuff you can do in M code for loading data, and which works in Power BI Desktop (and in Power Query), causes errors when you try to refresh your dataset after it has been published to PowerBI.com. I recently learned some interesting tricks for working around these problems when you are using Web.Contents() and M custom functions, which I thought deserved a wider audience and which are the subject of this post; thanks are due to Curt Hagenlocher of Microsoft and Idan Cohen for sharing this information.

First of all, I recommend you read my previous post on using the RelativePath and Query options with Web.Contents() to get some background, not just on the M functionality I’ll be using but also on the web service I’ll be using in my examples.

Let’s look at an example of where the problem occurs. The following M query uses a function to call the UK government’s open data metadata search API multiple times and then return a result into a table:

let
    Terms = 
      #table(
       {"Term"},
       {{"apples"}, {"oranges"}, {"pears"}}
       ),
    SearchSuccessful = (Term) => 
    let
        Source = 
         Json.Document(
           Web.Contents(
             "https://data.gov.uk/api/3/action/package_search?q=" 
             & Term
            )
           ),
        Success = Source[success]
    in
        Success,
    Output = 
     Table.AddColumn(
       Terms, 
       "Search Successful", 
       each SearchSuccessful([Term])
      )
in
    Output

Here’s the output:

image

This is just a variation on the widely-used M pattern for using functions to iterate over and combine data from multiple data sources; Matt Masson has a good blog describing this pattern here. In this case I’m doing the following:

  • Defining a table using #table() with three rows containing three search terms.
  • Defining a function that calls the metadata API. It takes one parameter, a search term, and returns a value indicating whether the search was successful or not from the JSON document returned. What the API actually returns isn’t relevant here, though, just the fact that I’m calling it. Note the highlighted lines in the code above that show how I’m constructing the URL passed to Web.Contents() by simply concatenating the base URL with the string passed in via the custom function’s Term parameter.
  • Adding a custom column to the table returned by the first step, and calling the function defined in the second step using the search term given in each row.

This query refreshes with no problems in Power BI Desktop. However, when you publish a report that uses this code to PowerBI.com and try to refresh the dataset, you’ll see that refresh fails and returns a rather unhelpful error message:

Data source error Unable to refresh the model (id=1264553) because it references an unsupported data source.

image

image

The problem is that when a published dataset is refreshed, Power BI does some static analysis on the code to determine what the data sources for the dataset are and whether the supplied credentials are correct. Unfortunately in some cases, such as when the definition of a data source depends on the parameters from a custom M function, that static analysis fails and therefore the dataset does not refresh.

The good news is that when, as in this case, the data source is a call to Web.Contents() then Power BI only checks the base url passed into the first parameter during static analysis – and as my previous blog post shows, by using the RelativePath and Query options with Web.Contents() you can leave the value passed to the first parameter as a static string. Therefore, the following version of the query does refresh successfully in Power BI:

let
    Terms = 
      #table(
       {"Term"},
       {{"apples"}, {"oranges"}, {"pears"}}
       ),
    SearchSuccessful = (Term) => 
    let
        Source = 
         Json.Document(
           Web.Contents(
             "https://data.gov.uk/api/3/action/package_search", 
             [Query=[q=Term]]
            )
           ),
        Success = Source[success]
    in
        Success,
    Output = 
     Table.AddColumn(
       Terms, 
       "Search Successful", 
       each SearchSuccessful([Term])
      )
in
    Output

This technique will only work if the url passed to the first parameter of Web.Contents() is valid in itself, is accessible and does not return an error. But what if it isn’t? Luckily there’s another trick you can play: when you specify the Query option it can override parts of the url supplied in the first parameter. For example, take the following expression:

Web.Contents(
 "https://data.gov.uk/api/3/action/package_search?q=apples", 
 [Query=[q="oranges"]]
)

When static analysis is carried out before dataset refresh, the url

https://data.gov.uk/api/3/action/package_search?q=apples

..is evaluated. However when the dataset is actually refreshed, the search term in the Query option overrides the search term in the base url, so that the call to the web service that is actually made and whose data is used by the query is:

https://data.gov.uk/api/3/action/package_search?q=oranges

This means you can specify a base url that isn’t really just a base url just so that static analysis succeeds, and then use the Query option to construct the url you really want to use.

Of course this is all a bit of a hack and I’m sure, eventually, we’ll get to the point where any M code that works in Power BI Desktop and/or Power Query works in a published report. However it doesn’t sound as though this will be happening in the near future so it’s good to know how to work around this problem. I wonder whether there are other, similar tricks you can play with functions that access data sources apart from Web.Contents()? I need to do some testing…

26 thoughts on “Web.Contents(), M Functions And Dataset Refresh Errors In Power BI

  1. Really interesting post, as always Chris. I am actually struggling with apparently the same problem, this case with the AzureEnterprise.Contents() function. If you want more details in order to test it, just mail me. Regards!

  2. I recently stumbled across this problem when deploying a PowerQuery to our SharePoint site to run with Excel Services. I’m going to try this technique instead of passing a static url with the query embedded. It might work! Thanks for your posts!

  3. I stumbled upon this technique myself, the relativePath and Query arguments are well hidden in the documentation for sure.

    Iterating over many requests can help with paged API requests for large datasets, just remember power BI is asynchronous ask the data may not come in in order.

    Merging two data sources can still be a problem on the refresh. I get around this by having one data source pull I the original data then start a second table with:

    Let mergedTable = OriginalTable

    You can now merge as many tables of different API SOURCES together without the refresh complaining.

  4. Fun fact: I added RelativePath as an option to Web.Contents solely to provide a workaround for this problem.

  5. Hello Chris, thank you for this amazing post: it’s been very helpful because I am stuck in this precise scenario (with a slight twist).
    I am trying to parametrize the access query to files inside a specific OneDrive folder.

    I can get the list of all files inside my OneDrive easily with
    src = SharePoint.Files(my_organization_url, [ApiVersion = 15])
    and then get my folder
    filtered = Table.SelectRows(src, each ([Folder Path] = “my_folder_path”))
    I then select two, columns, “Folder Path” and “Name”
    select_cols = Table.SelectColumns(filtered,{“Folder Path”, “Name”})
    and merge them to get my pathname list
    file_path_names = Table.CombineColumns(select_cols,{“Folder Path”, “Name”},Combiner.CombineTextByDelimiter(“”, QuoteStyle.None),”filePathName”)

    At this point, I am ready to apply my function fxExtract to [filePathName]:
    let
    fxExtract =(filepathname as text) as table =>
    let
    data = Excel.Workbook(Web.Contents(filepathname), null, true)
    in
    data
    in
    fxExtract
    by adding a custom column like this:
    extract_data = Table.AddColumn(file_path_names, “wb_data”, each fxExtract([filePathName]))

    After reading your post everybody knows that this will work in Power BI Desktop but it will not support refresh in Power BI Service (I also verified it by myself).

    So i tried to follow your suggestion, decomposing the Web.Contents part but I encountered two blocking problems.

    FIRST PROBLEM – Base URL
    You said that, in order for it to succeed, the first part of the URL must be a valid URL.
    This means I cannot use any “folder” URLs (by omitting the file name in the URL), because then authentication will fail and this in turn will make Web.Contents fail as well.

    (P.S.: I have tried with several paths in my OneDrive and it seems Power BI would not let you connect to them. If you use Web as a source, it will only accepts full file path names., not folder path names).

    Do you have any idea, in light of this, on how to setup the base URL in this case?

    SECOND PROBLEM – Parametrization using Query option
    Even if I figured out the first problem, I would stay stuck on how to pass the file names to the base URL by using the Query option.
    Let’s consider your example for clarity’s sake:
               Web.Contents(
                 working_base_url,
                 [Query=[q=filename]]
                )
    This is obviously wrong in my case, because ‘q’ has been specified as a parameter, so the resulting URL will be:
    working_base_url/?q=filename
    while I am looking for the following form:
    working_base_url/filename

    I actually can’t find any documentation on how to use the optional Query record, but I have a feeling this must be possible to accomplish.
    I would try with [Query=filename] but I first need to solve the first problem.

    Sorry for the lengthy comment and thanks for reading, any help would be greatly appreciated.
    I believe this is an interesting technique that could be useful to many people.

    P.P.S.: I also started to approach the same problem through Microsoft Graph API calls, but I am not sure it’s the way to go because of the ApiKey. I think it might be more of a hassle than a real solution. Nonetheless I actually got to list the files in the folder, but I still need to get the binaries:
    https://graph.microsoft.com/beta/users(USER_ID)/drive/root/children(USERFOLDER_ID)

    • Hmm, this looks like a difficult problem. I have no idea what the answer to either of these questions could be – I would need to spend quite a lot of time testing to try to find out. When I have some free time I’ll try to take a look but it will be some time before I’ll be able to do this I’m afraid.

  6. Hi Chris,
    thanks for your reply.
    It is indeed a tricky scenario, and I posted it to you almost exclusively for informative reasons.
    I would dig deeper myself, but I cannot find any additional reference other than this:
    https://msdn.microsoft.com/en-us/library/mt260892.aspx

    In this MSDN reference there is no focus on parameters and I really don’t know what I am touching there.
    By the way, I tried with [Query=filename] and it doesn’t work (expected I would say).

    I would be more than happy If you can point me towards more thorough reference or documentation, and if there isn’t any I’ll write directly to Microsoft.

    Thanks for your time,
    if I make progress I’ll report back here, this technique could be quite usueful.

    • I can’t resist a challenge, and so even though there are thousands of more important things I should be doing I just spent several hours solving this problem 🙂 I will write up the solution in a day or two in a separate blog post.

      • Chris, this is amazing!
        Your reply really made my day, and I can’t wait to read your take on this issue.

        If this technique proves to be feasible, Power BI users can basically use an arbitrary OneDrive folder as a data source that supports scheduled refresh: this would simply awesome (M language is awesome!)

        Thanks again for the kind interest you showed for this issue.

        Andrea

      • Here are a few hints while you wait… Don’t try using Web.Content(), I don’t think there’s a way it can be made to work here. Instead, because SharePoint.Files() returns a link to the content of each file(in the Content column), you just need to create a function that takes a binary, turns it into an Excel workbook and extracts the table/range/sheet you need, then call that function in a Calculated column on the table returned by SharePoint.Files(), passing the Content column to your function, then finally expand out the resulting column to get at the data.

      • Hello Chris,
        following your hints I was able to develop the solution on my own with two Functions and a Query.
        The dataset now offers the option to schedule updates, this is absolutely fantastic.

        I wont spoil anything to your reader while we wait for the post, but I already noted down a couple of observations to do.
        I will wait for the post to appear and use that comment section.

        As always, I want you to know that your work is greatly appreciated.
        Thank you very much for the help!

  7. Chris,
    Thanks for this excellent post – I am trying it now to get round a similar problem in PowerBI.

    In my case I have defined a separate function which I call from another table.
    I have also been using one of your other posts on monitoring performance to see how long the API calls take to run as there is some concern over the impact of my API calls on the server.
    What I’m a bit puzzled by is why PowerBI seems to call the API twice when the function is called. I have included a fragment of the trace analysis below.
    I wasn’t sure if it was because of the way I was invoking the function or how Web.Content retrieves the data.

    I was wondering if you had ever encountered something similar?

    Many thanks again for your great post and apologies if the question is out of place.

    Date Command Request Duration
    2016-09-15T10:22:28.8108993 Engine/IO/Web/Request/GetResponse RequestUri:https://some url 00:00:16.4505802
    2016-09-15T10:22:14.8284810 Engine/IO/Web/Request/GetResponse RequestUri:https://some url 00:00:13.9823147

  8. Wondering how these can be leverage for function that dynamically iterate over the facebook post id. For e.g In the code below, how do we pass PostID with query option like you shown in your blog?

    Source = Json.Document(Web.Contents(“https://graph.facebook.com/v2.7/”
    & PostID
    &”/insights/post_engaged_users?access_token=”
    & AccessToken)),

    • Try as below :
      Var = AccessToken,
      Json.Document(Web.Contents(“https://graph.facebook.com/v2.7/”
      & PostID
      &”/insights/post_engaged_users,[Query=[access_token=Var]])

      • Thanks @Sachin, but it didn’t work. I think as Chris mentioned it tries to evaluate the URL as it is and it fails as it wont be the valid URL with the “& PostID” in it. Not sure though, but it didn’t work.

      • I am wondering if anyone managed to refresh the Facebook data on power bi service by any chance chance.

  9. Thank You Chris. I was puzzled by this error while working for a customer. The error message thrown by Power Bi online was not at all helpful. Thanks for posting this lovely tip.

  10. Hi guys, same issue here….

    I am pulling paginated rows from API (roughly 500 pages, 50 rows each) via Web.Contents(). The call also needs to contains access token.

    I wonder if someone has already figured out a workaround which could be implemented by a non-programmer?

    Basically, my steps are:
    1. I call API to see how many pages of data there are.
    2. I get the number.
    3. Based on the number, I create the exact number of the “paginated URLs”, I attach access token and I call all those URLs.
    4. I wait.
    5. Voila, my data is in Excel.

    This obviously works in Excel or PBI for Desktop but does not work in PBI Online. I thought I was gonna get heart attack. I spent so many hours on the report to learn that this kind of “data sourcing” is not supported in PBI Online.. Terrible dissapointment..

    Thx for any help!

    • It’s hard to say without knowing a lot more about your API, but there’s no reason why this should not be possible if you can use Web.Contents() in the way I describe in this post.

  11. Hey Chris,

    I have a table in Power BI that contains a single column with id’s I get from a previous api call. I’ve then added a custom column that makes a separate call for each row while appending the id to the end of the path. This worked in desktop, but gave me the head ache you’ve addressed in this post. So, now I have separated the id in my function from the static path to the RelativePath property in my Web.Contents call and though my dataset will now refresh, the credentials cannot be verified without an id at the end of the base URL. This is similar to your example for replacing a Query, but it’s my understanding that RelativePath always gets appended to the end. Is that true and do you know of a work around?

    Thanks

    • RelativePath always gets appended to the end, yes. It sounds like you have hit one of those scenarios where data refresh can’t be made to work, unfortunately – something I’ve run into rather too often myself.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s