Web.Contents(), Caching And The ExcludedFromCacheKey Option In Power BI And Power Query

When you’re using the Web.Contents() M function to call a web service from Power Query or Power BI, you don’t necessarily get one HTTP request each time you call the function: some caching takes place, so that if you make the same request multiple times your query won’t waste time asking for the same data over and over. In this post I’m going to share the results of some tests I made to show how caching works with Web.Contents() and what factors influence it.

For my tests I built a simple web service in Microsoft Flow, similar to the one I blogged about here, that accepts a HTTP POST request and calls a stored procedure in an Azure SQL Database. The stored procedure then updates a table in the database and this in turn allows me to count the number of times the web service is called. Finally, the web service returns the value 0 if the stored procedure has executed successfully.

This web service can then be called from either Power Query or Power BI using the Web.Contents() function, something like this (because the URL for the web service is very long I stored it in a parameter called WebServiceURL):

[sourcecode language=”text” padlinenumbers=”true”]
let
Source = Web.Contents(
WebServiceURL,
[Content=Text.ToBinary("Hello")]
),
#"Imported JSON" = Json.Document(Source,1252)
in
#"Imported JSON"
[/sourcecode]

The output of the query when run in Power Query and loaded to an Excel table is this:

image

The first important thing to point out is that the above query, when refreshed in the latest versions of Power Query (I’m running Excel 2016 build 7571.2109) and Power BI (build 2.41.4581.361- November 2016 release), results in a single call to the web service. It might seem like I’m stating the obvious but in the past I’ve seen plenty of cases where a data source has been queried multiple times by Power Query/Power BI even if I was only expecting it to be queried once.

Now, let’s look at a query that calls this web service several times. Here’s the query above converted to a function called fnCallWebService:

[sourcecode language=”text”]
() =>
let
Source = Web.Contents(
WebServiceURL,
[Content=Text.ToBinary("Hello")]),
#"Imported JSON" = Json.Document(Source,1252)
in
#"Imported JSON"
[/sourcecode]

Here’s a query that calls this function once for each row of the following table:

image

[sourcecode language=”text”]
let
Source = Excel.CurrentWorkbook(){[Name="MyTable"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(
Source,
{{"Row", Int64.Type}}),
#"Invoked Custom Function" = Table.AddColumn(
#"Changed Type",
"fnCallWebService",
each fnCallWebService())
in
#"Invoked Custom Function"
[/sourcecode]

In the query above I used the Invoke Custom Function button to call the function for each row in the source table and put the value returned by the function in a new column. The output is this:

image

Even though the function is called four times, once for each row in the original table, that does not mean that the web service is called four times – it isn’t, it’s only called once. In this case Power BI/Power Query knows that each of the four calls to the function is making an identical request to the web service and so it only goes to the web service once, and thereafter uses a cached result the other three times.

One way to stop this caching from taking place is to add an HTTP header to the request to the web service and pass a different value to that header for each call. Here’s another version of my function, now called fnCallWebServiceWithHeaders, which this time takes a number as a parameter and then passes that number to the web service via a header called MyHeader:

[sourcecode language=”text” highlight=”5″]
(RowNum as number) => let
Source = Web.Contents(
WebServiceURL,
[Content=Text.ToBinary("Hello"),
Headers=[MyHeader=Text.From(RowNum)]]),
#"Imported JSON" = Json.Document(Source,1252)
in
#"Imported JSON"
[/sourcecode]

Now if I call this function for each row of the table, and for each call pass the value in the [Row] column through to the function like so:

image

[sourcecode language=”text”]
let
Source = Excel.CurrentWorkbook(){[Name="MyTable"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(
Source,
{{"Row", Int64.Type}}),
#"Invoked Custom Function" = Table.AddColumn(
#"Changed Type",
"fnCallWebServiceWithHeaders",
each fnCallWebServiceWithHeaders([Row]))
in
#"Invoked Custom Function"
[/sourcecode]

…the web service gets hit four times. The presence of a different value for the MyHeader header in each request is enough to prevent any caching from taking place.

It is possible, however, to get Power BI/Power Query to ignore one or more headers when working out whether caching should take place using the ExcludedFromCacheKey option in Web.Contents(). Here’s one more version of my function, now called fnCallWebServiceWithHeadersExlCache, which uses this option:

[sourcecode language=”text” highlight=”7″]
(RowNum as number) =>
let
Source = Web.Contents(
WebServiceURL,
[Content=Text.ToBinary("Hello"),
Headers=[MyHeader=Text.From(RowNum)],
ExcludedFromCacheKey={"MyHeader"}]),
#"Imported JSON" = Json.Document(Source,1252)
in
#"Imported JSON"
[/sourcecode]

The ExcludedFromCacheKey option takes a list of text values which represent the names of headers that are to be ignored when considering which requests can be cached. In the example above my list contains just the one header, MyHeader, and when this is used in a query like so:

[sourcecode language=”text”]
let
Source = Excel.CurrentWorkbook(){[Name="MyTable"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(
Source,
{{"Row", Int64.Type}}),
#"Invoked Custom Function" = Table.AddColumn(
#"Changed Type",
"fnCallWebServiceWithHeaders",
each fnCallWebServiceWithHeadersExlCache([Row]))
in
#"Invoked Custom Function"
[/sourcecode]

…even though the function is called four times, once for each row in the table, and even though each time Web.Contents() is called it is with a different value passed through to the MyHeader header, only one request is made to the web service and the three subsequent requests are answered from cache again.

In summary, if you’re calling a web service multiple times in a query and especially if you’re using the Headers option in Web.Contents(), this is important stuff to understand because caching can make a big difference to the performance of your queries.

4 thoughts on “Web.Contents(), Caching And The ExcludedFromCacheKey Option In Power BI And Power Query

  1. If you want Web.Contents to ignore the cache and force a new request, you can also set the IsRetry option to true. Semantically, this means that the previous request produced something invalid — an error, maybe, or a message that says “still processing, try again” and so the new request should ignore the result of the previous one.

  2. Hi Chris,

    I have a query that loops over a URL that is parameterised as ?currentpage=1 using List.Generate and Web.Contents. When List.Generate finds a page that returns no data, it stops at that point. The function thus returns all data from each page with data – but when it meets a blank page, it stops looking for more data.

    This has been working successfully for many months – freshing automatically on the Power BI service overnight. But recently it has stopped refreshing properly online and in desktop. It seems like now, Web.Contents returns page 1, no matter what parameter is used … I’ve tried the Power BI Community pages and Microsoft Support. I’ve also tried various tricks like IsRetry = true, Headers = [clearCahceTry = Text.From(Number.Random())], and Binary.Buffer(Web.Contents…)

    I am struggling to work out what else to try, and am wondering if you have any ideas?

Leave a Reply