Implementing Basic Query Folding On A Web Service In Power Query/M And Power BI

The more advanced Power Query/M developers among you will know about query folding, the way that the Power Query engine pushes as much of the heavy-lifting of a query back to a data source.  You may also know that it’s possible to implement query folding yourself inside a custom data connector, and there’s a very detailed (but perhaps a little intimidating) tutorial on how to do it here. I’ve been doing a lot of custom data extension development recently and have been learning how to implement query folding myself so I thought another, simple, demonstration of how it works with web services might be useful for those of you in the same situation.

For this example I’m going to use the web service I used in my blog post on Web.Contents() and the Query and RelativePath options, a blog post you should reread before you carry on here because it provides some useful background. It’s the metadata API for the UK government’s open data website and allows you to search for open data datasets. It doesn’t require any authentication so you’ll be able to run all the queries in this post yourself. Note that all the example M code in this post works in Power BI Desktop and does not need to be used in a custom data connector.

Consider the following M function, which I’m going to call SearchData:

[sourcecode language=”text” padlinenumbers=”true”]
(query as text, optional rowlimit as number) as table =>
let
QueryRecord =
[q=query],
AddRowLimit =
if
rowlimit=null
then
QueryRecord
else
Record.AddField(
QueryRecord,
"rows",
Text.From(rowlimit)
),
CallWebService =
Web.Contents(
"https://data.gov.uk/api",
[
RelativePath="3/action/package_search",
Query= AddRowLimit
]
),
Source =
Json.Document(CallWebService)[result][results],
ToTable =
Table.FromList(
Source,
Splitter.SplitByNothing(),
null,
null,
ExtraValues.Error
),
Expand =
Table.ExpandRecordColumn(
ToTable,
"Column1",
{"title"},
{"title"}
),
ChangeType =
Table.TransformColumnTypes(
Expand,
{{"title", type text}}
)
in
ChangeType
[/sourcecode]

 

The function takes two parameters:

  • query, the search term to use
  • rowlimit, an optional parameter that limits the number of rows returned by the search

It then calls the API, gets the search results in JSON format and converts it to a table with just one column containing the title of the datasets returned.

Calling the function with the search term “data” and now row limit, like so:

[sourcecode language=”text”]
SearchData("data")
[/sourcecode]

…returns a table of ten rows, the default maximum number of rows returned:

image

[Note to self: I really need to check out the “Treasure data” dataset]

Using Fiddler to inspect the calls make from the Power Query engine back to the web service (I describe how to do this here) when the query is loaded into a Power BI dataset reveals the following:

image

Everything is pretty much as you would expect: every time this query is used, no matter how it is used, the same request is used to get data.

Query folding is implemented using the Table.View() M function, and here’s an example of how it can be used with the table above (let’s say this new query is called SearchForDataWithFolding):

[sourcecode language=”text”]
Table.View(
null,
[
GetType = () =>
type table [title = Text.Type],
GetRows = () =>
SearchData("Data"),
OnTake = (count as number) =>
SearchData("Data", count)
]
)
[/sourcecode]

In the second parameter of Table.View() in this example there are three records in the handler field:

  • GetType, which is called when the Power Query engine needs to know about the data types of the columns of the table returned by this expression. In this case it’s a table with one text column.
  • GetRows, which is called when the Power Query engine wants all the rows from the table (for example when it’s loading data into the dataset)
  • OnTake, which is called when the Power Query engine only wants the top n rows from the table; in this case it provides the top n through the count parameter, and I’m passing that back to my SearchData function via the rowlimit parameter.

Other handlers can be implemented too, but for this web service it only really makes sense to implement OnTake because that’s the only operation that can be folded back.

Here’s what it returns in the Query Editor:

image

The first thing to point out is that, in the Query Editor, it returns more than ten rows – it returns one thousand rows. Fiddler confirms this:

image

As the official documentation states at the bottom of this page:

The Power Query experience will always perform an OnTake of 1000 rows when displaying previews in the navigator and query editor, so your users might see significant performance improvements when working with larger data sets.

Similarly, using the Table.FirstN() function on the rows of this table, as follows:

[sourcecode language=”text”]
Table.FirstN(SearchForDataWithFolding,3)
[/sourcecode]

Shows a row limit of three passed back to the web service:

image

There’s something else interesting to note when the query is loaded into the dataset. Fiddler now shows two calls to the web service:

image

Two calls to the web service are being made: the first asks for zero rows, the second asks for all the data with no row limit. In this case the following change to SearchForDataWithFolding stops the first call happening and results in only one call to the web service:

[sourcecode language=”text” highlight=”9,10,11″]
Table.View(
null,
[
GetType = () =>
type table [title = Text.Type],
GetRows = () =>
SearchData("Data"),
OnTake = (count as number) =>
if count=0
then
#table(type table [title = Text.Type], {})
else
SearchData("Data", count)
]
)
[/sourcecode]

But why is the Power Query engine making this call? Why didn’t it make it on the other query? Did it make the same call twice in the other query but did it cache the result of the first call and then reuse it? Is it trying to find out what columns this query returns? Hmm, a subject for future research I think.

You can download the sample pbix file for this post here.

6 thoughts on “Implementing Basic Query Folding On A Web Service In Power Query/M And Power BI

  1. Thanks for taking of your time to write these blogs!

    I got to this one because I was reading “Speed Up Data Refresh Performance In Power BI Desktop Using Table.View”.

    I am extremely lost here. I have ten MS SQL connections on a five pages Power BI files. My queries take between 1-10 seconds to run in SQL. Yet, Power BI takes over 2 minutes to refresh. I have no clue as to where, or how, could I try using your code.

Leave a Reply