Implementing Basic Query Folding On A Web Service In Power Query/M And Power BI

The more advanced Power Query/M developers among you will know about query folding, the way that the Power Query engine pushes as much of the heavy-lifting of a query back to a data source.  You may also know that it’s possible to implement query folding yourself inside a custom data connector, and there’s a very detailed (but perhaps a little intimidating) tutorial on how to do it here. I’ve been doing a lot of custom data extension development recently and have been learning how to implement query folding myself so I thought another, simple, demonstration of how it works with web services might be useful for those of you in the same situation.

For this example I’m going to use the web service I used in my blog post on Web.Contents() and the Query and RelativePath options, a blog post you should reread before you carry on here because it provides some useful background. It’s the metadata API for the UK government’s open data website and allows you to search for open data datasets. It doesn’t require any authentication so you’ll be able to run all the queries in this post yourself. Note that all the example M code in this post works in Power BI Desktop and does not need to be used in a custom data connector.

Consider the following M function, which I’m going to call SearchData:

(query as text, optional rowlimit as number) as table =>
let
    QueryRecord = 
        [q=query],
    AddRowLimit = 
        if 
            rowlimit=null 
        then 
            QueryRecord 
        else 
            Record.AddField(
                QueryRecord,
                "rows",
                Text.From(rowlimit)
                ),
    CallWebService = 
        Web.Contents(
        "https://data.gov.uk/api", 
            [
                RelativePath="3/action/package_search", 
                Query= AddRowLimit
            ]
        ),
    Source = 
        Json.Document(CallWebService)[result][results],
    ToTable = 
        Table.FromList(
            Source, 
            Splitter.SplitByNothing(), 
            null, 
            null, 
            ExtraValues.Error
            ),
    Expand = 
        Table.ExpandRecordColumn(
            ToTable, 
            "Column1", 
            {"title"},
            {"title"}
            ),
    ChangeType = 
        Table.TransformColumnTypes(
            Expand,
            {{"title", type text}}
            )
in
    ChangeType

 

The function takes two parameters:

  • query, the search term to use
  • rowlimit, an optional parameter that limits the number of rows returned by the search

It then calls the API, gets the search results in JSON format and converts it to a table with just one column containing the title of the datasets returned.

Calling the function with the search term “data” and now row limit, like so:

SearchData("data")

…returns a table of ten rows, the default maximum number of rows returned:

image

[Note to self: I really need to check out the “Treasure data” dataset]

Using Fiddler to inspect the calls make from the Power Query engine back to the web service (I describe how to do this here) when the query is loaded into a Power BI dataset reveals the following:

image

Everything is pretty much as you would expect: every time this query is used, no matter how it is used, the same request is used to get data.

Query folding is implemented using the Table.View() M function, and here’s an example of how it can be used with the table above (let’s say this new query is called SearchForDataWithFolding):

Table.View(
        null, 
        [
        GetType = () => 
                type table [title = Text.Type],
        GetRows = () => 
                SearchData("Data"),
        OnTake = (count as number) => 
                SearchData("Data", count)
        ]
        )

In the second parameter of Table.View() in this example there are three records in the handler field:

  • GetType, which is called when the Power Query engine needs to know about the data types of the columns of the table returned by this expression. In this case it’s a table with one text column.
  • GetRows, which is called when the Power Query engine wants all the rows from the table (for example when it’s loading data into the dataset)
  • OnTake, which is called when the Power Query engine only wants the top n rows from the table; in this case it provides the top n through the count parameter, and I’m passing that back to my SearchData function via the rowlimit parameter.

Other handlers can be implemented too, but for this web service it only really makes sense to implement OnTake because that’s the only operation that can be folded back.

Here’s what it returns in the Query Editor:

image

The first thing to point out is that, in the Query Editor, it returns more than ten rows – it returns one thousand rows. Fiddler confirms this:

image

As the official documentation states at the bottom of this page:

The Power Query experience will always perform an OnTake of 1000 rows when displaying previews in the navigator and query editor, so your users might see significant performance improvements when working with larger data sets.

Similarly, using the Table.FirstN() function on the rows of this table, as follows:

Table.FirstN(SearchForDataWithFolding,3)

Shows a row limit of three passed back to the web service:

image

There’s something else interesting to note when the query is loaded into the dataset. Fiddler now shows two calls to the web service:

image

Two calls to the web service are being made: the first asks for zero rows, the second asks for all the data with no row limit. In this case the following change to SearchForDataWithFolding stops the first call happening and results in only one call to the web service:

Table.View(
        null, 
        [
        GetType = () => 
                type table [title = Text.Type],
        GetRows = () => 
                SearchData("Data"),
        OnTake = (count as number) => 
                if count=0 
                then 
                #table(type table [title = Text.Type], {}) 
                else 
                SearchData("Data", count)
        ]
        )

But why is the Power Query engine making this call? Why didn’t it make it on the other query? Did it make the same call twice in the other query but did it cache the result of the first call and then reuse it? Is it trying to find out what columns this query returns? Hmm, a subject for future research I think.

You can download the sample pbix file for this post here.

Using OpenApi.Document() To Create A Power BI Custom Connector For The Power BI REST API

The idea of creating a Power BI custom connector for the Power BI REST API is not a new one: Miguel Escobar wrote one earlier this year (see here for the source code and documentation). However the Power BI REST API has a lot of endpoints so writing code to support them all would be extremely time-consuming, and there is another option: using the new OpenApi.Document() M function – only available for custom connectors, and not in Power BI Desktop or Excel – to read the API definition from an Open API (aka Swagger) definition file.

Full documentation and samples for using OpenApi.Document are available here, and an Open API definition file for the Power BI REST API is available here. The Power BI REST API uses Azure Active Directory authentication and handling AAD authentication in custom connectors is well documented here; Miguel’s sample shows how to adapt this code for Power BI.

After that, just download the Power BI REST API definition, save it as a .json file and add it to your Visual Studio custom connector project and set the Build Action property of the file to Compile:

image

You can then use the Extension.Contents() M function to read the contents of the file in code and pass that to OpenApi.Document(). The last problem to solve is that OpenApi.Document() only supports basic and anonymous authentication by default, so support for OAuth2 has to be handled using the ManualCredentials option. Here’s the M function definition from my demo project showing how to do this:

[DataSource.Kind="OAuth2Demo", Publish="OAuth2Demo.Publish"]
shared OAuth2.OpenAPIDemo = () =>
    let
        OAuthCredential = Extension.CurrentCredential(),
        OAuthToken = OAuthCredential[access_token],        
        SecurityHeaders = [ Authorization = "Bearer " & OAuthToken],
        PBISwagger = Extension.Contents("PBISwagger.json"),
        CallWebService = 
 	 OpenApi.Document(
	  PBISwagger,
	  [ Headers = SecurityHeaders, ManualCredentials = true ]
	 )
    in
        CallWebService;

Once you build your custom connector, you see all of the endpoints (or at least those that support GET and POST requests) exposed as functions in the Navigation table:

image

It looks super-easy to build a custom connector this way, but is it a actually good idea? I can see a few pros and cons:

  • As you’ll find if you try this, not everything ‘just works’ – there are a few functions that return errors.
  • In this case I embedded the Open API definition file in the project but it’s also possible to download it dynamically using Web.Contents(). This means that whenever the published Open API definition file is updated the custom connector also updates automatically. Are you sure the published Open API definition file will get updated when the API changes, though? What if it doesn’t, or there is an error in it?
  • Connecting to an API is all very well but it’s very important that a custom connector delivers data in a format that is modelled appropriately for Power BI, and very few APIs do this on their own. Pretty much every custom connector I have built has a lot of code in it to transform the data output by the API into something like a star schema, and the need to do this cancels out all the magic automatic stuff that OpenApi.Document() does for you.

The CredentialQuery Option For Web.Contents() In Power BI Custom Data Connectors

There’s a new, as yet undocumented, option for the Web.Contents() M function that is only available for custom data connectors and not in Power BI Desktop: CredentialQueryOption. It does the same thing as the Query option, but it stops the values you are passing to the query parameter from being logged in Power BI’s diagnostic logs, so it’s useful if you are passing sensitive information such as passwords.

For example, consider the following call to Web.Contents() used in a custom data connector:

Web.Contents(
 "https://data.gov.uk/api", 
 [RelativePath="3/action/package_search", 
 Query=[q="cows"]
]
)

When the custom data connector is used in Power BI Desktop, if diagnostic logging is turned on in the Options dialog:

image

…then, if you look in the log files you’ll see the entire url called including the search term “cows”:

image

However, if you change the M code to:

Web.Contents(
 "https://data.gov.uk/api", 
 [RelativePath="3/action/package_search", 
 CredentialQuery=[q="cows"]
]
)

So that the CredentialQuery option is used instead of the Query option, the search term will not appear in the log file.

Thanks to Curt Hagenlocher for providing this information.

Invoking M Functions In Parallel Using List.ParallelInvoke()

I was looking at the list of M functions supported in custom connectors and not in Power BI Desktop (using the technique I blogged about here) in the latest version of the Power Query SDK when I came across an intriguing new function: List.ParallelInvoke(). It doesn’t seem to be documented anywhere, but I think I’ve worked out what it does and it’s very exciting!

Consider the following M function, declared in a custom connector:

SlowFunction = () as number =>
    Function.InvokeAfter(()=>1, #duration(0,0,0,5));

When you call it, it waits 5 seconds and returns the value 1. If you call it three times and sum up the results, as follows:

List.Sum({SlowFunction(), SlowFunction(), SlowFunction()})

…then after 15 seconds you get the value 3 back.

Now, consider the following expression:

List.Sum(
 List.ParallelInvoke(
  {SlowFunction, SlowFunction, SlowFunction}
 )
)

When this is evaluated in a custom connector, you get the value 3 back after 5 seconds – so it looks like List.ParallelInvoke() allows you to invoke a list of functions in parallel. There’s also an optional second parameter called concurrency, which seems to control the amount of parallelism. So, for example:

List.Sum(
 List.ParallelInvoke(
  {SlowFunction, SlowFunction, SlowFunction},
  2
 )
)

…returns after 10 seconds, suggesting that only two function calls at a time are invoked in parallel.

I can imagine all kinds of uses for this, for example making multiple parallel calls to data sources or doing expensive calculations in parallel. I wonder if it will ever be allowed to be used outside custom connectors?

UPDATE: see Curt Hagenlocher’s comment below for some important information about this function.

Running M Queries In Visual Studio With The Power Query SDK

Writing M in the Advanced Editor in Excel or Power BI can be a frustrating experience unless you’re the kind of masochist who loves writing code in Notepad. There are some options for writing M code outside Excel and Power BI, for example Lars Schreiber’s M extension for Notepad++ (see here for details) or the M extension for Visual Studio Code (available from the Visual Studio Marketplace here; more details on Brett Powell’s blog here), but the trouble with them is that you have to copy the code back into Excel or Power BI to run it. What many people don’t realise, however, is that it is possible to write M code and have IntelliSense, formatting, keyword highlighting and also the ability to execute your own M queries, using the Power Query SDK in Visual Studio.

The Power Query SDK (which you can download here) supports Visual Studio 2015 and 2017 and is intended for people who are writing custom Data Connectors for Power BI. To let you test your Data Connector you can create a .pq file containing M code, and this in fact allows you to run any M query you want whether you’re building a Data Connector or not.

Here’s how. First, install the Power Query SDK and then open Visual Studio and create a new project. Find the Power Query template, select the PQ file option and give your file a name:

image

Then, in the .pq file that is created, you can enter an M query and then either press the Start button on the toolbar or hit F5 to run the query. The output of the query is shown on the Output tab in the M Query Output window:

image

Right-clicking on your project in the Solution Explorer pane and selecting Properties brings up a Properties dialog with various properties that control how your queries behave:

image

image

Many of the properties are self-explanatory, at least for anyone used to writing M in Power BI or Excel. FastCombine turns off data privacy checks. Allow Native Query lets you use M queries that contain ‘native’ queries (for example your own SQL queries if you’re using a SQL Server data source), as Cédric Charlier shows here. A few of them, such as Legacy Redirects, I have no idea about yet (I should really ask someone…). Turning on Show Engine Traces displays engine trace information in the Log tab of the M Query Output pane; turning on Show User Traces displays trace information generated by the use of the Diagnostics.Trace() function in your code in the Log tab. You can save the contents of the Log tab to a text file.

image

Error messages are displayed on the Errors tab of the M Query Output pane:

image

When you have a query that connects to an external data source, the first time you try to run your query you will be prompted to set the credentials used to connect to that data source (as you would in Power BI Desktop), and the data privacy level for the data source, on the Errors pane:

image

The query won’t actually run this first time though; you’ll need to hit Start/F5 again to see the results. If you close the project and then reopen it you will need to enter credentials again; alternatively, on the Credentials tab you can save the credentials used for a data source to a .crd file which can then be reloaded when you reopen your project. You can also edit and delete credentials on the Credentials tab.

image

If I’m honest it’s all very basic but it does the job. The main thing that I miss from writing M code in Power BI is the Query Editor UI – when I write M code there I only write about 50% of it manually, the rest I generate by clicking buttons in the UI because it’s faster. Give me the Query Editor (or the ribbonless version of it that comes with SSDT, because Visual Studio doesn’t support ribbons apparently) and I’ll be happy. Even better, give me the improved code editing functionality in the Advanced Editor in Power BI Desktop and Excel that we’ve been promised!

The Extension.Contents() M Function

Following on from my post last week about M functions that are only available in custom data extensions, here’s a quick explanation of one of those functions: Extension.Contents().

Basically, it allows you to access the contents of any file you include in the .mez file of your custom data connector. Say you have a text file called MyTextFile.txt:

image

If you create a new Power BI Custom Data Connector project using the SDK, you can add this file to your project in Visual Studio like any other file:

image

Next, select the file and in the Visual Studio Properties pane set the Build Action property to Compile:

image

Setting this property means that when your custom data connector is built, this file is included inside it (the .mez file is just a zip file – if you unzip it you’ll now find this file inside).

Next, in the .pq file that contains the M code for your custom data connector, you can access the contents of this file as binary data using Extension.Contents(“MyTextFile.txt”). Here’s an example function for use in a custom data connector that does this:

[DataSource.Kind="ExtensionContentsDemo", 
Publish="ExtensionContentsDemo.Publish"]
shared ExtensionContentsDemo.Contents = () =>
    let
        GetFileContents = Extension.Contents("MyTextFile.txt"),
        ConvertToText = Text.FromBinary(GetFileContents)
    in
        ConvertToText;

image

In the let expression here the GetFileContents step returns the contents of the text file as binary data and the ConvertToText step calls Text.FromBinary() to turn the binary data into a text value.

When this function is, in turn, called it returns the text from the text file. Here’s a screenshot of a query run from Power BI Desktop (after the custom data connector has been compiled and loaded into Power BI) that does this:

image

Which M Functions Are Only Available To Custom Data Connectors?

Here’s one for all you M geeks out there. If you look at the example code for custom data connectors on the Power BI custom data connectors GitHub repo it’s clear that there are several M functions that are available in custom data connectors that aren’t available in Power BI Desktop. But what are they? As I’m sure you know, you can get a list of all the functions, types and enums available in M using the #shared keyword. Well, I created a simple custom data connector that calls #shared and returned a table of all the functions, types and enums available to a custom data connector, and then compared that table with what #shared returns when you run it in Power BI Desktop. This made it easy to find a list of M functions, types and enums that are only available in custom data connectors, and here are all 34:

  • CryptoAlgorithm.Type
  • CryptoAlgorithm.SHA1
  • CryptoAlgorithm.SHA256
  • Crypto.CreateHmac
  • Crypto.CreateHash
  • Web.SignForOAuth1
  • OAuth1.Type
  • OAuth1.HMACSHA1
  • OAuth1.RSASHA1
  • Extension.Module
  • Extension.CurrentCredential
  • Extension.CurrentApplication
  • Extension.CredentialError
  • Extension.LoadString
  • Extension.Contents
  • Credential.AccessDenied
  • Credential.AccessForbidden
  • Credential.EncryptionNotSupported
  • Credential.NativeQueryPermission
  • Error.Unexpected
  • Uri.Type
  • Binary.End
  • Action.Type
  • Action.Sequence
  • Action.Return
  • Action.Try
  • Action.DoNothing
  • ValueAction.Replace
  • ValueAction.NativeStatement
  • TableAction.InsertRows
  • TableAction.UpdateRows
  • TableAction.DeleteRows
  • WebAction.Request
  • Delta.Since

Some of these, like the Action functions, are documented in the Power Query function reference, and as I said others are mentioned in the Power BI custom data connectors GitHub repo, but there’s definitely some detective work to do here…

%d bloggers like this: