Implementing Basic Query Folding On A Web Service In Power Query/M And Power BI

The more advanced Power Query/M developers among you will know about query folding, the way that the Power Query engine pushes as much of the heavy lifting of a query as possible back to the data source. You may also know that it’s possible to implement query folding yourself inside a custom data connector, and there’s a very detailed (but perhaps a little intimidating) tutorial on how to do it here. I’ve been doing a lot of custom data extension development recently and have been learning how to implement query folding myself, so I thought another, simpler demonstration of how it works with web services might be useful for those of you in the same situation.

For this example I’m going to use the web service I used in my blog post on Web.Contents() and the Query and RelativePath options, a blog post you should reread before you carry on here because it provides some useful background. It’s the metadata API for the UK government’s open data website and allows you to search for open data datasets. It doesn’t require any authentication so you’ll be able to run all the queries in this post yourself. Note that all the example M code in this post works in Power BI Desktop and does not need to be used in a custom data connector.
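
As a quick reminder of how those options work: keeping the first parameter of Web.Contents() fixed and putting everything else in the RelativePath and Query options avoids refresh problems in the Power BI service. For instance, the following expression requests the same thing as the URL https://data.gov.uk/api/3/action/package_search?q=data:

Web.Contents(
    "https://data.gov.uk/api",
    [
        RelativePath = "3/action/package_search",
        Query = [q = "data"]
    ]
    )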

Consider the following M function, which I’m going to call SearchData:

(query as text, optional rowlimit as number) as table =>
let
    //build a record containing the search term for the Query option
    QueryRecord = 
        [q=query],
    //if the optional rowlimit parameter was passed, add it
    //to the record as the rows parameter
    AddRowLimit = 
        if 
            rowlimit=null 
        then 
            QueryRecord 
        else 
            Record.AddField(
                QueryRecord,
                "rows",
                Text.From(rowlimit)
                ),
    //call the package_search endpoint of the API
    CallWebService = 
        Web.Contents(
            "https://data.gov.uk/api", 
            [
                RelativePath = "3/action/package_search", 
                Query = AddRowLimit
            ]
            ),
    //get the list of search results from the JSON response
    Source = 
        Json.Document(CallWebService)[result][results],
    ToTable = 
        Table.FromList(
            Source, 
            Splitter.SplitByNothing(), 
            null, 
            null, 
            ExtraValues.Error
            ),
    Expand = 
        Table.ExpandRecordColumn(
            ToTable, 
            "Column1", 
            {"title"},
            {"title"}
            ),
    ChangeType = 
        Table.TransformColumnTypes(
            Expand,
            {{"title", type text}}
            )
in
    ChangeType


The function takes two parameters:

  • query, the search term to use
  • rowlimit, an optional parameter that limits the number of rows returned by the search

It then calls the API, gets the search results in JSON format and converts them to a table with just one column containing the titles of the datasets returned.
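
For reference, the JSON returned by the API looks something like this – an illustrative, heavily elided sketch rather than the real response – which is why the code above drills into the result record and then the results list, keeping only the title field of each dataset:

{
  "result": {
    "results": [
      {"title": "First dataset", ...},
      {"title": "Second dataset", ...}
    ],
    ...
  }
}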

Calling the function with the search term “data” and no row limit, like so:

SearchData("data")

…returns a table of ten rows, the default maximum number of rows returned:

[Image: table showing the first ten dataset titles returned by the search]

[Note to self: I really need to check out the “Treasure data” dataset]

Using Fiddler to inspect the calls made from the Power Query engine back to the web service (I describe how to do this here) when the query is loaded into a Power BI dataset reveals the following:

[Image: Fiddler trace of the requests made to the web service]

Everything is pretty much as you would expect: no matter how this query is used, the same request is made to get the data.

Query folding is implemented using the Table.View() M function, and here’s an example of how it can be used with the table above (let’s say this new query is called SearchForDataWithFolding):

Table.View(
        null, 
        [
        GetType = () => 
                type table [title = Text.Type],
        GetRows = () => 
                SearchData("Data"),
        OnTake = (count as number) => 
                SearchData("Data", count)
        ]
        )

The second parameter of Table.View() in this example is a record containing three handler fields:

  • GetType, which is called when the Power Query engine needs to know about the data types of the columns of the table returned by this expression. In this case it’s a table with one text column.
  • GetRows, which is called when the Power Query engine wants all the rows from the table (for example when it’s loading data into the dataset).
  • OnTake, which is called when the Power Query engine only wants the top n rows from the table; in this case it provides the top n through the count parameter, and I’m passing that back to my SearchData function via the rowlimit parameter.

Other handlers can be implemented too, but for this web service it only really makes sense to implement OnTake because that’s the only operation that can be folded back.
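
To give an idea of what implementing another handler might look like, here’s a hedged sketch only: it assumes the service supported a sort option on the search (the CKAN API behind data.gov.uk documents a sort parameter for package_search, for example “title asc”), and SearchDataWithSort is a hypothetical variant of SearchData that passes that option through in the Query record:

Table.View(
        null, 
        [
        GetType = () => 
                type table [title = Text.Type],
        GetRows = () => 
                SearchData("Data"),
        OnTake = (count as number) => 
                SearchData("Data", count),
        OnSort = (order as list) =>
                let
                    //only attempt to fold a sort on the first column
                    FirstSort = List.First(order),
                    Direction = 
                        if FirstSort[Order] = Order.Ascending 
                        then "asc" 
                        else "desc",
                    //build a sort expression like "title asc"
                    SortText = FirstSort[Name] & " " & Direction
                in
                    //hypothetical function, not defined in this post
                    SearchDataWithSort("Data", SortText)
        ]
        )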

Here’s what SearchForDataWithFolding returns in the Query Editor:

[Image: Query Editor preview showing one thousand rows of search results]

The first thing to point out is that, in the Query Editor, it returns more than ten rows – it returns one thousand rows. Fiddler confirms this:

[Image: Fiddler trace showing a request for 1000 rows]

As the official documentation states at the bottom of this page:

The Power Query experience will always perform an OnTake of 1000 rows when displaying previews in the navigator and query editor, so your users might see significant performance improvements when working with larger data sets.

Similarly, using the Table.FirstN() function on the rows of this table, as follows:

Table.FirstN(SearchForDataWithFolding,3)

…shows a row limit of three passed back to the web service:

[Image: Fiddler trace showing a row limit of three passed to the web service]

There’s something else interesting to note when the query is loaded into the dataset. Fiddler now shows two calls to the web service:

[Image: Fiddler trace showing two calls to the web service]

The first call asks for zero rows, the second asks for all the data with no row limit. In this case the following change to SearchForDataWithFolding stops the first call from happening, so only one call is made to the web service:

Table.View(
        null, 
        [
        GetType = () => 
                type table [title = Text.Type],
        GetRows = () => 
                SearchData("Data"),
        OnTake = (count as number) => 
                //when zero rows are requested, return an empty
                //table with the correct schema instead of calling
                //the web service
                if count=0 
                then 
                    #table(type table [title = Text.Type], {}) 
                else 
                    SearchData("Data", count)
        ]
        )

But why is the Power Query engine making this call? Why didn’t it make it for the other query? Did it actually make the same call twice for the other query, caching the result of the first call and then reusing it? Is it trying to find out what columns this query returns? Hmm, a subject for future research I think.

You can download the sample pbix file for this post here.

Nested Display Folders In Power BI

The ability to add columns and measures to display folders in the new Modelling view is one of my favourite features of the November 2018 release of Power BI Desktop: it makes complex models a lot more user-friendly. Being an old-school SSAS developer, I immediately wondered if they worked in the same way that they do in SSAS – and the answer is yes and no.

The good news is that display folders can be nested. Say you have a table called MyTable with three columns called A, B and C. This is what it looks like in the Fields pane in Power BI Desktop by default:

[Image: Fields pane showing MyTable with columns A, B and C]

If you select column A in the Fields pane and enter a name (in this case “My Folder”) in the Display folder property in the new Modelling view, you’ll see that column A is now displayed in a folder in the Fields pane:

[Image: Fields pane showing column A inside the My Folder display folder]

That’s obvious. It is also possible to create nested display folders though, and you do this by entering multiple folder names in the Display folder property separated by a backslash \ character. For example, entering “My Folder\My Subfolder” in the Display folder property for column A like so:

[Image: the Display folder property for column A set to My Folder\My Subfolder]

…makes column A appear in a folder called My Subfolder which is itself in a folder called My Folder.

To give another example, if column B is given a Display folder property of “My Folder\My Other Subfolder” the result is this:

[Image: Fields pane showing My Subfolder and My Other Subfolder nested inside My Folder]

Note that while the screenshots above are from the new Modelling view, in the Fields pane in other places such as the Report view the folders are displayed in alphabetical order:

[Image: Fields pane in the Report view with the folders displayed in alphabetical order]

You may need to use numbers in your folder names to ensure that everything gets sorted the way you want:

[Image: folder names prefixed with numbers to control their sort order]

[I’ve just noticed that in the screenshot above column C looks like it’s in the same folder as column B, which it isn’t – that needs fixing and I’ll report it as a bug]

Unfortunately it does not seem to be possible to make a column or measure appear in multiple folders at once (which was possible in SSAS by using a semi-colon delimited list of folder names), but I don’t think many people would want to do that anyway.

Creating Map Small Multiples In Power BI With The Azure Maps API

Since my post last week on using the Google Image Charts API to create sparklines and small multiples in Power BI proved very popular, I thought I would do a follow-up showing how to use the Azure Maps API to create map small multiples. Here’s an example of what’s possible: a table from a sample report I built that displays crimes committed in London in June 2018 (sourced from here), with one row for each crime and a map column displaying the location of the crime:

[Image: table with one row per crime and a map image column showing each crime’s location]

You can find out how to sign up for an Azure Maps account here; it isn’t free to use but you do get 250,000 free map renders per month (which should be more than enough for Power BI use) and any use over that is extremely cheap. Full details on pricing can be found here.

Here’s what the source data in my dataset looks like:

[Image: the source data, including the Center column]

The only important column is the Center column, which contains the longitude of the crime location, then a comma, then the latitude, all in a single text value.
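
If your source data has separate longitude and latitude columns instead, a custom column in Power Query can build the Center value for you. Here’s a minimal sketch, assuming hypothetical numeric columns called Longitude and Latitude on a previous step called Source:

Table.AddColumn(
    Source,
    "Center",
    each Text.From([Longitude]) & "," & Text.From([Latitude]),
    type text
    )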

With the data in this format you can call the Get Map Image API relatively easily in DAX using a measure something like this:

Map = 
var BaseURL = 
    "https://atlas.microsoft.com/map/static/png"
//paste your Azure Maps API key in place of "insert your key here"
var SubscriptionKey = 
    "?subscription-key="
    &
    "insert your key here"
var ApiVersion = 
    "&api-version=1.0"
var Layer = 
    "&layer=hybrid"
var Center = 
    "&center=" & SELECTEDVALUE('London Crime'[Center])
var ZoomLevel = 
    "&zoom=16"
var HeightWidth = 
    "&height=150&width=150"
return
    IF(
        HASONEVALUE('London Crime'[Center]),
        BaseURL & SubscriptionKey & ApiVersion &
        Layer & Center & ZoomLevel & HeightWidth
    )

You’ll need to paste your Azure Maps API key in place of “insert your key here” in the code above, and set the Data Category for the measure to Image URL. The maximum possible height of an image in a table or matrix in Power BI is, as far as I can see, 150 pixels, so that’s why the code above requests an image that is 150×150. You may want to experiment with different zoom levels and layer types to see what looks best on your report.
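
To give you an idea of what the measure produces, here’s the general shape of the URL it builds (the key and the coordinates are placeholders):

https://atlas.microsoft.com/map/static/png?subscription-key=<your key>&api-version=1.0&layer=hybrid&center=-0.12,51.50&zoom=16&height=150&width=150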

You can view the sample report here and download a copy of the report (without my API key in) here.

Creating Sparklines And Small Multiples In Power BI Using The Google Image Charts API

The excellent work that David Eldersveld has been doing recently on using SVG images in Power BI has generated a lot of interest; now that the August 2018 release of Power BI Desktop allows you to set data categories on measures – and specifically the Image URL data category – David has been able to apply his work to the creation of sparklines and small multiples in Power BI. He’s already published an example in the Quick Measures gallery here; you might also want to check out this blog post and quick measure by Tom Martens too.

All this got me thinking: what if there was a simple web service that could generate charts, one that didn’t need any authentication, could work with simple GET requests and was free to use? Well guess what, there is one created by Google: the Google Image Charts API. It turns out that it makes it super easy to create sparklines and small multiples in Power BI with very little DAX needed.

Here are two examples of what’s possible with some simple DAX in a measure:

[Animated image: sparklines displayed in a table]

[Animated image: the chart type changed via a slicer selection]

Both examples use data from a table called Sales with the following contents:

[Image: the Sales table with Product, Month and Sales columns]

The first example, the Sales Sparkline measure, has the following definition:

Sales Sparkline =
var BaseURL =
    "https://chart.googleapis.com/chart?"
var ChartType =
    "cht=ls"
var ChartScale =
    "&chds=0,10"
var ChartSize =
    "&chs=150x150"
var SeriesColour =
    "&chco=000000"
var ChartData =
    "&chd=t:" &
    CONCATENATEX(Sales, Sales[Sales], ",", Sales[Month])
return
    IF(
        HASONEVALUE(Sales[Product]),
        BaseURL & ChartType & ChartScale &
	ChartSize & SeriesColour & ChartData
    )

All I’m doing is building up the URL to call the API from the parameters listed here. The second measure is basically the same as the first, but the chart type is driven by a selection in a slicer. It’s not bullet-proof code but it shows how easy it is to use the API.
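
For example, with hypothetical sales values of 3, 5 and 8 for a product, the Sales Sparkline measure generates a URL along these lines, which returns a 150×150 black line chart with a value axis scaled from 0 to 10:

https://chart.googleapis.com/chart?cht=ls&chds=0,10&chs=150x150&chco=000000&chd=t:3,5,8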

You can see the report live here, and download the sample .pbix file here.

I’m not going to pretend that I’m a data visualisation expert and I know these examples are very basic; however, I think there are a lot of possibilities here for someone with better data visualisation skills than me.

Now for the bad news: the Google Image Charts API has been deprecated since 2012! Here’s the statement on the Google website:

While the dynamic and interactive Google Charts are actively maintained, we officially deprecated the static Google Image Charts way back in 2012. This gives us the right to turn it off without notice, although we have no plans to do so.

Also, there’s a fair usage policy too:

There’s no limit to the number of calls per day you can make to the Google Chart API. However, we reserve the right to block any use that we regard as abusive.

As a result I would not be confident about using it in production. If thousands of Power BI users started using the API it might prompt Google to block traffic from Power BI or turn off the API completely!

I do think Microsoft really needs to address the lack of native support for sparklines and small multiples in Power BI – the fact we still don’t have them is frankly embarrassing (even Power View had them, for crying out loud). Although one solution could be a DAX function that returned a chart image – basically taking David’s work and wrapping it up in a more user-friendly form – another approach would be for Microsoft to build on the Report Page Tooltips functionality. Since we can already design mini reports and make them appear as tooltips, why not make it possible for the same mini reports to appear in a cell in a table or matrix? You could then have measures that returned the name of a report page as a text value, and a new data category for them that tells Power BI to display the contents of the report page in the cell instead of the text. That way you could create small multiples from any visual or set of visuals.

UPDATE: thanks also to Tom Martens for reminding me to point out that this technique involves sending data to a third party, something which might cause legal problems for you.

Colour Names Supported In Power BI Conditional Formatting

When I read about the new conditional formatting by values feature in the August 2018 release of Power BI Desktop, my first thought was to write a blog post on how DAX can be used to generate the hex values for colours – but then Daniil Maslyuk wrote an excellent post on that (and more) yesterday. I then got curious about which colour names are supported when you use a text name rather than a hex code, and Amanda Cofsky of the dev team told me that the official list is the one here:

https://www.w3schools.com/cssref/css_colors.asp

…although there are other names that might work, assuming they don’t contain punctuation. I loaded that list into Power BI and built the following report:

[Image: report listing the supported colour names with their colours]

I’ve certainly learnt a few new colour names from this!
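
If you want to try this yourself, here’s a hedged sketch of one way to load the list into Power BI – it assumes the page still exposes the colour names in the first HTML table that Web.Page() finds:

let
    Source = 
        Web.Page(
            Web.Contents("https://www.w3schools.com/cssref/css_colors.asp")
            ),
    //Web.Page returns one row per table found in the HTML page;
    //take the contents of the first one
    ColourTable = Source{0}[Data]
in
    ColourTable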

Comparing The Performance Of CSV And Excel Data Sources In Power Query

My posts from two weeks ago (see here and here) on using Process Monitor to troubleshoot the performance of Power Query queries made me wonder about another question: how does the performance of reading data from CSV files compare to the performance of reading data from Excel files? I think most experienced Power Query users in either Power BI or Excel know that Excel data sources perform pretty badly but I had never done any proper tests. I’m not going to pretend that this post is a definitive answer to this question (and once again, I would be interested to hear your experiences) but hopefully it will be thought-provoking.

To start off, I took the 153.6MB CSV file used in my last few posts and built a simple query that applied a filter on one text column, then removed all but three columns. Here’s a minimal sketch of the kind of query I mean, with a hypothetical file path and column names (the Excel version described below was identical apart from the Source step, which used Excel.Workbook() instead):
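
let
    Source = 
        Csv.Document(
            File.Contents("C:\MyData\LargeFile.csv"),
            [Delimiter=",", Encoding=65001]
            ),
    //use the first row as the column headers
    PromotedHeaders = 
        Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    //filter on one text column
    FilteredRows = 
        Table.SelectRows(
            PromotedHeaders, 
            each [TextColumn] = "Some Value"
            ),
    //keep just three columns
    RemovedColumns = 
        Table.SelectColumns(
            FilteredRows, 
            {"Column1", "Column2", "Column3"}
            )
in
    RemovedColumns

The query ran in 9 seconds and, using the technique from my last post, I was able to draw the following graph from a Process Monitor log file showing how Power Query reads data from the file: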

[Image: graph showing how Power Query reads data from the CSV file]

Nothing very surprising there. Then I opened the same CSV file in Excel and saved the data as an xlsx file with one worksheet and no tables or named ranges; the resulting file was only 80.6MB. Finally I created a duplicate of the first query and pointed it to the Excel file instead. The resulting query ran in 59 seconds – around 6 times slower! Here’s a comparison of the performance of this query with that of the first query:

[Image: graph comparing the amount of data read from the Excel file and the CSV file over time]

The black line in the graph above is the amount of data read (actually the offset values showing where in the file the data is read from, which is the same thing as a running total when Power Query is reading all the data) from the Excel file; the green line is the amount of data read from the CSV file (the same data shown in the first graph above). A few things to mention:

  • Running Process Monitor while this second query was refreshing had a noticeable impact on its performance – in fact it was almost 20 seconds slower
  • The initial values of around 80 million bytes seem to show data being read from the end of the Excel file. Maybe this is Power Query reading some file metadata? Anyway, it seems as though it takes 5 seconds before it starts to read the data needed by the query.
  • There’s a plateau between the 10 and 20 second mark where not much is happening; this didn’t happen consistently and may have been connected to the fact that Process Monitor was running

In any case you don’t need to study this too hard to understand that the performance of reading data from an xlsx format Excel file is terrible compared to the performance of reading data from a CSV. So, if you have a choice between these two formats, for example when downloading data, it seems fair to say that you should always choose CSV over xlsx.

OData Performance Improvements In The June 2018 Power BI Desktop Release

In the June 2018 release of Power BI Desktop there were a number of improvements made to the way the Power Query engine handles OData data sources. You can read about them here:

https://powerbi.microsoft.com/en-us/blog/power-bi-desktop-june-2018-feature-summary/#oData

However, while testing them out, I noticed one important point that the announcement didn’t make: an existing Power Query query will only benefit from these changes if you make a small change to its M code, adding the Implementation="2.0" option to the OData.Feed() function.

Take the following M query, running on the UK Houses of Parliament OData API:

let
    Source = OData.Feed("https://api.parliament.uk/odata"),
    GovernmentOrganisation_table =
    Source{[
        Name="GovernmentOrganisation",
        Signature="table"
        ]}[Data],
    #"Expanded GroupHasPosition" = 
        Table.ExpandTableColumn(
            GovernmentOrganisation_table, 
            "GroupHasPosition", 
            {"PositionName"}, 
            {"PositionName"}),
    #"Filtered Rows" = 
        Table.SelectRows(
            #"Expanded GroupHasPosition", 
            each [PositionName] = "Chancellor of the Exchequer")
in
    #"Filtered Rows"

It queries a table called GovernmentOrganisation, expands a column called GroupHasPosition and then filters on one of the expanded columns, PositionName, to return only the rows where PositionName equals the text “Chancellor of the Exchequer”.

[Image: the filtered query output in the Query Editor]

Using Fiddler in the way I describe here, I can see that when this query runs the engine first tries to fold the filter on “Chancellor of the Exchequer” and then, when that request returns an error, defaults to a very slow approach that involves making multiple requests to the API:

[Image: Fiddler trace showing a failed folded request followed by multiple requests to the API]

However, if you change the code above so that the OData.Feed() function uses the Implementation="2.0" option like so:

let
    Source = OData.Feed(
        "https://api.parliament.uk/odata", 
        null, 
        [Implementation="2.0"]),
    GovernmentOrganisation_table =
    Source{[
        Name="GovernmentOrganisation",
        Signature="table"
        ]}[Data],
    #"Expanded GroupHasPosition" = 
        Table.ExpandTableColumn(
            GovernmentOrganisation_table, 
            "GroupHasPosition", 
            {"PositionName"}, 
            {"PositionName"}),
    #"Filtered Rows" = 
        Table.SelectRows(
            #"Expanded GroupHasPosition", 
            each [PositionName] = "Chancellor of the Exchequer")
in
    #"Filtered Rows"

…then Fiddler shows a completely different request made:

[Image: Fiddler trace showing a single request made to the API]

In this case it does not try to fold the filter, but it at least does the expansion in a single request. The performance of the resulting query is a lot better.

It looks like all new code generated by the Power Query Editor uses the Implementation="2.0" option for OData.Feed(), so it will get the benefits described in the post on the Power BI blog. Existing M code won’t have this option set, though, so you will need to update it yourself. As always, you should test thoroughly after making these changes to make sure your query really does run faster and behaves the same way as it did before.
