How Query Folding And The New Power BI Dataflows Connector Can Help Dataset Refresh Performance

You may have noticed that a new dataflows connector was announced in the August 2021 release of Power BI Desktop, and that it now supports query folding between a dataset and a dataflow – which you may be surprised to learn was not possible before. In this post I thought I’d take a look at how much of an improvement in performance this can make to dataset refresh performance.

For my tests I created a new PPU workspace and a dataflow, and made sure the Enhanced Compute Engine was turned on for the dataflow on the Settings page:

Query folding will only happen if the Enhanced Compute Engine is set to “On”, and won’t happen with the “Optimized” setting. The Enhanced Compute Engine is only available with PPU and Premium.

For my data source I used a CSV file with a million rows in and seven integer columns. I then created two tables in my dataflow like so:

The Source table simply connects to the CSV file, uses the first row as the headers, then sets the data type on each column. The second table called Output – which contains no tranformations at all – is needed for the data to be stored in the Enhanced Compute Engine, and the lightning icon in the top-left corner of the table in the diagram shows this is the case.

Next, in Power BI Desktop, I created a Power Query query that used the old Power BI dataflows connector:

If you have any existing datasets that connect to dataflows, this is the connector you will have used – it is based on the PowerBI.Dataflows function. My query connected to the Output table and filtered the rows to where column A is less than 100. Here’s the M code, slightly edited to remove all the ugly GUIDs:

    Source = PowerBI.Dataflows(null),
    ws = Source{[workspaceId="xxxx"]}[Data],
    df = ws{[dataflowId="yyyy"]}[Data],
    Output1 = df{[entity="Output"]}[Data],
    #"Filtered Rows" = Table.SelectRows(Output1, each [A] < 100)
    #"Filtered Rows"

Remember, this connector does not support query folding. Using this technique to measure how long the query ran when the results from the query were loaded into the dataset, I could see it took almost 12.5 seconds to get the data for this query:

In fact the performance in Desktop is worse: when refresh was taking place, I could see Power BI downloading 108MB of data even though the original source file is only 54MB.

Why is the data downloaded twice? I strongly suspect it’s because of this issue – because, of course, no query folding is happening. So the performance in Desktop is really even worse.

I then created the same query with the new dataflows connector:

This connector uses the PowerPlatform.Dataflows function; it’s not new, but what is new is that you can now access Power BI dataflows using it.

Here’s the M code, again cleaned up to remove GUIDS:

    Source = PowerPlatform.Dataflows(null),
    Workspaces = Source{[Id="Workspaces"]}[Data],
    ws = Workspaces{[workspaceId="xxxx"]}[Data],
    df = ws{[dataflowId="yyyy"]}[Data],
    Output_ = df{[entity="Output",version=""]}[Data],
    #"Filtered Rows" = Table.SelectRows(Output_, each [A] < 100)
    #"Filtered Rows"

When this query was loaded into the dataset, it only took 4 seconds:

This is a lot faster, and Power BI Desktop was a lot more responsive during development too.

It’s reasonable to assume that query folding is happening in this query and the filter on [A]<100 is now taking place inside the Enhanced Compute Engine rather than in Power BI Desktop. But how can you be sure query folding is happening? The “View Native Query” option is greyed out, but of course this does not mean that query folding is not happening. However, if you use Query Diagnostics, hidden away in the Data Source Query column of the detailed diagnostics query, you can see a SQL query with the WHERE clause you would expect:

In conclusion, you can see that the new dataflows connector can give you some big improvements for dataset refresh performance and a much better development experience in Power BI Desktop. Query folding support also means that you can now use dataset incremental refresh when using a dataflow as a source. However, you will need to use Premium or PPU, you may also need to make some changes to your dataflow to make sure it can take advantage of the Enhanced Compute Engine, and you will also need to update any existing Power Query queries to use the new connector. I think the potential performance gains are worth making these changes though. If you do make these changes in your dataflows and find that it helps, please leave a comment!

Query Pending Event In Power BI Performance Analyzer

A quick note for anyone like me who spends too much time looking at the JSON exports from Performance Analyzer in Power BI Desktop: you may have noticed an event called Query Pending that isn’t (as yet) documented in the Word doc that explains the format of these JSON files.

It turns out that it’s not that interesting – it’s an event that has been added as part of an effort to make sure there are events to cover the whole of the query lifecycle. After the DAX queries for each visual in your report are generated they are added to a queue before they are executed. In some cases there could be several queries in the queue waiting to be executed, in which case they are said to be “pending”, and the Query Pending event tells you how long a query is in this pending state.

I haven’t seen a duration of longer than a couple of milliseconds for this event though, so you probably don’t need to worry much about it. If you ever do see a long Query Pending event please leave a comment – I’m curious to know what the cause might be.

[Thanks to John Vulner and Jon Ludwig for this information]

Speed Up Power Query In Power BI Desktop By Increasing Or Decreasing The Number Of Evaluation Containers

Last week I showed how the new MaxEvaluationWorkingSetInMB registry setting could increase the performance of memory-hungry Power Query queries in Power BI Desktop. In this post I’ll show how the other new registry setting, ForegroundEvaluationContainerCount, can also help performance. Before I carry on I recommend you read the documentation on these new registry settings if you haven’t done so already.

To illustrate the effect of this setting I created ten identical Power Query queries feeding an Import mode dataset in a new .pbix file, each of which read data from the same 150MB CSV file, apply the a filter and then count the number of rows returned. These queries don’t require a large amount of memory but do take a couple of seconds to execute:

With ForegroundEvaluationContainerCount not set, refreshing the entire dataset (with background queries disabled) initially showed ten active evaluation containers:

I’m pretty sure these containers were used to determine the schemas of the tables returned (see here for more background); these were then joined by ten more containers which I assume were actually used by the refresh:

With these default settings refresh took 18 seconds according to Profiler.

With ForegroundEvaluationContainerCount set to 3:

This time there were never more than three evaluation containers active at any one time:

…and refresh took 24 seconds.

So we’ve proved that by setting ForegroundEvaluationContainerCount to a low value we can limit the amount of parallelism and, in this case, make performance worse. So why would you ever want to limit the amount of parallelism like this? The maximum amount of memory available to an evaluation container isn’t just controlled by the MaxEvaluationWorkingSetInMB registry setting; as the docs say, the effective maximum is also determined by the number of evaluation containers used. So reducing the amount of parallelism can increase the amount of memory available to each evaluation container and possibly increase performance.

I then created twenty new copies of the Power Query query, bringing the total number of queries in the pbix file to thirty, and set removed the ForegroundEvaluationContainerCount registry key to go back to using the default settings. During refresh I saw that no more than twenty evaluation containers were active – as expected, because the docs state that with the default settings no more than twenty containers will be used. I’ll spare you the screenshot. Refresh took 62 seconds.

Then I set ForegroundEvaluationContainerCount to 30 and refreshed. This time I could see thirty evaluation containers being used during refresh, and refresh took 55 seconds – not a massive improvement, but an improvement that I’m pretty sure can be attributed to the increased parallelism (I suspect that there was some other bottleneck here, possibly IO).

In conclusion the ForegroundEvaluationContainerCount registry setting is another useful tool to improve refresh performance for Import mode datasets (it’s also useful for DirectQuery but that’s something for a future post) in Power BI Desktop. Finding the optimal value to set it too is not straightforward though and is likely to involve a lot of experimentation. As always, please let me know how you get on using it.

Speed Up Power Query In Power BI Desktop By Allocating More Memory To Evaluation Containers

A really useful new Power Query performance enhancement was added to Power BI Desktop in an update to the May release via the Microsoft Store a week or so ago (if you’re not installing Power BI Desktop through the Microsoft Store you’ll have to wait for the June release I’m afraid). You can read the documentation here:

However if you have just read the docs you may be wondering what these two new registry key settings actually do. In this post I’m only going to talk about one, MaxEvaluationWorkingSetInMB; I’ll leave ForegroundEvaluationContainerCount for a future post.

At various times in the past I have blogged about how, when you run a Power Query query, the query itself is executed inside a separate process called an evaluation (or mashup) container and how this process has a limit on the amount of memory it can use. Some transformations such as sorting a table, doing a group by, pivoting and unpivoting require an entire table of data to be held in memory and if these operations require more memory than the evaluation container is able to use then it starts paging and query performance gets a lot worse. This post provides more details:

Two things have now changed though. First of all, the default of amount of memory available to an evaluation container in Power BI Desktop has been increased from 256MB to 432MB. This on its own will make many Power Query queries run a lot faster. Secondly, it is now possible to define how much memory an evaluation container can use yourself via the new MaxEvaluationWorkingSetInMB registry setting described in the documentation.

Here’s an example that shows how much of an impact this can have. In Power BI Desktop I created a Power Query query that reads data from a csv file with around one million rows in it and then sorts the resulting table by the values in one column:

  Source = Csv.Document(
      Delimiter  = ",", 
      Columns    = 16, 
      Encoding   = 1252, 
      QuoteStyle = QuoteStyle.None
  #"Sorted Rows" = Table.Sort(
    {{"Column2", Order.Ascending}}
  #"Sorted Rows"

Using SQL Server Profiler in the way described here, I found that the Power Query query took almost 87 seconds to start returning data and a further 19 seconds to return all the data:

What’s more, in Task Manager I could see that the evaluation container doing the work was limited to using around 423MB of RAM:

I then used Regedit to set MaxEvaluationWorkingSetInMB to 4096, giving each evaluation container a maximum of 4GB of RAM to use:

After restarting Desktop I reran the same query. This time Task Manager showed the evaluation container doing the work using around 1.2GB of RAM:

…and Profiler showed that the query started returning data after only 14 seconds and returned all the data in a further 12 seconds:

As you can see, that’s a massive performance improvement. Before you get too excited about this, though, a few things need to be made clear.

First, this setting only affects the performance of Power Query queries in Power BI Desktop. It does not affect the performance of queries in the Power BI Service, although there is another setting that (I think) will have the same effect for queries that go through an on-premises data gateway – but that’s yet another for a future post. So while this will make development much quicker and easier it won’t make dataset refreshes in the Power BI Service quicker.

Second, you need to be very careful when changing this setting. There’s no safety net here – you can set MaxEvaluationWorkingSetInMB to whatever value you want – and so some care is needed. When a dataset is refreshed then multiple evaluation containers may be used to handle the Power Query transformations, each of which can use the amount of memory specified by MaxEvaluationWorkingSetInMB. Since there’s a finite amount of memory on your development PC it’s important you don’t set MaxEvaluationWorkingSetInMB too high because if you do there’s a risk that Power BI will try to use more memory than you have available and bring your PC to a grinding halt. What’s more there’s no way of knowing how much memory any given query will need without some experimentation, so my advice is that if you do change MaxEvaluationWorkingSetInMB you should only increase it by a small amount and then increase it only if you are sure you need it.

I’d love to hear how much changing this setting improves the performance of your queries. If it does prove to be useful to a large number of people I hope we can get it added to the Options dialog in Power BI Desktop (which is much more convenient than changing a registry key); I also think it would be very useful in Excel Power Query. Please leave a comment with your findings!

Video: Advanced Analytics Features In Power BI

Following on from my last post, another SQLBits session of mine I wanted to highlight was “Advanced Analytics Features In Power BI”. The subject is a bit outside my normal area of expertise but it’s also one that I don’t think gets enough attention: it’s all the features available in Power BI reports that can help you explain why something happened rather than just what happened. Things I talk about include:

  • Adding forecasts to line charts
  • Symmetry shading, ratio lines and clustering on scatter charts
  • The “Explain the increase” and “Find where this distribution is different” features
  • The Key Influencers and Decomposition Tree visuals
  • Custom visuals such as Sanddance
  • Natural language querying with Q&A

Creating Basic Greater Than Or Less Than Rules With Power BI Conditional Formatting

Although the conditional formatting by rules feature in Power BI was released a long time ago, one very common cause of confusion is with how to implement basic “greater than” or “less than” rules. For example, say you have a table with the following data in it:


…and you want to highlight the rows where Sales are greater than or equal to 150. So you click on the table, go to the Conditional Formatting options for the Sales field, turn on Background Color formatting and click Advanced Controls:


…then choose to format by rules. You see this dialog:


In particular, the part of this dialog where you set up the rule:


…seems to suggest that you need to enter a “is greater than or equal to” condition and a “is less than” condition for the rule to be valid. This is not true, and you don’t need to enter some arbitrarily large number for the “is less than” condition to make it work. The following screenshot shows how you can set up a rule to highlight all rows where Sales are greater than or equal to 150:


The two things to notice are:

  • I’ve entered 150 in the first condition, as you would expect
  • I have deleted the 0 from the second “is less than” condition, leaving the textbox empty (meaning that the text “Maximum” is visible but greyed out)

Here’s the result:


Job done. What about a slightly more complex but nonetheless common scenario where values greater than 0 are shown as green, values equal to 0 as yellow and values less than 0 as red (with maybe some special handling for blanks too)? Here’s a sample table:


(The Dummy column is only there to make sure the Oranges row, which contains a blank value, is visible in the table)

Here’s a set of rules that does what we need:


And here’s the output:


Implementing Basic Query Folding On A Web Service In Power Query/M And Power BI

The more advanced Power Query/M developers among you will know about query folding, the way that the Power Query engine pushes as much of the heavy-lifting of a query back to a data source.  You may also know that it’s possible to implement query folding yourself inside a custom data connector, and there’s a very detailed (but perhaps a little intimidating) tutorial on how to do it here. I’ve been doing a lot of custom data extension development recently and have been learning how to implement query folding myself so I thought another, simple, demonstration of how it works with web services might be useful for those of you in the same situation.

For this example I’m going to use the web service I used in my blog post on Web.Contents() and the Query and RelativePath options, a blog post you should reread before you carry on here because it provides some useful background. It’s the metadata API for the UK government’s open data website and allows you to search for open data datasets. It doesn’t require any authentication so you’ll be able to run all the queries in this post yourself. Note that all the example M code in this post works in Power BI Desktop and does not need to be used in a custom data connector.

Consider the following M function, which I’m going to call SearchData:

(query as text, optional rowlimit as number) as table =&gt;
    QueryRecord = 
    AddRowLimit = 
    CallWebService = 
                Query= AddRowLimit
    Source = 
    ToTable = 
    Expand = 
    ChangeType = 
            {{"title", type text}}


The function takes two parameters:

  • query, the search term to use
  • rowlimit, an optional parameter that limits the number of rows returned by the search

It then calls the API, gets the search results in JSON format and converts it to a table with just one column containing the title of the datasets returned.

Calling the function with the search term “data” and now row limit, like so:


…returns a table of ten rows, the default maximum number of rows returned:


[Note to self: I really need to check out the “Treasure data” dataset]

Using Fiddler to inspect the calls make from the Power Query engine back to the web service (I describe how to do this here) when the query is loaded into a Power BI dataset reveals the following:


Everything is pretty much as you would expect: every time this query is used, no matter how it is used, the same request is used to get data.

Query folding is implemented using the Table.View() M function, and here’s an example of how it can be used with the table above (let’s say this new query is called SearchForDataWithFolding):

        GetType = () => 
                type table [title = Text.Type],
        GetRows = () => 
        OnTake = (count as number) => 
                SearchData("Data", count)

In the second parameter of Table.View() in this example there are three records in the handler field:

  • GetType, which is called when the Power Query engine needs to know about the data types of the columns of the table returned by this expression. In this case it’s a table with one text column.
  • GetRows, which is called when the Power Query engine wants all the rows from the table (for example when it’s loading data into the dataset)
  • OnTake, which is called when the Power Query engine only wants the top n rows from the table; in this case it provides the top n through the count parameter, and I’m passing that back to my SearchData function via the rowlimit parameter.

Other handlers can be implemented too, but for this web service it only really makes sense to implement OnTake because that’s the only operation that can be folded back.

Here’s what it returns in the Query Editor:


The first thing to point out is that, in the Query Editor, it returns more than ten rows – it returns one thousand rows. Fiddler confirms this:


As the official documentation states at the bottom of this page:

The Power Query experience will always perform an OnTake of 1000 rows when displaying previews in the navigator and query editor, so your users might see significant performance improvements when working with larger data sets.

Similarly, using the Table.FirstN() function on the rows of this table, as follows:


Shows a row limit of three passed back to the web service:


There’s something else interesting to note when the query is loaded into the dataset. Fiddler now shows two calls to the web service:


Two calls to the web service are being made: the first asks for zero rows, the second asks for all the data with no row limit. In this case the following change to SearchForDataWithFolding stops the first call happening and results in only one call to the web service:

        GetType = () => 
                type table [title = Text.Type],
        GetRows = () => 
        OnTake = (count as number) => 
                if count=0 
                #table(type table [title = Text.Type], {}) 
                SearchData("Data", count)

But why is the Power Query engine making this call? Why didn’t it make it on the other query? Did it make the same call twice in the other query but did it cache the result of the first call and then reuse it? Is it trying to find out what columns this query returns? Hmm, a subject for future research I think.

You can download the sample pbix file for this post here.

Nested Display Folders In Power BI

The ability to add columns and measures to display folders in the new Modelling view is one of my favourite features of the November 2018 release of Power BI Desktop: it makes complex models a lot more user-friendly. Being an old-school SSAS developer, I immediately wondered if they worked in the same way that they do in SSAS – and the answer is yes and no.

The good news is that display folders can be nested. Say you have a table called MyTable with three columns called A, B and C. This is what it looks like in the Fields pane in Power BI Desktop by default:


If you select column A in the Fields pane and enter a name (in this case “My Folder”) in the Display folder property in the new modelling view, you’ll see column A is now displayed in a folder in the Fields pane:


That’s obvious. It is also possible to create nested display folders though, and you do this by entering multiple folder names in the Display folder property separated by a backslash \ character. For example, entering “My Folder\My Subfolder” in the Display folder property for column A like so:


…makes column A appear in a folder called My Subfolder which is itself in a folder called My Folder.

To give another example, if column B is given a Display folder property of “My Folder\My Other Subfolder” the result is this:


Note that while the screenshots above are from the new Modelling view, in the Fields pane in other places such as the Report view the folders are displayed in alphabetical order:


You may need to use numbers in your folder names to ensure that everything gets sorted the way you want:


[I’ve just noticed that in the screenshot above column C looks like it’s in the same folder as column B, which it isn’t – that needs fixing and I’ll report it as a bug]

UPDATE: it’s now possible to make a measure appear in more than one folder too:

Creating Map Small Multiples In Power BI With The Azure Maps API

Since my post last week on using the Google Image Charts API to create sparklines and small multiples in Power BI has proved very popular, I thought I would do a follow-up showing how to use the Azure Maps API to create map small multiples. Here’s an example of what’s possible, a table from a sample report I built that displays crimes committed in London (sourced from here) in June 2018 with one row for each crime and a map column displaying the location of the crime:


You can find out how to sign up for an Azure Maps account here; it isn’t free to use but you do get 250,000 free map renders per month (which should be more than enough for Power BI use) and any use over that is extremely cheap. Full details on pricing can be found here.

Here’s what the source data in my dataset looks like:


The only important column is the Center column, which contains the longitude of the crime location followed by a comma followed by the latitude of the crime location in a single text value.

With the data in this format you can call the Get Map Image API relatively easily in DAX using a measure something like this:

Map = 
var BaseURL = 
var SubscriptionKey = 
    "insert your key here"
var ApiVersion = 
var Layer = 
var Center = 
    "&center=" & SELECTEDVALUE('London Crime'[Center])
var ZoomLevel = 
var HeightWidth = 
        HASONEVALUE('London Crime'[Center]),
        BaseURL & SubscriptionKey & ApiVersion &
        Layer & Center & ZoomLevel & HeightWidth

You’ll need to paste your Azure Maps API key in on the line highlighted above and set the Data Category for the measure to Image URL. The maximum possible height of an image in a table or matrix in Power BI is, as far as I can see, 150 pixels so that’s why the code above requests an image that is 150×150. You may want to experiment with different zoom levels and layer types to see what looks best on your report.

You can view the sample report here and download a copy of the report (without my API key in) here.

Creating Sparklines And Small Multiples In Power BI Using The Google Image Charts API

The excellent work that David Eldersveld has been doing recently on using SVG images in Power BI has generated a lot of interest; now that the August 2018 release of Power BI Desktop allows you to set data categories on measures – and specifically the Image URL data category – David has been able to apply his work to the creation of sparklines and small multiples in Power BI. He’s already published an example in the Quick Measures gallery here; you might also want to check out this blog post and quick measure by Tom Martens too.

All this got me thinking: what if there was a simple web service that could generate charts, one that didn’t need any authentication, could work with simple GET requests and was free to use? Well guess what, there is one created by Google: the Google Image Charts API. It turns out that it makes it super easy to create sparklines and small multiples in Power BI with very little DAX needed.

Here are two examples of what’s possible with some simple DAX in a measure:



Both examples use data from a table called Sales with the following contents:


The first example, the Sales Sparkline measure, has the following definition:

Sales Sparkline = 
var BaseURL = 
var ChartType = 
var ChartScale = 
var ChartSize = 
var SeriesColour = 
var ChartData = 
    "&chd=t:" & 
    CONCATENATEX(Sales, Sales[Sales], ",", Sales[Month])
        BaseURL & ChartType & ChartScale & 
	ChartSize & SeriesColour & ChartData

All I’m doing is building up the URL to call the API from the parameters listed here. The second measure is basically the same as the first, but the chart type is driven by a selection in a slicer. It’s not bullet-proof code but it shows how easy it is to use the API.

You can see the report live here, and download the sample .pbix file here.

I’m not going to pretend that I’m a data visualisation expert and I know these examples are very basic; however, I think there are a lot of possibilities here for someone with better data visualisation skills than me.

Now for the bad news: the Google Image Charts API has been deprecated since 2012! Here’s the statement on the Google website:

While the dynamic and interactive Google Charts are actively maintained, we officially deprecated the static Google Image Charts way back in 2012. This gives us the right to turn it off without notice, although we have no plans to do so.

Also, there’s a fair usage policy too:

There’s no limit to the number of calls per day you can make to the Google Chart API. However, we reserve the right to block any use that we regard as abusive.

As a result I would not be confident about using it in production. If thousands of Power BI users started using the API it might prompt Google to block traffic from Power BI or turn off the API completely!

I do think Microsoft really needs to address the lack of native support for sparklines and small multiples in Power BI – the fact we still don’t have them is frankly embarrassing (even Power View had them, for crying out loud). Although one solution could be a DAX function that returned a chart image – basically taking David’s work and wrapping it up in a more user-friendly form – another approach would be for Microsoft to build on the Report Page Tooltips functionality. Since we can already design mini reports and make them appear as tooltips, why not make it possible for the same mini reports to appear in a cell in a table or matrix? You could then have measures that returned the name of a report page as a text value, and a new data category for them that tells Power BI to display the contents of the report page in the cell instead of the text. That way you could create small multiples from any visual or set of visuals.

UPDATE: thanks also to Tom Martens for reminding me to point out that this technique involves sending data to a third party, something which might cause legal problems for you.

%d bloggers like this: