Speed Up Power Query In Power BI Desktop By Increasing Or Decreasing The Number Of Evaluation Containers

Last week I showed how the new MaxEvaluationWorkingSetInMB registry setting could increase the performance of memory-hungry Power Query queries in Power BI Desktop. In this post I’ll show how the other new registry setting, ForegroundEvaluationContainerCount, can also help performance. Before I carry on I recommend you read the documentation on these new registry settings if you haven’t done so already.

To illustrate the effect of this setting I created ten identical Power Query queries feeding an Import mode dataset in a new .pbix file, each of which read data from the same 150MB CSV file, apply the a filter and then count the number of rows returned. These queries don’t require a large amount of memory but do take a couple of seconds to execute:

With ForegroundEvaluationContainerCount not set, refreshing the entire dataset (with background queries disabled) initially showed ten active evaluation containers:

I’m pretty sure these containers were used to determine the schemas of the tables returned (see here for more background); these were then joined by ten more containers which I assume were actually used by the refresh:

With these default settings refresh took 18 seconds according to Profiler.

With ForegroundEvaluationContainerCount set to 3:

This time there were never more than three evaluation containers active at any one time:

…and refresh took 24 seconds.

So we’ve proved that by setting ForegroundEvaluationContainerCount to a low value we can limit the amount of parallelism and, in this case, make performance worse. So why would you ever want to limit the amount of parallelism like this? The maximum amount of memory available to an evaluation container isn’t just controlled by the MaxEvaluationWorkingSetInMB registry setting; as the docs say, the effective maximum is also determined by the number of evaluation containers used. So reducing the amount of parallelism can increase the amount of memory available to each evaluation container and possibly increase performance.

I then created twenty new copies of the Power Query query, bringing the total number of queries in the pbix file to thirty, and set removed the ForegroundEvaluationContainerCount registry key to go back to using the default settings. During refresh I saw that no more than twenty evaluation containers were active – as expected, because the docs state that with the default settings no more than twenty containers will be used. I’ll spare you the screenshot. Refresh took 62 seconds.

Then I set ForegroundEvaluationContainerCount to 30 and refreshed. This time I could see thirty evaluation containers being used during refresh, and refresh took 55 seconds – not a massive improvement, but an improvement that I’m pretty sure can be attributed to the increased parallelism (I suspect that there was some other bottleneck here, possibly IO).

In conclusion the ForegroundEvaluationContainerCount registry setting is another useful tool to improve refresh performance for Import mode datasets (it’s also useful for DirectQuery but that’s something for a future post) in Power BI Desktop. Finding the optimal value to set it too is not straightforward though and is likely to involve a lot of experimentation. As always, please let me know how you get on using it.

Speed Up Power Query In Power BI Desktop By Allocating More Memory To Evaluation Containers

A really useful new Power Query performance enhancement was added to Power BI Desktop in an update to the May release via the Microsoft Store a week or so ago (if you’re not installing Power BI Desktop through the Microsoft Store you’ll have to wait for the June release I’m afraid). You can read the documentation here:


However if you have just read the docs you may be wondering what these two new registry key settings actually do. In this post I’m only going to talk about one, MaxEvaluationWorkingSetInMB; I’ll leave ForegroundEvaluationContainerCount for a future post.

At various times in the past I have blogged about how, when you run a Power Query query, the query itself is executed inside a separate process called an evaluation (or mashup) container and how this process has a limit on the amount of memory it can use. Some transformations such as sorting a table, doing a group by, pivoting and unpivoting require an entire table of data to be held in memory and if these operations require more memory than the evaluation container is able to use then it starts paging and query performance gets a lot worse. This post provides more details:


Two things have now changed though. First of all, the default of amount of memory available to an evaluation container in Power BI Desktop has been increased from 256MB to 432MB. This on its own will make many Power Query queries run a lot faster. Secondly, it is now possible to define how much memory an evaluation container can use yourself via the new MaxEvaluationWorkingSetInMB registry setting described in the documentation.

Here’s an example that shows how much of an impact this can have. In Power BI Desktop I created a Power Query query that reads data from a csv file with around one million rows in it and then sorts the resulting table by the values in one column:

  Source = Csv.Document(
      Delimiter  = ",", 
      Columns    = 16, 
      Encoding   = 1252, 
      QuoteStyle = QuoteStyle.None
  #"Sorted Rows" = Table.Sort(
    {{"Column2", Order.Ascending}}
  #"Sorted Rows"

Using SQL Server Profiler in the way described here, I found that the Power Query query took almost 87 seconds to start returning data and a further 19 seconds to return all the data:

What’s more, in Task Manager I could see that the evaluation container doing the work was limited to using around 423MB of RAM:

I then used Regedit to set MaxEvaluationWorkingSetInMB to 4096, giving each evaluation container a maximum of 4GB of RAM to use:

After restarting Desktop I reran the same query. This time Task Manager showed the evaluation container doing the work using around 1.2GB of RAM:

…and Profiler showed that the query started returning data after only 14 seconds and returned all the data in a further 12 seconds:

As you can see, that’s a massive performance improvement. Before you get too excited about this, though, a few things need to be made clear.

First, this setting only affects the performance of Power Query queries in Power BI Desktop. It does not affect the performance of queries in the Power BI Service, although there is another setting that (I think) will have the same effect for queries that go through an on-premises data gateway – but that’s yet another for a future post. So while this will make development much quicker and easier it won’t make dataset refreshes in the Power BI Service quicker.

Second, you need to be very careful when changing this setting. There’s no safety net here – you can set MaxEvaluationWorkingSetInMB to whatever value you want – and so some care is needed. When a dataset is refreshed then multiple evaluation containers may be used to handle the Power Query transformations, each of which can use the amount of memory specified by MaxEvaluationWorkingSetInMB. Since there’s a finite amount of memory on your development PC it’s important you don’t set MaxEvaluationWorkingSetInMB too high because if you do there’s a risk that Power BI will try to use more memory than you have available and bring your PC to a grinding halt. What’s more there’s no way of knowing how much memory any given query will need without some experimentation, so my advice is that if you do change MaxEvaluationWorkingSetInMB you should only increase it by a small amount and then increase it only if you are sure you need it.

I’d love to hear how much changing this setting improves the performance of your queries. If it does prove to be useful to a large number of people I hope we can get it added to the Options dialog in Power BI Desktop (which is much more convenient than changing a registry key); I also think it would be very useful in Excel Power Query. Please leave a comment with your findings!

Video: Advanced Analytics Features In Power BI

Following on from my last post, another SQLBits session of mine I wanted to highlight was “Advanced Analytics Features In Power BI”. The subject is a bit outside my normal area of expertise but it’s also one that I don’t think gets enough attention: it’s all the features available in Power BI reports that can help you explain why something happened rather than just what happened. Things I talk about include:

  • Adding forecasts to line charts
  • Symmetry shading, ratio lines and clustering on scatter charts
  • The “Explain the increase” and “Find where this distribution is different” features
  • The Key Influencers and Decomposition Tree visuals
  • Custom visuals such as Sanddance
  • Natural language querying with Q&A

Creating Basic Greater Than Or Less Than Rules With Power BI Conditional Formatting

Although the conditional formatting by rules feature in Power BI was released a long time ago, one very common cause of confusion is with how to implement basic “greater than” or “less than” rules. For example, say you have a table with the following data in it:


…and you want to highlight the rows where Sales are greater than or equal to 150. So you click on the table, go to the Conditional Formatting options for the Sales field, turn on Background Color formatting and click Advanced Controls:


…then choose to format by rules. You see this dialog:


In particular, the part of this dialog where you set up the rule:


…seems to suggest that you need to enter a “is greater than or equal to” condition and a “is less than” condition for the rule to be valid. This is not true, and you don’t need to enter some arbitrarily large number for the “is less than” condition to make it work. The following screenshot shows how you can set up a rule to highlight all rows where Sales are greater than or equal to 150:


The two things to notice are:

  • I’ve entered 150 in the first condition, as you would expect
  • I have deleted the 0 from the second “is less than” condition, leaving the textbox empty (meaning that the text “Maximum” is visible but greyed out)

Here’s the result:


Job done. What about a slightly more complex but nonetheless common scenario where values greater than 0 are shown as green, values equal to 0 as yellow and values less than 0 as red (with maybe some special handling for blanks too)? Here’s a sample table:


(The Dummy column is only there to make sure the Oranges row, which contains a blank value, is visible in the table)

Here’s a set of rules that does what we need:


And here’s the output:


Implementing Basic Query Folding On A Web Service In Power Query/M And Power BI

The more advanced Power Query/M developers among you will know about query folding, the way that the Power Query engine pushes as much of the heavy-lifting of a query back to a data source.  You may also know that it’s possible to implement query folding yourself inside a custom data connector, and there’s a very detailed (but perhaps a little intimidating) tutorial on how to do it here. I’ve been doing a lot of custom data extension development recently and have been learning how to implement query folding myself so I thought another, simple, demonstration of how it works with web services might be useful for those of you in the same situation.

For this example I’m going to use the web service I used in my blog post on Web.Contents() and the Query and RelativePath options, a blog post you should reread before you carry on here because it provides some useful background. It’s the metadata API for the UK government’s open data website and allows you to search for open data datasets. It doesn’t require any authentication so you’ll be able to run all the queries in this post yourself. Note that all the example M code in this post works in Power BI Desktop and does not need to be used in a custom data connector.

Consider the following M function, which I’m going to call SearchData:

(query as text, optional rowlimit as number) as table =>
    QueryRecord = 
    AddRowLimit = 
    CallWebService = 
                Query= AddRowLimit
    Source = 
    ToTable = 
    Expand = 
    ChangeType = 
            {{"title", type text}}


The function takes two parameters:

  • query, the search term to use
  • rowlimit, an optional parameter that limits the number of rows returned by the search

It then calls the API, gets the search results in JSON format and converts it to a table with just one column containing the title of the datasets returned.

Calling the function with the search term “data” and now row limit, like so:


…returns a table of ten rows, the default maximum number of rows returned:


[Note to self: I really need to check out the “Treasure data” dataset]

Using Fiddler to inspect the calls make from the Power Query engine back to the web service (I describe how to do this here) when the query is loaded into a Power BI dataset reveals the following:


Everything is pretty much as you would expect: every time this query is used, no matter how it is used, the same request is used to get data.

Query folding is implemented using the Table.View() M function, and here’s an example of how it can be used with the table above (let’s say this new query is called SearchForDataWithFolding):

        GetType = () => 
                type table [title = Text.Type],
        GetRows = () => 
        OnTake = (count as number) => 
                SearchData("Data", count)

In the second parameter of Table.View() in this example there are three records in the handler field:

  • GetType, which is called when the Power Query engine needs to know about the data types of the columns of the table returned by this expression. In this case it’s a table with one text column.
  • GetRows, which is called when the Power Query engine wants all the rows from the table (for example when it’s loading data into the dataset)
  • OnTake, which is called when the Power Query engine only wants the top n rows from the table; in this case it provides the top n through the count parameter, and I’m passing that back to my SearchData function via the rowlimit parameter.

Other handlers can be implemented too, but for this web service it only really makes sense to implement OnTake because that’s the only operation that can be folded back.

Here’s what it returns in the Query Editor:


The first thing to point out is that, in the Query Editor, it returns more than ten rows – it returns one thousand rows. Fiddler confirms this:


As the official documentation states at the bottom of this page:

The Power Query experience will always perform an OnTake of 1000 rows when displaying previews in the navigator and query editor, so your users might see significant performance improvements when working with larger data sets.

Similarly, using the Table.FirstN() function on the rows of this table, as follows:


Shows a row limit of three passed back to the web service:


There’s something else interesting to note when the query is loaded into the dataset. Fiddler now shows two calls to the web service:


Two calls to the web service are being made: the first asks for zero rows, the second asks for all the data with no row limit. In this case the following change to SearchForDataWithFolding stops the first call happening and results in only one call to the web service:

        GetType = () => 
                type table [title = Text.Type],
        GetRows = () => 
        OnTake = (count as number) => 
                if count=0 
                #table(type table [title = Text.Type], {}) 
                SearchData("Data", count)

But why is the Power Query engine making this call? Why didn’t it make it on the other query? Did it make the same call twice in the other query but did it cache the result of the first call and then reuse it? Is it trying to find out what columns this query returns? Hmm, a subject for future research I think.

You can download the sample pbix file for this post here.

Nested Display Folders In Power BI

The ability to add columns and measures to display folders in the new Modelling view is one of my favourite features of the November 2018 release of Power BI Desktop: it makes complex models a lot more user-friendly. Being an old-school SSAS developer, I immediately wondered if they worked in the same way that they do in SSAS – and the answer is yes and no.

The good news is that display folders can be nested. Say you have a table called MyTable with three columns called A, B and C. This is what it looks like in the Fields pane in Power BI Desktop by default:


If you select column A in the Fields pane and enter a name (in this case “My Folder”) in the Display folder property in the new modelling view, you’ll see column A is now displayed in a folder in the Fields pane:


That’s obvious. It is also possible to create nested display folders though, and you do this by entering multiple folder names in the Display folder property separated by a backslash \ character. For example, entering “My Folder\My Subfolder” in the Display folder property for column A like so:


…makes column A appear in a folder called My Subfolder which is itself in a folder called My Folder.

To give another example, if column B is given a Display folder property of “My Folder\My Other Subfolder” the result is this:


Note that while the screenshots above are from the new Modelling view, in the Fields pane in other places such as the Report view the folders are displayed in alphabetical order:


You may need to use numbers in your folder names to ensure that everything gets sorted the way you want:


[I’ve just noticed that in the screenshot above column C looks like it’s in the same folder as column B, which it isn’t – that needs fixing and I’ll report it as a bug]

UPDATE: it’s now possible to make a measure appear in more than one folder too: https://blog.crossjoin.co.uk/2020/03/16/making-one-power-bi-measure-appear-in-multiple-folders/

Creating Map Small Multiples In Power BI With The Azure Maps API

Since my post last week on using the Google Image Charts API to create sparklines and small multiples in Power BI has proved very popular, I thought I would do a follow-up showing how to use the Azure Maps API to create map small multiples. Here’s an example of what’s possible, a table from a sample report I built that displays crimes committed in London (sourced from here) in June 2018 with one row for each crime and a map column displaying the location of the crime:


You can find out how to sign up for an Azure Maps account here; it isn’t free to use but you do get 250,000 free map renders per month (which should be more than enough for Power BI use) and any use over that is extremely cheap. Full details on pricing can be found here.

Here’s what the source data in my dataset looks like:


The only important column is the Center column, which contains the longitude of the crime location followed by a comma followed by the latitude of the crime location in a single text value.

With the data in this format you can call the Get Map Image API relatively easily in DAX using a measure something like this:

Map = 
var BaseURL = 
var SubscriptionKey = 
    "insert your key here"
var ApiVersion = 
var Layer = 
var Center = 
    "&center=" & SELECTEDVALUE('London Crime'[Center])
var ZoomLevel = 
var HeightWidth = 
        HASONEVALUE('London Crime'[Center]),
        BaseURL & SubscriptionKey & ApiVersion &
        Layer & Center & ZoomLevel & HeightWidth

You’ll need to paste your Azure Maps API key in on the line highlighted above and set the Data Category for the measure to Image URL. The maximum possible height of an image in a table or matrix in Power BI is, as far as I can see, 150 pixels so that’s why the code above requests an image that is 150×150. You may want to experiment with different zoom levels and layer types to see what looks best on your report.

You can view the sample report here and download a copy of the report (without my API key in) here.

Creating Sparklines And Small Multiples In Power BI Using The Google Image Charts API

The excellent work that David Eldersveld has been doing recently on using SVG images in Power BI has generated a lot of interest; now that the August 2018 release of Power BI Desktop allows you to set data categories on measures – and specifically the Image URL data category – David has been able to apply his work to the creation of sparklines and small multiples in Power BI. He’s already published an example in the Quick Measures gallery here; you might also want to check out this blog post and quick measure by Tom Martens too.

All this got me thinking: what if there was a simple web service that could generate charts, one that didn’t need any authentication, could work with simple GET requests and was free to use? Well guess what, there is one created by Google: the Google Image Charts API. It turns out that it makes it super easy to create sparklines and small multiples in Power BI with very little DAX needed.

Here are two examples of what’s possible with some simple DAX in a measure:



Both examples use data from a table called Sales with the following contents:


The first example, the Sales Sparkline measure, has the following definition:

Sales Sparkline = 
var BaseURL = 
var ChartType = 
var ChartScale = 
var ChartSize = 
var SeriesColour = 
var ChartData = 
    "&chd=t:" & 
    CONCATENATEX(Sales, Sales[Sales], ",", Sales[Month])
        BaseURL & ChartType & ChartScale & 
	ChartSize & SeriesColour & ChartData

All I’m doing is building up the URL to call the API from the parameters listed here. The second measure is basically the same as the first, but the chart type is driven by a selection in a slicer. It’s not bullet-proof code but it shows how easy it is to use the API.

You can see the report live here, and download the sample .pbix file here.

I’m not going to pretend that I’m a data visualisation expert and I know these examples are very basic; however, I think there are a lot of possibilities here for someone with better data visualisation skills than me.

Now for the bad news: the Google Image Charts API has been deprecated since 2012! Here’s the statement on the Google website:

While the dynamic and interactive Google Charts are actively maintained, we officially deprecated the static Google Image Charts way back in 2012. This gives us the right to turn it off without notice, although we have no plans to do so.

Also, there’s a fair usage policy too:

There’s no limit to the number of calls per day you can make to the Google Chart API. However, we reserve the right to block any use that we regard as abusive.

As a result I would not be confident about using it in production. If thousands of Power BI users started using the API it might prompt Google to block traffic from Power BI or turn off the API completely!

I do think Microsoft really needs to address the lack of native support for sparklines and small multiples in Power BI – the fact we still don’t have them is frankly embarrassing (even Power View had them, for crying out loud). Although one solution could be a DAX function that returned a chart image – basically taking David’s work and wrapping it up in a more user-friendly form – another approach would be for Microsoft to build on the Report Page Tooltips functionality. Since we can already design mini reports and make them appear as tooltips, why not make it possible for the same mini reports to appear in a cell in a table or matrix? You could then have measures that returned the name of a report page as a text value, and a new data category for them that tells Power BI to display the contents of the report page in the cell instead of the text. That way you could create small multiples from any visual or set of visuals.

UPDATE: thanks also to Tom Martens for reminding me to point out that this technique involves sending data to a third party, something which might cause legal problems for you.

Colour Names Supported In Power BI Conditional Formatting

When I read about the new conditional formatting by values feature in the August 2018 release of Power BI Desktop, my first thought was to write a blog post on how DAX can  be used to generate the hex values for colours – but then Daniil Masyluk wrote an excellent post on that (and more) yesterday. I then got curious about what colour names are supported when you use a text name rather than a hex code, and Amanda Cofsky of the dev team told me that the official list is the one here:


…although there are other names that might work, assuming they don’t contain punctuation. I loaded that list into Power BI and built the following report:


I’ve certainly learnt a few new colour names from this!

Comparing The Performance Of CSV And Excel Data Sources In Power Query

My posts from two weeks ago (see here and here) on using Process Monitor to troubleshoot the performance of Power Query queries made me wonder about another question: how does the performance of reading data from CSV files compare to the performance of reading data from Excel files? I think most experienced Power Query users in either Power BI or Excel know that Excel data sources perform pretty badly but I had never done any proper tests. I’m not going to pretend that this post is a definitive answer to this question (and once again, I would be interested to hear your experiences) but hopefully it will be thought-provoking.

To start off, I took the 153.6MB CSV file used in my last few posts and built a simple query that applied a filter on one text column, then removed all but three columns. The query ran in 9 seconds and using the technique from my last post I was able to draw the following graph from a Process Monitor log file showing how Power Query reads data from the file:


Nothing very surprising there. Then I opened the same CSV file in Excel and saved the data as an xlsx file with one worksheet and no tables or named ranges; the resulting file was only 80.6MB. Finally I created a duplicate of the first query and pointed it to the Excel file instead. The resulting query ran in 59 seconds – around 6 times slower! Here’s a comparison with the performance of this query with the first query:


The black line in the graph above is the amount of data read (actually the offset values showing where in the file the data is read from, which is the same thing as a running total when Power Query is reading all the data) from the Excel file; the green line is the amount of data read from the CSV file (the same data shown in the first graph above). A few things to mention:

  • Running Process Monitor while this second query was refreshing had a noticeable impact on its performance – in fact it was almost 20 seconds slower
  • The initial values of 80 million bytes seem to be where data is read from the end of the Excel file. Maybe this is Power Query reading some file metadata? Anyway, it seems as though it takes 5 seconds before it starts to read the data needed by the query.
  • There’s a plateau between the 10 and 20 second mark where not much is happening; this didn’t happen consistently and may have been connected to the fact that Process Monitor was running

In any case you don’t need to study this too hard to understand that the performance of reading data from an xlsx format Excel file is terrible compared to the performance of reading data from a CSV. So, if you have a choice between these two formats, for example when downloading data, it seems fair to say that you should always choose CSV over xlsx.

%d bloggers like this: