Speed Up Power Query In Power BI Desktop By Increasing Or Decreasing The Number Of Evaluation Containers

Last week I showed how the new MaxEvaluationWorkingSetInMB registry setting could increase the performance of memory-hungry Power Query queries in Power BI Desktop. In this post I’ll show how the other new registry setting, ForegroundEvaluationContainerCount, can also help performance. Before I carry on I recommend you read the documentation on these new registry settings if you haven’t done so already.

To illustrate the effect of this setting I created ten identical Power Query queries feeding an Import mode dataset in a new .pbix file, each of which reads data from the same 150MB CSV file, applies a filter and then counts the number of rows returned. These queries don’t require a large amount of memory but do take a couple of seconds to execute:
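Each query looked something like the following sketch – the file path, filter column and filter condition here are hypothetical placeholders for the ones I actually used:

let
  // Read the same 150MB CSV file in each of the ten queries
  Source = Csv.Document(
    File.Contents("C:\bigfile.csv"),
    [Delimiter = ",", Encoding = 1252, QuoteStyle = QuoteStyle.None]
  ),
  // Apply a simple filter on one column
  #"Filtered Rows" = Table.SelectRows(
    Source,
    each [Column1] <> ""
  ),
  // Count the number of rows returned by the filter
  #"Counted Rows" = Table.RowCount(#"Filtered Rows")
in
  #"Counted Rows"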

With ForegroundEvaluationContainerCount not set, refreshing the entire dataset (with background queries disabled) initially showed ten active evaluation containers:

I’m pretty sure these containers were used to determine the schemas of the tables returned (see here for more background); these were then joined by ten more containers which I assume were actually used by the refresh:

With these default settings refresh took 18 seconds according to Profiler.

With ForegroundEvaluationContainerCount set to 3:

This time there were never more than three evaluation containers active at any one time:

…and refresh took 24 seconds.

So we’ve proved that by setting ForegroundEvaluationContainerCount to a low value we can limit the amount of parallelism and, in this case, make performance worse. So why would you ever want to limit the amount of parallelism like this? The maximum amount of memory available to an evaluation container isn’t just controlled by the MaxEvaluationWorkingSetInMB registry setting; as the docs say, the effective maximum is also determined by the number of evaluation containers used. So reducing the amount of parallelism can increase the amount of memory available to each evaluation container and possibly increase performance.

I then created twenty new copies of the Power Query query, bringing the total number of queries in the .pbix file to thirty, and removed the ForegroundEvaluationContainerCount registry key to go back to using the default settings. During refresh I saw that no more than twenty evaluation containers were active – as expected, because the docs state that with the default settings no more than twenty containers will be used. I’ll spare you the screenshot. Refresh took 62 seconds.

Then I set ForegroundEvaluationContainerCount to 30 and refreshed. This time I could see thirty evaluation containers being used during refresh, and refresh took 55 seconds – not a massive improvement, but an improvement that I’m pretty sure can be attributed to the increased parallelism (I suspect that there was some other bottleneck here, possibly IO).

In conclusion, the ForegroundEvaluationContainerCount registry setting is another useful tool for improving refresh performance for Import mode datasets in Power BI Desktop (it’s also useful for DirectQuery, but that’s something for a future post). Finding the optimal value to set it to is not straightforward though and is likely to involve a lot of experimentation. As always, please let me know how you get on using it.

What The New Visio Web App And Licensing Announcement Means For Power BI

There was an interesting announcement today regarding Visio:

https://www.microsoft.com/en-us/microsoft-365/blog/2021/06/09/bringing-visio-to-microsoft-365-diagramming-for-everyone/

In summary there will soon be a lightweight, web-based version of Visio available to anyone with a Microsoft 365 Business, Office 365 E1/E3/E5, F3, A1, A3 or A5 subscription. Previously Visio was not part of the main M365 plans and was only available as a separate purchase.

So what? As a Power BI user, why should I care? Well, the Visio custom visual for Power BI has been around a long time now and it’s really powerful. Unfortunately it’s very rarely used because Power BI developers don’t usually have Visio licences – but this is exactly what is about to change. With these licensing changes pretty much everyone who uses Power BI will have access to the new lightweight Visio web app. It’s not as sophisticated as desktop Visio but, to be honest, I’m no Visio expert and it’s good enough for me – and it’s really easy to use. As a result this is going to unlock the power of the Power BI Visio visual for a much, much larger number of people!

To get an idea of what you can do with the Power BI Visio visual, this video is a good place to start:

Speed Up Power Query In Power BI Desktop By Allocating More Memory To Evaluation Containers

A really useful new Power Query performance enhancement was added to Power BI Desktop in an update to the May release via the Microsoft Store a week or so ago (if you’re not installing Power BI Desktop through the Microsoft Store you’ll have to wait for the June release I’m afraid). You can read the documentation here:

https://docs.microsoft.com/en-us/power-bi/create-reports/desktop-evaluation-configuration

However if you have just read the docs you may be wondering what these two new registry key settings actually do. In this post I’m only going to talk about one, MaxEvaluationWorkingSetInMB; I’ll leave ForegroundEvaluationContainerCount for a future post.

At various times in the past I have blogged about how, when you run a Power Query query, the query itself is executed inside a separate process called an evaluation (or mashup) container and how this process has a limit on the amount of memory it can use. Some transformations such as sorting a table, doing a group by, pivoting and unpivoting require an entire table of data to be held in memory and if these operations require more memory than the evaluation container is able to use then it starts paging and query performance gets a lot worse. This post provides more details:

https://blog.crossjoin.co.uk/2020/05/21/monitoring-power-query-memory-usage-with-query-diagnostics-in-power-bi/

Two things have now changed though. First of all, the default amount of memory available to an evaluation container in Power BI Desktop has been increased from 256MB to 432MB. This on its own will make many Power Query queries run a lot faster. Secondly, it is now possible to define how much memory an evaluation container can use yourself, via the new MaxEvaluationWorkingSetInMB registry setting described in the documentation.

Here’s an example that shows how much of an impact this can have. In Power BI Desktop I created a Power Query query that reads data from a CSV file with around one million rows in it and then sorts the resulting table by the values in one column:

let
  Source = Csv.Document(
    File.Contents("C:\demo.csv"), 
    [
      Delimiter  = ",", 
      Columns    = 16, 
      Encoding   = 1252, 
      QuoteStyle = QuoteStyle.None
    ]
  ), 
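  // Sorting requires the entire table to be held in memory inside the evaluation container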
  #"Sorted Rows" = Table.Sort(
    Source, 
    {{"Column2", Order.Ascending}}
  )
in
  #"Sorted Rows"

Using SQL Server Profiler in the way described here, I found that the Power Query query took almost 87 seconds to start returning data and a further 19 seconds to return all the data:

What’s more, in Task Manager I could see that the evaluation container doing the work was limited to using around 423MB of RAM:

I then used Regedit to set MaxEvaluationWorkingSetInMB to 4096, giving each evaluation container a maximum of 4GB of RAM to use:

After restarting Desktop I reran the same query. This time Task Manager showed the evaluation container doing the work using around 1.2GB of RAM:

…and Profiler showed that the query started returning data after only 14 seconds and returned all the data in a further 12 seconds:

As you can see, that’s a massive performance improvement. Before you get too excited about this, though, a few things need to be made clear.

First, this setting only affects the performance of Power Query queries in Power BI Desktop. It does not affect the performance of queries in the Power BI Service, although there is another setting that (I think) will have the same effect for queries that go through an on-premises data gateway – but that’s yet another topic for a future post. So while this will make development much quicker and easier it won’t make dataset refreshes in the Power BI Service any quicker.

Second, you need to be very careful when changing this setting. There’s no safety net here – you can set MaxEvaluationWorkingSetInMB to whatever value you want – and so some care is needed. When a dataset is refreshed, multiple evaluation containers may be used to handle the Power Query transformations, each of which can use the amount of memory specified by MaxEvaluationWorkingSetInMB. Since there’s a finite amount of memory on your development PC it’s important that you don’t set MaxEvaluationWorkingSetInMB too high, because if you do there’s a risk that Power BI will try to use more memory than you have available and bring your PC to a grinding halt. What’s more, there’s no way of knowing how much memory any given query will need without some experimentation, so my advice is that if you do change MaxEvaluationWorkingSetInMB you should only increase it by a small amount, and only increase it further if you are sure you need to.

I’d love to hear how much changing this setting improves the performance of your queries. If it does prove to be useful to a large number of people I hope we can get it added to the Options dialog in Power BI Desktop (which is much more convenient than changing a registry key); I also think it would be very useful in Excel Power Query. Please leave a comment with your findings!

A Look At Lobe – A Free, Easy-To-Use Tool For Training Machine Learning Models

A few months ago I heard about a new tool from Microsoft called Lobe which makes it easy to train machine learning models. It’s nothing to do with Power BI but I find anything to do with self-service data analytics interesting, and when I finally got round to playing with it today I thought it was so much fun that it deserved a blog post.

You can download it and learn more at https://www.lobe.ai/ and there’s a great ten minute video describing how to use it here:

The most impressive thing about it is not what it does but how it does it: a lot of tools claim to make machine learning easy for non-technical users but Lobe really is easy to use. My AI/ML knowledge is very basic but I got up and running with it extremely quickly.

To test it out I downloaded lots of pictures of English churches and trained a model to detect whether the church had a tower or a spire. After I labelled the pictures appropriately:

…Lobe was able to train the model:

I could test it inside the tool. The model was able to tell whether a church had a tower:

…or a spire:

…very reliably!

If I have one criticism it’s that when you want to use your model things get a lot more technical, at least compared to something like AI Builder for Power Apps and Power Automate, but I guess that’s because it is just a tool for training models. There have been some recent improvements here though (see this blog post) and Lobe does provide a local API for testing purposes that can be consumed in Power BI with some custom M code.

Here’s an example of how to call the local API in Power Query:

let
  Source = Folder.Files("C:\Churches"),
  #"Removed Other Columns"
    = Table.SelectColumns(
    Source,
    {"Content", "Name"}
  ),
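  // For each image file, call the Lobe Connect API, sending the image as a base64-encoded JSON payload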
  #"Added Custom" = Table.AddColumn(
    #"Removed Other Columns",
    "CallAPI",
    each Text.FromBinary(
      Web.Contents(

        //Insert Lobe Connect URL here                              
        "http://localhost...",
        [
          Content = Json.FromValue(
            [
              image = Binary.ToText(
                [Content],
                BinaryEncoding.Base64
              )
            ]
          ),
          Headers = [
            #"Content-Type"
              = "application/json"
          ]
        ]
      )
    )
  ),
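  // Parse each JSON response and expand the list of predictions it contains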
  #"Parsed JSON"
    = Table.TransformColumns(
    #"Added Custom",
    {{"CallAPI", Json.Document}}
  ),
  #"Expanded CallAPI"
    = Table.ExpandRecordColumn(
    #"Parsed JSON",
    "CallAPI",
    {"predictions"},
    {"predictions"}
  ),
  #"Expanded predictions"
    = Table.ExpandListColumn(
    #"Expanded CallAPI",
    "predictions"
  ),
  #"Expanded predictions1"
    = Table.ExpandRecordColumn(
    #"Expanded predictions",
    "predictions",
    {"label", "confidence"},
    {"label", "confidence"}
  ),
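  // Pivot the label/confidence pairs so each label (Tower, Spire) becomes its own column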
  #"Pivoted Column" = Table.Pivot(
    #"Expanded predictions1",
    List.Distinct(
      #"Expanded predictions1"[label]
    ),
    "label",
    "confidence",
    List.Sum
  ),
  #"Changed Type"
    = Table.TransformColumnTypes(
    #"Pivoted Column",
    {
      {"Tower", type number},
      {"Spire", type number}
    }
  ),
  #"Removed Columns"
    = Table.RemoveColumns(
    #"Changed Type",
    {"Content"}
  )
in
  #"Removed Columns"

You can export models to a variety of other places for production use, including Azure Functions and Azure Machine Learning.

Definitely something to keep an eye on, especially because it will soon be able to do object detection and data classification as well as image classification.

Ten Reasons Why I’m Excited About The New Power BI/Excel Integration Features

My favourite Power BI announcement at the Microsoft Business Applications Summit was, without a doubt, that Excel PivotTables connected to Power BI datasets will very soon work in the browser and not just on the desktop. This is something I have wanted for a long time, way before I joined Microsoft, so this is a feature I have a personal interest in. However I also think it’s an incredibly important step forward for Power BI in general and in this post I’ll outline the reasons why.

Before we carry on please make sure you read this post on the Excel blog which has more details on all of the new Power BI/Excel integration features that are being released. Quick summary: if you’re reading this in late May 2021 you probably won’t have all of this functionality available in your tenant yet but it is coming very soon.

So why exactly am I excited?

It makes Excel a third option for building Power BI reports

Up to now, if you wanted to build Power BI reports and share them with other people online you had two choices: regular Power BI reports and paginated reports. Now Excel gives you a third option: you can upload Power BI-connected Excel workbooks to a Power BI workspace, make them available via a Power BI app, and not only will they be fully interactive but the data in them will also update automatically when the data in your dataset updates.

PivotTables are the best way to explore Power BI data

Why do we need Excel as an option for building reports on data stored in Power BI? The first reason is data exploration. Excel PivotTables are a much better way to explore your data than any other method in Power BI, in my opinion. Why try to recreate an Excel PivotTable using a matrix visual in a regular Power BI report when you can give your users the real thing?

Cube functions also now work in the browser – and they make it easy to design financial reports

The Excel cube functions (CubeMember, CubeValue etc) are, I think, the best-kept secret in Excel. While PivotTables are great for exploring data they aren’t always so great when you want to build highly-formatted reports. The Excel cube functions make it easy to bind individual cells in a worksheet to individual values in your dataset, and because they’re just like any other Excel function they allow you to use all of Excel’s formatting and charting functionality. This makes it possible to build certain types of report, such as financial reports, much more easily. If you want to learn more about them check out this video from Peter Myers – it shows how to use them with Analysis Services but they work just the same when connected to a Power BI dataset.

Organisational data types make it easy to access Power BI data

While the Excel cube functions are very powerful they are also somewhat difficult to use and can sometimes suffer from performance problems. The new organisational data types in Excel do something very similar and, while they don’t yet have all the features you need to build complex reports, they are a lot easier for most business users to understand.

Excel formulas are easier than DAX for many calculations

Everyone knows DAX can be hard sometimes. However, once you’ve got the data you need from your dataset into Excel using a PivotTable, cube functions or organisational data types, you can then do your own calculations on that data using regular Excel formulas. This allows business users to add their own calculations easily, and for BI professionals an Excel formula may well be easier to write and faster to execute than the equivalent DAX.

Excel can visualise data in ways that Power BI can’t

Excel is a very mature data visualisation tool and it has some types of chart and some formatting options that aren’t (yet) available in Power BI’s native visuals. One example that springs to mind is that you can add error bars to a bar chart in Excel; another is sparklines, although they are coming to Power BI later this year.

Power Pivot reports will also work in the browser

Even if you don’t have a Power BI Pro licence, if you have a commercial version of Excel you’ll have Power Pivot and the Excel Data Model. And guess what, Power Pivot reports also now work in the browser!

Collaborate in real-time with your colleagues in Excel Online

With Excel reports connected to Power BI stored in OneDrive for Business or a SharePoint document library you get great features for collaboration and co-authoring, so you and your colleagues can analyse data together even if you’re not in the same room.

There’s a lot of other cool stuff happening in Excel right now

The Excel team are on a hot streak at the moment: dynamic arrays, LAMBDAs, LET, the beginnings of Power Query on the Mac and lots more cool new stuff have been delivered recently. If you’re only familiar with the Excel features you learned on a course 20 years ago you’re missing out on some really powerful functionality for data analysis.

Everyone knows Excel!

Last of all, it goes without saying that Excel is by far the most popular tool for working with data in the world. Everyone has it, everyone knows it and everyone wants to use it. As Power BI people we all know how difficult it is to persuade our users to abandon their old Excel habits, so why not meet them halfway? Storing data in a Power BI dataset solves many of the problems of using Excel as a reporting tool: no more manual exports, out-of-date data or multiple versions of the truth. Using Excel to build reports on top of a Power BI dataset may be much easier to learn and accept for many business users – at least at first – than learning how to build reports in Power BI Desktop.

Video: Advanced Analytics Features In Power BI

Following on from my last post, another SQLBits session of mine I wanted to highlight was “Advanced Analytics Features In Power BI”. The subject is a bit outside my normal area of expertise but it’s also one that I don’t think gets enough attention: it’s all the features available in Power BI reports that can help you explain why something happened rather than just what happened. Things I talk about include:

  • Adding forecasts to line charts
  • Symmetry shading, ratio lines and clustering on scatter charts
  • The “Explain the increase” and “Find where this distribution is different” features
  • The Key Influencers and Decomposition Tree visuals
  • Custom visuals such as SandDance
  • Natural language querying with Q&A

Video: Performance Tuning Power BI Dataset Refresh

The team at SQLBits have been publishing all the session recordings from their last (online) conference on their YouTube channel. There’s a lot of great content there to check out and this post is to highlight one of my sessions, “Performance tuning Power BI dataset refresh”.

In this session I look at all of the factors that can influence how long it takes to import data into Power BI and what you can do to make it faster. Topics covered include:

  • Choosing a dataset storage mode
  • The importance of good data modelling
  • How the type of data source you use affects how quickly data can load
  • Ways to measure refresh performance, such as using SQL Server Profiler and Power Query Query Diagnostics
  • Power Query options that can influence refresh times such as disabling data previews
  • Query folding in the Power Query engine
  • Vertipaq engine features that affect refresh, such as calculated columns and calculated tables
  • How dataflows can help refresh performance

Power Query And Power BI Connectivity Announcements At The Microsoft Business Applications Summit

There were a lot of exciting announcements at the Microsoft Business Applications Summit this week but if you only watched the keynotes or read the recap on the Power BI blog you will have missed all the Power Query-related news in the “Data Prep in Power BI, Power Platform and Excel using Power Query” session:

https://mymbas.microsoft.com/sessions/1332f59f-a051-4a06-ae50-8f3185501a88

It covers all the new things that have happened in Power Query over the last few months such as Diagram View and, more importantly, talks about what’s going to happen in the next few months. It’s relatively short but for those of you with no time or patience, here’s a summary of the roadmap announcements:

[BTW “Power Query Online” is the browser-based version of Power Query that is used in Power BI dataflows]

My highlights are:

  • The ability to create a dataflow quickly by uploading a file to Power Query Online without needing to use a gateway to connect to a file on-premises, useful for one-time import scenarios.
  • Multi-value M parameter support – useful for dynamic M parameters and other things I can’t talk about yet 😉
  • The things that Miguel talks about regarding “easier design experiences” for Synapse are kept intentionally vague but it’s worth listening carefully to what he says here!
  • Native SQL support for Snowflake, BigQuery and Redshift – this is really useful for anyone who wants to use DirectQuery with these databases because it will allow you to write your own SQL query and use it as the source of a table, rather than having to use a table or a view.
  • AAD based Single Sign-On support for Redshift and BigQuery (similar to what we have today for Snowflake) will also be very important for DirectQuery, because it means that the identity of the user running the report can be passed back to the database.
  • A dataflows connector for Excel Power Query – which means, at last, you’ll be able to get data from a dataflow direct into Excel. This will make a lot of Excel users very happy, I think: a lot of the time all users want is a table of data dumped to Excel, and dataflows will be a great way to provide them with that.

Last of all, the session showcases the new home for all things Power Query – http://www.powerquery.com/ – which has great resources, newly-updated documentation and a blog. Make sure you check it out!

Power BI, Excel Organisation Data Types And Images

Excel Organisation data types were released last year (see here for details), but did you know that you can now use them to bring images as well as text and numbers into Excel? Here’s a super-simple example that shows you how to do this.

Here’s a table called ‘Fruit With Image’ in a dataset that I have published to the Power BI Service:

Notice that the Data Category property on the Image column, which contains the URL of a picture of each type of fruit listed, is set to “Image URL” (for more details on what this does see here). If I use this table in a Power BI report, I see the name of each fruit and a picture:

So far no surprises. I can also set this table up as a Featured Table (for more details see here) so it can be used as the source for an Organisation Data Type in Excel:

The cool thing is that when I type these fruit names into Excel and mark them as the “Fruit With Image” data type (see here for more details), I can then access the Image field and it will show the image that the URL points to inside a cell:

Measuring DirectQuery Performance In Power BI

If you have a slow DirectQuery report in Power BI one of the first questions you need to ask is how long the SQL queries that Power BI generates take to run. This is a more complicated question to answer than you might think, though, and in this post I’ll explain why.

I happen to have access to some of the famous New York taxi data in a Snowflake database, and in there is a table with trip data containing 173 million rows that I have built a Power BI dataset from. The data and the database used are not really important here though – what is important is that it’s DirectQuery and a large-ish amount of data. Here’s a report page with a single table visual on it, showing passenger count aggregated by the hack license field:

It’s slow, but how slow? Here’s what Performance Analyzer shows when I refresh the table:

The DAX query takes 5.4 seconds but the Direct Query time is only 3.3 seconds – and the numbers don’t seem to add up. Here’s what Profiler captures for the same refresh shown in Performance Analyzer:

This shows there’s a gap of 2 seconds between the DirectQuery End event and the Query End event. What if I paste the DAX query into DAX Studio? Here’s what the Server Timings tab shows:

This is a different query execution to the two examples above, both of which show data for the same execution, which explains why the numbers are slightly different here – but again there seems to be an extra second of stuff happening and DAX Studio suggests that it’s in the Formula Engine.

So what is going on? The answer lies in understanding what the DirectQuery End Profiler event actually measures: it’s the amount of time between the Analysis Services engine handing a query over to the Power Query engine and the Analysis Services engine receiving the first row in the resultset back, including the time taken for the Power Query engine to fold the query.

Therefore if it takes a long time to get all the rows in the resultset then that could explain what’s going on here. Unfortunately there’s no way of knowing from Profiler events how long this takes – but there is another way. Going back to Performance Analyzer, if you export the data from it to JSON (by clicking the Export button) and load it into Power Query, you can see more detail about a DirectQuery query execution. Here’s the data from the first execution above:

[There’s a very good paper documenting what’s in the Performance Analyzer JSON file here]

Looking at the record in the metrics column for the Execute Direct Query event you can see the same 3.2 second duration shown above in Profiler. Notice that there are two other metrics here as well: RowsRead, which is the total number of rows returned in the resultset; and DataReadDuration, which is the amount of time taken to read these rows after the first row has been received, plus the time for some other Analysis Services engine operations such as encoding column values, joining with unpushed semijoins, projecting aggregations such as Average, and saving the resultset to the in-memory cache. In this case the SQL query returned 43191 rows and this took 1.95 seconds – which explains the gap between the end of the Execute Direct Query event and the end of the query.
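If you want to explore the exported file yourself, here’s a rough sketch of a Power Query query to load it – the file path is a placeholder, and the field names are assumptions based on what’s shown above, so check the paper linked above for the precise schema:

let
  // Load the JSON file exported by clicking Export in Performance Analyzer
  Source = Json.Document(
    File.Contents("C:\PowerBIPerformanceData.json")
  ),
  // Assumption: the export contains a top-level "events" list
  Events = Table.FromList(
    Source[events],
    Splitter.SplitByNothing(),
    {"Event"}
  ),
  // The metrics field holds records with values such as
  // RowsRead and DataReadDuration
  #"Expanded Event" = Table.ExpandRecordColumn(
    Events,
    "Event",
    {"name", "metrics"}
  )
in
  #"Expanded Event"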

One last question: why is this SQL query returning so many rows when the DAX query is only asking for the top 502 rows?

The reason is that, at the time of writing at least, the Analysis Services engine can only push a top(n) operation down to a DirectQuery SQL query in very simple scenarios where there are no measures and no aggregation involved – and in this case we’re summing up values. As a result, if you’re using DirectQuery mode and have a visual like this that can potentially display a large number of rows and includes a measure or aggregated values, you may end up with slow performance.

[Thanks to Jeffrey Wang for providing the information in this post]
