Power BI Dataflow Performance, Premium Per User And The Enhanced Compute Engine

Over the years I have written a lot about Power BI/Power Query performance but it has always been in the context of loading data direct into datasets, not dataflows. A lot of cool things have been happening in dataflows recently, though, and now that Premium Per User has made Premium features to a much wider audience I thought it would be interesting to look at an example of how PPU can help dataflow performance and specifically how and when the Enhanced Compute Engine can make dataflow refresh faster.

Using the same CSV file that I used in my posts from last year on optimising the performance of merges in Power Query, a file with one million rows and seven numeric columns named A, B, C, D, E, F and G, I created the following dataflow in a shared capacity (ie non-Premium) workspace:

The queries called First and Second are identical and just load all the data from the (same) source CSV file; they also have their load disabled. The query called Merge does an inner join between these two queries on the column called A:

The Merge query has its load enabled so it’s the only output of the dataflow; after it has joined the data it expands the nested columns returned and sets the data types on all the output columns.

Refreshing this dataflow in shared capacity took on average 150 seconds.

I then moved the workspace to Premium Per User capacity and without making any changes, I refreshed again.

Refreshing the same dataflow in PPU took on average 73 seconds.

So the first finding is that moving the dataflow to PPU more than halved the refresh time, which is a pretty good result.

However, at this point the Enhanced Compute Engine is not being used – so, to enable it, I enabled loading for the First and Second queries which in turn made the Merge query a Computed Table (what used to be a Computed Entity before the recent terminology changes, indicated by the lightning bolt icon):

For a full explanation of when the Enhanced Compute Engine can and can’t be used see Matthew Roche’s blog post here; basically it loads data into a SQL-based cache which Computed Tables can then leverage which means that data access is faster and the Power Query engine can push transformations back to it via query folding. The only other change I made was to set data types on the columns in the output of First and Second.

Refreshing this new version of the dataflow in PPU took on average 90 seconds

So performance was worse – but why? Enabling loading on First and Second means that more work is done at refresh time because their output needs to be ingested twice (once into ADLSgen2 and once into the SQL cache used by the Enhanced Compute Engine) before the Enhanced Compute Engine can access it. In this case the extra work needed to load First and Second outweighs the performance gains from using the Enhanced Compute Engine. The new metrics available from the dataflow’s Refresh History provide some insight into this (I strongly recommend you read the docs on these metrics here); here’s some of the data from one of the refresh history CSV files loaded into Excel:

In this particular case the overall refresh time of the dataflow was 88 seconds. First and Second refreshed in parallel – First taking 48 seconds and Second taking 51 seconds – and once they had both finished, Merge could refresh and only took 36 seconds to join the output of First and Second. So in this case Merge is indeed faster (36 seconds compared to 73 seconds before) as a result of using the Enhanced Compute Engine but that improvement isn’t enough to cancel out the additional time needed to load the data returned by First and Second into it.

What about a scenario where the Enhanced Compute Engine does make a positive difference? Take a look at the following dataflow, a slight variation on the dataflow above:

There are now three new tables: Ouput Table 1, Output Table 2 and Output Table 3. Each of these tables gets the maximum value from a different column in the table returned by Merge. Note that there are no Computed Tables in this dataflow so the Enhanced Compute Engine is not used, and that First, Second and Merge have load disabled.

Refreshing this dataflow on PPU took on average 95 seconds

Here are the refresh metrics for one of the refreshes:

As you can see, the three tables were refreshed in parallel and took between 84 and 93 seconds. It’s important to remember that for each of these tables the source data was loaded and the Merge query evaluated independently, which explains why they each take so long. The fact that Merge is evaluated three times when this dataflow refreshes is counter-intuitive but really important here – for more details see this post on how queries are evaluated in Power Query.

Now consider this version of the same dataflow where First, Second and Merge have their load enabled, making Merge, Output Table 1, Output Table 2 and Output Table 3 all Computed Tables.

Refreshing this dataflow on PPU took on average 88 seconds

Not a massive improvement, but an improvement. Now look at how different the refresh metrics are:

In this case Output Table 1, Output Table 2 and Output Table 3 only take 1 second to evaluate, but that’s because they are working from data cached in the Enhanced Compute Engine – the table returned by Merge – and the transformations in them fold. The Merge table also uses data cached in the Enhanced Compute Engine: the tables returned by First and Second. What’s more, because Merge is a Computed Table it is only evaluated once in this dataflow. Loading the data for First and Second takes 52 seconds and 50 seconds respectively and Merge takes 35 seconds. In this case the hit of loading the data into the Enhanced Compute Engine is worth taking.

In conclusion, there are two things that these tests have shown:

  • Moving your dataflow to PPU can make a big difference to refresh performance.
  • The Enhanced Compute Engine can make dataflow refresh faster but not in all cases: you need to understand how it works, and in some cases the overhead of loading the data into it outweighs the performance advantages it gives you for any transformations later on. Use the information in Refresh History to work out what’s happening for your dataflow.

5 responses

  1. Thanks for the article Chris and suggested background reading. I saw a benefit from PPU trial, but I thought the benefit was largely down the delay between triggering and actually starting. The shared queries seemed to be delayed from start, especially when using a scheduled refresh or power automate refresh.

  2. Thanks for doing this structured performance tests. But as of now we only see the difference from a none none loading perspective of the merge table. Shouldn’t you also run the last test not using PPU?

    • Sorry, I meant “not using Enhanced Compute Engine”. I.E. by splitting the last test into two dataflows with the load/merge into one DF and the three output tables into one DF, and this while not using PPU. This way the load/merge will be cached by you will not benefit from the enhanced compute engine

  3. Pingback: Improving Dataflow Performance in Power BI – Curated SQL

  4. Interesting post, thanks. I’m wondering whether if I upgrade to PPU I can use Premium dataflows in reports that are published in Pro workspaces?
    Also, did you repeat your Compute Engine tests more than once? I’ve just read that the first time is always slower but subsequent refreshes are much faster. “While using the Compute engine, there is an approximate 10% to 20% initial increase in time for data ingestion.This only applied to the first dataflow that is on the compute engine, and reads data from the data source. Subsequent dataflows that use the source dataflow will not incur the same penalty”
    (https://docs.microsoft.com/en-us/power-bi/transform-model/dataflows/dataflows-features-limitations)

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: