DirectQuery parallelism in Power BI

Recently we announced an important new optimisation for DirectQuery datasets: the ability to run (some) of the queries generated by a single DAX query in parallel. You can read the blog post here:

https://powerbi.microsoft.com/en-za/blog/query-parallelization-helps-to-boost-power-bi-dataset-performance-in-directquery-mode/

A few of us on the Power BI CAT team have tested this out with customers and seen some great results, so I thought I’d write a post illustrating the effect this optimisation can have and explaining when it can and can’t help.

For the purposes of this blog post I built a DirectQuery dataset (the source and the design of the dataset are irrelevant) using the New York Taxi data. I then created a table with five measures on columns; the precise definitions of the measures don’t matter much either, except for the fact that they each generate a single Storage Engine request and horizontal fusion cannot combine these requests in any way.

Next, I published the dataset to the Power BI Service, connected to it from Tabular Editor, set the compatibility level property to 1569 and saved the change:

Having done this the Max Parallelism Per Query property became visible, and I set it to 1:

1 is not the default value for this property but it does give the same behaviour as the default at the time of writing – it ensures that the Storage Engine requests for a single DAX query are always executed one at a time.

Next I connected DAX Studio up to the XMLA Endpoint for the workspace and ran the DAX query generated by the table visual above with Server Timings enabled. The new timeline view in DAX Studio 3.0.6 (see Marco’s video for more details) does a great job of showing when the Storage Engine queries – in this case the SQL queries sent to the data source – are made:

As you can see, in this case it’s clear that the six Storage Engine queries/SQL queries generated by this one DAX query are all executed in sequence. Also notice that the duration of the DAX query was 3 seconds.

I then changed the Max Parallelism Per Query property to 6 and reran the same query from DAX Studio. Here’s what the timeline looked like:

The Storage Engine queries/SQL queries now all run in parallel and the overall duration of the DAX query is just 0.9 seconds!

This is great, but as always there are a few things that need to be highlighted. First, Power BI may not be able to run all the Storage Engine queries in parallel if there are certain types of dependency in your DAX. In the example above there were no dependencies between the measures – none of them referenced any other measures, and they all used fairly simple expressions – but in the real world that might not be the case. To illustrate this I created a new measure with the following definition:

MEASURE WithDependencies =
    IF (
        CALCULATE ( [Distinct Medallions], ALL ( 'GEOGRAPHY'[STATE] ) ) > 0,
        [Trip Count],
        [Monday Trips]
    )

I then used it in a new table visual and captured the DAX query generated:

Here’s what the DAX Studio Server Timings showed for this query with Max Parallelism Per Query set to 6:

This time there is some parallelism but the first Storage Engine query/SQL query has to complete before the last two can be executed. I wrote this measure specifically to get this behaviour so you may be able to rewrite your DAX to get better parallelism, but it’s something to be aware of.

One last thing to mention: increasing parallelism here may lead to worse query performance overall if you end up running into bottlenecks elsewhere in your architecture (see here for an example). You may need to increase the “maximum number of connections per data source” property on your dataset to allow more queries to run in parallel, and if that results in additional load on the data source then you may need to scale up or scale out there too. Remember also that the DAX queries for a report page are, and always have been run in parallel so the Storage Engine queries for different DAX queries will also be run in parallel; you’ll need to watch out for report pages with a large number of visuals on them.

13 thoughts on “DirectQuery Parallelisation In Power BI – Some Examples”

“Remember also that the DAX queries for a report page are, and always have been run in parallel so the Storage Engine queries for different DAX queries will also be run in parallel; you’ll need to watch out for report pages with a large number of visuals on them.”

I think this part warrants its own blog post. Don’t remember where I heard it (maybe in Amir’s AMA?) but I remember someone saying that visuals are rendered in batches of five. So if you have ten visuals on the page five of them have to wait their turn until the queries for the other five have run. More visuals – more waiting in the “Other” timing bucket. My recommendation to our developers is always “Use fewer visuals” and they all put their fingers in the ears and sing “lalala”.

Would be great if you could cover what _really_ happens.

Chris Webb says:

March 19, 2023 at 9:33 pm

It used to be the case that visuals were rendered in batches (and I think it still does happen if something like Zscaler forces traffic via Http1.1 – see my blog post on that subject) but it changed about two years ago.

Loading...

Reply

Pingback: Parallelization in DirectQuery – Curated SQL

Hi Chris, love the post, as always.

Did Msft remove this option? I can’t find the setting now.

Thanks for the post. I’m finding I have to redo the Tabular Editor updates after a dataset republish–any way around this? Maybe I can at least write a script in the C# tab to do this? Thanks again!

jeffshieldsdev says:

September 29, 2023 at 9:57 pm

Oops, I did find a script to do this here: https://powerbi.microsoft.com/en-us/blog/query-parallelization-helps-to-boost-power-bi-dataset-performance-in-directquery-mode/

Loading...

Reply

Pingback: Why I Love Tabular Editor - Greyskull Analytics

hmmm this just stopped working for me…Compatibility Level: 1569, Max Parallelism Per Query: 10, but I’m seeing a very much waterfall consecutive execution in DAX Studio Server Timings tab, and higher total SE compute.

I tried refreshing my PBIX in Power BI Desktop and republishing to Premium Capacity-backed Workspace–and then again setting Compatibility Level and Max Parallelism Per Query before rerunning the DAX query in DAX Studio.

Pingback: New limits for the "Maximum connections per data source" property in Power BI DirectQuery mode

Does this only work via the XMLA-endpoint (in other words only for premium capacities)?

Pingback: Measuring the total time spent querying sources in Power BI DirectQuery mode

Pingback: Performance implications of different ways of fully expanding a Power BI matrix visual

Pingback: Power BI, parallelism and dependencies between SQL queries in DirectQuery mode

lbendlin says:

March 19, 2023 at 7:14 pm

“Remember also that the DAX queries for a report page are, and always have been run in parallel so the Storage Engine queries for different DAX queries will also be run in parallel; you’ll need to watch out for report pages with a large number of visuals on them.”

I think this part warrants its own blog post. Don’t remember where I heard it (maybe in Amir’s AMA?) but I remember someone saying that visuals are rendered in batches of five. So if you have ten visuals on the page five of them have to wait their turn until the queries for the other five have run. More visuals – more waiting in the “Other” timing bucket. My recommendation to our developers is always “Use fewer visuals” and they all put their fingers in the ears and sing “lalala”.

Would be great if you could cover what _really_ happens.

Loading...

1. Chris Webb says:
  
  March 19, 2023 at 9:33 pm
  
  It used to be the case that visuals were rendered in batches (and I think it still does happen if something like Zscaler forces traffic via Http1.1 – see my blog post on that subject) but it changed about two years ago.
  
  Loading...
  
Pingback: Parallelization in DirectQuery – Curated SQL
Fred says:

April 5, 2023 at 7:37 pm

Hi Chris, love the post, as always.

Did Msft remove this option? I can’t find the setting now.

Loading...

jeffshieldsdev says:

September 20, 2023 at 4:29 pm

Thanks for the post. I’m finding I have to redo the Tabular Editor updates after a dataset republish–any way around this? Maybe I can at least write a script in the C# tab to do this? Thanks again!

Loading...

1. jeffshieldsdev says:
  
  September 29, 2023 at 9:57 pm
  
  Oops, I did find a script to do this here: https://powerbi.microsoft.com/en-us/blog/query-parallelization-helps-to-boost-power-bi-dataset-performance-in-directquery-mode/
  
  Loading...
  
Pingback: Why I Love Tabular Editor - Greyskull Analytics
jeffshieldsdev says:

October 25, 2023 at 8:17 pm

hmmm this just stopped working for me…Compatibility Level: 1569, Max Parallelism Per Query: 10, but I’m seeing a very much waterfall consecutive execution in DAX Studio Server Timings tab, and higher total SE compute.

I tried refreshing my PBIX in Power BI Desktop and republishing to Premium Capacity-backed Workspace–and then again setting Compatibility Level and Max Parallelism Per Query before rerunning the DAX query in DAX Studio.

Loading...

Pingback: New limits for the "Maximum connections per data source" property in Power BI DirectQuery mode
Wouter De Raeve says:

January 22, 2024 at 10:05 pm

Does this only work via the XMLA-endpoint (in other words only for premium capacities)?

Loading...

Pingback: Measuring the total time spent querying sources in Power BI DirectQuery mode
Pingback: Performance implications of different ways of fully expanding a Power BI matrix visual
Pingback: Power BI, parallelism and dependencies between SQL queries in DirectQuery mode

DirectQuery Parallelisation In Power BI – Some Examples

Like this:

Published by Chris Webb

13 thoughts on “DirectQuery Parallelisation In Power BI – Some Examples”

Leave a ReplyCancel reply

Share this:

Like this:

Published by Chris Webb

13 thoughts on “DirectQuery Parallelisation In Power BI – Some Examples”

Leave a ReplyCancel reply

Discover more from Chris Webb's BI Blog