Diagnosing Power BI DirectQuery Performance Problems Caused By SQL Queries That Return Large Resultsets

One very common cause of Power BI performance problems is having a table visual with a large number of rows on your report. It’s a problem I wrote about here, and while I used an Import mode model for my example in that post I also mentioned that it can be an even bigger problem in DirectQuery mode: while the DAX query for the table visual might have a TOPN filter that asks for 502 rows, the query sent to the DirectQuery data source (usually a SQL query) may not have that filter applied and could return a much larger number of rows, which can then take a long time for Power BI to read. I wrote about this in more detail here and showed how you can diagnose the problem in Performance Analyzer by looking at the Execute DirectQuery event and its ActualQueryDuration, RowsRead and DataReadDuration metrics. But now that I have a custom visual to display Performance Analyzer export data, what does this look like? And what can Execution Metrics tell us?

Using a simple DirectQuery model built from the ContosoDW SQL Server sample:

…I built a report with a table visual whose DAX query triggered a single SQL query that returned 475038 rows:

Here’s what a Profiler trace that included the Execution Metrics event showed me:

Some things to notice here:

  • The DAX query takes 3.5 seconds, as seen in the Duration column for the Query End event and the durationMs Execution Metric
  • The DirectQuery End event has a Duration of 2 seconds, leaving a gap of 1.5 seconds that needs to be explained
  • This Duration of 2 seconds for the DirectQuery End event matches the externalQueryExecutionTimeMs Execution Metric, which is 2.054 seconds, although the docs only say that this is the “Total time spent on executing all external datasource queries during the request”, which is a bit vague
  • The actual explanation for the gap comes from the directQueryIterationTimeMs Execution Metric, which is 1.1 seconds, although this is still 0.4 seconds short of the 1.5 second gap mentioned above
  • The directQueryTotalRows Execution Metric shows that 475038 rows were returned by the SQL query
  • Execution Metrics provide an aggregated summary of metrics at the DAX query level; in this case only one SQL query was generated but if (as is often the case) there were more than one, it would be hard to know what each SQL query was contributing to the problem. The sketch after this list shows one way of pulling these values out of the event’s payload and doing the arithmetic above
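The ExecutionMetrics event’s TextData contains these metrics as a JSON document, so once you’ve copied it out of Profiler you can do the arithmetic programmatically. Here’s a minimal Python sketch; it assumes the payload is a flat JSON object keyed by metric name, and the hard-coded values are just the illustrative numbers from this example:

```python
import json

# Minimal sketch: paste the JSON payload from the ExecutionMetrics event's
# TextData into this string. The values below are illustrative, and the flat
# key/value shape is an assumption - check what your own trace contains.
execution_metrics_json = """
{
  "durationMs": 3500,
  "externalQueryExecutionTimeMs": 2054,
  "directQueryIterationTimeMs": 1100,
  "directQueryTotalRows": 475038
}
"""

metrics = json.loads(execution_metrics_json)

total_ms = metrics.get("durationMs", 0)
external_ms = metrics.get("externalQueryExecutionTimeMs", 0)
iteration_ms = metrics.get("directQueryIterationTimeMs", 0)
rows = metrics.get("directQueryTotalRows", 0)

# Whatever is left after executing the external queries and iterating over
# their resultsets is time spent elsewhere during the DAX query.
unexplained_ms = total_ms - external_ms - iteration_ms

print(f"DAX query duration:            {total_ms} ms")
print(f"External (SQL) execution time: {external_ms} ms")
print(f"DirectQuery iteration time:    {iteration_ms} ms")
print(f"Rows returned by SQL queries:  {rows}")
print(f"Unaccounted-for time:          {unexplained_ms} ms")
```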

Here’s what my custom visual showed with data from Performance Analyzer for the same DAX query:

Now here’s the same visual again, this time with the tooltip for the Execute Direct Query event shown, which displays some of the metrics associated with that event:

This shows something very similar to what the Execution Metrics event in Profiler showed:

  • The Execute DAX Query event has a duration of 3.5 seconds
  • The Execute Direct Query event has a duration of 2.1 seconds, meaning that again there is a gap to be explained: a gap where no activity seems to be taking place in the visual (clearly visible in the first of the two screenshots immediately above; in the second the tooltip obscures it)
  • As mentioned in my older blog post, this gap is explained by the DataReadDuration metric (documented here) from the Execute Direct Query event – which, as shown in the tooltip in the screenshot above, is 1.1 seconds
  • The amount of time it takes to read all the data from a large resultset can only be measured from the client (ie Power BI) side – a lot of customers I work with measure SQL query performance on the server side and see fast queries, but a fast query that returns a large number of rows that all need to be sent to Power BI can of course be slow
  • The ActualQueryDuration metric, also shown in the tooltip, gives you the amount of time it took to get the first row back from the SQL query
  • Unlike the Execution Metrics Profiler event, this DataReadDuration metric is available for each SQL query generated by a DAX query, which means you can tell exactly which SQL query or queries are causing problems (see the sketch after this list for one way of listing these metrics from an exported Performance Analyzer file)
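If you’d rather not hover over every event, the file you get when you export Performance Analyzer data can also be scanned programmatically. Here’s a minimal Python sketch that lists these metrics for each Execute Direct Query event; the file name and the “events”/“name”/“metrics” property names are assumptions about the export format, so check your own file and adjust as necessary:

```python
import json

# Minimal sketch: list the DirectQuery metrics for every SQL query in a
# Performance Analyzer export. The file name and the "events"/"name"/"metrics"
# property names are assumptions about the export format - adjust them to
# match what your exported file actually contains.
with open("PowerBIPerformanceData.json", encoding="utf-8") as f:
    export = json.load(f)

for event in export.get("events", []):
    if "Execute Direct Query" not in event.get("name", ""):
        continue
    metrics = event.get("metrics", {})
    print(
        f"RowsRead={metrics.get('RowsRead')}, "
        f"ActualQueryDuration={metrics.get('ActualQueryDuration')} ms, "
        f"DataReadDuration={metrics.get('DataReadDuration')} ms"
    )
```

Because each Execute Direct Query event corresponds to a single SQL query, a high DataReadDuration on one line tells you exactly which query’s resultset is causing the problem.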

What can we learn from this? Apart from the fact that table visuals with vertical scrollbars can be a really bad thing, any time you have a DirectQuery model that generates SQL queries that return a very large number of rows, you could be paying a very heavy price to read all those rows – especially if you are getting close to the Max Intermediate Row Set Count limit, which is set to 1 million rows by default. What can you do about this? Apart from redesigning your report, I blogged about a technique here where aggregations can help for scenarios involving degenerate dimensions; using the new calendar-based time intelligence feature can also help to reduce the number of rows returned by SQL queries, as I described here. In general you’ll have to try to tune the DAX in your measures and your model to see what you can do to optimise the SQL queries Power BI generates so they return fewer rows.
